
A PARALLEL IMPLEMENTATION OF THE ELEMENT-FREE GALERKIN METHOD ON A NETWORK OF PCs

by

Thiti Vacharasintopchai

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering

Examination Committee: Dr. William J. Barry (Chairman)
                       Professor Worsak Kanok-Nukulchai
                       Dr. Pennung Warnitchai

Nationality: Thai
Previous Degree: Bachelor of Civil Engineering, Chulalongkorn University, Bangkok, Thailand
Scholarship Donor: Asian Institute of Technology Partial Scholarship

Asian Institute of Technology
School of Civil Engineering

Bangkok, Thailand
April 2000


ACKNOWLEDGMENT

I would like to express profound gratitude to Dr. William J. Barry, my advisor, who always gave invaluable guidance, inspirational suggestions, encouragement, and support all the way through this research. I would like to express sincere appreciation to Professor Worsak Kanok-Nukulchai and Dr. Pennung Warnitchai for serving as the examination committee members. I would also like to thank Dr. Putchong Uthayopas, a faculty member in the Department of Computer Engineering, Kasetsart University, who, through electronic correspondence, introduced me to the Beowulf parallel processing world. The access granted to the School of Civil Engineering high-performance computer workstations and the financial support from the Asian Institute of Technology are gratefully acknowledged. In addition, I wish to thank my friends, especially Mr. Bunpot Nicrowanajamrat, Mr. Teera Tosukhowong, and Ms. Gallissara Agavatpanitch, for their generosity throughout my twenty-month residence at AIT. Friendliness is the most important factor that makes this institute a pleasant place to live in.

Last but not least, I wish to dedicate this research work to my parents and family members, who, for better or worse, gave encouragement and support throughout my hardest times.


ABSTRACT

The element-free Galerkin method (EFGM) is a recently developed numerical technique for solving problems in a wide range of application areas including solid and fluid mechanics. The primary benefit of these methods is the elimination of the need for meshing (or re-meshing) complex three-dimensional problem domains. With EFGM, the discrete model of the object is completely described by nodes and a description of the problem domain boundary. However, the elimination of meshing difficulties does not come freely since the EFGM is much more computationally expensive than the finite element method (FEM), especially for three-dimensional and non-linear applications. Parallel processing has long been an available technique to improve the performance of scientific computing programs, including the finite element method. With efficient programming, parallel processing can overcome the high computing time that is typically required in analyses employing EFGM or other meshless methods. This work focuses on the application of the concepts in parallel processing to EFGM analyses, particularly in the formulation of the stiffness matrix, the assembly of the system of discrete equations, and the solution for nodal unknowns, so that the time required for EFGM analyses is reduced. Several low-cost personal computers are joined together to form a parallel computer with the potential for raw computing power comparable to that of the fastest serial computers. The processors communicate via a local high-speed network using the Message Passing Interface (MPI), a standard library of functions that enables parallel programs to be executed on and communicate efficiently over a variety of machines. To provide a comparison between the parallelized and the serial versions of the EFGM computer program, several benchmark 3D structural mechanics problems are analyzed to show that the parallelized EFGM program can provide substantially shorter run time than the serial program without loss of solution accuracy.


TABLE OF CONTENTS

Chapter    Title    Page

Title page .... i
Acknowledgment .... ii
Abstract .... iii
Table of Contents .... iv
List of Figures .... vi
List of Tables .... viii
List of Appendices .... ix

1. Introduction .... 1
   1.1 Motivation .... 1
   1.2 Problem Statement .... 2
   1.3 Objectives .... 2
   1.4 Scope .... 3
   1.5 Research Approach .... 3
   1.6 Contributions .... 3

2. Literature Review .... 4
   2.1 Element-free Galerkin Method (EFGM) .... 4
   2.2 Parallel Computing .... 15
   2.3 Applications of Parallel Processing in Computational Mechanics .... 18
   2.4 The NASA Beowulf Parallel Computer .... 19

3. Building the Parallel Computing Infrastructures .... 21
   3.1 Hardware and Operating System Installation .... 21
   3.2 Software Configuration .... 25

4. Development of the Parallel EFGM Software .... 28
   4.1 Design Consideration .... 28
   4.2 Fundamental Tools .... 30
   4.3 Implementation .... 36

5. Numerical Results .... 53
   5.1 Linear Displacement Field Patch Test .... 53
   5.2 Cantilever Beam with End Load .... 59
   5.3 Pure Bending of Thick Circular Arch .... 64
   5.4 Extension of a Strip with One Circular Hole .... 68
   5.5 Overall Performance .... 73

6. Conclusion and Recommendations .... 76
   6.1 Conclusion .... 76
   6.2 Recommendations .... 76

References .... 78
Appendix A .... 84
Appendix B .... 87
Appendix C .... 104
Appendix D .... 116


LIST OF FIGURES

Figure    Title    Page

3-1  The AIT Beowulf Hardware Configuration .... 22
3-2  The AIT Beowulf NFS Configuration .... 27
4-1  The Multicomputer Parallel Machine Model .... 28
4-2  Compressed Row Storage (CRS) .... 32
4-3  Symmetric Compressed Row Storage .... 33
4-4  Illustration of the Qserv Concept .... 34
4-5  Flowchart of Qserv .... 35
4-6  Flowchart of ParEFG .... 39
4-7  Flowchart of the ddefg_stiff Module .... 40
4-8  Flowchart of the ddforce Module .... 43
4-9  Flowcharts of the master_ddsolve and the worker_ddsolve Modules .... 46
4-10 Flowcharts of the master_parallel_gauss and the worker_parallel_gauss Modules .... 47
4-11 Row-wise Cyclic Striped Partitioning of Matrices .... 48
4-12 A parallel_gauss Package .... 48
4-13 Flowchart of the parallel_gauss Module .... 49
5-1  Linear Displacement Field Patch Test .... 53
5-2  Displacement in the x-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test .... 55
5-3  Displacement in the y-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test .... 55
5-4  Displacement in the z-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test .... 56
5-5  Tensile Stress in the x-direction along the Line x=3.00, z=1.50 for the Linear Displacement Field Patch Test .... 58
5-6  Average Speedups for the Linear Displacement Field Patch Test .... 58
5-7  Average Efficiencies for the Linear Displacement Field Patch Test .... 58
5-8  Cantilever Beam with End Load .... 59
5-9  Vertical Displacement along the Neutral Axis (Line y=0.50, z=0.50) for a Cantilever Beam under a Concentrated Force .... 60
5-10 Bending Stress Distribution along the Line x=6.00, z=0.50 for a Cantilever Beam under a Concentrated Force .... 61
5-11 Average Speedups for a Cantilever Beam under a Concentrated Force .... 63
5-12 Average Efficiencies for a Cantilever Beam under a Concentrated Force .... 63
5-13 Pure Bending of Thick Circular Arch .... 64
5-14 Displacement in the x-direction along the Neutral Axis of a Thick Circular Arch under Pure Bending .... 65
5-15 Tangential Stress Distribution through the Thickness of a Thick Circular Arch under Pure Bending .... 65
5-16 Average Speedups for a Thick Circular Arch under Pure Bending .... 67
5-17 Average Efficiencies for a Thick Circular Arch under Pure Bending .... 67
5-18 Extension of a Strip with One Circular Hole .... 68
5-19 Tensile Stress Distribution along the Line through the Center of the Hole and Perpendicular to the x-axis, for a Strip with One Circular Hole under Uniform Tension .... 70
5-20 Tensile Stress Distribution along the Line through the Center of the Hole and Perpendicular to the y-axis, for a Strip with One Circular Hole under Uniform Tension .... 70
5-21 Average Speedups for a Strip with One Circular Hole under Uniform Tension .... 72
5-22 Average Efficiencies for a Strip with One Circular Hole under Uniform Tension .... 72
5-23 Speedups of the Stiffness Computing Module under Various Number of Processes and Degrees of Freedom .... 73
5-24 Speedups of the Parallel Equation Solver Module under Various Number of Processes and Degrees of Freedom .... 74
5-25 Efficiencies of the Stiffness Computing Module under Various Number of Processes and Degrees of Freedom .... 74
5-26 Efficiencies of the Parallel Equation Solver Module under Various Number of Processes and Degrees of Freedom .... 75


LIST OF TABLES

Table    Title    Page

3-1  The Server Hardware Configuration .... 23
3-2  The Workstation Hardware Configuration .... 24
3-3  Networking Equipments .... 24
3-4  Common Network Properties .... 25
3-5  Nodal Specific Network Properties .... 25
4-1  Partitioning of the Major Tasks in ParEFG .... 30
4-2  Frequently Used Meschach Library Functions .... 30
4-3  Frequently Used MPI Library Functions .... 32
5-1  Average Run Times for the Linear Displacement Field Patch Test .... 57
5-2  Average Speedups for the Linear Displacement Field Patch Test .... 57
5-3  Average Efficiencies for the Linear Displacement Field Patch Test .... 57
5-4  Average Run Times for a Cantilever Beam Under a Concentrated Force .... 62
5-5  Average Speedups for a Cantilever Beam Under a Concentrated Force .... 62
5-6  Average Efficiencies for a Cantilever Beam Under a Concentrated Force .... 62
5-7  Average Run Times for a Thick Circular Arch Under Pure Bending .... 66
5-8  Average Speedups for a Thick Circular Arch Under Pure Bending .... 66
5-9  Average Efficiencies for a Thick Circular Arch Under Pure Bending .... 66
5-10 Average Run Times for a Strip with One Circular Hole under Uniform Tension .... 71
5-11 Average Speedups for a Strip with One Circular Hole under Uniform Tension .... 71
5-12 Average Efficiencies for a Strip with One Circular Hole under Uniform Tension .... 71


LIST OF APPENDICES

Appendix    Title    Page

A   Configuration Files .... 84
    A1  Common Network Configuration Files .... 84
    A2  The NFS Configuration Files .... 85
B   Input Files .... 87
    B1  Linear Displacement Field Patch Test .... 87
    B2  Cantilever Beam with End Load .... 89
    B3  Pure Bending of Thick Circular Arch .... 92
    B4  Extension of a Strip with One Circular Hole .... 95
C   Sample Output File .... 104
    C1  ParEFG Interpretation of the Input Data .... 104
    C2  Analysis Results .... 113
    C3  Analysis Logs .... 115
D   Source Codes .... 116
    D1  The Queue Server .... 116
    D2  The Parallel EFGM Analysis Software .... 117


CHAPTER 1

INTRODUCTION

1.1 Motivation

In performing the finite element analysis of structural components, meshing, which is the process of discretizing the problem domain into small sub-regions or elements with specific nodal connectivities, can be a tedious and time-consuming task. Although some relatively simple geometric configurations may be meshed automatically, complex geometric configurations require manual preparation of the mesh. The element-free Galerkin method (EFGM), one of the recently developed meshless methods, avoids the need for meshing by employing a moving least-squares (MLS) approximation for the field quantities of interest (displacements in solid mechanics applications). With EFGM, the discrete model of the object is completely described by nodes and a description of the problem domain boundary. This is a particular advantage for problems such as the modeling of crack propagation along arbitrary and complex paths, the analysis of structures with moving interfaces, or the analysis of structural components undergoing large deformations. Since no remeshing is required at each step of the analysis, geometric discontinuities of the problem domain can be handled more easily.

However, the advantage of avoiding the requirement of a mesh does not come cheaply. EFGM is much more computationally expensive than the finite element method (FEM). The increased computational cost is especially evident for three-dimensional and non-linear applications of the EFGM, due to the use of MLS shape functions, which are formulated by a least-squares procedure at each integration point. This computational costliness is the predominant drawback of EFGM.

Parallel processing has long been an available technique to improve the performance of scientific computing programs, including the finite element method. According to reference [14], the ‘divide and conquer’ paradigm, which is the concept of partitioning a large task into several smaller tasks, is frequently employed in parallel programming. These smaller tasks are then assigned to various computer processors. Kumar et al. [59] compared parallel processing to a master-workers relationship. The master divides a task into a set of subtasks assigned to multiple workers. Workers then cooperate and accomplish the task in unison. With efficient programming, parallel processing can significantly reduce the high computing time that is typically required in EFGM analyses.

A network of personal computers (PCs), which are typically much cheaper than workstation-class computers, is sufficient for the development of parallel processing algorithms [42]. From references [8,42,56,59,77] it can be inferred that by connecting these low-cost PCs to form a parallel computer, it is possible to obtain raw computing power comparable to that of the fastest serial computers, such as the Cray vectorized supercomputers. The connection of such PCs can be accomplished by using the Message Passing Interface (MPI) [32], which is a message-passing¹ standard that enables parallel programs to be executed on and to communicate efficiently over a variety of machines.

¹ A type of interaction method for parallel processors. See Section 2.2.2 for a detailed explanation.
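To make the message-passing idea concrete, the following is a minimal sketch, in the master-worker spirit described above, of two processes exchanging data through MPI point-to-point calls. It is not code from the thesis; the data, tag value, and printed text are arbitrary choices for illustration. It can be compiled with an MPI-aware C compiler and run under an MPI implementation such as mpich.

/* Minimal MPI message-passing sketch (illustrative only, not the thesis code).
 * Process 0 acts as the master and sends a block of data to process 1,
 * which returns a partial sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    double work[4] = {1.0, 2.0, 3.0, 4.0};   /* hypothetical work package */
    double result = 0.0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        /* master: send the work package to worker 1, then wait for the answer */
        MPI_Send(work, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&result, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &status);
        printf("master received partial sum %f from worker 1\n", result);
    } else if (rank == 1) {
        /* worker: receive the data, compute a partial sum, and send it back */
        MPI_Recv(work, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < 4; i++) result += work[i];
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}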

The focus of this research is to apply the concepts in parallel processing to EFGM analyses, particularly in the formulation of the stiffness matrix, the assembly of the system of discrete equations, and the solution for nodal unknowns, to minimize the time required for the EFGM analyses of structural components.

1.2 Problem Statement

The goal of this work is to design, implement, and test a parallel computer code for the analysis of structural components by the element-free Galerkin method. Past developments in parallelizing the finite element method and other meshless methods are studied, evaluated, and further developed within the EFGM framework, resulting in fast and efficient EFGM analyses. Benchmark problems in three-dimensional elasticity are analyzed to show that the parallelized EFGM computer code provides substantially shorter run times than the serial EFGM computer code, without any discrepancy in the results. In the future, the resulting parallelized analysis tool may be extended to more complex problems in which EFGM offers distinct advantages over FEM, such as the aforementioned modeling of crack propagation, the analysis of structures with moving interfaces, and large deformation and large strain analysis of solids.

1.3 Objectives

The specific objectives of this research are listed as follows:

1) To set up a parallel computer from a network of personal computers.

2) To investigate past developments in parallelizing the finite element method and other meshless methods.

3) To identify and evaluate several algorithmic alternatives for parallelizing the element-free Galerkin method.

4) To develop and implement a parallel computer code to compute the EFGM stiffness matrix and the EFGM vector of equivalent nodal forces, assemble the system of equations, solve the system of equations, and post-process the solution data.

5) To provide accuracy, run time, and speedup comparisons between the parallelized version of EFGM and the serial version, as applied to the aforementioned benchmark problems in 3D elasticity.



1.4 Scope

Since the problem of concern is to parallelize the EFGM code so that the run time of EFGM analyses is reduced, the development and implementation of the code is limited to three-dimensional problems; these require significantly longer run times than two-dimensional problems, even without the use of complex geometrically or materially non-linear formulations. Thus, benchmark problems in three-dimensional elastostatics are considered in this work. Once an efficient linear code is achieved, it may be extended in the future to the analysis of non-linear problems, such as material plasticity and large strain analyses.

1.5 Research Approach

To achieve the above stated objectives, concepts from both structural engineering and computer science are applied. Because most parallel computing software libraries and tools have been developed under the UNIX operating system, the Linux operating system, a free implementation of UNIX for personal computers, was used in this research. Algorithms were developed and implemented in a computer program using the C programming language. An existing serial computer program for EFGM analysis [60,61], written in the C language, was studied and parallelized. This serial code was also used to analyze the benchmark problems, and the results are presented in Chapter 5 for the sake of comparison.

The parallel program developed in this research employs the message-passing facilities of mpich [62], a portable public domain implementation of the full MPI specification developed at Argonne National Laboratory (ANL) in the United States for a wide variety of parallel computing environments. Since MPI is an industry standard [32], using MPI for all message-passing communication provides portable building blocks for the implementation of large-scale EFGM application codes on a variety of more sophisticated parallel computers.

1.6 Contributions

This research addresses the computational costliness associated with EFGM, especially for three-dimensional applications, through the development and implementation of parallel algorithms and computer codes. The development of this work may be incorporated into more complex EFGM applications for accurate and efficient analysis of three-dimensional, non-linear mechanics of solids, with substantially reduced computational time.


CHAPTER 2

LITERATURE REVIEW

2.1 Element-free Galerkin Method (EFGM)

2.1.1 General

Meshless methods, numerical analysis techniques in which the discrete model of the structural component or object is described by only nodes and a description of the problem domain boundary, were first developed in the late 1970s. It was mentioned in references [53] and [49] that the first meshless method, called Smoothed Particle Hydrodynamics (SPH), was developed by Lucy [24] in 1977 for modeling astrophysical phenomena. Gingold and Monaghan [44] used this method for problems on infinite domains, i.e. domains with no boundaries, such as rotating stars and dust clouds. Libersky et al. [25] extended SPH to solid mechanics problems, but problems associated with instability of the solutions were reported [21].

In 1992, Nayroles et al. [3] applied a least-squares technique in conjunction with the Galerkin method for solving 2D problems in solid mechanics and heat conduction. They called this the diffuse element method (DEM). A basis function and a weight function were used to form a smooth approximation based on a set of nodes with no explicit elements. The basic idea was to replace the FEM interpolation by a local, weighted, least-squares fitting valid within a small neighborhood surrounding each nodal point. They suggested, with great insight, that adding and removing nodes or locally modifying the distribution of nodes was much easier than completely rebuilding FEM meshes.

Belytschko et al. [51] showed that the approximation used in the work of Nayroles et al. [3] was in fact the moving least-squares (MLS) approximation described by Lancaster et al. in reference [40]. Since the MLS approximation functions were not interpolating functions, the essential boundary conditions could not be directly satisfied by the Galerkin method. They refined the DEM by implementing a higher order of Gaussian quadrature, adding certain terms in the shape function derivatives that had been formerly omitted by Nayroles et al. [3], and employing Lagrange multipliers to enforce essential boundary conditions. The result was a new Galerkin method that utilized moving least-squares approximants, called the element-free Galerkin method (EFGM). The method has proven very effective for solving a wide range of problems in 2D and 3D solid mechanics, such as static fracture mechanics and crack propagation [34,38,69].

It was cited in reference [53] that in addition to the works of Nayroles et al. [3] and Belytschko et al. [51], there have been several other meshless methods developed, namely, the reproducing kernel particle method (RKPM) [64], the hp-clouds methods [4], the partition of unity finite element method (PUFEM) [20], the particle in cell (PIC) method [9], the generalized finite difference method [55], and the finite point method [10]. A comprehensive review of meshless methods can be found in reference [49].


2.1.2 Development of the EFGM

Since its debut in 1994, the benefits of EFGM have been demonstrated in many fields. A large volume of research has contributed to EFGM development in a relatively short period of time. As a pioneering work, Belytschko et al. [51] applied the EFGM to two-dimensional elastostatics, static fracture mechanics, and steady-state heat conduction problems. It was shown that EFGM does not exhibit any volumetric locking even when linear basis functions are used, that the rate of convergence of the method may significantly exceed that of the finite element method, and that a high resolution of localized steep gradients can be achieved. They suggested that since element connectivities were not needed and the accuracy was not significantly affected by irregular nodal arrangements, progressively growing cracks could be easily modeled. They noted that the use of Lagrange multipliers complicated the solution process and pointed out that the problem could be remedied by the use of perturbed Lagrangian or other penalty methods.

Because the MLS approximation function does not produce exact field values at nodal points, the imposition of essential boundary conditions is a major problem in MLS-based meshless methods such as EFGM. As a result, a great deal of research in the area of EFGM has focused on finding a better technique for the imposition of such boundary conditions.

Lu et al. [73], realizing that the use of Lagrange multipliers increases the cost of solving the linear algebraic equations in EFGM, developed a new implementation of EFGM by replacing the Lagrange multipliers at the outset by their physical meaning, resulting in a banded and positive-definite discrete system of equations. Orthogonal MLS approximants were also constructed to eliminate the need for matrix inversion at each quadrature point. They solved two-dimensional elastostatic and static fracture mechanics problems with their new method and compared the results to those from the original EFGM with Lagrange multipliers. From the comparison, although higher efficiency was achieved, the accuracy of this new method was inferior to that of the original method.

Krongauz and Belytschko [67] proposed a method to impose the essential boundary conditions in EFGM using finite elements. They used a technique that employed a strip of finite elements along the essential boundaries. The shape functions from the edge finite elements were combined with the MLS shape functions that are employed in EFGM analyses. With this technique, the essential boundary conditions could be imposed directly as with finite elements. They claimed that, from numerical studies of elastostatic problems, the high convergence rate associated with MLS approximation was still retained. However, this is not always true. The high convergence rate was achieved because only small strips of finite elements were used; therefore EFGM errors still dominated the numerical solutions. If the finite elements had been used so extensively that their errors dominated, a lower convergence rate, consistent with that of the FEM, would have been obtained.

Belytschko et al. [48] referred to the technique in the previous paragraph as the coupling of FEM-EFGM. In contrast to the work by Krongauz and Belytschko, in which finite elements were used in a small fraction of the problem domain, it was recommended that EFGM be used in only relatively small regions of the problem domain where it was most beneficial, such as near crack tips or other locations of singularity. It was noted in reference [53] that EFGM could provide an excellent complement to the FEM in situations where finite elements were not effective. Hegen [7] also proposed the same idea about the coupling of FEM and EFGM. Both Belytschko et al. [48] and Hegen [7] suggested that since EFGM required considerably more computer resources, limiting EFGM modeling to the needed areas could save significant computational time. Therefore, the coupling of FEM and EFGM might be viewed as a technique to impose the essential boundary conditions and also as a technique to speed up the computational time. It was noted in reference [38] that EFGM could be coupled seamlessly with parallel versions of finite element programs, thus making the analysis run even faster. Belytschko et al. [48] used FEM-EFGM coupling to solve two-dimensional problems in elastostatics, elastodynamics, dynamic fracture mechanics, and a one-dimensional wave propagation problem, while Hegen [7] used the same technique to solve two-dimensional elastostatic and static fracture mechanics problems. High efficiency inherited from FEM was obtained as expected. However, the high convergence rate of EFGM was lost because the FEM error dominated the numerical solutions.

Mukherjee et al. [68] developed an alternative strategy for imposing the essential boundary conditions. They proposed a new definition of the discrete norm that is typically minimized to obtain the coefficients in MLS approximations². It was reported that their strategy worked well for 2D EFGM problems. Zhu et al. [57] presented a modified collocation method and a penalty formulation to enforce the essential boundary conditions in EFGM, as an alternative to the method of Lagrange multipliers. It was reported that their formulation gave a symmetric positive-definite system of equations while the absence of volumetric locking and a high convergence rate were retained.

² See equation (2.2).

Kaljevic and Saigal [15] applied a technique that employed singular weight functions in the formulation of MLS approximations. With the singular weight function, the approximants passed through their respective nodes and therefore the essential boundary conditions could be explicitly satisfied. With the use of singular weight functions, the MLS approximants could then be termed interpolants. This technique resulted in a reduced number of discrete equations that form a positive-definite and banded system. Two-dimensional elastostatic and static fracture mechanics problems were solved. They reported that both higher efficiency and higher accuracy, as compared to the previous implementations of EFGM, were achieved.

In addition to the development of techniques for the imposition of essential boundary conditions, there were a number of works to improve other aspects of EFGM. A representative selection of these works is described in the following paragraphs.

Belytschko et al. [50] developed a new procedure for computing EFGM shape functions that preserves the continuity of functions in domains with concave boundaries. The procedure was applied to elastostatic and static fracture problems. Overall accuracy was improved while the convergence rate was unchanged. A new method for the calculation of MLS approximants and their derivatives was also devised. It was reported that this method gave a substantial decrease in computational time as compared to the previous formulations.

Beissel and Belytschko [46] explored the possibility of evaluating the integrals of the weak form only at the nodes. Nodal integration would make EFGM truly element-free, that is, the need for background integration cells would be eliminated. It was shown that their nodal integration scheme suffered from spurious singular modes resulting from under-integration of the weak form. A technique for treating this instability was developed and tested for elastostatic problems. Good numerical results were achieved after this treatment. However, it was noted in reference [53] that the accuracy obtained was inferior to that of the background integration cell method.

Kaljevic and Saigal [15] developed a numerical integration scheme which employed the concept of dividing the rectangular integration cells that partially belong to the problem domain into sub-cells that completely belong to the domain. With this technique the automatic and accurate treatment of two-dimensional domains with arbitrary geometric configurations was made possible.

Krysl and Belytschko [37] examined the construction of the shape functions in EFGM and discussed the implications that the choices of those shape functions have on the convergence rates. It was shown that, for non-convex domain boundaries, it was possible to construct and use discontinuous weight functions that lead to discontinuous shape functions. The convergence rate of the variant of the EFGM that used such a construction of shape functions was not affected by the discontinuities when linear shape functions were used.

Häussler-Combe and Korn [58] presented a scheme for automatic, adaptive EFGM analysis. Based on the interpolation error estimation and geometric subdivision of the problem domain into integration cells, they developed an a posteriori adaptive strategy to move, discard, or introduce nodes to the nodal discretization of the problem domain. Dense nodal arrangements were generated automatically in sub-domains where high accuracy was needed. The technique showed good results for elastic, elastoplastic, and static fracture mechanics problems.

Belytschko and Fleming [53] compared the methods for smoothing the approximations near non-convex boundaries, such as cracks, and techniques for enriching the EFGM approximations near the tip of linear elastic cracks. They introduced a penalty method-based contact algorithm for enforcing crack contact in the overall compressive fields. To illustrate the new technique, crack propagation under compressive loading with crack surface contact was simulated. It was found that their numerical results closely matched experimental results.

Dolbow and Belytschko [19] investigated the numerical integration of Galerkin weak forms for the meshless methods using EFGM as a case study. They pointed out that the construction of quadrature cells without consideration of the local supports of the shape functions could result in a considerable amount of integration error, and presented a technique to construct integration cells that align with the shape function supports to improve the Gauss quadrature accuracy.

Meshless methods remain a very active area of research as evidenced by the numerous articles that appear each month in the top international journals in the field of computational mechanics. As meshless methods mature and are applied to increasingly challenging problems, interest in parallel algorithms for structural analysis with meshless methods is almost certain to significantly increase.

2.1.3 Formulation of the EFGM

The original formulation of the EFGM by Belytschko et al. [51], in which Lagrange multipliers were used to impose the essential boundary conditions, is presented in this section.


The MLS approximation is first introduced. Then, various types of weight functions are presented. Finally, the formulation of the discrete equations, for application to 3D elastostatics, is described. The formulation of the improved versions of the EFGM can be found in the respective previously cited references.

MLS approximation

In the moving least-squares approximation, we let the approximation $u^h(\mathbf{x})$ of the function $u(\mathbf{x})$ be written as

$$ u^h(\mathbf{x}) = \sum_{j}^{m} p_j(\mathbf{x})\, a_j(\mathbf{x}) \equiv \mathbf{p}^T(\mathbf{x})\, \mathbf{a}(\mathbf{x}) \qquad (2.1) $$

where $m$ is the number of terms in the basis, $p_j(\mathbf{x})$ are monomial basis functions, and $a_j(\mathbf{x})$ are their as yet undetermined coefficients. Note that both $p_j(\mathbf{x})$ and $a_j(\mathbf{x})$ are functions of the spatial coordinates $\mathbf{x}$. Examples of commonly used linear and quadratic bases are as follows:

Linear basis:
- One dimension: $\mathbf{p}^T = [1,\ x]$
- Two dimensions: $\mathbf{p}^T = [1,\ x,\ y]$
- Three dimensions: $\mathbf{p}^T = [1,\ x,\ y,\ z]$

Quadratic basis:
- One dimension: $\mathbf{p}^T = [1,\ x,\ x^2]$
- Two dimensions: $\mathbf{p}^T = [1,\ x,\ y,\ xy,\ x^2,\ y^2]$
- Three dimensions: $\mathbf{p}^T = [1,\ x,\ y,\ z,\ xy,\ xz,\ yz,\ x^2,\ y^2,\ z^2]$

The coefficients $a_j(\mathbf{x})$ in equation (2.1) are obtained by minimizing a weighted, discrete $L_2$ norm as follows:

$$ J = \sum_{I}^{n} w(\mathbf{x} - \mathbf{x}_I) \left[ \mathbf{p}^T(\mathbf{x}_I)\, \mathbf{a}(\mathbf{x}) - u_I \right]^2 \qquad (2.2) $$

where $u_I$ is the nodal value of $u$ at $\mathbf{x} = \mathbf{x}_I$, and $n$ is the number of nodal points that are visible from $\mathbf{x}$, i.e. for which the weight functions $w(\mathbf{x} - \mathbf{x}_I)$ are non-zero. The regions surrounding nodal points in which the weight functions are non-zero are termed the domains of influence of their respective nodal points.

The stationarity of $J$ in equation (2.2) with respect to $\mathbf{a}(\mathbf{x})$ leads to the following linear relation between $\mathbf{a}(\mathbf{x})$ and $u_I$:

$$ \mathbf{A}(\mathbf{x})\, \mathbf{a}(\mathbf{x}) = \mathbf{B}(\mathbf{x})\, \mathbf{u} \qquad (2.3) $$

or

$$ \mathbf{a}(\mathbf{x}) = \mathbf{A}^{-1}(\mathbf{x})\, \mathbf{B}(\mathbf{x})\, \mathbf{u} \qquad (2.4) $$

where $\mathbf{A}(\mathbf{x})$ and $\mathbf{B}(\mathbf{x})$ are the matrices defined by

$$ \mathbf{A}(\mathbf{x}) = \sum_{I}^{n} w_I(\mathbf{x})\, \mathbf{p}(\mathbf{x}_I)\, \mathbf{p}^T(\mathbf{x}_I), \qquad w_I(\mathbf{x}) \equiv w(\mathbf{x} - \mathbf{x}_I) \qquad (2.5) $$

$$ \mathbf{B}(\mathbf{x}) = \left[\, w_1(\mathbf{x})\, \mathbf{p}(\mathbf{x}_1),\ w_2(\mathbf{x})\, \mathbf{p}(\mathbf{x}_2),\ \ldots,\ w_n(\mathbf{x})\, \mathbf{p}(\mathbf{x}_n) \,\right] \qquad (2.6) $$

$$ \mathbf{u}^T = \left[\, u_1,\ u_2,\ \ldots,\ u_n \,\right]. \qquad (2.7) $$

Therefore, we have

$$ u^h(\mathbf{x}) = \sum_{I}^{n} \sum_{j}^{m} p_j(\mathbf{x}) \left( \mathbf{A}^{-1}(\mathbf{x})\, \mathbf{B}(\mathbf{x}) \right)_{jI} u_I \equiv \sum_{I}^{n} \phi_I(\mathbf{x})\, u_I \qquad (2.8) $$

where the shape function $\phi_I(\mathbf{x})$ is defined by

$$ \phi_I(\mathbf{x}) = \sum_{j}^{m} p_j(\mathbf{x}) \left( \mathbf{A}^{-1}(\mathbf{x})\, \mathbf{B}(\mathbf{x}) \right)_{jI}. \qquad (2.9) $$

The partial derivatives of $\phi_I(\mathbf{x})$ can be obtained as follows:

$$ \phi_{I,i}(\mathbf{x}) = \sum_{j}^{m} \left[ p_{j,i} \left( \mathbf{A}^{-1} \mathbf{B} \right)_{jI} + p_j \left( \mathbf{A}^{-1}_{,i} \mathbf{B} + \mathbf{A}^{-1} \mathbf{B}_{,i} \right)_{jI} \right] \qquad (2.10) $$

where

$$ \mathbf{A}^{-1}_{,i} = -\mathbf{A}^{-1}\, \mathbf{A}_{,i}\, \mathbf{A}^{-1} \qquad (2.11) $$

and the index following a comma indicates a spatial derivative.
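As a concrete illustration of equations (2.3) through (2.9), the following is a minimal sketch of how the MLS shape functions $\phi_I$ can be evaluated at a point for a one-dimensional linear basis $\mathbf{p}^T = [1,\ x]$. The function names and data layout are hypothetical and do not reproduce the thesis implementation; the weight used is the cubic spline of equation (2.13) given below, and at least two nodes are assumed to lie inside the support so that $\mathbf{A}(\mathbf{x})$ is invertible. The same $\mathbf{A}$, $\mathbf{B}$, and $\phi_I$ structure carries over to the three-dimensional case.

/* Minimal sketch (not the thesis code): 1-D MLS shape functions phi_I(x)
 * for the linear basis p^T = [1, x], following equations (2.3)-(2.9). */
#include <math.h>

static double weight(double x, double xI, double dmax)
{
    double d = fabs(x - xI) / dmax;               /* normalized distance      */
    if (d <= 0.5) return 2.0/3.0 - 4.0*d*d + 4.0*d*d*d;
    if (d <= 1.0) return 4.0/3.0 - 4.0*d + 4.0*d*d - (4.0/3.0)*d*d*d;
    return 0.0;
}

/* phi[I] receives phi_I(x) for the n nodes with coordinates xn[I]. */
void mls_shape_1d(double x, const double *xn, int n, double dmax, double *phi)
{
    double A[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
    double det, w, B0, B1, a0, a1;
    int I;

    for (I = 0; I < n; I++) {                     /* A(x), equation (2.5)     */
        w = weight(x, xn[I], dmax);
        A[0][0] += w;          A[0][1] += w * xn[I];
        A[1][0] += w * xn[I];  A[1][1] += w * xn[I] * xn[I];
    }
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0];      /* A must be non-singular   */

    for (I = 0; I < n; I++) {
        w  = weight(x, xn[I], dmax);
        B0 = w;  B1 = w * xn[I];                  /* column I of B(x), (2.6)  */
        a0 = ( A[1][1]*B0 - A[0][1]*B1) / det;    /* A^{-1} B_I               */
        a1 = (-A[1][0]*B0 + A[0][0]*B1) / det;
        phi[I] = a0 + x * a1;                     /* p^T(x) A^{-1} B_I, (2.9) */
    }
}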

Weight functions

The weight functions $w_I(\mathbf{x}) \equiv w(\mathbf{x} - \mathbf{x}_I)$ play an important role in the performance of the EFGM. They should be constructed so that they are positive and so that a unique solution $\mathbf{a}(\mathbf{x})$ of equation (2.3) is guaranteed. Also, they should decrease in magnitude as the distance from $\mathbf{x}$ to $\mathbf{x}_I$ increases.

Let $d_{\max}$ be the size of the support for the weight function, $d_I = \lVert \mathbf{x} - \mathbf{x}_I \rVert$, and $d = d_I / d_{\max}$. Commonly used MLS weight functions [49] are presented as follows:

Exponential:

$$ w(d) = \begin{cases} \exp\!\left[-\left(d/\alpha\right)^2\right] & \text{for } d \le 1 \\ 0 & \text{for } d > 1 \end{cases} \qquad (2.12) $$

Cubic spline:

$$ w(d) = \begin{cases} \dfrac{2}{3} - 4d^2 + 4d^3 & \text{for } d \le \dfrac{1}{2} \\[6pt] \dfrac{4}{3} - 4d + 4d^2 - \dfrac{4}{3} d^3 & \text{for } \dfrac{1}{2} < d \le 1 \\[6pt] 0 & \text{for } d > 1 \end{cases} \qquad (2.13) $$

Quartic spline:

$$ w(d) = \begin{cases} 1 - 6d^2 + 8d^3 - 3d^4 & \text{for } d \le 1 \\ 0 & \text{for } d > 1 \end{cases} \qquad (2.14) $$

Singular:

$$ w(d) = \begin{cases} \dfrac{1}{d^2} - 1 & \text{for } d \le 1 \\[4pt] 0 & \text{for } d > 1. \end{cases} \qquad (2.15) $$

According to reference [49], the exponential weight function is actually $C^{-1}$ continuous since it is not equal to zero at $d = 1$, but for numerical purposes it resembles a weight function with $C^1$ continuity or higher; in the exponent, the parameter $\alpha = 0.4$ results in $w(1) \cong 0.002$. The cubic and quartic spline weight functions, constructed to possess $C^2$ continuity, are more favorable than the exponential weight function because they provide better continuity and are computationally less demanding. The singular weight function allows the direct imposition of essential boundary conditions, thus eliminating the need for Lagrange multipliers. Kaljevic and Saigal noted in reference [15] that singularities will not complicate EFGM problems since they can be removed through algebraic manipulations.
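The spline weights and the continuity remark above can be checked numerically with a short sketch like the one below (illustrative only; the function names are assumptions). Printing values near $d = 1$ shows the exponential weight dropping abruptly from about 0.002 to zero at the support boundary, while the spline weights approach zero smoothly.

/* Sketch of the weight functions (2.12)-(2.14); illustrative only. */
#include <math.h>
#include <stdio.h>

static double w_exponential(double d, double alpha)
{
    return d <= 1.0 ? exp(-(d / alpha) * (d / alpha)) : 0.0;
}

static double w_cubic(double d)
{
    if (d <= 0.5) return 2.0/3.0 - 4.0*d*d + 4.0*d*d*d;
    if (d <= 1.0) return 4.0/3.0 - 4.0*d + 4.0*d*d - (4.0/3.0)*d*d*d;
    return 0.0;
}

static double w_quartic(double d)
{
    return d <= 1.0 ? 1.0 - 6.0*d*d + 8.0*d*d*d - 3.0*d*d*d*d : 0.0;
}

int main(void)
{
    double dv[3] = {0.90, 0.95, 1.00};
    int i;
    /* values approaching the edge of the support, with alpha = 0.4 */
    for (i = 0; i < 3; i++)
        printf("d = %.2f   exponential: %.6f   cubic: %.6f   quartic: %.6f\n",
               dv[i], w_exponential(dv[i], 0.4), w_cubic(dv[i]), w_quartic(dv[i]));
    return 0;
}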

Formulation of discrete equations

An EFGM formulation for 3D elastostatics, starting from the variational form and employing Lagrange multipliers, is now presented.

Consider the following three-dimensional problem on the domain $\Omega$ bounded by $\Gamma$:

$$ \nabla \cdot \boldsymbol{\sigma} + \mathbf{b} = \mathbf{0} \quad \text{in } \Omega \qquad (2.16) $$

where $\boldsymbol{\sigma}$ is the Cauchy stress tensor, which corresponds to the displacement field $\mathbf{u}$, and $\mathbf{b}$ is the body force vector. The boundary conditions are given as follows:

$$ \boldsymbol{\sigma} \cdot \mathbf{n} = \bar{\mathbf{t}} \quad \text{on } \Gamma_t \qquad (2.17) $$

$$ \mathbf{u} = \bar{\mathbf{u}} \quad \text{on } \Gamma_u \qquad (2.18) $$

in which the superposed bar denotes prescribed boundary values, and $\mathbf{n}$ is the unit normal to the domain boundary $\Gamma$.

The variational or weak form of the equilibrium equation is posed as follows. Consider trial functions $\mathbf{u}(\mathbf{x}) \in H^1$ and Lagrange multipliers $\boldsymbol{\lambda} \in H^0$, and test functions $\delta\mathbf{v}(\mathbf{x}) \in H^1$ and $\delta\boldsymbol{\lambda} \in H^0$. Then if

$$ \int_{\Omega} \nabla_s (\delta\mathbf{v})^T : \boldsymbol{\sigma}\, d\Omega - \int_{\Omega} \delta\mathbf{v}^T \cdot \mathbf{b}\, d\Omega - \int_{\Gamma_t} \delta\mathbf{v}^T \cdot \bar{\mathbf{t}}\, d\Gamma - \int_{\Gamma_u} \delta\boldsymbol{\lambda}^T \cdot (\mathbf{u} - \bar{\mathbf{u}})\, d\Gamma - \int_{\Gamma_u} \delta\mathbf{v}^T \cdot \boldsymbol{\lambda}\, d\Gamma = 0 \quad \forall\, \delta\mathbf{v} \in H^1,\ \delta\boldsymbol{\lambda} \in H^0 \qquad (2.19) $$

the equilibrium equation (2.16) and the boundary conditions (2.17) and (2.18) are satisfied. Here $\nabla_s \mathbf{v}^T$ is the symmetric part of $\nabla \mathbf{v}^T$, and $H^1$ and $H^0$ denote the Sobolev spaces of degree one and zero, respectively. Note that the trial functions, computed using MLS approximation functions, do not satisfy the essential boundary conditions and therefore the use of Lagrange multipliers is necessitated.

In order to obtain the discrete system of equations from the weak form (2.19), the approximate solution $\mathbf{u}$ and the test function $\delta\mathbf{v}$ are constructed according to equation (2.8). The Lagrange multiplier $\boldsymbol{\lambda}$ is written as

$$ \boldsymbol{\lambda}(\mathbf{x}) = N_I(s)\, \boldsymbol{\lambda}_I, \quad \mathbf{x} \in \Gamma_u \qquad (2.20) $$

$$ \delta\boldsymbol{\lambda}(\mathbf{x}) = N_I(s)\, \delta\boldsymbol{\lambda}_I, \quad \mathbf{x} \in \Gamma_u \qquad (2.21) $$

where $N_I(s)$ is a Lagrange interpolant and $s$ is the arc length along the problem domain boundary; the repeated indices designate summations. The final discrete equations can be obtained by substituting the trial functions, test functions, and equations (2.20) and (2.21) into the weak form (2.19), which yields

$$ \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^T & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (2.22) $$

where

$$ \mathbf{K}_{IJ} = \int_{\Omega} \mathbf{B}_I^T\, \mathbf{D}\, \mathbf{B}_J\, d\Omega, \qquad (2.23a) $$

$$ \mathbf{G}_{IK} = -\int_{\Gamma_u} \phi_I\, \mathbf{N}_K\, d\Gamma, \qquad (2.23b) $$

$$ \mathbf{f}_I = \int_{\Gamma_t} \phi_I\, \bar{\mathbf{t}}\, d\Gamma + \int_{\Omega} \phi_I\, \mathbf{b}\, d\Omega, \qquad (2.23c) $$

and

$$ \mathbf{q}_K = -\int_{\Gamma_u} \mathbf{N}_K\, \bar{\mathbf{u}}\, d\Gamma \qquad (2.23d) $$

where

$$ \mathbf{B}_I = \begin{bmatrix} \phi_{I,x} & 0 & 0 \\ 0 & \phi_{I,y} & 0 \\ 0 & 0 & \phi_{I,z} \\ \phi_{I,y} & \phi_{I,x} & 0 \\ 0 & \phi_{I,z} & \phi_{I,y} \\ \phi_{I,z} & 0 & \phi_{I,x} \end{bmatrix}, \qquad (2.24a) $$

$$ \mathbf{N}_K = \begin{bmatrix} N_K & 0 & 0 \\ 0 & N_K & 0 \\ 0 & 0 & N_K \end{bmatrix}, \qquad (2.24b) $$

$$ \mathbf{D} = \begin{bmatrix} c(1-\nu) & c\nu & c\nu & 0 & 0 & 0 \\ c\nu & c(1-\nu) & c\nu & 0 & 0 & 0 \\ c\nu & c\nu & c(1-\nu) & 0 & 0 & 0 \\ 0 & 0 & 0 & G & 0 & 0 \\ 0 & 0 & 0 & 0 & G & 0 \\ 0 & 0 & 0 & 0 & 0 & G \end{bmatrix}, $$

and

$$ c = \frac{E}{(1+\nu)(1-2\nu)}; \qquad G = \frac{E}{2(1+\nu)} \quad \text{for isotropic materials.} \qquad (2.24c) $$

In the above expressions $E$ and $\nu$ are Young's modulus and Poisson's ratio, respectively.
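To make equation (2.23a) concrete, the sketch below forms $\mathbf{B}_I$ and $\mathbf{D}$ from equations (2.24a) and (2.24c) and accumulates the contribution of a single quadrature point to the 3x3 block $\mathbf{K}_{IJ}$ coupling nodes I and J, with the volume integral approximated by a Gauss weight times the Jacobian determinant. The routine names and data layout are hypothetical, not the thesis code.

/* Illustrative sketch: one quadrature-point contribution to K_IJ, i.e.
 *     K_IJ += B_I^T D B_J * (Gauss weight * det J).
 * dphiI[0..2] holds the x, y, z derivatives of phi_I at the Gauss point. */
#include <string.h>

static void build_B(const double dphi[3], double B[6][3])
{
    /* strain-displacement matrix of equation (2.24a) for one node */
    memset(B, 0, 6 * 3 * sizeof(double));
    B[0][0] = dphi[0];                      /* phi,x               */
    B[1][1] = dphi[1];                      /* phi,y               */
    B[2][2] = dphi[2];                      /* phi,z               */
    B[3][0] = dphi[1];  B[3][1] = dphi[0];  /* shear row gamma_xy  */
    B[4][1] = dphi[2];  B[4][2] = dphi[1];  /* shear row gamma_yz  */
    B[5][0] = dphi[2];  B[5][2] = dphi[0];  /* shear row gamma_zx  */
}

void build_D(double E, double nu, double D[6][6])
{
    /* isotropic elasticity matrix of equation (2.24c) */
    double c = E / ((1.0 + nu) * (1.0 - 2.0 * nu));
    double G = E / (2.0 * (1.0 + nu));
    int i, j;
    memset(D, 0, 6 * 6 * sizeof(double));
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            D[i][j] = (i == j) ? c * (1.0 - nu) : c * nu;
    D[3][3] = D[4][4] = D[5][5] = G;
}

/* accumulate the 3x3 block K_IJ at one Gauss point; wdet = weight * det J */
void add_KIJ(const double dphiI[3], const double dphiJ[3],
             double D[6][6], double wdet, double KIJ[3][3])
{
    double BI[6][3], BJ[6][3], DBJ[6][3];
    int a, b, k;

    build_B(dphiI, BI);
    build_B(dphiJ, BJ);
    for (a = 0; a < 6; a++)                 /* DBJ = D * B_J         */
        for (b = 0; b < 3; b++) {
            DBJ[a][b] = 0.0;
            for (k = 0; k < 6; k++)
                DBJ[a][b] += D[a][k] * BJ[k][b];
        }
    for (a = 0; a < 3; a++)                 /* K_IJ += B_I^T * DBJ   */
        for (b = 0; b < 3; b++)
            for (k = 0; k < 6; k++)
                KIJ[a][b] += BI[k][a] * DBJ[k][b] * wdet;
}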


2.1.4 Applications of the EFGM

In addition to the test problems mentioned in Section 2.1.2, EFGM has been applied by many researchers to several different classes of problems.

Belytschko et al. [52] applied EFGM to two-dimensional static and dynamic fracture mechanics. They demonstrated the capability of EFGM to model complex problems involving the evolution of growing cracks. In both static and dynamic cases a growing crack could be modeled simply by extending the free boundaries associated with the crack. Lu et al. [72] used EFGM to solve one-dimensional wave propagation and two-dimensional dynamic fracture problems. They developed a weak form of the kinematic boundary condition in order to enforce it. It was shown that accurate mode I and mode II stress intensity factors could be computed by EFGM.

Krysl and Belytschko used EFGM in the analysis of arbitrary Kirchhoff plates [35] and shells [36]. The satisfaction of $C^1$ continuity was easily met since EFGM requires only $C^1$ weight functions; therefore the Mindlin-Reissner theory or the discrete Kirchhoff theory was not necessary. High accuracy was achieved for arbitrary grid geometries in clamped and simply supported plates. Membrane locking in the shell cases was alleviated by enlarging the domains of influence of the nodes for the quadratic basis and was completely eliminated by using the quartic polynomial basis.

The application of EFGM to solid mechanics problems containing material inhomogeneities was presented by Cordes and Moran [26]. Very accurate displacement results were reported, and a set of filtering schemes was introduced to improve the numerical solution in the stress and strain fields.

Additional problems in two-dimensional dynamic fracture mechanics were solved with EFGM by Belytschko and Tabbara [54]. They suggested that the method had the potential to accurately predict almost arbitrary crack paths, and could be easily extended to anisotropic and non-linear problems. The qualitative behavior of their numerical results agreed well with the experimental results. Sukumar et al. [34] applied coupled FEM-EFGM to problems in three-dimensional fracture mechanics. Domain integral methods were used to evaluate the stress intensity factors along a three-dimensional crack front. Fleming et al. [29] introduced an enriched EFGM formulation for fracture problems. It was shown that the new formulation greatly reduced the numerical stress oscillations near the crack tips and yielded accurate stress intensity factors with the use of significantly fewer degrees of freedom.

The EFGM analysis of stable crack growth in an elastic solid was pioneered by Xu and Saigal [70]. In their formulation, the inertia force term in the momentum equation was converted to a spatial gradient term by employing the steady-state conditions. A convective domain was employed to account for the analysis domain moving at the same speed as the crack front. Good agreement of the numerical results with the analytical solutions was reported. They noted that their formulation was a promising tool for the analysis of stable crack growth problems in both elastic-plastic and elastic-viscoplastic solids. In reference [71], Xu and Saigal extended their work to the analysis of steady quasi-static crack growth under plane strain conditions in elastic-perfectly plastic materials. Numerical studies showed very good agreement with the corresponding asymptotic solutions. Recently, the EFGM analysis of steady dynamic crack growth in elastic-plastic materials was carried out by the same researchers [69]. They considered both rate-independent materials and rate-dependent materials. Numerical results also agreed well with the analytical solutions.

The application of EFGM to acoustic wave propagation was investigated by Bouillard and Suleau [43]. They implemented an EFGM for analyzing harmonic forced response in acoustic problems. When compared to FEM, it was reported that EFGM gave a better control of dispersion and pollution errors, which were the specific problems associated with acoustic numerical analysis.

The most advanced and efficient application of EFGM seems to be the recent work by Krysl and Belytschko [38]. The coupling of FEM-EFGM was used to model arbitrary three-dimensional propagating cracks in elastic bodies. An EFG super-element was developed and embedded in an explicit finite element system. The super-element was used in the region of the problem domain through which cracks could potentially propagate. Complex simulations, such as the mixed-mode growth of a center through-crack in a finite plate, a mode-I surface-breaking penny-shaped crack in a cube, a penny-shaped crack growing under general mixed-mode conditions, and a torsion-tension rectangular bar with a center through-crack, were successfully analyzed.

2.1.5 Computational cost and efficiency of the EFGM

One of the major disadvantages of the EFGM is its increased computational cost when compared to the FEM. Belytschko et al. stated in reference [50] that the additional computational load of EFGM arose from several sources, listed as follows:

1. The need to identify the nodes in the domain of influence for all points at which the approximating function is calculated;

2. The relative complexity of the shape functions, which increased the cost of evaluating them and their derivatives;

3. The additional expense of dealing with essential boundary conditions.

To date, published works have dealt with the second and the third items in the above list. From Section 2.1.2, the work by Lu et al. [73] simplified the computation of MLS shape functions and the treatment of essential boundary conditions. The work by Belytschko et al. [50] simplified the calculation of MLS shape functions and their derivatives. In addition, the work by Belytschko et al. [48,67] and Hegen [7] that dealt with the coupling of FEM-EFGM simplified the treatment of essential boundary conditions. Belytschko et al. even suggested in reference [48] that EFGM be used only as required for higher accuracy in a problem domain consisting primarily of finite elements so that the computational costliness of the EFGM could be avoided. This was reflected through their recent work in three-dimensional crack propagation simulation in which EFGM was coupled with the parallel version of an FEM program [38], as discussed in Section 2.1.4. It should be noted that the coupling of FEM-EFGM as suggested by Belytschko et al. [48] significantly reduces the high convergence rate associated with the original versions of EFGM. The development of a truly parallel EFGM code resulting in fast EFGM solutions with high convergence rate has not been found in the available literature.


2.2 Parallel Computing

2.2.1 General

Many of today's complex problems in physics, chemistry, biology, meteorology, and engineering require computational speeds well beyond the limits attainable by sequential machines if real-time solutions are to be obtained. Carter et al. stated in reference [65] that trends in the development of electronic machines for large scientific computations are pointing toward parallel computer architectures as an answer to increasing user demand. According to reference [2], parallel processing is a method of using many small tasks to solve one large problem. It was cited in reference [14] that the 'divide and conquer' paradigm, which is the concept of partitioning a large task into several smaller tasks assigned to various computer processors, is frequently employed in parallel programming. Kumar et al. [59] compared parallel processing to a master-workers relationship in which the master divides a task into a set of subtasks assigned to multiple workers who cooperate and accomplish the task in unison. Parallel computing is likely to be the most important tool for sophisticated problems in the near future.

2.2.2 Taxonomy of parallel architectures

Parallel computers can be constructed in many ways. In this section, the taxonomy of parallel architectures taken from reference [59] will be described:

Control mechanism

Parallel computers may be classified by their control mechanism as single instruction stream, multiple data stream (SIMD) or multiple instruction stream, multiple data stream (MIMD). In SIMD parallel computers, all processing units execute the same instruction synchronously, whereas, in MIMD parallel computers, each processor in the computer is capable of executing instructions independent of the other processors.

SIMD computers require less hardware than MIMD computers because they have only one global control unit. They also require less memory because only one copy of the program needs to be stored. In contrast, MIMD computers store the program and operating system on each processor. SIMD computers are naturally suited for data-parallel programs; that is, programs in which the same set of instructions are executed on a large data set. Moreover, SIMD computers require less startup time for communicating with neighboring processors. However, a drawback and limitation of SIMD computers is that different processors cannot execute different instructions in the same clock cycle.

Interaction method

Parallel computers are also classified by the method of interaction among processors as the message-passing and the shared-address-space architectures. In a message-passing architecture, processors are connected using a message-passing interconnection network. Each processor has its own memory, called the local or private memory, which is accessible only to that processor. Processors can interact only by passing messages to each other. This architecture is also referred to as a distributed-memory or private-memory architecture. Message-passing MIMD computers are commonly referred to as multicomputers. The shared-address-space architecture, on the other hand, provides hardware support for read and write access by all processors to a shared address space or memory. Processors interact by modifying data objects stored in the shared address space. Shared-address-space MIMD computers are often referred to as multiprocessors.

It is easy to emulate a message-passing architecture containing p processors on a shared-address-space computer with an identical number of processors. This is done by partitioning the shared address space into p disjoint parts and assigning one such partition exclusively to each processor. A processor sends a message to another processor by writing into the other processor's partition of memory. However, emulating a shared-address-space architecture on a message-passing computer is costly, since accessing another processor's memory requires sending and receiving messages. Therefore, shared-address-space computers provide greater flexibility in programming. Moreover, some problems require rapid access by all processors to large data structures that may be changing dynamically. Such access is better supported by shared-address-space architectures. Nevertheless, the hardware needed to provide a shared address space tends to be more expensive than that for message passing. As a result, according to references [2,8,32], message-passing is the most widely used interaction method for parallel computers.

Interconnection networks

Shared-address-space computers and message-passing computers can be constructed by connecting processors and memory units using a variety of interconnection networks, which can be classified as static or dynamic. Static networks consist of point-to-point communication links among processors and are also referred to as direct networks. Static networks are typically used to construct message-passing computers. On the other hand, dynamic networks are built using switches and communication links. Communication links are connected to one another dynamically by the switching elements to establish paths among processors and memory banks. Dynamic networks are referred to as indirect networks and are normally used to construct shared-address-space computers.

Processor granularity

A parallel computer may be composed of a small number of very powerful processors or a large number of relatively less powerful processors. Parallel computers that belong to the former class are called coarse-grained computers, while those belonging to the latter are called fine-grained computers. Computers that are situated between these two classes are medium-grained computers.

Different applications are suited to coarse-, medium-, or fine-grained computers to varying degrees. Many applications have only a limited amount of concurrency³. Such applications cannot make effective use of a large number of less powerful processors, and are best suited to coarse-grained computers. Fine-grained computers, however, are more cost effective for applications with a high degree of concurrency.

³ The apparently simultaneous execution of two or more routines or programs [33].


The granularity of a parallel computer can be defined as the ratio of the time required for a basic communication operation to the time required for a basic computation. Parallel computers for which this ratio is small are suitable for algorithms requiring frequent communication; that is, algorithms in which the grain size of the computation (before a communication is required) is small. Since such algorithms contain fine-grained parallelism, these parallel computers are often called fine-grained computers. On the contrary, parallel computers for which this ratio is large are suited to algorithms that do not require frequent communication. These computers are referred to as coarse-grained computers.

2.2.3 Performance measures for parallel systems

In measuring performance, a sequential algorithm is usually evaluated in terms of its execution time, expressed as a function of the size of its input [59]. However, the execution time of a parallel algorithm depends not only on the input size but also on the architecture of the parallel computer and the number of processors. Therefore, a parallel algorithm cannot be evaluated in isolation from a parallel architecture. A parallel system is defined as the combination of an algorithm and the parallel architecture on which it is implemented. According to the theory of parallel computing in reference [59], there are many measures that are commonly used for evaluating the performance of parallel systems. These are presented as follows:

Run time

The serial run time of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel run time is the time that elapses from the moment that a parallel computation starts to the moment that the last processor finishes execution. The serial run time and the parallel run time are denoted by:

$\text{Serial run time} = T_S$  (2.25a)

$\text{Parallel run time} = T_P$  (2.25b)

Speedup

When evaluating a parallel system, we are often interested in knowing how much performance gain is achieved by parallelizing a given application over a sequential implementation. Speedup, denoted by S, is a measure that captures the relative benefit of solving a problem in parallel. It is formally defined as the ratio of the serial run time of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors. The p processors used by the parallel algorithm are assumed to be identical to the one used by the sequential algorithm. Mathematically, the speedup can be expressed as:

$S = \frac{T_S}{T_P}$  (2.26)


Efficiency

Only an ideal parallel system containing p processors can deliver a speedup equal to p. In practice, ideal behavior is not achieved because, while executing a parallel algorithm, the processors cannot devote 100 percent of their time to the computations of the algorithm. Efficiency, denoted by E, is a measure of the fraction of time for which a processor is usefully employed. It is defined as the ratio of the speedup (S) to the number of processors (p). In an ideal parallel system, speedup is equal to p and efficiency is equal to one. In practice, speedup is less than p and efficiency is between zero and one, depending on the degree of effectiveness with which the processors are utilized. Mathematically, efficiency is given by:

$E = \frac{S}{p}$  (2.27)

Cost

The cost of solving a problem on a parallel system is defined as the product of the parallel run time and the number of processors used. Cost is sometimes referred to as work or processor-time product. It reflects the sum of the time that each processor spends solving the problem. The cost of solving a problem on a single processor is the execution time of the fastest known sequential algorithm. A parallel system is said to be cost-optimal if the cost of solving a problem on a parallel computer is comparable to the execution time of the fastest-known sequential algorithm on a single processor.
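
As a concrete illustration of these definitions, the short C fragment below computes the speedup, efficiency, and cost from measured serial and parallel run times; it is a minimal sketch for illustration only, and the timing values and variable names are assumed rather than taken from ParEFG.

    #include <stdio.h>

    int main(void)
    {
        double T_S = 120.0;  /* serial run time in seconds (assumed value)   */
        double T_P = 35.0;   /* parallel run time in seconds (assumed value) */
        int    p   = 4;      /* number of processors                         */

        double S = T_S / T_P;   /* speedup, equation (2.26)        */
        double E = S / p;       /* efficiency, equation (2.27)     */
        double C = T_P * p;     /* cost, or processor-time product */

        printf("speedup = %.2f, efficiency = %.2f, cost = %.1f processor-seconds\n",
               S, E, C);
        return 0;
    }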

2.3 Applications of Parallel Processing in Computational Mechanics

As in other fields of scientific and engineering applications, parallel processing has been a promising tool for solving complex FEM problems [1,12,17,22,65,66]. Chiang and Fulton [22] described two methods for parallelizing FEM based on the generation of the element stiffness matrices and force vectors, namely element-by-element parallelism and subdomain parallelism. In element-by-element parallelism, each processor calculates the matrices and vectors of its own elements, one at a time, whereas in subdomain parallelism, each processor is responsible for a certain subdomain and calculates all the matrices and vectors of the elements in that subdomain. Subdomain parallelism is similar in concept to the domain decomposition method, which has so far been the predominant method in parallel FEM applications. Examples of the domain decomposition method can be found in references [12], [17] and [66].

Escaig and Marin stated in reference [66] that the domain decomposition method consists of partitioning the initial problem domain into subdomains, solving the initial problem on each subdomain, solving a problem at the interfaces of the subdomains, and back-substituting this solution into the respective subdomains. They noted that, in addition to the benefits of parallel computing, the domain decomposition method offers the possibility of recalculating the solution of a non-linear problem at each step only in the affected subdomains, resulting in even faster solutions. This property may yield a large reduction of the total execution time for problems in which the non-linearity is irregularly distributed. Yagawa et al. [12] pointed out that the domain decomposition method is a coarse-grained algorithm. From their study, they claimed that higher performance was achieved when the size of the subdomains was increased, resulting in a larger granularity of the parallel computation.

Besides FEM, parallel processing is also being utilized in the study of meshless methods. According to Günter et al. [11], the domain decomposition concept was used in the RKPM parallel implementation on a distributed memory parallel computer. The domains of influence were analyzed by a parallel analysis technique in the preprocessing step. Each quadrature point was given a tag identifying the processor owning the given point. The quadrature point information was then distributed to and evaluated by its respective processor in parallel. A special technique to enforce the essential boundary conditions was developed. With this technique, the need for additional communication within the solver in order to satisfy the boundary conditions was eliminated. Discrete equations were solved in parallel by the routines in the Portable, Extensible Toolkit for Scientific Computation (PETSc) [45].

2.4 The NASA Beowulf Parallel Computer

The recent rapid increase in the performance of mass-market commodity PC microprocessors and the significant difference in price between PCs and relatively expensive scientific workstations have provided an opportunity for substantial gains in performance-to-cost ratio. This has led to the idea of harnessing PC technology in parallel ensembles to provide high-end capability for scientific and engineering applications [8]. It was cited in reference [42] that advancements in microprocessor technology enable Intel-based PCs to deliver performance comparable to that of supercomputers. In addition, the availability of low-cost Local Area Network (LAN) connections makes it cheap and easy to combine these powerful PCs or workstations4 to build a high-performance parallel computing environment.

The effort to deliver low-cost high-performance computing platforms to scientific communities has been going on for many years. It was mentioned in reference [42] that a network of PCs is a good candidate since it has the same architecture as the distributed memory multicomputer system5. Many research groups have assembled commodity off-the-shelf (COTS) PCs and fast LAN connections to build parallel computers. The parallel computers of this type are suitable for coarse-grained applications that are not communication intensive because of the high communication start-up time and the limited bandwidth associated with the underlying network architectures [27].

Parallel computers built from networks of PCs, known as clusters, can be classified by workstation ownership into two types, namely dedicated clusters and non-dedicated clusters [27]. In a dedicated cluster, the workstations are not owned by particular individuals, and the resources are shared so that parallel computing can be performed across the entire cluster. In a non-dedicated cluster, individuals own their workstations and parallel applications are executed by utilizing idle CPU cycles. The advantage of the former type over the latter is the fast interactive response of the dedicated nodes. 4 The combinations of input, output, and computing hardware that can be used for work by an individual [33]. 5 The message-passing MIMD parallel computer. See Section 2.2.2 for more detail.


The NASA Beowulf project was one of the largest initiatives for dedicated cluster-based parallel computing. According to reference [8], the Beowulf project was a NASA initiative sponsored by the High Performance Computing and Communications (HPCC) program to explore the potential of the Pile-of-PCs6 approach and to develop the necessary methodologies to apply these low-cost system configurations to NASA computational requirements in the Earth and space sciences. The project emphasized three governing principles [27]:

No custom hardware components

Beowulf exploited commodity components and computer industry standards that had been developed under competitive market conditions and were in mass production. No individual vendor owned the rights to the product; therefore the system could be built from hardware components obtained from many sources.

Incremental growth and technology tracking

As new PC technologies became available, the Beowulf system administrators had total control over the configuration of the cluster. They could choose to selectively upgrade some components of the system with new ones that were best suited to their application needs, rather than being restricted to vendor-based configurations.

Usage of readily available and free software components

Beowulf used public domain operating systems and software libraries, which were supplied with source code. This type of software had been widely accepted and developed in the academic community; therefore, the administrators could be confident that their system would deliver high software performance at the lowest cost.

The operating point targeted by the Beowulf project was scientific applications and users requiring repeated use of large data sets and large applications with easily delineated coarse-grained parallelism [56]. It was reported in reference [8] that a 16-processor Beowulf costing less than $50,000 sustained 1.25 Gigaflops7 on a scientific space simulation, a performance comparable to that of much more expensive supercomputers. Because of this combination of low cost and high performance, many Beowulf-type parallel computers have now been built around the world [77]. In Thailand, a Beowulf-type parallel computer named SMILE was built by Uthayopas et al. [42] at the Faculty of Engineering, Kasetsart University. Beowulf-type parallel computers provide universities, which often have limited resources, with an excellent platform for teaching parallel programming courses, as well as cost-effective computing for their computational scientists.

6 The term used to describe a loose ensemble of PCs applied in concert to a single problem [8]. 7 10^9 floating-point operations per second. According to reference [33], FLOPS, a measure of a computer's power, is the number of arithmetic operations performed on data stored in floating-point notation in one second.


CHAPTER 3

BUILDING THE PARALLEL COMPUTING INFRASTRUCTURES

It can be concluded from Section 2.4 that the Beowulf-type parallel computer is a very good choice for parallel processing in an academic environment because of its low cost and high performance. Therefore, the parallel implementation of the EFGM is done on this platform. The procedure used to build the AIT Beowulf, a four-node Beowulf-like parallel computer, is described in this section. A node, in the context of a Beowulf, is one of several computers that are connected via a local area network (LAN) to form the parallel computer.

3.1 Hardware and Operating System Installation

Based on the guidelines in references [42] and [75], the AIT Beowulf (see Figure 3-1) currently consists of one server8 node and three workstation9 nodes, whose configurations are presented in Table 3-1 and Table 3-2, respectively. A dual-CPU arrangement was chosen for the server node so that symmetric multiprocessing (SMP)10 could be explored in future work. These nodes are attached to the hub11 described in Table 3-3 to form a local area network.

After the hardware components were connected, Red Hat Linux 6.0, a distribution of the Linux operating system, was installed on each node. Red Hat Linux comes with many choices of operating system components, called packages, to match users’ needs. For the AIT Beowulf, the server operating-system packages were installed on the server node in addition to the workstation packages that were common to all nodes.

Linux, the necessary operating system for the Beowulf-type parallel computer, is a public domain POSIX-compliant UNIX-like operating system that runs on personal computers [27]. Linux is necessary, according to the Beowulf principles [27], because it is readily available and distributed free-of-charge. POSIX, the acronym for Portable Operating System Interface for UNIX, is an IEEE (Institute of Electrical and Electronic Engineers) standard that defines a set of operating-system services. Programs that adhere to this standard can be easily ported from one system to another [33]. Since Linux provides a POSIX compatible UNIX environment, serial and parallel applications written for the computers running UNIX, for example, scientific workstations and supercomputers, can be compiled and run seamlessly on the Beowulf. Red Hat Linux is chosen for the AIT Beowulf because of its powerful network management software and ease of installation. 8 A computer running administrative software that controls access to the network and its resources, such as disk drives, and provides resources to computers functioning as workstations on the local area network [33]. 9 See the previous definition on page 19. 10 A computer architecture in which multiple processors share the same memory, which contains one copy of the operating system, one copy of any applications that are in use, and one copy of the data [33]. 11 A device that joins communication lines at a central location in the network and provides a common connection to all devices on the network [33].


Like an ordinary networked UNIX computer, each node of a Beowulf requires user accounts and consistent network properties. User accounts are created by the conventional UNIX system administration commands that can be found in reference [39]. The network properties for the nodes in the AIT Beowulf are defined based on the RFC 1918 guidelines for private Internet Protocol (IP) addresses. A 'request for comments' (RFC) is a document in which a standard, a protocol, or other information pertaining to the operation of the Internet is published [33]. RFC 1918 can also be obtained from the Internet at URL:http://www.alternic.net/rfcs/1900/rfc1918.txt.html. The properties that are common to all nodes are presented in Table 3-4, and the properties that are specific to each node are presented in Table 3-5. Since these properties are defined based on the Internet standard, it will be possible to add Internet access capability to the AIT Beowulf in the future. For more information about how to assign network properties to the nodes, the readers are referred to references [28] and [39]: reference [28] contains introductory material on Linux system administration, while detailed resources on UNIX system administration can be found in reference [39]. An example of the network configuration files for the AIT Beowulf can be found in Appendix A1 on page 84.

[Figure 3-1 shows the server node (svr1.cml.ait.ac.th, 192.168.1.1) and workstation nodes 1-3 (nod1.cml.ait.ac.th, 192.168.1.11; nod2.cml.ait.ac.th, 192.168.1.12; nod3.cml.ait.ac.th, 192.168.1.13) attached to a 100-Mbps Fast Ethernet hub.]

Figure 3-1 The AIT Beowulf Hardware Configuration


Table 3-1 The Server Hardware Configuration12

Item Description

CPU Dual Intel Pentium III-450 MHz

Motherboard Dual CPU server motherboard

Main Memory 128-MB SDRAM

Hard Drive 16-GB Ultra DMA/66 ATA hard drive

CD-ROM Drive Generic IDE CD-ROM drive

Floppy Disk Drive Generic 1.44-MB floppy disk drive

Network Card 100-Mbps Fast Ethernet card

Display Adaptor OpenGL-capable graphic adaptor

Monitor 17-inch monitor

Keyboard Generic PS/2 keyboard

Mouse Generic PS/2 mouse

12 These tables contain many technical terms and abbreviations that are very common in the computer industry. The readers are referred to standard computer hardware textbooks, such as reference [31], for detailed definitions.


Table 3-2 The Workstation Hardware Configuration13

Item Description

CPU Intel Pentium III-450 MHz

Motherboard Generic motherboard

Main Memory 64-MB SDRAM

Hard Drive 8-GB Ultra DMA-66 ATA Hard Drive

CD-ROM Drive Generic IDE CD-ROM drive

Floppy Disk Drive Generic 1.44-MB floppy disk drive

Network Card 100-Mbps Fast Ethernet card

Display Adaptor Generic display adaptor

Monitor Not required

Keyboard Not required

Mouse Not required

Table 3-3 Networking Equipment14

Item Description

Ethernet Hub 8-port 100-Mbps stackable Fast Ethernet hub

LAN Cable UTP CAT-5 cables with RJ-45 connectors

13 See footnote 12 on page 23. 14 See footnote 12 on page 23.


Table 3-4 Common Network Properties

Item Assigned Value

Network 192.168.1.0

Gateway 192.168.1.254

Broadcast 192.168.1.255

Netmask 255.255.255.0

Domain Name cml.ait.ac.th

Table 3-5 Nodal Specific Network Properties

Node Computer Name Full Name IP Address

Server svr1 svr1.cml.ait.ac.th 192.168.1.1

Workstation #1 nod1 nod1.cml.ait.ac.th 192.168.1.11

Workstation #2 nod2 nod2.cml.ait.ac.th 192.168.1.12

Workstation #3 nod3 nod3.cml.ait.ac.th 192.168.1.13

3.2 Software Configuration

In addition to the installation of hardware components and operating systems, the configuration or installation of the Beowulf fundamental software, which can be divided into the software libraries and the system services, is required. The software libraries that must be installed are the message-passing library and the application specific libraries. For this thesis, the only application specific library is Meschach, a matrix computation library. The system services that must be configured are the Remote Shell and the Network File System.

3.2.1 Software libraries

Message-passing library

The Beowulf is a message-passing MIMD parallel computer (see Section 2.4); therefore, a message-passing infrastructure is needed. The mpich library [62], which is the most widely used [26] free implementation of the Message Passing Interface (MPI) [32], was chosen for the AIT Beowulf. MPI is a message-passing standard defined by the MPI Forum, a committee composed of vendors and users formed at the Supercomputing Conference in 1992. Since the goals of the MPI design were portability, efficiency and functionality [27], the parallel software written for the AIT Beowulf can be easily ported to more sophisticated parallel computers. The installation of the mpich library is straightforward and will not be discussed here. The readers are referred to reference [63] for installation procedures. The library is available from the Internet at URL:http://www.mcs.anl.gov/mpi/mpich.

Matrix computation library

The EFGM relies heavily on matrix computations. Matrix multiplications and inversions are needed each time that the MLS shape functions or shape function derivatives are evaluated. Therefore, an efficient and reliable matrix computation library is essential. Meschach, a powerful matrix computation library by Stewart and Leyk [6] at the Australian National University, was chosen. The Meschach library provides user-friendly routines, with sophisticated algorithms, to address all basic operations dealing with matrices and vectors. The readers are referred to reference [6] for detailed installation procedures.
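
To illustrate the style of the Meschach routines that ParEFG relies on (the frequently used functions are listed later in Table 4-2), the following minimal C sketch allocates a small matrix, inverts it, and multiplies the inverse with a vector. The header names and function signatures shown here are based on the standard Meschach distribution and should be verified against reference [6]; the numerical values are arbitrary.

    #include <stdio.h>
    #include "matrix.h"    /* core Meschach definitions (MAT, VEC, m_get, ...) */
    #include "matrix2.h"   /* factorization routines such as m_inverse()       */

    int main(void)
    {
        MAT *A, *Ainv;
        VEC *x, *y;

        A = m_get(2, 2);                 /* allocate and initialize a 2-by-2 matrix */
        A->me[0][0] = 4.0;  A->me[0][1] = 1.0;
        A->me[1][0] = 1.0;  A->me[1][1] = 3.0;

        Ainv = m_inverse(A, MNULL);      /* invert A into a newly allocated matrix  */

        x = v_get(2);                    /* allocate and initialize a 2-vector      */
        x->ve[0] = 1.0;  x->ve[1] = 2.0;

        y = mv_mlt(Ainv, x, VNULL);      /* compute y = inverse(A) * x              */
        v_foutput(stdout, y);            /* print the result vector                 */

        m_free(A);  m_free(Ainv);        /* deallocate the matrices and vectors     */
        v_free(x);  v_free(y);
        return 0;
    }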

3.2.2 System services

Remote shell

The mpich library mentioned above can be used on a wide variety of parallel-computing platforms. For the Beowulf-type parallel computers, the UNIX remote shell utility is required to run the message-passing parallel software [62].

According to reference [39], the remote shell utility, invoked with the rsh command, allows a user to execute a program on a remote system15 without passing through the password-authentication login process. Users can simply specify the hostname or IP address of the remote host to execute a command on that machine.

For remote utilities, which include the Remote Shell, to function, the following are required:

a) A remote shell server program on the remote host

b) An entry in the /etc/hosts file on the remote host

c) An entry in either .rhosts or the /etc/hosts.equiv file on the remote host

The remote shell server program is automatically started in typical Red Hat Linux installations and need not be configured again. The use of a /etc/hosts.equiv file is not advisable for UNIX security reasons [39]. Therefore, the .rhosts file was used in the AIT Beowulf. For details on setting up the /etc/hosts and the .rhosts files, the readers are referred to Appendix A1.3 and A1.4, respectively.
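
For illustration, entries of roughly the following form, based on the host names and addresses in Table 3-5, would be expected in the /etc/hosts file and in a user's .rhosts file; the authoritative versions used on the AIT Beowulf are those listed in Appendix A1.3 and A1.4.

    # /etc/hosts (identical on every node)
    127.0.0.1     localhost
    192.168.1.1   svr1.cml.ait.ac.th   svr1
    192.168.1.11  nod1.cml.ait.ac.th   nod1
    192.168.1.12  nod2.cml.ait.ac.th   nod2
    192.168.1.13  nod3.cml.ait.ac.th   nod3

    # ~/.rhosts (hosts trusted to execute rsh commands without a password)
    svr1.cml.ait.ac.th
    nod1.cml.ait.ac.th
    nod2.cml.ait.ac.th
    nod3.cml.ait.ac.th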

15 A remote computer is a computer that is accessed through some type of communication lines, rather than directly accessed through the keyboard-and-monitor terminal [33].


Network File System

The Network File System (NFS) is a UNIX system service developed by Sun Microsystems to allow users to access a remote file system16 while making it appear as if it were local17. In order to do this, the server has to export the file system to the workstations through settings in the /etc/exports file. The workstations mount18 those exported directories or file systems to their local file system through settings in the /etc/fstab file. Details on configuring the /etc/exports and the /etc/fstab files are presented in Appendix A2.

The use of NFS is necessary for the Beowulf because the parallel application files have to be locally accessible to every node; as the number of nodes increases, it would be practically impossible to make copies of these files on every node in the cluster. The AIT Beowulf NFS configuration is shown in Figure 3-2. The /usr/local and the /home/shared directories on the server node are exported to all workstation nodes. The former stores the mpich and Meschach software libraries, while the latter stores the application files. Software development is done on the server node, and the resulting executable files are stored in the exported /home/shared directory. When the software is run, all nodes in the cluster perform their input and output operations on the same exported file systems on the server node.
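
As an illustration of this arrangement, NFS entries of roughly the following form would export the two directories from the server node and mount them on a workstation node; the mount options shown are assumptions, and the actual files used on the AIT Beowulf are those listed in Appendix A2.

    # /etc/exports on the server node (svr1)
    /usr/local    nod1.cml.ait.ac.th(rw) nod2.cml.ait.ac.th(rw) nod3.cml.ait.ac.th(rw)
    /home/shared  nod1.cml.ait.ac.th(rw) nod2.cml.ait.ac.th(rw) nod3.cml.ait.ac.th(rw)

    # /etc/fstab on each workstation node
    svr1.cml.ait.ac.th:/usr/local    /usr/local    nfs  defaults  0 0
    svr1.cml.ait.ac.th:/home/shared  /home/shared  nfs  defaults  0 0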

[Figure 3-2 shows the server node (svr1.cml.ait.ac.th, 192.168.1.1) exporting the /usr/local and /home/shared directories, which are mounted by workstation nodes 1-3 (nod1, nod2 and nod3).]

Figure 3-2 The AIT Beowulf NFS Configuration

16 In an operating system, the file system is the overall structure in which files are named, stored, and organized. A file system consists of files, directories, and the information needed to locate and access these items [33]. 17 The opposite of remote (see footnote 15). A local device is one that can be accessed directly rather than by means of communication lines [33]. 18 To make the data storage medium accessible to a computer's local file system [33].


CHAPTER 4

DEVELOPMENT OF THE PARALLEL EFGM SOFTWARE

After the parallel computing infrastructure was set up, ParEFG, the parallel element-free Galerkin analysis code, was designed. ParEFG is the parallelized version of the Plastic Element-Free Galerkin (PLEFG) software, which was developed by Barry and Saigal [61]. According to reference [60], PLEFG has the capability to analyze three-dimensional, small strain, elastic and elastoplastic problems with nonlinear isotropic and kinematic strain hardening. However, the nonlinear features are beyond the scope of this thesis (see Section 1.4) and are not available in the current version of ParEFG.

4.1 Design Considerations

As discussed in Section 2.4, the architecture of the Beowulf-type parallel computers resembles that of the multicomputer, the message-passing MIMD (multiple instruction, multiple data) parallel system, and therefore the multicomputer parallel machine model is employed. In the multicomputer model, the parallel computer comprises a number of von Neumann computers19, or nodes, linked by an interconnection network. Each computer executes its own program. This program may access local memory and may send and receive messages over the network. Messages are used to communicate with other computers or, equivalently, to read from and write to remote memories. The multicomputer parallel machine model is illustrated in Figure 4-1 below.

[Figure 4-1 shows a set of nodes, each comprising a CPU and local memory, linked by an interconnection network.]

Figure 4-1 The Multicomputer Parallel Machine Model (Source: Foster [16])

From reference [16], four properties are desirable for high-performance parallel software: concurrency, the ability to perform many actions at the same time; scalability, the resilience of an algorithm to increasing processor counts; locality, a high ratio of local memory accesses to remote memory accesses; and modularity, the decomposition of complex software into a collection of simpler routines. Software with high concurrency keeps the processors working nearly all the time. Software with high scalability displays increasing speedup with an increasing number of processors. Software with high locality accesses local data more frequently than remote data so that the time spent on data transfer is kept to a minimum. Software with high modularity is built from many simple components that are easy to debug and reuse.

19 A von Neumann computer is a robust sequential machine model used to study algorithms and programming languages in computer science [16].

In summary, ParEFG is designed and implemented based on the multicomputer parallel machine model, and the concurrency, scalability, locality, and modularity of the design are given particular consideration. Concurrency, scalability and locality can be obtained, based on the divide-and-conquer paradigm, by proper partitioning of the larger analysis tasks into several smaller tasks assigned to the processors, and proper load balancing among the processors so that they can finish working on their tasks at the same time (and are thus never idle).

With regards to task partitioning, the guidelines taken from reference [16] are as follows:

1. The partition must define at least one order of magnitude more subtasks than processors in the target parallel computer.

2. The partition must avoid redundant computation and storage requirements so that the resulting algorithm is scalable when dealing with large problems.

3. The partition must be such that the subtasks are of comparable size so that it would be possible to allocate equal amounts of work to each processor.

4. The partition must be such that the number of subtasks scale with problem size. An increase in problem size should increase the number of tasks rather than the size of individual tasks.

In addition to the four given guidelines, the tasks must be partitioned such that the subtasks can be processed independently, without having to wait for results from other processors.

The major tasks in ParEFG can be listed as computing the global structural stiffness matrix, computing the vector of equivalent nodal forces, solving the global system of equations for the nodal unknowns, and post-processing the solution to obtain the desired displacements and stresses. The partitioning of these tasks is shown in Table 4-1. Based on the guidelines presented above, the tasks of computing the vector of equivalent nodal forces and the global stiffness matrix are partitioned at the integration cell level, that is, one subtask corresponds to one integration cell. The task of post-processing the solution for the desired displacements is partitioned based on the desired displacement locations; one subtask corresponds to one desired displacement location. Similarly, the task of post-processing the solution for the desired stresses is partitioned based on the desired stress locations; one subtask corresponds to one desired stress location. However, the row-wise cyclic striped partitioning method is used in solving the global system of equations to obtain equation-solving concurrency, as suggested in reference [59]. This will be discussed again in Section 4.3.4.


Table 4-1 Partitioning of the Major Tasks in ParEFG

Tasks Partitioning

Computing the global stiffness matrix

Based on the stiffness background integration cells

Computing the vector of equivalent nodal forces

Based on the integration cells in the force patch data

Solving the global system of equations

Row-wise cyclic striped partitioning of the system of equations

Post-processing the solution for desired displacements

Based on the desired displacement locations

Post-processing the solution for desired stresses

Based on the desired stress locations

4.2 Fundamental Tools

4.2.1 Matrix computation library and message-passing library

The necessary tools needed for the implementation of ParEFG are the matrix computation library and the message-passing library. As discussed in Section 3.2.1, Meschach is used for all vector and matrix computations. The MPI-compliant mpich library is used for all message-passing requests among the ParEFG software processes. (Note the use of the term process: since one processor may run multiple copies of an MPI program at the same time, from now on, each copy of an MPI program on any processor will be referred to as a process.) The frequently used Meschach and MPI library functions are presented in Table 4-2 and Table 4-3, respectively. Detailed explanations about Meschach functions can be found in reference [6]. The readers are referred to [32] and [41] for more details about the MPI function calls.

Table 4-2 Frequently Used Meschach Library Functions

Function Description

get_col() Extract a column from a matrix

get_row() Extract a row from a matrix

m_add() Add two matrices

m_copy() Copy a dense matrix


Table 4-2 Frequently Used Meschach Library Functions (cont’d)

Function Description

m_finput() Input a matrix from a file

m_foutput() Output a matrix to a file

m_free() Deallocate a matrix

m_get() Allocate and initialize a matrix

m_ident() Set a matrix to identity matrix

m_inverse() Invert a matrix

m_mlt() Multiply two matrices

m_transp() Transpose a matrix

mmtr_mlt() Compute the $AB^T$ product

mv_mlt() Compute the Ax product

set_col() Set the column of a matrix to a given vector

set_row() Set the row of a matrix to a given vector

sm_mlt() Scalar-matrix multiplication

sv_mlt() Scalar-vector multiplication

v_add() Add two vectors

v_copy() Copy a vector

v_finput() Input a vector from a file

v_foutput() Output a vector to a file

v_free() Deallocate a vector

v_get() Allocate and initialize a vector

v_sub() Subtract two vectors

vm_mlt() Compute the $x^T A$ product


Table 4-3 Frequently Used MPI Library Functions

Function Description

MPI_Barrier() Pause the program until the function is called by every process

MPI_Bcast() Broadcast and synchronize a message

MPI_Comm_rank() Determine the identifier of the calling process

MPI_Comm_size() Determine the number of total running processes

MPI_Finalize() Terminate an MPI parallel program

MPI_Init() Initialize an MPI parallel program

MPI_Recv() Receive a message from a specified process

MPI_Send() Send a message to a specified process
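
The following minimal C sketch shows how the MPI calls in Table 4-3 fit together in a master/worker program of the kind described in Section 4.3. The broadcast data and its size are placeholders for illustration, not the actual ParEFG messages.

    #include <stdio.h>
    #include <mpi.h>

    #define MASTER 0

    int main(int argc, char *argv[])
    {
        int    myid, nprocs;
        double params[4] = {0.0};   /* placeholder for the processed input data */

        MPI_Init(&argc, &argv);                   /* initialize the MPI parallel program */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* identifier of the calling process   */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);   /* number of total running processes   */

        if (myid == MASTER)
            params[0] = 1.0;        /* the master would process the input file here */

        /* broadcast the processed input data from the master to all workers */
        MPI_Bcast(params, 4, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);

        printf("process %d of %d received params[0] = %f\n", myid, nprocs, params[0]);

        MPI_Barrier(MPI_COMM_WORLD);   /* wait until every process reaches this point */
        MPI_Finalize();                /* terminate the MPI parallel program          */
        return 0;
    }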

The Compressed Row Storage (CRS) format [74] shown in Figure 4-2 is implemented as an extension to the existing Meschach matrix data structure to facilitate matrix data transfer by MPI function calls. In CRS, a matrix containing many zeros is transformed into three vectors, namely aval, cval and rp. As the number of zero elements increases, for example as the size of the stiffness matrix grows, a substantial decrease in data transfer time is obtained. The implementation of CRS also takes the symmetry of the stiffness matrix into account, as illustrated in Figure 4-3. With symmetry taken into account, the data transfer time is decreased even further, and the high locality of ParEFG is thereby maintained.

[Figure 4-2 shows a sparse 6-by-6 matrix and its CRS representation as three vectors:
aval = {10, 3, 3, 3, 9, 7, 8, 4, 7, 8, 8, 3, 8, 7, 9, 8, 9, 9, 2, 4, 2, -1}
cval = {0, 1, 3, 0, 1, 2, 4, 5, 1, 2, 3, 0, 2, 3, 4, 1, 3, 4, 5, 1, 4, 5}
rp = {0, 3, 8, 11, 15, 19, 22}
where aval stores the nonzero values row by row, cval stores the corresponding column indices, and rp stores the position in aval at which each row starts.]

Figure 4-2 Compressed Row Storage (CRS)


[Figure 4-3 shows the same matrix stored in symmetric CRS form, in which only the lower triangle is kept:
aval = {10, 3, 9, 7, 8, 3, 8, 7, 8, 9, 9, 4, 2, -1}
cval = {0, 0, 1, 1, 2, 0, 2, 3, 1, 3, 4, 1, 4, 5}
rp = {0, 1, 3, 5, 8, 11, 14}]

Figure 4-3 Symmetric Compressed Row Storage
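
The following C sketch builds the three symmetric-CRS vectors of Figure 4-3 from the lower triangle of the dense 6-by-6 example matrix; it is an illustration of the storage scheme only, not the actual ParEFG extension of the Meschach data structure.

    #include <stdio.h>

    #define N 6   /* dimension of the example matrix in Figures 4-2 and 4-3 */

    int main(void)
    {
        /* lower triangle of the symmetric 6-by-6 example matrix */
        double A[N][N] = {
            {10,  0,  0,  0,  0,  0},
            { 3,  9,  0,  0,  0,  0},
            { 0,  7,  8,  0,  0,  0},
            { 3,  0,  8,  7,  0,  0},
            { 0,  8,  0,  9,  9,  0},
            { 0,  4,  0,  0,  2, -1}
        };
        double aval[N * N];   /* nonzero values, stored row by row         */
        int    cval[N * N];   /* column index of each stored value         */
        int    rp[N + 1];     /* position in aval at which each row starts */
        int    i, j, nnz = 0;

        for (i = 0; i < N; i++) {
            rp[i] = nnz;
            for (j = 0; j <= i; j++)          /* lower triangle only (symmetry) */
                if (A[i][j] != 0.0) {
                    aval[nnz] = A[i][j];
                    cval[nnz] = j;
                    nnz++;
                }
        }
        rp[N] = nnz;   /* total number of stored entries (14 for this matrix) */

        for (i = 0; i < nnz; i++)
            printf("aval[%2d] = %5.1f   cval[%2d] = %d\n", i, aval[i], i, cval[i]);
        return 0;
    }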

4.2.2 Dynamic load-balancing agent

Load balancing, mentioned in Section 4.1, plays an important role in the performance of parallel software, including ParEFG. If unbalanced workloads are assigned to the processes, some processes will finish their work first and have to wait for the other processes to finish, resulting in longer run times. There are two basic approaches to balance these workloads: one is to use a pre-analysis software module to optimize load balancing among the processes; the other is to balance the load during the execution of the parallel software.

The former approach was used by Danielson et al. [23] within the framework of the Reproducing Kernel Particle Method [64]. It was reported [23] that balanced workloads could be obtained, but with substantially increased time spent on the optimization task. They stated that this approach might be unfeasible for large three-dimensional models. With regards to the computation of three-dimensional MLS shape functions and shape function derivatives, it is difficult and computationally expensive to search for the number of nodes that are seen by each integration point, and to balance the workload such that the processes finish working at the same time. Thus, the latter approach is more appropriate for the computationally intensive tasks related to the EFGM.

In this research, a dynamic load-balancing agent named Qserv is developed based on the latter approach. It balances the computational load among the processes in the AIT Beowulf during the run time. Qserv employs the queuing technique illustrated in Figure 4-4. The unprocessed subtasks resemble the clients waiting to be served and the ParEFG processes resemble the servers continuously serving the clients. Qserv acts like a clerk that directs the waiting clients to the available servers. When one server finishes serving a client, it asks for one more client from Qserv. Qserv keeps on assigning the clients to the servers until there is no client waiting at all.


The flowchart of the Qserv computer code is presented in Figure 4-5. To separate the dynamic workload allocation from normal ParEFG operations, the communication between Qserv and the ParEFG processes is done through the UNIX socket concept developed at the University of California at Berkeley [5]. When the Qserv process is started, it creates a socket to which the ParEFG processes, analogous to plugs, can simultaneously connect. Initially, the number of total unprocessed subtasks known to Qserv is zero, and one ParEFG process, usually the master, has to inform Qserv of the actual value. This number is stored in the max_num variable and can be altered by processes through the SET_MAX_NUM request. A ParEFG process can ask Qserv, through the GET_NUM request, for a subtask to work on; it is then assigned the numerical identifier of an unprocessed subtask, ranging from zero to max_num. When the unprocessed subtasks are exhausted, an ALL_DONE signal is sent to the requesting process. During the execution of Qserv, a process can also reset the subtask identifier counter through the RESET_COUNTER request. Qserv keeps on serving ParEFG processes until the TERMINATE signal is received.
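
From the point of view of a ParEFG process, the interaction with Qserv reduces to the loop sketched below in C. The qserv_get_num() function and the ALL_DONE value stand in for the actual socket-based GET_NUM request, whose details are internal to Qserv; the stub shown here merely simulates the counter behavior described above.

    #include <stdio.h>

    #define ALL_DONE (-1)   /* stand-in for the ALL_DONE signal described in the text */

    /* local stand-in for the socket-based GET_NUM request: hands out subtask
       identifiers 0, 1, ..., max_num-1 and then reports that all are done    */
    static int counter = 0;
    static int max_num = 6;   /* assumed number of subtasks, e.g. integration cells */

    static int qserv_get_num(void)
    {
        return (counter < max_num) ? counter++ : ALL_DONE;
    }

    int main(void)
    {
        int id;

        /* keep asking for the next unprocessed subtask until ALL_DONE is received */
        while ((id = qserv_get_num()) != ALL_DONE)
            printf("processing subtask %d\n", id);   /* e.g. integrate one cell */

        return 0;
    }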

[Figure 4-4 illustrates the Qserv concept: the unprocessed subtasks form a queue of waiting clients, the server nodes act as a group of servers, each serving one subtask at a time, and Qserv dispatches the next waiting subtask to whichever server becomes free.]

Figure 4-4 Illustration of the Qserv Concept


[Figure 4-5 presents the flowchart of Qserv: after initializing its socket, Qserv accepts client connections and loops over the connected clients, receiving each request message and processing it (TERMINATE, RESET_COUNTER, SET_MAX_NUM, or GET_NUM) until the TERMINATE request sets the run state and the socket is closed. Variables: fd = current client identifier; max_fd = number of client connections maintained; runstate = run state of the server program; count = current counter value; max_num = maximum counter value; request_msg = current client's request message.]

Figure 4-5 Flowchart of Qserv


4.3 Implementation

In this section, the implementation of ParEFG will be discussed. When a parallel program is run, each parallel processor will have one copy of the program, termed a process. One process is assigned as the master process while the remaining processes are worker processes, as with the master and workers in the divide-and-conquer paradigm discussed in Section 1.1. The MPI default process identifier of the master is 0. In addition to performing the basic tasks of a worker process, the master process will have to do additional work involved with coordinating the tasks among all the workers. Therefore the master process is assigned to run on the server node, which is the most powerful processor, in terms of both processor speed and core memory, in the AIT Beowulf.

The flowchart of ParEFG is shown in Figure 4-6. The analysis procedures in ParEFG can be grouped into five phases, namely, the pre-processing phase, the stiffness matrix formulation phase, the force vector formulation phase, the solution phase, and the post-processing phase. The custom-made parallel Gaussian elimination equation solver, developed based on the algorithm presented in reference [59], is employed in the solution phase because most parallel equation solvers available in the literature are specially made for banded, sparse matrices, which do not match the banded, dense property of the EFGM stiffness matrices. Detailed implementations of the analysis phases are discussed as follows:

4.3.1 Pre-processing phase

When a parallel EFGM analysis is started, the master processes the input file and broadcasts the processed input data, such as the EFGM parameters, the material properties, the nodal coordinates, the distributed load data, and the desired displacement and stress locations, to the workers. The master process and the worker processes then connect to the Qserv queue server in order to use the dynamic load-balancing facilities.

4.3.2 Stiffness matrix formulation phase

After the pre-processing phase, the processes call the ddefg_stiff module (see the flowchart in Figure 4-7). In this module, the master sends the SET_MAX_NUM request to Qserv to set the total number of background integration cells for the global stiffness matrix so that the master and the workers can work on the assigned integration cells, through the GET_NUM request, to compute the global stiffness matrix. When the processors finish working, that is, they receive the ALL_DONE signal from Qserv, the partial global stiffness matrices are transferred to and assembled by the master. The assembled global stiffness matrix is kept on the master node waiting for the solution stage.

4.3.3 Force vector formulation phase

In this phase the processes call the ddforce module (see the flowchart in Figure 4-8). The workers call this module immediately after they enter this phase; however, the master alone first assembles the vector of concentrated nodal loads, for which no shape functions or shape function derivatives are needed. After the master is finished computing the vector of concentrated loads, it sends the SET_MAX_NUM request to Qserv to set the total number of distributed load integration patches. The master and the workers then work on the assigned integration patches, through the GET_NUM request, to compute the global vector of equivalent nodal loads. When the processes receive the ALL_DONE signal from Qserv, the partial global vectors of equivalent nodal loads are transferred to and assembled by the master. Having the global vector of equivalent nodal loads stored locally, the master sums the vector of concentrated loads and the vector of equivalent nodal loads, scales the resultant load vector by the user-supplied load factor, and then proceeds to the solution stage.

4.3.4 Solution phase

After the global stiffness matrix and the global force vector are computed, the system of equations is solved. The solution phase is separated into two steps: first, the boundary condition step, which deals with the application of the boundary condition; second, the parallel Gaussian elimination step, which deals with the parallel solution of the system of equations.

Boundary condition step

Before the system of equations can be solved, the boundary conditions must be applied.

The modules for the master and the workers involved in this step are the master_ddsolve and the worker_ddsolve, respectively. The flowcharts for these modules are presented in Figure 4-9. The master applies the boundary conditions to the locally stored system of equations and then calls the master_parallel_gauss module to have it solved in parallel. The workers, which do not have the system of equations in local memory, call the worker_parallel_gauss module to participate in the parallel solving routine.

Parallel Gaussian elimination step

The system of discrete equations is solved in the parallel Gaussian elimination step. The participating modules for the master and the workers are the master_parallel_gauss and the worker_parallel_gauss, respectively. The flowcharts for these modules are presented in Figure 4-10.

To facilitate the message-passing communication, the global stiffness matrix and the global force vector, which are stored locally on the master, are combined into a new two-dimensional array or matrix. This new matrix is then partitioned into packages, and the packages are distributed to the processes, including the master process, in the parallel machine. According to the algorithm presented in reference [59], the system of equations should be partitioned in the row-wise cyclic striped manner (see Figure 4-11) for highest concurrency. The resulting package is illustrated in Figure 4-12.

After the processes receive the packages, they call the parallel_gauss module, which is the main module that performs the parallel Gaussian elimination. The flowchart of this module is presented in Figure 4-13. As in serial Gaussian elimination, n-1 iterations have to be performed on the n-by-n system of equations. If the stiffness matrix (or the coefficient matrix) is represented by A and the right-hand-side force vector is represented by b, then, according to the flowchart, the k-th row of the global equations is divided by the A[k,k] element. After this step, the divided row is broadcast to all processes so that they can perform the forward elimination step on their locally stored rows. When the forward elimination step is done, the backward substitution step is performed.

In the backward substitution step, the k-th row of the global equations is back-substituted first and the global solution for this row is obtained. This solution is then broadcast to all processes so that they can substitute it into their locally stored rows. When the backward substitution step is done, the whole global system of equations is solved. The worker processes then transfer their locally stored portions of the global solution to the master. The master assembles the global solution vector (the vector of solved unknown displacements), writes it to the output file, and uses it to post-process for the desired displacements and stresses.
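
The division, broadcast, and forward elimination steps described above can be sketched in C as follows. This is a simplified illustration of the row-wise cyclic scheme, in which global row k is stored on process k mod p as local row k div p, and the local array a has n+1 columns with the force vector in the last column; it is not the actual parallel_gauss code, and the back substitution sweep, which proceeds analogously with k counting down from n-1, is omitted.

    #include <stdlib.h>
    #include <mpi.h>

    /* one forward-elimination sweep over an n-by-n system distributed in
       row-wise cyclic stripes over p processes; a[i] is local row i with
       n+1 entries, the last of which holds the force-vector component   */
    void forward_eliminate(double **a, int nrow, int n, int myid, int p)
    {
        double *c = malloc((n + 1) * sizeof(double));   /* divided pivot row */
        int i, j, k;

        for (k = 0; k < n; k++) {
            if (k % p == myid) {            /* this process owns global row k */
                int m = k / p;              /* its local row index            */
                for (j = k + 1; j <= n; j++)
                    a[m][j] /= a[m][k];     /* division step                  */
                a[m][k] = 1.0;
                for (j = 0; j <= n; j++)
                    c[j] = a[m][j];
            }
            /* broadcast the divided row from its owner to every process */
            MPI_Bcast(c, n + 1, MPI_DOUBLE, k % p, MPI_COMM_WORLD);

            /* eliminate column k from every local row below global row k */
            for (i = 0; i < nrow; i++) {
                int g = i * p + myid;       /* global index of local row i */
                if (g > k) {
                    for (j = k + 1; j <= n; j++)
                        a[i][j] -= a[i][k] * c[j];
                    a[i][k] = 0.0;
                }
            }
        }
        free(c);
    }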

4.3.5 Post-processing phase

After the unknown nodal displacements are obtained, the processes enter the post-processing phase. In this phase, the processes share their workloads based on the row-wise cyclic striped partitioning of the desired displacement location matrix and the desired stress location matrix. As an example, according to Figure 4-11, if there are 10 desired displacement locations and 10 desired stress locations, Process 0 (or the master) will be responsible for the interpolations and computations of the zeroth, the fourth, and the eighth desired displacements and stresses, and Process 3 will be responsible for the interpolations and computations of the third and the seventh desired displacements and stresses. When the displacements and stresses are computed, they are gathered to the master to be written to the output file.
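
In C, this cyclic assignment of output locations amounts to a strided loop of the following form; the variable names are illustrative rather than those used in ParEFG.

    /* each process handles locations myid, myid + nprocs, myid + 2*nprocs, ...
       e.g. with nprocs = 4 and 10 locations, process 0 handles 0, 4 and 8,
       and process 3 handles 3 and 7, as in the example above                 */
    void postprocess_locations(int myid, int nprocs, int num_locations)
    {
        int loc;
        for (loc = myid; loc < num_locations; loc += nprocs) {
            /* interpolate and compute the displacement or stress at location loc */
        }
    }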


[Figure 4-6 presents the flowchart of ParEFG. The master process reads and processes the input file (dd_input), broadcasts the processed input data to the workers, and all processes connect to the queue server. The master and workers then collaborate in ddefg_stiff (stiffness matrix formation) and ddforce (distributed load vector formation), with the partial results gathered to the master, which also forms the concentrated load vector and assembles the global force vector. The system of equations is solved collaboratively through master_ddsolve and worker_ddsolve, the nodal displacements are written to the output file by the master, and all processes post-process the solution in ddpost, after which the gathered results are written to the output file and the processes disconnect from the queue server.]

Figure 4-6 Flowchart of ParEFG


ABBREVIATIONS
gp = integration point

VARIABLES
ncell = number of background integration cells
evec = EFGM parameters
pvec = problem parameters
mvec = material parameters
CNODES = background integration cell nodal coordinates
CCON = background integration cell nodal connectivity
NODES = nodal coordinates
myid = process identifier
icell = number of current background integration cell
ngauss = number of integration points in each direction
gweight = total integration weight at a gp
dvol = differential volume in an integration cell

Figure 4-7 Flowchart of the ddefg_stiff module


[Figure 4-7 (cont'd): each process obtains myid, ncell, evec, pvec, mvec, CNODES, CCON and NODES from the caller and initializes the stiffness matrix K; the master sets max_num of Qserv to ncell; each process then repeatedly gets an icell to work on from Qserv until ALL_DONE is received, at which point the stiffness matrices are gathered to the master. For each assigned cell, the process gets the integration points and weights, the list of connecting nodes and their X, Y, Z coordinates, forms the material matrix D (equation 2.24c), and loops over the integration points in the cell.]

Figure 4-7 Flowchart of the ddefg_stiff module (cont'd)


[Figure 4-7 (cont'd): for each integration point (gp), the process evaluates the FEM shape functions and derivatives, calculates the Jacobian, sets dvol = gweight x Jacobian, determines the global coordinates of the gp, evaluates the B matrix (equation 2.24a), searches for the nodes contributing to the gp, formulates the MLS shape function derivatives, computes the stiffness contribution from the gp (equation 2.23a), and adds the contribution to the nodes influencing the gp, until all integration points in the cell have been processed.]

Figure 4-7 Flowchart of the ddefg_stiff module (cont'd)


ABBREVIATIONS
gp = integration point

VARIABLES
nforce = number of distributed load integration patches
evec = EFGM parameters
pvec = problem parameters
FNODES = distributed nodal load data
FDATA = distributed load patch nodal connectivity
NODES = nodal coordinates
myid = process identifier
iforce = number of current force integration cell
ngauss = number of integration points in each direction
gweight = total integration weight at a gp
darea = differential area in an integration cell

Figure 4-8 Flowchart of the ddforce module


[Figure 4-8 (cont'd): each process obtains myid, nforce, evec, pvec, FNODES, FDATA and NODES from the caller and initializes the force vector f; the master sets max_num of Qserv to nforce; each process then repeatedly gets an iforce to work on from Qserv until ALL_DONE is received, at which point the force vectors are gathered to the master. For each assigned patch, the process reads the cell connectivity from FDATA, reads the distributed load nodal data from FNODES, obtains the natural coordinates and weights of the integration points, and loops over the integration points in the cell.]

Figure 4-8 Flowchart of the ddforce module (cont'd)


[Figure 4-8 (cont'd): for each integration point (gp), the process evaluates the FEM shape functions and derivatives, calculates the Jacobian, sets darea = gweight x Jacobian, determines the global coordinates of the gp and the traction at the gp, searches for the nodes contributing to the gp, formulates the MLS shape functions, calculates the force vector contribution from each contributing node, and adds it to the global force vector, until all integration points in the cell have been processed.]

Figure 4-8 Flowchart of the ddforce module (cont'd)


[Figure 4-9: the master obtains K, f and the boundary conditions from the caller, applies the boundary conditions to the system of equations, calls x = master_parallel_gauss(..., ..., K, f) and returns the solved displacement vector to the caller, while each worker simply calls worker_parallel_gauss(..., ..., ...); the two routines collaborate to solve the equations. Variables: K = global stiffness matrix; f = global force vector; x = solved displacement vector.]

Figure 4-9 Flowcharts of the master_ddsolve and the worker_ddsolve modules


[Figure 4-10: the master gets the system of equations (K and f) from the caller, prepares the packages, keeps one to process locally and distributes the rest to the workers; each worker receives its package. Every process then calls xlocal = parallel_gauss(..., ..., local package) and collaborates in the solution; the workers send xlocal to the master, which gathers them, assembles the global displacement vector x, and returns x to the caller. Variables: K = global stiffness matrix; f = global force vector; x = solved displacement vector; xlocal = solved displacement vector associated with the locally stored equations.]

Figure 4-10 Flowcharts of the master_parallel_gauss and the worker_parallel_gauss modules


[Figure 4-11 shows a global matrix of ten rows distributed over four processes in row-wise cyclic stripes: Process 0 holds global rows 0, 4 and 8 as local rows 0, 1 and 2; Process 1 holds global rows 1, 5 and 9; Process 2 holds global rows 2 and 6; and Process 3 holds global rows 3 and 7.]

Figure 4-11 Row-wise Cyclic Striped Partitioning of Matrices

[Figure 4-12 shows the package held by Process 1: the rows of K and f corresponding to global rows 1, 5 and 9, stored as local rows 0, 1 and 2.]

Figure 4-12 A parallel_gauss Package


VARIABLES
A = package of the locally stored stiffness matrix and force vector
soln = locally stored solution vector associated with the local rows
n = total number of equations, i.e., the size of the global stiffness matrix
nrow = number of local rows
c = the most recently forward-eliminated row
r = the most recent final backward-substituted element of soln
k = iteration index

STATUS OF LOCAL ROWS
DONE_NOTHING = untouched local row
DONE_ELIM = forward elimination done
DONE_SUBST = backward substitution done

Figure 4-13 Flowchart of the parallel_gauss module


[Figure 4-13 (cont'd): each process gets its local package A from the caller, marks all local rows as DONE_NOTHING, and iterates k = 0, ..., n-1. If the process owns the row whose global row number equals k, it performs the division step on that row (for j := k+1 to n-1: A[mark,j] := A[mark,j]/A[mark,k]; then A[mark,k] := 1.00), marks the row as DONE_ELIM, and sets c equal to the marked row in preparation for the broadcast.]

Figure 4-13 Flowchart of the parallel_gauss module (cont'd)


[Figure 4-13 (cont'd): c is broadcast and synchronized, and every process performs the forward elimination step on its local rows (for each local row i not marked DONE_ELIM: A[i,j] := A[i,j] - A[i,k]*c[count] for j := k+1 to n, then A[i,k] := 0.00). When k reaches n-1, soln is set equal to the locally stored force vector (the n-th column of A) and the backward iterations begin with k counting down: if the process owns the row whose global row number equals k, that row is marked DONE_SUBST and r is set to the corresponding element of soln in preparation for the broadcast.]

Figure 4-13 Flowchart of the parallel_gauss module (cont'd)


[Figure 4-13 (cont'd): r is broadcast and synchronized, and every process performs the backward substitution step on its local rows not yet marked DONE_SUBST, updating soln[i] using A[i,k] and r; the iteration index is decremented (k = k-1) until all rows are substituted, and soln is returned to the caller.]

Figure 4-13 Flowchart of the parallel_gauss module (cont'd)

Page 62: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

53

CHAPTER 5

NUMERICAL RESULTS

Several numerical examples are solved to illustrate the performance and to verify the validity of ParEFG. In the following sections, it will be shown that the results obtained from ParEFG closely match the analytical solutions and exactly match those obtained from the PLEFG serial EFGM analysis software, and the analysis run times, as compared to PLEFG, are reduced.

5.1 Linear Displacement Field Patch Test

5.1.1 Problem description

Y

X

6.0

3.0

Y

Z

3.0

3.0

SIDE VIEW SECTION VIEW

Figure 5-1 Linear Displacement Field Patch Test

A linear displacement field patch test [51] was performed on a square rod with 3-unit height, 3-unit width, and 6-unit length, as shown in Figure 5-1. The input file for this problem is presented in Appendix B1. The planes X=0, Y=0, and Z=0 were fixed in the x-, y-, and z-directions, respectively. A uniform tension load of magnitude 1.0 was applied to the plane X=6.0. A uniform nodal distribution of 447 ×× , an integration cell structure of 336 ×× , and a 666 ×× Gaussian quadrature scheme in each integration cell were employed in the x, y, and z-direction, respectively. Linear basis functions were used in the calculation of the MLS interpolants. The domain of influence for each node was set to a constant value of 4.0. The material parameters used in this analysis were: modulus of elasticity, 0.1=E ; and Poisson’s ratio, 25.0=ν . The analytical solutions for this problem are given as:

Displacements: zuyuxu zyx νν −=−== ,, , and (5.1a)

Stress: 0.1=xxσ . (5.1b)

Page 63: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

54

5.1.2 Accuracy and performance

Figures 5-2 to 5-5 show the comparison of the analytical solutions with the results obtained from PLEFG and ParEFG. In all figures, the results obtained from ParEFG closely match the analytical solutions and exactly match those obtained from PLEFG.

Average speedups20 and average efficiencies21 are shown in Figure 5-6 and Figure 5-7, respectively. Ideally, the speedup values should be the same as the number of processors, and the efficiency values should always be 100 percent.

From Figure 5-6 it can be seen that as the number of processors increased, the overall speedups and the speedups in the stiffness matrix formulation phase approached the theoretical values. However, the speedups in the force vector formulation phase went beyond the theoretical limits, and the speedups in the solution phase decreased with the increasing number of processor counts. The former may be due to cache22 effects of the AIT Beowulf hardware while the latter may be due to the high communication-to-computation ratio. The oddity in the force vector formulation phase did not affect the overall performance of ParEFG since, from Table 5-1, the run time of this phase was only a very small fraction of the overall run time. Therefore, effects of the force vector formulation will be neglected. The 447 ×× nodal distribution in this problem yielded only 336 equations to be solved by the parallel solver. According to the parallel Gaussian elimination algorithm, the performance of the solver should tend to increase with the problem size. It is expected that speedups in the solution phase would increase with the increasing number of processor counts, as the size of the problem grows.

In terms of efficiency, from Figure 5-7, the overall efficiency and the efficiency of the stiffness matrix formulation phase decreased as the number of processors increased. The problem was even worse during the solution phase since the inter-processor communication increased as the number of processors grew. As in the case of the speedup, it is expected that the efficiencies would be higher as the size of the problem grows.

20 See Section 2.2.3 for definitions. 21 See Section 2.2.3 for definitions. 22 A special memory subsystem in which frequently used data values are duplicated for quick access [33].

Page 64: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

55

0.00

1.00

2.00

3.00

4.00

5.00

6.00

0.00 1.00 2.00 3.00 4.00 5.00 6.00

Location, X

Dis

plac

emen

t, u x

Analytical

PLEFG

ParEFG

Figure 5-2 Displacement in the x-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test

-0.60

-0.50

-0.40

-0.30

-0.20

0.00 1.00 2.00 3.00 4.00 5.00 6.00

Location, X

Dis

plac

emen

t, u y

Analytical

PLEFG

ParEFG

Figure 5-3 Displacement in the y-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test

Page 65: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

56

-0.60

-0.50

-0.40

-0.30

-0.20

0.00 1.00 2.00 3.00 4.00 5.00 6.00

Location, X

Dis

plac

emen

t, u

z

Analytical

PLEFG

ParEFG

Figure 5-4 Displacement in the z-direction along the Line y=1.50, z=1.50 for the Linear Displacement Field Patch Test

0.80

0.90

1.00

1.10

1.20

0.00 0.50 1.00 1.50 2.00 2.50 3.00

Location, Y

σσ σσxx

Analytical

PLEFG

ParEFG

Figure 5-5 Tensile Stress in the x-direction along the Line x=3.00, z=1.50 for the Linear Displacement Field Patch Test

Page 66: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

57

Table 5-1 Average Run Times for the Linear Displacement Field Patch Test

Average Run Time, seconds NP Overall K f Solver

1 555.536E+00 552.511E+00 2.183E+00 520.107E-03 2 293.384E+00 291.485E+00 1.083E+00 595.419E-03 3 198.143E+00 196.709E+00 703.235E-03 535.894E-03 4 151.508E+00 149.911E+00 529.114E-03 894.768E-03

Table 5-2 Average Speedups for the Linear Displacement Field Patch Test

Average SpeedupNP Theoretical Overall K f Solver

1 1.00 1.00 1.00 1.00 1.002 2.00 1.89 1.90 2.02 0.873 3.00 2.80 2.81 3.10 0.974 4.00 3.67 3.69 4.13 0.58

Table 5-3 Average Efficiencies for the Linear Displacement Field Patch Test

Average EfficiencyNP Theoretical Overall K f Solver

1 100% 100% 100% 100% 100%2 100% 95% 95% 101% 44%3 100% 93% 94% 103% 32%4 100% 92% 92% 103% 15%

Page 67: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

58

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

4.50

0 1 2 3 4 5

Number of Processors

Spee

dup

Theoretical

Overall

K

f

Solver

Figure 5-6 Average Speedups for the Linear Displacement Field Patch Test

0%

20%

40%

60%

80%

100%

120%

0 1 2 3 4 5

Number of Processors

Effic

ienc

y

TheoreticalOverallK

fSolver

Figure 5-7 Average Efficiencies for the Linear Displacement Field Patch Test

Page 68: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

59

5.2 Cantilever Beam with End Load

5.2.1 Problem description

Y

Z

L = 12.0P

Y

X H=1.0

D=1.0

SIDE VIEW SECTION VIEW

Figure 5-8 Cantilever Beam with End Load

A cantilever beam with height 0.1=H unit, depth 0.1=D unit, and length 0.12=L units, as shown in Figure 5-8, was considered. The input file for this problem can be

found in Appendix B2. The left end of the beam was fixed. A uniformly distributed shearing traction, with a total equivalent load of 0.10=P , was applied to the free end of the beam. A uniform nodal distribution of 5511 ×× , an integration cell structure of 4410 ×× , and a

666 ×× Gaussian quadrature rule in each direction were employed in the x, y, and z-directions, respectively. Quadratic basis functions were used in the calculation of the MLS interpolants. The domain of influence for each node was set to a constant value of 4.8. The material parameters used in the analysis were: modulus of elasticity, 7100.21 ×=E ; and Poisson’s ratio, 30.0=ν . According to Euler beam theory, the analytical solutions for this problem are given as:

Displacement: ( )xLEIxPu y −−= 3

6

2

, and (3.2a)

Bending Stress: I

yPLxx =σ . (3.2b)

5.2.2 Accuracy and performance

Figures 5-9 and 5-10 show the comparison of the analytical solutions with the results obtained from PLEFG and ParEFG. The vertical displacements and bending stresses obtained from ParEFG, as shown in Figure 5-9 and Figure 5-10, closely match those obtained from the analytical solution and exactly match those obtained from PLEFG.

Average speedups and average efficiencies are shown in Figure 5-11 and Figure 5-12, respectively. From Figure 5-11 it can be seen that the overall speedups and the speedups in

Page 69: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

60

the stiffness matrix formulation phase approached the theoretical values, which is in the same trend as the Linear Displacement Field Patch Test problem. This figure shows also that the speedups in the solution phase are better than those obtained in Section 5.1, as expected. The reason is that the 5511 ×× nodal distribution yielded 825 equations, which is 2.5 times more than those in the previous section, therefore the communication-to-computation ratio decreased, resulting in the higher performance of the parallel solver. In terms of efficiency, from Figure 5-12, the overall efficiency and the efficiency of the stiffness matrix formulation phase were slightly better while the efficiency during the solution phase was increased by nearly two times. The reason for this is the same as in the case of speedup, that is, the communication-to-computation ratio decreased.

000E+00

100E-06

200E-06

300E-06

400E-06

0.00 2.00 4.00 6.00 8.00 10.00 12.00Location, X

Vert

ical

Dis

plac

emen

t, u y

Beam Theory

PLEFG

ParEFG

Figure 5-9 Vertical Displacement along the Neutral Axis (line y=0.50, z=0.50) for a Cantilever Beam under a Concentrated Force

Page 70: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

61

-400.00

-200.00

0.00

200.00

400.00

-0.50 -0.25 0.00 0.25 0.50

Location, Y

Ben

ding

Str

ess

Beam Theory

PLEFG

ParEFG

Figure 5-10 Bending Stress Distribution along the line x=6.00, z=0.50 for a Cantilever Beam under a Concentrated Force

Page 71: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

62

Table 5-4 Average Run Times for a Cantilever Beam under a Concentrated Force

Average Run Time, seconds NP Overall K f Solver

1 9.031E+03 9.012E+03 7.670E+00 9.692E+00 2 4.703E+03 4.690E+03 3.812E+00 8.563E+00 3 3.190E+03 3.180E+03 2.752E+00 6.354E+00 4 2.403E+03 2.394E+03 1.914E+00 6.133E+00

Table 5-5 Average Speedups for a Cantilever Beam under a Concentrated Force

Average SpeedupNP Theoretical Overall K f Solver

1 1.00 1.00 1.00 1.00 1.002 2.00 1.92 1.92 2.01 1.133 3.00 2.83 2.83 2.79 1.534 4.00 3.76 3.76 4.01 1.58

Table 5-6 Average Efficiencies for a Cantilever Beam under a Concentrated Force

Average EfficiencyNP Theoretical Overall K f Solver

1 100% 100% 100% 100% 100%2 100% 96% 96% 101% 57%3 100% 94% 94% 93% 51%4 100% 94% 94% 100% 40%

Page 72: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

63

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

4.50

0 1 2 3 4 5

Number of Processors

Spee

dup

Theoretical

Overall

Kf

Solver

Figure 5-11 Average Speedups for a Cantilever Beam under a Concentrated Force

0%

20%

40%

60%

80%

100%

120%

0 1 2 3 4 5

Number of Processors

Effic

ienc

y

TheoreticalOverallKfSolver

Figure 5-12 Average Efficiencies for a Cantilever Beam under a Concentrated Force

Page 73: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

64

5.3 Pure Bending of Thick Circular Arch

5.3.1 Problem description

Z

Y

H=1.0

D=1.0

SECTION VIEW

Y

MX

SIDE VIEW

r 0θθθθ

M applied asa Distributed Load

X

Figure 5-13 Pure Bending of Thick Circular Arch

A circular arch with average radius 5.10 =r units, height 0.1=H unit; and depth 0.1=D unit, as shown in Figure 5-13, was considered. The input file for this problem can be

found in Appendix B3. The left end of the arch was fixed. A distributed load of constant magnitude in the z-direction and varying in the x-direction, which has the equivalent effect of a counter-clockwise moment of magnitude 600=M , was applied at the free end. A nodal distribution of 5513 ×× , an integration cell structure of 4412 ×× , and a 666 ×× Gaussian quadrature rule in each integration cell were employed in the θ -, H-, and D-directions, respectively. Quadratic basis functions were used in the calculation of the MLS interpolants. The domain of influence for each node was set to a constant value of 1.33. The material parameters used in the analysis were: modulus of elasticity, 7100.21 ×=E ; and Poisson’s ratio, 30.0=ν . The analytical solutions for this problem are taken from reference [61] and plotted in Figures 5-11 and 5-12.

5.3.2 Accuracy and performance

Figures 5-14 and 5-15 show the comparison of the analytical solutions with the solutions obtained from PLEFG and ParEFG. Again, in all figures, the results obtained from ParEFG closely match the analytical solutions and exactly match those obtained from PLEFG. This shows that the accuracy of ParEFG was maintained.

The average speedups and average efficiencies, as shown Figure 5-16 and Figure 5-17, approach the values expected under ideal conditions, as in the example presented in Section 5.2. This result is expected since the same quadratic basis functions were used and the

5513 ×× nodal distribution yielded 975 degrees of freedom, which is only 1.2 times more than in the case of the cantilever beam.

Page 74: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

65

-10E-06

000E+00

10E-06

20E-06

30E-06

40E-06

50E-06

0 15 30 45 60 75 90

Location, θθθθ (deg)

Dis

plac

emen

t, u x

Analytical

PLEFG

ParEFG

Figure 5-14 Displacement in the x-direction along the Neutral Axis of a Thick Circular Arch under Pure Bending

-2000

-1000

0

1000

2000

3000

1.00 1.20 1.40 1.60 1.80 2.00

Location, Radius (in)

Tang

entia

l Str

ess

Analytical

PLEFG

ParEFG

Figure 5-15 Tangential Stress Distribution through the Thickness of a Thick Circular Arch under Pure Bending

Page 75: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

66

Table 5-7 Average Run Times for a Thick Circular Arch under Pure Bending

Average Run Time, seconds NP Overall K f Solver

1 14.794E+03 14.770E+03 1.103E-03 15.947E+002 7.630E+03 7.612E+03 1.291E-03 13.932E+003 5.158E+03 5.144E+03 1.327E-03 10.370E+004 3.877E+03 3.865E+03 1.193E-03 9.467E+00

Table 5-8 Average Speedups for a Thick Circular Arch under Pure Bending

Average SpeedupNP Theoretical Overall K f Solver

1 1.00 1.00 1.00 1.00 1.002 2.00 1.94 1.94 0.85 1.143 3.00 2.87 2.87 0.83 1.544 4.00 3.82 3.82 0.92 1.68

Table 5-9 Average Efficiencies for a Thick Circular Arch under Pure Bending

Average EfficiencyNP Theoretical Overall K f Solver

1 100% 100% 100% 100% 100%2 100% 97% 97% 43% 57%3 100% 96% 96% 28% 51%4 100% 95% 96% 23% 42%

Page 76: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

67

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

4.50

0 1 2 3 4 5

Number of Processors

Spee

dup

TheoreticalOverallK

fSolver

Figure 5-16 Average Speedups for a Thick Circular Arch under Pure Bending

0%

20%

40%

60%

80%

100%

120%

0 1 2 3 4 5

Number of Processors

Effic

ienc

y

Theoretical

Overall

K

f

Solver

Figure 5-17 Average Efficiencies for a Thick Circular Arch under Pure Bending

Page 77: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

68

5.4 Extension of a Strip with One Circular Hole

5.4.1 Problem description

1.08.0

2.0a=0.25

S=10.0 S=10.0

SIDE VIEW SECTION VIEW

a) Actual Problem

Y

X

4.0

1.0a=

0.25

Y

Z

S=10.0

1.0

SIDE VIEW SECTION VIEW

b) The Analysis Model

Figure 5-18 Extension of a Strip with One Circular Hole

A strip with one circular hole subjected to uniformly tensile traction, 0.10=S , at both ends, as shown in Figure 5-18a, was considered. The geometrical data was assumed as: width, 0.2=W ; length, 0.8=L ; thickness, 0.1=H ; and radius of the hole, 25.0=a . The input file for this problem can be found in Appendix B4. Symmetry was considered and only one quarter of the strip was modeled, as shown in Figure 5-18b. Due to symmetry, the x-displacement was fixed on the face X=0, and the y-displacement was fixed on the face Y=0. The quarter strip was discretized into 950 nodes and 648 integration cells. A 444 ×× Gaussian quadrature scheme was employed in the x, y, and z-directions, respectively in each

Page 78: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

69

integration cell. Quadratic basis functions were used in the calculation of the MLS interpolants. The domain of influence was set to a constant value of 1.0. The material parameters used in the analysis were: modulus of elasticity, 7100.21 ×=E ; and Poisson’s ratio, 30.0=ν . The analytical solutions for this problem [47] in cylindrical coordinates are:

θσ 2cos4312

12 2

2

4

4

2

2

−++

−=

ra

raS

raS

rr , and (3.3a)

θσθθ

2cos312

12 4

4

2

2

+−

+=

raS

raS . (3.3b)

5.4.2 Accuracy and performance

The comparison of the analytical solutions with the results obtained from PLEFG and ParEFG is shown in Figures 5-19 and 5-20. The results obtained from ParEFG exactly match those obtained from PLEFG, however there are some discrepancies between these numerical solutions and the analytical solutions. In both figures, the results are oscillating, which is one indicator of under-integration of the global structural stiffness matrix. The tensile stress distribution along the x-axis in Figure 5-20 appears to oscillate about the analytical solution while the tensile stress distribution along the y-axis in Figure 5-19 appears to oscillate about an imaginary line drawn slightly above the analytical solution. This discrepancy may be due to the fact that quadratic basis functions were used to approximate the quartic analytical solutions in equations (3.3a) and (3.3b). If the quartic basis functions had been used while the integration rule was held constant, the results in both figures may both oscillate about the analytical solution.

Average speedups and average efficiencies are shown in Figure 5-21 and Figure 5-22, respectively. In this problem, the overall speedups and the speedups in the stiffness matrix formulation phase were decreased, as compared to values obtained in the example of Section 5.3, by half while the speedup in the solution phase increased. The overall efficiency and the efficiency in the stiffness matrix formulation phase decreased, from the range of 95 to 97 percent to the range of 50 to 70 percent, while the efficiency in the solution phase was unchanged. A 950-node discretization of the problem domain was used, which yielded 2,850 degrees of freedom. In the solution phase, this large number of degrees of freedom (equations) resulted in a trend of increasing efficiency since the communication-to-computation ratio is decreasing. However, in the stiffness matrix formulation phase, this large number of degrees of freedom reduced efficiency since the local core memory was exhausted. The EFGM stiffness matrix is dense, and a dense stiffness matrix of 2,850 double precision elements occupies nearly 62 megabytes (2,8502 times 8 bytes per element), which is nearly equal to the core memory of the worker nodes. A portion of core memory must be provided to the operating system; therefore, with such a large number of degrees of freedom, frequent swapping to memory page files on the worker nodes was necessary. Page file swapping is very slow compared to the time required for accessing core memory. This is most likely the reason that the speedup and efficiency of the stiffness matrix formulation phase dramatically dropped, resulting in the low overall speedup and efficiency.

Page 79: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

70

0

10

20

30

40

50

60

70

0.25 0.50 0.75 1.00Location, Y

σσ σσxx

Timoshenko

PLEFG

ParEFG

Figure 5-19 Tensile Stress Distribution along the Line through the Center of the Hole and Perpendicular to the x-axis,

for a Strip with One Circular Hole under Uniform Tension

0

5

10

15

0.25 1.00 1.75 2.50 3.25 4.00

Location, X

σσ σσxx

Timoshenko

PLEFG

ParEFG

Figure 5-20 Tensile Stress Distribution along the Line through the Center of the Hole and Perpendicular to the y-axis,

for a Strip with One Circular Hole under Uniform Tension

Page 80: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

71

Table 5-10 Average Run Times for a Strip with One Circular Hole under Uniform Tension

Average Run Time, seconds NP Overall K f Solver

1 59.754E+03 59.055E+03 6.554E+00 432.611E+002 44.704E+03 44.165E+03 2.941E+00 409.879E+003 35.566E+03 35.195E+03 1.980E+00 284.910E+004 28.742E+03 28.455E+03 1.566E+00 221.999E+00

Table 5-11 Average Speedups for a Strip with One Circular Hole under Uniform Tension

Average SpeedupNP Theoretical Overall K f Solver

1 1.00 1.00 1.00 1.00 1.002 2.00 1.34 1.34 2.23 1.063 3.00 1.68 1.68 3.31 1.524 4.00 2.08 2.08 4.18 1.95

Table 5-12 Average Efficiencies for a Strip with One Circular Hole under Uniform Tension

Average EfficiencyNP Theoretical Overall K f Solver

1 100% 100% 100% 100% 100%2 100% 67% 67% 111% 53%3 100% 56% 56% 110% 51%4 100% 52% 52% 105% 49%

Page 81: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

72

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

4.50

0 1 2 3 4 5

Number of Processors

Spee

dup

TheoreticalOverallKfSolver

Figure 5-21 Average Speedups for a Strip with One Circular Hole under Uniform Tension

0%

20%

40%

60%

80%

100%

120%

0 1 2 3 4 5

Number of Processors

Effic

ienc

y

Theoretical

Overall

K

f

Solver

Figure 5-22 Average Efficiencies for a Strip with One Circular Hole under Uniform Tension

Page 82: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

73

5.5 Overall Performance

Speedup and efficiency results from numerical examples in Section 5.1 to 5.4 are summarized in Figures 5-23 to 5-26. As shown in Figure 5-23 and Figure 5-25, when the number of degrees of freedom is less than 1,000, the speedup and the efficiency of the stiffness matrix formulation phase gradually increase. However, the speedup and the efficiency of this phase begin to decrease when the number of degrees of freedom exceeds 1,000. One thousand degrees of freedom appears to be the point at which memory page file swapping, discussed in Section 5.4.2, begins to occur. Figure 5-24 and Figure 5-26 show the speedup and the efficiency of the solution phase, respectively. The optimal points, in terms of speedup, to use the parallel Gaussian elimination solver were at about 350, 550, and 600 equations for two, three, and four processors, respectively. When the number of equations was more than 1000, the speedup and efficiency of the solver began to decrease. This may be due to the same reason as in the stiffness matrix formulation phase, that is, memory page file swapping commences. Hence, it can be concluded that the current implementation of ParEFG is scalable up to 1,000 degrees of freedom. Recommendations for increasing the number of degrees of freedom for which ParEFG is scalable are presented in Chapter 6.

0.00

0.50

1.00

1.50

2.00

2.50

3.00

3.50

4.00

4.50

0 1000 2000 3000Degrees of Freedom

Spee

dup

NP1

NP2

NP3NP4

Figure 5-23 Speedups of the Stiffness Computing Module under Various Number of Processes and Degrees of Freedom

Page 83: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

74

0.00

0.50

1.00

1.50

2.00

2.50

0 1000 2000 3000

Degrees of Freedom

Spee

dup

NP1NP2NP3NP4

Figure 5-24 Speedups of the Parallel Equation Solver Module under Various Number of Processes and Degrees of Freedom

50%

60%

70%

80%

90%

100%

110%

0 1000 2000 3000Degrees of Freedom

Effic

ienc

y

NP1

NP2

NP3

NP4

Figure 5-25 Efficiencies of the Stiffness Computing Module under Various Number of Processes and Degrees of Freedom

Page 84: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

75

0%

20%

40%

60%

80%

100%

120%

0 500 1000 1500 2000 2500 3000Degrees of Freedom

Effic

ienc

y

NP1

NP2

NP3

NP4

Figure 5-26 Efficiencies of the Parallel Equation Solver Module under Various Number of Processes and Degrees of Freedom

Page 85: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

76

CHAPTER 6

CONCLUSION AND RECOMMENDATIONS

6.1 Conclusion

AIT Beowulf, a high-performance yet low-cost parallel computer assembled from a network of commodity personal computers, was established. A parallel implementation of the element-free Galerkin method, called ParEFG, was developed on this platform based on an existing sequential element-free Galerkin method computer code. Four desired properties of parallel software, which are concurrency, scalability, locality, and modularity, were taken into account during the design of the parallel version of the element-free Galerkin method. A dynamic load-balancing algorithm was utilized for the computation of the structural stiffness matrix and external force vector and a parallel Gaussian elimination algorithm was employed in the solution for the nodal unknowns (displacements). Several numerical examples showed that the displacements and stresses obtained from the parallel implementation closely matched the analytical solutions and exactly matched solutions obtained by the sequential element-free Galerkin method software. With Qserv, a dynamic load-balancing algorithm, high scalability was obtained for the three-dimensional structural mechanics problems up to approximately 1,000 degrees of freedom. However, scalability was not achieved for larger problems, due to the requirement of full stiffness matrix storage on each processor while only 64 megabytes of memory was available on each worker node. The parallel Gaussian elimination equation solver took less time to solve the system of equation than its sequential counterpart. With larger systems of equations, the efficiency of the parallel equation solver tended to increase because of the increased computation-to-communication ratio. Nevertheless, with the current implementation of the parallel solver, when the number of equations was more than 1,000, high efficiency could not be obtained.

6.2 Recommendations

Because of time constraints, some desirable features were identified but not implemented as part of this research. Recommended extensions to this thesis, which are classified as refinement of ParEFG memory usage, enrichment of the element-free Galerkin analysis software, enhancement of the communication infrastructures, installation of the Internet connection, and design of user-friendly graphical interfaces, are discussed in the following sections.

6.2.1 Refinement of ParEFG memory usage

Full dense stiffness matrix storage is required in the stiffness matrix formulation module of ParEFG. The equation solver module also requires that the full global stiffness matrix and force vector be assembled to the master processor before they are distributed across the parallel machine to the worker nodes. This is not the most efficient method since, as the

Page 86: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

77

number of degrees of freedom or equations grows, the node’s local memory will be exhausted and frequent swapping to memory page files on the hard disk drive will be necessary. Page file swapping is very slow compared to the time required for accessing core memory and thus it is recommended that alternative algorithms are developed to address this problem and obtain a higher level of efficiency of the parallel element-free Galerkin method software.

6.2.2 Enrichment of the element-free Galerkin analysis software

Only the linear elastic analysis option was implemented in the current parallel element-free Galerkin analysis software. However, one attractive feature of the element-free Galerkin method, when compared with the finite element method, is the capability to perform nonlinear analyses of complex structural components without the need for tedious remeshing as the configuration of the problem domain changes. Consequently, it is recommended that small-strain and large-strain geometrically nonlinear formulations, as well as the rate-independent elastoplastic materially nonlinear formulations, be implemented into the parallel element-free Galerkin analysis software.

6.2.3 Enhancement of the communication infrastructures

One factor that affects the performance of parallel software is the remote data communication (see Section 4.1 on page 28). High-performance parallel software should spend the least possible amount of time on data transfer. The MPI message-passing library provides many advanced features that may reduce the communication time but were not investigated during this work. Therefore it is recommended to investigate the MPI advanced features and possibly implement applicable ones into the communication infrastructures of the parallel element-free Galerkin analysis software.

6.2.4 Installation of Internet connection

Currently, there is one access to the AIT Beowulf, that is, by directly logging on to the console. As a result, the parallel computer can serve only one user at a time. Internet connection provides a convenient solution to this problem. If the master node was available to the other computers on the Internet, any user wishing to use the parallel computer could simultaneously login and use it as if they are at the console. Thus, it is recommended that an Internet connection be installed on the AIT Beowulf to maximize the benefit of the parallel machine. This may be done by setting up the telnet connection service or by the remote shell utilities discussed in Section 3.2.

6.2.5 Design of a user-friendly graphical user interfaces

Creating the input files and interpreting the output files can be confusing to users; therefore graphical user interfaces (GUIs) for pre-processing and post-processing are preferable. It is recommended that user-friendly GUI front-end software be developed to provide accesses to the ParEFG engine. This could be written, for example, as a platform-independent Java Applet or a Microsoft Windows based application.

Page 87: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

78

REFERENCES

Printed Publication

1 A. D. Grosso and G. Righetti, “Finite element techniques and artificial intelligence on parallel machines”, Computers and Structures, Vol. 30, No. 4, pp. 999-1007, 1988.

2 A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine, The MIT Press, Cambridge, Massachusetts, USA, 1994.

3 B. Nayroles, G. Touzot, and P. Villon, “Generalizing the finite element methods: Diffuse approximation and diffuse elements”, Computational Mechanics, Vol. 10, pp. 307-318, 1992.

4 C. A. M. Duarte and J. T. Oden, “An hp adaptive method using clouds”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 237-262, 1996.

5 C. Brown, UNIX Distributed Programming, Prentice Hall International (UK) Limited, UK, 1994.

6 D. E. Stewart and Z. Leyk, Meschach: Matrix Computations in C, Proceedings of the Center for Mathematics and Its Applications, Vol. 32, Australian National University, 1994.

7 D. Hegen, “Element-free Galerkin methods in combination with finite element approaches”, Computer Methods in Applied Mechanics and Engineering, Vol. 135, pp. 143-166, 1996.

8 D. Ridge, D. Becker, P. Merkey, and T. Sterling, “Beowulf: Harnessing the power of parallelism in a Pile-of-PCs”, Proceedings of the IEEE Aerospace conference, 1997.

9 D. Sulsky, S. J. Zhou, and H. L. Schreyer, “Application of a particle-in-cell method to solid mechanics”, Computer and Physics Communications, Vol. 87, pp. 236-253, 1995.

10 E. Onate, S. Idelsohn, O. C. Zienkiewicz, R. L. Taylor, and C. Sacco, “A stabilized finite point method for analysis of fluid mechanics problems”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 315-346, 1996.

11 F. Günter, W. K. Liu, and D. Diachin, “Multi-scale meshfree parallel computations for viscous, compressible flows”, Accepted for publication in a special issue of Computer Methods in Applied Mechanics and Engineering, 1998.

12 G. Yagawa, N. Soneda, and S. Yoshimura, “A large scale finite element analysis using domain decomposition method on a parallel computer”, Computers and Structures, Vol. 38, No. 5/6, pp. 615-625, 1991.

13 G. Yagawa and T. Yamada, “Free mesh method: A new meshless finite element method”, Computational Mechanics, Vol. 18, Issue 5, pp. 383-386, 1996.

14 H. Adeli and O. Kamal, Parallel Processing in Structural Engineering, Elsevier Science Publishers Ltd., U.K., 1993.

Page 88: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

79

15 I. Kaljevic and S. Saigal, “An improved element free Galerkin formulation”, International Journal for Numerical Methods in Engineering, Vol. 40, pp. 2953-2974, 1997.

16 I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley Publishing Company, USA, 1995.

17 J. C. Luo and M. B. Friedman, “A parallel computational model for the finite element method on a memory-sharing multiprocessor computer”, Computer Methods in Applied Mechanics and Engineering, Vol. 84, pp. 193-209, 1990.

18 J. Dolbow and T. Belytschko, “An introduction to programming the meshless Element-Free Galerkin Method”, Archives of Computational Methods in Engineering, Vol. 5, No. 3, pp. 207-241, 1998.

19 J. Dolbow and T. Belytschko, “Numerical integration of the Galerkin weak form in meshfree methods”, Accepted for publication in Computational Mechanics, September 1998.

20 J. M. Melenk and I. Babuska, “The partition of unity finite element method: Basic theory and applications”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 289-314, 1996.

21 J. W. Swegle, D. L. Hicks, and S. W. Attaway, “Smoothed particle hydrodynamics stability analysis”, Journal of Computational Physics, Vol. 116, pp. 123-134, 1995.

22 K. N. Chiang and R. E. Fulton, “Concepts and implementation of parallel finite element analysis”, Computers and Structures, Vol. 36, No. 6, pp. 1039-1046, 1990.

23 K. T. Danielson, S. Hao, W. K. Liu, A. Uras, and S. Li, “Parallel computation of meshless methods for explicit dynamic analysis”, Accepted for publication in International Journal for Numerical Methods in Engineering, 1999.

24 L. B. Lucy, “A numerical approach to the testing of the fission hypothesis”, The Astronomical Journal, Vol. 82, No. 12, pp. 1013-1024, 1977.

25 L. D. Libersky and A. G. Petschek, “Smoothed particle hydrodynamics with strength of materials”, in The Next Free Lagrange Conference, H. Trease, J. Fritts, W. Crowley, ed., pp. 248-257, 1991.

26 L. W. Cordes and B. Moran, “Treatment of material discontinuity in the element-free Galerkin method”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 75-89, 1996.

27 M. Baker and R. Buyya, “Cluster computing: The commodity supercomputer”, Software—Practice and Experience, Vol. 29, No. 6, pp. 551-576, 1999.

28 M. F. Komarinski and C. Collett, Linux System Administration Handbook, Prentice-Hall, Inc., USA, 1998.

29 M. Fleming, Y. A. Chu, B. Moran, and T. Belytschko, “Enriched element-free Galerkin methods for crack tip fields”, International Journal for Numerical Methods in Engineering, Vol. 40, pp. 1483-1504, 1997.

Page 89: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

80

30 M. Shirazaki and G. Yagawa, “Large-scale parallel flow analysis based on free mesh method: A virtually meshless method”, Computer Methods in Applied Mechanics and Engineering, Vol. 174, No. 3/4, May 1999.

31 M. Shnier, Dictionary of PC Hardware and Data Communications Terms, O’Reilly and Associates, USA, 1996.

32 M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference, The MIT Press, Cambridge, Massachusetts, USA, 1996.

33 Microsoft Press Computer Dictionary, 3rd ed., Microsoft Press, USA, 1997.

34 N. Sukumar, B. Moran, T. Black, and T. Belytschko, “An element-free Galerkin method for three-dimensional fracture mechanics”, Computational Mechanics, Vol. 20, Issue 1/2, pp. 170-175, 1997.

35 P. Krysl, T. Belytschko, “Analysis of thin plates by the element-free Galerkin method”, Computational Mechanics, Vol. 17, Issue 1/2, pp. 26-35, 1995.

36 P. Krysl and T. Belytschko, “Analysis of thin shells by the element-free Galerkin method”, International Journal of Solids and Structures, Vol. 33, No. 20-22, pp. 3057-3080, 1996.

37 P. Krysl and T. Belytschko, “Element-free Galerkin method: Convergence of the continuous and discontinuous shape functions”, Computer Methods in Applied Mechanics and Engineering, Vol. 148, No. 3/4, September 1997.

38 P. Krysl and T. Belytschko, “The element-free Galerkin method for dynamic propagation of arbitrary 3-D cracks”, International Journal for Numerical Methods in Engineering, Vol. 44, pp. 767-800, 1999.

39 P. Kuo, Special Edition Using UNIX, 3rd ed., QUE Press, USA, 1998.

40 P. Lancaster and K. Salkauskas, “Surfaces generated by Moving Least Squares Methods”, Mathematics of Computation, Vol. 37, No. 155, pp. 141-158, July 1981.

41 P. S. Pacheco, A User’s Guide to MPI, University of San Francisco, USA, 1998.

42 P. Uthayopas, T. Angskun, and J. Maneesilp, Building a parallel computer from cheap PCs: SMILE cluster experiences, Technical Report, Computer and Network Systems Research Laboratory, Kasetsart University, Thailand, July 1999.

43 Ph. Bouillard and S. Suleau, “Element-free Galerkin solutions for Helmholtz problems: formulation and numerical assessment of the pollution effect”, Computer Methods in Applied Mechanics and Engineering, Vol. 162, pp. 317-335, 1998.

44 R. A. Gingold and J. J. Monaghan, “Kernel estimates as a basis for general particle methods in hydrodynamics”, Journal of Computational Physics, Vol. 46, pp. 429-453, 1982.

45 S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, PETSc 2.0 Users Manual, Technical Report ANL-95/11 – Revision 2.0.24, Argonne National Laboratory, 1999.

46 S. Beissel and T. Belytschko, “Nodal integration of the element-free Galerkin method”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 49-74, 1996.

Page 90: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

81

47 S. P. Timoshenko and J. N. Goodier, Theory of Elasticity, 3rd ed., McGraw-Hill, 1970.

48 T. Belytschko, D. Organ, Y. Krongauz, “A coupled finite element-element-free Galerkin method”, Computational Mechanics, Vol. 17, No. 3, pp. 186-195, December 1995.

49 T. Belytschko, Y. Krongauz, D. Organ, M. Fleming, and P. Krysl, “Meshless methods: An overview and recent developments”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, No. 1-4, pp. 3-47, 1996.

50 T. Belytschko, Y. Krongauz, M. Fleming, D. Organ, and W. K. Liu, “Smoothing and accelerated computations in the element-free Galerkin method”, Journal of Computational and Applied Mathematics, Vol. 74, No. 1/2, pp. 111-126, November 1996.

51 T. Belytschko, Y. Y. Lu, and L. Gu, “Element-free Galerkin methods”, International Journal for Numerical Methods in Engineering, Vol. 37, pp. 229-256, 1994.

52 T. Belytschko, Y. Y. Lu, L. Gu, and M. Tabbara, “Element-free Galerkin methods for static and dynamic fracture”, International Journal of Solids Structures, Vol. 32, No.17/18, pp. 2547-2570, 1995.

53 T. Belytschko and M. Fleming, “Smoothing, enrichment and contact in the element-free Galerkin method”, Computers and Structures, Vol. 71, pp. 173-195, 1999.

54 T. Belytschko and M. Tabbara, “Dynamic fracture using element-free Galerkin methods”, International Journal for Numerical Methods in Engineering, Vol. 39, pp. 923-938, 1996.

55 T. Liszka and J. Orkisz, “The finite difference method at arbitrary irregular grids and its application in applied mechanics”, Computer and Structures, Vol. 11, pp. 83-95, 1980.

56 T. Sterling, D. Becker, D. Savarese, J. E. Dorband, U. A. Ranawake, and C. V. Packer, “Beowulf: A parallel workstation for scientific computation”, Proceedings of the 1995 International Conference on Parallel Processing (ICPP), Vol. 1, pp. 11-14, August 1995.

57 T. Zhu and S. N. Atluri, “A modified collocation method and a penalty formulation for enforcing the essential boundary conditions in the element-free Galerkin method”, Computational Mechanics, Vol. 21, Issue 3, pp. 211-222, 1998.

58 U. Häussler-Combe and C. Korn, “An adaptive approach with the element-free-Galerkin method”, Computer Methods in Applied Mechanics and Engineering, Vol. 162, pp. 203-222, 1998.

59 V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, The Benjamin/Cummings Publishing Company, Inc., USA, 1994.

60 W. Barry, Plastic Element-Free Galerkin (PLEFG) User’s Guide, Department of Civil and Environmental Engineering, Carnegie Mellon University, USA, 1998.

Page 91: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

82

61 W. Barry and S. Saigal, “A three-dimensional element-free Galerkin elastic and elastoplastic formulation”, International Journal for Numerical Methods in Engineering, Vol. 46, pp. 671-693, 1999.

62 W. Gropp and E. Lusk, User's Guide for mpich, a Portable Implementation of MPI, Technical Report ANL-96/6, Argonne National Laboratory, USA, 1996.

63 W. Gropp and E. Lusk, Installation Guide to mpich, a Portable Implementation of MPI, Technical Report ANL-96/5, Argonne National Laboratory, USA, 1996.

64 W. K. Liu, S. Jun, and Y. F. Zhang, “Reproducing kernel particle methods”, International Journal for Numerical Methods in Engineering, Vol. 20, pp. 1081-1106, 1995.

65 W. T. Carter, Jr., T. L. Sham, and K. H. Laws, “A parallel finite element method and its prototype implementation on a Hypercube”, Computers and Structures, Vol. 31, No. 6, pp. 921-934, 1989.

66 Y. Escaig and P. Marin, “Domain decomposition methods and non-linear problems”, in Advances in Computational Structures Technology, B. H. V. Topping, ed., Civil-Comp Press, Edinburgh, UK, 1996.

67 Y. Krongauz and T. Belytschko, “Enforcement of essential boundary conditions in meshless approximations using finite elements”, Computer Methods in Applied Mechanics and Engineering, Vol. 131, pp. 133-145, 1996.

68 Y. X. Mukherjee and S. Mukherjee, “On boundary conditions in the element-free Galerkin method”, Computational Mechanics, Vol. 19, Issue 4, pp. 264-327, 1997.

69 Y. Xu and S. Saigal, “An element free Galerkin analysis of steady dynamic growth of a mode I crack in elastic-plastic materials”, International Journal of Solids and Structures, Vol. 36, pp. 1045-1079, 1999.

70 Y. Xu and S. Saigal, “An element free Galerkin formulation for stable crack growth in an elastic solid”, Computer Methods in Applied Mechanics and Engineering, Vol. 154, pp. 331-343, 1998.

71 Y. Xu and S. Saigal, “Element free Galerkin study of steady quasi-static crack growth in plane strain tension in elastic-plastic materials”, Computational Mechanics, Vol. 22, Issue 3, pp. 255-265, 1998.

72 Y. Y. Lu, T. Belytschko, and M. Tabbara, “Element-free Galerkin method for wave propagation and dynamic fracture”, Computer Methods in Applied Mechanics and Engineering, Vol. 126, pp. 131-153, 1995.

73 Y. Y. Lu, T. Belytschko, L. Gu, “A new implementation of the element free Galerkin method”, Computer Methods in Applied Mechanics and Engineering, Vol. 113, pp. 397-414, 1994.

Page 92: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

83

Internet Sources

74 J. Dongarra, “Compressed Row Storage (CRS)”, Survey of Sparse Matrix Storage Formats, November 1995. URL:http://www.netlib.org/linalg/html_templates/node91.html

75 J. Radajewski and D. Eadline, “Beowulf HOWTO”, November 1998. URL:http://www.linux.org/help/ldp/howto/Beowulf-HOWTO.html

76 N. Langfeldt, “NFS HOWTO”, October 1999. URL:http://www.linux.org/help/ldp/howto/NFS-HOWTO.html

77 P. Merkey, “Beowulf: Introduction & overview”, Center of Excellence in Space Data and Information Sciences, University Space Research Association, Goddard Space Flight Center, Maryland, USA, September 1998. URL:http://www.beowulf.org/intro.html

Page 93: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

84

APPENDIX A

CONFIGURATION FILES

For illustration and troubleshooting purposes, the AIT Beowulf configuration files, namely, the common network configuration files, the Network File System (NFS) configuration files, and the mpich machine file, will be presented in this appendix. Linux network properties are configured in the same manner as the UNIX operating system. They are based on the text files stored in the /etc system directory. The users must have the administrator or root privileges on the computers to edit these files. Generally, if the Red Hat Linux distribution is used, the common network configuration files will be automatically configured.

A1 Common Network Configuration Files

The common network configuration files for the server node and the workstation nodes will be presented in this section.

A1.1 /etc/sysconfig/network

The /etc/sysconfig/network is the file that stores the basic network configuration of a computer. The computer name (host name), the computer domain name, and the gateway Internet Protocol address (IP address) must be assigned in this file. The content of this file for svr1 server node is:

NETWORKING=yesFORWARD_IPV4=yesHOSTNAME=svr1.cml.ait.ac.thDOMAINNAME=cml.ait.ac.thGATEWAY=192.168.1.254GATEWAYDEV=eth0

A1.2 /etc/HOSTNAME

The /etc/HOSTNAME is the file that store the full name of each computer. As an example, the content of this file for svr1 server node is:

#Full Namesvr1.cml.ait.ac.th

A1.3 /etc/hosts

The /etc/hosts is the file that stores the information about the computers that can be accessed inside a network. This file must be the same on every node comprising the Beowulf. The content of this file for the AIT Beowulf is:

Page 94: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

85

#IP Address Full Name Aliases127.0.0.1 localhost.localdomain localhost192.168.1.1 svr1.cml.ait.ac.th svr1192.168.1.11 nod1.cml.ait.ac.th nod1192.168.1.12 nod2.cml.ait.ac.th nod2192.168.1.13 nod3.cml.ait.ac.th nod3

The first column contains the unique IP addresses of the computers; the second column and the third column contain the full names and the aliases of the respective computers.

A1.4 .rhosts

The .rhosts is the file used to identify the hosts and users that can be trusted a single user on the host. This file is necessary for the Beowulf parallel computer and must be stored in the users’ home directories on every node. The content of this file for the AIT Beowulf is:

svr1.cml.ait.ac.th gengnod1.cml.ait.ac.th gengnod2.cml.ait.ac.th gengnod3.cml.ait.ac.th geng

The trusted host names and the trusted user names are given in the first column and second column respectively. The host names must exist in the /etc/hosts file and must be the full names, not the aliases.

A2 The NFS Configuration Files

The specific NFS configuration files for the server node and the workstation nodes will be presented in this section.

A2.1 /etc/exports

To share files among the Beowulf nodes, the directories containing the files must be properly configured in the /etc/exports file on the server node. The content of this file for the svr1 server node is as follows:

#DIRECTORY OPTIONS/home/shared nod1(rw) nod2(rw) nod3(rw)/usr/local nod1(ro) nod2(ro) nod3(ro)

The first column contains the directory names on the server node that are to be shared. The second column contains specific options for the workstation nodes. The rw option stands for read-and-write and the ro option stands for read-only. According to the above configuration, read and write permissions to the shared directory /home/shared on the svr1 server node will be granted to the nod1 workstation node. On the other hand, only the read permission to the shared directory /usr/local will be granted.

Page 95: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

86

A2.2 /etc/fstab

In order for the workstation nodes to access the exported directories the file /etc/fstab must also be configured. This file must be present on every workstation node in the Beowulf. It gives the information on how the workstation nodes will refer to the exported directories from the server. As an example, the content of this file for the nod1 workstation node is as follows:

#device mountpoint fs-type options/dev/hda5 / ext2 defaults 1 1/dev/hda6 swap swap defaults 0 0/dev/fd0 /mnt/floppy ext2 noauto 0 0/dev/cdrom /mnt/cdrom iso9660 noauto,ro 0 0none /proc proc defaults 0 0none /dev/pts devpts mode=0662 0 0svr1:/home/shared /home/shared nfs defaults 0 0svr1:/usr/local /usr/local nfs defaults 0 0

The NFS information is in the last two lines of the example files. According to the

above configuration, nod1 will refer to the svr1’s exported /home/shared directory as /home/shared. The svr1’s exported /usr/local directory will be referred to as /usr/local

Page 96: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

87

APPENDIX B

INPUT FILES

In this appendix, input files for the linear displacement field patch test, the cantilever beam with end load, the pure bending of thick circular arch, and the extension of a strip with one circular hole numerical examples discussed Chapter 5 will be presented. For more information on how to create the input files, the readers are referred to PLEFG User’s Guide [60].

B1 Linear Displacement Field Patch Test (tension.in) # ==========================================================# Linear Elastic Analysis# LINEAR DISPLACEMENT FIELD PATCH TEST# Thiti Vacharasintopchai# March 14, 2000# ==========================================================

# The Vector of Problem ParametersVector: dim: 9# pcode axcode ndim nsteps lfactor print_flag post_flag mtype log_flag

4 1 3 1 1.0 0 0 0 0

# The Vector of EFG ParametersVector: dim: 5# ngauss order weight_type search_type param

6 1 2 1 4

# The Vector of Material Parameters1Matrix: 1 by 5# Young's_Modulus Poisson's_Ratio Yield_Stress Hardening_Param Densityrow 0: 1.0 0.25 10.0 0.15 1.0

# Point One Vector## axcode = 1# x y zVector: dim: 3

0.0 0.0 0.0

# Point Two Vector## axcode = 1# x y zVector: dim: 3

6.0 3.0 3.0

# Number of Cells VectorVector: dim: 36 3 3

# Number of Nodes VectorVector: dim: 37 4 4

Page 97: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

88

# Number of Post-processing Nodes VectorVector: dim: 310 5 5

# Read File Flag NODES (Do not need the nodal coordinate matrix to follow)0

# Read File Flag CNODES (Do not need the nodal coordinate matrix to follow)0

# Read File Flag CCON (Do not need the nodal coordinate matrix to follow)0

# Read File Flag PNODES (Do not need the nodal coordinate matrix to follow)0

# Read File Flag PCON (Do not need the nodal coordinate matrix to follow)0

# The Applied Point Load Data0Matrix: 0 by 3

# The Applied Distributed Load Data25Matrix: 25 by 5row 0: 6.0 0.00 3.00 1.0 1row 1: 6.0 0.00 2.25 1.0 1row 2: 6.0 0.00 1.50 1.0 1row 3: 6.0 0.00 0.75 1.0 1row 4: 6.0 0.00 0.00 1.0 1row 5: 6.0 0.75 3.00 1.0 1row 6: 6.0 0.75 2.25 1.0 1row 7: 6.0 0.75 1.50 1.0 1row 8: 6.0 0.75 0.75 1.0 1row 9: 6.0 0.75 0.00 1.0 1row 10: 6.0 1.50 3.00 1.0 1row 11: 6.0 1.50 2.25 1.0 1row 12: 6.0 1.50 1.50 1.0 1row 13: 6.0 1.50 0.75 1.0 1row 14: 6.0 1.50 0.00 1.0 1row 15: 6.0 2.25 3.00 1.0 1row 16: 6.0 2.25 2.25 1.0 1row 17: 6.0 2.25 1.50 1.0 1row 18: 6.0 2.25 0.75 1.0 1row 19: 6.0 2.25 0.00 1.0 1row 20: 6.0 3.00 3.00 1.0 1row 21: 6.0 3.00 2.25 1.0 1row 22: 6.0 3.00 1.50 1.0 1row 23: 6.0 3.00 0.75 1.0 1row 24: 6.0 3.00 0.00 1.0 1

16Matrix: 16 by 4row 0: 0 1 6 5row 1: 1 2 7 6row 2: 2 3 8 7row 3: 3 4 9 8row 4: 5 6 11 10row 5: 6 7 12 11row 6: 7 8 13 12row 7: 8 9 14 13row 8: 10 11 16 15

Page 98: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

89

row 9: 11 12 17 16row 10: 12 13 18 17row 11: 13 14 19 18row 12: 15 16 21 20row 13: 16 17 22 21row 14: 17 18 23 22row 15: 18 19 24 23

# The displacement boundary condition data0Matrix: 0 by 3

# The fixed plane data3Matrix: 3 by 8row 0: 1 0.0 0.0 0.0 0.0 -1.0 0.0 0.0row 1: 2 0.0 0.0 0.0 0.0 0.0 -1.0 0.0row 2: 3 0.0 0.0 0.0 0.0 0.0 0.0 -1.0

# The locations where displacements are desired7Matrix: 7 by 3row 0: 0.0 1.5 1.5row 1: 1.0 1.5 1.5row 2: 2.0 1.5 1.5row 3: 3.0 1.5 1.5row 4: 4.0 1.5 1.5row 5: 5.0 1.5 1.5row 6: 6.0 1.5 1.5

# The locations where stresses are desired7Matrix: 7 by 3row 0: 3.0 0.0 1.5row 1: 3.0 0.5 1.5row 2: 3.0 1.0 1.5row 3: 3.0 1.5 1.5row 4: 3.0 2.0 1.5row 5: 3.0 2.5 1.5row 6: 3.0 3.0 1.5

# No J-integrals0

B2 Cantilever Beam with End Load (beam.in) # ==========================================================# Linear Elastic Analysis# CANTILEVER BEAM WITH END LOAD# Thiti Vacharasintopchai# March 14, 2000# ==========================================================

# The Vector of Problem ParametersVector: dim: 9# pcode axcode ndim nsteps lfactor print_flag post_flag mtype log_flag

4 1 3 1 10.0 0 0 0 0

Page 99: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

90

# The Vector of EFG ParametersVector: dim: 5# ngauss order weight_type search_type param

6 2 2 1 4.8

# The Vector of Material Parameters1Matrix: 1 by 5# YM PR SIGY K RHOrow 0: 21e7 0.3 1e100 0.0 1.0

# Point One Vector## axcode = 1# x y zVector: dim: 3

0.0 0.0 0.0

# Point Two Vector## axcode = 1# x y zVector: dim: 3

12.0 1.0 1.0

# Number of Cells VectorVector: dim: 310 4 4

# Number of Nodes VectorVector: dim: 311 5 5

# Number of Post-processing Nodes VectorVector: dim: 311 5 5

# Read File Flag (Do not need the nodal coordinate matrix to follow)0

# Read File Flag (Do not need the nodal coordinate matrix to follow)0

# Read File Flag (Do not need the nodal coordinate matrix to follow)0

# Read File Flag (Do not need the nodal coordinate matrix to follow)0

# Read File Flag (Do not need the nodal coordinate matrix to follow)0

# The Applied Point Load Data0Matrix: 0 by 3

# The Applied Distributed Load Data25Matrix: 25 by 5row 0: 12.0 0.00 1.00 1.0 2row 1: 12.0 0.00 0.75 1.0 2row 2: 12.0 0.00 0.50 1.0 2row 3: 12.0 0.00 0.25 1.0 2

Page 100: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

91

row 4: 12.0 0.00 0.00 1.0 2row 5: 12.0 0.25 1.00 1.0 2row 6: 12.0 0.25 0.75 1.0 2row 7: 12.0 0.25 0.50 1.0 2row 8: 12.0 0.25 0.25 1.0 2row 9: 12.0 0.25 0.00 1.0 2row 10: 12.0 0.50 1.00 1.0 2row 11: 12.0 0.50 0.75 1.0 2row 12: 12.0 0.50 0.50 1.0 2row 13: 12.0 0.50 0.25 1.0 2row 14: 12.0 0.50 0.00 1.0 2row 15: 12.0 0.75 1.00 1.0 2row 16: 12.0 0.75 0.75 1.0 2row 17: 12.0 0.75 0.50 1.0 2row 18: 12.0 0.75 0.25 1.0 2row 19: 12.0 0.75 0.00 1.0 2row 20: 12.0 1.00 1.00 1.0 2row 21: 12.0 1.00 0.75 1.0 2row 22: 12.0 1.00 0.50 1.0 2row 23: 12.0 1.00 0.25 1.0 2row 24: 12.0 1.00 0.00 1.0 2

16Matrix: 16 by 4row 0: 0 1 6 5row 1: 1 2 7 6row 2: 2 3 8 7row 3: 3 4 9 8row 4: 5 6 11 10row 5: 6 7 12 11row 6: 7 8 13 12row 7: 8 9 14 13row 8: 10 11 16 15row 9: 11 12 17 16row 10: 12 13 18 17row 11: 13 14 19 18row 12: 15 16 21 20row 13: 16 17 22 21row 14: 17 18 23 22row 15: 18 19 24 23

# The displacement boundary condition data0Matrix: 0 by 3

# The fixed plane data3Matrix: 3 by 8row 0: 1 0.0 0.0 0.0 0.0 1.0 0.0 0.0row 1: 2 0.0 0.0 0.0 0.0 1.0 0.0 0.0row 2: 3 0.0 0.0 0.0 0.0 1.0 0.0 0.0

# The locations where displacements are desired11Matrix: 11 by 3row 0: 0.0 0.5 0.5row 1: 1.2 0.5 0.5row 2: 2.4 0.5 0.5row 3: 3.6 0.5 0.5row 4: 4.8 0.5 0.5row 5: 6.0 0.5 0.5row 6: 7.2 0.5 0.5row 7: 8.4 0.5 0.5

Page 101: A Parallel Implementation of the Element-Free Galerkin Method on a Network of PCs

92

row 8: 9.6 0.5 0.5row 9: 10.8 0.5 0.5row 10: 12.0 0.5 0.5

# The locations where stresses are desired11Matrix: 11 by 3row 0: 6 0.0 0.5row 1: 6 0.1 0.5row 2: 6 0.2 0.5row 3: 6 0.3 0.5row 4: 6 0.4 0.5row 5: 6 0.5 0.5row 6: 6 0.6 0.5row 7: 6 0.7 0.5row 8: 6 0.8 0.5row 9: 6 0.9 0.5row 10: 6 1.0 0.5

# No J-integrals0

B3 Pure Bending of Thick Circular Arch (arch.in) # ==========================================================# Linear Elastic Analysis# PURE BENDING OF THICK CIRCULAR ARCH# Thiti Vacharasintopchai# March 14, 2000# ==========================================================

# The Vector of Problem Parameters
Vector: dim: 9
# pcode axcode ndim nsteps lfactor pflag post_flag mtype log_flag

4 2 3 1 1.0 0 0 0 0

# The Vector of EFG Parameters
Vector: dim: 5
# ngauss order weight_type search_type param

5 2 2 1 1.33

# The Vector of Material Parameters
1
Matrix: 1 by 5
# Young's_Modulus Poisson's_Ratio Yield_Stress Hardening_Param Density
row 0: 21e7 0.3 1e64 0.0 1.0

# Point One Vector
#
# axcode = 2
# theta r z
Vector: dim: 3

90.0 1.0 0.0

# Point Two Vector
#
# axcode = 2
# theta r z
Vector: dim: 3

0.0 2.0 1.0


# Number of Cells Vector
Vector: dim: 3
12 4 4

# Number of Nodes Vector
Vector: dim: 3
13 5 5

# Number of Post-processing Nodes Vector
Vector: dim: 3
12 4 4

# Read File Flag NODES (Do not need the nodal coordinate matrix to follow)
0

# Read File Flag CNODES (Do not need the nodal coordinate matrix to follow)
0

# Read File Flag CCON (Do not need the nodal coordinate matrix to follow)
0

# Read File Flag PNODES (Do not need the nodal coordinate matrix to follow)
0

# Read File Flag PCON (Do not need the nodal coordinate matrix to follow)
0

# The Applied Point Load Data25Matrix: 25 by 3row 0: 2 12 -27.4176368876row 1: 2 25 -26.3995317401row 2: 2 38 5.3453488068row 3: 2 51 28.1784451569row 4: 2 64 20.2933746640row 5: 2 77 -54.8352737751row 6: 2 90 -52.7990634801row 7: 2 103 10.6906976136row 8: 2 116 56.3568903137row 9: 2 129 40.5867493279row 10: 2 142 -54.8352737751row 11: 2 155 -52.7990634801row 12: 2 168 10.6906976136row 13: 2 181 56.3568903137row 14: 2 194 40.5867493279row 15: 2 207 -54.8352737751row 16: 2 220 -52.7990634801row 17: 2 233 10.6906976136row 18: 2 246 56.3568903137row 19: 2 259 40.5867493279row 20: 2 272 -27.4176368876row 21: 2 285 -26.3995317401row 22: 2 298 5.3453488068row 23: 2 311 28.1784451569row 24: 2 324 20.2933746640

# The Applied Distributed Load Data
0
Matrix: 0 by 5

0
Matrix: 0 by 4


# The displacement boundary condition data
6
Matrix: 6 by 3
# dir node val
row 0: 2 25 0.0
row 1: 2 90 0.0
row 2: 2 155 0.0
row 3: 2 220 0.0
row 4: 2 285 0.0
row 5: 3 155 0.0

# The fixed plane data
1
Matrix: 1 by 8

# dir val x0 y0 z0 nx ny nz
row 0: 1 0.0 0.0 0.0 0.0 1.0 0.0 0.0

# The locations where displacements are desired11Matrix: 11 by 3row 0: 0.00000 1.50000 0.50000row 1: 0.23465 1.48153 0.50000row 2: 0.46353 1.42658 0.50000row 3: 0.68099 1.33651 0.50000row 4: 0.88168 1.21353 0.50000row 5: 1.06066 1.06066 0.50000row 6: 1.21353 0.88168 0.50000row 7: 1.33651 0.68099 0.50000row 8: 1.42658 0.46353 0.50000row 9: 1.48153 0.23465 0.50000row 10: 1.50000 0.00000 0.50000

# The locations where stresses are desired11Matrix: 11 by 3row 0: 0.70711 0.70711 0.50000row 1: 0.77782 0.77782 0.50000row 2: 0.84853 0.84853 0.50000row 3: 0.91924 0.91924 0.50000row 4: 0.98995 0.98995 0.50000row 5: 1.06066 1.06066 0.50000row 6: 1.13137 1.13137 0.50000row 7: 1.20208 1.20208 0.50000row 8: 1.27279 1.27279 0.50000row 9: 1.34350 1.34350 0.50000row 10: 1.41421 1.41421 0.50000


B4 Extension of a Strip with One Circular Hole (strip.in)

# ==========================================================
# Linear Elastic Analysis
# EXTENSION OF A STRIP WITH CIRCULAR HOLE
# Thiti Vacharasintopchai
# March 14, 2000
# ==========================================================

# The Vector of Problem Parameters
Vector: dim: 9
# pcode axcode ndim nsteps lfactor print_flag post_flag mtype log_flag

4 1 3 1 10.0 0 0 0 0

# The Vector of EFG Parameters
Vector: dim: 5
# ngauss order weight_type search_type param

4 2 2 1 1.0

# The Matrix of Material Parameters
1
Matrix: 1 by 5
# Young's_Modulus Poisson's_Ratio Yield_Stress Hardening_Param Density
row 0: 21.0E+07 0.30 1.00E+12 0.00 1.0

# Dummy Data for Automatic Mesh Generation
# ----------------------------------------

# Point One Vector
#
# axcode = 1
# x y z
Vector: dim: 3

0.00 0.00 -1.00

# Point Two Vector
#
# axcode = 1
# x y z
Vector: dim: 3

4.00 1.00 0.00

# Number of Cells Vector
Vector: dim: 3

8 2 2

# Number of Nodes Vector
Vector: dim: 3

9 3 3

# Number of Post-processing Nodes Vector
Vector: dim: 3

9 3 3

# Manual Mesh Generation
# ----------------------


# 'Nodal Definitions'# The Nodal Coordinates950Matrix: 950 by 3row 0: 176.776695300E-03 176.776695300E-03 000.000000000E+00row 1: 000.000000000E+00 250.000000000E-03 000.000000000E+00row 2: 160.696902419E-03 191.511110775E-03 000.000000000E+00row 3: 143.394109090E-03 204.788011055E-03 000.000000000E+00row 4: 125.000000005E-03 216.506350921E-03 000.000000000E+00row 5: 105.654565444E-03 226.576946731E-03 000.000000000E+00row 6: 85.505035842E-03 234.923155169E-03 000.000000000E+00row 7: 64.704761305E-03 241.481456543E-03 000.000000000E+00row 8: 43.412044459E-03 246.201938229E-03 000.000000000E+00row 9: 21.788935748E-03 249.048674508E-03 000.000000000E+00row 10: 000.000000000E+00 1.000000000E+00 000.000000000E+00row 11: 000.000000000E+00 916.666666667E-03 000.000000000E+00row 12: 000.000000000E+00 833.333333333E-03 000.000000000E+00row 13: 000.000000000E+00 750.000000000E-03 000.000000000E+00row 14: 000.000000000E+00 666.666666667E-03 000.000000000E+00row 15: 000.000000000E+00 583.333333333E-03 000.000000000E+00

. . . .

. . . .

. . . .row 936: 1.884172839E+00 484.337785200E-03 -500.000000000E-03row 937: 1.884172839E+00 484.337785200E-03 -750.000000000E-03row 938: 2.307338271E+00 565.248005924E-03 -250.000000000E-03row 939: 2.307338271E+00 565.248005924E-03 -500.000000000E-03row 940: 2.307338271E+00 565.248005924E-03 -750.000000000E-03row 941: 2.730503704E+00 646.158226655E-03 -250.000000000E-03row 942: 2.730503704E+00 646.158226655E-03 -500.000000000E-03row 943: 2.730503704E+00 646.158226655E-03 -750.000000000E-03row 944: 3.153669136E+00 727.068447393E-03 -250.000000000E-03row 945: 3.153669136E+00 727.068447393E-03 -500.000000000E-03row 946: 3.153669136E+00 727.068447393E-03 -750.000000000E-03row 947: 3.576834568E+00 807.978668137E-03 -250.000000000E-03row 948: 3.576834568E+00 807.978668137E-03 -500.000000000E-03row 949: 3.576834568E+00 807.978668137E-03 -750.000000000E-03

# 'Integration Cell Definitions'# The Integration Cell Nodal Coordinates950Matrix: 950 by 3row 0: 176.776695300E-03 176.776695300E-03 000.000000000E+00row 1: 000.000000000E+00 250.000000000E-03 000.000000000E+00row 2: 160.696902419E-03 191.511110775E-03 000.000000000E+00row 3: 143.394109090E-03 204.788011055E-03 000.000000000E+00row 4: 125.000000005E-03 216.506350921E-03 000.000000000E+00row 5: 105.654565444E-03 226.576946731E-03 000.000000000E+00row 6: 85.505035842E-03 234.923155169E-03 000.000000000E+00row 7: 64.704761305E-03 241.481456543E-03 000.000000000E+00row 8: 43.412044459E-03 246.201938229E-03 000.000000000E+00row 9: 21.788935748E-03 249.048674508E-03 000.000000000E+00row 10: 000.000000000E+00 1.000000000E+00 000.000000000E+00row 11: 000.000000000E+00 916.666666667E-03 000.000000000E+00row 12: 000.000000000E+00 833.333333333E-03 000.000000000E+00row 13: 000.000000000E+00 750.000000000E-03 000.000000000E+00row 14: 000.000000000E+00 666.666666667E-03 000.000000000E+00row 15: 000.000000000E+00 583.333333333E-03 000.000000000E+00

. . . .

. . . .

. . . .row 931: 1.037841975E+00 322.517343772E-03 -750.000000000E-03row 932: 1.461007407E+00 403.427564482E-03 -250.000000000E-03


row 933: 1.461007407E+00 403.427564482E-03 -500.000000000E-03row 934: 1.461007407E+00 403.427564482E-03 -750.000000000E-03row 935: 1.884172839E+00 484.337785200E-03 -250.000000000E-03row 936: 1.884172839E+00 484.337785200E-03 -500.000000000E-03row 937: 1.884172839E+00 484.337785200E-03 -750.000000000E-03row 938: 2.307338271E+00 565.248005924E-03 -250.000000000E-03row 939: 2.307338271E+00 565.248005924E-03 -500.000000000E-03row 940: 2.307338271E+00 565.248005924E-03 -750.000000000E-03row 941: 2.730503704E+00 646.158226655E-03 -250.000000000E-03row 942: 2.730503704E+00 646.158226655E-03 -500.000000000E-03row 943: 2.730503704E+00 646.158226655E-03 -750.000000000E-03row 944: 3.153669136E+00 727.068447393E-03 -250.000000000E-03row 945: 3.153669136E+00 727.068447393E-03 -500.000000000E-03row 946: 3.153669136E+00 727.068447393E-03 -750.000000000E-03row 947: 3.576834568E+00 807.978668137E-03 -250.000000000E-03row 948: 3.576834568E+00 807.978668137E-03 -500.000000000E-03row 949: 3.576834568E+00 807.978668137E-03 -750.000000000E-03

# The Integration Cell Connectivities648Matrix: 648 by 8row 0: 0 2 36 35 203 206 308 305row 1: 2 3 44 36 206 209 332 308row 2: 3 4 52 44 209 212 356 332row 3: 4 5 60 52 212 215 380 356row 4: 5 6 68 60 215 218 404 380row 5: 6 7 76 68 218 221 428 404row 6: 7 8 84 76 221 224 452 428row 7: 8 9 92 84 224 227 476 452row 8: 9 1 18 92 227 200 233 476row 9: 35 36 37 34 305 308 311 302row 10: 36 44 45 37 308 332 335 311row 11: 44 52 53 45 332 356 359 335row 12: 52 60 61 53 356 380 383 359row 13: 60 68 69 61 380 404 407 383row 14: 68 76 77 69 404 428 431 407row 15: 76 84 85 77 428 452 455 431row 16: 84 92 93 85 452 476 479 455row 17: 92 18 17 93 476 233 236 479row 18: 34 37 38 33 302 311 314 299row 19: 37 45 46 38 311 335 338 314row 20: 45 53 54 46 335 359 362 338

. . . .

. . . .

. . . .row 614: 796 820 823 799 628 636 637 629row 615: 820 844 847 823 636 644 645 637row 616: 844 868 871 847 644 652 653 645row 617: 868 892 895 871 652 660 661 653row 618: 892 916 919 895 660 668 669 661row 619: 916 940 943 919 668 676 677 669row 620: 940 295 292 943 676 131 130 677row 621: 742 775 778 739 610 621 622 609row 622: 775 799 802 778 621 629 630 622row 623: 799 823 826 802 629 637 638 630row 624: 823 847 850 826 637 645 646 638row 625: 847 871 874 850 645 653 654 646row 626: 871 895 898 874 653 661 662 654row 627: 895 919 922 898 661 669 670 662row 628: 919 943 946 922 669 677 678 670row 629: 943 292 289 946 677 130 129 678row 630: 739 778 781 736 609 622 623 608row 631: 778 802 805 781 622 630 631 623


row 632: 802 826 829 805 630 638 639 631row 633: 826 850 853 829 638 646 647 639row 634: 850 874 877 853 646 654 655 647row 635: 874 898 901 877 654 662 663 655row 636: 898 922 925 901 662 670 671 663row 637: 922 946 949 925 670 678 679 671row 638: 946 289 286 949 678 129 128 679row 639: 736 781 733 709 608 623 607 599row 640: 781 805 730 733 623 631 606 607row 641: 805 829 727 730 631 639 605 606row 642: 829 853 724 727 639 647 604 605row 643: 853 877 721 724 647 655 603 604row 644: 877 901 718 721 655 663 602 603row 645: 901 925 715 718 663 671 601 602row 646: 925 949 712 715 671 679 600 601row 647: 949 286 259 712 679 128 119 600

# 'Post-processing Grid Definitions'# The Post-processing Grid Nodal Coordinates950Matrix: 950 by 3row 0: 176.776695300E-03 176.776695300E-03 000.000000000E+00row 1: 000.000000000E+00 250.000000000E-03 000.000000000E+00row 2: 160.696902419E-03 191.511110775E-03 000.000000000E+00row 3: 143.394109090E-03 204.788011055E-03 000.000000000E+00row 4: 125.000000005E-03 216.506350921E-03 000.000000000E+00row 5: 105.654565444E-03 226.576946731E-03 000.000000000E+00row 6: 85.505035842E-03 234.923155169E-03 000.000000000E+00row 7: 64.704761305E-03 241.481456543E-03 000.000000000E+00row 8: 43.412044459E-03 246.201938229E-03 000.000000000E+00row 9: 21.788935748E-03 249.048674508E-03 000.000000000E+00row 10: 000.000000000E+00 1.000000000E+00 000.000000000E+00row 11: 000.000000000E+00 916.666666667E-03 000.000000000E+00row 12: 000.000000000E+00 833.333333333E-03 000.000000000E+00row 13: 000.000000000E+00 750.000000000E-03 000.000000000E+00row 14: 000.000000000E+00 666.666666667E-03 000.000000000E+00row 15: 000.000000000E+00 583.333333333E-03 000.000000000E+00row 16: 000.000000000E+00 500.000000000E-03 000.000000000E+00row 17: 000.000000000E+00 416.666666667E-03 000.000000000E+00row 18: 000.000000000E+00 333.333333333E-03 000.000000000E+00row 19: 4.000000000E+00 1.000000000E+00 000.000000000E+00row 20: 3.555555556E+00 1.000000000E+00 000.000000000E+00row 21: 3.111111111E+00 1.000000000E+00 000.000000000E+00row 22: 2.666666667E+00 1.000000000E+00 000.000000000E+00row 23: 2.222222222E+00 1.000000000E+00 000.000000000E+00row 24: 1.777777778E+00 1.000000000E+00 000.000000000E+00row 25: 1.333333333E+00 1.000000000E+00 000.000000000E+00

. . . .

. . . .

. . . .row 926: 614.676542943E-03 241.607123068E-03 -250.000000000E-03row 927: 614.676542943E-03 241.607123068E-03 -500.000000000E-03row 928: 614.676542943E-03 241.607123068E-03 -750.000000000E-03row 929: 1.037841975E+00 322.517343772E-03 -250.000000000E-03row 930: 1.037841975E+00 322.517343772E-03 -500.000000000E-03row 931: 1.037841975E+00 322.517343772E-03 -750.000000000E-03row 932: 1.461007407E+00 403.427564482E-03 -250.000000000E-03row 933: 1.461007407E+00 403.427564482E-03 -500.000000000E-03row 934: 1.461007407E+00 403.427564482E-03 -750.000000000E-03row 935: 1.884172839E+00 484.337785200E-03 -250.000000000E-03row 936: 1.884172839E+00 484.337785200E-03 -500.000000000E-03row 937: 1.884172839E+00 484.337785200E-03 -750.000000000E-03


row 938: 2.307338271E+00 565.248005924E-03 -250.000000000E-03row 939: 2.307338271E+00 565.248005924E-03 -500.000000000E-03row 940: 2.307338271E+00 565.248005924E-03 -750.000000000E-03row 941: 2.730503704E+00 646.158226655E-03 -250.000000000E-03row 942: 2.730503704E+00 646.158226655E-03 -500.000000000E-03row 943: 2.730503704E+00 646.158226655E-03 -750.000000000E-03row 944: 3.153669136E+00 727.068447393E-03 -250.000000000E-03row 945: 3.153669136E+00 727.068447393E-03 -500.000000000E-03row 946: 3.153669136E+00 727.068447393E-03 -750.000000000E-03row 947: 3.576834568E+00 807.978668137E-03 -250.000000000E-03row 948: 3.576834568E+00 807.978668137E-03 -500.000000000E-03row 949: 3.576834568E+00 807.978668137E-03 -750.000000000E-03

# The Post-processing Grid Connectivities

648Matrix: 648 by 8row 0: 0 2 36 35 203 206 308 305row 1: 2 3 44 36 206 209 332 308row 2: 3 4 52 44 209 212 356 332row 3: 4 5 60 52 212 215 380 356row 4: 5 6 68 60 215 218 404 380row 5: 6 7 76 68 218 221 428 404row 6: 7 8 84 76 221 224 452 428row 7: 8 9 92 84 224 227 476 452row 8: 9 1 18 92 227 200 233 476row 9: 35 36 37 34 305 308 311 302row 10: 36 44 45 37 308 332 335 311row 11: 44 52 53 45 332 356 359 335row 12: 52 60 61 53 356 380 383 359row 13: 60 68 69 61 380 404 407 383row 14: 68 76 77 69 404 428 431 407row 15: 76 84 85 77 428 452 455 431row 16: 84 92 93 85 452 476 479 455row 17: 92 18 17 93 476 233 236 479row 18: 34 37 38 33 302 311 314 299row 19: 37 45 46 38 311 335 338 314row 20: 45 53 54 46 335 359 362 338row 21: 53 61 62 54 359 383 386 362row 22: 61 69 70 62 383 407 410 386row 23: 69 77 78 70 407 431 434 410row 24: 77 85 86 78 431 455 458 434row 25: 85 93 94 86 455 479 482 458

. . . .

. . . .

. . . .row 626: 871 895 898 874 653 661 662 654row 627: 895 919 922 898 661 669 670 662row 628: 919 943 946 922 669 677 678 670row 629: 943 292 289 946 677 130 129 678row 630: 739 778 781 736 609 622 623 608row 631: 778 802 805 781 622 630 631 623row 632: 802 826 829 805 630 638 639 631row 633: 826 850 853 829 638 646 647 639row 634: 850 874 877 853 646 654 655 647row 635: 874 898 901 877 654 662 663 655row 636: 898 922 925 901 662 670 671 663row 637: 922 946 949 925 670 678 679 671row 638: 946 289 286 949 678 129 128 679row 639: 736 781 733 709 608 623 607 599row 640: 781 805 730 733 623 631 606 607row 641: 805 829 727 730 631 639 605 606row 642: 829 853 724 727 639 647 604 605


row 643: 853 877 721 724 647 655 603 604row 644: 877 901 718 721 655 663 602 603row 645: 901 925 715 718 663 671 601 602row 646: 925 949 712 715 671 679 600 601row 647: 949 286 259 712 679 128 119 600

# Loading Conditions
# ------------------

# No Applied Point Load Data
0
Matrix: 0 by 3

# The Distributed Load Nodal Data25Matrix: 25 by 5row 0: 4.00 0.00 -1.00 +1.00 1row 1: 4.00 0.25 -1.00 +1.00 1row 2: 4.00 0.50 -1.00 +1.00 1row 3: 4.00 0.75 -1.00 +1.00 1row 4: 4.00 1.00 -1.00 +1.00 1row 5: 4.00 0.00 -0.75 +1.00 1row 6: 4.00 0.25 -0.75 +1.00 1row 7: 4.00 0.50 -0.75 +1.00 1row 8: 4.00 0.75 -0.75 +1.00 1row 9: 4.00 1.00 -0.75 +1.00 1row 10: 4.00 0.00 -0.50 +1.00 1row 11: 4.00 0.25 -0.50 +1.00 1row 12: 4.00 0.50 -0.50 +1.00 1row 13: 4.00 0.75 -0.50 +1.00 1row 14: 4.00 1.00 -0.50 +1.00 1row 15: 4.00 0.00 -0.25 +1.00 1row 16: 4.00 0.25 -0.25 +1.00 1row 17: 4.00 0.50 -0.25 +1.00 1row 18: 4.00 0.75 -0.25 +1.00 1row 19: 4.00 1.00 -0.25 +1.00 1row 20: 4.00 0.00 0.00 +1.00 1row 21: 4.00 0.25 0.00 +1.00 1row 22: 4.00 0.50 0.00 +1.00 1row 23: 4.00 0.75 0.00 +1.00 1row 24: 4.00 1.00 0.00 +1.00 1

# The Distributed Load Connectivities16Matrix: 16 by 4row 0: 0 1 6 5row 1: 1 2 7 6row 2: 2 3 8 7row 3: 3 4 9 8row 4: 5 6 11 10row 5: 6 7 12 11row 6: 7 8 13 12row 7: 8 9 14 13row 8: 10 11 16 15row 9: 11 12 17 16row 10: 12 13 18 17row 11: 13 14 19 18row 12: 15 16 21 20row 13: 16 17 22 21row 14: 17 18 23 22row 15: 18 19 24 23


# Boundary Conditions
# -------------------

# No Nodal Displacement Boundary Condition Data
# dir node val
0
Matrix: 0 by 3

# The Fixed Plane Data
3
Matrix: 3 by 8
row 0: 1 0.00 0.00 1.00 0.00 -1.00 0.00 0.00
row 1: 2 0.00 4.00 0.00 0.00 0.00 -1.00 0.00
row 2: 3 0.00 2.00 0.50 0.00 0.00 0.00 1.00

# Post-processing Parameters
# --------------------------

# The Desired Displacement Locations100Matrix: 100 by 3row 0: 0.0000 0.2500 -0.5000row 1: 0.0000 0.2653 -0.5000row 2: 0.0000 0.2806 -0.5000row 3: 0.0000 0.2959 -0.5000row 4: 0.0000 0.3112 -0.5000row 5: 0.0000 0.3265 -0.5000row 6: 0.0000 0.3418 -0.5000row 7: 0.0000 0.3571 -0.5000row 8: 0.0000 0.3724 -0.5000row 9: 0.0000 0.3878 -0.5000row 10: 0.0000 0.4031 -0.5000row 11: 0.0000 0.4184 -0.5000row 12: 0.0000 0.4337 -0.5000row 13: 0.0000 0.4490 -0.5000row 14: 0.0000 0.4643 -0.5000row 15: 0.0000 0.4796 -0.5000row 16: 0.0000 0.4949 -0.5000row 17: 0.0000 0.5102 -0.5000row 18: 0.0000 0.5255 -0.5000row 19: 0.0000 0.5408 -0.5000row 20: 0.0000 0.5561 -0.5000row 21: 0.0000 0.5714 -0.5000row 22: 0.0000 0.5867 -0.5000row 23: 0.0000 0.6020 -0.5000row 24: 0.0000 0.6173 -0.5000row 25: 0.0000 0.6327 -0.5000

. . . .

. . . .

. . . .row 76: 2.2398 0.0000 -0.5000row 77: 2.3163 0.0000 -0.5000row 78: 2.3929 0.0000 -0.5000row 79: 2.4694 0.0000 -0.5000row 80: 2.5459 0.0000 -0.5000row 81: 2.6224 0.0000 -0.5000row 82: 2.6990 0.0000 -0.5000row 83: 2.7755 0.0000 -0.5000row 84: 2.8520 0.0000 -0.5000row 85: 2.9286 0.0000 -0.5000row 86: 3.0051 0.0000 -0.5000


row 87: 3.0816 0.0000 -0.5000row 88: 3.1582 0.0000 -0.5000row 89: 3.2347 0.0000 -0.5000row 90: 3.3112 0.0000 -0.5000row 91: 3.3878 0.0000 -0.5000row 92: 3.4643 0.0000 -0.5000row 93: 3.5408 0.0000 -0.5000row 94: 3.6173 0.0000 -0.5000row 95: 3.6939 0.0000 -0.5000row 96: 3.7704 0.0000 -0.5000row 97: 3.8469 0.0000 -0.5000row 98: 3.9235 0.0000 -0.5000row 99: 4.0000 0.0000 -0.5000

# The Desired Stress Locations100Matrix: 100 by 3row 0: 0.0000 0.2500 -0.5000row 1: 0.0000 0.2653 -0.5000row 2: 0.0000 0.2806 -0.5000row 3: 0.0000 0.2959 -0.5000row 4: 0.0000 0.3112 -0.5000row 5: 0.0000 0.3265 -0.5000row 6: 0.0000 0.3418 -0.5000row 7: 0.0000 0.3571 -0.5000row 8: 0.0000 0.3724 -0.5000row 9: 0.0000 0.3878 -0.5000row 10: 0.0000 0.4031 -0.5000row 11: 0.0000 0.4184 -0.5000row 12: 0.0000 0.4337 -0.5000row 13: 0.0000 0.4490 -0.5000row 14: 0.0000 0.4643 -0.5000row 15: 0.0000 0.4796 -0.5000row 16: 0.0000 0.4949 -0.5000row 17: 0.0000 0.5102 -0.5000row 18: 0.0000 0.5255 -0.5000row 19: 0.0000 0.5408 -0.5000row 20: 0.0000 0.5561 -0.5000row 21: 0.0000 0.5714 -0.5000row 22: 0.0000 0.5867 -0.5000row 23: 0.0000 0.6020 -0.5000row 24: 0.0000 0.6173 -0.5000row 25: 0.0000 0.6327 -0.5000

. . . .

. . . .

. . . .row 76: 2.2398 0.0000 -0.5000row 77: 2.3163 0.0000 -0.5000row 78: 2.3929 0.0000 -0.5000row 79: 2.4694 0.0000 -0.5000row 80: 2.5459 0.0000 -0.5000row 81: 2.6224 0.0000 -0.5000row 82: 2.6990 0.0000 -0.5000row 83: 2.7755 0.0000 -0.5000row 84: 2.8520 0.0000 -0.5000row 85: 2.9286 0.0000 -0.5000row 86: 3.0051 0.0000 -0.5000row 87: 3.0816 0.0000 -0.5000row 88: 3.1582 0.0000 -0.5000row 89: 3.2347 0.0000 -0.5000row 90: 3.3112 0.0000 -0.5000row 91: 3.3878 0.0000 -0.5000row 92: 3.4643 0.0000 -0.5000


row 93: 3.5408 0.0000 -0.5000row 94: 3.6173 0.0000 -0.5000row 95: 3.6939 0.0000 -0.5000row 96: 3.7704 0.0000 -0.5000row 97: 3.8469 0.0000 -0.5000row 98: 3.9235 0.0000 -0.5000row 99: 4.0000 0.0000 -0.5000

# Fracture Mechanics Parameters
# -----------------------------

# No J-integral
0


APPENDIX C

SAMPLE OUTPUT FILE

A representative output file from the numerical example in Section 5.4 is presented in this appendix, together with comments in bold italic typeface. A ParEFG output file consists of three parts. The first part contains the ParEFG interpretation of the input data (see the input file in Appendix B4). The second part contains the analysis results, namely the solved nodal displacements and the interpolated displacements and stresses at the desired locations. The third part contains the analysis logs, namely the memory usage log and the analysis time log.
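For readers who wish to post-process such files automatically, a minimal splitter sketch is given below. It is not part of ParEFG; it merely scans each line for the banner strings reproduced later in this appendix, assumes that each banner occurs exactly once per file and that lines are shorter than 512 bytes, and labels every line with the part it belongs to.

/* Minimal sketch (not part of ParEFG): split an output file into its
 * three parts by scanning for the banner strings shown later in this
 * appendix.  Assumes each banner appears once per file.                */
#include <stdio.h>
#include <string.h>

int main( int argc, char **argv )
{
    FILE *fp;
    char line[512];
    int  part = 1;    /* 1 = echoed input, 2 = analysis results, 3 = logs */

    if ( argc < 2 || ( fp = fopen( argv[1], "r" ) ) == NULL ) {
        fprintf( stderr, "usage: %s parefg_output_file\n", argv[0] );
        return 1;
    }
    while ( fgets( line, sizeof line, fp ) != NULL ) {
        if ( strstr( line, "NODAL DISPLACEMENTS" ) != NULL )
            part = 2;                   /* start of the analysis results */
        else if ( strstr( line, "MEMORY INFORMATION" ) != NULL )
            part = 3;                   /* start of the analysis logs    */
        printf( "part %d: %s", part, line );
    }
    fclose( fp );
    return 0;
}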

C1 ParEFG Interpretation of the Input Data

The Vector of Problem Parameters

** pvec **
Vector: dim: 9

4 1 3 1 10 0 0 0 0

The Vector of EFGM Parameters
** evec **
Vector: dim: 5

4 2 2 1 1

The Vector of Material Parameters
** mvec **
Matrix: 1 by 5
row 0: 210000000 0.3 1e+12 0 1

Point One Vector
** p1 **
Vector: dim: 3

0 0 -1

Point Two Vector
** p2 **
Vector: dim: 3

4 1 0


Number of Cells Vector
** nc **
Vector: dim: 3

8 2 2

Number of Nodes Vector
** nn **
Vector: dim: 3

9 3 3

Number of Post-processing Nodes Vector
** np **
Vector: dim: 3

9 3 3

Integration Cell Nodal Definitions ** CNODES **Matrix: 950 by 3row 0: 0.176776695 0.176776695 0row 1: 0 0.25 0row 2: 0.160696902 0.191511111 0row 3: 0.143394109 0.204788011 0row 4: 0.125 0.216506351 0row 5: 0.105654565 0.226576947 0row 6: 0.0855050358 0.234923155 0row 7: 0.0647047613 0.241481457 0row 8: 0.0434120445 0.246201938 0row 9: 0.0217889357 0.249048675 0row 10: 0 1 0row 11: 0 0.916666667 0row 12: 0 0.833333333 0row 13: 0 0.75 0row 14: 0 0.666666667 0row 15: 0 0.583333333 0row 16: 0 0.5 0row 17: 0 0.416666667 0row 18: 0 0.333333333 0row 19: 4 1 0row 20: 3.55555556 1 0row 21: 3.11111111 1 0row 22: 2.66666667 1 0row 23: 2.22222222 1 0row 24: 1.77777778 1 0row 25: 1.33333333 1 0. . . .. . . .. . . .

row 926: 0.614676543 0.241607123 -0.25row 927: 0.614676543 0.241607123 -0.5row 928: 0.614676543 0.241607123 -0.75row 929: 1.03784198 0.322517344 -0.25row 930: 1.03784198 0.322517344 -0.5row 931: 1.03784198 0.322517344 -0.75row 932: 1.46100741 0.403427564 -0.25


row 933: 1.46100741 0.403427564 -0.5row 934: 1.46100741 0.403427564 -0.75row 935: 1.88417284 0.484337785 -0.25row 936: 1.88417284 0.484337785 -0.5row 937: 1.88417284 0.484337785 -0.75row 938: 2.30733827 0.565248006 -0.25row 939: 2.30733827 0.565248006 -0.5row 940: 2.30733827 0.565248006 -0.75row 941: 2.7305037 0.646158227 -0.25row 942: 2.7305037 0.646158227 -0.5row 943: 2.7305037 0.646158227 -0.75row 944: 3.15366914 0.727068447 -0.25row 945: 3.15366914 0.727068447 -0.5row 946: 3.15366914 0.727068447 -0.75row 947: 3.57683457 0.807978668 -0.25row 948: 3.57683457 0.807978668 -0.5row 949: 3.57683457 0.807978668 -0.75

Integration Cell Connectivities ** CCON **Matrix: 648 by 8row 0: 0 2 36 35

203 206 308 305row 1: 2 3 44 36

206 209 332 308row 2: 3 4 52 44

209 212 356 332row 3: 4 5 60 52

212 215 380 356row 4: 5 6 68 60

215 218 404 380row 5: 6 7 76 68

218 221 428 404row 6: 7 8 84 76

221 224 452 428row 7: 8 9 92 84

224 227 476 452row 8: 9 1 18 92

227 200 233 476row 9: 35 36 37 34

305 308 311 302row 10: 36 44 45 37

308 332 335 311row 11: 44 52 53 45

332 356 359 335row 12: 52 60 61 53

356 380 383 359row 13: 60 68 69 61

380 404 407 383row 14: 68 76 77 69

404 428 431 407row 15: 76 84 85 77

428 452 455 431row 16: 84 92 93 85

452 476 479 455row 17: 92 18 17 93

476 233 236 479row 18: 34 37 38 33

302 311 314 299


row 19: 37 45 46 38311 335 338 314

row 20: 45 53 54 46335 359 362 338

row 21: 53 61 62 54359 383 386 362

row 22: 61 69 70 62383 407 410 386

row 23: 69 77 78 70407 431 434 410

row 24: 77 85 86 78431 455 458 434

row 25: 85 93 94 86455 479 482 458

. . . . .

. . . . .

. . . . .row 626: 871 895 898 874

653 661 662 654row 627: 895 919 922 898

661 669 670 662row 628: 919 943 946 922

669 677 678 670row 629: 943 292 289 946

677 130 129 678row 630: 739 778 781 736

609 622 623 608row 631: 778 802 805 781

622 630 631 623row 632: 802 826 829 805

630 638 639 631row 633: 826 850 853 829

638 646 647 639row 634: 850 874 877 853

646 654 655 647row 635: 874 898 901 877

654 662 663 655row 636: 898 922 925 901

662 670 671 663row 637: 922 946 949 925

670 678 679 671row 638: 946 289 286 949

678 129 128 679row 639: 736 781 733 709

608 623 607 599row 640: 781 805 730 733

623 631 606 607row 641: 805 829 727 730

631 639 605 606row 642: 829 853 724 727

639 647 604 605row 643: 853 877 721 724

647 655 603 604row 644: 877 901 718 721

655 663 602 603row 645: 901 925 715 718

663 671 601 602row 646: 925 949 712 715

671 679 600 601row 647: 949 286 259 712

679 128 119 600


Nodal Definitions ** NODES **Matrix: 950 by 3row 0: 0.176776695 0.176776695 0row 1: 0 0.25 0row 2: 0.160696902 0.191511111 0row 3: 0.143394109 0.204788011 0row 4: 0.125 0.216506351 0row 5: 0.105654565 0.226576947 0row 6: 0.0855050358 0.234923155 0row 7: 0.0647047613 0.241481457 0row 8: 0.0434120445 0.246201938 0row 9: 0.0217889357 0.249048675 0row 10: 0 1 0row 11: 0 0.916666667 0row 12: 0 0.833333333 0row 13: 0 0.75 0row 14: 0 0.666666667 0row 15: 0 0.583333333 0row 16: 0 0.5 0row 17: 0 0.416666667 0row 18: 0 0.333333333 0row 19: 4 1 0row 20: 3.55555556 1 0row 21: 3.11111111 1 0row 22: 2.66666667 1 0row 23: 2.22222222 1 0row 24: 1.77777778 1 0row 25: 1.33333333 1 0. . . .. . . .. . . .

row 926: 0.614676543 0.241607123 -0.25row 927: 0.614676543 0.241607123 -0.5row 928: 0.614676543 0.241607123 -0.75row 929: 1.03784198 0.322517344 -0.25row 930: 1.03784198 0.322517344 -0.5row 931: 1.03784198 0.322517344 -0.75row 932: 1.46100741 0.403427564 -0.25row 933: 1.46100741 0.403427564 -0.5row 934: 1.46100741 0.403427564 -0.75row 935: 1.88417284 0.484337785 -0.25row 936: 1.88417284 0.484337785 -0.5row 937: 1.88417284 0.484337785 -0.75row 938: 2.30733827 0.565248006 -0.25row 939: 2.30733827 0.565248006 -0.5row 940: 2.30733827 0.565248006 -0.75row 941: 2.7305037 0.646158227 -0.25row 942: 2.7305037 0.646158227 -0.5row 943: 2.7305037 0.646158227 -0.75row 944: 3.15366914 0.727068447 -0.25row 945: 3.15366914 0.727068447 -0.5row 946: 3.15366914 0.727068447 -0.75row 947: 3.57683457 0.807978668 -0.25row 948: 3.57683457 0.807978668 -0.5row 949: 3.57683457 0.807978668 -0.75


Post-processing Nodal Definitions ** PNODES **Matrix: 950 by 3row 0: 0.176776695 0.176776695 0row 1: 0 0.25 0row 2: 0.160696902 0.191511111 0row 3: 0.143394109 0.204788011 0row 4: 0.125 0.216506351 0row 5: 0.105654565 0.226576947 0row 6: 0.0855050358 0.234923155 0row 7: 0.0647047613 0.241481457 0row 8: 0.0434120445 0.246201938 0row 9: 0.0217889357 0.249048675 0row 10: 0 1 0row 11: 0 0.916666667 0row 12: 0 0.833333333 0row 13: 0 0.75 0row 14: 0 0.666666667 0row 15: 0 0.583333333 0row 16: 0 0.5 0row 17: 0 0.416666667 0row 18: 0 0.333333333 0row 19: 4 1 0row 20: 3.55555556 1 0row 21: 3.11111111 1 0row 22: 2.66666667 1 0row 23: 2.22222222 1 0row 24: 1.77777778 1 0row 25: 1.33333333 1 0. . . .. . . .. . . .

row 926: 0.614676543 0.241607123 -0.25row 927: 0.614676543 0.241607123 -0.5row 928: 0.614676543 0.241607123 -0.75row 929: 1.03784198 0.322517344 -0.25row 930: 1.03784198 0.322517344 -0.5row 931: 1.03784198 0.322517344 -0.75row 932: 1.46100741 0.403427564 -0.25row 933: 1.46100741 0.403427564 -0.5row 934: 1.46100741 0.403427564 -0.75row 935: 1.88417284 0.484337785 -0.25row 936: 1.88417284 0.484337785 -0.5row 937: 1.88417284 0.484337785 -0.75row 938: 2.30733827 0.565248006 -0.25row 939: 2.30733827 0.565248006 -0.5row 940: 2.30733827 0.565248006 -0.75row 941: 2.7305037 0.646158227 -0.25row 942: 2.7305037 0.646158227 -0.5row 943: 2.7305037 0.646158227 -0.75row 944: 3.15366914 0.727068447 -0.25row 945: 3.15366914 0.727068447 -0.5row 946: 3.15366914 0.727068447 -0.75row 947: 3.57683457 0.807978668 -0.25row 948: 3.57683457 0.807978668 -0.5row 949: 3.57683457 0.807978668 -0.75


Applied Nodal Load Definitions ** NLOAD **
Matrix: 0 by 3

Applied Distributed Load Nodal Data ** FNODES **
Matrix: 25 by 5
row 0: 4 0 -1 1 1
row 1: 4 0.25 -1 1 1
row 2: 4 0.5 -1 1 1
row 3: 4 0.75 -1 1 1
row 4: 4 1 -1 1 1
row 5: 4 0 -0.75 1 1
row 6: 4 0.25 -0.75 1 1
row 7: 4 0.5 -0.75 1 1
row 8: 4 0.75 -0.75 1 1
row 9: 4 1 -0.75 1 1
row 10: 4 0 -0.5 1 1
row 11: 4 0.25 -0.5 1 1
row 12: 4 0.5 -0.5 1 1
row 13: 4 0.75 -0.5 1 1
row 14: 4 1 -0.5 1 1
row 15: 4 0 -0.25 1 1
row 16: 4 0.25 -0.25 1 1
row 17: 4 0.5 -0.25 1 1
row 18: 4 0.75 -0.25 1 1
row 19: 4 1 -0.25 1 1
row 20: 4 0 0 1 1
row 21: 4 0.25 0 1 1
row 22: 4 0.5 0 1 1
row 23: 4 0.75 0 1 1
row 24: 4 1 0 1 1


Applied Distributed Load Nodal Connectivities ** FDATA **
Matrix: 16 by 4
row 0: 0 1 6 5
row 1: 1 2 7 6
row 2: 2 3 8 7
row 3: 3 4 9 8
row 4: 5 6 11 10
row 5: 6 7 12 11
row 6: 7 8 13 12
row 7: 8 9 14 13
row 8: 10 11 16 15
row 9: 11 12 17 16
row 10: 12 13 18 17
row 11: 13 14 19 18
row 12: 15 16 21 20
row 13: 16 17 22 21
row 14: 17 18 23 22
row 15: 18 19 24 23

Specified Nodal Displacements ** NDISP **
Matrix: 0 by 3

Specified Displacements on the Entire Plane ** FPDATA **
Matrix: 3 by 8
row 0: 1 0 0 1 0 -1 0 0
row 1: 2 0 4 0 0 0 -1 0
row 2: 3 0 2 0.5 0 0 0 1

Desired Displacement Locations ** DLOC **Matrix: 100 by 3row 0: 0 0.25 -0.5row 1: 0 0.2653 -0.5row 2: 0 0.2806 -0.5row 3: 0 0.2959 -0.5row 4: 0 0.3112 -0.5row 5: 0 0.3265 -0.5row 6: 0 0.3418 -0.5row 7: 0 0.3571 -0.5row 8: 0 0.3724 -0.5row 9: 0 0.3878 -0.5row 10: 0 0.4031 -0.5row 11: 0 0.4184 -0.5row 12: 0 0.4337 -0.5row 13: 0 0.449 -0.5row 14: 0 0.4643 -0.5row 15: 0 0.4796 -0.5row 16: 0 0.4949 -0.5


row 17: 0 0.5102 -0.5row 18: 0 0.5255 -0.5row 19: 0 0.5408 -0.5row 20: 0 0.5561 -0.5row 21: 0 0.5714 -0.5row 22: 0 0.5867 -0.5row 23: 0 0.602 -0.5row 24: 0 0.6173 -0.5row 25: 0 0.6327 -0.5

. . . .

. . . .

. . . .row 76: 2.2398 0 -0.5row 77: 2.3163 0 -0.5row 78: 2.3929 0 -0.5row 79: 2.4694 0 -0.5row 80: 2.5459 0 -0.5row 81: 2.6224 0 -0.5row 82: 2.699 0 -0.5row 83: 2.7755 0 -0.5row 84: 2.852 0 -0.5row 85: 2.9286 0 -0.5row 86: 3.0051 0 -0.5row 87: 3.0816 0 -0.5row 88: 3.1582 0 -0.5row 89: 3.2347 0 -0.5row 90: 3.3112 0 -0.5row 91: 3.3878 0 -0.5row 92: 3.4643 0 -0.5row 93: 3.5408 0 -0.5row 94: 3.6173 0 -0.5row 95: 3.6939 0 -0.5row 96: 3.7704 0 -0.5row 97: 3.8469 0 -0.5row 98: 3.9235 0 -0.5row 99: 4 0 -0.5

Desired Stress Locations ** SLOC **Matrix: 100 by 3row 0: 0 0.25 -0.5row 1: 0 0.2653 -0.5row 2: 0 0.2806 -0.5row 3: 0 0.2959 -0.5row 4: 0 0.3112 -0.5row 5: 0 0.3265 -0.5row 6: 0 0.3418 -0.5row 7: 0 0.3571 -0.5row 8: 0 0.3724 -0.5row 9: 0 0.3878 -0.5row 10: 0 0.4031 -0.5row 11: 0 0.4184 -0.5row 12: 0 0.4337 -0.5row 13: 0 0.449 -0.5row 14: 0 0.4643 -0.5row 15: 0 0.4796 -0.5row 16: 0 0.4949 -0.5row 17: 0 0.5102 -0.5row 18: 0 0.5255 -0.5row 19: 0 0.5408 -0.5


row 20: 0 0.5561 -0.5row 21: 0 0.5714 -0.5row 22: 0 0.5867 -0.5row 23: 0 0.602 -0.5row 24: 0 0.6173 -0.5row 25: 0 0.6327 -0.5

. . . .

. . . .

. . . .row 76: 2.2398 0 -0.5row 77: 2.3163 0 -0.5row 78: 2.3929 0 -0.5row 79: 2.4694 0 -0.5row 80: 2.5459 0 -0.5row 81: 2.6224 0 -0.5row 82: 2.699 0 -0.5row 83: 2.7755 0 -0.5row 84: 2.852 0 -0.5row 85: 2.9286 0 -0.5row 86: 3.0051 0 -0.5row 87: 3.0816 0 -0.5row 88: 3.1582 0 -0.5row 89: 3.2347 0 -0.5row 90: 3.3112 0 -0.5row 91: 3.3878 0 -0.5row 92: 3.4643 0 -0.5row 93: 3.5408 0 -0.5row 94: 3.6173 0 -0.5row 95: 3.6939 0 -0.5row 96: 3.7704 0 -0.5row 97: 3.8469 0 -0.5row 98: 3.9235 0 -0.5row 99: 4 0 -0.5

C2 Analysis Results

Solved Nodal Displacements *** NODAL DISPLACEMENTS 1,1 ***

NODE UX UY UZ0 2.61962e-08 -1.19929e-08 0.00000e+001 0.00000e+00 -1.96624e-08 0.00000e+002 2.42967e-08 -1.27255e-08 0.00000e+003 2.23201e-08 -1.35439e-08 0.00000e+004 2.05680e-08 -1.43885e-08 0.00000e+005 1.92339e-08 -1.52010e-08 0.00000e+006 1.82335e-08 -1.61019e-08 0.00000e+007 1.87236e-08 -1.67112e-08 0.00000e+008 1.65658e-08 -1.84865e-08 0.00000e+009 2.56958e-08 -1.84053e-08 0.00000e+00

10 0.00000e+00 -2.74234e-08 0.00000e+0011 0.00000e+00 -2.63523e-08 0.00000e+0012 0.00000e+00 -2.53435e-08 0.00000e+0013 0.00000e+00 -2.44469e-08 0.00000e+0014 0.00000e+00 -2.37317e-08 0.00000e+0015 0.00000e+00 -2.32353e-08 0.00000e+0016 0.00000e+00 -2.30048e-08 0.00000e+0017 0.00000e+00 -2.30996e-08 0.00000e+0018 0.00000e+00 -2.34459e-08 0.00000e+00


19 1.99206e-07 -1.41993e-08 0.00000e+0020 1.78065e-07 -1.43034e-08 0.00000e+0021 1.57040e-07 -1.42982e-08 0.00000e+0022 1.35826e-07 -1.41777e-08 0.00000e+0023 1.14581e-07 -1.38855e-08 0.00000e+0024 9.29669e-08 -1.34407e-08 0.00000e+0025 7.03559e-08 -1.33899e-08 0.00000e+00. . . .. . . .. . . .

926 4.32839e-08 -3.97839e-09 3.41842e-09927 4.34645e-08 -3.94322e-09 6.46028e-09928 4.32188e-08 -3.63848e-09 9.27153e-09929 6.05649e-08 -3.64591e-09 3.35700e-09930 6.05155e-08 -3.57169e-09 6.68368e-09931 6.04412e-08 -3.33254e-09 1.00031e-08932 7.90859e-08 -4.72074e-09 3.51685e-09933 7.90482e-08 -4.64352e-09 7.04305e-09934 7.90517e-08 -4.49793e-09 1.06016e-08935 9.86015e-08 -6.25367e-09 3.59626e-09936 9.85980e-08 -6.20169e-09 7.18274e-09937 9.85790e-08 -6.12770e-09 1.07793e-08938 1.18549e-07 -7.75184e-09 3.60043e-09939 1.18552e-07 -7.72643e-09 7.19495e-09940 1.18521e-07 -7.69988e-09 1.07912e-08941 1.38664e-07 -9.13507e-09 3.58714e-09942 1.38660e-07 -9.12631e-09 7.17573e-09943 1.38635e-07 -9.12148e-09 1.07621e-08944 1.58844e-07 -1.04090e-08 3.57648e-09945 1.58838e-07 -1.04036e-08 7.16182e-09946 1.58813e-07 -1.04033e-08 1.07458e-08947 1.79120e-07 -1.15452e-08 3.56601e-09948 1.79120e-07 -1.15436e-08 7.14129e-09949 1.79087e-07 -1.15401e-08 1.07203e-08

Interpolated Desired Displacements ***DESIRED DISPLACEMENTS***

UX UY UZ-1.31252e-24 -2.07257e-08 7.67091e-095.72914e-09 -2.08224e-08 7.79507e-098.40863e-09 -2.12337e-08 7.90628e-097.19464e-09 -2.20480e-08 7.96759e-094.00012e-09 -2.32199e-08 7.97600e-097.48024e-10 -2.42877e-08 7.96427e-09

-1.31750e-10 -2.44616e-08 8.00345e-098.98861e-10 -2.39389e-08 8.08717e-091.77377e-09 -2.34775e-08 8.15056e-091.67747e-09 -2.33713e-08 8.16596e-09

. . .

. . .

. . .1.84639e-07 -2.12436e-12 7.12989e-091.88275e-07 -3.07387e-12 7.12117e-091.91916e-07 -2.70317e-12 7.11221e-091.95577e-07 -8.91806e-13 7.10248e-091.99230e-07 0.00000e+00 7.09057e-09


Interpolated Desired Stresses ***DESIRED STRESSES***

S1 S2 S3 S4 S5 S67.13329e+01 2.61464e+01 2.53861e+01 2.96888e+00 -1.98779e-02 3.03980e-011.08798e+02 4.23510e+01 4.14256e+01 3.41752e+01 1.53525e-02 6.58132e-016.17883e+01 1.58994e+01 1.93254e+01 5.42463e+00 -8.56036e-02 4.10943e-013.41941e+01 -1.98413e+00 5.65126e+00 -8.64255e+00 -1.28420e-01 1.31329e-012.38423e+01 -1.00647e+01 1.37663e-01 -1.48761e+01 -1.31869e-01 -1.19146e-012.54889e+01 -1.04197e+00 3.37518e+00 -7.68324e+00 -1.30693e-01 -3.36002e-023.28554e+01 1.74526e+01 1.11209e+01 6.93041e+00 -1.28658e-01 3.19557e-013.31174e+01 2.17113e+01 1.24177e+01 1.08015e+01 -1.36592e-01 3.74117e-012.75139e+01 1.48885e+01 8.64761e+00 6.23152e+00 -1.53515e-01 1.49152e-012.23551e+01 7.15838e+00 4.78793e+00 6.52257e-01 -1.60318e-01 -9.77437e-02

. . . . . .

. . . . . .

. . . . . .9.97256e+00 -2.81577e-03 -1.03989e-03 1.27744e-02 -2.83028e-03 8.31051e-049.99125e+00 3.45283e-03 1.23269e-02 9.28044e-03 -2.86928e-03 8.90987e-041.00241e+01 1.45860e-02 3.25877e-02 -2.91678e-03 -3.07786e-03 8.39814e-041.00925e+01 3.99753e-02 6.76139e-02 -2.53061e-02 -3.74280e-03 6.08397e-049.93604e+00 -2.69576e-02 8.41000e-03 -4.07149e-02 -6.40856e-03 5.95998e-04

C3 Analysis Logs

Memory Usage Log

MEMORY INFORMATION (standard types):
type MAT    467281724 alloc. bytes   17285 alloc. variables
type BAND           0 alloc. bytes       0 alloc. variables
type PERM          48 alloc. bytes       1 alloc. variable
type VEC     13390896 alloc. bytes     590 alloc. variables
type IVEC           0 alloc. bytes       0 alloc. variables
type ITER           0 alloc. bytes       0 alloc. variables
type SPROW          0 alloc. bytes       0 alloc. variables
type SPMAT          0 alloc. bytes       0 alloc. variables
total:      480672668 alloc. bytes   17876 alloc. variables

************************************************************************************* E N D O F O U T P U T F I L E *************************************************************************************

Analysis Time Log

Time taken for the whole program was 27452.1062 seconds.

create the stiffness matrix:        27171.9659 seconds. [ 99%]
create the force vector:                1.610069 seconds. [ 0.00587%]
solve the system of equations:        215.211201 seconds. [ 0.784%]
2850 equations solved
post-process the desired displ:        13.569051 seconds. [ 0.0494%]
post-process the desired stresses:     49.095893 seconds. [ 0.179%]
miscellaneous tasks:                    0.654131 seconds. [ 0.00238%]

Analyzed on '4 computing nodes' with 'svr1.cml.ait.ac.th' as the master
Sat Mar 25 01:27:02 2000


APPENDIX D

SOURCE CODES

The source code of Qserv, the queue server, and of ParEFG, the parallel EFGM analysis software, is presented in this appendix. Both programs can be built with the UNIX make utility using the Makefiles listed below. Files marked with an asterisk (*) were taken from the original PLEFG serial EFGM analysis code and are included here for reference.

D1 The Queue Server

Makefile

SRV = qserv

all:
	cc -o $(SRV) $(SRV).c
	rm *.o

clean:
	rm $(SRV)

Header files

File name: qcodes.h

#define QSERV_PORT     9636
#define ALL_DONE       -1
#define RESET_COUNTER  'r'
#define SET_MAX_NUM    's'
#define GET_NUM        'g'
#define TERMINATE      't'
#define READY          'r'

C Source Code Files

File name: qserv.c

/*********************************/
/* General Purposes Queue Server */
/*********************************/

#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>     /* added: exit()   */
#include <string.h>     /* added: memcpy() */
#include <unistd.h>
#include "qcodes.h"

#define MAX_CLIENT 5

int main( int argc, char **argv )
{
    /*
     * Variable Declarations
     */

    /* Socket Connection Facilities */
    int sock;                   //the rendezvous descriptor
    int fd;                     //the connection descriptor
    int max_fd;                 //the maximum number of fd
    struct sockaddr_in server;  //the server's internet domain addr
    struct sockaddr_in client;  //the client's internet domain addr
    int client_len;             //the length of the client's address
    fd_set test_set, ready_set; //the file descriptor sets used by select()

    /* Socket Messages */
    char request_msg;
    int buffer;

    /* Internal Data */
    int max_num = 0, count = 0;
    char runstate;

    /*
     * Create the Socket
     */
    sock = socket( AF_INET, SOCK_STREAM, 0 );
    if ( sock < 0 ) {
        perror( "creating stream socket" );
        exit( 1 );
    }

    /*
     * Bind the Socket
     */
    server.sin_family      = AF_INET;
    server.sin_addr.s_addr = htonl( INADDR_ANY );
    server.sin_port        = htons( QSERV_PORT );

    if ( bind( sock, (struct sockaddr *)&server, sizeof server ) < 0 ) {
        perror( "\nbinding socket" );
        exit( 2 );
    }

    /*
     * Wait for the Clients and Serve
     */
    listen( sock, MAX_CLIENT );
    max_fd = sock;
    FD_ZERO( &test_set );
    FD_SET( sock, &test_set );
    runstate = READY;

    while ( runstate != TERMINATE )
    {
        memcpy( &ready_set, &test_set, sizeof test_set );
        select( max_fd+1, &ready_set, NULL, NULL, NULL );

        /* accept the new clients */
        if ( FD_ISSET( sock, &ready_set ) )
        {
            client_len = sizeof client;
            fd = accept( sock, (struct sockaddr *)&client, &client_len );
            FD_SET( fd, &test_set );
            if ( fd > max_fd ) max_fd = fd;
        }

        /* serve each ready client */
        for ( fd = 0; fd <= max_fd; fd++ )
            if ( ( fd != sock ) && FD_ISSET( fd, &ready_set ) )
            {
                if ( read( fd, &request_msg, sizeof request_msg )
                     != sizeof request_msg )
                {
                    /* if there are no more requests, close the connection */
                    close( fd );
                    FD_CLR( fd, &test_set );
                }
                else
                {
                    /* process the requests */
                    switch ( request_msg )
                    {
                        case TERMINATE:
                            runstate = TERMINATE;
                            break;
                        case RESET_COUNTER:
                            count = 0;
                            break;
                        case SET_MAX_NUM:
                            read( fd, &buffer, sizeof buffer );
                            max_num = buffer;
                            break;
                        case GET_NUM:
                            buffer = ( count <= max_num ) ? count++ : ALL_DONE;
                            write( fd, &buffer, sizeof buffer );
                            break;
                    }
                }
            }
    }

    close( sock );

    return 0;
}

D2 The Parallel EFGM Analysis Software

Makefile

PROG_NAME = parefg4

ALL: $(PROG_NAME)

MES_HOME = /usr/local/mes
MES_LIB  = -L$(MES_HOME)/lib -lmes

LIBS = $(MES_LIB)


#linker
LINK = mpicc -o

#############################################################################

plefg_objs = basis.o cells.o efg_stiff.o force.o gauss.o grule.o \
	iscan.o isorc.o material.o norms.o output.o post.o setup.o shapes.o \
	sheps.o solve.o util.o weights.o

ddefg_objs = dd_input.o ddefg_stiff.o ddforce.o ddpost.o master.o \
	master_ddsolve.o master_parallel_gauss.o mpi_mes.o parefg_main.o \
	post_output.o qclient.o worker.o worker_ddsolve.o worker_parallel_gauss.o \
	parallel_gauss.o

objects = $(plefg_objs) $(ddefg_objs)

#############################################################################
$(PROG_NAME):

	$(LINK) $(PROG_NAME) $(objects) $(LIBS)
	$(RM) *.o

#############################################################################

Header files

File name: basis.h* #ifndef _GBORD#define _GBORDint gbord(int order,int ndim);#endif

#ifndef _GBASVEC#define _GBASVECVEC *gbasvec(VEC *xs,int order);#endif

#ifndef _GDBASVEC#define _GDBASVECMAT *gdbasvec(VEC *xs,int order);#endif

#ifndef _GPMAT#define _GPMATMAT *gpmat(MAT *xx,int order);#endif

File name: cells.h* #ifndef _GCDATA#define _GCDATAvoid gcdata(VEC *p1,VEC *p2,VEC *n,MAT**cnodes,MAT **ccon);#endif

#ifndef _GCDATA2#define _GCDATA2void gcdata2(VEC *p1,VEC *p2,VEC *n,MAT**cnodes,MAT **ccon);#endif

File name: constants.h* #ifndef _PI

#define _PI#define PI 3.141592653589793#endif

#ifndef _FOURTH#define _FOURTH#define FOURTH 0.25#endif

#ifndef _EIGHTH#define _EIGHTH#define EIGHTH 0.125#endif

#ifndef _HALF#define _HALF#define HALF 0.5#endif

#ifndef _ZERO#define _ZERO#define ZERO 0.0#endif

#ifndef _NBCCOM#define _NBCCOM#define NBCCOM 4#endif

#ifndef _FIXED#define _FIXED#define FIXED -1#endif

#ifndef _BRICK#define _BRICK#define BRICK 1#endif

#ifndef _PSTRESS#define _PSTRESS#define PSTRESS 1#endif

#ifndef _PSTRAIN


#define _PSTRAIN#define PSTRAIN 2#endif

#ifndef _PASS#define _PASS#define PASS 1#endif

#ifndef _FAIL#define _FAIL#define FAIL 0#endif

#ifndef _TRACTION#define _TRACTION#define TRACTION 1#endif

#ifndef _DISPLACEMENT#define _DISPLACEMENT#define DISPLACEMENT 2#endif

File name: ddefg_stiff.h extern void ddefg_stiff( MPI_Comm comm, intmyid, int sock_fd, FILE *f_log, int ncell,VEC *evec, VEC *pvec, MAT *mvec,MAT *ccon, MAT *cnodes, MAT *nodes, MAT **K);

File name: ddforce.h extern void ddforce( MPI_Comm comm, intmyid, int sock_fd, FILE *f_log, int nforce,VEC *evec, VEC *pvec,MAT *FDATA, MAT *FNODES, MAT *NODES, VEC**f );

File name: ddpost.h extern void ddpost_displ( MPI_Comm comm,int myid, int sock_fd, FILE *f_log,int ndloc, VEC *evec, VEC *pvec, MAT *DLOC,MAT *MDISP, MAT *NODES, MAT **DDISP );

extern void ddpost_stress( MPI_Comm comm,int myid, int sock_fd, FILE *f_log,int nsloc, VEC *evec, VEC *pvec, MAT *mvec,VEC *DISP, MAT *NODES, MAT *SLOC, MAT**DSTRESS );

File name: dd_input.h extern void dd_read_input( FILE *fin, FILE*fout, VEC **evec, VEC **pvec,MAT **mvec, MAT **CCON, MAT **CNODES, MAT**DLOC, MAT **FDATA, MAT **FNODES, MAT**FPDATA, MAT **NCON, MAT **NDISP, MAT**NLOAD, MAT **NODES, MAT **PCON,MAT **PNODES, MAT **SLOC, int *num_cells,

int *num_forces, int *num_dloc, int*num_sloc );

File name: efg_stiff.h* #ifndef _ISTIFF#define _ISTIFFMAT *efg_stiff(int b_flag,MAT *cnodes,MAT*ccon,MAT *nodes, VEC *pvec,VEC *evec,MAT*mvec, MAT *THETAS,MAT *FLOWS,MAT**GPDATA);#endif

#ifndef _BMAT3D#define _BMAT3DMAT *bmat3d(VEC *phix,VEC *phiy,VEC*phiz);#endif

#ifndef _BMAT2D#define _BMAT2DMAT *bmat2d(VEC *phix,VEC *phiy);#endif

File name: force.h* #ifndef _IFNODE#define _IFNODEVEC *ifnode(MAT *NLOAD,int nnodes,intndim);#endif

#ifndef _IFORCE#define _IFORCEVEC *iforce(MAT *FNODES,MAT *FDATA,MAT*NODES,VEC *evec,VEC *pvec);#endif

File name: gauss.h* #ifndef _NORSET#define _NORSETMAT *norset(double s1,double s2,int n,intnsub);#endif

#ifndef _GET_TRIANGLE_DATA#define _GET_TRIANGLE_DATAMAT *get_triangle_data(inttriangle_order);#endif

File name: grule.h* #ifndef _GRULE#define _GRULEvoid grule(int nord,VEC **gp,VEC **gw);#endif

File name: input.h* #ifndef _GET_INPUT#define _GET_INPUT


void read_input(FILE *fin,FILE *fout, VEC**pvec,VEC **evec,MAT **mvec, MAT**CNODES,MAT **CCON,MAT **NODES,MAT **NCON,MAT **PNODES,MAT **PCON, MAT **NLOAD,MAT**FNODES,MAT **FDATA,MAT **NDISP, MAT**FPDATA, MAT **DLOC,MAT **SLOC);#endif

File name: iscan.h* #ifndef _ISCAN#define _ISCANvoid iscan(VEC *xs,MAT *xx,intsearch_type,double param,int weight_type,MAT **list,VEC **index);#endif

File name: isorc.h* #ifndef _ISORC#define _ISORCint isorc(VEC *p1,MAT *xx);#endif

File name: master.h extern void master( MPI_Comm comm, int*argc, char ***argv );

File name: master_ddsolve.h extern VEC *master_ddsolve( MPI_Comm comm,MPI_Status *status, MAT *A, VEC *b, MAT*NDISP,MAT *FPDATA,MAT *NODES, VEC**fixed_list);

File name: master_parallel_gauss.h extern VEC *master_parallel_gauss( MPI_Commcomm, MPI_Status *status, MAT *Amat, VEC *b);

File name: material.h* #ifndef _HOOKE#define _HOOKEMAT *hooke(int pcode,double E,double nu);#endif

#ifndef _ISO#define _ISOdouble iso(double sy,double k,doublealpha);#endif

#ifndef _DISO#define _DISOdouble diso(double sy,double k,doublealpha);#endif

#ifndef _KIN

#define _KINdouble kin(double alpha);#endif

#ifndef _DKIN#define _DKINdouble dkin(double alpha);#endif

#ifndef _VNORM#define _VNORMdouble vnorm(VEC *vin);#endif

#ifndef _INT_CON#define _INT_CONvoid int_con(VEC **plastic_strain,VEC**back_stress,double *alpha, VEC**sig_trial,VEC *strain, double*theta,double *theta_bar,VEC**flow_direction, VEC *PCPS,double pca,double K,double ys,double ym,double nu);#endif

#ifndef _GET_CON_TAN#define _GET_CON_TANMAT *get_con_tan(VEC *gpt,MAT *mvec,VEC*pvec,VEC *flow_dir, double theta,doubletheta_bar);#endif

#ifndef _GET_MAT_NUM#define _GET_MAT_NUMint get_mat_num(VEC *gpt,int flag);#endif

File name: mpi_mes.h

/**********************************************
    MPI Meschach Data Transfer Subroutines
 **********************************************/

extern int MPI_Send_vector( VEC **x, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_vector( VEC **x, int source, int tag, MPI_Comm comm, MPI_Status *status );
extern int MPI_Bcast_vector( VEC **x, int root, MPI_Comm comm );
extern int MPI_Gather_vector( VEC **x, int root, MPI_Comm comm );

extern int MPI_Send_matrix( MAT **A, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_matrix( MAT **A, int source, int tag, MPI_Comm comm, MPI_Status *status );
extern int MPI_Bcast_matrix( MAT **A, int root, MPI_Comm comm );
extern int MPI_Gather_matrix( MAT **A, int root, MPI_Comm comm );

extern int MPI_Send_sym_matrix( MAT **A, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_sym_matrix( MAT **A, int source, int tag, MPI_Comm comm, MPI_Status *status );
extern int MPI_Gather_sym_matrix( MAT **A, int root, MPI_Comm comm );
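The implementation behind these prototypes is not reproduced at this point. For illustration only, a possible sketch of MPI_Bcast_vector is shown below; it assumes the Meschach VEC layout used elsewhere in this appendix (an integer dim and an array ve of doubles) together with standard MPI calls, and it may differ from the actual ParEFG routine in mpi_mes.c.

/* Illustrative sketch only -- not the ParEFG mpi_mes.c implementation.
 * Assumes the Meschach VEC layout used elsewhere in this appendix
 * (an integer 'dim' and an array 've' of doubles).                     */
#include "mpi.h"
#include "../mes/matrix.h"

int MPI_Bcast_vector( VEC **x, int root, MPI_Comm comm )
{
    int myid, dim;

    MPI_Comm_rank( comm, &myid );

    /* the root announces the vector length first ...                   */
    if ( myid == root )
        dim = (*x)->dim;
    MPI_Bcast( &dim, 1, MPI_INT, root, comm );

    /* ... the receivers make sure their vectors are the right size ... */
    if ( myid != root )
        *x = ( *x == VNULL ) ? v_get( dim ) : v_resize( *x, dim );

    /* ... and the entries go out in a single broadcast.                */
    return MPI_Bcast( (*x)->ve, dim, MPI_DOUBLE, root, comm );
}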

File name: norms.h* #ifndef _L1NORM#define _L1NORMdouble l1norm(VEC *p1,VEC *p2);#endif

#ifndef _L2NORM#define _L2NORMdouble l2norm(VEC *p1,VEC *p2);#endif

#ifndef _DISTANCE#define _DISTANCEdouble distance(VEC *p1,VEC *p2);#endif

#ifndef _LINFNORM#define _LINFNORMdouble linfnorm(VEC *p1,VEC *p2);#endif

File name: output.h* #ifndef _OUTPUT_DISP#define _OUTPUT_DISPint output_disp(FILE *fin,FILE *fout,MAT*NODES,VEC *DISP, int i_load,int itcount);#endif

#ifndef _OUTPUT#define _OUTPUToutput(FILE *fin,FILE *fout, VEC *pvec,VEC*evec,MAT *mvec, MAT *NODES, VEC *DISP,MAT*PTS,MAT *STRESSES,MAT *STRAINS,MAT *PLASTIC_STRAIN,MAT *BACK_STRESS,VEC*ALPHA);#endif

File name: parallel_gauss.h extern VEC *parallel_gauss( MPI_Comm,MPI_Status *status, MAT *a );//returns the solution corresponding toMAT *a

File name: parmes.h

/*
 * parmes.h: ParEFG Global Header File
 * -----------------------------------
 */

/*
 * Include Files
 * -------------
 */
#include <unistd.h>
#include "mpi.h"
#include "../mes/matrix.h"
#include "mpi_mes.h"
#include "qcodes.h"
#include "qclient.h"

/*
 * Constants Definitions
 * ---------------------
 */
#define COMM                MPI_COMM_WORLD
#define MASTER              0
#define NO_DISTRIB_FORCE    0
#define WITH_DISTRIB_FORCE  1
#define MAX_NAME_LENGTH     40

File name: plefg.h

/*
 * The C standard libraries
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <math.h>

/*
 * The MESCHACH libraries
 */
#include "../mes/matrix.h"
#include "../mes/matrix2.h"
#include "../mes/iter.h"
#include "../mes/sparse.h"
#include "../mes/sparse2.h"

/*
 * The Serial PLEFG header files
 */
#include "constants.h"
#include "efg_stiff.h"
#include "force.h"
#include "gauss.h"
#include "grule.h"
#include "iscan.h"
#include "setup.h"
#include "shapes.h"
#include "solve.h"
#include "material.h"
#include "output.h"
#include "post.h"
#include "util.h"

File name: post.h* #ifndef _IGESTR#define _IGESTRdouble igestr(VEC *sig,int pcode,doublenu);#endif

#ifndef _IGSTRAIN#define _IGSTRAINVEC *igstrain(VEC *x,MAT *NODES,VEC*disp,VEC *evec,MAT **B, int gcount,MAT**GPDATA);


#endif

#ifndef _IGSTRESS#define _IGSTRESSMAT *igstress(VEC *x,MAT *NODES,VEC*disp,VEC *pvec,VEC *evec,MAT *mvec);#endif

#ifndef _IGDSP#define _IGDSPVEC *igdsp(VEC *x,MAT *NODES,VEC *disp,intorder,int search_type, double param,intweight_type);#endif

#ifndef _TSTR#define _TSTRVEC *tstr(VEC *x,VEC *sig);#endif

File name: post_output.h* extern void post_output( FILE *f_out, MAT*DDISP, MAT *DSTRESS );

File name: qclient.h

/*
 * Generic Implementation
 */

int connect_to_server( const char *server_name, const int port );
/* returns the file descriptor to the connection requested */

int send_request( const int sock_fd, const char sock_msg );
/* Send a request to the queue server */

int send_qdata( const int sock_fd, const int qdata );
/* Send a data item to the queue server, returns 0 if the data was sent */

int get_qdata( const int sock_fd );
/* Get a data item from the queue server, returns 'qdata' */

/*
 * ParEFG Specific Implementation
 */

int get_num( const int sock_fd );
/* Get a job from the queue server */

int set_max_num( const int sock_fd, const int max_cell_num );
/* Set the maximum job number on the queue server */

int stop_qserv( const int sock_fd );
/* Terminate the queue server */
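To indicate how these routines fit together, a usage sketch of the dynamic job-assignment loop is given below. It is an illustration only: the host name "qserv-host", the cell count, and the do_cell() routine are placeholders, and a negative return value from connect_to_server() is assumed to signal failure; the actual calling sequence inside ParEFG may differ.

/* Usage sketch only (not taken from the ParEFG sources): a process asks
 * Qserv for integration-cell numbers until the queue is exhausted.  The
 * host name "qserv-host" and do_cell() are placeholders.               */
#include <stdio.h>
#include "qcodes.h"
#include "qclient.h"

static void do_cell( int cell )
{
    printf( "working on cell %d\n", cell );  /* stand-in for real work */
}

int main( void )
{
    int sock_fd, cell;

    sock_fd = connect_to_server( "qserv-host", QSERV_PORT );
    if ( sock_fd < 0 )
        return 1;                  /* assumed failure convention        */

    set_max_num( sock_fd, 647 );   /* e.g. 648 cells, numbered 0 .. 647 */

    /* each call hands out the next unprocessed cell exactly once */
    while ( ( cell = get_num( sock_fd ) ) != ALL_DONE )
        do_cell( cell );

    return 0;
}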

File name: qcodes.h

#define QSERV_PORT     9636
#define ALL_DONE       -1
#define RESET_COUNTER  'r'
#define SET_MAX_NUM    's'
#define GET_NUM        'g'
#define TERMINATE      't'
#define READY          'r'

File name: setup.h* #ifndef _SETUP_DATA#define _SETUP_DATAint setup_data(FILE *fin,FILE *fout,int*EQCOUNT,

VEC *params,MAT *CORD,MAT *ICON,MAT*INBCS,

MAT **BCTYPE,MAT **BCVAL,MAT **ENUM);#endif

File name: shapes.h* #ifndef _GSHAPES#define _GSHAPESvoid gshapes(VEC *x,int order,VEC**phi,MAT **dphi);#endif

#ifndef _ISHP#define _ISHPvoid ishp(VEC *xs,MAT *xx,double soi,intorder,int weight_type,VEC **phi);#endif

#ifndef _IGSHP#define _IGSHPvoid igshp(VEC *xs,MAT *xx,doubleparam,int order, int weight_type,intsearch_type, MAT **LIST,VEC **index, VEC**phi);#endif

#ifndef _IDSHP#define _IDSHPvoid idshp(VEC *xs,MAT *xx,double soi,intorder,int weight_type, VEC **phix,VEC**phiy,VEC **phiz);#endif

#ifndef _IGDSHP#define _IGDSHPvoid igdshp(VEC *xs,MAT *xx,doubleparam,int order, int weight_type,intsearch_type, MAT **LIST,VEC **index,double*soi, VEC **phix,VEC **phiy,VEC **phiz);#endif

File name: sheps.h* #ifndef _SHEP#define _SHEPVEC *shep(VEC *xs,MAT *xx,double soi,intweight_type);#endif

#ifndef _DSHEP#define _DSHEPMAT *dshep(VEC *xs,MAT *xx,double soi,intweight_type);#endif


File name: solve.h* #ifndef _SOLVE#define _SOLVEint solve(FILE *fin,FILE *fout,SPMAT *STIFF,VEC *FORCE,VEC **SOLU);

#endif

File name: util.h* #ifndef _GEQNUM#define _GEQNUMMAT *geqnum(MAT *NDISP,int nnodes,intndim);#endif

#ifndef _GKE#define _GKEMAT *gke(MAT *B,MAT *D,double dvol);#endif

#ifndef _GDA#define _GDAMAT *gda(MAT *U,MAT *UN,MAT *W,MAT *WN);#endif

#ifndef _GDB#define _GDBMAT *gdb(MAT *U,MAT *UN,MAT *W,MAT *WN);#endif

#ifndef _MTR3V_MLT#define _MTR3V_MLTVEC *mtr3v_mlt(MAT *M1,MAT *M2,MAT *M3,VEC*v1);#endif

#ifndef _GPHIN#define _GPHINVEC *gphin(VEC *sn,MAT *IA,MAT *AN,MAT*B,MAT *BN,MAT *C,MAT *CN, VEC *g,VEC *gn);#endif

#ifndef _SOLVE_EQ#define _SOLVE_EQVEC *solve_eq(MAT *A,VEC *b,MAT *NDISP,MAT*FPDATA,MAT *NODES, VEC **fixed_list);#endif

#ifndef _GET_R_THETA#define _GET_R_THETAVEC *get_r_theta(double x,double y);#endif

#ifndef _TSTR3D#define _TSTR3DVEC *tstr3d(VEC *pt,VEC *sig);#endif

#ifndef _DET

#define _DETdouble det(MAT *A);#endif

#ifndef _INV3#define _INV3MAT *inv3(MAT *A);#endif

#ifndef _GET_PLFAC#define _GET_PLFACdouble get_plfac(MAT *STRESS,VEC *params);#endif

#ifndef _GET_LOCAL_DATA#define _GET_LOCAL_DATAvoid get_local_data(double XCI,doubleETA,double TAU, VEC *X,VEC *Y,VEC *Z,double *detj,VEC **gpt);#endif

#ifndef _CHECK_SPARSENESS#define _CHECK_SPARSENESSdouble check_sparseness(MAT *M);#endif

File name: weights.h* #ifndef _IWT#define _IWTdouble iwt(VEC *p1,VEC *p2,doubleparam,int weight_type);#endif

#ifndef _IDWT#define _IDWTVEC *idwt(VEC *p1,VEC *p2,double param,intweight_type);#endif

File name: worker.h extern void worker( MPI_Comm comm );

File name: worker_ddsolve.h extern void worker_ddsolve( MPI_Comm comm,MPI_Status *status, int master_pid );

File name: worker_parallel_gauss.h extern void worker_parallel_gauss( MPI_Commcomm, MPI_Status *status, int master_pid );

C Source Code Files


File name: basis.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

int gbord(int order,int ndim)int m;

switch(ndim)case 1:printf("ERROR: 1D basis vectors are not

supported at this time!\n");break;

case 2:switch(order)case 1:m = 3;break;

case 2:m = 6;break;

case 3:m = 10;break;break;

case 3:switch(order)case 1:m = 4;break;

case 2:m = 10;break;

case 3:m = 20;break;break;

return(m);

VEC *gbasvec(VEC *xs,int order)int ndim;double x,y,z;VEC *bvec;

ndim = xs->dim;switch(ndim)case 1:

printf("ERROR: 1D basis vectors are notsupported at this time!\n");break;

case 2:x = xs->ve[0];y = xs->ve[1];switch(order)case 1:bvec = v_get(2);bvec->ve[0] = x;bvec->ve[1] = y;break;

case 2:bvec = v_get(5);bvec->ve[0] = x;bvec->ve[1] = y;bvec->ve[2] = x*x;bvec->ve[3] = x*y;bvec->ve[4] = y*y;break;

case 3:bvec = v_get(9);bvec->ve[0] = x;bvec->ve[1] = y;bvec->ve[2] = x*x;bvec->ve[3] = x*y;bvec->ve[4] = y*y;bvec->ve[5] = x*x*x;bvec->ve[6] = x*x*y;bvec->ve[7] = x*y*y;bvec->ve[8] = y*y*y;break;break;

case 3:x = xs->ve[0];y = xs->ve[1];z = xs->ve[2];switch(order)case 1:bvec = v_get(3);bvec->ve[0] = x;bvec->ve[1] = y;bvec->ve[2] = z;break;

case 2:bvec = v_get(9);bvec->ve[0] = x;bvec->ve[1] = y;bvec->ve[2] = z;bvec->ve[3] = x*x;bvec->ve[4] = y*y;bvec->ve[5] = z*z;bvec->ve[6] = x*y;bvec->ve[7] = x*z;bvec->ve[8] = y*z;break;

case 3:bvec = v_get(19);bvec->ve[0] = x;bvec->ve[1] = y;bvec->ve[2] = z;bvec->ve[3] = x*x;


bvec->ve[4] = y*y;bvec->ve[5] = z*z;bvec->ve[6] = x*y;bvec->ve[7] = x*z;bvec->ve[8] = y*z;bvec->ve[9] = x*x*x;bvec->ve[10] = y*y*y;bvec->ve[11] = z*z*z;bvec->ve[12] = x*x*y;bvec->ve[13] = x*x*z;bvec->ve[14] = x*y*y;bvec->ve[15] = y*y*z;bvec->ve[16] = x*z*z;bvec->ve[17] = y*z*z;bvec->ve[18] = x*y*z;break;break;

return(bvec);

MAT *gdbasvec(VEC *xs,int order)int ndim;double x,y,z;MAT *dbvec;

ndim = xs->dim;switch(ndim)case 1:printf("ERROR: 1D basis vectors are not

supported at this time!\n");break;

case 2:x = xs->ve[0];y = xs->ve[1];switch(order)case 1:dbvec = m_get(2,2);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;break;

case 2:dbvec = m_get(5,2);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;dbvec->me[2][0] = 2.0*x;dbvec->me[3][0] = y;dbvec->me[4][0] = 0.0;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;dbvec->me[2][1] = 0.0;dbvec->me[3][1] = x;dbvec->me[4][1] = 2.0*y;break;

case 3:dbvec = m_get(9,2);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;dbvec->me[2][0] = 2.0*x;

dbvec->me[3][0] = y;dbvec->me[4][0] = 0.0;dbvec->me[5][0] = 3.0*x*x;dbvec->me[6][0] = 2.0*x*y;dbvec->me[7][0] = y*y;dbvec->me[8][0] = 0.0;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;dbvec->me[2][1] = 0.0;dbvec->me[3][1] = x;dbvec->me[4][1] = 2.0*y;dbvec->me[5][1] = 0.0;dbvec->me[6][1] = x*x;dbvec->me[7][1] = 2.0*x*y;dbvec->me[8][1] = 3.0*y*y;break;break;

case 3:x = xs->ve[0];y = xs->ve[1];z = xs->ve[2];switch(order)case 1:dbvec = m_get(3,3);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;dbvec->me[2][0] = 0.0;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;dbvec->me[2][1] = 0.0;

dbvec->me[0][2] = 0.0;dbvec->me[1][2] = 0.0;dbvec->me[2][2] = 1.0;break;

case 2:dbvec = m_get(9,3);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;dbvec->me[2][0] = 0.0;dbvec->me[3][0] = 2.0*x;dbvec->me[4][0] = 0.0;dbvec->me[5][0] = 0.0;dbvec->me[6][0] = y;dbvec->me[7][0] = z;dbvec->me[8][0] = 0.0;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;dbvec->me[2][1] = 0.0;dbvec->me[3][1] = 0.0;dbvec->me[4][1] = 2.0*y;dbvec->me[5][1] = 0.0;dbvec->me[6][1] = x;dbvec->me[7][1] = 0.0;dbvec->me[8][1] = z;

dbvec->me[0][2] = 0.0;dbvec->me[1][2] = 0.0;dbvec->me[2][2] = 1.0;dbvec->me[3][2] = 0.0;dbvec->me[4][2] = 0.0;dbvec->me[5][2] = 2.0*z;dbvec->me[6][2] = 0.0;dbvec->me[7][2] = x;


dbvec->me[8][2] = y;break;

case 3:dbvec = m_get(19,3);dbvec->me[0][0] = 1.0;dbvec->me[1][0] = 0.0;dbvec->me[2][0] = 0.0;dbvec->me[3][0] = 2.0*x;dbvec->me[4][0] = 0.0;dbvec->me[5][0] = 0.0;dbvec->me[6][0] = y;dbvec->me[7][0] = z;dbvec->me[8][0] = 0.0;dbvec->me[9][0] = 3.0*x*x;dbvec->me[10][0] = 0.0;dbvec->me[11][0] = 0.0;dbvec->me[12][0] = 2.0*x*y;dbvec->me[13][0] = 2.0*x*z;dbvec->me[14][0] = y*y;dbvec->me[15][0] = 0.0;dbvec->me[16][0] = z*z;dbvec->me[17][0] = 0.0;dbvec->me[18][0] = y*z;

dbvec->me[0][1] = 0.0;dbvec->me[1][1] = 1.0;dbvec->me[2][1] = 0.0;dbvec->me[3][1] = 0.0;dbvec->me[4][1] = 2.0*y;dbvec->me[5][1] = 0.0;dbvec->me[6][1] = x;dbvec->me[7][1] = 0.0;dbvec->me[8][1] = z;dbvec->me[9][1] = 0.0;dbvec->me[10][1] = 3.0*y*y;dbvec->me[11][1] = 0.0;dbvec->me[12][1] = x*x;dbvec->me[13][1] = 0.0;dbvec->me[14][1] = 2.0*x*y;dbvec->me[15][1] = 2.0*y*z;dbvec->me[16][1] = 0.0;dbvec->me[17][1] = z*z;dbvec->me[18][1] = x*z;

dbvec->me[0][2] = 0.0;dbvec->me[1][2] = 0.0;dbvec->me[2][2] = 1.0;dbvec->me[3][2] = 0.0;dbvec->me[4][2] = 0.0;dbvec->me[5][2] = 2.0*z;dbvec->me[6][2] = 0.0;dbvec->me[7][2] = x;dbvec->me[8][2] = y;dbvec->me[9][2] = 0.0;dbvec->me[10][2] = 0.0;dbvec->me[11][2] = 3.0*z*z;dbvec->me[12][2] = 0.0;dbvec->me[13][2] = x*x;dbvec->me[14][2] = 0.0;dbvec->me[15][2] = y*y;dbvec->me[16][2] = 2.0*x*z;dbvec->me[17][2] = 2.0*y*z;dbvec->me[18][2] = x*y;break;break;

return(dbvec);

MAT *gpmat(MAT *xx,int order)int i;int M,ndim,m;VEC *xi,*ibasvec;MAT *P;

M = xx->m;ndim = xx->n;

m = gbord(order,ndim);P = m_get(m-1,M);

for(i=0;i<M;i++)xi = get_row(xx,i,VNULL);ibasvec = gbasvec(xi,order);set_col(P,i,ibasvec);v_free(ibasvec); v_free(xi);

return(P);

File name: cells.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

void gcdata(VEC *p1,VEC *p2,VEC *n,MAT**cnodes,MAT **ccon)int i,j,k,ndim,ncells,ncnodes,cnode,icell;int n1,n2,n3;VEC *cdim;

ndim = p1->dim;cdim = v_get(ndim);

ncells = 1;ncnodes = 1;for(i=0;i<ndim;i++)ncells = ncells * (int)(n->ve[i]);ncnodes = ncnodes * (int)(n->ve[i] + 1.0);cdim->ve[i] = (p2->ve[i] - p1->ve[i])/(n->ve[i]);

(*cnodes) = m_get(ncnodes,ndim);(*ccon) = m_get( ncells, (int) pow( 2,ndim ) );

switch(ndim)case 2:n1 = (int) n->ve[0];n2 = (int) n->ve[1];

cnode = 0;for(i=0;i<=n1;i++)for(j=0;j<=n2;j++)


(*cnodes)->me[cnode][0] = p1->ve[0] +i*(cdim->ve[0]);

(*cnodes)->me[cnode][1] = p1->ve[1] +j*(cdim->ve[1]);

cnode += 1;

icell = 0;for(i=1;i<=n1;i++)for(j=1;j<=n2;j++)(*ccon)->me[icell][0] = (n2+1)*(i-1) + j

- 1;(*ccon)->me[icell][1] = (n2+1)*(i-0) + j

- 1;(*ccon)->me[icell][2] = (n2+1)*(i-0) +

j;(*ccon)->me[icell][3] = (n2+1)*(i-1) +

j;icell += 1;

break;

case 3:n1 = (int) n->ve[0];n2 = (int) n->ve[1];n3 = (int) n->ve[2];

cnode = 0;for(i=0;i<=n1;i++)for(j=0;j<=n2;j++)for(k=0;k<=n3;k++)(*cnodes)->me[cnode][0] = p1->ve[0] +

i*(cdim->ve[0]);(*cnodes)->me[cnode][1] = p1->ve[1] +

j*(cdim->ve[1]);(*cnodes)->me[cnode][2] = p1->ve[2] +

k*(cdim->ve[2]);cnode += 1;

icell = 0;for(i=1;i<=n1;i++)for(j=1;j<=n2;j++)for(k=1;k<=n3;k++)(*ccon)->me[icell][0] =

(n2+1)*(n3+1)*(i-1)+(n3+1)*(j-1)+ k - 1;(*ccon)->me[icell][1] =

(n2+1)*(n3+1)*(i-1)+(n3+1)*(j-1)+ k+1 - 1;(*ccon)->me[icell][2] =

(n2+1)*(n3+1)*(i-0)+(n3+1)*(j-1)+ k+1 - 1;(*ccon)->me[icell][3] =

(n2+1)*(n3+1)*(i-0)+(n3+1)*(j-1)+ k - 1;(*ccon)->me[icell][4] =

(n2+1)*(n3+1)*(i-1)+(n3+1)*(j-0)+ k - 1;(*ccon)->me[icell][5] =

(n2+1)*(n3+1)*(i-1)+(n3+1)*(j-0)+ k+1 - 1;(*ccon)->me[icell][6] =

(n2+1)*(n3+1)*(i-0)+(n3+1)*(j-0)+ k+1 - 1;(*ccon)->me[icell][7] =

(n2+1)*(n3+1)*(i-0)+(n3+1)*(j-0)+ k - 1;icell += 1;

break;

v_free(cdim);

void gcdata2(VEC *p1,VEC *p2,VEC *n,MAT**cnodes,MAT **ccon)int ndim;int i,j,k;int n1,n2,n3;int ir,iz,it;int ncnodes,ncells;int cnode,icell;double pi,t,r,z;VEC *cdim;

pi = 4.0*atan(1.0);ndim = p1->dim;cdim = v_get(ndim);

ncells = 1;ncnodes = 1;for(i=0;i<ndim;i++)ncells = ncells * (int) (n->ve[i]);ncnodes = ncnodes * (int) (n->ve[i] +1.0);cdim->ve[i] = (p2->ve[i] - p1->ve[i])/(n->ve[i]);

(*cnodes) = m_get(ncnodes,ndim);(*ccon) = m_get( ncells,(int) pow( 2, ndim) );

switch(ndim)case 2:n1 = (int) n->ve[0];n2 = (int) n->ve[1];

cnode = 0;for(ir=0;ir<=n2;ir++)for(it=0;it<=n1;it++)t = (p1->ve[0] + it*(cdim-

>ve[0]))*pi/180;r = p1->ve[1] + ir*(cdim->ve[1]);(*cnodes)->me[cnode][0] = r*cos(t);(*cnodes)->me[cnode][1] = r*sin(t);cnode += 1;

icell = 0;for(i=1;i<=n2;i++)for(j=1;j<=n1;j++)(*ccon)->me[icell][0] = (n1+1)*(i-1) + j

- 1;(*ccon)->me[icell][1] = (n1+1)*(i-0) + j

- 1;(*ccon)->me[icell][2] = (n1+1)*(i-0) +

j;(*ccon)->me[icell][3] = (n1+1)*(i-1) +

j;icell += 1;break;

case 3:


n1 = (int) n->ve[0];n2 = (int) n->ve[1];n3 = (int) n->ve[2];

cnode = 0;for(iz=0;iz<=n3;iz++)for(ir=0;ir<=n2;ir++)for(it=0;it<=n1;it++)t = (p1->ve[0] + it*(cdim-

>ve[0]))*pi/180;r = p1->ve[1] + ir*(cdim->ve[1]);z = p1->ve[2] + iz*(cdim->ve[2]);(*cnodes)->me[cnode][0] = r*cos(t);(*cnodes)->me[cnode][1] = r*sin(t);(*cnodes)->me[cnode][2] = z;cnode += 1;

icell = 0;for(i=1;i<=n1;i++)for(j=1;j<=n2;j++)for(k=1;k<=n3;k++)(*ccon)->me[icell][0] =

(n1+1)*(n2+1)*(k-1)+(n1+1)*(j-1)+i - 1;(*ccon)->me[icell][1] =

(n1+1)*(n2+1)*(k-1)+(n1+1)*(j-1)+i + 1 - 1;(*ccon)->me[icell][2] =

(n1+1)*(n2+1)*(k-1)+(n1+1)*(j-0)+i + 1 - 1;(*ccon)->me[icell][3] =

(n1+1)*(n2+1)*(k-1)+(n1+1)*(j-0)+i - 1;(*ccon)->me[icell][4] =

(n1+1)*(n2+1)*(k-0)+(n1+1)*(j-1)+i - 1;(*ccon)->me[icell][5] =

(n1+1)*(n2+1)*(k-0)+(n1+1)*(j-1)+i + 1 - 1;(*ccon)->me[icell][6] =

(n1+1)*(n2+1)*(k-0)+(n1+1)*(j-0)+i + 1 - 1;(*ccon)->me[icell][7] =

(n1+1)*(n2+1)*(k-0)+(n1+1)*(j-0)+i - 1;icell += 1;

break;

v_free(cdim);

File name: ddefg_stiff.c #include "plefg.h"#include "parmes.h"

void ddefg_stiff( MPI_Comm comm, int myid,int sock_fd, FILE *f_log,

int ncell, VEC *evec, VEC *pvec, MAT*mvec,

MAT *ccon, MAT *cnodes,MAT *nodes, MAT **K )

/** EFG Stiffness for* Single 3D Linear Isotropic* Material*/

/*********************************//*** VARIABLE DECLARATIONS ***//*********************************/

int ngauss;int order;int search_type;int weight_type;double detj;double dvol;double E;double gweight;double nu;double param;

int icell;int ig, jg, kg;int ilist, jlist;int ispot, jspot;int inode;int M;int ncnodes;int ndim;int nnodes;int pcode;int pflag;int spot;MAT *B;MAT *D;MAT *dphi;MAT *KE;MAT *list;VEC *cellcon;VEC *gp;VEC *gpt;VEC *gw;VEC *index;VEC *lpt;VEC *phi;VEC *phix, *phiy, *phiz;VEC *X, *Y, *Z;

/***********************************//*** EXTRACT THE PARAMETERS ***//***********************************/

ngauss = (int) evec->ve[ 0 ];order = (int) evec->ve[ 1 ];weight_type = (int) evec->ve[ 2 ];search_type = (int) evec->ve[ 3 ];param = evec->ve[ 4 ];pflag = (int) pvec->ve[ 5 ];

nnodes = nodes->m;ndim = nodes->n;

pcode = (int) pvec->ve[ 0 ];E = mvec->me[ 0 ][ 0 ];nu = mvec->me[ 0 ][ 1 ];

/**************************************//*** PROCESS THE STIFFNESS DATA ***//**************************************/

/* Initialize the stiffness matrix */*K = m_get( ndim * nnodes, ndim * nnodes);

/* Form the material matrix */D = hooke( pcode, E, nu );

/* Initialize the queue server */if ( myid == MASTER ) set_max_num(sock_fd, ncell-1 );


MPI_Barrier( comm );fprintf( f_log,"\n[%d] [Ready to Generate StiffnessMatrix]\n", myid );

/* Keep on working until the work is done*/while( ( icell = get_num( sock_fd ) ) !=ALL_DONE )/* Get the integration pts and weights */grule( ngauss, &gp, &gw );

/* Get a list of the nodes for this cell*/cellcon = get_row( ccon, icell, VNULL );

/* Get a list of the x, y, and zcoordinates of the cell nodes */ncnodes = 8; X = v_get( ncnodes ); Y =

v_get( ncnodes ); Z = v_get( ncnodes );

for( inode = 0; inode < ncnodes; inode++)spot = (int) cellcon->ve[ inode ];X->ve[ inode ] = cnodes->me[ spot ][ 0

];Y->ve[ inode ] = cnodes->me[ spot ][ 1

];Z->ve[ inode ] = cnodes->me[ spot ][ 2

];

lpt = v_get( ndim );gpt = v_get( ndim );

for( ig = 0; ig < ngauss; ig++ )for( jg = 0; jg < ngauss; jg++ )for( kg = 0; kg < ngauss; kg++ )gweight = gw->ve[ ig ] * gw->ve[ jg ] *

gw->ve[ kg ];

lpt->ve[ 0 ] = gp->ve[ ig ];lpt->ve[ 1 ] = gp->ve[ jg ];lpt->ve[ 2 ] = gp->ve[ kg ];

gshapes( lpt, 1, &phi, &dphi );phix = get_col( dphi, 0, VNULL );phiy = get_col( dphi, 1, VNULL );phiz = get_col( dphi, 2, VNULL );m_free( dphi );

detj = in_prod(phix,X) * ( in_prod(phiy,Y)*in_prod(phiz,Z) -
                           in_prod(phiy,Z)*in_prod(phiz,Y) )
     - in_prod(phiy,X) * ( in_prod(phix,Y)*in_prod(phiz,Z) -
                           in_prod(phix,Z)*in_prod(phiz,Y) )
     + in_prod(phiz,X) * ( in_prod(phix,Y)*in_prod(phiy,Z) -
                           in_prod(phix,Z)*in_prod(phiy,Y) );

dvol = gweight * detj;

gpt->ve[ 0 ] = in_prod( phi, X );

gpt->ve[ 1 ] = in_prod( phi, Y );gpt->ve[ 2 ] = in_prod( phi, Z );

v_free( phix ); v_free( phiy ); v_free(phiz ); v_free( phi );

iscan( gpt, nodes, search_type, param,weight_type, &list, &index );

idshp( gpt, list, param, order,weight_type, &phix, &phiy, &phiz );

M = phix->dim;B = bmat3d( phix, phiy, phiz );KE = gke( B, D, dvol );

for( ilist = 0; ilist < M; ilist++ )for( jlist = 0; jlist < M; jlist++ )ispot = (int) index->ve[ ilist ];jspot = (int) index->ve[ jlist ];(*K)->me[ ispot*3+0 ][ jspot*3+0 ] +=

KE->me[ ilist*3+0 ][ jlist*3+0 ];(*K)->me[ ispot*3+0 ][ jspot*3+1 ] +=

KE->me[ ilist*3+0 ][ jlist*3+1 ];(*K)->me[ ispot*3+0 ][ jspot*3+2 ] +=

KE->me[ ilist*3+0 ][ jlist*3+2 ];(*K)->me[ ispot*3+1 ][ jspot*3+0 ] +=

KE->me[ ilist*3+1 ][ jlist*3+0 ];(*K)->me[ ispot*3+1 ][ jspot*3+1 ] +=

KE->me[ ilist*3+1 ][ jlist*3+1 ];(*K)->me[ ispot*3+1 ][ jspot*3+2 ] +=

KE->me[ ilist*3+1 ][ jlist*3+2 ];(*K)->me[ ispot*3+2 ][ jspot*3+0 ] +=

KE->me[ ilist*3+2 ][ jlist*3+0 ];(*K)->me[ ispot*3+2 ][ jspot*3+1 ] +=

KE->me[ ilist*3+2 ][ jlist*3+1 ];(*K)->me[ ispot*3+2 ][ jspot*3+2 ] +=

KE->me[ ilist*3+2 ][ jlist*3+2 ];m_free( B );m_free( list );m_free( KE );v_free( index );v_free( phix ); v_free( phiy ); v_free(

phiz );v_free( X ); v_free( Y ); if( ndim == 3 )

v_free( Z );v_free( gpt ); v_free( gp ); v_free( gw

);v_free( cellcon );v_free( lpt );fprintf( f_log, "[%d] integration cell

#%d is done.\n", myid, icell );m_free( D );fprintf( f_log, "[%d] [Stiffness MatrixGeneration Done]\n", myid );

/* Gather the generated 'symmetric'stiffness matrix */MPI_Barrier( comm );MPI_Gather_sym_matrix( K, MASTER, comm );

return;


File name: ddforce.c #include "plefg.h"#include "parmes.h"

void ddforce( MPI_Comm comm, int myid, intsock_fd, FILE *f_log,

int nforce, VEC *evec, VEC *pvec,MAT *FDATA, MAT *FNODES, MAT

*NODES, VEC **f )/** EFG Force Vector*/

/*********************************//*** VARIABLE DECLARATIONS ***//*********************************/

int order;int ngauss;int search_type;int weight_type;double param;

int df;int i;int iforce;int ig, jg;int ii;int M;int nnodes;int ndim;int pflag;int spot;double darea;double detj;double gweight;double icomp, jcomp, kcomp;double trac;MAT *DPHI;MAT *LIST;VEC *index;VEC *lpt;VEC *phi;VEC *phix, *phiy;VEC *gp;VEC *gpt;VEC *gw;VEC *t;VEC *x, *y, *z;

/***********************************//*** EXTRACT THE PARAMETERS ***//***********************************/

ngauss = (int) evec->ve[ 0 ];order = (int) evec->ve[ 1 ];weight_type = (int) evec->ve[ 2 ];search_type = (int) evec->ve[ 3 ];param = evec->ve[ 4 ];pflag = (int) pvec->ve[ 5 ];

nnodes = NODES->m;ndim = NODES->n;

/***********************************//*** PROCESS THE FORCE DATA ***//***********************************/

/* Initialize the force vector */

*f = v_get( ndim * nnodes );

/* Initialize the queue server */if ( myid == MASTER ) set_max_num(sock_fd, nforce-1 );MPI_Barrier( comm );fprintf( f_log,"\n[%d] [Ready to Generate ForceVector]\n", myid );

switch( ndim )case 3: /* 3D Implementation */

x = v_get( 4 );y = v_get( 4 );z = v_get( 4 );t = v_get( 4 );lpt = v_get( 2 );gpt = v_get( 3 );

/* Keep on working until the work isdone */

while( ( iforce = get_num( sock_fd ) )!= ALL_DONE )

for( i = 0; i < 4; i++ )spot = (int) FDATA->me[ iforce ][ i ];x->ve[ i ] = FNODES->me[ spot ][ 0 ];y->ve[ i ] = FNODES->me[ spot ][ 1 ];z->ve[ i ] = FNODES->me[ spot ][ 2 ];t->ve[ i ] = FNODES->me[ spot ][ 3 ];df = (int) FNODES->me[ spot ][ 4 ];

grule( ngauss, &gp, &gw );for( ig = 0; ig < ngauss; ig++ )for( jg = 0; jg < ngauss; jg++ )lpt->ve[ 0 ] = gp->ve[ ig ];lpt->ve[ 1 ] = gp->ve[ jg ];gweight = gw->ve[ ig ] * gw->ve[ jg ];gshapes( lpt, 1, &phi, &DPHI );phix = get_col( DPHI, 0, VNULL );phiy = get_col( DPHI, 1, VNULL );

icomp = +in_prod(phix,y)*in_prod(phiy,z)

-in_prod(phiy,y)*in_prod(phix,z);

jcomp = -in_prod(phix,x)*in_prod(phiy,z)

+in_prod(phiy,x)*in_prod(phix,z);

kcomp = +in_prod(phix,x)*in_prod(phiy,y)

-in_prod(phiy,x)*in_prod(phix,y);

detj = sqrt( icomp*icomp + jcomp*jcomp+ kcomp*kcomp );

darea = gweight * detj;

gpt->ve[ 0 ] = in_prod( phi, x );gpt->ve[ 1 ] = in_prod( phi, y );


gpt->ve[ 2 ] = in_prod( phi, z );trac = in_prod( phi, t );

m_free( DPHI );v_free( phi );v_free( phix );v_free( phiy );

iscan( gpt, NODES, search_type, param,weight_type, &LIST, &index );ishp( gpt, LIST, param, order,

weight_type, &phi );

M = phi->dim;

for( ii = 0; ii < M; ii++ )spot = (int) index->ve[ ii ];(*f)->ve[ 3*spot+df-1 ] += phi->ve[

ii ]*trac*darea;m_free( LIST );v_free( index );v_free( phi );v_free( gp ); v_free( gw );fprintf( f_log, "[%d] integration cell

#%d is done.\n", myid, iforce );fprintf( f_log, "[%d] [Force Vector

Generation Done]\n", myid );break;v_free( x ); v_free( y ); if( ndim == 3 )v_free( z );v_free( gpt ); v_free( lpt ); v_free( t );

/* Gather the generated force vector */MPI_Barrier( comm );MPI_Gather_vector( f, MASTER, comm );

return;

File name: ddpost.c #include "plefg.h"#include "parmes.h"

void ddpost_displ( MPI_Comm comm, int myid,int sock_fd, FILE *f_log,

int ndloc, VEC *evec, VEC *pvec, MAT*DLOC,

MAT *MDISP, MAT *NODES, MAT **DDISP )MAT *LIST;VEC *gpt;VEC *index;VEC *phi;

int ndim;int order;int phi_size;int search_type;int weight_type;double param;

int np;
int i, j;

int node_num;double ux, uy, uz;

/***********************************//*** EXTRACT THE PARAMETERS ***//***********************************/

order = (int) evec->ve[ 1 ];weight_type = (int) evec->ve[ 2 ];search_type = (int) evec->ve[ 3 ];param = evec->ve[ 4 ];

/*****************************************//*** INITIALIZE THE RESULTING MATRIX***/

/*****************************************/

ndim = NODES->n;(*DDISP) = m_get( ndloc, ndim );

/******************************************//*** COMPUTE THE DESIRED DISPLACEMENT***/

/******************************************/

MPI_Comm_rank( comm, &myid );MPI_Comm_size( comm, &np );

/* Post-process the desired displacements*/for ( i = 0; i < ndloc; i++ )if ( (i%np) == myid ) gpt = get_row( DLOC, i, VNULL );igshp( gpt, NODES, param, order,

weight_type, search_type,&LIST, &index, &phi );ux = 0.0; uy = 0.0; uz = 0.0;phi_size = phi->dim;for ( j = 0; j < phi_size; j++ ) node_num = (int) index->ve[ j ];ux += phi->ve[ j ] * MDISP->me[ node_num

][ 0 ];uy += phi->ve[ j ] * MDISP->me[ node_num

][ 1 ];uz += phi->ve[ j ] * MDISP->me[ node_num

][ 2 ];(*DDISP)->me[ i ][ 0 ] = ux;(*DDISP)->me[ i ][ 1 ] = uy;(*DDISP)->me[ i ][ 2 ] = uz;

m_free( LIST ); v_free( gpt ); v_free(index ); v_free( phi );

fprintf( f_log, "[%d] desired displ #%dis done.\n", myid, i );fprintf( f_log, "[%d] [Post-processing forDisplacements Done]\n", myid );MPI_Gather_matrix( DDISP, MASTER, comm );

return;


void ddpost_stress( MPI_Comm comm, int myid, int sock_fd, FILE *f_log,

int nsloc, VEC *evec, VEC *pvec, MAT*mvec,

VEC *DISP, MAT *NODES, MAT *SLOC, MAT**DSTRESS )MAT *RESULT;VEC *eps;VEC *gpt;VEC *sig;

int np;
int i;
int post_flag;

/***********************************//*** EXTRACT THE PARAMETERS ***//***********************************/

post_flag = (int) pvec->ve[ 6 ];

/*****************************************//*** INITIALIZE THE RESULTING MATRIX***/

/*****************************************/

(*DSTRESS) = m_get( nsloc, 6 );

/*********************************//*** COMPUTE THE STRESSES ***//*********************************/

/* Post-process the stresses at thedesireddisplacement locations */switch( post_flag )case 0:/** Post-processing type 0:* nodal structure will be used as the

basis* for post-processing grid.*/MPI_Comm_rank( comm, &myid );MPI_Comm_size( comm, &np );

for ( i = 0; i < nsloc; i++ )if ( (i%np) == myid ) gpt = get_row( SLOC, i, VNULL );RESULT = igstress( gpt, NODES, DISP,

pvec, evec, mvec );eps = get_row( RESULT, 0, VNULL );sig = get_row( RESULT, 1, VNULL );set_row( *DSTRESS, i, sig );m_free( RESULT ); v_free( eps ); v_free(

gpt ); v_free( sig );fprintf( f_log, "[%d] desired stress #%d

is done.\n", myid, i );break;fprintf( f_log, "[%d] [Post-processing forStresses Done]\n", myid );

MPI_Gather_matrix( DSTRESS, MASTER, comm);

return;

File name: dd_input.c #include <stdio.h>#include <stdlib.h>#include <limits.h>#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "cells.h"

void dd_read_input( FILE *fin, FILE *fout,VEC **evec, VEC **pvec,

MAT **mvec, MAT **CCON, MAT**CNODES, MAT **DLOC,

MAT **FDATA, MAT **FNODES, MAT**FPDATA, MAT **NCON,

MAT **NDISP, MAT **NLOAD, MAT**NODES, MAT **PCON,

MAT **PNODES, MAT **SLOC, int*num_cells,

int *num_forces, int *num_dloc, int*num_sloc )int axcode;int mdum;int ndim;int rflag;int i;VEC *nc, *nn, *np, *p1, *p2;

/* ------------ Problem Parameters --------------------- */

*pvec = v_finput(fin,VNULL);fprintf(fout,"\n** pvec **\n");v_foutput(fout,*pvec);axcode = (int) (*pvec)->ve[1];ndim = (int) (*pvec)->ve[2];

/* ------------ EFG Parameters --------------------- */

*evec = v_get(5);v_finput(fin,*evec);fprintf(fout,"\n** evec **\n");v_foutput(fout,*evec);

/* ------------ Material Parameters --------------------- */

finput(fin,"Input mdum:","%d",&mdum);*mvec = m_get(mdum,5);m_finput(fin,*mvec);fprintf(fout,"\n** mvec **\n");m_foutput(fout,*mvec);

/* -----------------Domain Description ------------------ */

p1 = v_get(ndim);v_finput(fin,p1);fprintf(fout,"\n** p1 **\n");v_foutput(fout,p1);


p2 = v_get(ndim);v_finput(fin,p2);fprintf(fout,"\n** p2 **\n");v_foutput(fout,p2);

nc = v_get(ndim);v_finput(fin,nc);fprintf(fout,"\n** nc **\n");v_foutput(fout,nc);

nn = v_get(ndim);v_finput(fin,nn);fprintf(fout,"\n** nn **\n");v_foutput(fout,nn);

np = v_get(3);v_finput(fin,np);fprintf(fout,"\n** np **\n");v_foutput(fout,np);

for(i=0;i<ndim;i++)nn->ve[i] = nn->ve[i] - 1;

/* --------- Automatic Generation of thenodal and cell structures - - */

switch(axcode)case 1:gcdata(p1,p2,nc,CNODES,CCON);gcdata(p1,p2,nn,NODES,NCON);gcdata(p1,p2,np,PNODES,PCON);break;

case 2:gcdata2(p1,p2,nc,CNODES,CCON);gcdata2(p1,p2,nn,NODES,NCON);gcdata2(p1,p2,np,PNODES,PCON);break;

/* --------- Manual Override of the Nodaland Cell Structures ------------ */

finput(fin,"Input rflag:","%d",&rflag);

if(rflag != 0)m_free(*NODES);*NODES = m_get(rflag,3);m_finput(fin,*NODES);printf("ATTENTION: Read NODES fromfile.\n");

finput(fin,"Input rflag:","%d",&rflag);

if(rflag != 0)m_free(*CNODES);*CNODES = m_get(rflag,3);m_finput(fin,*CNODES);printf("ATTENTION: Read CNODES fromfile.\n");

finput(fin,"Input rflag:","%d",&rflag);

if(rflag != 0)

m_free(*CCON);*CCON = m_get(rflag,8);m_finput(fin,*CCON);printf("ATTENTION: Read CCON fromfile.\n");

finput(fin,"Input rflag:","%d",&rflag);

if(rflag != 0)m_free(*PNODES);*PNODES = m_get(rflag,3);m_finput(fin,*PNODES);printf("ATTENTION: Read PNODES fromfile.\n");

finput(fin,"Input rflag:","%d",&rflag);

if(rflag != 0)m_free(*PCON);*PCON = m_get(rflag,8);m_finput(fin,*PCON);printf("ATTENTION: Read PCON fromfile.\n");

fprintf(fout,"\n** CNODES **\n");m_foutput(fout,*CNODES);fprintf(fout,"\n** CCON **\n");m_foutput(fout,*CCON);fprintf(fout,"\n** NODES **\n");m_foutput(fout,*NODES);fprintf(fout,"\n** PNODES **\n");m_foutput(fout,*PNODES);

/* --------- Clean up allocated cellstructure data ------- */

v_free(p1); v_free(p2); v_free(nn);v_free(nc); v_free(np);

/* ----------- Define the Point Loadboundary conditions -------- */

finput(fin,"Input mdum:","%d",&mdum);*NLOAD = m_get(mdum,3);m_finput(fin,*NLOAD);fprintf(fout,"\n** NLOAD **\n");m_foutput(fout,*NLOAD);

/* -------- Define the Distributed Loadboundary conditions ------------ */

finput(fin,"Input mdum:","%d",&mdum);*FNODES = m_get(mdum,ndim+2);m_finput(fin,*FNODES);fprintf(fout,"\n** FNODES **\n");m_foutput(fout,*FNODES);

finput(fin,"Input mdum:","%d",&mdum);*FDATA = m_get(mdum,ndim*2-2);m_finput(fin,*FDATA);fprintf(fout,"\n** FDATA **\n");m_foutput(fout,*FDATA);

finput(fin,"Input mdum:","%d",&mdum);*NDISP = m_get(mdum,3);m_finput(fin,*NDISP);


fprintf(fout,"\n** NDISP **\n");m_foutput(fout,*NDISP);

finput(fin,"Input mdum:","%d",&mdum);*FPDATA = m_get(mdum,8);m_finput(fin,*FPDATA);fprintf(fout,"\n** FPDATA **\n");m_foutput(fout,*FPDATA);

finput(fin,"Input mdum:","%d",&mdum);*DLOC = m_get(mdum,ndim);m_finput(fin,*DLOC);fprintf(fout,"\n** DLOC **\n");m_foutput(fout,*DLOC);

finput(fin,"Input mdum:","%d",&mdum);*SLOC = m_get(mdum,ndim);m_finput(fin,*SLOC);fprintf(fout,"\n** SLOC **\n");m_foutput(fout,*SLOC);

/* Total number of integration cells */*num_cells = (*CCON)->m;

/* Total number of cells to integrate forthe force vector */*num_forces = (*FDATA)->m;

/* Total number of desired displacements*/*num_dloc = (*DLOC)->m;

/* Total number of desired stresses */*num_sloc = (*SLOC)->m;

File name: efg_stiff.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "material.h"#include "shapes.h"#include "util.h"#include "efg_stiff.h"#include "grule.h"#include "iscan.h"

MAT *bmat2d(VEC *phix,VEC *phiy)int ilist,M;MAT *B;

M = phix->dim;B = m_get(3,2*M);

for(ilist=0;ilist<M;ilist++)B->me[0][ilist*2] = phix->ve[ilist];B->me[1][ilist*2+1] = phiy->ve[ilist];B->me[2][ilist*2] = phiy->ve[ilist];B->me[2][ilist*2+1] = phix->ve[ilist];

return(B);

MAT *bmat3d(VEC *phix,VEC *phiy,VEC *phiz)int ilist,M;MAT *B;

M = phix->dim;B = m_get(6,3*M);

for(ilist=0;ilist<M;ilist++)B->me[0][ilist*3] = phix->ve[ilist];B->me[1][ilist*3+1] = phiy->ve[ilist];B->me[2][ilist*3+2] = phiz->ve[ilist];B->me[3][ilist*3] = phiy->ve[ilist];B->me[3][ilist*3+1] = phix->ve[ilist];B->me[4][ilist*3] = phiz->ve[ilist];B->me[4][ilist*3+2] = phix->ve[ilist];B->me[5][ilist*3+1] = phiz->ve[ilist];B->me[5][ilist*3+2] = phiy->ve[ilist];

return(B);
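Read directly from the assembly loop above, the 6-by-3 block of B contributed by each of the M influencing nodes I (in the Voigt ordering xx, yy, zz, xy, xz, yz used here) is

\[
\mathbf{B}_I =
\begin{bmatrix}
\phi_{I,x} & 0 & 0 \\
0 & \phi_{I,y} & 0 \\
0 & 0 & \phi_{I,z} \\
\phi_{I,y} & \phi_{I,x} & 0 \\
\phi_{I,z} & 0 & \phi_{I,x} \\
0 & \phi_{I,z} & \phi_{I,y}
\end{bmatrix},
\]

where \phi_{I,x}, \phi_{I,y}, \phi_{I,z} denote the shape-function derivatives stored in phix, phiy, and phiz for node I.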

MAT *efg_stiff(int b_flag,MAT *cnodes,MAT*ccon,MAT *nodes,

VEC *pvec,VEC *evec,MAT *mvec,MAT *THETAS,MAT *FLOWS,MAT **GPDATA)

int ngauss,order,search_type,weight_type;double param;int f,pdone,pflag;intncells,nnodes,icell,inode,spot,ig,jg,kg,M,ilist,jlist,ispot,jspot,ndim;int ncnodes,gcount;double gweight,volume,detj,dvol;VEC*gp,*gw,*cellcon,*X,*Y,*Z,*phi,*phix,*phiy,*phiz,*lpt,*gpt,*index;VEC *flow_dir;double theta,theta_bar;MAT *K,*D,*dphi,*list,*B,*KE;

pflag = (int) pvec->ve[5];ngauss = (int) evec->ve[0];order = (int) evec->ve[1];weight_type = (int) evec->ve[2];search_type = (int) evec->ve[3];param = evec->ve[4];

ncells = ccon->m;nnodes = nodes->m;ndim = nodes->n;K = m_get(ndim*nnodes,ndim*nnodes);

/* Initialize the Volume/Area */volume = 0.0;

gcount = 0;f = -1;for(icell=0;icell<ncells;icell++)/* Print the Progress Meter */pdone =(int)(floor(10.0*((double)(icell+1))/((double)(ncells))));


if( pdone != f )printf("%d",pdone);f = pdone;fflush(stdout);

/* Get the integration pts and weights */grule(ngauss,&gp,&gw);

/* Get a list of the nodes for this cell*/cellcon = get_row(ccon,icell,VNULL);

/* Get a list of the x,y, and zcoordinates of the cell nodes */ncnodes = 8;X = v_get(ncnodes); Y = v_get(ncnodes); Z= v_get(ncnodes);

for(inode=0;inode<ncnodes;inode++)spot = (int) (cellcon->ve[inode]);X->ve[inode] = cnodes->me[spot][0];Y->ve[inode] = cnodes->me[spot][1];Z->ve[inode] = cnodes->me[spot][2];

lpt = v_get(ndim);gpt = v_get(ndim);

for(ig=0;ig<ngauss;ig++)for(jg=0;jg<ngauss;jg++)for(kg=0;kg<ngauss;kg++)gweight = gw->ve[ig] * gw->ve[jg] * gw-

>ve[kg];

lpt->ve[0] = gp->ve[ig];lpt->ve[1] = gp->ve[jg];lpt->ve[2] = gp->ve[kg];

theta = THETAS->me[gcount][0];theta_bar = THETAS->me[gcount][1];flow_dir = get_row(FLOWS,gcount,VNULL);

gshapes(lpt,1,&phi,&dphi);phix = get_col(dphi,0,VNULL);phiy = get_col(dphi,1,VNULL);phiz = get_col(dphi,2,VNULL);m_free(dphi);

detj = in_prod(phix,X) *(in_prod(phiy,Y)*in_prod(phiz,Z) -

in_prod(phiy,Z)*in_prod(phiz,Y)) -in_prod(phiy,X) *

(in_prod(phix,Y)*in_prod(phiz,Z) -in_prod(phix,Z)*in_prod(phiz,Y)) +

in_prod(phiz,X) *(in_prod(phix,Y)*in_prod(phiy,Z) -

in_prod(phix,Z)*in_prod(phiy,Y));dvol = gweight*detj;

gpt->ve[0] = in_prod(phi,X);gpt->ve[1] = in_prod(phi,Y);gpt->ve[2] = in_prod(phi,Z);

/* Form the material matrix */D =

get_con_tan(gpt,mvec,pvec,flow_dir,theta,theta_bar);

v_free(phix); v_free(phiy);v_free(phiz); v_free(phi);

/* For the First Iteration */switch(b_flag)case 1://igdshp(gpt,nodes,param,order,// weight_type,search_type,// &list,&index,&soi,// &phix,&phiy,&phiz);

iscan(gpt,nodes,search_type,param,weight_type,&list,&index);

idshp(gpt,list,param,order,weight_type,&phix,&phiy,&phiz);

GPDATA[gcount] = m_get(index->dim,4);set_col(GPDATA[gcount],0,phix);set_col(GPDATA[gcount],1,phiy);set_col(GPDATA[gcount],2,phiz);set_col(GPDATA[gcount],3,index);break;

case 2:phix = get_col(GPDATA[gcount],0,VNULL);phiy = get_col(GPDATA[gcount],1,VNULL);phiz = get_col(GPDATA[gcount],2,VNULL);index =

get_col(GPDATA[gcount],3,VNULL);break;

if(pflag == 1)printf("%d %d %d %d %12.3e %d\n",icell,ig,jg,kg,param,phix->dim);

M = phix->dim;B = bmat3d(phix,phiy,phiz);KE = gke(B,D,dvol);

for(ilist=0;ilist<M;ilist++)for(jlist=0;jlist<M;jlist++)ispot = (int) (index->ve[ilist]);jspot = (int) (index->ve[jlist]);K->me[ispot*3+0][jspot*3+0] += KE-

>me[ilist*3+0][jlist*3+0];K->me[ispot*3+0][jspot*3+1] += KE-

>me[ilist*3+0][jlist*3+1];K->me[ispot*3+0][jspot*3+2] += KE-

>me[ilist*3+0][jlist*3+2];K->me[ispot*3+1][jspot*3+0] += KE-

>me[ilist*3+1][jlist*3+0];K->me[ispot*3+1][jspot*3+1] += KE-

>me[ilist*3+1][jlist*3+1];K->me[ispot*3+1][jspot*3+2] += KE-

>me[ilist*3+1][jlist*3+2];K->me[ispot*3+2][jspot*3+0] += KE-

>me[ilist*3+2][jlist*3+0];K->me[ispot*3+2][jspot*3+1] += KE-

>me[ilist*3+2][jlist*3+1];K->me[ispot*3+2][jspot*3+2] += KE-

>me[ilist*3+2][jlist*3+2];volume += dvol;


gcount += 1;

if(b_flag == 1)m_free(list);

m_free(KE); m_free(B); m_free(D);v_free(flow_dir);

v_free(index); v_free(phix);v_free(phiy); v_free(phiz);

v_free(lpt); v_free(gpt);v_free(gp); v_free(gw); v_free(cellcon);v_free(X); v_free(Y);if(ndim==3)v_free(Z);

/* m_free(D); */

printf(" Volume is %10.5e\n",volume);return(K);

File name: force.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "shapes.h"#include "grule.h"#include "iscan.h"

VEC *ifnode(MAT *NLOAD,int nnodes,int ndim)int i,eqn,nnload;VEC *f,*dt;

f = v_get(nnodes*ndim);nnload = NLOAD->m;

for(i=0;i<nnload;i++)dt = get_row(NLOAD,i,VNULL);eqn = ndim * (int) (dt->ve[1]) + (int)(dt->ve[0]) - 1;f->ve[eqn] += dt->ve[2];v_free(dt);

return(f);

VEC *iforce(MAT *FNODES,MAT *FDATA,MAT*NODES,VEC *evec,VEC *pvec)int order,ngauss,search_type,weight_type;double param;inti,ig,jg,iforce,spot,nforce,nnodes,ndim,M,ii,df,pflag;doublearea,darea,gweight,detj,trac,icomp,jcomp,kcomp;

VEC*x,*y,*z,*t,*phi,*gp,*gw,*phix,*phiy,*lpt,*gpt,*f,*index;MAT *DPHI,*LIST;

ngauss = (int) evec->ve[0];order = (int) evec->ve[1];weight_type = (int) evec->ve[2];search_type = (int) evec->ve[3];param = evec->ve[4];pflag = (int) pvec->ve[5];

nforce = FDATA->m;nnodes = NODES->m;ndim = NODES->n;f = v_get(ndim*nnodes);

switch(ndim)case 3:

x = v_get(4);y = v_get(4);z = v_get(4);t = v_get(4);lpt = v_get(2);gpt = v_get(3);

area = 0.0;

for(iforce=0;iforce<nforce;iforce++)for(i=0;i<4;i++)spot = (int) FDATA->me[iforce][i];x->ve[i] = FNODES->me[spot][0];y->ve[i] = FNODES->me[spot][1];z->ve[i] = FNODES->me[spot][2];t->ve[i] = FNODES->me[spot][3];df = (int) FNODES->me[spot][4];

grule(ngauss,&gp,&gw);for(ig=0;ig<ngauss;ig++)for(jg=0;jg<ngauss;jg++)lpt->ve[0] = gp->ve[ig];lpt->ve[1] = gp->ve[jg];gweight = gw->ve[ig] * gw->ve[jg];gshapes(lpt,1,&phi,&DPHI);phix = get_col(DPHI,0,VNULL);phiy = get_col(DPHI,1,VNULL);

icomp = in_prod(phix,y) *in_prod(phiy,z) -

in_prod(phiy,y) * in_prod(phix,z);jcomp = - in_prod(phix,x) *

in_prod(phiy,z) +in_prod(phiy,x) * in_prod(phix,z);

kcomp = in_prod(phix,x) *in_prod(phiy,y) -

in_prod(phiy,x) * in_prod(phix,y);detj = sqrt(icomp*icomp + jcomp*jcomp +

kcomp*kcomp);darea = gweight*detj;

gpt->ve[0] = in_prod(phi,x);gpt->ve[1] = in_prod(phi,y);gpt->ve[2] = in_prod(phi,z);trac = in_prod(phi,t);


v_free(phi); v_free(phix); v_free(phiy);m_free(DPHI);

iscan(gpt,NODES,search_type,param,weight_type,&LIST,&index);

if(pflag == 1)printf("param %10.5f npts

%5d\n",param,(int)(index->dim));

ishp(gpt,LIST,param,order,weight_type,&phi);

M = phi->dim;

for(ii=0;ii<M;ii++)spot = (int) index->ve[ii];f->ve[3*spot+df-1] += phi->ve[ii] *

trac * darea;v_free(index); v_free(phi);

m_free(LIST);

area += darea;v_free(gw); v_free(gp);break;

printf("Force Area is %10.5e\n\n",area);

v_free(x); v_free(y); v_free(t);v_free(lpt); v_free(gpt);if(ndim == 3)v_free(z);

return(f);
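In equation form, the double Gauss loop above accumulates, for each influencing node I and loaded direction df, the surface-traction integral

\[
f_{3I + df - 1} \;\mathrel{+}=\; \sum_{g} w_g\, \phi_I(\mathbf{x}_g)\, \bar{t}(\mathbf{x}_g)\,
\left\| \frac{\partial \mathbf{x}}{\partial \xi} \times \frac{\partial \mathbf{x}}{\partial \eta} \right\|_g ,
\]

where the components of the cross product are the icomp, jcomp, and kcomp terms computed from the bilinear facet coordinates, and \bar{t} is the traction interpolated from the facet nodes.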

File name: gauss.c* #include <stdio.h>#include <stdlib.h>#include <string.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "constants.h"

MAT *norset(double s1,double s2,int n,intnsub)double EPS = 3.0e-11;MAT *gauss_data;int m,j,i,isub;double z1,z,xm,xl,pp,p3,p2,p1;double x1,x2;

gauss_data = m_get(nsub*n+1,2);

/* From Numerical Recipes in C */m = (n+1)/2;

for(isub=0;isub<nsub;isub++)x1 = s1 + isub*(s2-s1)/nsub;x2 = s1 + (isub+1)*(s2-s1)/nsub;

xm = 0.5*(x2+x1);xl = 0.5*(x2-x1);

for(i=1;i<=m;i++)
  z = cos(PI*(i-0.25)/(n+0.5));
  do
    p1 = 1.0; p2 = 0.0;
    for(j=1;j<=n;j++)
      p3 = p2; p2 = p1;
      p1 = ((2.0*j-1.0)*z*p2-(j-1.0)*p3)/j;
    pp = n*(z*p1-p2)/(z*z-1.0);
    z1 = z;
    z = z1-p1/pp;
  while (fabs(z-z1) > EPS);
  gauss_data->me[isub*n+i][0]     = xm-xl*z;
  gauss_data->me[isub*n+n+1-i][0] = xm+xl*z;
  gauss_data->me[isub*n+i][1]     = 2.0*xl/((1.0-z*z)*pp*pp);
  gauss_data->me[isub*n+n+1-i][1] = gauss_data->me[isub*n+i][1];

return(gauss_data);

MAT *get_triangle_data(int triangle_order)MAT *triangle_data;

triangle_data = m_get(triangle_order,4);if ( triangle_order == 1 )/* The area coordinates */triangle_data->me[0][0] =0.33333333333333;triangle_data->me[0][1] =0.33333333333333;triangle_data->me[0][2] =0.33333333333333;

/* The weights */triangle_data->me[0][3] =1.00000000000000;else if( triangle_order == 3 )/* The area coordinates */triangle_data->me[0][0] =0.66666666666667;triangle_data->me[0][1] =0.16666666666667;triangle_data->me[0][2] =0.16666666666667;


triangle_data->me[1][0] =0.16666666666667;triangle_data->me[1][1] =0.66666666666667;triangle_data->me[1][2] =0.16666666666667;

triangle_data->me[2][0] =0.16666666666667;triangle_data->me[2][1] =0.16666666666667;triangle_data->me[2][2] =0.66666666666667;

/* The weights */triangle_data->me[0][3] =0.33333333333333;triangle_data->me[1][3] =0.33333333333333;triangle_data->me[2][3] =0.33333333333333;else if( triangle_order == 12 )/* The area coordinates */triangle_data->me[0][0] =0.873821971016996;triangle_data->me[0][1] =0.063089014491502;triangle_data->me[0][2] =0.063089014491502;

triangle_data->me[1][0] =0.063089014491502;triangle_data->me[1][1] =0.873821971016996;triangle_data->me[1][2] =0.063089014491502;

triangle_data->me[2][0] =0.063089014491502;triangle_data->me[2][1] =0.063089014491502;triangle_data->me[2][2] =0.873821971016996;

triangle_data->me[3][0] =0.501426509658179;triangle_data->me[3][1] =0.249286745170910;triangle_data->me[3][2] =0.249286745170910;

triangle_data->me[4][0] =0.249286745170910;triangle_data->me[4][1] =0.501426509658179;triangle_data->me[4][2] =0.249286745170910;

triangle_data->me[5][0] =0.249286745170910;triangle_data->me[5][1] =0.249286745170910;triangle_data->me[5][2] =0.501426509658179;

triangle_data->me[6][0] =0.636502499121399;

triangle_data->me[6][1] =0.310352451033784;triangle_data->me[6][2] =0.053145049844817;

triangle_data->me[7][0] =0.636502499121399;triangle_data->me[7][1] =0.053145049844817;triangle_data->me[7][2] =0.310352451033784;

triangle_data->me[8][0] =0.310352451033784;triangle_data->me[8][1] =0.636502499121399;triangle_data->me[8][2] =0.053145049844817;

triangle_data->me[9][0] =0.053145049844817;triangle_data->me[9][1] =0.636502499121399;triangle_data->me[9][2] =0.310352451033784;

triangle_data->me[10][0] =0.053145049844817;triangle_data->me[10][1] =0.310352451033784;triangle_data->me[10][2] =0.636502499121399;

triangle_data->me[11][0] =0.310352451033784;triangle_data->me[11][1] =0.053145049844817;triangle_data->me[11][2] =0.636502499121399;

/* The weights */triangle_data->me[0][3] =0.050844906370207;triangle_data->me[1][3] =0.050844906370207;triangle_data->me[2][3] =0.050844906370207;

triangle_data->me[3][3] =0.116786275726379;triangle_data->me[4][3] =0.116786275726379;triangle_data->me[5][3] =0.116786275726379;

triangle_data->me[6][3] =0.082851075618374;triangle_data->me[7][3] =0.082851075618374;triangle_data->me[8][3] =0.082851075618374;triangle_data->me[9][3] =0.082851075618374;triangle_data->me[10][3] =0.082851075618374;triangle_data->me[11][3] =0.082851075618374;

else


printf("Error in get_triangle_data\n");

return(triangle_data);

File name: grule.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

void grule(int nord,VEC **gp,VEC **gw)*gp = v_get(nord);*gw = v_get(nord);

switch(nord)case 1: (*gp)->ve[0] = 0.0;(*gw)->ve[0] = 2.0;break;

case 2: (*gp)->ve[0] = -.577350269189626;(*gp)->ve[1] = .577350269189626;(*gw)->ve[0] = 1.000000000000000;(*gw)->ve[1] = 1.000000000000000;break;

case 3: (*gp)->ve[0] = -0.774596669241484;(*gw)->ve[0] = 0.555555555555555;(*gp)->ve[1] = 0.000000000000000;(*gw)->ve[1] = 0.888888888888889;(*gp)->ve[2] = 0.774596669241484;(*gw)->ve[2] = 0.555555555555555;break;

case 4: (*gp)->ve[0] = -0.861136311594053;(*gw)->ve[0] = 0.347854845137454;(*gp)->ve[1] = -0.339981043584856;(*gw)->ve[1] = 0.652145154862546;(*gp)->ve[2] = 0.339981043584856;(*gw)->ve[2] = 0.652145154862546;(*gp)->ve[3] = 0.861136311594053;(*gw)->ve[3] = 0.347854845137454;break;

case 5: (*gp)->ve[0] = -0.906179845938664;(*gw)->ve[0] = 0.236926885056189;(*gp)->ve[1] = -0.538469310105683;(*gw)->ve[1] = 0.478628670499366;(*gp)->ve[2] = 0.000000000000000;(*gw)->ve[2] = 0.568888888888889;(*gp)->ve[3] = 0.538469310105683;(*gw)->ve[3] = 0.478628670499366;(*gp)->ve[4] = 0.906179845938664;(*gw)->ve[4] = 0.236926885056189;break;

case 6: (*gp)->ve[0] = -0.932469514203152;(*gw)->ve[0] = 0.171324492379170;(*gp)->ve[1] = -0.661209386466264;(*gw)->ve[1] = 0.360761573048139;(*gp)->ve[2] = -0.238619186083197;(*gw)->ve[2] = 0.467913934572691;(*gp)->ve[3] = 0.238619186083197;

(*gw)->ve[3] = 0.467913934572691;(*gp)->ve[4] = 0.661209386466264;(*gw)->ve[4] = 0.360761573048139;(*gp)->ve[5] = 0.932469514203152;(*gw)->ve[5] = 0.171324492379170;break;

case 7: (*gp)->ve[0] = -0.949107912342759;(*gw)->ve[0] = 0.129484966168869;(*gp)->ve[1] = -0.741531185599394;(*gw)->ve[1] = 0.279705391489277;(*gp)->ve[2] = -0.405845151377397;(*gw)->ve[2] = 0.381830050505119;(*gp)->ve[3] = 0.000000000000000;(*gw)->ve[3] = 0.417959183673469;(*gp)->ve[4] = 0.405845151377397;(*gw)->ve[4] = 0.381830050505119;(*gp)->ve[5] = 0.741531185599394;(*gw)->ve[5] = 0.279705391489277;(*gp)->ve[6] = 0.949107912342759;(*gw)->ve[6] = 0.129484966168869;break;

case 8: (*gp)->ve[0] = -0.960289856497536;(*gw)->ve[0] = 0.101228536290376;(*gp)->ve[1] = -0.796666477413627;(*gw)->ve[1] = 0.222381034453374;(*gp)->ve[2] = -0.525532409916329;(*gw)->ve[2] = 0.313706645877887;(*gp)->ve[3] = -0.183434642495650;(*gw)->ve[3] = 0.362683783378362;(*gp)->ve[4] = 0.183434642495650;(*gw)->ve[4] = 0.362683783378362;(*gp)->ve[5] = 0.525532409916329;(*gw)->ve[5] = 0.313706645877887;(*gp)->ve[6] = 0.796666477413627;(*gw)->ve[6] = 0.222381034453374;(*gp)->ve[7] = 0.960289856497536;(*gw)->ve[7] = 0.101228536290376;break;

case 9: (*gp)->ve[0] = -0.968160239507626;(*gw)->ve[0] = 0.081274388361575;(*gp)->ve[1] = -0.836031107326636;(*gw)->ve[1] = 0.180648160694858;(*gp)->ve[2] = -0.613371432700590;(*gw)->ve[2] = 0.260610696402935;(*gp)->ve[3] = -0.324253423403809;(*gw)->ve[3] = 0.312347077040003;(*gp)->ve[4] = 0.000000000000000;(*gw)->ve[4] = 0.330239355001260;(*gp)->ve[5] = 0.324253423403809;(*gw)->ve[5] = 0.312347077040003;(*gp)->ve[6] = 0.613371432700590;(*gw)->ve[6] = 0.260610696402935;(*gp)->ve[7] = 0.836031107326636;(*gw)->ve[7] = 0.180648160694858;(*gp)->ve[8] = 0.968160239507626;(*gw)->ve[8] = 0.081274388361575;break;

case 10:(*gp)->ve[0] = -0.973906528517172;(*gw)->ve[0] = 0.066671344308688;(*gp)->ve[1] = -0.865063366688985;(*gw)->ve[1] = 0.149451349150581;(*gp)->ve[2] = -0.679409568299024;(*gw)->ve[2] = 0.219086362515982;(*gp)->ve[3] = -0.433395394129247;(*gw)->ve[3] = 0.269266719309996;(*gp)->ve[4] = -0.148874338981631;(*gw)->ve[4] = 0.295524224714753;


(*gp)->ve[5] = 0.148874338981631;(*gw)->ve[5] = 0.295524224714753;(*gp)->ve[6] = 0.433395394129247;(*gw)->ve[6] = 0.269266719309996;(*gp)->ve[7] = 0.679409568299024;(*gw)->ve[7] = 0.219086362515982;(*gp)->ve[8] = 0.865063366688985;(*gw)->ve[8] = 0.149451349150581;(*gp)->ve[9] = 0.973906528517172;(*gw)->ve[9] = 0.066671344308688;break;
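As a quick check of how these tabulated points and weights are used, the short driver below integrates x^4 over [-1, 1] with the 3-point rule (exact answer 2/5). It is a minimal standalone sketch, not part of the thesis code, and assumes only the Meschach headers and the grule prototype listed above.

#include <stdio.h>
#include "math.h"
#include "../mes/matrix.h"   /* Meschach VEC type, as in the other listings */
#include "grule.h"           /* void grule(int nord, VEC **gp, VEC **gw);   */

int main(void)
{
    VEC *gp, *gw;
    double sum = 0.0;
    int i;

    grule(3, &gp, &gw);                       /* 3-point rule: exact up to degree 5 */
    for (i = 0; i < 3; i++)
        sum += gw->ve[i] * pow(gp->ve[i], 4.0);

    printf("integral of x^4 over [-1,1] = %f (exact 0.4)\n", sum);

    v_free(gp);
    v_free(gw);
    return 0;
}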

File name: iscan.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "norms.h"#include "util.h"

void iscan(VEC *xs,MAT *xx,intsearch_type,double param,int weight_type,

MAT **list,VEC **index)int nn,nd,npts,in,id;double f;VEC *xn;VEC *dum_index;MAT *dum_list;

nn = (int)(xx->m);nd = (int)(xx->n);

dum_list = m_get(nn,nd);dum_index = v_get(nn);

npts = 0;

for(in=0;in<nn;in++)xn = get_row(xx,in,VNULL);

switch(weight_type)case 1: f = l1norm(xs,xn); break;case 2: f = l2norm(xs,xn); break;case 3: f = linfnorm(xs,xn); break;

if(f < param)dum_index->ve[npts] = in;for(id=0;id<nd;id++)dum_list->me[npts][id] = xn->ve[id];npts = npts + 1;

v_free(xn);

/* Pack the return objects */

*list = m_resize(dum_list,npts,nd);*index = v_resize(dum_index,npts);
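In effect, iscan returns the influencing-node set

\[
\mathcal{N}(\mathbf{x}_s) = \{\, n \;:\; \| \mathbf{x}_s - \mathbf{x}_n \|_p < d \,\}, \qquad p \in \{1, 2, \infty\},
\]

where d is the search radius passed in as param and weight_type selects the l1, l2, or l-infinity norm.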

File name: isorc.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "norms.h"

int isorc(VEC *p1,MAT *xx)double distance;double tol = 1e-5;int value = -1;int in,nn;VEC *pn;

nn = xx->m;for(in=0;in<nn;in++)pn = get_row(xx,in,VNULL);distance = l2norm(p1,pn);v_free(pn);if(distance < tol)value = in;

/* Return From Function */return(value);

File name: master.c #include <time.h>#include "plefg.h"#include "parmes.h"#include "dd_input.h"#include "ddefg_stiff.h"#include "ddforce.h"#include "ddpost.h"#include "master_ddsolve.h"#include "post_output.h"

/*************************************//* The MASTER process *//*************************************//* Current implementation is for *//* Linear 3D with post_flag = 0 only *//*************************************/

void master( MPI_Comm comm, int *argc, char ***argv )
/*
 * VARIABLES DECLARATION
 */
int myid;          //ID number of this master process
int np;            //Number of total processes
double t[ 100 ];   //Intermediate time logs


double ts, te;       //Process start time and finish time
double cumtime = 0;  //Cumulative time log
time_t tstamp;       //Time stamp

int pn_len;                          //Length of the processor name
char master_name[MAX_NAME_LENGTH];   //Processor name
char in_name[MAX_NAME_LENGTH];       //Input file name
char out_name[MAX_NAME_LENGTH];      //Output file name
char log_name[MAX_NAME_LENGTH];      //Log file name

int sock_fd;     //Network socket file descriptor
FILE *f_in;      //Input file descriptor
FILE *f_out;     //Output file descriptor
FILE *f_log;     //Log file descriptor

MPI_Status status;

int ncell, nforce, ndloc, nsloc;

/* EFG Parameters */
MAT *CNODES;     //List of cell nodes
MAT *CCON;       //List of cell connectivities
MAT *DLOC;       //Desired displacement locations
MAT *DDISP;      //Matrix of desired displacements
MAT *DSTRESS;    //Matrix of desired stresses
MAT *FDATA;      //Forces data
MAT *FNODES;     //List of force nodes
MAT *FPDATA;     //Fixed plane data
MAT *MDISP;
MAT *NCON;
MAT *NDISP;
MAT *NLOAD;
MAT *NODES;
MAT *PCON;
MAT *PNODES;
MAT *SLOC;       //Desired stress locations
MAT *mvec;       //List of material properties
VEC *evec;       //The EFG parameters
VEC *pvec;       //Problem parameters

/* Other variables */
int i, j;        //Generic indices
int ndim;        //Number of dimensions
int nnodes;      //Number of nodes
MAT *K;          //Global stiffness matrix
VEC *f;          //Global force vector
VEC *fpt;        //Force vector due to nodal load
VEC *fdst;       //Force vector due to distributed load
VEC *fixed_list; //Fixed degrees-of-freedom list
VEC *DISP;       //Global displacement vector

MPI_Get_processor_name( master_name,&pn_len );MPI_Comm_rank( comm, &myid ); //Get thisprocess IDMPI_Comm_size( comm, &np ); //Get numberof total processes

/** Turn on the dynamic memory informationsystem*/mem_info_on( TRUE );

/** INITIAL I/O ROUTINES*/

/* Get the names of the files */if ( *argc == 3 ) strcpy( in_name, (*argv)[1] );strcpy( out_name, (*argv)[2] );printf( " input file : %s\n", in_name

);printf( " output file : %s\n", out_name

); else printf( " input file : " );scanf( "%s", in_name );printf( " output file : " );scanf( "%s", out_name );

ts = MPI_Wtime();

/* Open the input and output files*/f_in = fopen( in_name, "r" );f_out = fopen( out_name, "w" );

/* Read the input file and processparallel EFG parameters */fprintf( stdout, " Reading input from'%s'\n", in_name );

dd_read_input( f_in, f_out, &evec, &pvec,&mvec, &CCON, &CNODES, &DLOC, &FDATA,

&FNODES, &FPDATA, &NCON, &NDISP,&NLOAD, &NODES, &PCON, &PNODES,

&SLOC, &ncell, &nforce, &ndloc, &nsloc);

/* Broadcast the fundamental data toworkers */MPI_Bcast_vector( &evec, myid, comm );MPI_Bcast_vector( &pvec, myid, comm );MPI_Bcast_matrix( &mvec, myid, comm );MPI_Bcast_matrix( &CCON, myid, comm );MPI_Bcast_matrix( &CNODES, myid, comm );MPI_Bcast_matrix( &DLOC, myid, comm );MPI_Bcast_matrix( &FDATA, myid, comm );MPI_Bcast_matrix( &FNODES, myid, comm );MPI_Bcast_matrix( &FPDATA, myid, comm );

fprintf( stdout, "\n %d stiffnessintegration cells\n", ncell );fprintf( stdout, " %d force integrationcells\n", nforce );fprintf( stdout, " %d desireddisplacements\n", ndloc );fprintf( stdout, " %d desiredstresses\n", nsloc );

MPI_Bcast_matrix( &NCON, myid, comm );MPI_Bcast_matrix( &NDISP, myid, comm );MPI_Bcast_matrix( &NLOAD, myid, comm );MPI_Bcast_matrix( &NODES, myid, comm );MPI_Bcast_matrix( &PCON, myid, comm );


MPI_Bcast_matrix( &PNODES, myid, comm );MPI_Bcast_matrix( &SLOC, myid, comm );MPI_Bcast( &ncell, 1, MPI_INTEGER, myid,comm );MPI_Bcast( &nforce, 1, MPI_INTEGER, myid,comm );MPI_Bcast( &ndloc, 1, MPI_INTEGER, myid,comm );MPI_Bcast( &nsloc, 1, MPI_INTEGER, myid,comm );

/* Broadcast the queue server name */MPI_Bcast( master_name, pn_len, MPI_CHAR,myid, comm );

/* Open the log file */if ( (int) pvec->ve[8] == 0 ) f_log = stdout; else MPI_Bcast( out_name, MAX_NAME_LENGTH,

MPI_CHAR, myid, comm );sprintf( log_name, "%s_pid%d.log",

out_name, myid );f_log = fopen( log_name, "w" );

/* Connect to the queue server */sock_fd = connect_to_server( master_name,QSERV_PORT );

/** FORM GLOBAL STIFFNESS MTX*/fprintf( f_log, "\n Forming the globalstiffness matrix\n" );ndim = NODES->n; nnodes = NODES->m;K = m_get( ndim*nnodes, ndim*nnodes );t[ 0 ] = MPI_Wtime();ddefg_stiff( comm, myid, sock_fd, f_log,ncell, evec, pvec, mvec,

CCON, CNODES, NODES, &K );t[ 1 ] = MPI_Wtime();

/** FORM GLOBAL FORCE VECTOR*/t[ 2 ] = MPI_Wtime();

fprintf( f_log, "\n Forming the globalforce vector\n" );

/* The concentrated forces */fprintf( f_log, " Forming theconcentrated force vector\n" );fpt = ifnode( NLOAD, nnodes, ndim );

/* The distributed forces */fdst = v_get( ndim*nnodes );if ( nforce > 0 ) fprintf( f_log, " Forming the

distributed force vector\n" );ddforce( comm, myid, sock_fd, f_log,

nforce, evec, pvec,FDATA, FNODES, NODES, &fdst );

/* Add them up */fprintf( f_log, "\n Adding the forcevectors\n" );f = v_add( fpt, fdst, VNULL );v_free( fpt ); v_free( fdst );

/* Factor the total load vector */fprintf( f_log, " Factoring the totalforce vector\n" );sv_mlt( pvec->ve[ 4 ], f, f );

t[ 3 ] = MPI_Wtime();

/** SOLVE THE DISCRETE EQNS*/fprintf( f_log, "\n Solving the discretesystem of equations...\n" );t[ 4 ] = MPI_Wtime();DISP = master_ddsolve( comm, &status, K,f, NDISP, FPDATA, NODES, &fixed_list );t[ 5 ] = MPI_Wtime();fprintf( f_log, " The system of equationssolved.\n" );MPI_Bcast_vector( &DISP, myid, comm );output_disp( f_in, f_out, NODES, DISP, 1,1 );

/** POST-PROCESSING*/fprintf( f_log, "\n Post-processing theresults...\n" );

/* Post-process for the desireddisplacements */t[ 6 ] = MPI_Wtime();MDISP = m_get( NODES->m, 3 );for ( i = 0; i < (int) NODES->m; i++ ) for ( j = 0; j < 3; j++ ) MDISP->me[ i ][ j ] = DISP->ve[ 3*i + j

];ddpost_displ( comm, myid, sock_fd, f_log,ndloc, evec, pvec,

DLOC, MDISP, NODES, &DDISP );t[ 7 ] = MPI_Wtime();

/* Post-process for the desired stresses*/t[ 8 ] = MPI_Wtime();ddpost_stress( comm, myid, sock_fd, f_log,nsloc, evec, pvec, mvec,

DISP, NODES, SLOC, &DSTRESS );t[ 9 ] = MPI_Wtime();

/* Print the post-processed results tooutput file */post_output( f_out, DDISP, DSTRESS );

/* Disconnect from the queue server */close( sock_fd );

/** Clean up the allocated memory*/m_free( CNODES ); m_free( CCON ); m_free(DDISP ); m_free( DLOC );m_free( DSTRESS ); m_free( FDATA );m_free( FNODES ); m_free( FPDATA );m_free( K ); m_free( MDISP ); m_free(mvec ); m_free( NCON );


m_free( NDISP ); m_free( NLOAD ); m_free(NODES ); m_free( PCON );m_free( PNODES ); m_free( SLOC );v_free( DISP ); v_free( evec ); v_free( f); v_free( fixed_list );v_free( pvec );

/** Get information on amount of memoryallocated*/fprintf( f_out, "\n" );mem_info_file( f_out, 0 );

te = MPI_Wtime();tstamp = time( NULL );

/** Finish the output file*/fprintf( f_out, "\n" );fprintf( f_out,"********************************************************************************\n");fprintf( f_out,"***** E N D O F O U T P U T F IL E *****\n" );fprintf( f_out,"********************************************************************************\n");fprintf( f_out, "\n" );fprintf( f_out," Time taken for the whole program was%14.9g seconds.\n",te - ts );fprintf( f_out," create the stiffness matrix: %14.9gseconds. [%8.3g%%]\n",t[ 1 ] - t[ 0 ], ( t[ 1 ] - t[ 0 ] ) / (

te - ts ) * 100 );cumtime = cumtime + ( t[ 1 ] - t[ 0 ] );fprintf( f_out," create the force vector: %14.9gseconds. [%8.3g%%]\n",t[ 3 ] - t[ 2 ], ( t[ 3 ] - t[ 2 ] ) / (

te - ts ) * 100 );cumtime = cumtime + ( t[ 3 ] - t[ 2 ] );fprintf( f_out," solve the system of equations: %14.9gseconds. [%8.3g%%]\n",t[ 5 ] - t[ 4 ], ( t[ 5 ] - t[ 4 ] ) / (

te - ts ) * 100 );cumtime = cumtime + ( t[ 5 ] - t[ 4 ] );fprintf( f_out," %d equations solved\n\n", ndim*nnodes );fprintf( f_out," post-process the desired displ: %14.9gseconds. [%8.3g%%]\n",t[ 7 ] - t[ 6 ], ( t[ 7 ] - t[ 6 ] ) / (

te - ts ) * 100 );cumtime = cumtime + ( t[ 7 ] - t[ 6 ] );fprintf( f_out," post-process the desired stresses:%14.9g seconds. [%8.3g%%]\n",t[ 9 ] - t[ 8 ], ( t[ 9 ] - t[ 8 ] ) / (

te - ts ) * 100 );cumtime = cumtime + ( t[ 9 ] - t[ 8 ] );fprintf( f_out,

" miscellaneous tasks: %14.9g seconds.[%8.3g%%]\n",te - ts - cumtime, ( te - ts - cumtime )

/ ( te - ts ) * 100 );fprintf( f_out, "\n" );fprintf( f_out,"\n Analyzed on '%d computing nodes' with'%s' as the master\n",np, master_name );fprintf( f_out," %s\n", ctime( &tstamp ) );fprintf( f_out,"\n********************************************************************************\n\n" );fflush( f_out );

/** Finish the logging*/fprintf( f_log, " Finished!\n" );fprintf( f_log, " Time taken for thewhole program was %8.3g seconds.\n",te - ts );fprintf( f_log, "\n" );

/** Close the opened files*/fclose( f_in );fclose( f_out );fclose( f_log );

return;

File name: master_ddsolve.c #include "plefg.h"#include "parmes.h"#include "master_parallel_gauss.h"

VEC *master_ddsolve( MPI_Comm comm,MPI_Status *status, MAT *A, VEC *b,

MAT *NDISP,MAT *FPDATA,MAT *NODES,VEC **fixed_list)int ndim, nnodes;int iplane, inode;int nod, dir, size, loc;double val, correl;double x0, y0, z0;double nx, ny, nz;double xx, yy, zz;VEC *bcman; //boundary conditionmanipulating vectorVEC *x;

nnodes = NODES->m;ndim = NODES->n;size = A->m;

*fixed_list = v_get( b->dim );

/*
 * Process the Nodal Constraints
 */
if ( ( NDISP != NULL ) && ( NDISP->m != 0 ) )


for ( inode = 0; inode < (int)( NDISP->m); inode++ )/** get the preliminary data of* the current nodal constraints*/dir = (int) ( NDISP->me[ inode ][ 0 ] );nod = (int) ( NDISP->me[ inode ][ 1 ] );val = NDISP->me[ inode ][ 2 ];loc = 3 * nod + dir - 1; //the

associated d.o.f. number

/** mark the associated d.o.f. as 1.0

(fixed)*/(*fixed_list)->ve[ loc ] = 1.0;

/** process the RHS force vector*/bcman = get_col( A, loc, VNULL );bcman = sv_mlt( -val, bcman, VNULL );b = v_add( b, bcman, VNULL );b->ve[ loc ] = val;

/** process the A stiffness matrix*/v_zero( bcman );set_row( A, loc, bcman );set_col( A, loc, bcman );v_free( bcman );A->me[ loc ][ loc ] = 1.00;

/*
 * Process the Planar Nodal Constraints
 * (fixed planes) prescribed in FPDATA
 */
if ( ( FPDATA != NULL ) && ( FPDATA->m != 0 ) )
for ( iplane = 0; iplane < (int)( FPDATA-

>m ); iplane++ )dir = (int)( FPDATA->me[ iplane ][ 0 ]

);val = FPDATA->me[ iplane ][ 1 ];x0 = FPDATA->me[ iplane ][ 2 ];y0 = FPDATA->me[ iplane ][ 3 ];z0 = FPDATA->me[ iplane ][ 4 ];nx = FPDATA->me[ iplane ][ 5 ];ny = FPDATA->me[ iplane ][ 6 ];nz = FPDATA->me[ iplane ][ 7 ];

for ( inode = 0; inode < nnodes; inode++)

xx = NODES->me[ inode ][ 0 ];yy = NODES->me[ inode ][ 1 ];zz = ( ndim == 3 ) ? NODES->me[ inode ][

2 ] : 0.0;

correl = nx*(xx-x0) + ny*(yy-y0) +nz*(zz-z0);

/** If 'correl' is within the tolerance

(1.0e-5),* process the RHS force and A matrix.*/if( fabs( correl ) < 1.0e-5 )loc = ndim*inode + dir - 1; //the

associated d.o.f. number

/** mark the associated d.o.f. as 1.0

(fixed)*/(*fixed_list)->ve[ loc ] = 1.0;

/** process the RHS force vector*/bcman = get_col( A, loc, VNULL );bcman = sv_mlt( -val, bcman, VNULL );b = v_add( b, bcman, VNULL );b->ve[ loc ] = val;

/** process the A stiffness matrix*/v_zero( bcman );set_row( A, loc, bcman );set_col( A, loc, bcman );v_free( bcman );A->me[ loc ][ loc ] = 1.00;

x = master_parallel_gauss( comm, status,A, b );

return( x );
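In summary, the constraint treatment in master_ddsolve applies the standard direct-imposition rule for each fixed degree of freedom j with prescribed value \bar{u} (this is a restatement of the listing, not additional functionality):

\[
b_i \leftarrow b_i - K_{ij}\,\bar{u}, \qquad
b_j \leftarrow \bar{u}, \qquad
K_{ij} = K_{ji} \leftarrow 0 \;\; (i \neq j), \qquad
K_{jj} \leftarrow 1,
\]

so that the j-th equation reduces to u_j = \bar{u} while the remaining equations see the prescribed value as an equivalent load on the right-hand side.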

File name: master_parallel_gauss.c #include "parmes.h"#include "parallel_gauss.h"

VEC *master_parallel_gauss( MPI_Comm comm,MPI_Status *status, MAT *Amat, VEC *b )int myid, np;int i, j, num_eqn, row, row_count;MAT *package;MAT *A, *a;VEC *x, *xlocal, *temp_vec;

MPI_Comm_size( comm, &np );MPI_Comm_rank( comm, &myid );

/* Store the global package in Matrix A */A = m_copy( Amat, MNULL );num_eqn = A->m;A = m_resize( A, num_eqn, num_eqn+1 );set_col( A, num_eqn, b );

/* Distribute the packages to the workers*/


for ( i = 0; i < np; i++ )row_count = 0;row = row_count*np + i;package = m_get( row_count, num_eqn+1 );while ( row < num_eqn ) temp_vec = get_row( A, row, VNULL );m_resize( package, row_count+1,

num_eqn+1 );set_row( package, row_count, temp_vec );v_free( temp_vec );row_count++;row = row_count*np + i;if ( i == myid ) a = m_copy( package, MNULL ); else MPI_Send_matrix( &package, i, myid, comm

);m_free( package );

/* Perform Gaussian Elimination */xlocal = parallel_gauss( comm, status, a);

/* Collect the results */x = v_get( num_eqn );for ( i = 0; i < np; i++ )if ( i == myid ) for ( j = 0; j < (int) xlocal->dim; j++

) x->ve[ j*np + i ] = xlocal->ve[j]; else MPI_Recv_vector( &temp_vec, i, i, comm,

status );for ( j = 0; j < (int) temp_vec->dim;

j++ ) x->ve[ j*np + i ] = temp_vec->ve[j];v_free( temp_vec );

m_free( A ); m_free( a ); v_free( xlocal);

return x;
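The distribution loop above deals the augmented rows of [A | b] out cyclically and gathers the unknowns back with the same stride: process i owns global rows i, i + np, i + 2*np, ..., and x[j*np + i] is filled from the j-th local entry of process i. A minimal sketch of that ownership rule follows; the helper names are hypothetical and not part of the thesis code.

/* Cyclic (round-robin) row ownership used by master_parallel_gauss:
 * global row r lives on process (r % np), at local index (r / np). */
static int owner_of_row(int r, int np)       { return r % np; }
static int local_index_of_row(int r, int np) { return r / np; }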

File name: material.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "material.h"

MAT *hooke(int pcode,double E,double nu)MAT *D;

switch(pcode)case 1:D = m_get(3,3);D->me[0][0] = 1.0;D->me[0][1] = nu;D->me[1][0] = nu;D->me[1][1] = 1.0;D->me[2][2] = (1.0-nu)/2.0;sm_mlt(E/(1.0-nu*nu),D,D);break;

case 2:D = m_get(4,4);D->me[0][0] = 1.0-nu;D->me[1][1] = 1.0-nu;D->me[2][2] = 1.0-nu;D->me[0][1] = nu;D->me[0][2] = nu;D->me[1][0] = nu;D->me[1][2] = nu;D->me[2][0] = nu;D->me[2][1] = nu;D->me[3][3] = (1.0-2.0*nu)/2.0;sm_mlt(E/(1.0+nu)/(1.0-2.0*nu),D,D);break;

case 3:D = m_get(4,4);D->me[0][0] = 1.0-nu;D->me[1][1] = 1.0-nu;D->me[2][2] = 1.0-nu;D->me[0][1] = nu;D->me[0][2] = nu;D->me[1][0] = nu;D->me[1][2] = nu;D->me[2][0] = nu;D->me[2][1] = nu;D->me[3][3] = (1.0-2.0*nu)/2.0;sm_mlt(E/(1.0+nu)/(1.0-2.0*nu),D,D);break;

case 4:D = m_get(6,6);D->me[0][0] = 1.0-nu;D->me[0][1] = nu;D->me[0][2] = nu;D->me[1][0] = nu;D->me[1][1] = 1.0-nu;D->me[1][2] = nu;D->me[2][0] = nu;D->me[2][1] = nu;D->me[2][2] = 1.0-nu;D->me[3][3] = (1.0-2.0*nu)/2.0;D->me[4][4] = (1.0-2.0*nu)/2.0;D->me[5][5] = (1.0-2.0*nu)/2.0;sm_mlt(E/(1.0+nu)/(1.0-2.0*nu),D,D);break;

return(D);
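For the three-dimensional case (pcode 4), hooke() scales a matrix with (1-nu) on the normal diagonal, nu off the diagonal and (1-2nu)/2 on the shear diagonal by E/((1+nu)(1-2nu)). The standalone sketch below reproduces that matrix with plain C arrays instead of Meschach; the material constants are illustrative only.

#include <stdio.h>

int main( void )
{
    double E = 200.0e9, nu = 0.3;   /* illustrative steel-like values */
    double c = E / ( (1.0 + nu) * (1.0 - 2.0*nu) );
    double D[6][6] = { {0} };
    int i, j;

    for ( i = 0; i < 3; i++ )                       /* normal-stress block        */
        for ( j = 0; j < 3; j++ )
            D[i][j] = ( i == j ) ? c*(1.0 - nu) : c*nu;
    for ( i = 3; i < 6; i++ )                       /* shear terms                */
        D[i][i] = c*(1.0 - 2.0*nu)/2.0;

    for ( i = 0; i < 6; i++ ) {
        for ( j = 0; j < 6; j++ ) printf( "%12.4e ", D[i][j] );
        printf( "\n" );
    }
    return 0;
}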

double iso(double sy,double k,double alpha)return(sy+k*alpha);

double diso(double sy,double k,double alpha)


return(k);

double kin(double alpha)return(0.0);

double dkin(double alpha)return(0.0);

double vnorm(VEC *vin)
double norm;
norm = sqrt( pow(vin->ve[0],2) + pow(vin->ve[1],2) + pow(vin->ve[2],2) +
             2.0*(pow(vin->ve[3],2) + pow(vin->ve[4],2) + pow(vin->ve[5],2)) );
return(norm);

void int_con(VEC **plastic_strain, VEC **back_stress, double *alpha,
             VEC **stress, VEC *strain, double *theta, double *theta_bar,
             VEC **flow_direction, VEC *PCPS, double pca, double K,
             double ys, double ym, double nu)

/**********************************************************************/
/* ********** WARNING: BACK STRESS IS NOT PREVIOUSLY CONVERGED ****** */
/**********************************************************************/

int it_count;
double kappa,mu,str_pr,n_trial_norm,f_trial,g,Dg,gamma,next_alpha;
VEC *s_dev,*n_trial,*vdum,*el_strain,*s_trial;

/* Calculate the shear and bulk moduli */
mu = ym/2.0/(1.0+nu);
kappa = ym/3.0/(1.0-2.0*nu);

/* Compute the total strain pressure */str_pr = (strain->ve[0] + strain->ve[1] +strain->ve[2])/3.0;

/* Compute the deviatoric strain */s_dev = v_get(6);s_dev->ve[0] = strain->ve[0] - str_pr;s_dev->ve[1] = strain->ve[1] - str_pr;s_dev->ve[2] = strain->ve[2] - str_pr;s_dev->ve[3] = strain->ve[3];s_dev->ve[4] = strain->ve[4];s_dev->ve[5] = strain->ve[5];

/* Compute the trial stress */el_strain = v_sub(s_dev,PCPS,VNULL);/* s_trial =sv_mlt(2.0*mu,el_strain,VNULL); */

s_trial = v_get(6);s_trial->ve[0] = 2.0*mu*el_strain->ve[0];s_trial->ve[1] = 2.0*mu*el_strain->ve[1];s_trial->ve[2] = 2.0*mu*el_strain->ve[2];s_trial->ve[3] = mu*el_strain->ve[3];s_trial->ve[4] = mu*el_strain->ve[4];s_trial->ve[5] = mu*el_strain->ve[5];

/* Compute the relative stress */n_trial =v_sub(s_trial,*back_stress,VNULL);

/* Compute the norm of n_trial */n_trial_norm = vnorm(n_trial);

/* Calculate the yield criterion */f_trial = n_trial_norm -sqrt(2.0/3.0)*iso(ys,K,pca);

/* Check the yield condition */if(f_trial <= 0.0)*theta = 1.0;*theta_bar = 0.0;*flow_direction = v_get(6);(*stress)->ve[0] = s_trial->ve[0] +kappa*3.0*str_pr;(*stress)->ve[1] = s_trial->ve[1] +kappa*3.0*str_pr;(*stress)->ve[2] = s_trial->ve[2] +kappa*3.0*str_pr;(*stress)->ve[3] = s_trial->ve[3];(*stress)->ve[4] = s_trial->ve[4];(*stress)->ve[5] = s_trial->ve[5];else/* Solve for the consistency parameter,gamma */g = 1.0; gamma = 0.0; next_alpha = pca;

/* Solve for the consistency parameter */
it_count = 0;
while( (fabs(g) > 1e-10) && (it_count < 26) )
  g = -sqrt(2.0/3.0)*iso(ys,K,next_alpha) + n_trial_norm - 2.0*mu*gamma
      - sqrt(2.0/3.0)*(kin(next_alpha) - kin(pca));
  Dg = -2.0*mu*(1.0 + (diso(ys,K,next_alpha) + dkin(next_alpha))/3.0/mu);
  gamma = gamma - g/Dg;
  next_alpha = next_alpha + sqrt(2.0/3.0)*gamma;
  it_count++;

/* Update Alpha */(*alpha) = next_alpha;

/* Get the flow direction */*flow_direction =sv_mlt(1.0/n_trial_norm,n_trial,VNULL);

/* Update the plastic_strain */vdum =sv_mlt(gamma,*flow_direction,VNULL);


v_add(vdum,*plastic_strain,*plastic_strain);v_free(vdum);

/* Update the stress */vdum = sv_mlt(-2.0*mu*gamma,*flow_direction,VNULL);v_add(s_trial,vdum,*stress); v_free(vdum);(*stress)->ve[0] += kappa*3.0*str_pr;(*stress)->ve[1] += kappa*3.0*str_pr;(*stress)->ve[2] += kappa*3.0*str_pr;

/* Update the consistent ep tangent moduli parameters */
*theta = 1.0 - 2.0*mu*gamma/n_trial_norm;
/* *theta_bar = 1.0/(1.0+(diso(ys,K,*alpha)+dkin(*alpha))/3.0/mu) - (1.0-(*theta)); */
*theta_bar = 1.0/(1.0+(diso(ys,K,pca)+dkin(*alpha))/3.0/mu) - (1.0-(*theta));

/* Clean up some intermediate variable */v_free(el_strain); v_free(s_dev);v_free(s_trial); v_free(n_trial);
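int_con() recovers the plastic consistency parameter gamma by Newton iteration on the yield residual g. A minimal scalar sketch of that radial-return iteration is given below; it assumes linear isotropic hardening (iso = sy + k*alpha) and no kinematic hardening, so a closed-form value is available as a check. All numerical values are illustrative only and are not taken from the thesis.

#include <stdio.h>
#include <math.h>

int main( void )
{
    double mu = 80.0e9;            /* shear modulus                         */
    double sy = 250.0e6;           /* initial yield stress                  */
    double k  = 1.0e9;             /* linear isotropic hardening modulus    */
    double n_trial_norm = 0.30e9;  /* norm of the trial relative stress     */
    double alpha_n = 0.0;          /* previously converged hardening value  */

    double gamma = 0.0, g, Dg, alpha, f_trial;
    int it;

    for ( it = 0; it < 26; it++ ) {
        alpha = alpha_n + sqrt(2.0/3.0)*gamma;
        g = n_trial_norm - 2.0*mu*gamma - sqrt(2.0/3.0)*( sy + k*alpha );
        if ( fabs( g ) < 1.0e-10*sy ) break;
        Dg = -2.0*mu - (2.0/3.0)*k;            /* dg/dgamma for linear hardening */
        gamma -= g/Dg;
    }

    /* closed-form value for linear hardening, as a consistency check */
    f_trial = n_trial_norm - sqrt(2.0/3.0)*( sy + k*alpha_n );
    printf( "Newton gamma      = %e\n", gamma );
    printf( "closed-form gamma = %e\n", f_trial/( 2.0*mu + (2.0/3.0)*k ) );
    return 0;
}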

MAT *get_con_tan(VEC *gpt, MAT *mvec, VEC *pvec, VEC *n,
                 double theta, double theta_bar)
int i,j,matnum;
double ym,nu,kappa,mu;
VEC *iv;
MAT *IM,*IVO,*NO,*MAT1,*MAT2,*SM,*CTAN;

matnum = get_mat_num(gpt,(int)pvec->ve[7]);ym = mvec->me[matnum][0];nu = mvec->me[matnum][1];

kappa = ym/3.0/(1.0-2.0*nu);mu = ym/2.0/(1.0+nu);

/* Build an Identity Matrix */IM = m_get(6,6);for(i=0;i<6;i++)IM->me[i][i] = 1.0;

/* Build the Identity Vector */iv = v_get(6);for(i=0;i<3;i++)iv->ve[i] = 1.0;

/* Build the Final Factor Matrix */SM = m_get(6,6);SM->me[0][0] = 1.0;SM->me[1][1] = 1.0;SM->me[2][2] = 1.0;SM->me[3][3] = 0.5;SM->me[4][4] = 0.5;SM->me[5][5] = 0.5;

/* Build the Outer Product Matrices of'iv' and 'n' */IVO = m_get(6,6);

NO = m_get(6,6);

for(i=0;i<6;i++)for(j=0;j<6;j++)IVO->me[i][j] = (iv->ve[i])*(iv->ve[j]);NO->me[i][j] = (n->ve[i])*(n->ve[j]);

/* Build the Final Matrix */MAT1 = sm_mlt(kappa-2.0*mu*theta/3.0,IVO,MNULL);MAT2 = sm_mlt(2.0*mu*theta,IM,MNULL);

MAT2->me[3][3] = 0.5*MAT2->me[3][3];MAT2->me[4][4] = 0.5*MAT2->me[4][4];MAT2->me[5][5] = 0.5*MAT2->me[5][5];

CTAN = sm_mlt(-2.0*mu*theta_bar,NO,MNULL);m_add(CTAN,MAT1,CTAN);m_add(CTAN,MAT2,CTAN);

/* Free up the variables */m_free(MAT1); m_free(MAT2); m_free(SM);m_free(IVO); m_free(NO);m_free(IM); v_free(iv);

return(CTAN);

int get_mat_num(VEC *gpt,int flag)int matnum;double x,y,z,f,r;

x = gpt->ve[0];y = gpt->ve[1];z = gpt->ve[2];

switch(flag)
case 0: matnum = 0; break;
case 1:
  r = 2.0; f = pow(x-0.2*z+1.0,2) + pow(y-0.2*z+1.0,2) - pow(r,2);
  if(f>0.0) /* Bone */ matnum = 0; else /* Steel */ matnum = 1; break;
case 2:
  r = 0.5 + z/5.0; f = pow(x-0.5*z+3.5,2) + pow(y-0.5*z+3.5,2) - pow(r,2);
  if(f>0.0) /* Bone */ matnum = 0; else /* Steel */ matnum = 1; break;

/* printf("matnum is %d\n",matnum); */return(matnum);


File name: mpi_mes.c #include "parmes.h"#define TOLERANCE 1.00e-12

/****************************************************************************//*************** INTERNAL USE ONLY*******************//****************************************************************************/

/* Plain Meschach Vector Transfer */
extern int MPI_Send_vector( VEC **x, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_vector( VEC **x, int source, int tag, MPI_Comm comm, MPI_Status *status );
extern int MPI_Bcast_vector( VEC **x, int root, MPI_Comm comm );

/* Plain Meschach Matrix Transfer */
extern int MPI_Send_mes_matrix( MAT **A, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_mes_matrix( MAT **A, int source, int tag, MPI_Comm comm, MPI_Status *status );

/* Non-symmetric CRS transfer */
extern int m_get_nnz( MAT *A );
extern int m_create_crs( MAT *A, int **rp, int **cval, double **aval );
extern int MPI_Send_crs_matrix( MAT **A, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_crs_matrix( MAT **A, int source, int tag, MPI_Comm comm, MPI_Status *status );
extern int MPI_Bcast_crs_matrix( MAT **A, int root, MPI_Comm comm );
extern int MPI_Gather_crs_matrix( MAT **A, int root, MPI_Comm comm );

/* Symmetric CRS transfer:
 * Transfer & Receive only the LOWER triangular matrix;
 * the UPPER triangular matrix must be CREATED by the USER */
extern int m_get_sym_nnz( MAT *A, double *max );
extern double m_get_max_elmt( MAT *A );
extern int m_create_sym_crs( MAT *A, int **rp, int **cval, double **aval );
extern int MPI_Send_sym_crs_matrix( MAT **A, int dest, int tag, MPI_Comm comm );
extern int MPI_Recv_sym_crs_matrix( MAT **A, int source, int tag, MPI_Comm comm, MPI_Status *status );

/****************************************************************************//*************** EXTERNAL ACCESS*******************//****************************************************************************/

int MPI_Send_vector( VEC **x, int dest, int tag, MPI_Comm comm )
unsigned int vec_size;

vec_size = (*x)->dim;MPI_Send( &vec_size, 1, MPI_UNSIGNED,dest, tag, comm );MPI_Send( (*x)->ve, vec_size, MPI_DOUBLE,dest, tag + 1, comm );

return 0;

int MPI_Recv_vector( VEC **x, int source, int tag, MPI_Comm comm, MPI_Status *status )
unsigned int vec_size;

MPI_Recv( &vec_size, 1, MPI_UNSIGNED,source, tag, comm, status );*x = v_get( vec_size );MPI_Recv( (*x)->ve, vec_size, MPI_DOUBLE,source, tag + 1, comm, status );

return 0;

int MPI_Bcast_vector( VEC **x, int root,MPI_Comm comm )int myid;unsigned int vec_size;

MPI_Comm_rank( comm, &myid );

if ( myid == root ) vec_size = (*x)->dim;MPI_Bcast( &vec_size, 1, MPI_UNSIGNED,root, comm );if ( myid != root ) *x = v_get( vec_size);MPI_Bcast( (*x)->ve, vec_size, MPI_DOUBLE,root, comm );

return 0;
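MPI_Send_vector, MPI_Recv_vector and MPI_Bcast_vector all follow the same two-step protocol: the length is transferred first so the receiving side can allocate storage before the data transfer. The sketch below (plain C arrays rather than Meschach vectors, not part of the thesis code) shows the broadcast variant of that pattern with standard MPI calls.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char **argv )
{
    int myid, root = 0;
    unsigned int i, n = 0;
    double *v = NULL;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );

    if ( myid == root ) {                        /* the root owns the data      */
        n = 5;
        v = (double *) malloc( n * sizeof(double) );
        for ( i = 0; i < n; i++ ) v[i] = (double) i;
    }

    MPI_Bcast( &n, 1, MPI_UNSIGNED, root, MPI_COMM_WORLD );   /* step 1: size   */
    if ( myid != root ) v = (double *) malloc( n * sizeof(double) );
    MPI_Bcast( v, (int) n, MPI_DOUBLE, root, MPI_COMM_WORLD );/* step 2: data   */

    printf( "rank %d: v[n-1] = %g\n", myid, v[n-1] );

    free( v );
    MPI_Finalize();
    return 0;
}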

int MPI_Gather_vector( VEC **x, int root,MPI_Comm comm )int myid, np;int i;VEC **buffer;MPI_Status status;

MPI_Comm_rank( comm, &myid );if ( myid == root ) /* get the number of processes */MPI_Comm_size( comm, &np );


/* allocate an array of pointers for VEC buffers */
buffer = (VEC **) malloc( (np-1)*sizeof(VEC *) );
/* initialize the buffers */
for ( i = 0; i < np-1; i++ ) buffer[i] = v_get( (*x)->dim );
/* gather the vectors from the other processes */
for ( i = 1; i < np; i++ )
  MPI_Recv_vector( &buffer[i-1], i, i, comm, &status );
/* add the vectors to the root's vector */
for ( i = 1; i < np; i++ )
  *x = v_add( *x, buffer[i-1], VNULL );
  v_free( buffer[i-1] );
/* free the allocated array of pointers */
free( buffer );
else
  MPI_Send_vector( x, root, myid, comm );

return 0;

int MPI_Send_matrix( MAT **A, int dest, inttag, MPI_Comm comm )/* Uncompressed Transmission */// return MPI_Send_mes_matrix( A, dest,tag, comm );

/* Compressed Transmission */return MPI_Send_crs_matrix( A, dest, tag,comm );

int MPI_Recv_matrix( MAT **A, int source,int tag, MPI_Comm comm,

MPI_Status *status )/* Uncompressed Transmission */// return MPI_Recv_mes_matrix( A, source,tag, comm, status );

/* Compressed Transmission */return MPI_Recv_crs_matrix( A, source,tag, comm, status );

int MPI_Bcast_matrix( MAT **A, int root,MPI_Comm comm )/* Compressed Transmission */return MPI_Bcast_crs_matrix( A, root, comm);

int MPI_Gather_matrix( MAT **A, int root,MPI_Comm comm )/* Compressed Transmission */return MPI_Gather_crs_matrix( A, root,comm );

int MPI_Send_sym_matrix( MAT **A, int dest,int tag, MPI_Comm comm )return MPI_Send_sym_crs_matrix( A, dest,tag, comm );

int MPI_Recv_sym_matrix( MAT **A, intsource, int tag, MPI_Comm comm,

MPI_Status *status )return MPI_Recv_sym_crs_matrix( A, source,tag, comm, status );

int MPI_Gather_sym_matrix( MAT **A, int root, MPI_Comm comm )
int i, j;

MPI_Gather_crs_matrix( A, root, comm );for ( i = 0; i < (int)(*A)->m; i++ ) for ( j = 0; j <= i; j++ ) (*A)->me[ j ][ i ] = (*A)->me[ i ][ j ];

return 0;

/****************************************************************************//*************** SIMPLE MESCHACH TRANSFER*******************//****************************************************************************/

int MPI_Send_mes_matrix( MAT **A, int dest,int tag, MPI_Comm comm )unsigned int i;unsigned int num_rows, num_cols;

num_rows = (*A)->m;num_cols = (*A)->n;

MPI_Send( &num_rows, 1, MPI_UNSIGNED,dest, tag++, comm );MPI_Send( &num_cols, 1, MPI_UNSIGNED,dest, tag++, comm );

for ( i = 0; i < num_rows; i++ ) MPI_Send( (*A)->me[ i ], num_cols,

MPI_DOUBLE, dest, tag++, comm );

return 0;

int MPI_Recv_mes_matrix( MAT **A, intsource, int tag, MPI_Comm comm,

MPI_Status *status )unsigned int i;


unsigned int num_rows, num_cols;

MPI_Recv( &num_rows, 1, MPI_UNSIGNED,source, tag++, comm, status );MPI_Recv( &num_cols, 1, MPI_UNSIGNED,source, tag++, comm, status );

(*A) = m_get( num_rows, num_cols );

for ( i = 0; i < num_rows; i++ ) MPI_Recv( (*A)->me[ i ], num_cols,

MPI_DOUBLE, source, tag++,comm, status );

return 0;

/****************************************************************************//*************** NON-SYMMETRIC CRSTRANSFER *******************//****************************************************************************/

int m_get_nnz( MAT *A )

/* ==========================
 * Get the number of non-zeros
 * ==========================*/

int i, j, count = 0;

for ( i = 0; i < (int) A->m; i++ ) for ( j = 0; j < (int) A->n; j++ ) if ( A->me[ i ][ j ] != 0.00 ) count++;

return count;

int m_create_crs( MAT *A, int **rp, int**cval, double **aval )

/*=================================================================
 * Converts a Meschach matrix into the CRS
 * (Compressed Row Storage) format
 *=================================================================
 * *rp   = address of the first element in the caller's rp vector
 * *cval = address of the first element in the caller's cval vector
 * *aval = address of the first element in the caller's aval vector
 *=================================================================*/

/** Variable declaration* --------------------* nnz = number of non-zeros*/

int i, j, nnz;int count = 0;

/** Create the row pointer, the columnpointer and the elements vector* ------------------------------------------------------------------*/

/* calculate the number of total non-zeros*/nnz = m_get_nnz( A );

/* allocate the memory */*rp = (int *) malloc( ( A->m + 1 ) *sizeof( int ) );*cval = (int *) malloc( nnz * sizeof( int) );*aval = (double *) malloc( nnz * sizeof(double ) );

/* assign the values */for ( i = 0; i < (int) A->m; i++ ) (*rp)[ i ] = count;for ( j = 0; j < (int) A->n; j++ ) if ( A->me[ i ][ j ] != 0.00 ) (*cval)[ count ] = j;(*aval)[ count ] = A->me[ i ][ j ];count++;(*rp)[ A->m ] = count++;

return nnz;
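m_create_crs() stores only the non-zero entries: rp[i] indexes the first stored entry of row i, cval holds the column numbers and aval the values. The following standalone sketch (plain C arrays, not Meschach, and not part of the thesis code) applies the same conversion to a small matrix and then walks the CRS arrays to recover the entries.

#include <stdio.h>

#define NR 3
#define NC 4

int main( void )
{
    double A[NR][NC] = { { 4.0, 0.0, 0.0, 1.0 },
                         { 0.0, 3.0, 0.0, 0.0 },
                         { 2.0, 0.0, 5.0, 0.0 } };
    int    rp[NR+1], cval[NR*NC];
    double aval[NR*NC];
    int i, j, count = 0;

    for ( i = 0; i < NR; i++ ) {
        rp[i] = count;                       /* first stored entry of row i */
        for ( j = 0; j < NC; j++ )
            if ( A[i][j] != 0.0 ) {
                cval[count] = j;
                aval[count] = A[i][j];
                count++;
            }
    }
    rp[NR] = count;                          /* total number of non-zeros   */

    for ( i = 0; i < NR; i++ )
        for ( j = rp[i]; j < rp[i+1]; j++ )
            printf( "A[%d][%d] = %g\n", i, cval[j], aval[j] );
    return 0;
}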

int MPI_Send_crs_matrix( MAT **A, int dest,int tag, MPI_Comm comm )int param[ 3 ];int *rp, *cval;double *aval;

param[ 0 ] = (int) (*A)->m;  //number of rows
param[ 1 ] = (int) (*A)->n;  //number of cols
param[ 2 ] = m_create_crs( *A, &rp, &cval, &aval );  //number of non-zeros

MPI_Send( param, 3, MPI_INTEGER, dest,tag++, comm );MPI_Send( rp, param[0] + 1, MPI_INTEGER,dest, tag++, comm );MPI_Send( cval, param[ 2 ], MPI_INTEGER,dest, tag++, comm );MPI_Send( aval, param[ 2 ], MPI_DOUBLE,dest, tag++, comm );

free( rp );free( cval );


free( aval );

return 0;

int MPI_Recv_crs_matrix( MAT **A, intsource, int tag, MPI_Comm comm,

MPI_Status *status )int j, count;int row, col;int param[ 3 ];int *rp, *cval;double *aval;double val;

/** param[ 0 ] = number of rows* param[ 1 ] = number of cols* param[ 2 ] = number of non-zeros*/

MPI_Recv( param, 3, MPI_INTEGER, source,tag++, comm, status );rp = (int *) malloc( ( param[ 0 ] + 1 ) *sizeof( int ) );cval = (int *) malloc( param[ 2 ] *sizeof( int ) );aval = (double *) malloc( param[ 2 ] *sizeof( double ) );MPI_Recv( rp, param[ 0 ] + 1, MPI_INTEGER,source, tag++, comm, status );MPI_Recv( cval, param[ 2 ], MPI_INTEGER,source, tag++, comm, status );MPI_Recv( aval, param[ 2 ], MPI_DOUBLE,source, tag++, comm, status );(*A) = m_get( param[ 0 ], param[ 1 ] );

count = 0;for ( row = 0; row < param[ 0 ]; row++ ) for ( j = 0; j < rp[ row+1 ] - rp[ row ];

j++ ) col = cval[ count ];val = aval[ count ];(*A)->me[ row ][ col ] = val;count++;

free( rp );free( cval );free( aval );

return 0;

int MPI_Bcast_crs_matrix( MAT **A, int root, MPI_Comm comm )
int myid;int j, count;int row, col;int param[ 3 ];int *rp, *cval;double *aval;double val;

/** param[ 0 ] = number of rows

* param[ 1 ] = number of cols* param[ 2 ] = number of non-zeros*/

MPI_Comm_rank( comm, &myid );

if ( myid == root )
  /*
   * if I am the root, convert the matrix to CRS
   */
  param[ 0 ] = (int) (*A)->m;  //number of rows
  param[ 1 ] = (int) (*A)->n;  //number of cols
  param[ 2 ] = m_create_crs( *A, &rp, &cval, &aval );  //number of non-zeros

MPI_Bcast( param, 3, MPI_INTEGER, root, comm );

if ( myid != root )
  /*
   * if I am not the root, allocate the memory for the CRS storage
   */
  rp = (int *) malloc( ( param[ 0 ] + 1 ) * sizeof( int ) );
  cval = (int *) malloc( param[ 2 ] * sizeof( int ) );
  aval = (double *) malloc( param[ 2 ] * sizeof( double ) );

MPI_Bcast( rp, param[0] + 1, MPI_INTEGER, root, comm );
MPI_Bcast( cval, param[ 2 ], MPI_INTEGER, root, comm );
MPI_Bcast( aval, param[ 2 ], MPI_DOUBLE, root, comm );

if ( myid != root )
  /*
   * if I am not the root, convert the matrix back to Meschach format.
   */
  (*A) = m_get( param[ 0 ], param[ 1 ] );

  count = 0;
  for ( row = 0; row < param[ 0 ]; row++ )
    for ( j = 0; j < rp[ row+1 ] - rp[ row ]; j++ )
      col = cval[ count ];
      val = aval[ count ];
      (*A)->me[ row ][ col ] = val;
      count++;

free( rp ); free( cval ); free( aval );

return 0;

int MPI_Gather_crs_matrix( MAT **A, int root, MPI_Comm comm )


int myid, np;int i;MAT **buffer;MPI_Status status;

MPI_Comm_rank( comm, &myid );

if ( myid == root )
  /* get the number of processes */
  MPI_Comm_size( comm, &np );
  /* allocate an array of pointers for MAT buffers */
  buffer = (MAT **) malloc( (np-1)*sizeof(MAT *) );
  /* initialize the buffers */
  for ( i = 0; i < np-1; i++ ) buffer[i] = m_get( (*A)->m, (*A)->n );
  /* gather the matrices from the other processes */
  for ( i = 1; i < np; i++ )
    MPI_Recv_crs_matrix( &buffer[i-1], i, i, comm, &status );
  /* add the matrices to the root's matrix */
  for ( i = 1; i < np; i++ )
    *A = m_add( *A, buffer[i-1], MNULL );
    m_free( buffer[i-1] );
  /* free the allocated array of pointers */
  free( buffer );
else
  MPI_Send_crs_matrix( A, root, myid, comm );

return 0;

/****************************************************************************//*************** SYMMETRIC CRS TRANSFER*******************//****************************************************************************/

int m_get_sym_nnz( MAT *A, double *max )

/*===============================================
 * Get the number of non-zeros in a symmetric matrix
 *===============================================*/

int i, j, count = 0;

*max = m_get_max_elmt( A );
for ( i = 0; i < (int) A->m; i++ )
  for ( j = 0; j <= i; j++ )
    if ( ( fabs( A->me[i][j] ) / *max ) > TOLERANCE ) count++;

return count;

double m_get_max_elmt( MAT *A )int i, j;double max;

max = fabs( A->me[0][0] );for ( i = 0; i < (int) A->m; i++ ) for ( j = 0; j <= i; j++ ) if ( max < fabs( A->me[i][j] ) ) max = fabs( A->me[i][j] );

return max;

int m_create_sym_crs( MAT *A, int **rp, int**cval, double **aval )

/*=================================================================
 * Converts a symmetric Meschach matrix (lower triangle only)
 * into the CRS (Compressed Row Storage) format
 *=================================================================
 * *rp   = address of the first element in the caller's rp vector
 * *cval = address of the first element in the caller's cval vector
 * *aval = address of the first element in the caller's aval vector
 *=================================================================*/

/** Variable declaration* --------------------* nnz = number of non-zeros*/

int i, j, nnz;int count = 0;double max;

/** Create the row pointer, the columnpointer and the elements vector* ------------------------------------------------------------------*/

/* calculate the number of total non-zerosin the symmetric matrix */nnz = m_get_sym_nnz( A, &max );

/* allocate the memory */*rp = (int *) malloc( ( A->m + 1 ) *sizeof( int ) );


*cval = (int *) malloc( nnz * sizeof( int) );*aval = (double *) malloc( nnz * sizeof(double ) );

/* assign the values */
for ( i = 0; i < (int) A->m; i++ )
  (*rp)[i] = count;
  for ( j = 0; j <= i; j++ )
    if ( ( fabs( A->me[i][j] ) / max ) > TOLERANCE )
      (*cval)[count] = j; (*aval)[count] = A->me[i][j]; count++;
(*rp)[ A->m ] = count++;

return nnz;

int MPI_Send_sym_crs_matrix( MAT **A, intdest, int tag, MPI_Comm comm )int param[3];int *rp, *cval;double *aval;

param[0] = (int) (*A)->m;  //number of rows
param[1] = (int) (*A)->n;  //number of cols
param[2] = m_create_sym_crs( *A, &rp, &cval, &aval );  //number of non-zeros

MPI_Send( param, 3, MPI_INTEGER, dest,tag++, comm );MPI_Send( rp, param[0]+1, MPI_INTEGER,dest, tag++, comm );MPI_Send( cval, param[2], MPI_INTEGER,dest, tag++, comm );MPI_Send( aval, param[2], MPI_DOUBLE,dest, tag++, comm );

free( rp );free( cval );free( aval );

return 0;

int MPI_Recv_sym_crs_matrix( MAT **A, intsource, int tag, MPI_Comm comm,

MPI_Status *status )int j, count;int row, col;int param[3];int *rp, *cval;double *aval;double val;

/** param[0] = number of rows* param[1] = number of cols* param[2] = number of non-zeros*/

MPI_Recv( param, 3, MPI_INTEGER, source,tag++, comm, status );

rp = (int *) malloc( ( param[0]+1 ) *sizeof( int ) );cval = (int *) malloc( param[2] * sizeof(int ) );aval = (double *) malloc( param[2] *sizeof( double ) );MPI_Recv( rp, param[0]+1, MPI_INTEGER,source, tag++, comm, status );MPI_Recv( cval, param[2], MPI_INTEGER,source, tag++, comm, status );MPI_Recv( aval, param[2], MPI_DOUBLE,source, tag++, comm, status );(*A) = m_get( param[0], param[1] );

count = 0;for ( row = 0; row < param[ 0 ]; row++ ) for ( j = 0; j < rp[ row+1 ] - rp[ row ];

j++ ) col = cval[ count ];val = aval[ count ];(*A)->me[ row ][ col ] = val;

/*
 * Uncomment the line below to create
 * the full (UPPER+LOWER triangular) matrix
 */
//(*A)->me[ col ][ row ] = val;

count++;

free( rp );free( cval );free( aval );

return 0;

File name: norms.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

double l1norm(VEC *p1,VEC *p2)VEC *diff;double d;

diff = v_sub(p1,p2,VNULL);d = v_norm1(diff);

/* Deallocate the Matrices */v_free(diff);

/* Return From Function */return(d);

double l2norm(VEC *p1,VEC *p2)VEC *diff;double d;


diff = v_sub(p1,p2,VNULL);d = v_norm2(diff);

/* Deallocate the Matrices */v_free(diff);

/* Return From Function */return(d);

double distance(VEC *p1,VEC *p2)VEC *diff;double d;

diff = v_sub(p1,p2,VNULL);d = v_norm2(diff);

/* Deallocate the Matrices */v_free(diff);

/* Return From Function */return(d);

double linfnorm(VEC *p1,VEC *p2)VEC *diff;double d;

diff = v_sub(p1,p2,VNULL);d = v_norm_inf(diff);

/* Deallocate the Matrices */v_free(diff);

/* Return From Function */return(d);

File name: output.c* #include <stdio.h>#include <stdlib.h>#include <string.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "constants.h"#include "util.h"

int output_disp(FILE *fin,FILE *fout,MAT*NODES,VEC *DISP,

int i_load,int itcount)/* Loop indices and counters */int i;

fprintf(fout,"\n *** NODAL DISPLACEMENTS%d,%d ***\n",i_load,itcount);fprintf(fout,"%4s %15s %15s%15s\n","NODE","UX","UY","UZ");for(i=0;i<(int)(NODES->m);i++)fprintf(fout,"%4d %15.5e %15.5e%15.5e\n",i,

DISP->ve[i*3+0],DISP->ve[i*3+1],

DISP->ve[i*3+2]);

/* Successfully exit this function */return(1);

int output(FILE *fin,FILE *fout,VEC*pvec,VEC *evec,MAT *mvec,

MAT *NODES,VEC *DISP,MAT *PTS,MAT*STRESSES,MAT *STRAINS,

MAT *PLASTIC_STRAIN,MAT *BACK_STRESS,VEC*ALPHA)/* Successfully exit this function */return(1);

File name: parallel_gauss.c #include "parmes.h"

#define UNMARKED -1#define DONE_NOTHING 0#define DONE_ELIM 1#define DONE_SUBST 2

VEC *parallel_gauss( MPI_Comm comm,MPI_Status *status, MAT *a )double *c, r;int myid, np;int i, j, k, n;int count, len, mark, nrow;int *gr, *done;VEC *soln;

MPI_Comm_rank( comm, &myid );MPI_Comm_size( comm, &np );

n = a->n - 1;nrow = a->m;done = (int *) malloc( nrow * sizeof( int) );gr = (int *) malloc( nrow * sizeof( int ));

/* Status Identifiers Initialization */for ( i = 0; i < nrow; i++ ) done[i] = DONE_NOTHING;gr[i] = i*np + myid;

/* Forward Elimination Step */for ( k = 0; k < n; k++ )

/* Allocate the memory for the c vector */
len = n-k+1;
c = (double *) malloc( len * sizeof( double ) );

/* Detect the division step */
mark = UNMARKED;
for ( i = 0; i < nrow; i++ ) if ( gr[i] == k ) mark = i;

/* Perform the division step */
if ( mark > UNMARKED )

  /* divide the elements */
  for ( j = k+1; j < n+1; j++ )
    a->me[mark][j] = a->me[mark][j] / a->me[mark][k];
  a->me[mark][k] = 1.00;

  /* mark this row as done */
  done[mark] = DONE_ELIM;

  /* assign this row to c vector */
  for ( j = 0; j < len; j++ ) c[j] = a->me[mark][j+k];

/* Synchronize the c vector */
MPI_Bcast( c, len, MPI_DOUBLE, k % np, comm );

/* Perform the elimination step */
for ( i = 0; i < nrow; i++ )
  if ( done[i] != DONE_ELIM )
    count = 1;
    for ( j = k+1; j < n+1; j++ )
      a->me[i][j] = a->me[i][j] - a->me[i][k]*c[count++];
    a->me[i][k] = 0.00;

/* Free the c vector */
free( c );

/* Backward Substitution Step */
soln = get_col( a, n, VNULL );
for ( k = n-1; k >= 0; k-- )

  /* Detect the finalization step */
  mark = UNMARKED;
  for ( i = 0; i < nrow; i++ ) if ( gr[i] == k ) mark = i;

  /* Perform the finalization step */
  if ( mark > UNMARKED )

    /* mark this row as done */
    done[mark] = DONE_SUBST;

    /* assign this value to r */
    r = soln->ve[mark];

  /* Synchronize r */
  MPI_Bcast( &r, 1, MPI_DOUBLE, k % np, comm );

  /* Perform the substitution step */
  for ( i = 0; i < nrow; i++ )
    if ( done[i] != DONE_SUBST )
      soln->ve[i] = soln->ve[i] - a->me[i][k] * r;

return soln;
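parallel_gauss() interleaves a division step on the pivot row, a broadcast of that row, and an elimination step on the locally owned rows, followed by a distributed back substitution. For reference, the serial sketch below (not from the thesis) performs the same unpivoted elimination and back substitution on a small augmented system.

#include <stdio.h>

#define N 3

int main( void )
{
    /* augmented system [A|b]:  2x+y=5,  x+3y+z=10,  y+2z=7 */
    double a[N][N+1] = { { 2.0, 1.0, 0.0,  5.0 },
                         { 1.0, 3.0, 1.0, 10.0 },
                         { 0.0, 1.0, 2.0,  7.0 } };
    double x[N];
    int i, j, k;

    /* forward elimination without pivoting, as in the routine above */
    for ( k = 0; k < N; k++ ) {
        for ( j = k+1; j < N+1; j++ ) a[k][j] /= a[k][k];   /* division step    */
        a[k][k] = 1.0;
        for ( i = k+1; i < N; i++ ) {                        /* elimination step */
            for ( j = k+1; j < N+1; j++ ) a[i][j] -= a[i][k]*a[k][j];
            a[i][k] = 0.0;
        }
    }

    /* back substitution */
    for ( i = N-1; i >= 0; i-- ) {
        x[i] = a[i][N];
        for ( j = i+1; j < N; j++ ) x[i] -= a[i][j]*x[j];
    }

    for ( i = 0; i < N; i++ ) printf( "x[%d] = %g\n", i, x[i] );  /* 1.5, 2, 2.5 */
    return 0;
}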

File name: parefg_main.c #include "plefg.h"#include "parmes.h"#include "master.h"#include "worker.h"

int main( int argc, char **argv )int myid, np;

MPI_Init( &argc, &argv );
MPI_Comm_rank( COMM, &myid ); //Get process ID
MPI_Comm_size( COMM, &np );   //Get number of total processes

if ( np < 2 )
  printf( "\nError in starting up the program: There must be at least TWO processes!\n\n" );
  exit( 1 );

if ( myid == MASTER )
  printf( "\n\n" );
  printf( "***************************************************************\n" );
  printf( "***    Parallel PLastic Element-Free Galerkin (ParPLEFG)    ***\n" );
  printf( "***************************************************************\n" );
  printf( "*  Original serial code by Dr. William J. Barry              *\n" );
  printf( "*  Parallel code by Thiti Vacharasintopchai                  *\n" );
  printf( "*  February 2000                                             *\n" );
  printf( "***************************************************************\n" );
  printf( "\n" );
  master( COMM, &argc, &argv );
  printf( "\nFINISHED!\n\n" );
else
  worker( COMM );

MPI_Finalize();

return 0;

File name: post.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>


#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "material.h"#include "shapes.h"#include "efg_stiff.h"#include "iscan.h"

double igestr(VEC *sig,int pcode,double nu)double eff_stress,pressure,sigz;VEC *dev_stress;

switch(pcode)
case 1:
  pressure = (sig->ve[0] + sig->ve[1])/3.0;
  dev_stress = v_get(3);
  dev_stress->ve[0] = sig->ve[0] - pressure;
  dev_stress->ve[1] = sig->ve[1] - pressure;
  dev_stress->ve[2] = sig->ve[2];
  break;

case 2:
  sigz = nu*(sig->ve[0] + sig->ve[1]);
  pressure = (sig->ve[0] + sig->ve[1] + sigz)/3.0;
  dev_stress = v_get(4);
  dev_stress->ve[0] = sig->ve[0] - pressure;
  dev_stress->ve[1] = sig->ve[1] - pressure;
  dev_stress->ve[2] = sigz - pressure;
  dev_stress->ve[3] = sig->ve[2];
  break;

case 3:
  break;

case 4:
  pressure = (sig->ve[0] + sig->ve[1] + sig->ve[2])/3.0;
  dev_stress = v_get(6);
  dev_stress->ve[0] = sig->ve[0] - pressure;
  dev_stress->ve[1] = sig->ve[1] - pressure;
  dev_stress->ve[2] = sig->ve[2] - pressure;
  dev_stress->ve[3] = sig->ve[3];
  dev_stress->ve[4] = sig->ve[4];
  dev_stress->ve[5] = sig->ve[5];
  break;

eff_stress = sqrt(1.5 * in_prod(dev_stress,dev_stress));
v_free(dev_stress);
return(eff_stress);

VEC *tstr(VEC *pt,VEC *sig)int ndim;double x,y,d,c,s;VEC *tsig;MAT *A,*CART,*CYL,*MDUM;

ndim = pt->dim;

x = pt->ve[0];y = pt->ve[1];d = sqrt(x*x+y*y);c = x/d; s = y/d;

A = m_get(3,3);A->me[0][0] = c;A->me[0][1] = s;A->me[1][0] = -s;A->me[1][1] = c;A->me[2][2] = 1.0;

CART = m_get(3,3);switch(ndim)case 2:CART->me[0][0] = sig->ve[0];CART->me[0][1] = sig->ve[2];CART->me[0][2] = 0.0;CART->me[1][0] = sig->ve[2];CART->me[1][1] = sig->ve[1];CART->me[1][2] = 0.0;CART->me[2][0] = 0.0;CART->me[2][1] = 0.0;CART->me[2][2] = 0.0;break;

case 3:CART->me[0][0] = sig->ve[0];CART->me[0][1] = sig->ve[3];CART->me[0][2] = sig->ve[4];CART->me[1][0] = sig->ve[3];CART->me[1][1] = sig->ve[1];CART->me[1][2] = sig->ve[5];CART->me[2][0] = sig->ve[4];CART->me[2][1] = sig->ve[5];CART->me[2][2] = sig->ve[2];break;

MDUM = mmtr_mlt(CART,A,MNULL);CYL = m_mlt(A,MDUM,MNULL);

switch(ndim)case 2:tsig = v_get(3);tsig->ve[0] = CYL->me[0][0];tsig->ve[1] = CYL->me[1][1];tsig->ve[2] = CYL->me[0][1];break;

case 3:tsig = v_get(6);tsig->ve[0] = CYL->me[0][0];tsig->ve[1] = CYL->me[1][1];tsig->ve[2] = CYL->me[2][2];tsig->ve[3] = CYL->me[0][1];tsig->ve[4] = CYL->me[0][2];tsig->ve[5] = CYL->me[1][2];break;

m_free(MDUM); m_free(CYL); m_free(CART),m_free(A);

return(tsig);
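tstr() rotates the Cartesian stress matrix into the cylindrical frame of the evaluation point through CYL = A * CART * A^T (mmtr_mlt forms the product with the transposed second factor). The plain-array sketch below (not part of the thesis code) carries out the same rotation for an illustrative 3x3 stress state.

#include <stdio.h>
#include <math.h>

static void mat3_mult( const double a[3][3], const double b[3][3], double c[3][3] )
{
    int i, j, k;
    for ( i = 0; i < 3; i++ )
        for ( j = 0; j < 3; j++ ) {
            c[i][j] = 0.0;
            for ( k = 0; k < 3; k++ ) c[i][j] += a[i][k]*b[k][j];
        }
}

int main( void )
{
    double x = 1.0, y = 1.0;                     /* evaluation point */
    double d = sqrt( x*x + y*y ), c = x/d, s = y/d;
    double A[3][3]    = { {  c, s, 0.0 }, { -s, c, 0.0 }, { 0.0, 0.0, 1.0 } };
    double At[3][3]   = { {  c,-s, 0.0 }, {  s, c, 0.0 }, { 0.0, 0.0, 1.0 } };
    double CART[3][3] = { { 100.0, 25.0, 0.0 }, { 25.0, 50.0, 0.0 }, { 0.0, 0.0, 10.0 } };
    double TMP[3][3], CYL[3][3];
    int i, j;

    mat3_mult( CART, At, TMP );   /* CART * A^T          */
    mat3_mult( A, TMP, CYL );     /* A * (CART * A^T)    */

    for ( i = 0; i < 3; i++ ) {
        for ( j = 0; j < 3; j++ ) printf( "%10.3f ", CYL[i][j] );
        printf( "\n" );
    }
    return 0;
}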


VEC *igstrain(VEC *x,MAT *NODES,VEC *disp,VEC *evec,MAT **B, int gcount,MAT **GPDATA)
int i,M,nod,ndim;
VEC *phix,*phiy,*phiz,*index,*u,*eps;

ndim = x->dim;

/* igdshp(x,NODES,param,order,weight_type,search_type,&LIST,INDEX,&soi,&phix,&phiy,&phiz); */

phix = get_col(GPDATA[gcount],0,VNULL);phiy = get_col(GPDATA[gcount],1,VNULL);phiz = get_col(GPDATA[gcount],2,VNULL);index = get_col(GPDATA[gcount],3,VNULL);

M = phix->dim;

switch(ndim)case 2:*B = bmat2d(phix,phiy);break;

case 3:*B = bmat3d(phix,phiy,phiz);break;

/* Build the vector of displacements u */u = v_get(ndim*M);for(i=0;i<M;i++)nod = (int) (index->ve[i]);u->ve[i*ndim+0] = disp->ve[nod*ndim+0];u->ve[i*ndim+1] = disp->ve[nod*ndim+1];if(ndim==3)u->ve[i*ndim+2] = disp->ve[nod*ndim+2];

/* Compute the stresses */eps = mv_mlt(*B,u,VNULL);

/* Free up the intermediate variables */v_free(phix); v_free(phiy); v_free(u);v_free(index);if(ndim==3)v_free(phiz);

return(eps);

MAT *igstress(VEC *x,MAT *NODES,VEC*disp,VEC *pvec,VEC *evec,MAT *mvec)int order,search_type,weight_type;double param;int i,M,nod;VEC *phix,*phiy,*phiz,*index,*u,*eps,*sig;MAT *B,*D,*LIST,*RES;

order = (int) evec->ve[1];weight_type = (int) evec->ve[2];search_type = (int) evec->ve[3];param = evec->ve[4];

iscan(x,NODES,search_type,param,weight_type,&LIST,&index);

idshp(x,LIST,param,order,weight_type,&phix,&phiy,&phiz);if((int)(pvec->ve[5]) == 1)printf("param %10.5f npts%5d\n",param,(int)(index->dim));

M = phix->dim;

B = bmat3d(phix,phiy,phiz);

/* Build the vector of displacements u */u = v_get(3*M);for(i=0;i<M;i++)nod = (int) (index->ve[i]);u->ve[i*3+0] = disp->ve[nod*3+0];u->ve[i*3+1] = disp->ve[nod*3+1];u->ve[i*3+2] = disp->ve[nod*3+2];

/* Compute the strain */eps = mv_mlt(B,u,VNULL);

/* Compute the Stress */D = hooke(4,mvec->me[0][0],mvec->me[0][1]);sig = mv_mlt(D,eps,VNULL);

/* Pack the Results */RES = m_get(2,6);set_row(RES,0,eps);set_row(RES,1,sig);

/* Free up the intermediate variables */v_free(phix); v_free(phiy); v_free(u);v_free(index);v_free(phiz); m_free(B); m_free(LIST);m_free(D); v_free(eps); v_free(sig);

return(RES);

VEC *igdsp(VEC *x,MAT *NODES,VEC *disp,int order,int search_type, double param,int weight_type)
int M,i,nod,ndim;
VEC *index,*phi,*ux,*uy,*uz,*dsp;
MAT *LIST;

ndim = x->dim;

igshp(x,NODES,param,order,weight_type,search_type,&LIST,&index,&phi);

/* if( search_type == 1 )

iscan(x,NODES,search_type,param,weight_type,&LIST,&index,&soi);phi = ishp(x,LIST,soi,order,weight_type); */

M = phi->dim;ux = v_get(M);


uy = v_get(M);if(ndim == 3)uz = v_get(M);for(i=0;i<M;i++)nod = (int) (index->ve[i]);ux->ve[i] = disp->ve[nod*ndim+0];uy->ve[i] = disp->ve[nod*ndim+1];if(ndim==3)uz->ve[i] = disp->ve[nod*ndim+2];

dsp = v_get(ndim);dsp->ve[0] = in_prod(phi,ux);dsp->ve[1] = in_prod(phi,uy);if(ndim == 3)dsp->ve[2] = in_prod(phi,uz);

/* Free the intermediate variables */v_free(index); v_free(phi); v_free(ux);v_free(uy);if(ndim==3)v_free(uz);m_free(LIST);return(dsp);

File name: post_output.c* #include "plefg.h"

void post_output( FILE *f_out, MAT *DDISP,MAT *DSTRESS )int i;int ndloc;

ndloc = DDISP->m;

/* Print the desired displacements to output file */
fprintf( f_out, "\n***DESIRED DISPLACEMENTS***\n" );
fprintf( f_out, "%15s %15s %15s\n", "UX", "UY", "UZ" );
for ( i = 0; i < ndloc; i++ )
  fprintf( f_out, "%15.5e %15.5e %15.5e\n",
           DDISP->me[ i ][ 0 ], DDISP->me[ i ][ 1 ], DDISP->me[ i ][ 2 ] );

/* Print the stresses at the desired displacement locations to output file */
fprintf( f_out, "\n***DESIRED STRESSES***\n" );
fprintf( f_out, "%15s %15s %15s %15s %15s %15s\n", "S1", "S2", "S3", "S4", "S5", "S6" );
for ( i = 0; i < ndloc; i++ )
  fprintf( f_out, "%15.5e %15.5e %15.5e %15.5e %15.5e %15.5e\n",
           DSTRESS->me[ i ][ 0 ], DSTRESS->me[ i ][ 1 ], DSTRESS->me[ i ][ 2 ],
           DSTRESS->me[ i ][ 3 ], DSTRESS->me[ i ][ 4 ], DSTRESS->me[ i ][ 5 ] );

return;

File name: qclient.c #include <sys/types.h>#include <sys/socket.h>#include <netinet/in.h>#include <errno.h>#include <netdb.h>#include <stdio.h>#include <unistd.h>#include "qcodes.h"

int connect_to_server( const char*server_name, const int port )int sock;struct sockaddr_in server;struct hostent *host_info;

if ( ( sock = socket( AF_INET,SOCK_STREAM, 0 ) ) < 0 ) return -1;

if ( ( host_info = gethostbyname(server_name ) ) == NULL ) return -2;

server.sin_family = host_info->h_addrtype;memcpy( (char *) &server.sin_addr,host_info->h_addr,host_info->h_length );server.sin_port = htons( port );

if ( connect( sock, (struct sockaddr *)&server, sizeof server ) < 0 ) return -3;

return sock;

int send_request( const int sock_fd, const char sock_msg )write( sock_fd, &sock_msg, sizeof sock_msg );return 0;

int send_qdata( const int sock_fd, constint qdata )write( sock_fd, &qdata, sizeof qdata );return 0;

int get_qdata( const int sock_fd )int qdata;read( sock_fd, &qdata, sizeof qdata );return qdata;


int get_num( const int sock_fd )send_request( sock_fd, GET_NUM );return get_qdata( sock_fd );

int set_max_num( const int sock_fd, const int max_cell_num )send_request( sock_fd, RESET_COUNTER );send_request( sock_fd, SET_MAX_NUM );send_qdata( sock_fd, max_cell_num );return 0;

int stop_qserv( const int sock_fd )send_request( sock_fd, TERMINATE );return 0;

File name: setup.c* #include <stdio.h>#include <stdlib.h>#include <string.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "constants.h"

int setup_data(FILE *fin,FILE *fout,int*EQCOUNT,

VEC *params,MAT *CORD,MAT *ICON,MAT*INBCS,

MAT **BCTYPE,MAT **BCVAL,MAT **ENUM)/* Loop indices and counters */int i,j;

int NOD,DIR,TYPE;double VAL;

int NTNODE,NDOF,NDIM,NBCC;

*EQCOUNT = 0;

/* Initialize the BCTYPE and BCVAL arraysto zero tractions */NTNODE = CORD->m;NDOF = (int) (params->ve[2]);*BCTYPE = m_get(NTNODE,NDOF);*BCVAL = m_get(NTNODE,NDOF);for(i=0;i<NTNODE;i++)for(j=0;j<NDOF;j++)(*BCTYPE)->me[i][j] = TRACTION;(*BCVAL)->me[i][j] = ZERO;

/* Go through the input BCS and adjust theBCTYPE AND BCVAL arrays */NBCC = INBCS->m;for(i=0;i<NBCC;i++)

TYPE = (int) (INBCS->me[i][0]);DIR = (int) (INBCS->me[i][1]);NOD = (int) (INBCS->me[i][2]);VAL = INBCS->me[i][3];

if( (TYPE==TRACTION) || (TYPE==DISPLACEMENT) )
  (*BCTYPE)->me[NOD][DIR-1] = TYPE;
  (*BCVAL)->me[NOD][DIR-1] = VAL;
else
  printf("Error: Invalid BCTYPE encountered.\n");
  printf("Error: Current value of TYPE is %d\n",TYPE);
  return(FAIL);

/* Go through each node and number theequations appropriately */*ENUM = m_get(NTNODE,NDOF);for(i=0;i<NTNODE;i++)for(j=0;j<NDOF;j++)TYPE = (int) ( (*BCTYPE)->me[i][j] );if( TYPE == TRACTION )(*ENUM)->me[i][j] = *EQCOUNT;(*EQCOUNT)++;else(*ENUM)->me[i][j] = FIXED;

NDIM = CORD->n;
switch( NDIM )
case 2:
  fprintf(fout,"\n ***NODAL EQUATION NUMBERS***\n");
  fprintf(fout,"\n%4s %10s %10s\n", "NODE","X-DIR","Y-DIR");
  for(i=0;i<NTNODE;i++)
    fprintf(fout,"%4d %10d %10d\n",i,
            (int)(*ENUM)->me[i][0],(int)(*ENUM)->me[i][1]);
  break;
case 3:
  fprintf(fout,"\n ***NODAL EQUATION NUMBERS***\n");
  fprintf(fout,"\n%4s %10s %10s %10s\n", "NODE","X-DIR","Y-DIR","Z-DIR");
  for(i=0;i<NTNODE;i++)
    fprintf(fout,"%4d %10d %10d %10d\n",i,
            (int)(*ENUM)->me[i][0],(int)(*ENUM)->me[i][1],(int)(*ENUM)->me[i][2]);
  break;

/* Successfully exit this function */return(PASS);


File name: shapes.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"

#include "basis.h"#include "sheps.h"#include "weights.h"#include "util.h"#include "iscan.h"#include "isorc.h"

void gshapes(VEC *x,int order,VEC **phi,MAT**dphi)int ndim;double xci,eta,zed;

ndim = x->dim;switch(ndim)case 1:xci = x->ve[0];switch(order)case 1:*phi = v_get(2);*dphi = m_get(2,1);(*phi)->ve[0] = 0.5*(1.0-xci);(*phi)->ve[1] = 0.5*(1.0+xci);(*dphi)->me[0][0] = -0.5;(*dphi)->me[0][1] = 0.5;break;

case 2:printf("1D Second Order Shape Functions

Not Implemented.\n");break;break;

case 2:
  xci = x->ve[0]; eta = x->ve[1];
  switch(order)
  case 1:
    *phi = v_get(4); *dphi = m_get(4,2);
    (*phi)->ve[0] = 0.25*(1.0-xci)*(1.0-eta);
    (*phi)->ve[1] = 0.25*(1.0+xci)*(1.0-eta);
    (*phi)->ve[2] = 0.25*(1.0+xci)*(1.0+eta);
    (*phi)->ve[3] = 0.25*(1.0-xci)*(1.0+eta);
    (*dphi)->me[0][0] = -0.25*(1.0-eta);
    (*dphi)->me[1][0] =  0.25*(1.0-eta);
    (*dphi)->me[2][0] =  0.25*(1.0+eta);
    (*dphi)->me[3][0] = -0.25*(1.0+eta);
    (*dphi)->me[0][1] = -0.25*(1.0-xci);
    (*dphi)->me[1][1] = -0.25*(1.0+xci);
    (*dphi)->me[2][1] =  0.25*(1.0+xci);
    (*dphi)->me[3][1] =  0.25*(1.0-xci);
    break;

case 2:printf("2D Second Order Shape Functions

Not Implemented.\n");break;break;

case 3:xci = x->ve[0];eta = x->ve[1];zed = x->ve[2];switch(order)case 1:*phi = v_get(8);*dphi = m_get(8,3);

(*phi)->ve[0] = 0.125*(1.0-xci)*(1.0-eta)*(1.0-zed);

(*phi)->ve[1] = 0.125*(1.0+xci)*(1.0-eta)*(1.0-zed);

(*phi)->ve[2] =0.125*(1.0+xci)*(1.0+eta)*(1.0-zed);

(*phi)->ve[3] = 0.125*(1.0-xci)*(1.0+eta)*(1.0-zed);

(*phi)->ve[4] = 0.125*(1.0-xci)*(1.0-eta)*(1.0+zed);

(*phi)->ve[5] = 0.125*(1.0+xci)*(1.0-eta)*(1.0+zed);

(*phi)->ve[6] =0.125*(1.0+xci)*(1.0+eta)*(1.0+zed);

(*phi)->ve[7] = 0.125*(1.0-xci)*(1.0+eta)*(1.0+zed);

(*dphi)->me[0][0] = -0.125*(1.0-eta)*(1.0-zed);

(*dphi)->me[1][0] = 0.125*(1.0-eta)*(1.0-zed);

(*dphi)->me[2][0] =0.125*(1.0+eta)*(1.0-zed);

(*dphi)->me[3][0] = -0.125*(1.0+eta)*(1.0-zed);

(*dphi)->me[4][0] = -0.125*(1.0-eta)*(1.0+zed);

(*dphi)->me[5][0] = 0.125*(1.0-eta)*(1.0+zed);

(*dphi)->me[6][0] =0.125*(1.0+eta)*(1.0+zed);

(*dphi)->me[7][0] = -0.125*(1.0+eta)*(1.0+zed);

(*dphi)->me[0][1] = -0.125*(1.0-xci)*(1.0-zed);

(*dphi)->me[1][1] = -0.125*(1.0+xci)*(1.0-zed);

(*dphi)->me[2][1] =0.125*(1.0+xci)*(1.0-zed);

(*dphi)->me[3][1] = 0.125*(1.0-xci)*(1.0-zed);

(*dphi)->me[4][1] = -0.125*(1.0-xci)*(1.0+zed);

(*dphi)->me[5][1] = -0.125*(1.0+xci)*(1.0+zed);

(*dphi)->me[6][1] =0.125*(1.0+xci)*(1.0+zed);

(*dphi)->me[7][1] = 0.125*(1.0-xci)*(1.0+zed);

(*dphi)->me[0][2] = -0.125*(1.0-xci)*(1.0-eta);


(*dphi)->me[1][2] = -0.125*(1.0+xci)*(1.0-eta);

(*dphi)->me[2][2] = -0.125*(1.0+xci)*(1.0+eta);

(*dphi)->me[3][2] = -0.125*(1.0-xci)*(1.0+eta);

(*dphi)->me[4][2] = 0.125*(1.0-xci)*(1.0-eta);

(*dphi)->me[5][2] =0.125*(1.0+xci)*(1.0-eta);

(*dphi)->me[6][2] =0.125*(1.0+xci)*(1.0+eta);

(*dphi)->me[7][2] = 0.125*(1.0-xci)*(1.0+eta);

break;

case 2:printf("3D Second Order Shape Functions

Not Implemented.\n");break;break;
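For the first-order three-dimensional case, gshapes() returns the eight trilinear shape functions 0.125*(1 +/- xci)(1 +/- eta)(1 +/- zed) and their derivatives. The quick standalone check below (not part of the thesis code) confirms that these functions sum to one at an arbitrary point of the parent cube.

#include <stdio.h>

int main( void )
{
    double xci = 0.3, eta = -0.2, zed = 0.7;            /* any point in [-1,1]^3 */
    double sx[2] = { -1.0, 1.0 }, sy[2] = { -1.0, 1.0 }, sz[2] = { -1.0, 1.0 };
    double sum = 0.0;
    int i, j, k;

    /* loop over the eight sign combinations (the eight corner nodes) */
    for ( k = 0; k < 2; k++ )
        for ( j = 0; j < 2; j++ )
            for ( i = 0; i < 2; i++ )
                sum += 0.125*(1.0 + sx[i]*xci)*(1.0 + sy[j]*eta)*(1.0 + sz[k]*zed);

    printf( "sum of the 8 trilinear shape functions = %f\n", sum );  /* ~1.0 */
    return 0;
}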

void ishp(VEC *xs,MAT *xx,double soi,intorder,int weight_type,VEC **phi)int i,M,ising;VEC *pxs,*xi,*s,*g,*vdum,*vdum2;MAT*P,*W,*ID,*V,*V_TRANS,*MDUM,*MDUM1,*U,*A,*B,*C,*IA;

M = xx->m;ising = isorc(xs,xx);pxs = gbasvec(xs,order);P = gpmat(xx,order);

/* Build the diagonal weight matrix */W = m_get(M,M);for(i=0;i<M;i++)xi = get_row(xx,i,VNULL);if(i != ising)W->me[i][i] = iwt(xs,xi,soi,weight_type);v_free(xi);

/* Form an MxM identity matrix */ID = m_get(M,M);m_ident(ID);

/* Form the MxM V matrix */s = shep(xs,xx,soi,weight_type);V = m_get(M,M);for(i=0;i<M;i++)set_row(V,i,s);

/* Form the U matrix */V_TRANS = m_transp(V,MNULL);MDUM = m_sub(ID,V_TRANS,MNULL);U = m_mlt(P,MDUM,MNULL);m_free(V_TRANS);m_free(MDUM);

/* Form A,B,and C */MDUM = mmtr_mlt(W,U,MNULL);A = m_mlt(U,MDUM,MNULL);m_free(MDUM);

B = m_mlt(U,W,MNULL);C = m_sub(ID,V,MNULL);

/* Form the g vector */vdum = mv_mlt(P,s,VNULL);g = v_sub(pxs,vdum,VNULL);v_free(vdum);

/* Form the inverse of the A matrix */catch(E_SING,IA = m_inverse(A,MNULL);,(*phi) = v_get(0);return;);

/* Form the interpolating functions */MDUM = m_mlt(B,C,MNULL);MDUM1 = m_transp(MDUM,MNULL);vdum = vm_mlt(IA,g,VNULL);vdum2 = mv_mlt(MDUM1,vdum,VNULL);*phi = v_add(s,vdum2,VNULL);

/* Free the intermediate variables */v_free(vdum); v_free(vdum2); v_free(s);v_free(g); v_free(pxs);m_free(A); m_free(B); m_free(C);m_free(MDUM1); m_free(IA); m_free(W);m_free(P); m_free(ID); m_free(V);m_free(MDUM); m_free(U);

return;

void igshp(VEC *xs,MAT *XX,double param,intorder,

int weight_type,int search_type,MAT **LIST,VEC **index,VEC **phi)

iscan(xs,XX,search_type,param,weight_type,LIST,index);

ishp(xs,*LIST,param,order,weight_type,phi);

void idshp(VEC *xs,MAT *xx,double soi,intorder,int weight_type,

VEC **phix,VEC **phiy,VEC **phiz)int i,M,ising,ndim;VEC *pxs,*xi,*s,*g,*pxsx,*pxsy,*pxsz,*dw;VEC *sx,*sy,*sz,*VDUM,*gx,*gy,*gz;MAT*P,*W,*ID,*V,*V_TRANS,*MDUM,*U,*A,*B,*C,*DPXS;MAT*WX,*WY,*WZ,*DSHEPS,*VX,*VY,*VZ,*UX,*UY,*UZ,*CX,*CY,*CZ,*BX,*BY,*BZ;MAT *AX,*AY,*AZ,*IA;

ndim = xs->dim;

M = xx->m;ising = isorc(xs,xx);P = gpmat(xx,order);

/* Build the diagonal weight matrix */W = m_get(M,M);for(i=0;i<M;i++)xi = get_row(xx,i,VNULL);if(i != ising)


W->me[i][i] = iwt(xs,xi,soi,weight_type);v_free(xi);

/* Form an MxM identity matrix */ID = m_get(M,M);m_ident(ID);

/* Form the MxM V matrix */s = shep(xs,xx,soi,weight_type);V = m_get(M,M);for(i=0;i<M;i++)set_row(V,i,s);

/* Form the U matrix */V_TRANS = m_transp(V,MNULL);MDUM = m_sub(ID,V_TRANS,MNULL);U = m_mlt(P,MDUM,MNULL);m_free(V_TRANS);m_free(MDUM);

/* Form the A matrix */MDUM = mmtr_mlt(W,U,MNULL);A = m_mlt(U,MDUM,MNULL);m_free(MDUM);

/* Check the A matrix for ill-conditioning*//* pivot = px_get(A->m);cn = 1.0/LUcondest(A,pivot);printf("IDSHP3D: The CN is %e\n",cn);if(cn < 1e-20)(*phix) = v_get(0);(*phiy) = v_get(0);(*phiz) = v_get(0);return;px_free(pivot); */

/* Build the vectors of basis functionspxs,pxsx,pxsy, and pxsz */pxs = gbasvec(xs,order);DPXS = gdbasvec(xs,order);pxsx = get_col(DPXS,0,VNULL);pxsy = get_col(DPXS,1,VNULL);if( ndim == 3)pxsz = get_col(DPXS,2,VNULL);m_free(DPXS);

/* Form the WX, WY, and WZ weightderivative matrices */WX = m_get(M,M);WY = m_get(M,M);if( ndim == 3)WZ = m_get(M,M);for(i=0;i<M;i++)if( i != ising)xi = get_row(xx,i,VNULL);dw = idwt(xs,xi,soi,weight_type);WX->me[i][i] = dw->ve[0];WY->me[i][i] = dw->ve[1];if( ndim == 3)WZ->me[i][i] = dw->ve[2];v_free(dw);v_free(xi);

/* Form the Shepard Interpolantderivatives */DSHEPS = dshep(xs,xx,soi,weight_type);sx = get_col(DSHEPS,0,VNULL);sy = get_col(DSHEPS,1,VNULL);if( ndim == 3)sz = get_col(DSHEPS,2,VNULL);m_free(DSHEPS);

/* Form the VX,VY, and VZ matrices */VX = m_get(M,M);VY = m_get(M,M);if( ndim == 3)VZ = m_get(M,M);for(i=0;i<M;i++)set_row(VX,i,sx);set_row(VY,i,sy);if(ndim == 3)set_row(VZ,i,sz);

/* Form the UX,UY, and UZ matrices */UX = mmtr_mlt(P,VX,MNULL);sm_mlt(-1.0,UX,UX);UY = mmtr_mlt(P,VY,MNULL);sm_mlt(-1.0,UY,UY);if(ndim == 3)UZ = mmtr_mlt(P,VZ,MNULL);sm_mlt(-1.0,UZ,UZ);

/* Form the C,CX,CY,and CZ matrices */C = m_sub(ID,V,MNULL);CX = sm_mlt(-1.0,VX,MNULL);CY = sm_mlt(-1.0,VY,MNULL);if(ndim == 3)CZ = sm_mlt(-1.0,VZ,MNULL);

/* Form the B,BX,BY, and BZ matrices */B = m_mlt(U,W,MNULL);BX = gdb(U,UX,W,WX);BY = gdb(U,UY,W,WY);if(ndim == 3)BZ = gdb(U,UZ,W,WZ);

/* Form the AX,AY,and AZ matrices */AX = gda(U,UX,W,WX);AY = gda(U,UY,W,WY);if(ndim == 3)AZ = gda(U,UZ,W,WZ);

/* Form the g,gx,gy, and gz vectors */VDUM = mv_mlt(P,s,VNULL);g = v_sub(pxs,VDUM,VNULL);v_free(VDUM);

VDUM = mv_mlt(P,sx,VNULL);gx = v_sub(pxsx,VDUM,VNULL);v_free(VDUM);

VDUM = mv_mlt(P,sy,VNULL);gy = v_sub(pxsy,VDUM,VNULL);v_free(VDUM);

if(ndim == 3)VDUM = mv_mlt(P,sz,VNULL);gz = v_sub(pxsz,VDUM,VNULL);v_free(VDUM);


/* Form the inverse of the A matrix */catch(E_SING,IA = m_inverse(A,MNULL);,(*phix) = v_get(0);return;);

/* Form the interpolating functionderivatives */(*phix) = gphin(sx,IA,AX,B,BX,C,CX,g,gx);(*phiy) = gphin(sy,IA,AY,B,BY,C,CY,g,gy);if(ndim == 3)(*phiz) = gphin(sz,IA,AZ,B,BZ,C,CZ,g,gz);

/* Free the intermediate variables */v_free(s); v_free(pxs); v_free(pxsx);v_free(pxsy);v_free(sx); v_free(sy); v_free(g);v_free(gx);v_free(gy);m_free(P); m_free(W); m_free(ID);m_free(V); m_free(U); m_free(A);m_free(WX); m_free(WY); m_free(VX);m_free(VY);m_free(UX); m_free(UY); m_free(C);m_free(CX); m_free(CY);m_free(B); m_free(BX); m_free(BY);m_free(AX);m_free(AY); m_free(IA);

if(ndim == 3)v_free(pxsz); v_free(sz); v_free(gz);m_free(WZ); m_free(VZ);m_free(UZ); m_free(CZ); m_free(BZ);m_free(AZ);

void igdshp(VEC *xs,MAT *XX,doubleparam,int order,

int weight_type,int search_type,MAT **LIST,VEC **index,double *soi,VEC **phix,VEC **phiy,VEC **phiz)

switch(search_type)case 1:

iscan(xs,XX,search_type,param,weight_type,LIST,index);

idshp(xs,*LIST,param,order,weight_type,phix,phiy,phiz);

break;

File name: sheps.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"

#include "../mes/matrix2.h"

#include "isorc.h"#include "weights.h"

VEC *shep(VEC *xs,MAT *xx,double soi,intweight_type)int nn,in,ising;double wsum,wi;VEC *sf;VEC *xn;

nn = xx->m;sf = v_get(nn);ising = isorc(xs,xx);

if(ising != -1)sf->ve[ising] = 1.0;elsewsum = 0.0;for(in=0;in<nn;in++)xn = get_row(xx,in,VNULL);wi = iwt(xs,xn,soi,weight_type);v_free(xn);sf->ve[in] = wi;wsum += wi;for(in=0;in<nn;in++)sf->ve[in] = sf->ve[in]/wsum;return(sf);
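shep() normalizes the nodal weights so that each Shepard shape function is sf_i = w_i / sum_j w_j, which makes the set a partition of unity. The short sketch below (not part of the thesis code) repeats that normalization in one dimension with the same quadratic weight form as iwt() for the L2-norm case; the node coordinates and support size are illustrative.

#include <stdio.h>
#include <math.h>

int main( void )
{
    double xs = 0.3;                         /* evaluation point (1D)        */
    double xn[4] = { 0.0, 0.25, 0.5, 1.0 };  /* node coordinates             */
    double soi = 1.0;                        /* support-size parameter       */
    double w[4], sf[4], wsum = 0.0, check = 0.0;
    int i;

    for ( i = 0; i < 4; i++ ) {
        double r = soi / fabs( xs - xn[i] ); /* r = param/f, as in iwt()     */
        w[i] = r*r - 2.0*r + 1.0;            /* same quadratic weight form   */
        wsum += w[i];
    }
    for ( i = 0; i < 4; i++ ) { sf[i] = w[i]/wsum; check += sf[i]; }

    for ( i = 0; i < 4; i++ ) printf( "sf[%d] = %f\n", i, sf[i] );
    printf( "sum = %f\n", check );           /* ~1.0 (partition of unity)    */
    return 0;
}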

MAT *dshep(VEC *xs,MAT *xx,double soi,intweight_type)int in,id,nn,ndim,ising;double wsum,wi;VEC *sf;VEC *dwsum;VEC *xn;VEC *dwi;MAT *dsf;MAT *dshpf;

nn = xx->m;ndim = xx->n;dshpf = m_get(nn,ndim);sf = v_get(nn);dsf = m_get(nn,ndim);

wsum = 0.0;dwsum = v_get(ndim);

ising = isorc(xs,xx);if(ising == -1)for(in=0;in<nn;in++)xn = get_row(xx,in,VNULL);wi = iwt(xs,xn,soi,weight_type);dwi = idwt(xs,xn,soi,weight_type);

sf->ve[in] = wi;wsum += wi;for(id=0;id<ndim;id++)dsf->me[in][id] = dwi->ve[id];


dwsum->ve[id] += dwi->ve[id];v_free(xn);v_free(dwi);

for(in=0;in<nn;in++)
  for(id=0;id<ndim;id++)
    dshpf->me[in][id] += (wsum*dsf->me[in][id] - dwsum->ve[id]*sf->ve[in])/pow(wsum,2);

m_free(dsf);v_free(sf);v_free(dwsum);

return(dshpf);

File name: solve.c* #include <stdio.h>#include <stdlib.h>#include <string.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "../mes/iter.h"#include "../mes/sparse.h"#include "../mes/sparse2.h"#include "constants.h"

int solve(FILE *fin,FILE *fout,SPMAT *STIFF,VEC *FORCE,VEC **SOLU)

int EQCOUNT;

EQCOUNT = STIFF->m;*SOLU = v_get(EQCOUNT);spCHfactor(STIFF);spCHsolve(STIFF,FORCE,*SOLU);

/* Successfully exit this function */return(1);

File name: util.c* #include <stdio.h>#include <stdlib.h>#include <limits.h>

#include "math.h"#include "../mes/matrix.h"#include "../mes/matrix2.h"#include "material.h"#include "constants.h"

#include "shapes.h"

MAT *geqnum(MAT *NDISP,int nnodes,int ndim)

int nod,dir,nbcs,i,j,eqc;MAT *EQNUM;

nbcs = NDISP->m;EQNUM = m_get(nnodes,ndim);

for(i=0;i<nbcs;i++)nod = (int) (NDISP->me[i][0]);dir = (int) (NDISP->me[i][1]) - 1;EQNUM->me[nod][dir] = -1;

eqc = 0;for(i=0;i<nnodes;i++)for(j=0;j<ndim;j++)if( EQNUM->me[i][j] == 0 )EQNUM->me[i][j] = eqc;eqc +=1;

return(EQNUM);

MAT *gke(MAT *B,MAT *D,double dvol)MAT *MDUM1,*MDUM2,*KE;

MDUM1 = m_mlt(D,B,MNULL);MDUM2 = mtrm_mlt(B,MDUM1,MNULL);KE = sm_mlt(dvol,MDUM2,MNULL);m_free(MDUM1); m_free(MDUM2);

return(KE);

MAT *gda(MAT *U,MAT *UN,MAT *W,MAT *WN)MAT *MPROD1,*MPROD2,*MPROD3,*MDUM,*AN;

MDUM = mmtr_mlt(W,U,MNULL);MPROD1 = m_mlt(UN,MDUM,MNULL);m_free(MDUM);

MDUM = mmtr_mlt(WN,U,MNULL);MPROD2 = m_mlt(U,MDUM,MNULL);m_free(MDUM);

MDUM = mmtr_mlt(W,UN,MNULL);MPROD3 = m_mlt(U,MDUM,MNULL);m_free(MDUM);

MDUM = m_add(MPROD1,MPROD2,MNULL);AN = m_add(MDUM,MPROD3,MNULL);m_free(MDUM);

m_free(MPROD1);m_free(MPROD2);m_free(MPROD3);

return(AN);

MAT *gdb(MAT *U,MAT *UN,MAT *W,MAT *WN)MAT *MPROD1,*MPROD2,*BN;


MPROD1 = m_mlt(UN,W,MNULL);MPROD2 = m_mlt(U,WN,MNULL);BN = m_add(MPROD1,MPROD2,MNULL);m_free(MPROD1); m_free(MPROD2);

return(BN);

VEC *mtr3v_mlt(MAT *M1,MAT *M2,MAT *M3,VEC*v1)VEC *vdum1,*vdum2,*vans;

vdum1 = vm_mlt(M3,v1,VNULL);vdum2 = vm_mlt(M2,vdum1,VNULL);vans = vm_mlt(M1,vdum2,VNULL);v_free(vdum1); v_free(vdum2);

return(vans);

VEC *gphin(VEC *sn,MAT *IA,MAT *AN,MAT*B,MAT *BN,MAT *C,MAT *CN,

VEC *g,VEC *gn)VEC*vprod1,*vprod2,*vprod3,*vprod4,*vdum1,*vdum2,*vdum3,*phin;MAT *IAN;MAT *MDUM;

MDUM = m_mlt(AN,IA,MNULL);IAN = m_mlt(IA,MDUM,MNULL);sm_mlt(-1.0,IAN,IAN);

vprod1 = mtr3v_mlt(CN,B,IA,g);vprod2 = mtr3v_mlt(C,BN,IA,g);vprod3 = mtr3v_mlt(C,B,IAN,g);vprod4 = mtr3v_mlt(C,B,IA,gn);vdum1 = v_add(vprod1,vprod2,VNULL);vdum2 = v_add(vprod3,vprod4,VNULL);vdum3 = v_add(vdum1,vdum2,VNULL);phin = v_add(sn,vdum3,VNULL);

m_free(IAN); m_free(MDUM);v_free(vprod1); v_free(vprod2);v_free(vprod3); v_free(vprod4);v_free(vdum1); v_free(vdum2);v_free(vdum3);

return(phin);

VEC *solve_eq(MAT *A,VEC *b,MAT *NDISP,MAT *FPDATA,MAT *NODES, VEC **fixed_list)
int i,ncon,nod,dir,size,loc,ip,in,ndim,nnodes;
double val,x0,y0,z0,nx,ny,nz,xx,yy,zz;
VEC *vecout,*vecin,*x,*dsp;
PERM *pivot;

ncon = NDISP->m;size = A->m;nnodes = NODES->m;ndim = NODES->n;

*fixed_list = v_get(b->dim);

/* Take Care of the Nodal Constraints */

if(NDISP != NULL)for(i=0;i<ncon;i++)dir = (int) (NDISP->me[i][0]);nod = (int) (NDISP->me[i][1]);val = NDISP->me[i][2];loc = 3*nod + dir - 1;

(*fixed_list)->ve[loc] = 1.0;

vecout = get_col(A,loc,VNULL);sv_mlt(-val,vecout,vecout);vecout->ve[loc] -= b->ve[loc];v_add(b,vecout,b);v_free(vecout);

vecin = v_get(size);vecin->ve[loc] = -1.0;set_col(A,loc,vecin);v_free(vecin);

/* Take Care of the Planar Constraints */if(FPDATA != NULL)for(ip=0;ip<(int)(FPDATA->m);ip++)dir = (int) (FPDATA->me[ip][0]);val = FPDATA->me[ip][1];x0 = FPDATA->me[ip][2];y0 = FPDATA->me[ip][3];z0 = FPDATA->me[ip][4];nx = FPDATA->me[ip][5];ny = FPDATA->me[ip][6];nz = FPDATA->me[ip][7];

for(in=0;in<nnodes;in++)xx = NODES->me[in][0];yy = NODES->me[in][1];

switch(ndim)case 2:zz = 0.0;break;

case 3:zz = NODES->me[in][2];break;

if( fabs(nx*(xx-x0) + ny*(yy-y0) +nz*(zz-z0)) < 1e-5 )loc = ndim*in + dir - 1;(*fixed_list)->ve[loc] = 1.0;vecout = get_col(A,loc,VNULL);sv_mlt(-val,vecout,vecout);vecout->ve[loc] -= b->ve[loc];v_add(b,vecout,b);v_free(vecout);

vecin = v_get(size);vecin->ve[loc] = -1.0;set_col(A,loc,vecin);v_free(vecin);


pivot = px_get(size);LUfactor(A,pivot);x = LUsolve(A,pivot,b,VNULL);px_free(pivot);

/* Rebuild the Displacement Vector */dsp = v_copy(x,VNULL);

/* Consider the Fixed Nodes */for(i=0;i<ncon;i++)dir = (int) (NDISP->me[i][0]);nod = (int) (NDISP->me[i][1]);val = NDISP->me[i][2];loc = 3*nod + dir - 1;dsp->ve[loc] = val;

/* Consider the Fixed Planes */for(ip=0;ip<(int)(FPDATA->m);ip++)dir = (int) (FPDATA->me[ip][0]);val = FPDATA->me[ip][1];x0 = FPDATA->me[ip][2];y0 = FPDATA->me[ip][3];z0 = FPDATA->me[ip][4];nx = FPDATA->me[ip][5];ny = FPDATA->me[ip][6];nz = FPDATA->me[ip][7];

for(in=0;in<nnodes;in++)xx = NODES->me[in][0];yy = NODES->me[in][1];

switch(ndim)case 2:zz = 0.0;break;

case 3:zz = NODES->me[in][2];break;

if( fabs(nx*(xx-x0) + ny*(yy-y0) +nz*(zz-z0)) < 1e-5 )loc = ndim*in + dir - 1;dsp->ve[loc] = val;v_free(x);return(dsp);

VEC *get_r_theta(double x,double y)double r,theta,tol,pi;VEC *vecout;

vecout = v_get(2);tol = 1e-5;r = sqrt(x*x + y*y);pi = 4.0*atan(1.0);

if( fabs(x) <= tol )if( fabs(y) <= tol )theta = 0.0;elseif( y > 0.0 )theta = 90.0;elsetheta = 270.0;

vecout->ve[0] = r;vecout->ve[1] = theta;return(vecout);

if( x >= 0.0 )if( y >= 0.0 )theta = asin(y/r)*180.0/pi;elsetheta = 360.0 + asin(y/r)*180.0/pi;elseif( y >= 0.0 )theta = 180.0 - asin(y/r)*180.0/pi;elsetheta = 180.0 - asin(y/r)*180.0/pi;

vecout->ve[0] = r;vecout->ve[1] = theta;return(vecout);
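get_r_theta() resolves the polar angle quadrant by quadrant from asin(). The compact sketch below (not the thesis routine) obtains the same r and theta with atan2(), mapped into [0, 360) degrees.

#include <stdio.h>
#include <math.h>

int main( void )
{
    double pts[4][2] = { { 1.0, 1.0 }, { -1.0, 1.0 }, { -1.0, -1.0 }, { 1.0, -1.0 } };
    double pi = 4.0*atan( 1.0 );
    int i;

    for ( i = 0; i < 4; i++ ) {
        double x = pts[i][0], y = pts[i][1];
        double r = sqrt( x*x + y*y );
        double theta = atan2( y, x )*180.0/pi;  /* (-180, 180]        */
        if ( theta < 0.0 ) theta += 360.0;      /* map into [0, 360)  */
        printf( "(%5.1f,%5.1f) -> r = %7.4f, theta = %6.1f deg\n", x, y, r, theta );
    }
    return 0;
}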

VEC *tstr3d(VEC *pt,VEC *sig)double x,y,d,c,s;VEC *tsig;MAT *A,*CART,*CYL,*MDUM;

x = pt->ve[0];y = pt->ve[1];d = sqrt(x*x+y*y);c = x/d; s = y/d;

A = m_get(3,3);A->me[0][0] = c;A->me[0][1] = s;A->me[1][0] = -s;A->me[1][1] = c;A->me[2][2] = 1.0;

CART = m_get(3,3);CART->me[0][0] = sig->ve[0];CART->me[0][1] = sig->ve[3];CART->me[0][2] = sig->ve[4];CART->me[1][0] = sig->ve[3];CART->me[1][1] = sig->ve[1];CART->me[1][2] = sig->ve[5];CART->me[2][0] = sig->ve[4];CART->me[2][1] = sig->ve[5];CART->me[2][2] = sig->ve[2];

MDUM = mmtr_mlt(CART,A,MNULL);CYL = m_mlt(A,MDUM,MNULL);

tsig = v_get(6);tsig->ve[0] = CYL->me[0][0];


tsig->ve[1] = CYL->me[1][1];tsig->ve[2] = CYL->me[2][2];tsig->ve[3] = CYL->me[0][1];tsig->ve[4] = CYL->me[0][2];tsig->ve[5] = CYL->me[1][2];

m_free(MDUM); m_free(CYL); m_free(CART),m_free(A);

return(tsig);

double det(MAT *A)int i;double val=1.0;PERM *pivot;MAT *MDUM;

MDUM = m_copy(A,MNULL);pivot = px_get(A->m);LUfactor(MDUM,pivot);

for(i=0;i<(int)(A->m);i++)val *= MDUM->me[i][i];

px_free(pivot); m_free(MDUM);

return(val);

MAT *inv3(MAT *A)doublefac,M11,M12,M13,M21,M22,M23,M31,M32,M33;MAT *MOUT;

MOUT = m_get(3,3);

M11 = A->me[0][0];M12 = A->me[0][1];M13 = A->me[0][2];M21 = A->me[1][0];M22 = A->me[1][1];M23 = A->me[1][2];M31 = A->me[2][0];M32 = A->me[2][1];M33 = A->me[2][2];

fac = M11*M22*M33 - M11*M23*M32 -M21*M12*M33 +M21*M13*M32 + M31*M12*M23 - M31*M13*M22;

MOUT->me[0][0] = (M22*M33 - M23*M32) /fac;MOUT->me[0][1] =-(M12*M33 - M13*M32) /fac;MOUT->me[0][2] = (M12*M23 - M13*M22) /fac;MOUT->me[1][0] =-(M21*M33 - M23*M31) /fac;MOUT->me[1][1] = (M11*M33 - M13*M31) /fac;MOUT->me[1][2] =-(M11*M23 - M13*M21) /fac;MOUT->me[2][0] = (M21*M32 - M22*M31) /fac;MOUT->me[2][1] =-(M11*M32 - M12*M31) /fac;MOUT->me[2][2] = (M11*M22 - M12*M21) /fac;

return(MOUT);

double get_plfac(MAT *STRESS,VEC *params)int ng,ig;double lfac,pressure;VEC *stress_i;

ng = STRESS->m;lfac = 1e6;for(ig=0;ig<ng;ig++)stress_i = get_row(STRESS,ig,VNULL);pressure = (stress_i->ve[0] + stress_i->ve[1] + stress_i->ve[2])/3.0;stress_i->ve[0] -= pressure;stress_i->ve[1] -= pressure;stress_i->ve[2] -= pressure;lfac = min(sqrt(2.0/3.0)*(params->ve[11])/vnorm(stress_i),lfac);v_free(stress_i);return(lfac);

void get_local_data(double XCI,doubleETA,double TAU,

VEC *X,VEC *Y,VEC *Z,double *detj,VEC **gpt)

VEC *lpt;VEC *phi,*phix,*phiy,*phiz;MAT *dphi;

lpt = v_get(3);lpt->ve[0] = XCI;lpt->ve[1] = ETA;lpt->ve[2] = TAU;

/* Get the detj and global coordinates */gshapes(lpt,1,&phi,&dphi);phix = get_col(dphi,0,VNULL);phiy = get_col(dphi,1,VNULL);phiz = get_col(dphi,2,VNULL);

*detj = in_prod(phix,X) *(in_prod(phiy,Y)*in_prod(phiz,Z) -

in_prod(phiy,Z)*in_prod(phiz,Y)) -in_prod(phiy,X) *

(in_prod(phix,Y)*in_prod(phiz,Z) -in_prod(phix,Z)*in_prod(phiz,Y)) +

in_prod(phiz,X) *(in_prod(phix,Y)*in_prod(phiy,Z) -

in_prod(phix,Z)*in_prod(phiy,Y));

*gpt = v_get(3);(*gpt)->ve[0] = in_prod(phi,X);(*gpt)->ve[1] = in_prod(phi,Y);(*gpt)->ve[2] = in_prod(phi,Z);

v_free(lpt); v_free(phi); v_free(phix);v_free(phiy); v_free(phiz);m_free(dphi);

double check_sparseness(MAT *M)int i,j,size,count;double ans,tol=1e-64;


size = M->m;

count = 0;for(i=0;i<size;i++)for(j=0;j<size;j++)if(fabs(M->me[i][j]) > tol)count += 1;

ans =100.0*(double)count/(double)pow(size,2);

return(ans);

File name: weights.c

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include "math.h"
#include "../mes/matrix.h"
#include "../mes/matrix2.h"
#include "norms.h"

double iwt(VEC *p1,VEC *p2,double param,int weight_type)
{
    double f, r, wt = 0.0;

    switch(weight_type)
    {
    case 1:                       /* L1 (Manhattan) distance */
        f = l1norm(p1,p2);
        r = param/f;
        wt = pow(r,2) - 2*r + 1;
        break;
    case 2:                       /* L2 (Euclidean) distance */
        f = l2norm(p1,p2);
        r = param/f;
        wt = pow(r,2) - 2*r + 1;
        break;
    case 3:                       /* L-infinity distance */
        f = linfnorm(p1,p2);
        r = param/f;
        wt = pow(r,2) - 2*r + 1;
        break;
    }
    return(wt);
}
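All three branches evaluate the same radial profile; only the distance measure changes. Writing d for the distance between p1 and p2 in the selected norm and r_w for the support parameter param, the weight restated from the code is

    w(d) = (r_w/d)^2 - 2(r_w/d) + 1 = (r_w/d - 1)^2,

which vanishes at d = r_w, the edge of the support, and grows without bound as d approaches zero.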

VEC *idwt(VEC *p1,VEC *p2,double param,int weight_type)
{
    /* Note: Still need to consider corners for weight type 3 */
    int ndim, idim, mflag;
    double f, r, diff, mdiff;
    VEC *dfd;   /* derivative of the distance w.r.t. each coordinate of p1 */
    VEC *dwt;   /* derivative of the weight w.r.t. each coordinate of p1   */

    ndim = p1->dim;
    dfd = v_get(ndim);
    dwt = v_get(ndim);

    switch(weight_type)
    {
    case 1:   /* L1 norm */
        f = l1norm(p1,p2);
        r = param/f;
        for (idim = 0; idim < ndim; idim++)
        {
            diff = p1->ve[idim] - p2->ve[idim];
            dfd->ve[idim] = diff/fabs(diff);
        }
        for (idim = 0; idim < ndim; idim++)
            dwt->ve[idim] = (2*r - 2) * (-param/pow(f,2)) * dfd->ve[idim];
        break;

    case 2:   /* L2 norm */
        f = l2norm(p1,p2);
        r = param/f;
        for (idim = 0; idim < ndim; idim++)
        {
            diff = p1->ve[idim] - p2->ve[idim];
            dfd->ve[idim] = diff/f;
        }
        for (idim = 0; idim < ndim; idim++)
            dwt->ve[idim] = (2*r - 2) * (-param/pow(f,2)) * dfd->ve[idim];
        break;

    case 3:   /* L-infinity norm: only the largest separation contributes */
        f = linfnorm(p1,p2);
        r = param/f;
        mflag = -1;
        mdiff = 0.0;
        for (idim = 0; idim < ndim; idim++)
        {
            diff = fabs(p1->ve[idim] - p2->ve[idim]);
            if (diff > mdiff)
            {
                mdiff = diff;
                mflag = idim;
            }
        }
        diff = p1->ve[mflag] - p2->ve[mflag];
        dfd->ve[mflag] = diff/fabs(diff);
        dwt->ve[mflag] = (2*r - 2) * (-param/pow(f,2)) * dfd->ve[mflag];
        break;
    }

    v_free(dfd);
    return(dwt);
}
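idwt() returns the gradient of the weight by the chain rule, dw/dx_i = w'(f) * df/dx_i, with the distance derivative handled per norm. A finite-difference spot check is a convenient way to exercise it. The sketch below is hypothetical and not part of the thesis code; it assumes the two routines in this file, the norms.h helpers, and the Meschach headers, with purely illustrative point coordinates.

#include <stdio.h>
#include "math.h"
#include "../mes/matrix.h"
#include "norms.h"

double iwt(VEC *p1, VEC *p2, double param, int weight_type);   /* defined above */
VEC   *idwt(VEC *p1, VEC *p2, double param, int weight_type);  /* defined above */

int main(void)
{
    double h = 1e-6, param = 2.0, fd, w0, w1;
    int type = 2;                                   /* Euclidean-norm weight */
    VEC *p1 = v_get(3), *p2 = v_get(3), *dw;        /* p2 stays at the origin */

    p1->ve[0] = 0.7; p1->ve[1] = -0.3; p1->ve[2] = 0.5;   /* illustrative point */

    dw = idwt(p1, p2, param, type);                 /* analytic gradient */

    /* Central difference in the x-direction */
    p1->ve[0] += h;   w1 = iwt(p1, p2, param, type);
    p1->ve[0] -= 2*h; w0 = iwt(p1, p2, param, type);
    fd = (w1 - w0) / (2*h);

    printf("analytic dw/dx = %g, finite difference = %g\n", dw->ve[0], fd);

    v_free(p1); v_free(p2); v_free(dw);
    return 0;
}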

File name: worker.c

#include "plefg.h"


#include "parmes.h"#include "ddefg_stiff.h"#include "ddforce.h"#include "ddpost.h"#include "worker_ddsolve.h"#include "post_output.h"

/*************************************//* The WORKER process *//*************************************//* Current implementation is for *//* Linear 3D with post_flag = 0 only *//*************************************/

void worker( MPI_Comm comm )
{
    /*
     * VARIABLES DECLARATION
     */
    int myid;              // ID number of this process
    int root = MASTER;     // ID number of the root process

    int sock_fd;           // Network socket file descriptor
    FILE *f_log;           // Log file descriptor

    char master_name[MAX_NAME_LENGTH];  // Processor name
    char log_name[MAX_NAME_LENGTH];     // Log file name
    char out_name[MAX_NAME_LENGTH];     // Output file name

    MPI_Status status;

    int ncell, nforce, ndloc, nsloc;

    /* EFG parameters */
    MAT *CNODES;           // List of cell nodes
    MAT *CCON;             // List of cell connectivities
    MAT *DLOC;             // Desired displacement locations
    MAT *DDISP;            // Matrix of desired displacements
    MAT *DSTRESS;          // Matrix of desired stresses
    MAT *FDATA;            // Forces data
    MAT *FNODES;           // List of force nodes
    MAT *FPDATA;           // Fixed plane data
    MAT *MDISP;
    MAT *NCON;
    MAT *NDISP;
    MAT *NLOAD;
    MAT *NODES;
    MAT *PCON;
    MAT *PNODES;
    MAT *SLOC;             // Desired stress locations
    MAT *mvec;             // List of material properties
    VEC *evec;             // The EFG parameters
    VEC *pvec;             // Problem parameters

    /* Other variables */
    int i, j;              // Generic indices
    int ndim;              // Number of dimensions
    int nnodes;            // Number of nodes
    MAT *K;                // Global stiffness matrix
    VEC *fdst;             // Force vector due to distributed load
    VEC *DISP;             // Global displacement vector

    MPI_Comm_rank( comm, &myid );   // Get this process ID

    /* Broadcast the fundamental data to workers */
    MPI_Bcast_vector( &evec, root, comm );
    MPI_Bcast_vector( &pvec, root, comm );
    MPI_Bcast_matrix( &mvec, root, comm );
    MPI_Bcast_matrix( &CCON, root, comm );
    MPI_Bcast_matrix( &CNODES, root, comm );
    MPI_Bcast_matrix( &DLOC, root, comm );
    MPI_Bcast_matrix( &FDATA, root, comm );
    MPI_Bcast_matrix( &FNODES, root, comm );
    MPI_Bcast_matrix( &FPDATA, root, comm );
    MPI_Bcast_matrix( &NCON, root, comm );
    MPI_Bcast_matrix( &NDISP, root, comm );
    MPI_Bcast_matrix( &NLOAD, root, comm );
    MPI_Bcast_matrix( &NODES, root, comm );
    MPI_Bcast_matrix( &PCON, root, comm );
    MPI_Bcast_matrix( &PNODES, root, comm );
    MPI_Bcast_matrix( &SLOC, root, comm );
    MPI_Bcast( &ncell, 1, MPI_INTEGER, root, comm );
    MPI_Bcast( &nforce, 1, MPI_INTEGER, root, comm );
    MPI_Bcast( &ndloc, 1, MPI_INTEGER, root, comm );
    MPI_Bcast( &nsloc, 1, MPI_INTEGER, root, comm );

    /* Broadcast the queue server name */
    MPI_Bcast( master_name, MAX_NAME_LENGTH, MPI_CHAR, root, comm );

    /* Open the log file */
    if ( (int) pvec->ve[8] == 0 )
    {
        f_log = stdout;
    }
    else
    {
        MPI_Bcast( out_name, MAX_NAME_LENGTH, MPI_CHAR, root, comm );
        sprintf( log_name, "%s_pid%d.log", out_name, myid );
        f_log = fopen( log_name, "w" );
    }

    /* Connect to the queue server */
    sock_fd = connect_to_server( master_name, QSERV_PORT );

    /*
     * FORM GLOBAL STIFFNESS MTX
     */
    fprintf( f_log, "\n[%d] Forming the global stiffness matrix\n", myid );
    ndim = NODES->n;
    nnodes = NODES->m;
    K = m_get( ndim*nnodes, ndim*nnodes );
    ddefg_stiff( comm, myid, sock_fd, f_log, ncell, evec, pvec, mvec,
                 CCON, CNODES, NODES, &K );

    /*
     * FORM GLOBAL FORCE VECTOR
     */
    fprintf( f_log, "\n[%d] Forming the global force vector\n", myid );

    /* The distributed forces */
    fdst = v_get( ndim*nnodes );
    if ( nforce > 0 )
    {


        fprintf( f_log, "[%d] Forming the distributed force vector\n", myid );
        ddforce( comm, myid, sock_fd, f_log, nforce, evec, pvec,
                 FDATA, FNODES, NODES, &fdst );
    }
    v_free( fdst );

    /*
     * SOLVE THE DISCRETE EQNS
     */
    fprintf( f_log, "\n[%d] Solving the discrete system of equations...\n", myid );
    worker_ddsolve( comm, &status, root );
    fprintf( f_log, "[%d] The system of equations solved.\n", myid );
    MPI_Bcast_vector( &DISP, root, comm );

    /*
     * POST-PROCESSING
     */
    fprintf( f_log, "\n[%d] Post-processing the results...\n", myid );

    /* Post-process for the desired displacements */
    MDISP = m_get( NODES->m, 3 );
    for ( i = 0; i < (int) NODES->m; i++ )
        for ( j = 0; j < 3; j++ )
            MDISP->me[ i ][ j ] = DISP->ve[ 3*i + j ];
    ddpost_displ( comm, myid, sock_fd, f_log, ndloc, evec, pvec,
                  DLOC, MDISP, NODES, &DDISP );

    /* Post-process for the desired stresses */
    ddpost_stress( comm, myid, sock_fd, f_log, nsloc, evec, pvec, mvec,
                   DISP, NODES, SLOC, &DSTRESS );

    /* Disconnect from the queue server */
    close( sock_fd );

    /*
     * Clean up the allocated memory
     */
    m_free( CNODES );  m_free( CCON );   m_free( DDISP );  m_free( DLOC );
    m_free( DSTRESS ); m_free( FDATA );  m_free( FNODES ); m_free( FPDATA );
    m_free( K );       m_free( MDISP );  m_free( mvec );   m_free( NCON );
    m_free( NDISP );   m_free( NLOAD );  m_free( NODES );  m_free( PCON );
    m_free( PNODES );  m_free( SLOC );
    v_free( DISP );    v_free( evec );   v_free( pvec );

    /*
     * Finish the logging
     */
    fprintf( f_log, "[%d] Finished!\n", myid );
    fprintf( f_log, "\n" );
    fclose( f_log );

    return;
}
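worker() is the slave side of the program's master/worker layout: it mirrors the broadcasts issued by the root process, pulls work from the queue server over a socket, and takes part in the parallel solution and post-processing. The skeleton below sketches how such a routine is typically dispatched from main(); the master() routine and its role are assumptions inferred from the names used above (MASTER, root) and are not code from the thesis.

#include <mpi.h>
#include "plefg.h"   /* assumed to declare worker(), master() and MASTER */

int main( int argc, char *argv[] )
{
    int myid;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );

    if ( myid == MASTER )
        master( MPI_COMM_WORLD );   /* reads the input, broadcasts data, collects results */
    else
        worker( MPI_COMM_WORLD );   /* the routine listed above */

    MPI_Finalize();
    return 0;
}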

File name: worker_ddsolve.c

#include "parmes.h"
#include "worker_parallel_gauss.h"

void worker_ddsolve( MPI_Comm comm, MPI_Status *status, int master_pid )
{
    worker_parallel_gauss( comm, status, master_pid );
    return;
}

File name: worker_parallel_gauss.c

#include "parmes.h"
#include "parallel_gauss.h"

void worker_parallel_gauss( MPI_Comm comm, MPI_Status *status, int master_pid )
{
    int myid;
    MAT *a;
    VEC *xlocal;

    MPI_Comm_rank( comm, &myid );

    /* Receive a package from the master */
    MPI_Recv_matrix( &a, master_pid, master_pid, comm, status );

    /* Perform Gaussian elimination */
    xlocal = parallel_gauss( comm, status, a );

    /* Send the results to the master */
    MPI_Send_vector( &xlocal, master_pid, myid, comm );

    m_free( a );
    v_free( xlocal );

    return;
}