
PARALLEL SCIENTIFIC COMPUTING AND OPTIMIZATION


Springer Optimization and Its Applications

VOLUME 27

Managing Editor
Panos M. Pardalos (University of Florida)

Editor, Combinatorial Optimization
Ding-Zhu Du (University of Texas at Dallas)

Advisory Board
J. Birge (University of Chicago)
C.A. Floudas (Princeton University)
F. Giannessi (University of Pisa)
H.D. Sherali (Virginia Polytechnic and State University)
T. Terlaky (McMaster University)
Y. Ye (Stanford University)

Aims and Scope
Optimization has been expanding in all directions at an astonishing rate during the last few decades. New algorithmic and theoretical techniques have been developed, the diffusion into other disciplines has proceeded at a rapid pace, and our knowledge of all aspects of the field has grown even more profound. At the same time, one of the most striking trends in optimization is the constantly increasing emphasis on the interdisciplinary nature of the field. Optimization has been a basic tool in all areas of applied mathematics, engineering, medicine, economics and other sciences.

The series Springer Optimization and Its Applications publishes undergraduate and graduate textbooks, monographs and state-of-the-art expository works that focus on algorithms for solving optimization problems and also study applications involving such problems. Some of the topics covered include nonlinear optimization (convex and nonconvex), network flow problems, stochastic optimization, optimal control, discrete optimization, multi-objective programming, description of software packages, approximation techniques and heuristic approaches.


PARALLEL SCIENTIFIC COMPUTING AND OPTIMIZATION

Advances and Applications

By

RAIMONDAS CIEGIS
Vilnius Gediminas Technical University, Lithuania

DAVID HENTY
University of Edinburgh, United Kingdom

BO KAGSTROM
Umea University, Sweden

JULIUS ZILINSKAS
Vilnius Gediminas Technical University and Institute of Mathematics and Informatics, Lithuania



Raimondas Ciegis
Department of Mathematical Modelling
Vilnius Gediminas Technical University
Sauletekio al. 11
LT-10223 Vilnius
Lithuania
[email protected]

David Henty
EPCC
The University of Edinburgh
James Clerk Maxwell Building
Mayfield Road
Edinburgh EH9
United Kingdom
[email protected]

Bo Kagstrom
Department of Computing Science and
High Performance Computing Center North (HPC2N)
Umea University
SE-901 87 Umea
Sweden
[email protected]

Julius Zilinskas
Vilnius Gediminas Technical University and
Institute of Mathematics and Informatics
Akademijos 4
LT-08663 Vilnius
Lithuania
[email protected]

ISSN: 1931-6828
ISBN: 978-0-387-09706-0
e-ISBN: 978-0-387-09707-7

Library of Congress Control Number: 2008937480

Mathematics Subject Classification (2000): 15-xx, 65-xx, 68Wxx, 90Cxx

© Springer Science+Business Media, LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com


Preface

The book is divided into four parts: Parallel Algorithms for Matrix Computations, Parallel Optimization, Management of Parallel Programming Models and Data, and Parallel Scientific Computing in Industrial Applications.

The first part of the book includes chapters on parallel matrix computations. The chapter by R. Granat, I. Jonsson, and B. Kagstrom, “RECSY and SCASY Library Software: Recursive Blocked and Parallel Algorithms for Sylvester-type Matrix Equations with Some Applications”, gives an overview of state-of-the-art high-performance computing (HPC) algorithms and software for solving various standard and generalized Sylvester-type matrix equations. Computer-aided control system design (CACSD) is a rich source of applications for such matrix equations, including various eigenvalue and subspace problems and condition estimation. The parallelization is invoked at two levels: globally in a distributed memory paradigm, and locally on shared memory or multicore nodes as part of the distributed memory environment.

In the chapter by A. Jakusev, R. Ciegis, I. Laukaityte, and V. Trofimov, “Parallelization of Linear Algebra Algorithms Using ParSol Library of Mathematical Objects”, the mathematical objects library ParSol is described and evaluated. It is applied to implement a finite difference scheme for the numerical solution of a system of PDEs describing the nonlinear interaction of two counter-propagating laser waves.

The chapter by R.L. Muddle, J.W. Boyle, M.D. Mihajlovic, and M. Heil, “The Development of an Object-Oriented Parallel Block Preconditioning Framework”, is devoted to the analysis of block preconditioners that are applicable to problems with multiple types of degrees of freedom. The authors discuss the development of an object-oriented parallel block preconditioning framework within oomph-lib, the object-oriented multi-physics finite-element library, available as open-source software. The performance of this framework is demonstrated for problems from non-linear elasticity, fluid mechanics, and fluid-structure interaction.

In the chapter by C. Denis, R. Couturier, and F. Jezequel, “A Sparse Linear System Solver Used in a Distributed and Heterogenous Grid Computing Environment”, the GREMLINS (GRid Efficient Linear Systems) solver of systems of linear equations is developed. The algorithm is based on multisplitting techniques, and a new balancing algorithm is presented.

The chapter by A.G. Sunderland, “Parallel Diagonalization Performance on High-Performance Computers”, analyzes the performance of parallel eigensolvers from numerical libraries such as ScaLAPACK on the latest parallel architectures, using data sets derived from large-scale scientific applications.

The book continues with the second part, focused on parallel optimization. In the chapter by J. Zilinskas, “Parallel Global Optimization in Multidimensional Scaling”, global optimization methods are outlined, and global optimization algorithms for multidimensional scaling are reviewed with particular emphasis on parallel computing. Global optimization algorithms are computationally intensive, and solution time crucially depends on the dimensionality of the problem. Parallel computing enables the solution of larger problems.

The chapter by K. Woodsend and J. Gondzio, “High-Performance Parallel Support Vector Machine Training”, shows how the training process of support vector machines can be reformulated to become suitable for high-performance parallel computing. Data is pre-processed in parallel to generate an approximate low-rank Cholesky decomposition. An optimization solver then exploits the problem’s structure to perform many linear algebra operations in parallel, with relatively low data transfer between processors, resulting in excellent parallel efficiency for very-large-scale problems.

The chapter by R. Paulavicius and J. Zilinskas, “Parallel Branch and Bound Algorithm with Combination of Lipschitz Bounds over Multidimensional Simplices for Multicore Computers”, presents the parallelization of a branch and bound algorithm for global Lipschitz minimization with a combination of extreme (infinite and first) and Euclidean norms over a multidimensional simplex. OpenMP is used to implement the parallel version of the algorithm for multicore computers. The efficiency of the parallel algorithm is studied using an extensive set of multidimensional test functions for global optimization.

The chapter by S. Ivanikovas, E. Filatovas, and J. Zilinskas, “Experimental Investigation of Local Searches for Optimization of Grillage-Type Foundations”, presents a multistart approach for optimal pile placement in grillage-type foundations. Various algorithms for local optimization are applied, and their performance is experimentally investigated and compared. Parallel computing is used to speed up the experimental investigation.

The third part of the book covers management issues of parallel programs and data. In the chapter by D. Henty and A. Gray, “Comparison of the UK National Supercomputer Services: HPCx and HECToR”, an overview of the two current UK national HPC services, HPCx and HECToR, is given. Such results are particularly interesting, as these two machines will now be operating together for some time and users have a choice as to which machine best suits their requirements. Results of extensive experiments are presented.

In the chapter by I.T. Todorov, I.J. Bush, and A.R. Porter, “DL POLY 3 I/O: Analysis, Alternatives, and Future Strategies”, it is noted that an important bottleneck in the scalability and efficiency of any molecular dynamics software is the I/O speed and reliability, as data has to be dumped and stored for postmortem analysis. This becomes increasingly important when simulations scale to many thousands of processors and system sizes increase to many millions of particles. The study outlines the problems associated with I/O when performing large classic MD runs and shows that it is necessary to use parallel I/O methods when studying large systems.

The chapter by M. Piotrowski, “Mixed Mode Programming on HPCx”, presents several benchmark codes based on iterative Jacobi relaxation algorithms: a pure MPI version and three mixed mode (MPI+OpenMP) versions. Their performance is studied and analyzed on a mixed architecture, a cluster of shared memory nodes. The results show that none of the mixed mode versions managed to outperform the pure MPI version, mainly due to longer MPI point-to-point communication times.

The chapter by A. Grothey, J. Hogg, K. Woodsend, M. Colombo, and J. Gondzio, “A Structure Conveying Parallelizable Modeling Language for Mathematical Programming”, presents the idea of using a modeling language for the definition of mathematical programming problems, with block constructs for the description of structure that may make parallel model generation of large problems possible. The proposed structured modeling language is based on the popular modeling language AMPL and is implemented as a pre-/postprocessor to AMPL. Solvers based on block linear algebra exploiting interior point methods, as well as decomposition solvers, can therefore directly exploit the structure of the problem.

The chapter by R. Smits, M. Kramer, B. Stappers, and A. Faulkner, “Computational Requirements for Pulsar Searches with the Square Kilometer Array”, is devoted to the analysis of computational requirements for beam forming and data analysis, assuming the SKA Design Studies’ design for the SKA, which consists of 15-meter dishes and an aperture array. It is shown that the maximum data rate from a pulsar survey using the 1-km core becomes about 2.7·10^13 bytes per second and requires a computational power of about 2.6·10^17 ops for a deep real-time analysis.

The final and largest part of the book covers applications of parallel computing. In the chapter by R. Ciegis, F. Gaspar, and C. Rodrigo, “Parallel Multiblock Multigrid Algorithms for Poroelastic Models”, the application of a parallel multigrid method to the two-dimensional poroelastic model is investigated. A domain is partitioned into structured blocks, and this geometrical structure is used to develop a parallel version of the multigrid algorithm. The convergence for different smoothers is investigated, and it is shown that the box alternating line Vanka-type smoother is robust and efficient.

The chapter by V. Starikovicius, R. Ciegis, O. Iliev, and Z. Lakdawala, “A Parallel Solver for the 3D Simulation of Flows Through Oil Filters”, presents a parallel solver for the 3D simulation of flows through oil filters. The Navier–Stokes–Brinkmann system of equations is used to describe the coupled laminar flow of incompressible isothermal oil through open cavities and cavities with filtering porous media. Two parallel algorithms are developed on the basis of the sequential numerical algorithm. The performance of implementations of both algorithms is studied on clusters of multicore computers.



The chapter by S. Eastwood, P. Tucker, and H. Xia, “High-Performance Computing in Jet Aerodynamics”, is devoted to the analysis of methods for reduction of the noise generated by the propulsive jet of an aircraft engine. The use of high-performance computing facilities is essential, allowing detailed flow studies to be carried out that help to disentangle the effects of numerics from flow physics. The scalability and efficiency of the presented parallel algorithms are investigated.

The chapter by G. Jankeviciute and R. Ciegis, “Parallel Numerical Solver for the Simulation of the Heat Conduction in Electrical Cables”, is devoted to the modeling of heat conduction in electrical cables. Efficient parallel numerical algorithms are developed to simulate the heat transfer in cable bundles. They are implemented using MPI and targeted for distributed memory computers, including clusters of PCs.

The chapter by A. Deveikis, “Orthogonalization Procedure for Antisymmetrization of J-shell States”, presents an efficient procedure for the construction of the antisymmetric basis of j-shell states with isospin. The approach is based on an efficient algorithm for the construction of the idempotent matrix eigenvectors, and it reduces to an orthogonalization procedure. The presented algorithm is much faster than the diagonalization routine rs() from the EISPACK library.

In the chapter by G.A. Siamas, X. Jiang, and L.C. Wrobel, “Parallel Direct Numerical Simulation of an Annular Gas–Liquid Two-Phase Jet with Swirl”, the flow characteristics of an annular swirling liquid jet in a gas medium are examined by direct solution of the compressible Navier–Stokes equations. A mathematical formulation is developed that is capable of representing the two-phase flow system, while the volume of fluid method has been adapted to account for the gas compressibility. Fully 3D parallel direct numerical simulation (DNS) is performed utilizing 512 processors, with the parallelization of the code based on domain decomposition.

In the chapter by I. Laukaityte, R. Ciegis, M. Lichtner, and M. Radziunas, “Parallel Numerical Algorithm for the Traveling Wave Model”, a parallel algorithm for the simulation of the dynamics of high-power semiconductor lasers is presented. The model equations describing the multisection broad-area semiconductor lasers are solved by a finite difference scheme constructed on staggered grids. The algorithm is implemented using the ParSol tool of parallel linear algebra objects.

The chapter by X. Guo, M. Pinna, and A.V. Zvelindovsky, “Parallel Algorithm for Cell Dynamics Simulation of Soft Nano-Structured Matter”, presents a parallel algorithm for large-scale cell dynamics simulation. With an efficient strategy of domain decomposition and a fast method of neighboring point location, simulations of large-scale systems have been successfully performed.

The chapter by Z. Dapkunas and J. Kulys, “Docking and Molecular Dynamics Simulation of Complexes of High and Low Reactive Substrates with Peroxidases”, presents docking and parallel molecular dynamics simulations of two peroxidases (ARP and HRP) and two compounds (LUM and IMP). The docking simulations give a clue to the reason for the different reactivity of LUM and the similar reactivity of IMP toward the two peroxidases. In the case of IMP, the −OH group is near Fe=O in both peroxidases, and hydrogen bond formation between −OH and Fe=O is possible. In the case of LUM, the N−H group is near Fe=O in ARP and hydrogen bond formation is possible, but it is farther away in HRP and a hydrogen bond with Fe=O is not formed.

The works were presented at the bilateral workshop of British and Lithuanian scientists “High Performance Scientific Computing”, held in Druskininkai, Lithuania, 5–8 February 2008. The workshop was supported by the British Council through the INYS program.

The British Council’s International Networking for Young Scientists (INYS) program brings together young researchers from the UK and other countries to make new contacts and promote the creative exchange of ideas through short conferences. Mobility for young researchers facilitates the extended laboratory in which all researchers now operate: it is a powerful source of new ideas and a strong force for creativity. Through the INYS program, the British Council helps to develop high-quality collaborations in science and technology between the UK and other countries and shows the UK as a leading partner for achievement in world science, now and in the future. The INYS program is unique in that it brings together scientists in any priority research area and helps develop working relationships. It aims to encourage young researchers to be mobile and expand their knowledge. The INYS-supported workshop “High Performance Scientific Computing” was organized by the University of Edinburgh, UK, and Vilnius Gediminas Technical University, Lithuania. The meeting was coordinated by Professor R. Ciegis and Dr. J. Zilinskas from Vilnius Gediminas Technical University and Dr. D. Henty from The University of Edinburgh. The homepage of the workshop is available at http://techmat.vgtu.lt/~inga/inys2008/.

Twenty-four talks were selected from thirty-two submissions from young UK and Lithuanian researchers. Professor B. Kagstrom from Umea University, Sweden, and Dr. I. Todorov from Daresbury Laboratory, UK, gave invited lectures. Review lectures were also given by the coordinators of the workshop.

This book contains review papers and revised contributed papers presented at the workshop. All twenty-three papers have been reviewed. We are very thankful to the reviewers for their recommendations and comments.

We hope that this book will serve as a valuable reference document for the scientific community and will contribute to the future cooperation between the participants of the workshop.

We would like to thank the British Council for financial support. We are very grateful to the managing editor of the series, Professor Panos Pardalos, for his encouragement.

Druskininkai, Lithuania    Raimondas Ciegis
February 2008              David Henty
                           Bo Kagstrom
                           Julius Zilinskas


Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Part I Parallel Algorithms for Matrix Computations

RECSY and SCASY Library Software: Recursive Blocked and Parallel Algorithms for Sylvester-Type Matrix Equations with Some Applications . . . . . 3
Robert Granat, Isak Jonsson, and Bo Kagstrom

1 Motivation and Background . . . . . 3
2 Variants of Bartels–Stewart’s Schur Method . . . . . 5
3 Blocking Strategies for Reduced Matrix Equations . . . . . 6
   3.1 Explicit Blocked Methods for Reduced Matrix Equations . . . . . 6
   3.2 Recursive Blocked Methods for Reduced Matrix Equations . . . . . 7
4 Parallel Algorithms for Reduced Matrix Equations . . . . . 9
   4.1 Distributed Wavefront Algorithms . . . . . 10
   4.2 Parallelization of Recursive Blocked Algorithms . . . . . 11
5 Condition Estimation . . . . . 11
   5.1 Condition Estimation in RECSY . . . . . 13
   5.2 Condition Estimation in SCASY . . . . . 13
6 Library Software Highlights . . . . . 13
   6.1 The RECSY Library . . . . . 13
   6.2 The SCASY Library . . . . . 14
7 Experimental Results . . . . . 16
8 Some Control Applications and Extensions . . . . . 18
   8.1 Condition Estimation of Subspaces with Specified Eigenvalues . . . . . 18
   8.2 Periodic Matrix Equations in CACSD . . . . . 19
References . . . . . 22




Parallelization of Linear Algebra Algorithms Using ParSol Library of Mathematical Objects . . . . . 25
Alexander Jakusev, Raimondas Ciegis, Inga Laukaityte, and Vyacheslav Trofimov

1 Introduction . . . . . 25
2 The Principles and Implementation Details of ParSol Library . . . . . 27
   2.1 Main Classes of ParSol . . . . . 27
   2.2 Implementation of ParSol . . . . . 29
3 Parallel Algorithm for Simulation of Counter-propagating Laser Beams . . . . . 29
   3.1 Invariants of the Solution . . . . . 30
   3.2 Finite Difference Scheme . . . . . 31
   3.3 Parallel Algorithm . . . . . 33
   3.4 Results of Computational Experiments . . . . . 34
4 Conclusions . . . . . 34
References . . . . . 35

The Development of an Object-Oriented Parallel Block Preconditioning Framework . . . . . 37
Richard L. Muddle, Jonathan W. Boyle, Milan D. Mihajlovic, and Matthias Heil

1 Introduction . . . . . 37
2 Block Preconditioning . . . . . 39
3 The Performance of the Block Preconditioning Framework . . . . . 40
   3.1 Reference Problem: 2D Poisson . . . . . 40
   3.2 Non-linear Elasticity . . . . . 41
   3.3 Fluid Mechanics . . . . . 42
   3.4 Fluid–Structure Interaction . . . . . 44
4 Conclusions . . . . . 45
References . . . . . 45

A Sparse Linear System Solver Used in a Distributed and Heterogenous Grid Computing Environment . . . . . 47
Christophe Denis, Raphael Couturier, and Fabienne Jezequel

1 Introduction . . . . . 47
2 The Parallel Linear Multisplitting Method Used in the GREMLINS Solver . . . . . 48
3 Load Balancing of the Direct Multisplitting Method . . . . . 50
4 Experimental Results . . . . . 51
   4.1 Experiments with a Matrix Issued from an Advection-Diffusion Model . . . . . 51
   4.2 Results of the Load Balancing . . . . . 52
5 Conclusions and Future Work . . . . . 55
References . . . . . 56



Parallel Diagonalization Performance on High-Performance Computers . . . . . 57
Andrew G. Sunderland

1 Introduction . . . . . 57
2 Parallel Diagonalization Methods . . . . . 58
   2.1 Equations for Matrix Diagonalizations in PRMAT . . . . . 58
   2.2 Equations for Matrix Diagonalizations in CRYSTAL . . . . . 58
   2.3 Symmetric Eigensolver Methods . . . . . 59
   2.4 Eigensolver Parallel Library Routines . . . . . 60
3 Testing Environment . . . . . 60
4 Results . . . . . 61
5 Conclusions . . . . . 65
References . . . . . 66

Part II Parallel Optimization

Parallel Global Optimization in Multidimensional Scaling . . . . . 69
Julius Zilinskas

1 Introduction . . . . . 69
2 Global Optimization . . . . . 70
3 Multidimensional Scaling . . . . . 72
4 Multidimensional Scaling with City-Block Distances . . . . . 75
5 Parallel Algorithms for Multidimensional Scaling . . . . . 77
6 Conclusions . . . . . 81
References . . . . . 81

High-Performance Parallel Support Vector Machine Training . . . . . 83
Kristian Woodsend and Jacek Gondzio

1 Introduction . . . . . 83
2 Interior Point Methods . . . . . 85
3 Support Vector Machines . . . . . 86
   3.1 Binary Classification . . . . . 86
   3.2 Linear SVM . . . . . 86
   3.3 Non-linear SVM . . . . . 87
4 Parallel Partial Cholesky Decomposition . . . . . 88
5 Implementing the QP for Parallel Computation . . . . . 89
   5.1 Linear Algebra Operations . . . . . 90
   5.2 Performance . . . . . 90
6 Conclusions . . . . . 92
References . . . . . 92

Parallel Branch and Bound Algorithm with Combination of Lipschitz Bounds over Multidimensional Simplices for Multicore Computers . . . . . 93
Remigijus Paulavicius and Julius Zilinskas

1 Introduction . . . . . 93
2 Parallel Branch and Bound with Simplicial Partitions . . . . . 94
3 Results of Experiments . . . . . 96
4 Conclusions . . . . . 100
References . . . . . 102

Experimental Investigation of Local Searches for Optimization of Grillage-Type Foundations . . . . . 103
Sergejus Ivanikovas, Ernestas Filatovas, and Julius Zilinskas

1 Introduction . . . . . 103
2 Optimization of Grillage-Type Foundations . . . . . 104
3 Methods for Local Optimization of Grillage-Type Foundations . . . . . 104
4 Experimental Research . . . . . 105
5 Conclusions . . . . . 111
References . . . . . 112

Part III Management of Parallel Programming Models and Data

Comparison of the UK National Supercomputer Services: HPCx and HECToR . . . . . 115
David Henty and Alan Gray

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1152 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

2.1 HPCx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1162.2 HECToR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

3 System Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1183.1 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193.2 Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4 Applications Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1204.1 PDNS3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1214.2 NAMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

DL POLY 3 I/O: Analysis, Alternatives, and Future Strategies . . . . . . . . . 125Ilian T. Todorov, Ian J. Bush, and Andrew R. Porter

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1252 I/O in DL POLY 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

2.1 Serial Direct Access I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1262.2 Parallel Direct Access I/O . . . . . . . . . . . . . . . . . . . . . . . . . . 1272.3 MPI-I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1272.4 Serial I/O Using NetCDF . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1284 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Mixed Mode Programming on HPCx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133Michał Piotrowski

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1332 Benchmark Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134


3 Mixed Mode . . . . . . 135
4 Hardware . . . . . . 137
5 Experimental Results . . . . . . 139
6 Conclusions . . . . . . 142
References . . . . . . 143

A Structure Conveying Parallelizable Modeling Language
for Mathematical Programming . . . . . . 145
Andreas Grothey, Jonathan Hogg, Kristian Woodsend, Marco Colombo,
and Jacek Gondzio

1 Introduction . . . . . . 145
2 Background . . . . . . 146
   2.1 Mathematical Programming . . . . . . 146
   2.2 Modeling Languages . . . . . . 147
3 Solution Approaches to Structured Problems . . . . . . 149
   3.1 Decomposition . . . . . . 149
   3.2 Interior Point Methods . . . . . . 149
4 Structure Conveying Modeling Languages . . . . . . 150
   4.1 Other Structured Modeling Approaches . . . . . . 151
   4.2 Design . . . . . . 151
   4.3 Implementation . . . . . . 153
5 Conclusions . . . . . . 155
References . . . . . . 155

Computational Requirements for Pulsar Searches with the Square
Kilometer Array . . . . . . 157
Roy Smits, Michael Kramer, Ben Stappers, and Andrew Faulkner

1 Introduction . . . . . . 157
2 SKA Configuration . . . . . . 158
3 Computational Requirements . . . . . . 159
   3.1 Beam Forming . . . . . . 159
   3.2 Data Analysis . . . . . . 161
4 Conclusions . . . . . . 164
References . . . . . . 164

Part IV Parallel Scientific Computing in Industrial Applications

Parallel Multiblock Multigrid Algorithms for Poroelastic Models . . . . . . 169
Raimondas Ciegis, Francisco Gaspar, and Carmen Rodrigo

1 Introduction . . . . . . 169
2 Mathematical Model and Stabilized Difference Scheme . . . . . . 171
3 Multigrid Methods . . . . . . 172
   3.1 Box Relaxation . . . . . . 173
4 Numerical Experiments . . . . . . 174
5 Parallel Multigrid . . . . . . 176


   5.1 Code Implementation . . . . . . 176
   5.2 Critical Issues Regarding Parallel MG . . . . . . 177
6 Conclusions . . . . . . 179
References . . . . . . 179

A Parallel Solver for the 3D Simulation of Flows Through Oil Filters . . . . . . 181
Vadimas Starikovicius, Raimondas Ciegis, Oleg Iliev, and Zhara Lakdawala

1 Introduction . . . . . . 181
2 Mathematical Model and Discretization . . . . . . 182
   2.1 Time Discretization . . . . . . 183
   2.2 Finite Volume Discretization in Space . . . . . . 184
   2.3 Subgrid Approach . . . . . . 185
3 Parallel Algorithms . . . . . . 185
   3.1 DD Parallel Algorithm . . . . . . 185
   3.2 OpenMP Parallel Algorithm . . . . . . 189
4 Conclusions . . . . . . 190
References . . . . . . 190

High-Performance Computing in Jet Aerodynamics . . . . . . 193
Simon Eastwood, Paul Tucker, and Hao Xia

1 Introduction . . . . . . 193
2 Numerical Background . . . . . . 195
   2.1 HYDRA and FLUXp . . . . . . 195
   2.2 Boundary and Initial Conditions . . . . . . 196
   2.3 Ffowcs Williams Hawkings Surface . . . . . . 196
3 Computing Facilities . . . . . . 197
4 Code Parallelization and Scalability . . . . . . 198
5 Axisymmetric Jet Results . . . . . . 199
   5.1 Problem Set Up and Mesh . . . . . . 199
   5.2 Results . . . . . . 200
6 Complex Geometries . . . . . . 200
   6.1 Mesh and Initial Conditions . . . . . . 200
   6.2 Results . . . . . . 203
7 Conclusions . . . . . . 205
References . . . . . . 205

Parallel Numerical Solver for the Simulation of the Heat Conduction
in Electrical Cables . . . . . . 207
Gerda Jankeviciute and Raimondas Ciegis

1 Introduction . . . . . . 207
2 The Model of Heat Conduction in Electrical Cables and Discretization . . . . . . 208
3 Parallel Algorithm . . . . . . 211
4 Conclusions . . . . . . 212
References . . . . . . 212


Orthogonalization Procedure for Antisymmetrization of J-shell States . . . . . . 213
Algirdas Deveikis

1 Introduction . . . . . . 213
2 Antisymmetrization of Identical Fermions States . . . . . . 214
3 Calculations and Results . . . . . . 217
4 Conclusions . . . . . . 220
References . . . . . . 221

Parallel Direct Numerical Simulation of an Annular Gas–Liquid
Two-Phase Jet with Swirl . . . . . . 223
George A. Siamas, Xi Jiang, and Luiz C. Wrobel

1 Introduction . . . . . . 223
2 Governing Equations . . . . . . 224
3 Computational Methods . . . . . . 226
   3.1 Time Advancement, Discretization, and Parallelization . . . . . . 226
   3.2 Boundary and Initial Conditions . . . . . . 227
4 Results and Discussion . . . . . . 229
   4.1 Instantaneous Flow Data . . . . . . 229
   4.2 Time-Averaged Data, Velocity Histories, and Energy Spectra . . . . . . 232
5 Conclusions . . . . . . 234
References . . . . . . 235

Parallel Numerical Algorithm for the Traveling Wave Model . . . . . . 237
Inga Laukaityte, Raimondas Ciegis, Mark Lichtner,
and Mindaugas Radziunas

1 Introduction . . . . . . 237
2 Mathematical Model . . . . . . 240
3 Finite Difference Scheme . . . . . . 242
   3.1 Discrete Transport Equations for Optical Fields . . . . . . 243
   3.2 Discrete Equations for Polarization Functions . . . . . . 244
   3.3 Discrete Equations for the Carrier Density Function . . . . . . 244
   3.4 Linearized Numerical Algorithm . . . . . . 244
4 Parallelization of the Algorithm . . . . . . 246
   4.1 Parallel Algorithm . . . . . . 246
   4.2 Scalability Analysis . . . . . . 247
   4.3 Computational Experiments . . . . . . 248
5 Conclusions . . . . . . 250
References . . . . . . 250

Parallel Algorithm for Cell Dynamics Simulation of Soft
Nano-Structured Matter . . . . . . 253
Xiaohu Guo, Marco Pinna, and Andrei V. Zvelindovsky

1 Introduction . . . . . . 253
2 The Cell Dynamics Simulation . . . . . . 254
3 Parallel Algorithm of CDS Method . . . . . . 256


   3.1 The Spatial Decomposition Method . . . . . . 256
   3.2 Parallel Platform and Performance Tuning . . . . . . 258
   3.3 Performance Analysis and Results . . . . . . 259
4 Conclusions . . . . . . 262
References . . . . . . 262

Docking and Molecular Dynamics Simulation of Complexes of High
and Low Reactive Substrates with Peroxidases . . . . . . 263
Zilvinas Dapkunas and Juozas Kulys

1 Introduction . . . . . . 263
2 Experimental . . . . . . 265
   2.1 Ab Initio Molecule Geometry Calculations . . . . . . 265
   2.2 Substrates Docking in Active Site of Enzyme . . . . . . 265
   2.3 Molecular Dynamics of Substrate–Enzyme Complexes . . . . . . 266
3 Results and Discussion . . . . . . 267
   3.1 Substrate Docking Modeling . . . . . . 267
   3.2 Molecular Dynamics Simulation . . . . . . 268
4 Conclusions . . . . . . 270
References . . . . . . 270

Index . . . . . . 273


List of Contributors

Jonathan W. Boyle
School of Mathematics, University of Manchester, Oxford Road, Manchester, M13 9PL, UK, e-mail: [email protected]

Ian J. Bush
STFC Daresbury Laboratory, Warrington WA4 4AD, UK

Marco Colombo
School of Mathematics, University of Edinburgh, Edinburgh, UK

Raphael Couturier
Laboratoire d'Informatique de l'Universite de Franche-Comte, BP 527, 90016 Belfort Cedex, France, e-mail: [email protected]

Raimondas Ciegis
Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Zilvinas Dapkunas
Vilnius Gediminas Technical University, Department of Chemistry and Bioengineering, Sauletekio Avenue 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Christophe Denis
School of Electronics, Electrical Engineering & Computer Science, The Queen's University of Belfast, Belfast BT7 1NN, UK, e-mail: [email protected]
UPMC Univ Paris 06, Laboratoire d'Informatique LIP6, 4 place Jussieu, 75252 Paris Cedex 05, France, e-mail: [email protected]

Algirdas Deveikis
Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania, e-mail: [email protected]


Simon Eastwood
Whittle Laboratory, University of Cambridge, Cambridge, UK, e-mail: [email protected]

Andrew Faulkner
Jodrell Bank Centre for Astrophysics, University of Manchester, Manchester, UK

Ernestas Filatovas
Institute of Mathematics and Informatics, Akademijos 4, LT-08663 Vilnius, Lithuania, e-mail: [email protected]

Francisco Gaspar
Departamento de Matematica Aplicada, Universidad de Zaragoza, 50009 Zaragoza, Spain, e-mail: [email protected]

Jacek Gondzio
School of Mathematics, University of Edinburgh, The King's Buildings, Edinburgh, EH9 3JZ, UK, e-mail: [email protected]

Robert Granat
Department of Computing Science and HPC2N, Umea University, Umea, Sweden, e-mail: [email protected]

Alan Gray
Edinburgh Parallel Computing Centre, The University of Edinburgh, Edinburgh, UK, e-mail: [email protected]

Andreas Grothey
School of Mathematics, University of Edinburgh, Edinburgh, UK, e-mail: [email protected]

Xiaohu Guo
School of Computing, Engineering and Physical Sciences, University of Central Lancashire, Preston, Lancashire, PR1 2HE, UK, e-mail: [email protected]

Matthias Heil
School of Mathematics, University of Manchester, Oxford Road, Manchester, M13 9PL, UK, e-mail: [email protected]

David Henty
Edinburgh Parallel Computing Centre, The University of Edinburgh, Edinburgh, UK, e-mail: [email protected]

Jonathan Hogg
School of Mathematics, University of Edinburgh, Edinburgh, UK

Oleg Iliev
Fraunhofer ITWM, Fraunhofer-Platz 1, D-67663 Kaiserslautern, Germany, e-mail: [email protected]

Sergejus Ivanikovas
Institute of Mathematics and Informatics, Akademijos 4, LT-08663 Vilnius, Lithuania, e-mail: [email protected]


Alexander Jakusev
Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Gerda Jankeviciute
Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Fabienne Jezequel
UPMC Univ Paris 06, Laboratoire d'Informatique LIP6, 4 place Jussieu, 75252 Paris Cedex 05, France, e-mail: [email protected]

Xi Jiang
Brunel University, Mechanical Engineering, School of Engineering and Design, Uxbridge, UB8 3PH, UK, e-mail: [email protected]

Isak Jonsson
Department of Computing Science and HPC2N, Umea University, Umea, Sweden, e-mail: [email protected]

Bo Kagstrom
Department of Computing Science and HPC2N, Umea University, Umea, Sweden, e-mail: [email protected]

Michael Kramer
Jodrell Bank Centre for Astrophysics, University of Manchester, Manchester, UK

Juozas Kulys
Vilnius Gediminas Technical University, Department of Chemistry and Bioengineering, Sauletekio Avenue 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Zhara Lakdawala
Fraunhofer ITWM, Fraunhofer-Platz 1, D-67663 Kaiserslautern, Germany, e-mail: [email protected]

Inga Laukaityte
Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Mark Lichtner
Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstrasse 39, 10117 Berlin, Germany, e-mail: [email protected]

Milan D. Mihajlovic
School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK, e-mail: [email protected]

Richard L. Muddle
School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK, e-mail: [email protected]


Remigijus Paulavicius
Vilnius Pedagogical University, Studentu 39, LT-08106 Vilnius, Lithuania, e-mail: [email protected]
Institute of Mathematics and Informatics, Akademijos 4, LT-08663 Vilnius, Lithuania

Marco Pinna
School of Computing, Engineering and Physical Sciences, University of Central Lancashire, Preston, Lancashire, PR1 2HE, UK, e-mail: [email protected]

Michał Piotrowski
Edinburgh Parallel Computing Centre, University of Edinburgh, Edinburgh, UK, e-mail: [email protected]

Andrew R. Porter
STFC Daresbury Laboratory, Warrington WA4 4AD, UK

Mindaugas Radziunas
Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstrasse 39, 10117 Berlin, Germany, e-mail: [email protected]

Carmen Rodrigo
Departamento de Matematica Aplicada, Universidad de Zaragoza, 50009 Zaragoza, Spain, e-mail: [email protected]

George A. Siamas
Brunel University, Mechanical Engineering, School of Engineering and Design, Uxbridge, UB8 3PH, UK, e-mail: [email protected]

Roy Smits
Jodrell Bank Centre for Astrophysics, University of Manchester, Manchester, UK, e-mail: [email protected]

Ben Stappers
Jodrell Bank Centre for Astrophysics, University of Manchester, Manchester, UK

Vadimas Starikovicius
Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]

Andrew G. Sunderland
STFC Daresbury Laboratory, Warrington, UK, e-mail: [email protected]

Ilian T. Todorov
STFC Daresbury Laboratory, Warrington WA4 4AD, UK, e-mail: [email protected]

Vyacheslav Trofimov
M. V. Lomonosov Moscow State University, Vorob'evy gory, 119992, Russia, e-mail: [email protected]


Paul Tucker
Whittle Laboratory, University of Cambridge, Cambridge, UK

Kristian Woodsend
School of Mathematics, University of Edinburgh, The King's Buildings, Edinburgh, EH9 3JZ, UK, e-mail: [email protected]

Luiz C. Wrobel
Brunel University, Mechanical Engineering, School of Engineering and Design, Uxbridge, UB8 3PH, UK, e-mail: [email protected]

Hao Xia
Whittle Laboratory, University of Cambridge, Cambridge, UK

Andrei V. Zvelindovsky
School of Computing, Engineering and Physical Sciences, University of Central Lancashire, Preston, Lancashire, PR1 2HE, UK, e-mail: [email protected]

Julius Zilinskas
Institute of Mathematics and Informatics, Akademijos 4, LT-08663 Vilnius, Lithuania, e-mail: [email protected]
Vilnius Gediminas Technical University, Sauletekio 11, LT-10223 Vilnius, Lithuania, e-mail: [email protected]


Part I
Parallel Algorithms for Matrix Computations


RECSY and SCASY Library Software: Recursive Blocked and Parallel
Algorithms for Sylvester-Type Matrix Equations with Some Applications

Robert Granat, Isak Jonsson, and Bo Kagstrom

Abstract In this contribution, we review state-of-the-art high-performance computing software for solving common standard and generalized continuous-time and discrete-time Sylvester-type matrix equations. The analysis is based on the RECSY and SCASY software libraries. Our algorithms and software rely on the standard Schur method. Two ways of introducing blocking for solving matrix equations in reduced (quasi-triangular) form are reviewed. The most common is to perform a fixed block partitioning of the matrices involved and rearrange the loop nests of a single-element algorithm so that the computations are performed on submatrices (matrix blocks). Another successful approach is to combine recursion and blocking.

We consider parallelization of algorithms for reduced matrix equations at two levels: globally in a distributed memory paradigm, and locally on shared memory or multicore nodes as part of the distributed memory environment. Distributed wavefront algorithms are considered to compute the solution to the reduced triangular systems. Parallelization of recursive blocked algorithms is done in two ways. The simplest way is so-called implicit data parallelization, which is obtained by using SMP-aware implementations of level 3 BLAS. Complementary to this, there is also the possibility of invoking task parallelism. This is done by explicit parallelization of independent tasks in a recursion tree using OpenMP. A brief account of some software issues for the RECSY and SCASY libraries is given. Theoretical results are confirmed by experimental results.

1 Motivation and Background

Matrix computations are fundamental and ubiquitous in computational science and its vast application areas. Along with the computer evolution, there is a continuing

Robert Granat · Isak Jonsson · Bo Kagstrom
Department of Computing Science and HPC2N, Umea University, Sweden
e-mail: {granat · isak · bokg}@cs.umu.se


demand for new and improved algorithms and library software that is portable, robust, and efficient [13]. In this contribution, we review state-of-the-art high-performance computing (HPC) software for solving common standard and generalized continuous-time (CT) and discrete-time (DT) Sylvester-type matrix equations, see Table 1. Computer-aided control system design (CACSD) is a great source of applications for matrix equations including different eigenvalue and subspace problems and for condition estimation.

Both the RECSY and SCASY software libraries distinguish between one-sided and two-sided matrix equations. For one-sided matrix equations, the solution is only involved in matrix products of two matrices, e.g., op(A)X or X op(A), where op(A) can be A or A^T. In two-sided matrix equations, the solution is involved in matrix products of three matrices, both to the left and to the right, e.g., op(A)X op(B). The more complicated data dependency of the two-sided equations is normally addressed in blocked methods for complexity reasons.

Solvability conditions for the matrix equations in Table 1 are formulated in terms of non-intersecting spectra of standard or generalized eigenvalues of the involved coefficient matrices and matrix pairs, respectively, or equivalently by nonzero associated sep-functions (see Sect. 5 and, e.g., [28, 24, 30] and the references therein).
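For the SYCT case, this solvability condition can be made concrete through the Kronecker-product formulation. The sketch below is purely illustrative and not part of RECSY or SCASY: the operator X → AX − XB acts on vec(X) as K = I_n ⊗ A − B^T ⊗ I_m, the eigenvalues of K are exactly the pairwise differences λ_i(A) − μ_j(B), and a sep-like quantity is the smallest singular value of K.

```python
import numpy as np

# Illustrative sketch (not from the RECSY/SCASY libraries):
# vec(AX - XB) = K vec(X) with K = kron(I_n, A) - kron(B^T, I_m).
# eig(K) = { lambda_i(A) - mu_j(B) }, so SYCT is uniquely solvable
# iff the spectra of A and B do not intersect, i.e. iff sep(A, B) > 0.
rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))

K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
eigK = np.linalg.eigvals(K)
diffs = np.subtract.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel()

# every pairwise eigenvalue difference appears in the spectrum of K
assert all(np.min(np.abs(eigK - d)) < 1e-8 for d in diffs)

# sep(A, B) in the Frobenius norm equals the smallest singular value of K
sep = np.linalg.svd(K, compute_uv=False).min()
assert sep > 0.0
```

Forming K explicitly costs O(m^2 n^2) storage, so this is only a conceptual check; the libraries under review estimate sep-functions without ever building the Kronecker operator.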

The rest of this chapter is structured as follows. First, in Sect. 2, variants of the standard Schur method for solving Sylvester-type matrix equations are briefly described. In Sect. 3, two blocking strategies for dense linear algebra computations are discussed and applied to matrix equations in reduced (quasi-triangular) form. Section 4 reviews parallel algorithms for reduced matrix equations based on the explicitly blocked and the recursively blocked algorithms discussed in the previous section. Condition estimation of matrix equations and related topics are discussed in Sect. 5. In Sect. 6, a brief account of some software issues for the RECSY and SCASY libraries is given. Section 7 presents some performance results with focus on the hybrid parallelization model including message passing and multithreading. Finally, Sect. 8 is devoted to some CACSD applications, namely condition estimation of invariant subspaces of Hamiltonian matrices and periodic matrix equations.

Table 1 Considered standard and generalized matrix equations. CT and DT denote the continuous-time and discrete-time variants, respectively.

Name                           Matrix equation                             Acronym
Standard CT Sylvester          AX - XB = C ∈ R^{m×n}                       SYCT
Standard CT Lyapunov           AX + XA^T = C ∈ R^{m×m}                     LYCT
Generalized Coupled Sylvester  (AX - YB, DX - YE) = (C, F) ∈ R^{(m×n)×2}   GCSY
Standard DT Sylvester          AXB - X = C ∈ R^{m×n}                       SYDT
Standard DT Lyapunov           AXA^T - X = C ∈ R^{m×m}                     LYDT
Generalized Sylvester          AXB^T - CXD^T = E ∈ R^{m×n}                 GSYL
Generalized CT Lyapunov        AXE^T + EXA^T = C ∈ R^{m×m}                 GLYCT
Generalized DT Lyapunov        AXA^T - EXE^T = C ∈ R^{m×m}                 GLYDT


2 Variants of Bartels–Stewart’s Schur Method

Our algorithms and software rely on the standard Schur method proposed already in 1972 by Bartels and Stewart [6]. The generic standard method consists of four major steps: (1) initial transformation of the left-hand coefficient matrices to Schur form; (2) updates of the right-hand-side matrix with respect to the orthogonal transformation matrices from the first step; (3) computation of the solution of the reduced matrix equation from the first two steps in a combined forward and backward substitution process; (4) a retransformation of the solution from the third step, in terms of the orthogonal transformations from the first step, to get the solution of the original matrix equation.

Let us demonstrate the solution process by considering the SYCT equation AX − XB = C. Step 1 produces the Schur factorizations T_A = Q^T A Q and T_B = P^T B P, where Q ∈ R^{m×m} and P ∈ R^{n×n} are orthogonal and T_A and T_B are upper quasi-triangular, i.e., having 1×1 and 2×2 diagonal blocks corresponding to real and complex conjugate pairs of eigenvalues, respectively. Reliable and efficient algorithms for the Schur reduction step can be found in LAPACK [4] and in ScaLAPACK [26, 7] for distributed memory (DM) environments.

Steps 2 and 4 are typically conducted by two consecutive matrix multiply (GEMM) operations [12, 33, 34]: C̃ = Q^T C P and X = Q X̃ P^T, where X̃ is the solution to the triangular equation

    T_A X̃ − X̃ T_B = C̃,

resulting from steps 1 and 2, and solved in step 3. Similar Schur-based methods are formulated for the standard equations LYCT, LYDT, and SYDT.
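The four steps can be sketched in a few lines of NumPy/SciPy. This is an illustrative stand-in, not the RECSY/SCASY implementation; SciPy's solve_sylvester, itself a Bartels–Stewart solver, plays the role of the triangular substitution in step 3.

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

def syct_bartels_stewart(A, B, C):
    """Solve AX - XB = C by the four-step Schur method (sketch only)."""
    # Step 1: real Schur factorizations A = Q T_A Q^T and B = P T_B P^T.
    T_A, Q = schur(A)
    T_B, P = schur(B)
    # Step 2: update the right-hand side, C~ = Q^T C P.
    C_red = Q.T @ C @ P
    # Step 3: solve the reduced equation T_A X~ - X~ T_B = C~.
    # (solve_sylvester solves AX + XB = Q, hence the sign flip on T_B.)
    X_red = solve_sylvester(T_A, -T_B, C_red)
    # Step 4: retransform the solution, X = Q X~ P^T.
    return Q @ X_red @ P.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((5, 4))
X = syct_bartels_stewart(A, B, C)
assert np.allclose(A @ X - X @ B, C)
```

The randomly generated A and B almost surely have disjoint spectra, so the unique-solvability condition discussed above holds in this demo.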

It is also straightforward to extend the generic Bartels–Stewart method to the generalized matrix equations GCSY, GSYL, GLYCT, and GLYDT. Now, we must rely on robust and efficient algorithms and software for the generalized Schur reduction (Hessenberg-triangular reduction and the QZ algorithm) [40, 39, 10, 1, 2, 3, 32], and algorithms for solving triangular forms of generalized matrix equations.

For illustration we consider GCSY, the generalized coupled Sylvester equation (AX − Y B, DX − Y E) = (C, F) [37, 36]. In step 1, (A, D) and (B, E) are transformed to generalized real Schur form, i.e., (T_A, T_D) = (Q_1^T A Z_1, Q_1^T D Z_1) and (T_B, T_E) = (Q_2^T B Z_2, Q_2^T E Z_2), where T_A and T_B are upper quasi-triangular, T_D and T_E are upper triangular, and Q_1, Z_1 ∈ R^{m×m} and Q_2, Z_2 ∈ R^{n×n} are orthogonal. Step 2 updates the right-hand-side matrices (C, F) = (Q_1^T C Z_2, Q_1^T F Z_2), leading to the reduced triangular GCSY

    (T_A X − Y T_B, T_D X − Y T_E) = (C, F),

which is solved in step 3. The solution (X, Y) of the original GCSY is obtained in step 4 by the orthogonal equivalence transformation (X, Y) = (Z_1 X Z_2^T, Q_1 Y Q_2^T).

Notice that care has to be taken to preserve the symmetry of the right-hand side C in the Lyapunov equations LYCT, LYDT, GLYCT, and GLYDT (see Table 1) during each step of the different variants of the Bartels–Stewart method. This is important


6 R. Granat, I. Jonsson, and B. Kagstrom

for reducing the complexity in steps 2–4 (for step 3, see Sect. 3) and to ensure that the computed solution X of the corresponding Lyapunov equation is guaranteed to be symmetric on output.

In general, the computed solutions of the matrix equations in Table 1 overwrite the corresponding right-hand-side matrices of the respective equations.

3 Blocking Strategies for Reduced Matrix Equations

In this section, we review two ways of introducing blocking for solving matrix equations in reduced (quasi-triangular) form. Most common is to perform a fixed block partitioning of the matrices involved and rearrange the loop nests of a single-element algorithm so that the computations are performed on submatrices (matrix blocks). This means that the operations in the innermost loops are expressed as matrix-matrix operations that can deliver high performance via calls to optimized level 3 BLAS. Indeed, this explicit blocking approach is used extensively in LAPACK [4].

Another successful approach is to combine recursion and blocking, leading to an automatic variable blocking with the potential for matching the deep memory hierarchies of today's HPC systems. Recursive blocking means that the involved matrices are split in the middle (by columns, by rows, or both) and a number of smaller problems are generated. In turn, the subarrays involved are split again, generating a number of even smaller problems. The divide phase proceeds until the size of the controlling recursion blocking parameter becomes smaller than some predefined value. This termination criterion ensures that all leaf operations in the recursion tree are substantial level 3 (matrix-matrix) computations. The conquer phase mainly consists of increasingly sized matrix multiply and add (GEMM) operations. For an overview of recursive blocked algorithms and hybrid data structures for dense matrix computations and library software, we refer to the SIAM Review paper [13].

In the next two subsections, we demonstrate how explicit and recursive blocking are applied to solving matrix equations already in reduced form. Solving each of the Sylvester equations SYCT, SYDT, GCSY, and GSYL in quasi-triangular form is an O(m²n + mn²) operation. Likewise, the reduced Lyapunov equations LYCT, LYDT, GLYCT, and GLYDT are all O(m³) operations to solve. The blocked methods described below have a similar complexity, assuming that the more complicated data dependencies of the two-sided matrix equations (SYDT, LYDT, GSYL, GLYCT, GLYDT) are handled appropriately.

3.1 Explicit Blocked Methods for Reduced Matrix Equations

We apply explicit blocking to reformulate each matrix equation problem into as many level 3 BLAS operations as possible. In the following, the (i, j)th block of a


partitioned matrix, say X, is denoted X_ij. For illustration of the basic concepts, we consider the one-sided matrix equation SYCT and the two-sided LYDT. Let m_b and n_b be the block sizes used in an explicit block partitioning of the matrices A and B, respectively. In turn, this imposes a similar block partitioning of C and X (which overwrites C). Then D_l = ⌈m/m_b⌉ and D_r = ⌈n/n_b⌉ are the numbers of diagonal blocks in A and B, respectively. Now, SYCT can be rewritten in block-partitioned form as

    A_ii X_ij − X_ij B_jj = C_ij − ( Σ_{k=i+1}^{D_l} A_ik X_kj − Σ_{k=1}^{j−1} X_ik B_kj ),    (1)

for i = 1, 2, . . . , D_l and j = 1, 2, . . . , D_r. This block formulation of SYCT can be implemented using a couple of nested loops that call a node (or kernel) solver for the D_l · D_r small matrix equations and level 3 operations in the right-hand side [22]. Similarly, we express LYDT in explicit blocked form as

    A_ii X_ij A_jj^T − X_ij = C_ij − Σ_{(k,l)=(i,j)}^{(D_l,D_l)} A_ik X_kl A_jl^T,   (k, l) ≠ (i, j).    (2)

Notice that the blocking of LYDT decomposes the problem into several smaller SYDT (i ≠ j) and LYDT (i = j) equations. Apart from solving for only the upper or lower triangular part of X in the case of LYDT, a complexity of O(m³) can be retained by using a technique that stores intermediate sums of matrix products to avoid computing certain matrix products in the right-hand side of (2) several times. The same technique can also be applied to all two-sided matrix equations.
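Returning to SYCT, the nested-loop implementation of (1) might be sketched as follows in NumPy (illustrative only: strictly upper triangular T_A and T_B are assumed, so block boundaries never split a 2×2 bulge, and SciPy's `solve_sylvester` stands in for a node solver):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def blocked_syct(TA, TB, C, nb=2):
    """Explicit blocked solve of T_A X - X T_B = C, block size nb."""
    m, n = C.shape
    rb = [slice(i, min(i + nb, m)) for i in range(0, m, nb)]  # row blocks
    cb = [slice(j, min(j + nb, n)) for j in range(0, n, nb)]  # column blocks
    X = C.copy()  # the solution overwrites the right-hand side
    # Block row i bottom-up, block column j left-to-right, so that
    # all X_kj (k > i) and X_ik (k < j) are already computed.
    for i in reversed(range(len(rb))):
        for j in range(len(cb)):
            rhs = X[rb[i], cb[j]].copy()
            for k in range(i + 1, len(rb)):   # subtract A_ik X_kj
                rhs -= TA[rb[i], rb[k]] @ X[rb[k], cb[j]]
            for k in range(j):                # add X_ik B_kj
                rhs += X[rb[i], cb[k]] @ TB[rb[k], cb[j]]
            # Node solve: A_ii X_ij - X_ij B_jj = rhs.
            X[rb[i], cb[j]] = solve_sylvester(TA[rb[i], rb[i]],
                                              -TB[cb[j], cb[j]], rhs)
    return X
```

The two inner k-loops are exactly the level 3 (GEMM) updates of the right-hand side in (1).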

Similarly, explicit blocking and the complexity reduction technique are applied to the generalized matrix equations. Here, we illustrate with the one-sided GCSY, whose explicit blocked variant takes the form

    A_ii X_ij − Y_ij B_jj = C_ij − ( Σ_{k=i+1}^{D_l} A_ik X_kj − Σ_{k=1}^{j−1} Y_ik B_kj ),
    D_ii X_ij − Y_ij E_jj = F_ij − ( Σ_{k=i+1}^{D_l} D_ik X_kj − Σ_{k=1}^{j−1} Y_ik E_kj ),    (3)

where i = 1, 2, . . . , D_l and j = 1, 2, . . . , D_r. A resulting serial level 3 algorithm is implemented as a couple of nested loops over the matrix operations defined by (3).

We remark that all linear matrix equations considered can be rewritten as an equivalent large linear system of equations Zx = y, where Z is the Kronecker product representation of the corresponding Sylvester-type operator. This is utilized in condition estimation algorithms and in kernel solvers for small-sized matrix equations (see Sects. 5 and 6.1).
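For SYCT, for instance, the Kronecker form is Z = I_n ⊗ A − B^T ⊗ I_m acting on x = vec(X), y = vec(C). A small-scale sketch (the dense system costs O((mn)³) to solve, so this is only sensible for small kernel problems):

```python
import numpy as np

def syct_kron_solve(A, B, C):
    """Solve A X - X B = C via (I_n (x) A - B^T (x) I_m) vec(X) = vec(C)."""
    m, n = C.shape
    Z = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    # vec() is column-major stacking, hence order='F'.
    x = np.linalg.solve(Z, C.flatten(order='F'))
    return x.reshape((m, n), order='F')
```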

3.2 Recursive Blocked Methods for Reduced Matrix Equations

To make it easy to compare the two blocking strategies, we start by illustrating recursive blocking for SYCT.


The sizes of m and n control three alternatives for doing a recursive splitting. In Case 1 (1 ≤ n ≤ m/2), A is split by rows and columns, and C by rows only. In Case 2 (1 ≤ m ≤ n/2), B is split by rows and columns, and C by columns only. Finally, in Case 3 (n/2 < m < 2n), both rows and columns of the matrices A, B, and C are split:

    [ A_11  A_12 ] [ X_11  X_12 ]     [ X_11  X_12 ] [ B_11  B_12 ]     [ C_11  C_12 ]
    [   0   A_22 ] [ X_21  X_22 ]  −  [ X_21  X_22 ] [   0   B_22 ]  =  [ C_21  C_22 ].

This recursive splitting results in the following four triangular SYCT equations:

    A_11 X_11 − X_11 B_11 = C_11 − A_12 X_21,
    A_11 X_12 − X_12 B_22 = C_12 − A_12 X_22 + X_11 B_12,
    A_22 X_21 − X_21 B_11 = C_21,
    A_22 X_22 − X_22 B_22 = C_22 + X_21 B_12.

Conceptually, we start by solving for X_21 in the third equation above. After updating C_11 and C_22 with respect to X_21, one can solve for X_11 and X_22. Both updates and the two triangular Sylvester solves are independent operations and can be executed concurrently. Finally, C_12 is updated with respect to X_11 and X_22, and we can solve for X_12. The description above defines a recursion template that is applied to the four Sylvester sub-solves, leading to a recursive blocked algorithm that terminates with calls to optimized kernel solvers for the leaf computations of the recursion tree.
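The full recursion template (Cases 1–3) for SYCT might be sketched as follows (a serial illustration, not the actual RECSY code; strictly triangular A and B are assumed so that a midpoint split never cuts a 2×2 diagonal block, and SciPy's `solve_sylvester` serves as the leaf kernel):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def recsy_syct(A, B, C, blksz=4):
    """Recursive blocked solve of A X - X B = C for triangular A, B."""
    m, n = C.shape
    if max(m, n) <= blksz:                 # leaf: call the kernel solver
        return solve_sylvester(A, -B, C)
    if n <= m // 2:                        # Case 1: split A; C by rows
        p = m // 2
        X2 = recsy_syct(A[p:, p:], B, C[p:, :], blksz)
        X1 = recsy_syct(A[:p, :p], B, C[:p, :] - A[:p, p:] @ X2, blksz)
        return np.vstack([X1, X2])
    if m <= n // 2:                        # Case 2: split B; C by columns
        q = n // 2
        X1 = recsy_syct(A, B[:q, :q], C[:, :q], blksz)
        X2 = recsy_syct(A, B[q:, q:], C[:, q:] + X1 @ B[:q, q:], blksz)
        return np.hstack([X1, X2])
    p, q = m // 2, n // 2                  # Case 3: split both
    A11, A12, A22 = A[:p, :p], A[:p, p:], A[p:, p:]
    B11, B12, B22 = B[:q, :q], B[:q, q:], B[q:, q:]
    X21 = recsy_syct(A22, B11, C[p:, :q], blksz)
    X11 = recsy_syct(A11, B11, C[:p, :q] - A12 @ X21, blksz)
    X22 = recsy_syct(A22, B22, C[p:, q:] + X21 @ B12, blksz)
    X12 = recsy_syct(A11, B22, C[:p, q:] - A12 @ X22 + X11 @ B12, blksz)
    return np.block([[X11, X12], [X21, X22]])
```

In Case 3 the two middle solves (for X_11 and X_22) are independent, which is where a task-parallel implementation can execute them concurrently.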

We also illustrate the recursive blocking and template for GSYL, the most general of the two-sided matrix equations. Also here, we demonstrate Case 3 (Cases 1 and 2 can be seen as special cases), where (A, C), (B, D), and E are split by rows and columns:

    [ A_11  A_12 ] [ X_11  X_12 ] [ B_11^T    0    ]     [ C_11  C_12 ] [ X_11  X_12 ] [ D_11^T    0    ]     [ E_11  E_12 ]
    [   0   A_22 ] [ X_21  X_22 ] [ B_12^T  B_22^T ]  −  [   0   C_22 ] [ X_21  X_22 ] [ D_12^T  D_22^T ]  =  [ E_21  E_22 ],

leading to the following four triangular GSYL equations:

    A_11 X_11 B_11^T − C_11 X_11 D_11^T = E_11 − A_12 X_21 B_11^T − (A_11 X_12 + A_12 X_22) B_12^T
                                               + C_12 X_21 D_11^T + (C_11 X_12 + C_12 X_22) D_12^T,
    A_11 X_12 B_22^T − C_11 X_12 D_22^T = E_12 − A_12 X_22 B_22^T + C_12 X_22 D_22^T,
    A_22 X_21 B_11^T − C_22 X_21 D_11^T = E_21 − A_22 X_22 B_12^T + C_22 X_22 D_12^T,
    A_22 X_22 B_22^T − C_22 X_22 D_22^T = E_22.

Now, we start by solving for X_22 in the fourth equation above. After updating E_12 and E_21 with respect to X_22, we can solve for X_12 and X_21. As for SYCT, both updates and the triangular GSYL solves are independent operations and can be executed concurrently. Finally, after updating E_11 with respect to X_12, X_21, and X_22, we solve for X_11. Some of the updates of E_11 can be combined in larger GEMM operations,