



Introduction

An industry standard for shared memory parallel programming has long been anticipated. From the moment it was announced in late 1997, OpenMP was it. This set of directives and library routines, agreed upon by a powerful consortium of hardware vendors, provides application developers with a vendor-neutral API for parallelizing their Fortran, C and C++ codes to execute on the popular SMPs. As such, it has been quickly accepted by the community.

OpenMP has its roots in the efforts of the PCF Forum and the ANSI X3H5 committee, both of which met in the late eighties in an effort to provide just such a standard. Although much hard work went into the development of language features – for use with Fortran back then – the timing was wrong: vendor attention became firmly focussed on distributed memory computers and their APIs. The OpenMP developers did not just resurrect this effort. They selected the best of its features, producing a structured language with constructs that not only support the most popular elements of traditional shared memory programming models, such as parallel loops; they also enable incremental application development, conditional compilation, and the parallelization of large program regions. Their attention to such details has enabled the use of this API for rapid application development, with major programs sometimes being converted in a matter of hours.
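To make the conditional compilation feature concrete, the following minimal sketch (our own illustration in C, not drawn from any paper in this issue) shows how the standard _OPENMP macro lets a single source file build either as a serial program or as an OpenMP program; in Fortran the same effect is obtained with the !$ conditional compilation sentinel.

    /* Minimal sketch: conditional compilation with the standard _OPENMP macro. */
    /* Nothing here is specific to the papers discussed in this issue.          */
    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void)
    {
    #ifdef _OPENMP
        /* Compiled with OpenMP enabled: query the runtime library. */
        printf("OpenMP build, up to %d threads\n", omp_get_max_threads());
    #else
        /* Compiled without OpenMP: the same file is still a valid serial program. */
        printf("Serial build\n");
    #endif
        return 0;
    }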

OpenMP is based upon the traditional fork-join model of shared memory parallel programming, in which a single master thread begins execution and spawns worker threads to perform computations in regions that are marked by the user as being parallel. Directives are provided with which the programmer may indicate how the work in such regions is to be distributed among the executing threads and how threads are to be synchronized. Innovative features include orphan directives that are dynamically, but not lexically, within the scope of a parallel region. OpenMP does not demand a radical change in programming style. A serial program can be successively annotated with its directives, leaving the sequential code largely undisturbed. The compiler performs a relatively straightforward translation of these, usually to calls to a thread library.
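The fork-join model and the work-sharing directives just described can be illustrated with a short sketch (our own, in C with standard OpenMP; the summation kernel is purely illustrative): the master thread forks a team at the parallel construct, the loop iterations are divided among the threads according to the schedule clause, the reduction clause provides the required synchronization, and the team joins at the implicit barrier.

    /* Minimal fork-join sketch; the kernel is illustrative only. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* Fork: a team of threads executes the annotated loop. The schedule */
        /* clause controls how iterations are distributed; the reduction     */
        /* clause combines the per-thread partial sums safely at the end.    */
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }

        /* Join: only the master thread continues past the implicit barrier. */
        printf("sum = %.0f\n", sum);
        return 0;
    }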

The vendor community has established a non-profit organization called the OpenMP Architecture Review Board, or ARB, to deal with the further development of OpenMP. It has addressed problems with the initial specification and defined features that help to keep it up-to-date. Few modifications have been found necessary so far, although a major revision to the Fortran API has greatly facilitated its use with Fortran 90. The user community is beginning to organize itself in the form of a User Group to address its concerns and liaise with the ARB.

This special issue of Scientific Programming presents a cross-section of papers that reflect current OpenMP activities. Several papers report on experiences using this paradigm to solve their computational problems, each of them from a somewhat different point of view. Further papers discuss tools to support the creation and optimization of OpenMP programs. The compilation of OpenMP, both in its traditional setting and for use with software distributed shared memory, is described, as are on-going research efforts that aim to provide additional power within the language. The contributions were chosen from a variety of papers that were presented at two distinct workshops on OpenMP, its application and its tools, that were organized by the editors. One of these was held in San Diego in July 2000 and attracted a variety of participants from government labs, universities and industry. The other workshop took place in Edinburgh, Scotland, in September of the same year and attracted a similar range of attendees. It was interesting to witness the diversification that had taken place since the initial workshop the previous year, when few implementations were available. A surprising number of compilers and tools on a variety of platforms were already in use by the summer of 2000.

Application developers and parallelization experts are still learning how to use OpenMP. Experiences have been reported on a variety of codes and target systems, and general strategies proposed. In this issue, Kaiser and Baden discuss how to get performance out of the hybrid programming model obtained when MPI and OpenMP are combined in a code. This is particularly popular as a model for clusters of SMPs and tightly coupled machines that exhibit this hierarchical parallelism.




They consider the use of OpenMP at a coarse granularity, and show how it is possible to overlap MPI communication with OpenMP computation. The paper focusses on the needs of regular (structured) stencil-based applications. Their results compare strategies in which a thread is reserved to perform communication with those that attempt to share both communication and computation among all threads. The usefulness of blocking for cache is demonstrated and the claim made that moving OpenMP parallelism to outer levels of the code will enable cache blocking without interference from these directives.
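A rough sketch of the reserved-communication-thread strategy they evaluate is given below. It is our own illustration of the idea, not the authors' code: the helper routines exchange_halo, compute_interior and compute_boundary are hypothetical placeholders, and MPI is assumed to have been initialized with at least MPI_THREAD_FUNNELED support so that the master thread may call MPI from inside the parallel region.

    /* Sketch: overlap MPI halo exchange with OpenMP computation by reserving */
    /* one thread for communication. The three helpers are hypothetical       */
    /* placeholders, not routines from the paper; the halo exchange helper is */
    /* assumed to make the actual MPI calls.                                  */
    #include <mpi.h>
    #include <omp.h>

    void exchange_halo(double *u, int n);                            /* hypothetical */
    void compute_interior(double *u, int n, int tid, int nthreads);  /* hypothetical */
    void compute_boundary(double *u, int n, int tid, int nthreads);  /* hypothetical */

    void timestep(double *u, int n)
    {
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int nthreads = omp_get_num_threads();

            if (tid == 0) {
                /* The reserved thread performs the MPI halo exchange...      */
                exchange_halo(u, n);
            } else {
                /* ...while the remaining threads update interior points that */
                /* do not depend on the incoming halo data.                    */
                compute_interior(u, n, tid, nthreads);
            }

            /* Wait until both the exchange and the interior update are done. */
            #pragma omp barrier

            /* Boundary points, which need the fresh halo values, are then    */
            /* updated cooperatively, each thread taking a share.             */
            compute_boundary(u, n, tid, nthreads);
        }
    }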

Smith and Bull also consider this hybrid model for SMP clusters, evaluating its overall usefulness. They show that it is not necessarily the most effective programming model, but that it may provide significant benefits in some situations. This is particularly likely to be the case if the MPI code suffers from poor scaling, perhaps due to load imbalance or too fine-grained a problem size, or if the MPI program suffers from memory limitations due to large amounts of replicated data. The MPI implementation used may exacerbate such problems. The authors have created a QMC code that may execute with an arbitrary mix of OpenMP and MPI, even on a pure SMP system. It scales well up to 32 processes under each of the programming models, yet the OpenMP version was easier to create.

Paolo Malfetti discusses the application of OpenMP to weather, wave and ocean models in his contribution to this volume. The target machine in this case is Silicon Graphics' Origin 2000, a ccNUMA architecture. The programs he works with were written for vector machines, so there are a variety of porting problems that need to be overcome. Malfetti addresses the suitability of this architecture for environmental simulations. He also discusses the benefits of OpenMP as a programming model, the need for single processor tuning in order to obtain performance under it, and the relationship between scalability and input size, cache sizes and the target processor.

Di Martino and colleagues discuss OpenMP workload decomposition strategies, based upon a case study involving a particle-in-cell code. Both domain and particle decomposition strategies are applied to the Hybrid MHD-Gyrokinetic Code and evaluated with regard to their time efficiency, memory usage and the program restructuring effort required to achieve them. The results in this case are sensitive to the relationship between the domain size and the number of particles. A major finding of their work is the trade-off implied between efficiency and recoding effort. A particle decomposition requires little code rewriting, but has drawbacks: one such approach leads to large memory use, and another results in comparatively low efficiency. The domain decomposition approach is superior in each of these respects but requires considerable program modification. The authors conclude that ad hoc parallelization strategies, taking application characteristics into account, are required to achieve high efficiency in general.

Compiler strategies for dealing with explicitly parallel shared memory programs are comparatively new. Indeed, most OpenMP compilers restrict themselves to optimizations that are essentially sequential (i.e. within a single thread). Yet broader optimizations could potentially have a big impact on the performance of OpenMP codes, especially if they are compiled for a ccNUMA system or target software distributed shared memory.

Sato and colleagues describe the compiler translation from OpenMP to SCASH, a software distributed shared memory system that works on clusters of PCs. They explain that data mapping is key to achieving good performance under this system, and add a set of directives to specify data mapping and loop scheduling at page granularity.

In their paper on compiler optimization techniques for OpenMP programs, Satoh and colleagues extend standard dataflow analysis methods to cover OpenMP parallelism. They describe a flowgraph containing nodes to represent OpenMP constructs, and develop algorithms not only to handle classical problems such as reaching definitions, but also to analyze memory synchronizations and cross-loop data dependences. Although their goal is to target software distributed shared memory systems, much of their work on reducing synchronization overheads and improving data locality is equally applicable to generic OpenMP implementations.
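A simple instance of the kind of synchronization-reducing transformation at stake (our own illustration using the standard nowait clause, not an algorithm taken from the paper): when analysis shows that, between two statically scheduled work-shared loops, each thread reads only data it wrote itself, the implicit barrier after the first loop is redundant and can be dropped.

    /* Barrier elimination sketch: iteration i of the second loop reads only   */
    /* a[i], and both loops use the same static schedule over the same bounds, */
    /* so each thread consumes only values it produced itself and the barrier  */
    /* after the first loop may safely be omitted with nowait.                 */
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        static double a[N], b[N];

        #pragma omp parallel
        {
            #pragma omp for schedule(static) nowait    /* barrier safely omitted     */
            for (int i = 0; i < N; i++)
                a[i] = 2.0 * i;

            #pragma omp for schedule(static)           /* same mapping of iterations */
            for (int i = 0; i < N; i++)
                b[i] = a[i] + 1.0;
        }

        printf("%f\n", b[N - 1]);
        return 0;
    }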

Although it is easy to create an OpenMP program, it is not necessarily easy to create one that executes on a given parallel platform with high efficiency. A variety of optimizations may need to be applied manually to improve single processor performance as well as to reduce false sharing and other overheads.
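As one concrete example of these overheads, the sketch below (our own; the 64-byte cache line size is an assumption) contrasts per-thread counters packed into adjacent array slots, which suffer from false sharing, with counters padded so that each occupies its own cache line.

    /* False sharing illustration: adjacent per-thread slots share a cache    */
    /* line, so every update invalidates that line in other processors'       */
    /* caches; padding each slot to a full line (64 bytes assumed) avoids it. */
    #include <omp.h>
    #include <stdio.h>

    #define NITER 10000000L
    #define MAXTHREADS 64

    struct padded_counter { long value; char pad[64 - sizeof(long)]; };

    int main(void)
    {
        static long packed[MAXTHREADS];                  /* adjacent slots            */
        static struct padded_counter padded[MAXTHREADS]; /* one cache line per thread */

        double t0 = omp_get_wtime();
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            for (long i = 0; i < NITER; i++)
                packed[tid]++;                  /* line ping-pongs between caches */
        }
        double t1 = omp_get_wtime();
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            for (long i = 0; i < NITER; i++)
                padded[tid].value++;            /* no sharing between threads     */
        }
        double t2 = omp_get_wtime();

        printf("packed: %.3fs  padded: %.3fs  (counts %ld %ld)\n",
               t1 - t0, t2 - t1, packed[0], padded[0].value);
        return 0;
    }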

Park and co-authors propose a general methodology for developing parallel programs that includes the use of a parallelizing compiler to perform some of the translations. They describe a pair of independent yet complementary tools that were designed to support a set of tasks that are part of the development process. URSA MINOR is a performance evaluation tool and INTERPOL is an interactive tool for program transformation that can be used with OpenMP codes. Their work extends to consider the use of a database, storing information to support the diagnosis and possible solutions for various performance symptoms.

Ierotheou and colleagues at Greenwich as well as NASA Ames have extended a tool that supports development of MPI programs so that it can also generate OpenMP code. They break the process down into three stages: identification of parallel regions and parallel loops, optimization of these, and the actual code transformation, including the insertion of OpenMP directives into the input source program. An important aspect of this system is its extensive data dependence analysis, the results of which are used to classify loops. The user may inspect this classification and the dependences themselves, and is given the opportunity to improve upon the system's knowledge. The authors include several case studies to illustrate the workings of CAPTools.

One measure of the success of OpenMP as an API for technical computing on SMPs is the fact that there are many attempts to broaden its range of applicability. These include proposals for features that address other kinds of computations, software to support its execution on distributed memory systems and clusters of workstations, as well as its implementation on cache-coherent NUMA systems. Work addressing each of these issues is to be found both in academia and in industry.

Labarta et al. consider extensions to OpenMP that support irregular data access loops. One of the advantages of OpenMP is that it can be used without difficulty to express the parallelism in unstructured computations. However, if there are dependences between the updates performed by different threads, these updates must be protected. This is unnecessarily inefficient in many cases. The paper proposes supporting an inspector/executor implementation so that those accesses requiring protection are identified and all others may be updated in parallel. They borrow from work previously performed to support HPF by adding an indirect clause and a named schedule. The authors point out the potential usefulness of the inspector/executor approach to enable runtime data dependence analysis and the corresponding construction of execution schedules, e.g. for wavefront execution.
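The sketch below conveys the inspector/executor idea in plain C with standard OpenMP constructs rather than with the indirect clause and named schedules the paper proposes, whose syntax we do not reproduce here; the histogram scheme and the routine name are our own.

    /* Inspector/executor sketch for the irregular update y[idx[i]] += x[i].  */
    /* This illustrates the concept only, not the extension proposed in the   */
    /* paper. The inspector (reusable while idx is unchanged) counts how      */
    /* often each target index appears; the executor protects only targets    */
    /* referenced more than once and updates all others without locks.        */
    #include <omp.h>
    #include <stdlib.h>

    void irregular_update(double *y, int ny, const int *idx, const double *x, int n)
    {
        /* Inspector: histogram of the target indices. */
        int *count = calloc((size_t)ny, sizeof(int));
        for (int i = 0; i < n; i++)
            count[idx[i]]++;

        /* Executor: conflict-free targets need no synchronization at all. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            if (count[idx[i]] == 1) {
                y[idx[i]] += x[i];            /* provably free of conflicts */
            } else {
                #pragma omp atomic
                y[idx[i]] += x[i];            /* may collide across threads */
            }
        }

        free(count);
    }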

Nested parallelism is the feature in the current OpenMP specification that has proved to be most resistant to implementation. Blikberg and Sorevik perform numerical experiments to support their claim that, if a problem has more than one level of parallelism, applying parallelism at all levels will enhance scalability. They take the inevitable load balancing issues in such situations into consideration and argue for the full implementation of this feature. The paper discusses two example codes in detail, one synthetic and one a real-world use of nested parallelism. The overarching goal is to use OpenMP for adaptive mesh refinement codes, where multiple levels of parallelism will be present, albeit with difficult load balancing problems. The authors observe that many problems are typically broken down into smaller subproblems, leading to a code that can profit from multilevel parallelism; an outer layer generally has a few large tasks, and successive layers have more, but smaller, subtasks.
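A minimal sketch of such multilevel parallelism in standard OpenMP appears below; process_block is a placeholder of ours and the thread counts are arbitrary; whether the inner region actually receives extra threads is left to the implementation, which is precisely the difficulty the paper addresses.

    /* Nested parallelism sketch: a few large outer tasks, each containing    */
    /* many smaller subtasks; the work routine is only a placeholder.         */
    #include <omp.h>
    #include <stdio.h>

    void process_block(int task, int sub)       /* placeholder for real work */
    {
        printf("task %d, subtask %d on thread %d\n",
               task, sub, omp_get_thread_num());
    }

    int main(void)
    {
        omp_set_nested(1);       /* ask the runtime to honour nested regions */

        /* Outer level: a few large tasks, e.g. grids in a refinement code.  */
        #pragma omp parallel for num_threads(4)
        for (int task = 0; task < 4; task++) {
            /* Inner level: each task spawns its own, smaller team.          */
            #pragma omp parallel for num_threads(2)
            for (int sub = 0; sub < 8; sub++)
                process_block(task, sub);
        }
        return 0;
    }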

Happy reading!

Mark Bull and Barbara Chapman

Page 4: Introduction - Hindawi

Submit your manuscripts athttp://www.hindawi.com

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttp://www.hindawi.com

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Applied Computational Intelligence and Soft Computing

 Advances in 

Artificial Intelligence

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014