compiling for sve - microsoft...compiling for sve sve hackathon, sc18 will lovett software...

Post on 11-Jun-2020

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Compiling for SVESVE Hackathon, SC18

Will LovettSoftware Architect, HPC Tools

2

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

3

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

4

Arm’s solution for HPC application development and portingCommercial tools for aarch64, x86_64, ppc64 and accelerators

Cross-platform Tools Arm Architecture Tools

DDT MAPFORGE

PERFORMANCEREPORTS

C/C++ & FORTRANCOMPILER

PERFORMANCE LIBRARIES

5

Arm’s solution for HPC application development and portingCommercial tools for aarch64, x86_64, ppc64 and accelerators

Cross-platform Tools Arm Architecture Tools

DDT MAPFORGE

PERFORMANCEREPORTS

C/C++ & FORTRANCOMPILER

PERFORMANCE LIBRARIES

6

Arm’s commercially-supported C/C++/Fortran compiler

Tuned for Scientific Computing, HPC and Enterprise workloads• Processor-specific optimizations for various server-class Arm-based platforms• Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime

Linux user-space compiler with latest features• C++ 14 and Fortran 2003 language support with OpenMP 4.5• Support for Armv8-A and SVE architecture extension• Based on LLVM and Flang, leading open-source compiler projects

Commercially supported by Arm • Available for a wide range of Arm-based platforms running leading Linux

distributions – RedHat, SUSE and Ubuntu

Compilers tuned for Scientific Computing and HPC

Latest features and performance optimizations

Commercially supported by Arm

7

C/C++ Frontend

Fortran Frontend

Optimizer Armv8-A code-gen

SVE code-gen

Clang based LLVM based

PGI Flang based

Enhanced optimization for Armv8-A and SVE

C/C++ Files (.c/.cpp)

Fortran Files (.f/.f90)

Arm C/C++/Fortran Compiler

Armv8-A binary

SVEbinary

LLVM IR LLVM IRIR Optimizations

Auto-vectorization

LLVM based

LLVM based

Language specific frontend Architecture specific backendLanguage agnostic optimization

Arm Compiler is built on LLVM, Clang and Flang projects

8

Optimized BLAS, LAPACK and FFT

Commercial 64-bit Armv8-A math libraries • Commonly used low-level math routines - BLAS, LAPACK and FFT• Provides FFTW compatible interface for FFT routines• Batched BLAS support

Best-in-class serial and parallel performance• Generic Armv8-A optimizations by Arm• Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon

vendors

Validated and supported by Arm• Available for a wide range of server-class Arm-based platforms• Validated with NAG’s test suite, a de-facto standard• Available for Arm compiler or GNU

Best in class performance

Validated with NAG test suite

Commercially supportedby Arm

9 © 2018 Arm Limited

What’s new in the compiler

• Updates• GNU 8.2• LLVM 7.0

• Enhanced integration between ArmPL and Arm Compiler• Fortran

• Submodules (F2008)• Directives: IVDEP, NOVECTOR, VECTOR ALWAYS• NINT and DNINT performance

• Improved documentation

10 © 2018 Arm Limited

What’s new in the libraries

• Improved function prototypes• BLAS, CBLAS, LAPACK

• Enhanced libamath

• FFTW MPI interface added• Single and double precision

• Improved Guru interface

• Sparse mat-vec multiplication (SpMV) in CSR format

11

Useful Links

• SVE Hackathon SC18 landing page• https://developer.arm.com/hpc/events/sve-hackathon-sc18

• Arm porting and tuning guides• https://developer.arm.com/products/software-development-tools/hpc/resources/porting-and-tuning

• Arm application recipes (Please contribute to this)• https://gitlab.com/arm-hpc/packages/wikis/categories/allPackages

• Compiler documentation and quick start guides• https://developer.arm.com/products/software-development-tools/hpc/documentation• E.g. gfortran to armflang, ifort to armflang, compiler quick start

• Arm C Language Extensions (ACLE) for SVE• https://developer.arm.com/docs/100987/latest/arm-c-language-extensions-for-sve

• Arm community forum• https://community.arm.com/tools/hpc/f/hpc-user-group

12

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

13 © 2018 Arm Limited

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

14 © 2018 Arm Limited

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

• armclang for C/C++• armflang for Fortran

15 © 2018 Arm Limited

• Required to enable SVE code generation

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

-march=name[+[no]feature]• Targets an architecture profile,

generating generic code that runs on any processor of that architecture.

16 © 2018 Arm Limited

• Produces higher-quality, but less portable code

• Once SVE-based SoCs are available, this will remove the need for-march=armv8-a+sve

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

• Enables the compiler to automaticallydetect the CPU it is being run on andoptimize accordingly.

• This supports a range of Arm®v8-A based SoCs, including ThunderX2.

17 © 2018 Arm Limited

• SVE-enabled ArmPL libraries will beavailable in a future release

• SVE vector mathematical functions supported today

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

• Enable Arm Performance Libraries(ArmPL)

• Enables optimized versions of the C mathematical functions declared in math.h

• Enables optimized versions of Fortran math intrinsics

• Enables autovectorization of mathematical functions & intrinsics

18 © 2018 Arm Limited

Optimizations enabled by -Ofast:• Assume no infinities• Assume no NaNs• Disable correct handling of errno for

math functions• Enable re-association of FP• Enable reciprocal math optimizations• Disable handling of signed zero• Disable handling of FP traps

Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast

-Ofast• Aggressive optimization at the expense

of strict language compliance-O3 –ffp-contract=fast• O3 optimizations plus fused FP

instructions (eg FMAs)-O2• Minimum level required for

vectorization

19 © 2018 Arm Limited

LLVM Optimization RemarksLet the compiler tell you how to improve vectorization

Flag Description-Rpass=<regexp> What was optimized.

-Rpass-analysis=<regexp> What was analyzed.

-Rpass-missed=<regexp> What failed to optimize.

For each flag, replace <regexp> with an expression for the type of remarks you wish to view.

Recommended <regexp> queries are:

• -Rpass=\(loop-vectorize\|inline\)• -Rpass-missed=\(loop-vectorize\|inline\)• -Rpass-analysis=\(loop-vectorize\|inline\)

where loop-vectorize will filter remarks regarding vectorized loops, and inline for remarks regarding

inlining.

To enable optimization remarks, pass the following -Rpass options to armclang:

20 © 2018 Arm Limited

LLVM Optimization Remarks example

example.c:8:18: remark: hoisting zext[-Rpass=licm]

for (int i=0;i<K; i++)^

example.c:8:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]

for (int i=0;i<K; i++)^ example.c:7:1: remark: 28 instructions in function

armclang -O3 -Rpass=.* -Rpass-analysis=.* example.c

armflang -O3 -Rpass=loop-vectorize example.F90 -gline-tables-onlyexample.F90:21: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]

END DO

21 © 2018 Arm Limited

• Use this flag when compilingprocedures that are called fromOpenMP regions, but do not contain OpenMP themselves

Thread-safe recursionarmflang -frecursive

• Allocates all local variables on the stack

• Allows thread-safe recursion

• Applied automatically with -fopenmp

• Can have implications on stack size

• Increase stack with:

ulimit -s unlimited

22

Text 30pt sentence case

Other compiler optionsman armclang

General flags

-g Generate source-level debug information (use with -O0 where possible)

-Wall Enable all warnings.

-w Suppress all warnings.

-fopenmp Enable OpenMP.

Fortran flags

-cpp Preprocess Fortran files. Default for .F, .F90, .F95,…

-module <path> Specifies a directory to place, and search for, module files.

-i8 Set the default kind for INTEGER and LOGICAL to 64bit (i.e. KIND=8).

23

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

24 © 2018 Arm Limited

C pragmas to control vectorization#pragma clang loop <…>

vectorize(assume_safety)• Allows the compiler to assume that there

are no aliasing issues in a loop

vectorize(enable)• Always vectorize, if it’s safe to do so

vectorize(disable)• Disable vectorization

distribute(enable) • Attempt to split a loop, to aid

vectorization

unroll_count(_value_)• Forces a scalar loop to unroll by a given

factor

interleave_count(_value_)• Forces a vectorized loop to be interleaved

by a given factor

25

Fortran directives to control vectorization

!dir$ ivdep

• Allows the compiler to assume that there are no aliasing issues in a loop

!dir$ vector always

• Always vectorize, if it’s safe to do so

!dir$ novector

• Disable vectorization

26

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

27 © 2018 Arm Limited

PGI Fortran specific workarounds

• The Flang in the Arm compiler comes from PGI• However it is not completely the same

• PGI Fortran specific workarounds are still valid for ArmFlang

• However ifdef __PGI not recognised

• Use #ifdef __FLANG

28 © 2018 Arm Limited

Fortran lazy evaluation

if ( some_true_statement .OR. function_with_side_effect() )

• In the Fortran standard lazy vs. strict evaluation is undefined

• ArmFlang always performs lazy evaluation in if statements

Solution:• Remove ambiguous evaluation operations• Don’t write functions with side effects!

29 © 2018 Arm Limited

Fortran Features

• Not all of the (new/esoteric) Fortran features have been implemented• Code may fail at compile time due to unsupported feature

• Full list of supported features:• Fortran 2003

– https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2003-

status

• Fortran 2008

– https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2008-

status

30 © 2018 Arm Limited

ArmFlang Fortran Intrinsics

• Some GNU Fortran intrinsics are non-standard• ArmFlang started by implementing standard ones• Some non-standard ones may be missing

• Intrinsics list• https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-intrinsics

• Example:• GETARG – Getting a command line argument – GNU non-standard• GET_COMMAND_ARGUEMENT – Fortran 2003 standard, supported by armflang

• ISNAN, COSD• Compiler specific: mm_prefetch

31 © 2018 Arm Limited

Arm uses a weak memory model

• Some HPC codes have their own threading• Usually based on pthreads• Written to have more control over the threads of

execution and how they synchronize• Some had no problems working with AArch64’s weakly

ordered memory system• Others exhibited issues in multi-threaded modes that

were particularly hard to diagnose without a detailed investigation into how the multi-threaded mode was implemented

– Problems are almost always down to a lock-free thread interaction implementation

– Key symptom: correct operation on a strongly ordered architecture, failure on weakly ordered

Done!

In UseMemory Pool

In UseFree

In UseIn Use

Core 1

Waiting for Memory...

Core 2

Inco

min

g W

rite

Inco

min

g W

rite

Mar

k Fr

ee

In UseMemory Pool

In Use

In UseIn Use

Core 1

Allocated

Core 2

Inco

min

g W

rite

Mar

k Fr

eeIn

com

ing

Writ

e

Weakly

Ordered

In UseDone!

32 © 2018 Arm Limited

Old config.guess Fails to Recognise Arm Architecture

• When building a code with autoconf it can fail to recognise aarch64

• Will fail automatically

• Similar issue to that seen on Power 8

• Solution:

• Pull down new versions of config.guess and config.sub

• If you own the package, or know the owners, ask them to update the config package in their release

• This would help us massively

wget'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD' -O config.guesswget'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD' -O config.sub

33 © 2018 Arm Limited

Libtool Fails to Recognise Arm Compiler

• Build systems which rely on an old libtool it fails to recognise Flang as a compiler• Causes failures when building code

• Solution:• Run sed-script to update the configuration

• Steps:– Update autoconf - Optional– Run configure– Run sed scripts– Run make

• Example: https://gitlab.com/arm-hpc/packages/wikis/packages/fftw• Let package owners know to update the libtool in their package

– Like: https://savannah.gnu.org/patch/index.php?9442

sed -i -e 's#wl=""#wl="-Wl,"#g' libtoolsed -i -e 's#pic_flag=""#pic_flag=" -fPIC -DPIC"#g' libtool

34 © 2018 Arm Limited

Arm Porting Advisor

• Porting advisor for Arm architecture / compiler• https://github.com/arm-hpc/porting-advisor

• Static source code analysis• Python tool to look for common source based porting issues

• Identifies:• Build system issues (config.guess)• Non-Arm assembly• Non-Arm intrinsics

35

Request: Fortran usage survey

Fortran usage survey

Anton Shterenlikht, Standards Officer, BCS Fortran Specialist Group

https://goo.gl/forms/JUFUReOoVUin2m8D2

http://www.fortran.bcs.org/2018/FortranBenefitsSurvey_interimrep_Aug2018.pdf

36

Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips

• … one more thing

37

Arm Compiler for HPC on your laptop

Instructions:

1. Install docker (https://www.docker.com/)

2. git clone https://github.com/ARM-software/arm-hpc-tools

3. cd arm-hpc-tools

4. make interactive

3838

Thank You!Danke!Merci!��!�����!Gracias!Kiitos!

Confidential © Arm 2017 Limited

top related