Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Page 1: Intel MPI Library Cluster Software and Technologies Software & Solutions Group

Intel MPI Library

Cluster Software and Technologies

Software & Solutions Group

Page 2: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Agenda

Intel MPI Overview

• Features

• Value Proposition

• Architecture

• Devices and Fabrics

• Cluster Tools Support

Hands-on Lab

• Getting Started
• Environment
• Daemon Ring
• Running an Application
• Debugging

• Optimization
• Process Placement
• Device Selection
• Process Pinning
• Automatic Tuner

Page 3: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Features

MPI-2 specification conformance
• Standardized job startup mechanism (mpiexec)
• Process spawning/attachment (sock device only)
• One-sided communication
• Extended collective operations
• File I/O

Multiple communication fabrics selectable at runtime

Additional thread-safe libraries at level MPI_THREAD_MULTIPLE

Dynamic or static linking, plus debugging and tracing libraries

Page 4: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Features

IA-32, Itanium® 2, and Intel® 64 platforms

Intel® compilers 7.1 or later, and GNU* compilers 2.96 or later

Red Hat Linux* 7.2 thru EL 5, SUSE Linux* 8.0 thru SLES10

C, C++, Fortran-77, and Fortran-90 language bindings

Dynamic or static linking, plus debugging and tracing libraries

Affordable SDK package

Freely downloadable and royalty-free RTO package

Page 5: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Value Proposition

Customers select the interconnect at runtime

ISVs see and support a single interconnect

IHVs create DAPL providers and fabric drivers

[Diagram: applications (A–F) run on Intel® MPI atop an abstract fabric layer, which maps to the underlying fabrics (1–4)]

Page 6: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


MPI Devices

Selectable at runtime

• shm (shared memory only)

• sock (sockets only)

• ssm (shared memory + sockets)

• rdma (DAPL only)

• rdssm (shared memory + DAPL + sockets)

Fault tolerance at startup:
• Static sock device used as a fallback at the initialization step
• Disable the fallback for best application performance
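
As an illustration, disabling the fallback for a production run might look like the line below; the I_MPI_FALLBACK_DEVICE variable name and the value 'disable' are assumptions about this library version, so check the Reference Manual before relying on them:

$ mpiexec -genv I_MPI_DEVICE rdssm -genv I_MPI_FALLBACK_DEVICE disable -n 4 ./your_app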

Page 7: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


DAPL

• Direct Access Programming Library

• Implements a Remote Direct Memory Access (RDMA) interface for high performance transports

• See www.datcollaborative.org

Page 8: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Supported Fabrics

Selectable at runtime

• Shared memory

• InfiniBand (via DAPL)

• Myrinet (via DAPL)

• Quadrics (via DAPL)

• Ammasso RNIC (via DAPL)

• Dolphin SCI (via sockets)

• Ethernet (via sockets and DAPL)

Additional fabrics are being investigated

Page 9: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Intel MPI 3.1 (shipping now)

Streamlined product setup

• Installation under root or ordinary user ID

• mpivars.sh and mpivars.csh scripts for easy path setting

Simplified process management

• mpiexec -perhost and -nolocal options

• mpirun script that automates MPD startup and cleanup

• System-, user-, and session-specific configuration files

• Transparent support for alternative IP interfaces

Page 10: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Intel MPI 3.1 (contd.)

Environment variables for runtime control over

• Process pinning

• Optimized collective operations

• Device-specific protocol thresholds

• Collective algorithm thresholds

• Enhanced memory registration cache

• Many others …

Page 11: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Intel MPI 3.1 (contd.)

Increased interoperability

• Support for DAPL* v1.1 and DAPL* v1.2 compliant providers

• Message queue browsing with TotalView* and DDT* debuggers

• Internal MPI library state tracing with Intel® Trace Collector

Page 12: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Intel MPI 3.1 (contd.)

Enhanced support for operating systems and compilers
• RedHat EL 3, EL4, EL5
• SUSE 9.0 – 9.3, 10.0, SLES9, SLES10
• SGI ProPack4, ProPack5
• Asianux 1.0, 2.0 (includes Red Flag DC* Server 5.0)
• Turbo Linux* 10
• Mandriva*/Mandrake* v10.1
• Miracle* Linux v4.0
• Fedora Core 1-4
• Microsoft Windows* CCS 2003

Getting Started and Reference Manual documents

Page 13: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Intel® Cluster Toolkit 3.1
• Intel MPI Library 3.1
• Intel Math Kernel Library 10.0
• Intel Trace Analyzer and Collector 7.1
• Intel MPI Benchmarks 3.1

Integration with the job schedulers

• LSF*, version 6.1 and higher

• PBS Pro*, version 7.1 and higher

• Torque*, version 1.2.0 and higher

• Parallelnavi* NQS* for Linux V2.0L10 and higher

• Parallelnavi for Linux Advanced Edition V1.0L10A and higher

More tools are coming

Supporting Cluster Tools

Page 14: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Agenda

Intel MPI Overview

• Features

• Value Proposition

• Architecture

• Devices and Fabrics

• Cluster Tools Support

Hands-on Lab

• Getting Started
• Installation
• Environment
• Compiling an Application
• Running an Application
• Debugging

• Optimization
• Process Placement
• Device Selection
• Process Pinning
• Automatic Tuner

Page 15: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Download

Intel® MPI not installed or different version required?

Part of the Cluster Tools (free eval):
www.intel.com/software/products/cluster

Single product (free eval):
www.intel.com/go/mpi

From the Internal License Center: select Intel® MPI Library
http://softwareproducts.intel.com

Current version is 3.1 (Build: 3.1.026)

Page 16: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Installation

Installation possible as root or ordinary user!

• Common step: un-zip and un-tar the library package into temporary directory

• Check that you have Python* v2.2 or higher in your PATH

Root installation follows the usual procedure

• Put licenses into default location: /opt/intel/licenses

• Start the ./install in the temporary directory

• Follow instructions generated by the script

• Default installation path: /opt/intel/impi/3.1

User installation

• Put licenses into user‘s preferred location or into temporary directory

• Start the ./install in the temporary directory

• Select the installation “without using RPM database”

• Follow the instructions as indicated by the script:
  • If required: enter the path to the licenses
  • Enter the installation directory (default: $HOME/intel/impi/3.1)

Page 17: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #1: set up the environment

Source mpivars for your preferred shell

Create default config file $HOME/.mpd.conf

Create a hostfile e.g. $HOME/mpd.hosts

Page 18: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Source mpivars in your preferred shell

Important: Get the PATH and LD_LIBRARY_PATH right! (In particular on x86_64.)

IA32, Intel 64 32-bit mode, Itanium

• csh: source /opt/intel/impi/3.1/bin/mpivars.csh

• bash: source /opt/intel/impi/3.1/bin/mpivars.sh

Intel 64 64-bit mode

• csh: source /opt/intel/impi/3.1/bin64/mpivars.csh

• bash: source /opt/intel/impi/3.1/bin64/mpivars.sh

$ which mpicc

• Intel® 64 64-bit mode:

/opt/intel/impi/3.1/bin64/mpicc

Hands-on session

Page 19: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


$HOME/.mpd.conf

File $HOME/.mpd.conf needed in your home directory
• Contents: secretword=<some secret word>
• Access rights: read/write for the user only
• Automatically created during the first execution of mpdboot by the library
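
A minimal sketch of creating the file by hand, following the contents and access rights listed above (the secret word itself is arbitrary):

$ echo "secretword=mysecretword" > $HOME/.mpd.conf
$ chmod 600 $HOME/.mpd.conf       # read/write for the user only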

Page 20: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Create mpd.hosts file

Recommended:
• Create a central hostfile, e.g. $HOME/mpd.hosts
• Contents: the list of nodes of your system

Create a hostfile mpd.hosts

$ cat > mpd.hosts

node1

node2

<ctrl>-D

Hands-on session

Page 21: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #2: first program with Intel® MPI

Compile and link one of test.{c,cpp,f,f90} into executable test

Start MPD daemons with mpdboot

Run parallel a.out with mpiexec on the nodes

Check debugging techniques

Stop MPD daemons with mpdallexit

Page 22: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Compile and Link Commands

Using Intel compilers

• mpiicc, mpiicpc, mpiifort, ...

Using Gnu compilers

• mpicc, mpicxx, mpif77, ...

Ease of use

• Commands find Intel® MPI include files automatically

• Commands link Intel MPI libraries automatically

Commands use compilers from PATH (compilers are not hard-wired!)
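
For example, a quick way to check which compiler a wrapper will pick up; the -show option (printing the underlying command line without running it) is assumed from the MPICH-style wrapper scripts:

$ which icc                 # the wrapper uses whatever compiler is first in PATH
$ mpiicc -show test.c       # assumed to print the full compile/link command line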

Page 23: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Compile a Test Program

Copy test directory from /opt/intel/impi/3.1/test

$ cp /opt/intel/impi/3.1/test/test.* .

Build one of test.{c,cpp,f,f90}

• Intel C compiler

$ mpiicc -o testc test.c

• Intel C++ compiler

$ mpiicpc -o testcpp test.cpp

• Intel Fortran compiler

$ mpiifort -o testf test.f

Hands-on session

Page 24: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Multiple Purpose Daemons

Used to start and stop Intel® MPI programs

Typical use of daemons in interactive mode:
• “Start once, use many times”

• MPD daemons may run until next re-boot!

Running Intel MPI jobs in a batch system:
• Start (and stop) one MPD ring per batch job

• Overlapping MPD rings from different jobs will kill each other

• Distinguish one MPD ring from others? Set the environment variable MPD_CON_EXT to a unique value, e.g. the time in seconds
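
For illustration, a minimal per-job sequence under these rules (the shell PID used as the unique value, the host file path, and the application name are placeholders):

$ export MPD_CON_EXT=$$            # any value unique to this job, e.g. the time in seconds
$ mpdboot -n 2 -f ./mpd.hosts -r ssh
$ mpiexec -n 4 ./your_app
$ mpdallexit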

Page 25: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Multiple Purpose Daemons (contd.)

Benefits of MPD daemons:
• Faster start-up of Intel® MPI programs
• Ctrl-C works! All processes get the signal

Experience: no zombies left on [remote] nodes
• No more hung jobs!

Jobs are terminated according to the environment variable MPIEXEC_TIMEOUT set at launch

Daemons start any program (multiple purpose)
• Use mpiexec as a cluster “management” tool, e.g.

mpiexec -l -n 4 hostname

Page 26: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Start MPD Daemons on Two Nodes

Start MPD daemons with mpdboot (local mpd.hosts is detected automatically)

• using rsh between nodes

$ mpdboot -n 2 [-f hostfile]

• using ssh between nodes

$ mpdboot -n 2 [-f hostfile] -r ssh

Check available nodes with mpdtrace

$ mpdtrace

node1

node2

Hands-on session

Page 27: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Execution Commands

Model: Running an Intel® MPI program requires a multiple purpose daemon (MPD) per node for the start-up

Starting the “ring” of MPD daemons
• mpdboot [-d] [-v] -n #nodes -f hostfile [-r ssh]

• Flags: -d = debug, -v = verbose, -r ssh = use ssh instead of the default rsh

Checking the MPD daemons
• mpdtrace

Running an MPI program

• mpiexec [-l] -n #processes executable

• Flag: -l = prefix output with the MPI rank

Stopping the MPD daemons

• mpdallexit

Page 28: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Execution Commands (contd.)

All-inclusive
• mpirun -f hostfile -n #processes executable

• Includes start and stop of an MPD ring
• Convenient

• May be too “slow” for interactive work

• May be good for jobs in a batch system
• “In-session” mode: mpirun inquires the list of nodes from the batch system (see the sketch below)
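
For illustration only, a minimal PBS-style batch script around the all-inclusive mpirun; the #PBS directives, node counts, and paths are assumptions about the local setup:

#!/bin/sh
#PBS -l nodes=2:ppn=2
cd $PBS_O_WORKDIR
# pick up the Intel MPI environment, then let mpirun start and stop the MPD ring
. /opt/intel/impi/3.1/bin/mpivars.sh
mpirun -n 4 ./testc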

Which Intel MPI version?

• mpiexec -V

• cat /opt/intel/impi/3.1/mpisupport.txt

• rpm -qa | grep intel-mpi

Page 29: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Run the Test Program

Run the test program

• C

$ mpiexec -n 2 ./testc

Hello world: rank 0 of 2 running on node1

Hello world: rank 1 of 2 running on node2

• Fortran 77

$ mpiexec -n 2 ./testf

Redo the exercises, look out for node order!

$ mpiexec -n 4 ./testc

Hands-on session

Page 30: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Debugging

The application run will output detailed Intel MPI information (if I_MPI_DEBUG > 0)
• export I_MPI_DEBUG=3
• mpiexec -genv I_MPI_DEBUG 3 ...

Support for parallel debugging with the Intel®, GNU*, TotalView*, and DDT* debuggers
• mpiexec -idb ...
• mpiexec -gdb ...
• mpiexec -tv ...

Task #3: run Intel® MPI library debugging. Check the output of different debug levels

$ mpiexec -genv I_MPI_DEBUG {1,2,3,1000} -n 2 ./testc

Compile and link the application with debug information enabled

$ mpiicc -g -o testc test.c

Check how the Intel® MPI debug output changes at a higher level:

$ mpiexec -genv I_MPI_DEBUG 1000 -n 2 ./testc

Start the Intel® MPI application with GDB

$ mpiexec -gdb -n 2 ./testc

Page 31: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Stop MPD Daemons

Stop MPD daemon with mpdallexit

$ mpdallexit

Check again with mpdtrace

$ mpdtrace

mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_gsslavov); possible causes:

1. no mpd is running on this host

2. an mpd is running but was started without a "console" (-n option)

Re-start MPD daemons for later use

$ mpdboot -n 2 [-f hostfile] [-r ssh]

Hands-on session

Page 32: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


10-minute break

Page 33: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Agenda

Intel MPI Overview

• Features

• Value Proposition

• Architecture

• Devices and Fabrics

• Cluster Tools Support

Hands-on Lab

• Getting Started
• Installation
• Environment
• Compiling an Application
• Running an Application
• Debugging

• Optimization
• Device Selection
• Process Placement
• Process Pinning
• Automatic Tuner

Page 34: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Measuring Performance with IMB

IMB = Intel MPI Benchmarks (improved former PMB = Pallas MPI Benchmarks)

IMB-MPI1 measures performance of different communication patterns: PingPong, SendRecv, Allreduce, Alltoall, ...

IMB-EXT and IMB-IO measure MPI-2 features (one-sided communication: MPI_Put/Get etc.)

Download it from www.intel.com/software/products/clustertoolkit

BKM: Always run the full IMB before running an application. It will reveal interconnect issues!

Page 35: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Measuring Performance with IMB (contd.)

The Intel® MPI Library supplies an IMB binary for autotuning needs on each platform

• /opt/intel/impi/3.1/em64t/bin/IMB-MPI1

Running the full IMB-MPI1 [or specific benchmarks]

• mpiexec -n #procs IMB-MPI1 [pingpong alltoall …]

Other IMB-MPI1 options

• -multi 1: Span several process groups (n/2 PingPong pairs)

• -npmin #p: The minimum group contains #p processes

BKM: Read the IMB documentation

Page 36: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #4: run IMB benchmark

Run IMB-MPI1 PingPong on 1 and 2 nodes for different Intel MPI devices

• Compare the bandwidth (Mbytes/sec for large #bytes), e.g. for sock and rdma

• Compare the latencies (time [usec] for 0 or 1 byte messages), e.g. for sock and ssm

Run IMB-MPI1 PingPong on all nodes with the IMB flag -multi 1

• mpiexec … IMB-MPI1 pingpong -multi 1

Hands-on session

Page 37: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Use the best available communication fabric

• Use the default fabric selection if you can. Check the result by setting I_MPI_DEBUG to 2.

• If necessary, use I_MPI_DEVICE to explicitly select the desired fabric:
  sock                TCP/Ethernet*/sockets
  shm                 Shared memory only (no sockets)
  ssm                 TCP + shared memory
  rdma[:<provider>]   InfiniBand*, Myrinet* (via the specified DAPL* provider)
  rdssm[:<provider>]  TCP + shared memory + DAPL*

with the optional <provider> name as specified in /etc/dat.conf

• Alternatively, use the alias options:
  -sock   TCP/Ethernet*/sockets
  -shm    Shared memory only (no sockets)
  -ssm    TCP + shared memory
  -rdma   InfiniBand*, Myrinet* (via the specified DAPL* provider)
  -rdssm  TCP + shared memory + DAPL*

• Select the network interface for socket communications, e.g. IP over IB:
  I_MPI_NETMASK=ib0          for IP over IB
  I_MPI_NETMASK=192.169.0.0  for a particular subnet

$ mpirun -genv I_MPI_DEBUG 2 -rdma -n <number of processes> ./your_app
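
As a further sketch, the interface selection can be combined with the same debug check (the application name is a placeholder):

$ mpirun -genv I_MPI_DEBUG 2 -genv I_MPI_NETMASK ib0 -ssm -n <number of processes> ./your_app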

Page 38: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #5: run different communication devices

Check selected device

$ mpiexec -genv I_MPI_DEBUG 2 -n 2 -host node1 ./testc

..... will use the default device rdssm

Change the selected device

$ mpiexec -genv I_MPI_DEBUG 2 -genv I_MPI_DEVICE shm \
    -n 2 -host node1 ./testc

..... will use device shm

Hands-on session

Page 39: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Change selected device with I_MPI_DEVICE

$ mpiexec -genv I_MPI_DEBUG 2 -genv I_MPI_DEVICE ssm IMB-MPI1

$ mpiexec -genv I_MPI_DEBUG 2 -genv I_MPI_DEVICE sock IMB-MPI1

Hands-on session

Task #5: run different communication devices (cont.)

Page 40: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Select proper process layout

• The default process layout is “group round-robin”, which puts consecutive MPI processes on different nodes (this may be fixed soon)

• Set the I_MPI_PERHOST variable to override the default process layout:
  • I_MPI_PERHOST=1 makes a round-robin distribution
  • I_MPI_PERHOST=all maps processes to all logical CPUs on a node
  • I_MPI_PERHOST=allcores maps processes to all physical CPUs on a node

• Or use one of the mpiexec options:
  -perhost <#>  “processes per host”, group round-robin distribution placing # processes on every host
  -rr           “round-robin” distribution, same as ‘-perhost 1’
  -grr <#>      “group round-robin”, same as ‘-perhost #’
  -ppn <#>      “processes per node”, same as ‘-perhost #’

$ mpirun -perhost 2 -n <number of processes> ./your_app

Page 41: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #6: Understand default process placement

Observe the default round-robin placement

$ mpiexec -n 4 ./testc

Hello world: rank 0 of 4 running on node1

Hello world: rank 1 of 4 running on node2

Hello world: rank 2 of 4 running on node1

Hello world: rank 3 of 4 running on node2

Hands-on session

Page 42: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #7: Use group round-robin placement

Place 2 consecutive ranks on every node using the -perhost option

$ mpiexec -perhost 2 -n 4 ./testc

Hello world: rank 0 of 4 running on node1

Hello world: rank 1 of 4 running on node1

Hello world: rank 2 of 4 running on node2

Hello world: rank 3 of 4 running on node2

Alternative flag names:

-ppn processes per node, equivalent to -perhost

-rr round robin placement (-ppn 1)

-grr group round robin placement (-ppn)

Hands-on session

Page 43: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #8: Exact process placement (Argument Sets)

Place several instances of testc exactly on the desired nodes, for example

$ mpiexec -host node1 -n 2 ./testc : -host node2 -n 1 ./testc

Hello world: rank 0 of 3 running on node1

Hello world: rank 1 of 3 running on node1

Hello world: rank 2 of 3 running on node2

Note that you can start different executables on different nodes

Hands-on session

Page 44: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Use proper process pinning

• By default, the Intel MPI Library pins sequential MPI ranks to CPU cores in the order the cores are enumerated by the OS, one process per core.

• Use I_MPI_PIN_PROCESSOR_LIST to define a custom mapping of MPI processes to CPU cores. Find the pinning map that is optimal for your application.

• A reasonable strategy is to place a pair of actively communicating processes on a pair of CPU cores that share the L2 data cache.

• Use the ‘cpuinfo’ utility supplied with the Intel MPI Library to see the processor topology, including inter-core cache sharing information.

$ mpirun -perhost 2 -genv I_MPI_PIN_PROCESSOR_LIST 0,3 \
    -n <number of processes> ./your_app

Tuning advice: Disable pinning for threaded apps
• Turn off pinning by setting I_MPI_PIN to ‘disable’, so that the default shell affinity mask is inherited by threads created within the process.

• Apply other pinning techniques to keep the performance of hybrid applications high (see the sketch below):
  • use KMP_AFFINITY for hybrid applications built with Intel OpenMP
  • use the SGI ProPack* “dplace” tool
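
A minimal sketch of such a hybrid MPI/OpenMP launch; the thread count, the KMP_AFFINITY value, and the application name are illustrative assumptions:

$ export OMP_NUM_THREADS=4
$ export KMP_AFFINITY=compact      # let the Intel OpenMP runtime place the threads
$ mpirun -perhost 1 -genv I_MPI_PIN disable -n <number of nodes> ./your_hybrid_app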

Page 45: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #9: try different process pinning

Check the processor topology. Take a look at L2 cache sharing.

$ cpuinfo

Place odd processes on cores sharing an L2 cache:

$ mpiexec -grr 8 -genv I_MPI_PIN_PROCESSOR_LIST ?,?,?,? -n 2 IMB-MPI1 Pingpong

Place even processes on cores sharing an L2 cache:

$ mpiexec -grr 8 -genv I_MPI_PIN_PROCESSOR_LIST ?,?,?,? -n 2 IMB-MPI1 Pingpong

Page 46: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Use Intel MPI lightweight statistics

• For any Intel MPI 3.1 application, enable gathering of MPI communication statistics by setting I_MPI_STATS to a non-zero integer value.

• Results will be written to a file named ‘stats.txt’ or to any other file specified by the I_MPI_STATS_FILE environment variable.

• The statistics include:
  • Point-to-point communication activity
  • Collective operation timing
  • Rank-to-rank data transfer volume

• Increase the effectiveness of the analysis by controlling the scope of the statistics with the I_MPI_STATS_SCOPE variable and the level of detail with the I_MPI_STATS variable.

• Reasonable values for adjusting collective operation algorithms are I_MPI_STATS=3 and I_MPI_STATS_SCOPE=coll.

Example stats.txt for HPCC/RandomAccess
~~~~ Process 0 of 256 on node C-21-23

Data Transfers
Src --> Dst   Amount(MB)     Transfers
-----------------------------------------
000 --> 000   0.000000e+00   0
000 --> 001   1.548767e-03   60
000 --> 002   1.625061e-03   60
000 --> 003   0.000000e+00   0
000 --> 004   1.777649e-03   60
…
=========================================
Totals        3.918986e+03   1209

Communication Activity
Operation     Volume(MB)     Calls
-----------------------------------------
P2P
Csend         9.147644e-02   1160
Send          3.918895e+03   49
Collectives
Barrier       0.000000e+00   12
Bcast         3.051758e-05   6
Reduce        3.433228e-05   6
Allgather     2.288818e-04   30
Allreduce     4.108429e-03   97

$ mpiexec -genv I_MPI_STATS 3 -genv I_MPI_STATS_SCOPE coll …
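
The output file can also be redirected per run using I_MPI_STATS_FILE mentioned above, for example (the file name is arbitrary):

$ mpiexec -genv I_MPI_STATS 3 -genv I_MPI_STATS_SCOPE coll \
    -genv I_MPI_STATS_FILE alltoall_stats.txt -n 8 IMB-MPI1 Alltoall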

Page 47: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Choose the best collective algorithms

• There are 2 or more algorithms for each MPI collective operation. Try them all to find the most suitable one.

• Use one of the I_MPI_ADJUST_<opname> knobs to change the algorithm, for example:
  • I_MPI_ADJUST_ALLREDUCE controls the MPI_Allreduce algorithm, which can be (1) the recursive doubling algorithm, (2) Bruck’s algorithm, (3) Reduce + Bcast, or (4) the topology-aware Reduce + Bcast algorithm.

• Section “3.5 Collective Operation Control” of the Intel MPI Reference Manual defines the full set of variables and algorithm identifiers.

• Recommendations:
  • Focus on the most critical collective operations (see the stats output).
  • Run the Intel MPI Benchmarks selecting various algorithms to find the right protocol switchover points for hot collective operations (see the sweep sketch below).

$ mpirun -genv I_MPI_ADJUST_REDUCE 4 -n 256 ./fluent
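
A simple way to sweep the MPI_Allreduce algorithm identifiers listed above with IMB (the process count is illustrative):

$ for alg in 1 2 3 4; do mpiexec -genv I_MPI_ADJUST_ALLREDUCE $alg -n 16 IMB-MPI1 Allreduce; done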

Page 48: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Task #10: adjust collectives

Collect statistics for Alltoall test from IMB benchmark.

$ mpiexec -genv I_MPI_STATS 3 -n 8 IMB-MPI1 Alltoall

Try different algorithms. Check the performance impact with the Intel MPI lightweight statistics:

$ mpiexec -genv I_MPI_STATS 3 -genv I_MPI_ADJUST_? ? -n 8 IMB-MPI1 Alltoall

Page 49: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Tuning advice: Use the Intel MPI automatic tuning utility

• Per-installation tuning done by the cluster administrator:
  • Collect configuration values:
    $ mpitune -n 32 -f ./mpd.hosts
  • Check the tuning results added to the package:
    $ ls /opt/intel/impi/3.1/etc64
  • Reuse the recorded values:
    $ mpiexec -tune -n 32 ./your_app

• Custom tuning by ordinary users:
  • Collect configuration values:
    $ mpitune -n 32 -f ./mpd.hosts -o <user_dir>
  • Check the tuning results recorded to <user_dir>:
    $ ls <user_dir>
  • Reuse the recorded values:
    $ export I_MPI_TUNER_DATA_DIR=<user_dir>
    $ mpiexec -tune -n 32 ./your_app

• Find optimal values for library tuning knobs on the particular cluster environment with the automated tuning utility

• Run it once after installation and each time after cluster configuration change

• Best configuration is recorded for each combination of communication device, number of nodes, MPI ranks and process distribution model

Page 50: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Stay tuned!

• Learn more online
  • Intel® MPI self-help pages

support.intel.com/support/performancetools/cluster/mpi/index.htm

• Ask questions and share your knowledge
  • Intel® MPI Library support page

http://support.intel.com/support/performancetools/cluster/mpi/index.htm

• Intel® Software Network Forums
  http://softwarecommunity.intel.com/isn/Community/en-us/Forums/

Page 51: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Backup

Page 52: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Lab – mpiexec for Admins

See how mpiexec transports the environment

• setenv (csh) or export (sh) any new variable

• mpiexec -l -n 2 env | sort

• Find the variable on both processes!

Running scripts with mpiexec

• Gather some commands in a script, e.g. hostname; cat /etc/dat.conf (see the sketch below)

• mpiexec -l -n 2 ascript | sort
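
A minimal sketch of the helper script named above (the script name and contents follow this slide's example):

$ cat > ascript << 'EOF'
#!/bin/sh
hostname
cat /etc/dat.conf
EOF
$ chmod +x ascript
$ mpiexec -l -n 2 ./ascript | sort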

Page 53: Intel MPI Library Cluster Software and Technologies Software & Solutions Group


Lab – mpiexec for Admins (contd.)

mpiexec does not transport local limits

• Check limits: "limit" (csh) or "ulimit -a" (sh)

• Set a new stacksize limit

• csh: limit stacksize 20 megabytes; csh -fc limit

• sh: ulimit -s 20000; sh -c "ulimit -a"

• Check the remote limits

• mpiexec -l -n 2 csh -fc limit | sort

• mpiexec -l -n 2 sh -c "ulimit -a" | sort

• Conclusion: Set limits in .cshrc or .profile, or encapsulate them in an application script

Page 54: Intel MPI Library Cluster Software and Technologies Software & Solutions Group
