Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013


Upload: intel-software-brasil

Post on 06-May-2015


DESCRIPTION

Talk given by Leonardo Borges at the Intel Software Conference on August 6 (NCC/UNESP/SP) and August 12 (COPPE/UFRJ/RJ).

TRANSCRIPT

Page 1: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

Introduction to the Intel® Xeon Phi™ Coprocessor

Leo Borges ([email protected])

Intel - Software and Services Group

iStep-Brazil, August 2013

Page 2: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 3: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Intel in High-Performance Computing

• Large Scale Clusters for Test & Optimization
• Tera-Scale Research
• Leading Performance, Energy Efficient
• Platform Building Blocks
• Dedicated, Renowned Applications Expertise
• Broad Software Tools Portfolio
• Defined HPC Application Platform
• Many Integrated Core Architecture
• Manufacturing Process Technologies
• Exa-Scale Labs

A long term commitment to the HPC market segment

Page 4: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

HPC Processor Solutions

Common Intel Environment: portable code, common tools

Multi-Core: Intel® Xeon® processors

• General Purpose Architecture
• Leadership Per Core Performance
• FP/core via AVX
• Multi-Core Performance
• Xeon EN: General purpose perf/watt
• Xeon EP: Max perf/watt with higher memory BW / freq and QPI, ideal for HPC
• Xeon EX: Additional sockets & big memory
• Xeon EP 4S: Additional compute density

Many-Core: Intel® Xeon Phi™ Coprocessor

• Trades a “big” IA core for multiple lower performance IA cores, resulting in higher performance for a subset of highly parallel applications

Page 5: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

• Highly parallel and vectorized applications, or those that need higher memory bandwidth, will run even faster on Intel® Xeon Phi™ Coprocessors
• Most applications will still run best on multi-core Intel® Xeon® processors
• Optimizing code often delivers significant performance gains

[Figure labels: running existing serial software; running optimized software]

Big Gains for Selected Applications

• Medical imaging and biophysics
• Computer Aided Design & Manufacturing
• Climate modeling & weather prediction
• Financial analyses, trading
• Energy & oil exploration
• Digital content creation

Page 6: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Evaluating Your Applications for Intel® Xeon Phi™

[Flowchart: three yes/no questions]

• Can your workload scale to over 100 threads?
• Can your workload benefit from large vectors?
• Can your workload benefit from more memory bandwidth?

Use Intel® Xeon Phi™ coprocessors for applications that scale with:

• Threads
• Vectors
• Memory Bandwidth
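The three questions on this slide can be written as a small decision helper. This is only a sketch: the branch order is not recoverable from the transcript, so the code assumes that thread scaling is the first gate and that either large vectors or memory bandwidth is then enough. The function name `recommend_target` and its return strings are hypothetical.

```python
def recommend_target(scales_past_100_threads: bool,
                     benefits_from_large_vectors: bool,
                     benefits_from_more_bandwidth: bool) -> str:
    """Return "coprocessor" or "processor" for a workload.

    Assumed branch order (not stated in the slide text): thread scaling
    is checked first, then vectors or memory bandwidth decide.
    """
    if not scales_past_100_threads:
        # Too few threads to keep a many-core device busy.
        return "processor"
    if benefits_from_large_vectors or benefits_from_more_bandwidth:
        return "coprocessor"
    return "processor"


if __name__ == "__main__":
    print(recommend_target(True, True, False))
    print(recommend_target(False, True, True))
```

A workload that answers "no" to all three questions stays on the multi-core processor, which matches the earlier remark that most applications still run best on Intel® Xeon® processors.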

Page 7: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 8: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Intel® Xeon Phi™ Product Family Based on the Intel MIC Architecture

Intel Many Integrated Core (MIC, pronounced “Mike”)

Product Family/Architecture for Highly Parallel Applications

• Based on a large number of smaller, low-power Intel Architecture cores
• 512-bit wide vector engine
• Complements the Intel Xeon processor product line
• Provides breakthrough performance for highly parallel apps
  – Familiar x86 programming model
  – Same source code supports both Intel Xeon processor & Intel Xeon Phi coprocessor
  – Initially a coprocessor with PCI Express form factor

First products announced at SC12: Code named Knights Corner (KNC)

• Up to 61 cores, 4 threads per core
• Up to 16GB GDDR5 memory (up to 352 GB/s)
• 225-300W (Cooling: Both passive & active SKUs)
• x16 PCIe Form-Factor (requires IA host)
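The core, thread and vector-width figures on this slide translate directly into the amount of parallelism a program has to expose. The sketch below only does that arithmetic; the constant and function names are illustrative, and the 32-bit and 64-bit element sizes are the usual single- and double-precision widths rather than something stated on the slide.

```python
CORES = 61             # "up to 61 cores"
THREADS_PER_CORE = 4   # "4 threads per core"
VECTOR_BITS = 512      # "512-bit wide vector engine"


def hardware_threads(cores=CORES, threads_per_core=THREADS_PER_CORE):
    """Number of hardware threads the coprocessor exposes."""
    return cores * threads_per_core


def vector_lanes(element_bits, vector_bits=VECTOR_BITS):
    """How many elements of a given width fit in one vector register."""
    return vector_bits // element_bits


if __name__ == "__main__":
    print("hardware threads:", hardware_threads())
    print("single-precision lanes:", vector_lanes(32))
    print("double-precision lanes:", vector_lanes(64))
```

Under these figures a fully populated card offers 244 hardware threads, each working on 8 doubles or 16 floats per vector instruction, which is why the deck keeps asking whether a workload scales with both threads and vectors.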

Page 9: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Each Xeon Phi can be addressed as an Individual Node in the Cluster

• 6 to 16 GB GDDR5 memory

Page 10: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Intel® Xeon Phi™ Coprocessors

3 Family: Outstanding Parallel Computing Solution

• Performance/$ leadership
• Models: 3120P, 3120A
• 6GB GDDR5
• 240 GB/s
• >1 TFlops DP

5 Family: Optimized for High Density Environments

• Performance/watt leadership
• Models: 5120P, 5120D
• 8GB GDDR5
• >300 GB/s
• >1 TFlops DP

7 Family: Highest Level of Features

• Performance leadership
• Models: 7120P, 7120X
• 16GB GDDR5
• 352 GB/s
• >1.2 TFlops DP
• Turbo

Page 11: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Performance Considerations

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 12: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Application Characterization

Based on memory access and flops required

• Temporal/spatial locality of data
• Bandwidth Requirement

[Figure: workloads placed on a spectrum from bandwidth limited to core limited; one axis marker reads 6 GB/s. Legend: Y = math kernel, B = application, W = segment. Workloads shown: Stream-triad; BLAS1 & BLAS2; sparse matrix-vector; FFT; DGEMM; Linpack; SPECfp2000; reservoir simulation; FDTD; Kirchhoff migration; RTM; fluid dynamics; ocean models; molecular dynamics; option pricing. Segments shown: All, Mfg & Scientific, Scientific, Oil & Gas, Mil HPC, FSI.]
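One common way to make the "bandwidth limited versus core limited" distinction concrete is a roofline-style bound, which is a technique brought in here for illustration and not something the slide names. The sketch assumes the 7 Family figures quoted elsewhere in this deck (352 GB/s and roughly 1.2 TFlops DP) as upper bounds; the arithmetic-intensity value used for a blocked DGEMM is purely hypothetical.

```python
PEAK_GFLOPS = 1200.0   # "> 1.2 TFlops DP" from the 7 Family slide, taken as 1.2
PEAK_BW_GBS = 352.0    # "352 GB/s" from the 7 Family slide


def attainable_gflops(flops_per_byte, peak=PEAK_GFLOPS, bw=PEAK_BW_GBS):
    """Roofline bound: limited by compute peak or by memory traffic."""
    return min(peak, bw * flops_per_byte)


def classify(flops_per_byte, peak=PEAK_GFLOPS, bw=PEAK_BW_GBS):
    """Label a kernel by which side of the ridge point it falls on."""
    ridge = peak / bw  # flops per byte where both limits meet
    return "bandwidth limited" if flops_per_byte < ridge else "core limited"


# STREAM triad, a[i] = b[i] + s * c[i], in double precision:
# 2 flops for 24 bytes moved (two loads and one store of 8 bytes each).
TRIAD_INTENSITY = 2 / 24

# Hypothetical intensity for a well-blocked DGEMM (illustrative only).
DGEMM_INTENSITY = 16.0

if __name__ == "__main__":
    for name, ai in [("STREAM triad", TRIAD_INTENSITY), ("DGEMM", DGEMM_INTENSITY)]:
        print(f"{name}: {classify(ai)}, at most {attainable_gflops(ai):.1f} GF/s")
```

Under these assumptions the triad kernel cannot exceed about 29 GF/s no matter how many cores are used, while a blocked DGEMM can approach the compute peak, which is the contrast the figure draws between its two ends.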

Page 13: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Synthetic Benchmarks: Intel® Xeon Phi™ Coprocessor and Intel® MKL

Results (higher is better), 2S Intel® Xeon® vs. Intel Xeon Phi:

• STREAM Triad (GB/s): 75 vs. 171
• SMP Linpack (GF/s): 330 vs. 802
• DGEMM (GF/s): 347 vs. 887
• SGEMM (GF/s): 728 vs. 1,796

Headline labels on the slide: up to 2.4X, up to 2.5X, up to 2.2X, up to 2.4X. Efficiency labels (ECC on): 84%, 83%, 75%.

Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)

Notes

1. Intel® Xeon® Processor E5-2680 used for all: SGEMM matrix = 12800 x 12800, DGEMM matrix = 10752 x 10752, SMP Linpack matrix = 26000 x 26000
2. Intel® Xeon Phi™ coprocessor SE10P (ECC on) with “Gold” SW stack: SGEMM matrix = 12800 x 12800, DGEMM matrix = 12800 x 12800, SMP Linpack matrix = 26872 x 28672
3. Average single-node results from measurements across a set of nodes from the TACC+ Stampede* Cluster

+ Texas Advanced Computing Center (TACC) at the University of Texas at Austin.

++ Measured on the TACC+ Stampede Cluster
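The "up to" labels on this slide can be checked against the bar values. The sketch below assumes that, in each chart, the smaller number is the 2S Intel® Xeon® result and the larger one the Intel Xeon Phi result, and that the labels were truncated (not rounded) to one decimal; both are inferences from the numbers, not statements in the slide.

```python
import math

# (2S Xeon, Xeon Phi) pairs read from the four charts; the pairing is assumed.
RESULTS = {
    "STREAM Triad (GB/s)": (75, 171),
    "SMP Linpack (GF/s)": (330, 802),
    "DGEMM (GF/s)": (347, 887),
    "SGEMM (GF/s)": (728, 1796),
}


def speedup(host, coprocessor):
    """Ratio of the coprocessor result to the host result."""
    return coprocessor / host


def truncated(value, digits=1):
    """Drop digits beyond the given decimal place without rounding up."""
    scale = 10 ** digits
    return math.floor(value * scale) / scale


if __name__ == "__main__":
    for name, (host, phi) in RESULTS.items():
        ratio = speedup(host, phi)
        print(f"{name}: {ratio:.2f}x (label would read {truncated(ratio)}X)")
```

With these pairings the four ratios truncate to 2.2, 2.4, 2.5 and 2.4, the same set of values as the slide's headline labels.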

Page 14: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance per Watt

Intel® Xeon Phi™ Coprocessor vs. 2S Intel® Xeon® processor (Intel MKL)

1 Intel® Xeon Phi™ Coprocessor vs. 2 Socket Intel® Xeon® processor

Relative performance per watt, normalized to a 1.0 baseline of a 2 socket Intel® Xeon® processor E5-2670 (higher is better):

• 2S Intel® Xeon® Processor: 1.00
• SMP Linpack: 3.91
• DGEMM: 4.63
• SGEMM: 4.81

Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)

Notes:

1. 2 X Intel® Xeon® Processor E5-2670 (2.6GHz, 8C, 115W)
2. Intel® Xeon Phi™ coprocessor 5110P (ECC on) with Gold RC SW stack (Coprocessor power only)

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Source: Intel Measured results as of October 26, 2012. Configuration Details: Please reference slide speaker notes.

For more information go to http://www.intel.com/performance

Page 15: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Native, Offload and Variations

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 16: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Wide Spectrum of Execution Models

Range of Models to Meet Application Needs

Spectrum from Multicore Centric (Intel® Xeon® processors) to Many-core Centric (Intel® Many Integrated Core co-processors):

• Multi-core-hosted: General purpose serial and parallel computing
• Offload: Codes with highly-parallel phases
• Symmetric: Codes with balanced needs
• Many-core-hosted: Highly-parallel codes

[Diagram: for each model, where Main( ), Foo( ) and MPI_*() run: on the multicore host, on the many-core coprocessor, or on both]

Page 17: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

The Intel Manycore Platform Software Stack (MPSS) provides Linux on the coprocessor

[Diagram: Host Processor and Intel® Xeon Phi™ Coprocessor side by side, connected over the PCI-E bus. Each side runs user-level code on top of system-level code with its own Linux* OS. System-level components shown:]

• Intel® Xeon Phi™ Coprocessor support libraries, tools, and drivers
• Intel® Xeon Phi™ Coprocessor communication and application-launch support

Page 18: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Runs either as an accelerator for offloaded host computation…

[Diagram: the same host and coprocessor software stack as on page 17, with these user-level components added:]

• Host-side offload application: user code; offload libraries, user-level driver, user-accessible APIs and libraries
• Target-side offload application: user code; offload libraries, user-accessible APIs and libraries

Advantages

• More memory available
• Better file access
• Host better on serial code
• Better use of resources

Page 19: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

…Or runs as a native or MPI* compute node via IP or OFED

[Diagram: the same host and coprocessor software stack as on page 17, with these components added:]

• Target-side “native” application: user code; standard OS libraries plus any 3rd-party or Intel libraries
• Virtual terminal session: ssh or telnet connection to coprocessor IP address
• IB fabric

Advantages

• Simpler model
• No directives
• Easier port
• Good kernel test

Use if

• Not serial
• Modest memory
• Complex code

Page 20: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Flexible: Enables Multiple Programming Models

[Diagrams: two nodes, each with a CPU and a MIC, connected by a network; each diagram shows where the data lives and where MPI or offload traffic flows]

• Coprocessor only: MPI between the MICs. Homogeneous network of many-core CPUs
• Host+Offload: MPI between the CPUs, offload from each CPU to its MIC. Homogeneous network of heterogeneous nodes
• Symmetric: MPI between the CPUs and the MICs. Heterogeneous network of homogeneous CPUs

Page 21: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Comprehensive set of SW tools for Xeon and Xeon Phi Programming

Code Analysis

• Advisor XE
• VTune Amplifier XE
• Inspector XE
• Trace Analyzer

Programming Models

• Intel Cilk Plus
• Threading Building Blocks
• OpenMP
• OpenCL
• MPI
• Offload/Native/MYO

Libraries & Compilers

• Math Kernel Library
• Integrated Performance Primitives
• Intel Compilers

Page 22: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Options for Thread Parallelism

• Intel® Math Kernel Library
• OpenMP*
• Intel® Threading Building Blocks
• Intel® Cilk™ Plus
• OpenCL*
• Pthreads* and other threading libraries

[Axis alongside the list: ease of use / code maintainability versus programmer control]

Choice of unified programming to target Intel® Xeon® and Intel® Xeon Phi™ Architecture!

Page 23: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 24: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Parallelizing for High Performance

Example: A Two-Step Process with SAXPY

• STARTING POINT (current performance): Unoptimized serial code running on multi-core Intel® Xeon® processors. 67.097 seconds
• STEP 1. OPTIMIZE CODE: Parallelize and vectorize code and continue to run on multi-core Intel Xeon processors. 0.46 seconds, 145X faster
• STEP 2. USE COPROCESSORS: Run all or part of the optimized code on Intel® Xeon Phi™ coprocessors. 0.197 seconds, 2.3X faster
• Overall: 340X faster

The Following Performance Results are Based on Already Optimized Code

SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012
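For reference, SAXPY is the kernel y = a·x + y, and the three timings on this slide combine multiplicatively. The sketch below shows a plain reference version of the kernel (not the optimized code that was measured, which the slide does not show) and recomputes the speedups from the quoted times; all names are illustrative.

```python
def saxpy(a, x, y):
    """Reference SAXPY: return a * x[i] + y[i] for every element."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


# Timings quoted on the slide, in seconds.
T_SERIAL = 67.097        # unoptimized serial code on the host
T_OPTIMIZED = 0.46       # parallelized and vectorized, still on the host
T_COPROCESSOR = 0.197    # optimized code on the coprocessor

STEP1 = T_SERIAL / T_OPTIMIZED        # gain from optimizing the code
STEP2 = T_OPTIMIZED / T_COPROCESSOR   # further gain from the coprocessor
OVERALL = T_SERIAL / T_COPROCESSOR    # equals STEP1 * STEP2

if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(f"step 1: {STEP1:.1f}x, step 2: {STEP2:.2f}x, overall: {OVERALL:.1f}x")
```

The recomputed factors are about 145.9, 2.34 and 340.6, so the slide's 145X, 2.3X and 340X are these values with the fraction dropped. Nearly all of the gain comes from step 1, which still runs on the host.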

Page 25: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance Proof-Point: Government and Academic Research

BQCD

• Application: Hybrid Monte-Carlo program that simulates lattice QCD with dynamical Wilson fermions. It is one of the main production programs used for quark simulation by the QCDSF collaboration (DEISA) and beyond.
• Status: Many optimizations already in released version; more optimizations and alternative offload model version in development
• Demonstrated Results:
  – No source code changes
  – Recompiled, selected run-time parameters to get maximum performance

“The performance improvement for BQCD using the Intel Xeon Phi coprocessor was reached in record time, requiring only recompilation. We are confident that larger speed-ups can be obtained with modest modifications of the code.”

Prof. Dr. Tilo Wettig, Principal Investigator of the QPACE project

[Chart: BQCD Scalability, Gflops/sec (higher is better); vertical axis 0 to 300, horizontal axis 1, 2, 4, 8. Series:]

• 2S Intel® Xeon® Processor E5-2670
• Intel® Xeon Phi™ coprocessor, native (pre-production HW/SW)
• 2S Intel Xeon E5-2670 + Intel® Xeon Phi™ coprocessor, symmetric (pre-production HW/SW)

SOURCE: INTEL MEASURED MARCH’13

Page 26: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance Proof-Point: Energy Industry

CGG: WAVE EQUATION MIGRATION (WEM)

• Application: Seismic imaging technique used to obtain a subsurface depth image from input seismic data
• Status: See presentation at the Rice O&G HPC workshop, http://rice2013.og-hpc.org/technical-program
• Execution Model: Fully hybrid MPI+OpenMP using symmetric mode
  – Highly scalable on cluster
• Code Optimization:
  – Minimal source code changes for dynamic load balancing

Speedup (higher is better):

• 2S Intel® Xeon® processor E5-2670, 4 MPI / 4 OMP: 1
• Intel® Xeon Phi™ Coprocessor (pre-production HW/SW), 12 MPI / 20 OMP: 2.57
• 2S Intel Xeon processor E5-2670 (4/4) + Intel® Xeon Phi™ coprocessor (12/20) (pre-production HW/SW): 3.57
• 2S Intel Xeon processor E5-2670 (4/4) + 2x Intel® Xeon Phi™ coprocessor (12/20 + 12/20) (pre-production HW/SW): 6.14

SOURCE: ARSLAN ET AL., CGG 2013, MARCH’13
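The "MPI / OMP" figures on this slide describe a hybrid layout: a number of MPI ranks, each running a number of OpenMP threads. The sketch below multiplies them out and checks whether the symmetric-mode speedups add up as the sum of the individual devices. The pairing of the four speedup values with the four configurations follows the order in which they are listed, which is an assumption, as are all names in the code.

```python
def total_threads(mpi_ranks, omp_threads_per_rank):
    """Threads in flight for a hybrid MPI+OpenMP layout."""
    return mpi_ranks * omp_threads_per_rank


HOST_LAYOUT = (4, 4)            # 2S Xeon E5-2670: 4 MPI / 4 OMP
COPROCESSOR_LAYOUT = (12, 20)   # Xeon Phi: 12 MPI / 20 OMP

# Speedups relative to the 2S host, in the order listed on the slide.
HOST_ONLY = 1.0
COPROCESSOR_ONLY = 2.57
HOST_PLUS_ONE = 3.57
HOST_PLUS_TWO = 6.14


def additive_estimate(n_coprocessors):
    """Speedup expected if every device simply adds its own throughput."""
    return HOST_ONLY + n_coprocessors * COPROCESSOR_ONLY


if __name__ == "__main__":
    print("host threads:", total_threads(*HOST_LAYOUT))
    print("coprocessor threads:", total_threads(*COPROCESSOR_LAYOUT))
    print("estimate with 1 card:", round(additive_estimate(1), 2))
    print("estimate with 2 cards:", round(additive_estimate(2), 2))
```

The host layout gives 16 threads, one per core of two 8-core E5-2670 sockets, and the coprocessor layout gives 240 threads, just under the 244 hardware threads of a 61-core card. The additive estimates reproduce 3.57 and 6.14 exactly, which is consistent with the slide's "highly scalable" remark.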

Page 27: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance Proof-Point: Financial Services

MONTE CARLO EUROPEAN OPTIONS

• Application: Monte Carlo algorithms are used to evaluate complex instruments, portfolios, and investments. Performance depends on raw computational power and the performance of exp2()
• Status: Case Study available
• Highlights: Dramatic performance scaling for both single-precision and double-precision calculations
• Demonstrated Results:
  – Intel® Xeon Phi™ coprocessor fast exp2() and FMA instructions deliver high performance, high accuracy for single precision computations
  – Compiler based loop unrolling delivers high performance
  – Cache blocking further optimizes cache utilization, reduces cache misses, and makes outer loop vectorization possible
• Read the Case Study: software.intel.com/en-us/articles/case-study-achieving-high-performance-on-monte-carlo-european-option-on-intel-xeon-phi

Speedup (higher is better):

• 2S Intel® Xeon® processor E5-2670: 1 (single precision), 1 (double precision)
• 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ Coprocessor (pre-production HW/SW): 10.36 (single precision), 3.34 (double precision)

SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013
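To show why exp2() matters for this workload, here is a minimal scalar sketch of Monte Carlo pricing for a European call. It is not the code from the case study. It assumes the usual geometric Brownian motion model and rewrites every exponential as a power of two, exp(x) = 2^(x·log2 e), which is one way a fast exp2() can be put to use; the function names and parameters are illustrative. A closed-form Black-Scholes price is included only as a sanity check.

```python
import math
import random

LOG2E = math.log2(math.e)


def exp_via_exp2(x):
    """Compute exp(x) as 2 ** (x * log2(e))."""
    return 2.0 ** (x * LOG2E)


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Average discounted payoff over n_paths simulated end prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                      # standard normal draw
        s_t = s0 * exp_via_exp2(drift + vol * z)     # one exponential per path
        total += max(s_t - k, 0.0)                   # call payoff
    return exp_via_exp2(-r * t) * total / n_paths


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form price, used here only to check the simulation."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)
    print("Monte Carlo:", mc_european_call(*args, n_paths=100_000))
    print("Closed form:", black_scholes_call(*args))
```

Each path costs one exponential plus a handful of multiply-adds, so throughput is dominated by exp2() and FMA performance, and the paths are independent, which is what makes the loop a natural fit for wide vectors and many threads.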

Page 28: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance Proof-Point: Government and Academic Research

WEATHER RESEARCH AND FORECASTING (WRF)

• Application: Weather Research and Forecasting (WRF)
• Status: WRF V3.5 was released 4/18/13
• Code Optimization:
  – Approximately two dozen files with less than 2,000 lines of code were modified (out of approximately 700,000 lines of code in about 800 files, all Fortran standard compliant)
  – Most modifications improved performance for both the host and the co-processors
• Performance Measurements: Pre-release of WRF 3.5 (V3.5Pre) and the NCAR-supported CONUS2.5KM benchmark (a high resolution weather forecast)
• Acknowledgments: There were many contributors to these results, including the National Renewable Energy Laboratory and The Weather Channel Companies

Speedup (higher is better):

• 2S Intel® Xeon® processor E5-2670 with eight-node cluster configuration: 1
• 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW) with eight-node cluster configuration: 1.4

SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013

Page 29: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Performance Proof-Point: Government and Academic Research

SANDIA MANTEVO miniFE

• Application: Sandia National Laboratories' best approximation to an unstructured implicit finite element or finite volume application in fewer than 8000 lines of code
• Status: available at http://software.sandia.gov/trac/mantevo/browser/trunk/packages
• Demonstrated Results:
  – Porting was easy using OpenMP
  – Substituting an Intel MKL routine for the sparse matrix-vector product accelerated performance and will simplify future optimization
  – The Intel MPI Library enables rapid performance improvement when adding an Intel® Xeon Phi™ coprocessor
• Read the Case Study: software.intel.com/en-us/articles/running-minife-on-intel-xeon-phi-coprocessors

“The programming models available for the Intel MIC Architecture are open-standard and portable between traditional processors and Intel Xeon Phi coprocessors. This should allow us to leverage code development across multiple platforms.”

James A. Ang, Ph.D., Extreme-scale Computing, Sandia National Laboratories

Speedup (higher is better):

• 2S Intel® Xeon® processor E5-2670: 1
• 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 2.2

SOURCE: INTEL MEASURED RESULTS AS OF MARCH, 2012

Page 30: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

DEMONSTRATED PERFORMANCE BENEFITS: Intel® Xeon Phi™ Coprocessor

• Seismic: Acceleware 8th Order Isotropic Variable Velocity (note 2): up to 2.23X
• Finite Element Analysis: Sandia National Labs MiniFE (note 1): up to 2X
• China Oil & Gas Geoeast Pre-stack Time Migration (note 3): up to 3.54X

Notes:

1. 8 node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (Hetero)
2. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & application running 100% on coprocessor unless otherwise noted)
3. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)

SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012

Page 31: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

DEMONSTRATED PERFORMANCE BENEFITS: Intel® Xeon Phi™ Coprocessor

• Finance: Monte Carlo SP (note 3): up to 10.75X
• Finance: Black-Scholes SP (note 3): up to 7X
• Physics: Jefferson Lab Lattice QCD: up to 2.7X
• Embree Ray Tracing: Intel Labs Ray Tracing (note 2): speed-up 2.11X

Notes:

1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & application running 100% on coprocessor unless otherwise noted)
2. Intel Measured Oct. 2012
3. Includes additional FLOPS from transcendental function unit

SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2012

Page 32: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 33: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Implementation Proof-Point: Government and Academic Research

Texas Advanced Computing Center (TACC)

• System: TACC Stampede is a 10 petaflop supercomputer, one of the largest computing systems in the world for open science research. It became operational on January 7, 2013
• Status: In Service
• Workloads: Runs hundreds of applications for thousands of users around the world
• Performance:
  – More than 7 petaflops using Intel® Xeon Phi™ coprocessors (note 1)
  – More than 2 petaflops using the Intel® Xeon® processor E5 family (note 1)
• More Information:
  – SC12 interview: insidehpc.com/2012/12/06/video-intel-xeon-phi-powers-7-tacc-stampede-super/
  – TACC HPC systems overview: www.tacc.utexas.edu/resources/hpc

Note 1: http://www.tacc.utexas.edu/resources/hpc/stampede

Page 34: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Tianhe-2 System: #1 June 2013 Top500 List

System: Located in Southwest China, it contains 16,000 nodes composing the world's largest (public) installation of Intel Ivy Bridge and Xeon Phi processors. Each cluster node is formed with

• 2 12-core Intel® Xeon® Ivy Bridge CPUs @ 2.2GHz
• 3 Intel® Xeon Phi™ cards, each with 57 cores @ 1.1GHz

Performance: Theoretical peak of 54.9 Pflop/s

• 6.8 Pflop/s from 32,000 Xeon Ivy Bridge sockets
• 48.1 Pflop/s from 48,000 Xeon Phi cards
• for a total of 3,120,000 cores

30.65 Pflop/s sustained Linpack.

More Information: "Visit to the National University for Defense Technology Changsha, China." Jack Dongarra, University of Tennessee, and Oak Ridge National Laboratory. June 2013. www.netlib.org/utk/people/JackDongarra/PAPERS/tianhe-2-dongarra-report.pdf
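The peak figures on this slide follow from device count × cores × clock × flops per cycle. The sketch below reproduces them under two assumptions that are not on the slide: 8 double-precision flops per cycle per Ivy Bridge core (AVX) and 16 per Xeon Phi core (512-bit fused multiply-add). It also uses 12 cores per Xeon socket, which is what the slide's 3,120,000-core total implies. All names are illustrative.

```python
def peak_pflops(sockets, cores_per_socket, ghz, dp_flops_per_cycle):
    """Theoretical double-precision peak in Pflop/s."""
    gflops = sockets * cores_per_socket * ghz * dp_flops_per_cycle
    return gflops / 1e6


def core_count(sockets, cores_per_socket, **_unused):
    """Total cores contributed by one device type."""
    return sockets * cores_per_socket


# Device counts and clocks from the slide; flops per cycle are assumptions.
XEON = dict(sockets=32_000, cores_per_socket=12, ghz=2.2, dp_flops_per_cycle=8)
PHI = dict(sockets=48_000, cores_per_socket=57, ghz=1.1, dp_flops_per_cycle=16)

SUSTAINED_LINPACK_PFLOPS = 30.65  # quoted on the slide

if __name__ == "__main__":
    xeon_peak = peak_pflops(**XEON)
    phi_peak = peak_pflops(**PHI)
    total = xeon_peak + phi_peak
    print(f"Xeon: {xeon_peak:.2f}, Xeon Phi: {phi_peak:.2f}, total: {total:.2f} Pflop/s")
    print("cores:", core_count(**XEON) + core_count(**PHI))
    print(f"Linpack efficiency: {SUSTAINED_LINPACK_PFLOPS / total:.1%}")
```

Under these assumptions the two contributions come to about 6.76 and 48.15 Pflop/s, in line with the 6.8 and 48.1 on the slide, and the quoted sustained Linpack figure is a little over half of the combined peak.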

Page 35: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

A Growing Software Ecosystem: Developing today on Intel® Xeon Phi™ coprocessors

Shown at SC’12, November 2012

Page 36: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Introduction

High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software

Intel Xeon Phi Case Studies

Intel Xeon Phi Ecosystem

Conclusions & References

Page 37: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Conclusions

Intel® Xeon Phi™ coprocessor advantages:

• Comparable performance potential to other accelerators
• Faster time to solution due to reduced development effort
• Better investment protection with a single code base for processors and coprocessors

Flexible and wide range of programming models: from pure Native to Offloaded – and all variants between

All with the familiar Intel development environment

Page 38: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Intel® Xeon Phi™ Coprocessor Developer Site: http://software.intel.com/mic-developer

One Stop Shop for:

• Tools & Software Downloads
• Getting Started Development Guides
• Video Workshops, Tutorials, & Events
• Code Samples & Case Studies
• Articles, Forums, & Blogs
• Associated Product Links

Page 39: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Thank you.

Page 40: Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013

Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804