gpu systems


Upload: jpaugh

Post on 28-Jan-2015


DESCRIPTION

Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX 285, Quadro FX 5800) for visualization or for offloading processing in a small form factor. They are built on Intel's latest Nehalem processors.

TRANSCRIPT

Page 1: Gpu Systems

GPU SystemsAdvanced Clustering’s offerings for GPGPU computing

advanced clustering technologieswww.advancedclustering.com • 866.802.8222

Page 2: Gpu Systems

what is GPU computing?

• The use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing

• The model uses a CPU and GPU together in a heterogeneous computing model

• The CPU runs the sequential portions of the application

• Offload parallel computation onto the GPU
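The CPU/GPU split above can be sketched with NVIDIA's CUDA C (covered later in this deck). This is an illustrative example rather than code from the slides - the kernel name and sizes are invented. The CPU performs the sequential setup, and the data-parallel loop is offloaded to the GPU:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// GPU kernel: each thread handles one element of the parallel loop
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion: the CPU prepares the input data
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Copy inputs into the GPU's dedicated RAM
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Parallel portion: offloaded to the GPU across many threads
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // 1.0 + 2.0 = 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```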

2

Page 3: Gpu Systems

history of GPUs

• GPUs designed with fixed function pipelines for real-time 3D graphics

• As GPU complexity increased, they were designed to be more programmable so that new features could be implemented easily

• Scientists and engineers discovered that these originally purpose-built GPUs could also be re-programmed for general-purpose computing on a GPU (GPGPU)

3

Page 4: Gpu Systems

history of GPUs - continued

• The nature of 3D graphics meant GPUs have very fast floating-point units, which are also great for scientific codes

• Although GPUs were originally very difficult to program, GPU vendors recognized another market for their products and developed specially designed GPUs and programming environments for scientific computing

• The most prominent are NVIDIA's Tesla GPUs and its CUDA programming environment

4

Page 5: Gpu Systems

GPUs vs. CPUs

5

Quad-core CPU vs. 240-core Tesla GPU

• Traditional x86 CPUs are available today with 4 cores; 6-, 8-, and 12-core parts are coming in the future

• NVIDIA's Tesla GPU is shipping with 240 cores

Page 6: Gpu Systems

GPUs vs. CPUs - continued

6

Page 7: Gpu Systems

why use GPUs?

• Massively parallel design: 240 cores per GPU

• Nearly 1 teraflop of single precision floating-point performance

• Designed as an accelerator card to add into your existing system - does not replace your current CPU

• Maximum of 4GB of fast dedicated RAM per GPU

• If your code is highly parallel it’s worth investigating

7

Page 8: Gpu Systems

why not use GPUs?

• Fixed RAM sizes on GPU - not upgradable or configurable

• Large power requirements of 188W

• Still requires a host server and CPU to operate

• Specialized development tools are required; GPUs do not run standard x86 code

• Current development tools are specific to NVIDIA cards - no support for other manufacturers' GPUs

• Your code may be difficult to parallelize

8

Page 9: Gpu Systems

developing for GPUs

• Current development model: CUDA parallel environment

• The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel.

• Fine grain parallelism in the sub-problems is then expressed such that each sub-problem can be solved cooperatively in parallel.

• Currently an extension for the C programming language - other languages in development
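As an illustration of this two-level decomposition (the kernel name and launch parameters are invented for the example): each CUDA thread block sums one chunk of an array independently - the coarse sub-problems - while the threads inside a block cooperate through shared memory and barriers - the fine-grained parallelism:

```cuda
// Sketch: per-block sum reduction in CUDA C.
__global__ void blockSum(const float *in, float *partial, int n) {
    extern __shared__ float sdata[];      // scratch space shared by this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                      // wait for all loads in the block

    // Fine-grained cooperation: tree reduction within the block
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) partial[blockIdx.x] = sdata[0];  // one result per sub-problem
}

// Launch with shared memory sized to one float per thread, e.g.:
//   blockSum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n);
// The CPU (or a second kernel) then combines the per-block partial sums.
```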

9

Page 10: Gpu Systems

NVIDIA GPUs

• All of NVIDIA’s recent GPUs support CUDA development

• Tesla cards designed exclusively for CUDA and GPGPU code (no graphics support)

• GeForce cards designed for graphics can be used for CUDA code as well

• Usually slower, with fewer cores or less RAM - but a great way to get started at low price points

• Code can be developed and tested on almost any standard GeForce GPU, then run on a Tesla system
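Assuming the CUDA runtime is installed, the runtime's device-query API can report which GPUs are present - whether a development GeForce or a production Tesla - and their capabilities. A minimal sketch (the output format is illustrative):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // e.g. a Tesla C1060 reports 30 multiprocessors (x 8 cores = 240 cores)
        printf("device %d: %s, %d multiprocessors, %.0f MB RAM, compute %d.%d\n",
               d, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}
```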

10

Page 11: Gpu Systems

GeForce vs. Tesla

11

Page 12: Gpu Systems

GPU future

• More products coming: AMD Stream processor line of products, similar to NVIDIA’s Tesla

• Standard, portable programming via OpenCL

• OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming. It enables portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures, and other parallel processors such as DSPs.

• More info: http://www.khronos.org/opencl/

12

Page 13: Gpu Systems

building GPU systems

• Building systems to house GPUs can be difficult:

• Requires a lot of engineering and design work to power and cool them correctly

• GPUs were originally designed for visualization and gaming; size and form factor were not as important

• When GPUs are used for computation, data-center space is limited and expensive - a way is needed to fit GPUs into existing infrastructure

13

Page 14: Gpu Systems

traditional GPU servers

14

•Large tower style cases

•Rackmount servers 4U or larger

•Either choice is not an efficient use of limited data center space

Page 15: Gpu Systems

GPUs are large

15

1.5” deep × 10.5” long × 4.6” tall

The size of the GPU has limited its application

Page 16: Gpu Systems

GPUs are power hungry

16

•GPU cards can use a lot of power - as much as 270W

•Lots of power equals lots of heat

•Difficult to put into a small space and cool effectively

Page 17: Gpu Systems

GPU system options

17

Advanced Clustering has two solutions to the power, heat, and density problems:

NVIDIA’s Tesla S1070

Advanced Clustering’s 15XGPU nodes

Page 18: Gpu Systems

NVIDIA’s tesla S1070

• The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs

• The S1070 must be connected to one or two host servers to operate

• S1070 has one power supply and dedicated cooling for the 4x GPUs

• Only available with the C1060 GPU cards pre-installed

18

Page 19: Gpu Systems

tesla S1070 - front view

19

Page 20: Gpu Systems

tesla S1070 - rear view

20

Page 21: Gpu Systems

tesla S1070 - inside view

21

Page 22: Gpu Systems

host interface cards (HIC)

22

• The Host Interface Card (HIC) connects the Tesla S1070 to a server

• Every S1070 requires 2 HICs

• Each HIC bridges the server to two of the four GPUs inside of the S1070

• HICs can be installed in 2 separate servers, or 1 server

• HICs are available in PCI-e 8x and 16x widths

Page 23: Gpu Systems

tesla S1070 block diagram

23

Cables to HICs in Host System(s)

Tesla S1070

Page 24: Gpu Systems

connecting S1070 to 2 servers

24

Tesla S1070

Server#1

Server#2

Most servers do not have enough PCI-e bandwidth, so the S1070 is designed to connect to 2 separate machines.

Page 25: Gpu Systems

connecting S1070 to 1 server

25

Tesla S1070

Server

If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.

Page 26: Gpu Systems

example cluster of S1070s

26


• 10x 1U compute nodes with 2x CPUs each

• 5 Tesla S1070 with 4x GPUs each

• Balanced system of 20 CPUs and 20 GPUs

• All in 15U of rack space

Page 27: Gpu Systems

S1070s pros and cons

•Pros

• External enclosure holding the GPUs means no special server design is required

• Easy to add GPUs to any existing system

• 4 GPUs in only 1U of space

• Multiple HIC card configurations including PCI-e 8x or 16x

• Thermally tested and validated by NVIDIA

•Cons

• Two GPUs share one PCI-e slot in the host server limiting bandwidth to the GPU card

• Most 1U servers only have 1x PCI-e expansion slot which is occupied by the HIC - this limits ability to use interconnects like InfiniBand or 10 Gigabit Ethernet

• Limited configuration options, only Tesla cards, no GeForce or Quadro options

27

Page 28: Gpu Systems

S1070 - specifications

28

Page 29: Gpu Systems

advanced clustering GPU nodes

• The 15XGPU line of systems is a complete two processor server and GPU in 1U

• Server fully configured with the latest quad-core Intel Xeon processors, RAM, hard drives, optical drive, networking, InfiniBand, and GPU card

• Flexible to support various GPUs, including:

• Tesla C1060 card

• GeForce series

• Quadro series

29

Page 30: Gpu Systems

GPU node - front

30

Page 31: Gpu Systems

GPU node - rear

31

Page 32: Gpu Systems

GPU node - inside

32

Page 33: Gpu Systems

GPU node - block diagram

33

Advanced Clustering 15XGPU node

Simplified design: the host server is completely integrated with the GPU, with no external components to connect to.

Page 34: Gpu Systems

example cluster of GPU nodes

34

• 15x 1U compute nodes

• 2x CPUs each

• 1x GPU integrated in each node

• Entire system contains 30x CPUs and 15x GPUs

• All in 15U of rack space

Page 35: Gpu Systems

GPU nodes - thermals

35

•System carefully engineered to ensure all components will fit in the small form factor

•Detailed modeling and testing to make sure the system components (CPU and memory) and the GPU are adequately cooled

Page 36: Gpu Systems

GPU nodes pros and cons

•Pros

• Entire server and GPU all enclosed in a 1U package

• Flexibility in GPU choice: Tesla, GeForce, and Quadro supported

• Full PCI-e bandwidth to GPU

• Full-featured server with the latest quad-core Intel Xeon CPUs

• Can be used for more than computation - the GPU can drive video output as well

•Cons

• Only 1x GPU per server

• Requires purchase of new servers, not an upgrade or add-on

• Not as dense a solution as the S1070's 4x GPUs in 1U

36

Page 37: Gpu Systems

GPU nodes

• The GPU node concept is unique to Advanced Clustering

• The only vendor shipping a 1U server with an integrated Tesla or high-end GeForce/Quadro card

• Available for order as the 15XGPU2

• Dual Quad-Core Intel Xeon 5500 series processors

• Choice of GPU

37

Page 38: Gpu Systems

15XGPU2 - specifications

• Processor

• Two Intel Xeon 5500 Series processors

• Next generation "Nehalem" microarchitecture

• Integrated memory controller and 2x QPI chipset interconnects per processor

• 45nm process technology

• Chipset

• Intel 5500 I/O controller hub

• Memory

• 800MHz, 1066MHz, or 1333MHz DDR3 memory

• Twelve DIMM sockets supporting up to 144GB of memory

• GPU

• PCI-e 2.0 16x double-height expansion slot for GPU

• Multiple options: Tesla, GeForce, or Quadro cards

• Storage

• Two 3.5" SATA2 drive bays

• Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)

• DVD+RW slim-line optical drive

• Management

• Integrated IPMI 2.0 module

• Integrated management controller providing iKVM and remote disk emulation

• Dedicated RJ45 LAN for management network

• I/O connections

• Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces

• Two USB 2.0 ports

• One DB-9 serial port (RS-232)

• One VGA port

• Optional ConnectX DDR or QDR InfiniBand connector

• Electrical Requirements

• High-efficiency power supply (greater than 80%)

• Output power: 560W

• Universal input voltage: 100V to 240V

• Frequency: 50Hz to 60Hz, single phase

38

Page 39: Gpu Systems

availability

• Both the Tesla S1070 and 15XGPU GPU nodes are available and shipping now

• For price and custom configuration contact your Account Representative

• (866) 802-8222

[email protected]

• http://www.advancedclustering.com/go/gpu

39