
Page 1

Technische Universität München

HPC – Algorithms and Applications

– Intro –

Michael Bader

Winter 2012/2013

Page 2

Part I

Scientific Computing and Numerical Simulation

Page 3

The Simulation Pipeline

phenomenon, process, etc.
  → (modelling) mathematical model
  → (numerical treatment) numerical algorithm
  → (implementation) simulation code
  → (visualization) results to interpret
  → (embedding) statement, tool

(validation feeds back from the later stages along the pipeline)

Page 4

The Simulation Pipeline

phenomenon, process, etc.
  → (modelling) mathematical model
  → (numerical treatment) numerical algorithm
  → (parallel implementation) simulation code
  → (visualization) results to interpret
  → (embedding) statement, tool

(validation feeds back from the later stages along the pipeline)

Page 5

Example – Millennium-XXL Project

(Springel, Angulo, et al., 2010)

• N-body simulation with N = 3 · 10^11 “particles”
• compute gravitational forces and effects
  (every “particle” corresponds to ∼ 10^9 suns)
• simulation of the generation of galaxy clusters
  → plausibility of the “cold dark matter” model
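To make “compute gravitational forces and effects” concrete, below is a minimal direct-summation gravity kernel in CUDA. It is only an illustrative O(N^2) sketch, not part of the original slides: the names, the assumed float4 layout (x, y, z, mass) and the softening constant EPS are invented for the example, and this is not the tree-/mesh-based method a simulation of this size actually uses.

    #include <cuda_runtime.h>

    #define EPS 1.0e-9f  // softening term to avoid division by zero (illustrative value)

    // One thread accumulates the gravitational acceleration acting on particle i
    // by brute-force summation over all n particles: a_i = sum_j G * m_j * r_ij / |r_ij|^3
    __global__ void gravity_direct(const float4 *pos,   // x, y, z, mass per particle
                                   float3 *acc, int n, float G)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 pi = pos[i];
        float3 ai = {0.0f, 0.0f, 0.0f};

        for (int j = 0; j < n; ++j) {
            float4 pj = pos[j];
            float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
            float r2 = dx * dx + dy * dy + dz * dz + EPS;   // softened squared distance
            float inv_r = rsqrtf(r2);                       // 1 / r
            float s = G * pj.w * inv_r * inv_r * inv_r;     // G * m_j / r^3
            ai.x += s * dx;  ai.y += s * dy;  ai.z += s * dz;
        }
        acc[i] = ai;
    }

Launched, for example, as gravity_direct<<<(n + 255) / 256, 256>>>(pos, acc, n, G), this requires n^2 pairwise interactions per time step; at N = 3 · 10^11 that is roughly 10^23 interactions, which is exactly why codes at Millennium-XXL scale rely on tree and fast-multipole/mesh algorithms instead of direct summation.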

Page 6

Example – Millennium-XXL Project (2)

Simulation – HPC-Related Data:
• N-body simulation with N = 3 · 10^11 “particles”
• 10 TB RAM required only to store particle positions and velocities (single precision;
  see the back-of-envelope estimate below)
• total memory requirement: 29 TB
• JuRoPA supercomputer (Jülich)
• simulation on 1536 nodes (each 2× quad-core, thus 12,288 cores)
• hybrid parallelisation: MPI plus OpenMP/POSIX threads
• runtime: 9.3 days; 300 CPU years in total
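A back-of-envelope check on the 10 TB figure (this arithmetic is not on the slide): three position and three velocity components per particle in single precision give

    3 · 10^11 particles × 6 values × 4 B ≈ 7.2 · 10^12 B ≈ 7 TB,

i.e. the right order of magnitude. The quoted 10 TB presumably also covers per-particle bookkeeping such as IDs or padding (an assumption), with the remainder of the 29 TB total taken by the solver's internal data structures.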

Page 7

Example – Gordon Bell Prize 2010

(Rahimian, . . . , Biros, 2010)

• direct simulation of blood flow
• particulate flow simulation (coupled problem)
• Stokes flow for blood plasma
• red blood cells as immersed, deformable particles

Page 8

Example – Gordon Bell Prize 2010 (2)

Simulation – HPC-Related Data:
• up to 260 million blood cells, up to 9 · 10^10 unknowns
• fast multipole method to compute the Stokes flow (octree-based; octree levels 4–24)
• scalability: 327 CPU-GPU nodes on the Keeneland cluster,
  200,000 AMD cores on Jaguar (ORNL)
• 0.7 PFlop/s sustained performance on Jaguar
• extensive use of the GEMM routine (matrix multiplication; see the sketch below)
• runtime: ≈ 1 minute per time step

Article for the Supercomputing conference:
http://www.cc.gatech.edu/~gbiros/papers/sc10.pdf
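For reference, GEMM is the general matrix-matrix multiply C ← α·A·B + β·C. The naive triple loop below (an illustrative plain C/C++ sketch, not taken from the paper) shows the operation itself; a code like the one above would of course call a highly tuned BLAS/cuBLAS GEMM rather than this version.

    // Naive GEMM: C = alpha * A * B + beta * C for row-major n x n matrices.
    // Illustrative only -- production HPC codes call an optimised BLAS/cuBLAS gemm.
    void gemm_naive(int n, double alpha, const double *A, const double *B,
                    double beta, double *C)
    {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = alpha * sum + beta * C[i * n + j];
            }
        }
    }

GEMM performs 2n^3 floating-point operations on 3n^2 data, so its high arithmetic intensity maps very well onto GPUs, which is presumably why the blood-flow code leans on it so heavily.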

Page 9

“Faster, Bigger, More”

Why parallel high performance computing:
• response time: compute a problem in 1/p of the time
  • speed up engineering processes
  • real-time simulations (tsunami warning?)
• problem size: compute a p-times bigger problem
  • simulation of multi-scale phenomena
  • maximal problem size that “fits into the machine”
• throughput: compute p problems at once
  • case and parameter studies, statistical risk scenarios, etc.
  • massively distributed computing (SETI@home, e.g.)
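The first two motivations correspond to the usual strong- and weak-scaling notions of speedup; as standard background (these formulas are not on the slide):

    strong scaling:  S(p) = T(1) / T(p), ideally S(p) = p for a fixed problem size
    Amdahl's law:    S(p) <= 1 / (s + (1 - s)/p), where s is the serial fraction of the code
    Gustafson's law: S(p) = s + (1 - s) · p, for a problem that grows with p

In practice the achievable speedup is limited by the serial fraction, communication and load imbalance, which is why the following parts of the lecture stress algorithms and architecture-friendly implementations rather than raw core counts.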

Page 10

Part II

HPC in the Literature – Past and Present Trends

Page 11

Four Horizons for Enhancing the Performance . . .

of Parallel Simulations Based on Partial Differential Equations (David Keyes, 2000)

1. Expanded Number of Processors
   → in 2000: 1000 cores; in 2010: 200,000 cores
2. More Efficient Use of Faster Processors
   → PDE working sets, cache efficiency
3. More Architecture-Friendly Algorithms
   → improve temporal/spatial locality
4. Algorithms Delivering More “Science per Flop”
   → adaptivity (in space and time), higher-order methods, fast solvers

Page 12

The Seven Dwarfs of HPC

“dwarfs” = key algorithmic kernels in many scientific computing applications

P. Colella (LBNL), 2004:
1. dense linear algebra
2. sparse linear algebra
3. spectral methods
4. N-body methods
5. structured grids
6. unstructured grids
7. Monte Carlo
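To give one of these kernels a concrete shape: dwarf 5, “structured grids”, typically reduces to stencil updates such as the 2D five-point Jacobi sweep sketched below (plain C/C++ host code; the grid size, array names and ghost-cell layout are illustrative choices, not taken from the lecture).

    // One Jacobi sweep of the 5-point stencil on an n x n interior grid,
    // stored with a one-cell ghost layer, i.e. arrays of size (n+2) x (n+2).
    // u_old is read, u_new is written; boundary values stay untouched.
    void jacobi_sweep(int n, const double *u_old, double *u_new)
    {
        const int stride = n + 2;
        for (int i = 1; i <= n; ++i)
            for (int j = 1; j <= n; ++j)
                u_new[i * stride + j] = 0.25 * (u_old[(i - 1) * stride + j] +
                                                u_old[(i + 1) * stride + j] +
                                                u_old[i * stride + (j - 1)] +
                                                u_old[i * stride + (j + 1)]);
    }

Each dwarf is characterised less by its arithmetic than by its memory-access and communication pattern (here: nearest-neighbour accesses on a regular grid), and that pattern determines how the kernel is parallelised and tuned.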

Page 13

Computational Science Demands a New Paradigm

Computational simulation must meet three challenges to become a mature partner of theory and experiment (Post & Votta, 2005):
1. performance challenge
   → exponential growth of performance, massively parallel architectures
2. programming challenge
   → new (parallel) programming models
3. prediction challenge
   → careful verification and validation of codes; towards reproducible simulation experiments

Page 14

Free Lunch is Over (∗)
. . . actually already over for quite some time!

Speedup of your software can only come from parallelism:
• clock speed of CPUs has stalled
• instruction-level parallelism per core has stalled
• number of cores is growing
• size of vector units is growing

(∗) Quote and image taken from: H. Sutter, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb’s Journal 30(3), March 2005.

Page 15

Performance Development in Supercomputing
[chart not reproduced in this transcript]

Page 17

Top 500 (www.top500.org) – June 2012

TOP500 List – June 2012, ranks 1–11 (Rmax and Rpeak in TFlop/s; power in kW for the entire system):

1. DOE/NNSA/LLNL, United States – Sequoia: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom / 2011, IBM – 1,572,864 cores – Rmax 16,324.75 – Rpeak 20,132.66 – 7,890.0 kW
2. RIKEN Advanced Institute for Computational Science (AICS), Japan – K computer: SPARC64 VIIIfx 2.0 GHz, Tofu interconnect / 2011, Fujitsu – 705,024 cores – Rmax 10,510.00 – Rpeak 11,280.38 – 12,659.9 kW
3. DOE/SC/Argonne National Laboratory, United States – Mira: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom / 2012, IBM – 786,432 cores – Rmax 8,162.38 – Rpeak 10,066.33 – 3,945.0 kW
4. Leibniz Rechenzentrum, Germany – SuperMUC: iDataPlex DX360M4, Xeon E5-2680 8C 2.70 GHz, Infiniband FDR / 2012, IBM – 147,456 cores – Rmax 2,897.00 – Rpeak 3,185.05 – 3,422.7 kW
5. National Supercomputing Center in Tianjin, China – Tianhe-1A: NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA 2050 / 2010, NUDT – 186,368 cores – Rmax 2,566.00 – Rpeak 4,701.00 – 4,040.0 kW
6. DOE/SC/Oak Ridge National Laboratory, United States – Jaguar: Cray XK6, Opteron 6274 16C 2.200 GHz, Cray Gemini interconnect, NVIDIA 2090 / 2009, Cray Inc. – 298,592 cores – Rmax 1,941.00 – Rpeak 2,627.61 – 5,142.0 kW
7. CINECA, Italy – Fermi: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom / 2012, IBM – 163,840 cores – Rmax 1,725.49 – Rpeak 2,097.15 – 821.9 kW
8. Forschungszentrum Juelich (FZJ), Germany – JuQUEEN: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom / 2012, IBM – 131,072 cores – Rmax 1,380.39 – Rpeak 1,677.72 – 657.5 kW
9. CEA/TGCC-GENCI, France – Curie thin nodes: Bullx B510, Xeon E5-2680 8C 2.700 GHz, Infiniband QDR / 2012, Bull – 77,184 cores – Rmax 1,359.00 – Rpeak 1,667.17 – 2,251.0 kW
10. National Supercomputing Centre in Shenzhen (NSCS), China – Nebulae: Dawning TC3600 Blade System, Xeon X5650 6C 2.66 GHz, Infiniband QDR, NVIDIA 2050 / 2010, Dawning – 120,640 cores – Rmax 1,271.00 – Rpeak 2,984.30 – 2,580.0 kW
11. NASA/Ames Research Center/NAS, United States – Pleiades: SGI Altix ICE X/8200EX/8400EX, Xeon 54xx 3.0/5570/5670/E5-2670 2.93/2.6/3.06/3.0 GHz, Infiniband QDR/FDR / 2011, SGI – 125,980 cores – Rmax 1,243.00 – Rpeak 1,731.84 – 3,987.0 kW

(Screenshot of www.top500.org, retrieved 14.7.2012.)

Page 18

Top 500 Spotlights – Sequoia and K Computer

Sequoia → IBM BlueGene/Q (LLNL)
• 98,304 compute nodes; 1.6 million cores(!)
• Linpack benchmark: 16.32 PFlop/s
• ≈ 8 MW power

K Computer → SPARC64 (RIKEN, Kobe)
• 88,128 compute nodes; 705,024 cores
• Linpack benchmark: 10.51 PFlop/s
• ≈ 12 MW power
• SPARC64 VIIIfx 2.0 GHz (8-core CPU)

Page 20

International Exascale Software Project Roadmap
Towards an Exa-Flop/s Platform in 2018 (www.exascale.org):

1. technology trends
   → concurrency, reliability, power consumption, . . .
   → blueprint of an exascale system: 10-billion-way concurrency, 100 million to 1 billion cores, 10-to-100-way concurrency per core, hundreds of cores per die, . . .
2. science trends
   → climate, high-energy physics, nuclear physics, fusion energy sciences, materials science and chemistry, . . .
3. X-stack (software stack for exascale)
   → energy, resiliency, heterogeneity, I/O and memory
4. politico-economic trends
   → exascale systems run by government labs, used by CSE scientists

Page 21

Exascale Roadmap
“Aggressively Designed Strawman Architecture”

Level        What                            Perform.     Power      RAM
FPU          FPU, regs., instr. memory       1.5 Gflops   30 mW
Core         4 FPUs, L1                      6 Gflops     141 mW
Proc. chip   742 cores, L2/L3, interconn.    4.5 Tflops   214 W
Node         proc. chip, DRAM                4.5 Tflops   230 W      16 GB
Group        12 proc. chips, routers         54 Tflops    3.5 kW     192 GB
Rack         32 groups                       1.7 Pflops   116 kW     6.1 TB
System       583 racks                       1 Eflops     67.7 MW    3.6 PB

approx. 285,000 cores per rack; 166 million cores in total
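The two summary numbers follow directly from the table (this arithmetic is not printed on the slide): 742 cores/chip × 12 chips/group × 32 groups/rack = 284,928 ≈ 285,000 cores per rack, and 284,928 × 583 racks ≈ 1.66 · 10^8, i.e. roughly 166 million cores in total.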

Source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems

Page 22

Exascale Roadmap – Should You Bother?

Your department’s compute cluster in 5 years?

• a Petaflop System!
• “one rack of the Exaflop system”
  → using the same/similar hardware
• extrapolated example machine:
  • peak performance: 1.7 PFlop/s
  • 6 TB RAM, 60 GB cache memory
  • “total concurrency”: 1.1 · 10^6
  • number of cores: 280,000
  • number of chips: 384

Source: ExaScale Software Study: Software Challenges in Extreme Scale Systems
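The extrapolated machine is consistent with one rack of the strawman design on the previous slide (again, arithmetic not printed on the slide): 280,000 cores / 384 chips ≈ 730 cores per chip (close to the 742-core strawman chip), and a total concurrency of 1.1 · 10^6 over 280,000 cores ≈ 4-way concurrency per core, matching the 4 FPUs per core.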

Page 26

Your Department’s PetaFlop/s Cluster in 5 Years?

Tianhe-1A (Tianjin, China; Top500 #5)
• #5 in Top 500 (July 2012)
• 14,336 Xeon X5670 CPUs
• 7,168 Nvidia Tesla M2050 GPUs
• Linpack benchmark: 2.56 PFlop/s
• ≈ 4 MW power

Discovery (Intel, Top500 #150)
• 9,800 cores (Xeon and MIC/“many integrated cores”)
• Linpack benchmark: 118 TFlop/s
• Knights Corner / Intel Xeon Phi / Intel MIC as accelerator
• “>50 cores”, roughly 1.1–1.3 GHz
• wider vector FP units: 64 bytes (i.e., 16 floats, 8 doubles)

Page 28

Top 500 (www.top500.org) – # Cores

[Screenshot of the TOP500 statistics page (www.top500.org, retrieved 14.7.2012), showing the Seymour Cray quote “Anyone can build a fast CPU. The trick is to build a fast system.” and two pie charts, “Cores per Socket – System Share” and “Cores per Socket – Performance Share”, for socket sizes of 1, 2, 4, 6, 8, 9, 10, 12 and 16 cores. The largest system-share slices are 47%, 21.6% and 15.2%; the largest performance-share slices are 34%, 25.7%, 22.5% and 11.6%. The assignment of slices to core counts is not recoverable from this transcript.]

Page 30

Part III

Organisation

Page 31

Lecture Topics – The Seven Dwarfs of HPC

Algorithms, parallelisation, HPC aspects for problems related to:

1. dense linear algebra
2. sparse linear algebra
3. spectral methods
4. N-body methods
5. structured grids
6. unstructured grids
7. Monte Carlo

Page 32

Tutorials – Computing on Manycore Hardware

Examples on GPUs using CUDA:
• dense linear algebra
• structured grids
• sparse linear algebra
• N-body methods

Tutors and time/place:
• Oliver Meister
• roughly bi-weekly tutorials (90 min)
  (see website for exact schedule)
• small “dwarf-related” projects
  (1–2 tutorial lessons per project)
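To give a flavour of what such CUDA exercises look like (a generic illustration, not one of the actual tutorial assignments): the classic SAXPY kernel, y ← a·x + y, the simplest representative of the dense-linear-algebra dwarf on a GPU.

    #include <cstdio>
    #include <cuda_runtime.h>

    // y[i] = a * x[i] + y[i], one GPU thread per vector element
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        // host data
        float *hx = new float[n], *hy = new float[n];
        for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

        // device data and host -> device transfer
        float *dx, *dy;
        cudaMalloc((void **)&dx, bytes);
        cudaMalloc((void **)&dy, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        // launch with 256 threads per block
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

        cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f (expected 5.0)\n", hy[0]);

        cudaFree(dx); cudaFree(dy);
        delete[] hx; delete[] hy;
        return 0;
    }

GPU codes for the other dwarfs listed above follow the same basic pattern: allocate and transfer data, map one or more elements to each thread, and then tune the memory-access pattern of the kernel.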

Page 33

Exams, ECTS, Modules

Exam:
• written or oral exam, depending on the number of participants
• includes exercises (as part of the exam)

ECTS, Modules:
• 4 ECTS (2+1 lectures per week)
• Informatik/Computer Science: Master catalogue
• CSE: application area
• others?
