instrumenting parsecs raytrace

Instrumenting a benchmark application Tools and Measurements Techniques Project by Mário Almeida (EMDC) Barcelona, 25 April 2012

Upload: mario-almeida

Post on 08-Jun-2015


DESCRIPTION

(Check my blog @ http://www.marioalmeida.eu/ ) This presentation covers the performance metrics and results of running the raytrace application from the PARSEC benchmark suite on UPC's Boada server.

TRANSCRIPT

Page 1: Instrumenting parsecs raytrace

Instrumenting a benchmark application
Tools and Measurements Techniques
Project by Mário Almeida (EMDC)

Barcelona, 25 April 2012

Page 2: Instrumenting parsecs raytrace

Index (1/2)
Tools and configuration
● Parsec
  ○ Overview
  ○ Benchmark programs
● Extrae
● Paraver
● Configuration

Page 3: Instrumenting parsecs raytrace

Index (2/2)
Measurements
● Raytrace
  ○ Overview
  ○ Code
  ○ Inputs
  ○ Traces
  ○ Load Balancing
  ○ Cache misses and instructions
  ○ Execution time
  ○ Configuration comparisons
  ○ Extrae overhead

Conclusions

Page 4: Instrumenting parsecs raytrace

Tools and configuration

Page 5: Instrumenting parsecs raytrace

Parsec
Overview
● Benchmark with the following characteristics:
  ○ Multithreaded
  ○ Emerging workloads
  ○ Diverse
  ○ Not HPC-focused
  ○ Research

Page 6: Instrumenting parsecs raytrace

Parsec
Benchmark programs
● blackscholes
● bodytrack
● canneal
● dedup
● facesim
● ferret
● fluidanimate
● freqmine
● raytrace
● ...

Page 7: Instrumenting parsecs raytrace

Extrae
● Instrumentation package for tracing programs that run with the shared-memory and message-passing programming models.

Page 8: Instrumenting parsecs raytrace

Paraver
● Detailed quantitative analysis of program performance.
● Concurrent comparative analysis of several traces.
● Support for mixed message passing and shared memory.
● Building of derived metrics.

Page 9: Instrumenting parsecs raytrace

Configuration (1/4)
Boada server:
● Dual CPU Six Core with Hyperthreading.
● Kills applications after a few minutes.
● 24 GB of RAM.

Boada server:
● Used cpulimit to limit CPU usage to at most four cores.

Page 10: Instrumenting parsecs raytrace

Configuration (2/4)
Installed and/or configured:
● Parsec 2.1 with raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (on my laptop).
● CpuLimit.
● Minor configurations on .bashrc.
● Multiple scripts to clean, build and run.

Page 11: Instrumenting parsecs raytrace

Configuration (3/4)


Page 12: Instrumenting parsecs raytrace

Configuration (4/4)


Page 13: Instrumenting parsecs raytrace

Measurements

Page 14: Instrumenting parsecs raytrace

Raytrace
Overview
● Physical simulation for visualization
● Computer animation
● Input is a complex object of many triangles.

Page 15: Instrumenting parsecs raytrace

Raytrace
Code
For every pixel in the image
    calculate trajectory of ray striking pixel
    find closest intersection point of ray with scene geometry
    calculate contribution of all lights at intersection point
    recursively trace specularly reflected ray
end for
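The per-pixel loop can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the scene is a single hypothetical sphere rather than a triangle mesh, there is one point light, shading is plain diffuse, and everything runs sequentially. It is not the PARSEC raytrace code, and the names `SPHERES`, `LIGHTS`, `closest_hit`, `trace` and `render` are made up for this sketch.

```python
import math

# Hypothetical scene (assumption): one sphere and one point light.
SPHERES = [((0.0, 0.0, -3.0), 1.0)]   # (center, radius)
LIGHTS = [(5.0, 5.0, 0.0)]
MAX_DEPTH = 2                          # recursion limit for reflections


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def closest_hit(origin, direction):
    """Find the closest intersection of a ray with the scene geometry."""
    best = None
    for center, radius in SPHERES:
        oc = sub(origin, center)
        b = 2.0 * dot(oc, direction)          # direction is unit length, so a == 1
        c = dot(oc, oc) - radius * radius
        disc = b * b - 4.0 * c
        if disc < 0.0:
            continue                           # ray misses this sphere
        t = (-b - math.sqrt(disc)) / 2.0
        if t > 1e-4 and (best is None or t < best[0]):
            best = (t, center)
    return best


def trace(origin, direction, depth=0):
    """Return a brightness in [0, 1] for one ray."""
    hit = closest_hit(origin, direction)
    if hit is None:
        return 0.0                             # background
    t, center = hit
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    # Contribution of all lights at the intersection point (diffuse only).
    brightness = sum(max(0.0, dot(normal, normalize(sub(light, point))))
                     for light in LIGHTS)
    # Recursively trace the specularly reflected ray.
    if depth < MAX_DEPTH:
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        brightness += 0.3 * trace(point, reflected, depth + 1)
    return min(1.0, brightness)


def render(width, height):
    """For every pixel: compute the ray through it and trace it."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            u = (x + 0.5) / width * 2.0 - 1.0
            v = 1.0 - (y + 0.5) / height * 2.0
            row.append(trace((0.0, 0.0, 0.0), normalize((u, v, -1.0))))
        image.append(row)
    return image
```

Each pixel is computed independently of the others, which is why the outer loop is a natural place to split the work across threads.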

Page 16: Instrumenting parsecs raytrace

Raytrace
Inputs
● simsmall - 1 million polygons (480x270)
● simmedium - 1 million polygons (960x540)
● simlarge - 1 million polygons (1920x1080)
● native - 10 million polygons (1920x1080)

Page 17: Instrumenting parsecs raytrace

Raytrace
Trace (1/2)
Only 10% of the execution time is parallel!

[Figure: Paraver trace; legend: Not created, Running]
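To see what a 10% parallel fraction implies for the whole program, Amdahl's law can be applied. The sketch below is a worked calculation, not a measurement: it assumes the 10% figure holds for the run in question, and it borrows the "almost three times faster" result reported for 64 threads as the speedup of the parallel region. The function name `amdahl_speedup` is chosen for this example.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of the run time is sped up."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Assumption: 10% of the time is parallel and that part runs 3x faster.
overall = amdahl_speedup(0.10, 3.0)

# Upper bound if the parallel part took no time at all.
upper_bound = 1.0 / (1.0 - 0.10)

print(f"overall speedup: {overall:.3f}, upper bound: {upper_bound:.3f}")
```

Under these assumptions the whole program gains only about 7%, and can never gain more than about 11%, however many threads are used.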

Page 18: Instrumenting parsecs raytrace

Raytrace
Trace (2/2)
Render time is proportional to the # of frames!

[Figure: Paraver trace; phases labelled: Init and adding object, Build Context, Render]

Page 19: Instrumenting parsecs raytrace

Raytrace
Load balancing (1/2)

[Figure: Paraver trace; labels: Not created; Barrier; Create Threads Task; Wait for all threads]

Page 20: Instrumenting parsecs raytrace

Raytrace
Load balancing (2/2)
Good load balancing between the slave threads.
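One common way to put a number on load balance is the ratio of the average to the maximum per-thread busy time, where 1.0 means perfectly balanced. The sketch below assumes per-thread busy times have already been extracted from a trace; the values are hypothetical, not measurements from these slides, and `load_balance` is a name chosen for this example.

```python
def load_balance(busy_times):
    """Average busy time divided by the longest busy time (1.0 = perfect)."""
    average = sum(busy_times) / len(busy_times)
    return average / max(busy_times)


# Hypothetical busy times in seconds for four worker threads.
example_busy_times = [9.8, 10.0, 9.9, 10.1]
print(f"load balance: {load_balance(example_busy_times):.3f}")
```

A value close to 1.0 means no thread spends much time waiting at the final barrier for a slower one.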

Page 21: Instrumenting parsecs raytrace

Raytrace
Cache and instructions

[Figure: trace annotated with two regions: "High number of cache misses" and "Very low number of cache misses"]

There were no significant differences in IPC between threads.

Page 22: Instrumenting parsecs raytrace

Raytrace
Execution time (1/3)

These are average times from multiple executions of the parallel code only, without Extrae overhead. There was a high average deviation of 0.3 seconds in the experiments. Measurements with bigger inputs were more accurate.
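The slides do not say how the "average deviation" was computed. Assuming it means the mean absolute deviation from the mean run time, it could be calculated as sketched below; the timings are hypothetical and the function name is chosen for this example.

```python
from statistics import fmean


def mean_absolute_deviation(samples):
    """Average distance of each sample from the mean of all samples."""
    center = fmean(samples)
    return fmean(abs(x - center) for x in samples)


# Hypothetical run times in seconds for repeated executions of one configuration.
example_times = [2.1, 2.6, 1.9, 2.4]
print(f"mean: {fmean(example_times):.2f} s, "
      f"average deviation: {mean_absolute_deviation(example_times):.2f} s")
```

When run times are short, a fixed amount of noise is a large share of each measurement, which fits the observation that bigger inputs gave more reliable numbers.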

Page 23: Instrumenting parsecs raytrace

Raytrace
Execution time (2/3)

There was a smaller average deviation of 0.03 seconds. With 64 threads it runs almost three times faster!

Page 24: Instrumenting parsecs raytrace

Raytrace
Execution time (3/3)

There was an even smaller average deviation of 0.02 seconds. With 64 threads it runs almost three times faster!
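Speedup figures like these are easier to judge alongside parallel efficiency, that is, speedup divided by the number of workers. The sketch below assumes "almost three times faster" is measured against the single-thread run and rounds it to 3.0; the hardware-thread count is derived from the server description (two CPUs, six cores each, Hyperthreading). The function names are chosen for this example.

```python
def speedup(t_baseline, t_parallel):
    """How many times faster the parallel run is than the baseline."""
    return t_baseline / t_parallel


def efficiency(achieved_speedup, workers):
    """Fraction of the ideal linear speedup that was achieved."""
    return achieved_speedup / workers


# Assumption: speedup of about 3 with 64 software threads.
eff_per_software_thread = efficiency(3.0, 64)

# 2 CPUs x 6 cores x 2 hardware threads per core on the server described earlier.
HARDWARE_THREADS = 2 * 6 * 2
eff_per_hardware_thread = efficiency(3.0, HARDWARE_THREADS)

print(f"{eff_per_software_thread:.3f} per software thread, "
      f"{eff_per_hardware_thread:.3f} per hardware thread")
```

Because 64 threads oversubscribe the machine, the per-hardware-thread figure is the fairer of the two, and even that is far from 1.0.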

Page 25: Instrumenting parsecs raytrace

Raytrace
Configuration comparison

In the case of the limited configuration, although performance doesn't seem to degrade, the execution time seems to stabilize beyond 8 threads.

Page 26: Instrumenting parsecs raytrace

Raytrace
Extrae overhead
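The measured values for this slide did not survive in the transcript. As a sketch of how instrumentation overhead is usually expressed, the relative overhead is the extra time of the instrumented run divided by the time of the plain run. The timings below are hypothetical and `relative_overhead` is a name chosen for this example.

```python
def relative_overhead(t_plain, t_instrumented):
    """Extra run time caused by instrumentation, as a fraction of the plain run."""
    return (t_instrumented - t_plain) / t_plain


# Hypothetical timings in seconds: without and with tracing enabled.
example_overhead = relative_overhead(2.0, 2.5)
print(f"overhead: {example_overhead:.0%}")
```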

Page 27: Instrumenting parsecs raytrace

Conclusions

Page 28: Instrumenting parsecs raytrace

Conclusions (1/3)
● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
● The program has good load balancing.
● Fine-grained parallelism.

Page 29: Instrumenting parsecs raytrace

Conclusions (2/3)
● Although it wasn't possible to verify, increasing the input should cause more cache misses, because of the big working sets that won't fit in memory.
● Memory bandwidth should be the main issue for good speedups.
● Boada killed almost all the native input executions.

Page 30: Instrumenting parsecs raytrace

Conclusions (3/3)
● Paraver simplifies the process of analyzing an application's performance.
● Better knowledge of the system's architecture would be needed in order to further analyse the performance of the application.

Page 31: Instrumenting parsecs raytrace

Questions