INTEL® HPC DEVELOPER CONFERENCE
Fuel Your Insight
Large-scale Distributed Rendering with the OSPRay Ray Tracing Framework
Carson Brownlee
Shared-memory
Distributed-memory
Why MPI?
Data that exceeds the memory limits of a single node
Performance limitations
Tiled displays
In Situ
Strong Scaling
Weak Scaling
High Fidelity Rendering
Related Work
Sending Rays, Kilauea - Kato ’01,’02,’03
Interactive Ray Tracing on Clusters - Wald et al. ‘03
Distributed Shared Memory - DeMarle et al. ‘03
IceT Compositing - Moreland et al. ’11
Multiple Device
API commands are processed by the currently active device, which provides a modular backend for handling API calls (see the sketch after this list). Current devices include:
1. Local
2. MPI
3. COI (Now deprecated in favor of MPI)
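The run commands on the following slides select the MPI device via the --osp:mpi flag; a minimal sketch of what that looks like from application code, assuming the OSPRay 1.x C API (ospray/ospray.h):

#include <ospray/ospray.h>

int main(int argc, char **argv) {
  // "--osp:mpi" on the command line asks ospInit() to create the MPI
  // device instead of the default local (shared-memory) device; the
  // rest of the application code is unchanged between the two devices.
  ospInit(&argc, (const char **)argv);
  // ... create renderer, camera, model, and framebuffer as usual ...
  return 0;
}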
Using OSPRay MPI
Compile
OSPRAY_BUILD_MPI_DEVICE=ON
Requires MPI Library with multi-threading support (IMPI recommended)
OSPRAY_EXP_DATA_PARALLEL=ON (experimental)
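A typical configure invocation with both options enabled might look like this (the source path is a placeholder, not from the slides):

cmake -DOSPRAY_BUILD_MPI_DEVICE=ON -DOSPRAY_EXP_DATA_PARALLEL=ON ../ospray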
Run
mpirun -n 3 ./ospGlutViewer --osp:mpi teapot.obj (mpirun args vary)
mpirun -ppn 1 -n 1 -host localhost ./ospGlutViewer --osp:mpi teapot.obj : -n 2 -host n1,n2 ./ospray_mpi_worker --osp:mpi
ParaView
VTKOSPRAY_ARGS="--osp:mpi" mpirun -ppn 1 -n 1 -host localhost ./paraview : -n 1 -host n1,n2 ./ospray_mpi_worker --osp:mpi
Distributed Framebuffer
Supports both data-replicated and data-distributed rendering
Tile ownership
Stores accumulation buffer locally
Pixel Operations
Processed tiles with framebuffer colors sent to display node
Tiling Pseudocode
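The slide's pseudocode is not reproduced in this text; below is a minimal sketch of the tile flow described above. The names (Tile, renderTile, tileOwner, sendTile) are illustrative, not OSPRay's actual identifiers.

// Each rank renders the tiles the load balancer assigns to it, then
// sends each finished tile to the rank that owns it in the distributed
// framebuffer (DFB).
for (int tileID = myRank; tileID < numTiles; tileID += numRanks) {
  Tile tile = renderTile(tileID);                    // local rendering work
  int owner = tileOwner(tile.x, tile.y, numRanks);   // DFB tile ownership (see Load Balancing)
  sendTile(owner, tile);                             // asynchronous send to the owning rank
}
// The owning rank accumulates the tile (the accumulation buffer is stored
// locally), applies pixel operations, and forwards the final tile colors
// to the display node.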
Load Balancing
Static load balancing
Tiles are strided across ranks to avoid work imbalance (example assignment for 3 ranks):
1 2 3 1 2 3
2 3 1 2 3 1
3 1 2 3 1 2
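One formula that reproduces the strided pattern above (illustrative; not necessarily the exact expression in OSPRay's source):

int tileOwner(int tileX, int tileY, int numRanks) {
  // Using both tile coordinates shifts the assignment each row, so
  // neighboring tiles in a row and in a column land on different ranks.
  return (tileX + tileY) % numRanks;
}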
Work API Comm
API:
ospRenderFrame() {…}
MPIDevice:
MPIDevice::renderFrame()
{
  work::RenderFrame work(_fb, _renderer, fbChannelFlags);
  processWork(&work);
}
Work:
void RenderFrame::serialize(SerialBuffer &b) const {
  b << (int64)fbHandle << (int64)rendererHandle << channels;
}
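On the worker side the handles are read back out of the buffer before run() executes; a hypothetical counterpart is sketched below (the actual method name, buffer operators, and handle type in OSPRay's source may differ):

void RenderFrame::deserialize(SerialBuffer &b) {
  int64 fb, renderer;
  // Read fields in the same order serialize() wrote them
  b >> fb >> renderer >> channels;
  fbHandle = ObjectHandle(fb);
  rendererHandle = ObjectHandle(renderer);
}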
Work API Comm
Work:
void RenderFrame::run() {
  FrameBuffer *fb = (FrameBuffer*)fbHandle.lookup();
  Renderer *renderer = (Renderer*)rendererHandle.lookup();
  renderer->renderFrame(fb, channels);
}
Worker:
mpi::recv(mpi::Address(&mpi::app, (int32)mpi::RECV_ALL), workCommands);
for (work::Work *&w : workCommands)
  w->run();
Async Comm Layer
Actions are separated into
receive queue
process queue
send queue
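A minimal sketch of the process-queue side, assuming a blocking thread-safe queue (SyncQueue, Consumer, and pop() are illustrative names, not OSPRay's):

void processThread(SyncQueue<Message*> &processQueue, Consumer *consumer) {
  // Pop messages delivered by the receive thread and hand them to the
  // consumer's incoming() handler (see the DFB example on the next slide);
  // outgoing replies are pushed onto the send queue.
  for (;;) {
    Message *msg = processQueue.pop();  // blocks until a message is available
    consumer->incoming(msg);
  }
}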
Async Comm Layer
struct MasterTileMessage : public mpi::async::CommLayer::Message {
  vec2i coords;
  float error;
  uint32 color[TILE_SIZE][TILE_SIZE];
};
void DFB::incoming(mpi::async::CommLayer::Message *_msg) {
  switch (_msg->command) {
  case MASTER_WRITE_TILE_NONE:
    this->processMessage((MasterTileMessage_NONE*)_msg);
    break;
  }
}
Distributed Data
Currently experimental and only for Volume data
env var OSPRAY_DATA_PARALLEL=blockXxBlockYxBlockZ
Data is projected onto image tiles; each node determines which tiles its data overlaps
Tiles sent to owning node for compositing
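For example (illustrative values), OSPRAY_DATA_PARALLEL=4x4x4 splits the volume into a 4x4x4 grid of 64 bricks, which are distributed across the worker ranks as resident data.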
Strong Scaling
Distributed API
Ability to specify what is run where
3 Modes:
Master/Slave
- All non-master ranks execute the commands issued by the master rank
Collaborative
- All ranks issue the same commands
Independent
- Commands run locally on each rank
D-API Example - Distributed Volume Rendering
Sync: initialization
Sync: create shared volume
Local: create resident volume section
Local: add local volume to synchronous volume
Master: add annotations
Sync: render
Distributed API
ospdApiMode(OSPD_MODE_INDEPENDENT);
// Each rank independently creates its resident section of the volume data
OSPVolume localVol = ospNewVolume("shared_structured_volume");
OSPData ospLocalVolData = ospNewData(volumeData.size(), OSP_UCHAR, volumeData.data(), OSP_DATA_SHARED_BUFFER);
ospCommit(ospLocalVolData);
// Switch back to collaborative mode and commit the collab volume and add it to the world
ospdApiMode(OSPD_MODE_COLLABORATIVE);
ospCommit(volume); // 'volume' is the shared volume created collaboratively earlier (not shown)
ospAddVolume(world, volume);
ospCommit(world);
D-API Implementation
void MPIDevice::processWork(work::Work* work)
{
  if (currentApiMode == OSPD_MODE_MASTER) {
    mpi::send(mpi::Address(&mpi::worker, (int32)mpi::SEND_ALL), work);
  } else if (currentApiMode == OSPD_MODE_COLLABORATIVE) {
    // sync calls
  }
  work->run();
}
Tiled Displays
DisplayWald - Experimental
Built as an OSPRay module
Requires MPI
Stereo supported
Routing through a single head node is supported if the display nodes are not directly reachable from the compute nodes
DisplayWald - Experimental
Server (displays):
mpirun -perhost 1 -n 6 ./ospDisplayWald -w 3 -h 2 --no-head-node
mpirun -perhost 1 -n 6 ./ospDisplayWald -w 3 -h 2 --head-node
// will output hostname and port
Client (renderer):
mpirun -n <numRanks> ./ospDwViewer --display-wall-host host:port
Performance Tips
Wayness - a single MPI process per node is ideal
Excessive API calls can currently cause very long load times
Affinity issues - check that CPU utilization is pegged at 100%
KNL cache mode - OSPRay runs best in cache/quadrant mode
Samples per pixel - negative values render a subset of the image per frame
Questions?
Legal Notices and Disclaimers
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © 2016 Intel Corporation. All rights reserved. Intel, Intel Inside, the Intel logo, Intel Xeon and Intel Xeon Phi are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others.