multi-gpu accelerated refraction-corrected reflection...

22
Multi-GPU Accelerated Refraction-Corrected Reflection Image Reconstruction for 3D Ultrasound Breast Imaging Qun (Maxine) Liu Martin Cwikla

Upload: others

Post on 31-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Multi-GPU Accelerated Refraction-Corrected Reflection

Image Reconstruction for 3D Ultrasound Breast Imaging

Qun (Maxine) Liu Martin Cwikla

Page 2: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Presentation Overview

Background

Motivation & Problem Statement

Technical Design

GPU Implementation

Results

Contact Information

Page 3: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Scanner Introduction

For breast tissue evaluation; Quantitative transmission image;

Qualitative reflection image; No radiation;

Patient comfort improvement.

Figure 1 QTultrasound scanner

Background

Page 4: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Scanner Geometry

Figure 2 Five scanning arrays mounted on the tri-channel

Figure 3 Geometry of three reflection arrays and a pair of transmission arrays

Background

Page 5: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Vo

ltag

e

Transducer Receiver

d

Time t = 2d / c

Transmitted pulse

Echo from skin surface

Echo from lesion front face

Echo from lesion back face

Background

Data Acquisition

Page 6: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

B-mode Scan and Acoustic Wave Behavior

Figure 5 Acoustic wave behavior between the inhomogeneous border of two different mediums.

Figure 4 Sample of B-mode (brightness mode). Image driven.

Background

http://www.sonoguide.com/physics.html

Page 7: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Image Reconstruction Algorithm Background

Transmission Image Preprocessing

Support Function Generation

Refraction-corrected Ray Tracing

Image Postprocessing

Mapping transmission results into reflection image space.

Signal gain control for attenuation.

Ray (data) driven B-mode back-projection tomography

Readability improvement

Page 8: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Refraction-Corrected Ray Tracing Background

Eikonal equation: 𝑑

𝑑𝑠𝑛𝑑𝒓

𝑑𝑠= 𝛻𝑛

s: arc length along the ray; r: ray position vector in 3D; n: refractive index.

𝑛 𝒓 = 𝑐0 𝑐(𝒓)

Euler step method:

𝒓𝑖+1 = 𝒓𝑖 + ℎ ∙ 𝑢𝒓 𝑖+1 2

For i = 1,…

h: step length; u: unit tangent vector to the ray path.

Page 9: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Compounding to Tomography Background

Page 10: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Challenges for Parallel Computation

Sequential operation of refraction-corrected ray tracing

Each step of each ray depends

on the previous step’s position

and refractive index and the

current step’s refractive index;

Each pixel’s signal weighting is

contributed by multiple rays;

Each ray behavior is

unpredictable in terms of

position ranges.

File access speed limitation

Data writing and reading

between pipeline stages

allows for all the operations in

each stage to be computed

independently;

However, the data throughput

on an SSD or a hard drive

becomes a limiting factor.

Large amount of memory management

Unknowns: 32390540 pixels;

Acquired data: around 1.88

GB;

Computation data: around 9.4

GB.

Motivation

Page 11: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Parallelism with multi-core CPU and GPU streaming processors Design

CPU multiple worker threads

. . . . . .

Work group

. . . View 30 View 1 View 6 …

. . . . . .

View 54 View 60 . . .

Probe 1 Probe 2 Probe 3

Level 140 Level 70 Level 1

Ray 1 Ray 94 Ray 192

SubRay 1 SubRay 2 SubRay 3

Refraction corrected ray tracing

Worker thread join()

. . . . . .

. . . . . .

Page 12: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Ray Tracing in Parallel

3D sparse matrix

Dictionary of Keys (DOK) Pixel position-> key

Weight-> Value Time sample-> Value

Design

Page 13: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Concurrent Operations of CPU and GPU

cudaMalloc (&dev, size); … cudaStream_t stream[nStreams]; For (int iStream = 1; iStream <= nStreams; ++iStream) cudaStreamCreate (&stream[iStream]); cudaEvent_t event; cudaEventCreate (&event); For (int iView = 1; iView < nViews; ++iView) { …read the data for iView cudaMemcpyAsync (dev, host, size, H2D, stream[iView]); kernel <<< grid, block, 0, stream[iView]>>> (…, dev, …); if (iView != nViews) { cudaEventRecord (event, stream[iView]); cudaStreamWaitEvent(stream[iView + 1], event, 0); } } cudaMemcpy (host, device, size, D2H); …destroy stream and event

Create streams for each view computation

Only one event needed to be created

Asynchronous with stream

Wait for the previous event done

Design

Page 14: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Memory Contention Solution

View 1 View 2 View 3 View 4 View 5 View 6

Thread group

View 7 View 8

Finished Aborted

Sufficient memory available

Memory wait

Sleep (5)

Finished

Memory wait

Exception Thread group

View 20 View 45

Finished Finished Finished

Design

View 9

Page 15: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Hardware Selection

Stability and reliability: long-term product;

High single precision floating-point

performance: 4.20 TFlops;

Large memory to support multiple CPU

worker threads operation: 12 GB.

Tesla K40 GPU

Implementation

Page 16: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Software Architecture Design

Host Layer

GPU Layer

Interface Layer

Pipeline Layer

Interface to rest of production software

GUI Standalone

GPU kernels implementation

Manage GPU memory; CPU and GPU data transfer; Call GPU kernels and check kernel errors; schedule asynchronous operations; Support multi-GPUs

Object-oriented API for each stage of reconstruction algorithm

Read parameters, order of algorithm operations, etc, from a configuration file

Implementation

Page 17: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Performance Test (individual functions)

Functions Single CPU time Single GPU time Speedup

Bilinear interpolation 1930.00 33.28 57.99X

Blurring filter (FFT included) 15660.00 129.17 121.24X

L1 norm fit third-order

polynomial

100.00 7.24 13.81X

Nearest points mapping 6850.00 39.98 171.34X

Compounding images 366800.00 4403.09 83.31X

Dynamic gain for images 910.01 42.88 21.22X

Note: The calculation of GPU time includes data transfer from host to device and back from device to host. All times are given in milliseconds.

Result

Page 18: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Performance Test (overall)

Reconstruction Stage Single CPU time

Single GPU time

Single GPU speedup

Two GPU time

Two GPU speedup

Preprocessing and Support Function

34.09 9.36 3.64X 7.72 4.42X

Refraction-Corrected Ray Tracing

1899.98 63.29 30.02X 45.53 41.73X

Compounding Views 39.33 0.84 46.71X 0.84 46.71X

Entire Reflection Reconstruction

2108.40 79.16 26.63X 54.57 38.64X

Note: All times are presented in seconds.

Result

Page 19: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Case Images 1: Multiple Cysts

Figure 9 Coronal, Axial and Sagittal images present multiple cysts.

Figure 10 Comparison with mammography, hand-held ultrasound

Result

Page 20: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Case Images 2: Invasive Ductal Carcinoma

Figure 11 Coronal, Axial and Sagittal images present invasive ductal carcinoma

Figure 12 Comparison with mammography, hand-held ultrasound

Result

Page 21: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Contact Information

Qun (Maxine) Liu: Scientist QTultrasound, LLC 3216 S Highland Drive, Suite 100, Salt Lake City, UT 84121 Email: [email protected] Personal Email: [email protected] Cell: 979-703-9475 Website: http://qunmaxineliu.weebly.com/

Martin Cwikla: Senior Software Engineer QTultrasound, LLC 3216 S Highland Drive, Suite 100, Salt Lake City, UT 84121 Email: [email protected] Personal Email: [email protected] Cell: 801-512-1027

Contact

Page 22: Multi-GPU Accelerated Refraction-Corrected Reflection ...on-demand.gputechconf.com/gtc/2015/...Reconstruction Stage Single CPU time Single GPU time Single GPU speedup Two GPU time

Thank you