accelerating real-time processing of the atst adaptive ... · vivek venugopal • atmospheric...

1
Accelerating Real-time Processing of the ATST Adaptive Optics System Vivek Venugopal Atmospheric turbulence distorts the wavefront by generating phase variations in the incoming light and limits the resolution of large solar telescopes such as the four meter solar telescope, Advanced Technology Solar Telescope (ATST) now beginning construction at Maui's Haleakala. HOAO Real-time system [1] S. L. Keil, T. R. Rimmele, J. Wagner, and ATST team. Advanced Technology Solar Telescope: A status report. Astronomische Nachrichten, 331:609–615, 2010. [2] V. Venugopal, et. al. Accelerating Real-time processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures. In International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA 2011), pages 296–301, Las Vegas, USA, July 2011. [3] Nvidia Inc. (Last Accessed: February 2012) Nvidia Tesla C2050 GPU Computing Processor. [Online]. Available: http://www.nvidia.com/object/product_tesla_c2050_us.html Implementation platforms and Results The high speed camera sends 1750 20x20 pixel raw sub-aperture images to the processing system. The sub-apertures undergo a dark field correction followed by a flat field correction, which is the equivalent to correcting the images for zero level and gain equalization. The 2D cross-correlation step determines the shift in the x and y direction of each sub-aperture, as compared to the reference image. The wavefront reconstruction step consists of a precomputed 3500x1900 reconstruction matrix, which is multiplied with the x and y shifts. Nvidia's GPUs provide computational speedup as compared to the CPU implementation. Although DSPs are faster, using 48 DSPs are more expensive than 3 GPUs. GPUs provide flexibility in terms of problem scalability as it can handle the computational complexity and at the same time provides more FLOPS/$ as compared to DSPs. Introduction Advanced Technology Solar Telescope FPGA-DSP solution Conclusion References [email protected] Processors Uncorrected light Corrected light Tip/Tilt Mirror Deformable Mirror (DM) Beamsplitter Shack-Hartmann Lenslet Array CCD Camera DM drive signal Tilt drive signal Adaptive Optics system The adaptive optics (AO) system senses the wavefront aberrations and applies the corresponding correction to the adjustable deformable mirror to improve the resolution of the telescope. WFS Camera Cross- correlation slope computation Average slope Offscale slope detection Matrix multiply Actuator servos Data collection Zernike offload process Tip/Tilt servos X X Dark field Flat field Reference image Slope offsets Recon- struction matrix Actuator offsets Actuator gains Servo parameters Offscale slope tolerance Tip/Tilt mirror Deformable mirror Dark pixels 20x20 Raw pixels 20x20 flat correction Flat pixels 20x20 2D cross-correlation find maximum dark correction Reference pixels 20x20 x and y interpolation GPU FPGA reconstruction FFT FFT Complex conjugate Multiplication IFFT reference image flat corrected image original reference image 26x26 pixels precomputed reference (20x20 pixels) Precomputed Reference pixels 20x20 (49 regions) precomputed reference (20x20 pixels) Region 1 Region 2 precomputed reference (20x20 pixels) Region 49 Camera data half Camera data half FPGA 1 FPGA 2 12 optical fiber channels 12 optical fiber channels PCI-e bus GPU/CPU Camera data half Camera data half FPGA 1 FPGA 2 12 optical fiber channels 48 DSPs 12 channels 12 channels 12 optical fiber channels FPGA-GPU solution x y 1750 1750 x x and y shifts for 1750 sub-aperture images 1900 3500 reconstruction matrix 1900x3500 1900 accumulated values for 1900 actuators Process partitioning and mapping FFT correlation 7x7 correlation 1900x3500 reconstruction matrix 0 550 1100 1650 2200 1 50 1889 1510 1619 1188 Time in us No. of images Tesla C1060 Tesla C2050 FFT correlation results 0 100 200 300 400 1 50 584 300.93 307.49 312.9 281.39 279.35 278.36 Time in us No. of images Tesla C1060 Tesla C2050 7x7 correlation results 1 10 100 1000 10000 100000 46769 229 956 964 Time in us Tesla C1060 Tesla C2050 DSP CPU Wavefront reconstruction results Test platform: 2 quad-core 2.6GHz AMD processors running Ubuntu Linux OS with Nvidia Tesla C1060 and C2050 Implementation strategy: 48 DSPs or GPUs? 7x7 correlation faster than FFT correlation on GPUs Compute-intensive

Upload: others

Post on 16-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accelerating Real-time Processing of the ATST Adaptive ... · Vivek Venugopal • Atmospheric turbulence distorts the wavefront by generating phase variations in the incoming light

Accelerating Real-time Processing of the ATST Adaptive Optics SystemVivek Venugopal

• Atmospheric turbulence distorts the wavefront by generating phase variations in the incoming light and limits the resolution of large solar telescopes such as the four meter solar telescope, Advanced Technology Solar Telescope (ATST) now beginning construction at Maui's Haleakala.

HOAO Real-time system

[1] S. L. Keil, T. R. Rimmele, J. Wagner, and ATST team. Advanced Technology Solar Telescope: A status report. Astronomische Nachrichten, 331:609–615, 2010. [2] V. Venugopal, et. al. Accelerating Real-time processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures. In International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA 2011), pages 296–301, Las Vegas, USA, July 2011.[3] Nvidia Inc. (Last Accessed: February 2012) Nvidia Tesla C2050 GPU Computing Processor. [Online]. Available: http://www.nvidia.com/object/product_tesla_c2050_us.html

Implementation platforms and Results

• The high speed camera sends 1750 20x20 pixel raw sub-aperture images to the processing system. • The sub-apertures undergo a dark field correction followed by a flat field correction, which is the equivalent to correcting the images for zero level and gain equalization. • The 2D cross-correlation step determines the shift in the x and y direction of each sub-aperture, as compared to the reference image. • The wavefront reconstruction step consists of a precomputed 3500x1900 reconstruction matrix, which is multiplied with the x and y shifts.

• Nvidia's GPUs provide computational speedup as compared to the CPU implementation. • Although DSPs are faster, using 48 DSPs are more expensive than 3 GPUs.• GPUs provide flexibility in terms of problem scalability as it can handle the computational complexity and at the same time provides more FLOPS/$ as compared to DSPs.

Introduction

Advanced Technology Solar Telescope

FPGA-DSP solution

Conclusion

References

[email protected]

Processors

Uncorrected light

Corrected light

Tip/Tilt Mirror

DeformableMirror (DM)

Beamsplitter

Shack-Hartmann Lenslet Array

CCD Camera

DM drive signal

Tilt drive signal

Adaptive Optics system

•The adaptive optics (AO) system senses the wavefront aberrations and applies the corresponding correction to the adjustable deformable mirror to improve the resolution of the telescope.

WFS Camera

Cross-correlation

slope computation

Average slope

Offscale slope

detectionMatrix

multiplyActuator servos

Data collection

Zernike offload process

Tip/Tilt servos

X X

Dark field Flat field Reference

imageSlope offsets

Recon-struction matrix

Actuator offsets

Actuator gains

Servo parameters

Offscale slope

tolerance

Tip/Tilt mirror

Deformable mirror

Dark pixels20x20

Raw pixels20x20

flat correction

Flat pixels20x20

2D cross-correlation find maximum

dark correction

Reference pixels20x20

x and y interpolation

GPUFPGA

reconstruction

FFTFFT

Complex conjugate Multiplication

IFFT

reference image

flat corrected

imageoriginal reference

image 26x26 pixels

precomputed reference

(20x20 pixels)

Precomputed Reference pixels 20x20 (49 regions)

precomputed reference

(20x20 pixels)

Region 1

Region 2

precomputed reference

(20x20 pixels)Region 49

Camera data half

Camera data half

FPGA 1

FPGA 2

12 optical fiber

channels

12 optical fiber

channels

PCI-e

bus

GPU/CPU

Camera data half

Camera data half

FPGA 1

FPGA 2

12 optical fiber

channels

48 DSPs

12 channels

12 channels

12 optical fiber

channels

FPGA-GPU solution

x y

1750 1750x

x and y shifts for 1750 sub-aperture images

1900

3500

reconstruction matrix 1900x3500

1900

accumulated values for 1900 actuators

Process partitioning and mapping

FFT correlation

7x7 correlation

1900x3500 reconstruction matrix

0

550

1100

1650

2200

1 50

1889

15101619

1188

Tim

e in

us

No. of images

Tesla C1060Tesla C2050

FFT correlation results

0

100

200

300

400

1 50 584

300.93307.49312.9281.39279.35278.36

Tim

e in

us

No. of images

Tesla C1060Tesla C2050

7x7 correlation results

1

10

100

1000

10000

100000 46769

229

956964

Tim

e in

us

Tesla C1060Tesla C2050DSPCPU

Wavefront reconstruction results

• Test platform: 2 quad-core 2.6GHz AMD processors running Ubuntu Linux OS with Nvidia Tesla C1060 and C2050 • Implementation strategy: 48 DSPs or GPUs?• 7x7 correlation faster than FFT correlation on GPUs

Compute-intensive