


FULL PAPER

Distributed MRI Reconstruction Using Gadgetron-Based Cloud Computing

Hui Xue,1* Souheil Inati,2 Thomas Sangild Sørensen,3 Peter Kellman,4 and

Michael S. Hansen1

Purpose: To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency.

Methods: The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (eg, cardiac and neuroimaging applications).

Results: The proposed setup was able to handle acquisition and ℓ1-SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm³ isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min.

Conclusion: A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. Magn Reson Med 000:000–000, 2014. © 2014 Wiley Periodicals, Inc.

Key words: Gadgetron; distributed computing; nonlinear MRI reconstruction; open-source software

INTRODUCTION

Image reconstruction algorithms are an essential part of modern medical imaging devices. The complexity of the reconstruction software is increasing due to the competing demands of improved image quality and shortened acquisition time. In the field of MRI, in particular, the image reconstruction has advanced well beyond simple fast Fourier transforms to include parallel imaging (1–5), nonlinear reconstruction (6,7), and real-time reconstruction (8,9). The increased interest in three-dimensional (3D) acquisitions and non-Cartesian sampling has boosted the computational demands further. For applications where lengthy acquisition and reconstruction time is prohibited by physiological motion, biological processes, or limited patient cooperation, the reconstruction system is under further pressure to deliver images with low latency in order to keep up with the ongoing study.

While the need for fast image reconstruction is growing, most published reconstruction algorithms, especially those relying on iterative solvers—such as image domain compressive sensing (6,10–12) and k-space SPIRiT and its regularized versions (7,13,14)—do not come with the efficient reference implementations that would enable clinical use. In many cases, the algorithms are not implemented for online use on clinical scanners. Even if the developers would like to integrate their reconstruction algorithms for online use, the vendor-provided hardware and software platform may have inadequate specifications for a demanding reconstruction, or the available programming window may be unsuited for integration of new reconstruction schemes. Consequently, there is a gap between the number of new algorithms being developed and published and the clinical testing and validation of these algorithms. Undoubtedly, this is having an impact on the clinical adoption of novel nonlinear reconstruction approaches (eg, compressed sensing).

We have previously introduced an open-source platform for medical imaging reconstruction algorithms called the Gadgetron (15), which aims to partially address the aforementioned concerns. This platform is freely available to the research community and industry partners. It is platform independent and flexible for both prototyping and commercial development. Moreover, interfaces to several commercial MR scanners have been developed and are being shared in the research community. This simplifies the online integration of new reconstruction algorithms significantly, and the new algorithms in research papers can be tested in clinical settings with less implementation effort. As a result, some groups have used Gadgetron for online implementation and evaluation of their reconstruction methods (16–18). Since the publication of the first version of the Gadgetron, the framework has adopted a vendor-independent raw data format, the ISMRM Raw Data

1Magnetic Resonance Technology Program, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA.
2National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, USA.
3Department of Computer Science and Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
4Medical Signal and Image Processing Program, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA.

Grant sponsor: National Heart, Lung and Blood Institute Intramural Research Program.

*Correspondence to: Hui Xue, Magnetic Resonance Technology Program, National Heart, Lung and Blood Institute, National Institutes of Health, 10 Center Drive, Bethesda, MD 20814. E-mail: [email protected]

Additional Supporting Information may be found in the online version of this article.

Received 11 December 2013; revised 28 January 2014; accepted 18 February 2014.

DOI 10.1002/mrm.25213
Published online 00 Month 2014 in Wiley Online Library (wileyonlinelibrary.com).

Magnetic Resonance in Medicine 00:00–00 (2014)

© 2014 Wiley Periodicals, Inc.


(ISMRMRD) format (19). This further enables sharing of reconstruction algorithms.

Although this concept of an open-source platform and a unified ISMRMRD format shows great potential, the original Gadgetron framework did not support distributed computing across multiple computational nodes. Although the Gadgetron was designed for high performance (using multiple CPU cores or graphics processing units [GPUs]), it was originally implemented to operate within a single node or process. Distributed computing was not integral to the design. As reconstruction algorithms increase in complexity, they may need computational power that would not be economical to assemble in a single node. The same considerations have led to the development of commodity computing clusters wherein a group of relatively modest computers are assembled to form a powerful computing cluster. An example of such a cluster system is the National Institutes of Health (NIH) Biowulf Cluster (http://biowulf.nih.gov). Recently, commercial cloud-based services have also offered the ability to configure such commodity computing clusters on demand and rent them by the hour. The Amazon Elastic Compute Cloud (EC2) is an example of such a service (http://aws.amazon.com/ec2).

In this paper, we propose to extend the Gadgetron framework to enable cloud computing on multiple nodes. With this extension, named Gadgetron Plus (GT-Plus), any number of Gadgetron processes can be started on multiple computers (referred to as "nodes"), and a dedicated interprocess controlling scheme has been implemented to coordinate the Gadgetron processes to run on multiple nodes. A large MRI reconstruction task can be split and run in parallel across these nodes. This extension to distributed computing maintains the original advantages of the Gadgetron framework. It is freely available and remains platform-independent. As demonstrated in this paper, the nodes can even run different operating systems (eg, Windows or different distributions of Linux) and have different hardware configurations.

The implemented architecture allows the user to set up a GT-Plus cloud in several different ways. Specifically, it does not require a dedicated professional cloud computing platform. The GT-Plus cloud can be deployed on setups ranging from an arbitrary collection of networked computers in a laboratory (we refer to this as a "casual cloud") to high-end commercial cloud systems, as demonstrated herein. In this paper, we demonstrate the GT-Plus cloud setup in three different scenarios and demonstrate its flexibility to cope with variable computational environments. The first cloud setup is a casual cloud consisting of seven personal computers situated in our laboratory at the National Institutes of Health. The second setup uses NIH's Biowulf Cluster (20). The last configuration is deployed on the commercial Amazon Elastic Compute Cloud (Amazon EC2) (21).

To demonstrate the benefits of this extension, we used the cloud-enabled version of the Gadgetron to implement nonlinear ℓ1-SPIRiT (7) for two-dimensional (2D) time-resolved (2D+t) and 3D imaging applications. We demonstrate that cardiac cine imaging with nine slices covering the entire left ventricle (32 channels; acquired temporal resolution = 50 ms; matrix size = 192 × 100; acceleration factor = 5; 1.5 s per slice) can be reconstructed with ℓ1-SPIRiT with a reconstruction time (latency <30 s) that is compatible with clinical workflow. Similarly, we demonstrate that high-resolution 3D isotropic brain scans (20 channels; matrix size = 256 × 256 × 192; acceleration factor 3 × 2; 1 mm³ isotropic acquired resolution) can be reconstructed with nonlinear reconstruction with clinically acceptable latency (<2.5 min). For both cases, significant improvement of image quality is achieved with nonlinear reconstruction compared to the linear GRAPPA results.

In the following sections, details of the GT-Plus design and implementation are provided. Specific source code components such as C++ classes, variables, and functions are indicated with boldface font (eg, GadgetCloudController).

METHODS

Architecture and Implementation

In the following sections, we will first briefly review the Gadgetron architecture and the extensions that have been made to this architecture to enable cloud computing. Subsequently, we will describe two specific types of MRI reconstructions (2D time-resolved imaging and 3D imaging) that have been deployed on this architecture.

Gadgetron Framework

The Gadgetron framework is described in detail by Hansen and Sørensen (15). Here, we briefly review the dataflow for comparison with the cloud-based dataflow introduced below. As shown in Fig. 1, a Gadgetron reconstruction process consists of three components: Readers, Writers, and Gadgets. A Reader receives and deserializes the incoming data sent from the client (eg, a client can be the MR scanner). A Writer serializes the reconstruction results and sends the data packages to the client. The Gadgets are connected to each other in a streaming framework as processing chains.

The Gadgetron maintains the communication with clients using a TCP/IP socket–based connection. The typical communication protocol of the Gadgetron process is as follows:

1. The client issues the connection request to the Gadgetron server at a specific network port.

2. The Gadgetron accepts the connection and establishes the TCP/IP communication.

3. The client sends an XML-based configuration file to the Gadgetron server. This XML configuration outlines the Readers, Writers, and Gadgets to assemble a Gadget chain.

4. The Gadgetron server loads the required Readers, Writers, and Gadgets from shared libraries as specified in the configuration file.

5. The client sends an XML-based parameter file to the Gadgetron. The Gadgetron can initialize its reconstruction computation based on the parameters (eg, acquisition matrix size, field of view, acceleration factor, etc). For MRI, this XML parameter file is generally the ISMRMRD XML header describing the experiment.



6. The client sends every piece of readout data to the Gadgetron in the ISMRMRD format. These data are deserialized by the Reader and passed through the Gadget chain.

7. The reconstruction results are serialized by the Writer and sent back to the client via the socket connection.

8. When the end of acquisition is reached, the Gadgetron closes down the Gadget chain and closes the connection when the last reconstruction result has been passed back to the client.
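The Reader/Writer serialization in the protocol above can be illustrated with a small sketch. The Gadgetron itself is written in C++ and its actual wire format is defined by the ISMRMRD library; the framing below (a 16-bit message ID followed by a length-prefixed payload) is a simplified, hypothetical layout used only to show how a Reader deserializes what a Writer emits.

```python
import struct
import io

def write_message(stream, msg_id: int, payload: bytes) -> None:
    """Writer: serialize a message as [uint16 id][uint32 length][payload]."""
    stream.write(struct.pack("<HI", msg_id, len(payload)))
    stream.write(payload)

def read_message(stream):
    """Reader: deserialize one framed message from the stream."""
    header = stream.read(6)
    if len(header) < 6:
        raise EOFError("connection closed")
    msg_id, length = struct.unpack("<HI", header)
    return msg_id, stream.read(length)

# Round-trip through an in-memory stand-in for the TCP/IP socket
buf = io.BytesIO()
write_message(buf, 1, b"<configuration>...</configuration>")
buf.seek(0)
msg_id, payload = read_message(buf)
```

The same pattern applies in both directions: the scanner-side Writer frames readouts for the server's Reader, and the server's Writer frames images for the scanner's Reader.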

GT-Plus: Distributed Computing Extension of Gadgetron

A schematic outline of the GT-Plus extension is shown in Fig. 2. A distributed Gadgetron deployment has at least one Gadgetron running on a specific port for each node (multiple Gadgetron processes can run within the same node at different ports). A software module, GadgetCloudController, manages the communication between nodes. Typically, the gateway node receives the readout data from the client and deserializes them using a Reader. Depending on the reconstruction workflow, the gateway node may buffer the incoming readouts and perform some processing before sending reconstruction jobs to the connected nodes, or data can be forwarded directly to the client nodes. GadgetCloudController maintains multiple TCP/IP socket connections with every connected node via a set of GadgetronCloudConnector objects (one for each connected node). Each GadgetronCloudConnector has a Reader thread (CloudReaderTask) and a Writer thread (CloudWriterTask) that are responsible for receiving and sending data (to the node), respectively. There is a Gadgetron Gadget chain running on every connected node. The gateway node will send the XML configuration to the connected node to assemble the chain. For

FIG. 2. Example presentation of the GT-Plus distributed computing architecture for the Gadgetron. In this example, at least one Gadgetron process is running on a node (multiple Gadgetron processes can run in one node on different ports). The gateway node communicates with the client application (eg, MRI scanner). It manages the connections to each of the computing nodes via the software modules GadgetCloudController and GadgetronCloudConnector. Whenever sufficient data is buffered on the gateway node, the package can be sent to the computing node for processing. Different computing nodes can run a completely different reconstruction chain. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

FIG. 1. Schematic presentation of the Gadgetron architecture. The client application communicates with the Gadgetron process via the TCP/IP connection. The Reader deserializes the incoming data and passes them through the Gadget chain. The results are serialized by the Writer and sent back to the client application. The Gadgetron process resides on one computer, the Gadgetron server. The client can be an MRI scanner or other customer processes. Note that the algorithm modules implemented in the Gadgetron toolboxes are independent of the streaming framework; therefore, these functionalities can be called by stand-alone applications that are not using the streaming framework. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]



different cloud nodes, different Gadget chains can be assembled. In fact, the connected nodes can also be gateway nodes, thus creating a multi-tiered cloud. The typical protocol of GT-Plus distributed computing is as follows:

1. The client issues the connection request to the GT-Plus gateway at a specific network port. Once the connection is established, the client will send the XML-based configuration and parameter files to establish and initialize the Gadget chain at the gateway node.

2. The client starts sending readout data to the gateway node.

3. The gateway node establishes a connection to cloud nodes and supplies them with XML configuration and parameter files. As mentioned, different connected nodes can be configured with different chains if needed.

4. Job packages are sent to connected cloud nodes for processing via the corresponding CloudWriterTask.

5. When all jobs are sent to nodes (ie, when the acquisition is done), the gateway Gadgetron either waits for reconstruction results or conducts extra computation. The ReaderTask objects, at the same time, listen for reconstruction results. Whenever the reconstruction results are sent back from a cloud node, the gateway Gadgetron is notified by the ReaderTask object and will take user-defined actions (eg, passing the results downstream or waiting for the completion of all jobs). Finally, the gateway node will proceed to finish other processing steps (if any) and send the images down the remaining part of the gateway Gadget chain and eventually back to the client.

6. If one or more connected nodes fail to complete a job successfully, the gateway node will be notified by either receiving an invalid result package or detecting a shutdown message on the socket. The GadgetCloudController on the gateway node will keep a record of uncompleted jobs and process them locally. In this way, GT-Plus gains robustness against network instability.
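The failover behavior in step 6 can be sketched as follows. This is an illustrative Python model, not the C++ GadgetCloudController itself: jobs whose node raises an error are recorded as uncompleted and reprocessed on the gateway, so every job still yields a result even when a node drops out.

```python
def dispatch(jobs, nodes, process_locally):
    """Send each job to a node; failed jobs are recorded and re-run locally."""
    results, failed = {}, []
    for i, job in enumerate(jobs):
        node = nodes[i % len(nodes)]            # simple round-robin assignment
        try:
            results[i] = node(job)              # remote processing (may fail)
        except ConnectionError:
            failed.append(i)                    # keep a record of uncompleted jobs
    for i in failed:
        results[i] = process_locally(jobs[i])   # gateway processes them itself
    return [results[i] for i in range(len(jobs))]

def healthy_node(job):
    return job * 2

def broken_node(job):
    raise ConnectionError("node unreachable")

# One healthy node and one that always drops the connection
out = dispatch([1, 2, 3, 4], [healthy_node, broken_node],
               process_locally=lambda j: j * 2)
```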

The software modules mentioned above are implemented as C++ template classes and are capable of handling any type of custom job package (ie, the user is able to configure the data types that are sent to and received from the cloud nodes). The only requirement is to implement appropriate functions for serialization and deserialization of work packages and results. This design is a straightforward extension of the Readers and Writers in the original Gadgetron.

In the current implementation, every computing node can be given an index indicating its computational power. This index is used to allow the gateway node to apply a scheduling algorithm where the workload is distributed to the cloud nodes in proportion to their computational capabilities. This can be important if the cloud consists of a heterogeneous set of hardware configurations.
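A proportional split of this kind can be sketched in a few lines. The power indices and the largest-remainder rounding below are illustrative assumptions; the paper does not specify the exact scheduling rule used by the gateway.

```python
def split_workload(n_jobs, power):
    """Distribute n_jobs across nodes in proportion to their power indices."""
    total = sum(power)
    # Ideal (fractional) share per node, then floor it
    shares = [n_jobs * p / total for p in power]
    counts = [int(s) for s in shares]
    # Hand out the remaining jobs to the largest fractional remainders
    order = sorted(range(len(power)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:n_jobs - sum(counts)]:
        counts[i] += 1
    return counts

counts = split_workload(10, [4, 2, 1])   # a fast, a medium, and a slow node
```

With power indices 4, 2, and 1, ten jobs split as 6, 3, and 1, so the fastest node carries the largest share.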

To supply the network connection information of cloud nodes to the Gadgetron, the user can specify IP addresses or hostnames of nodes in the gateway XML configuration file. Alternatively, the framework can read in a cloud structure definition file.
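As an illustration, such a node list might look like the following. The element and attribute names here are hypothetical; the actual schema of the GT-Plus configuration and cloud definition files is defined in the Gadgetron source tree.

```xml
<!-- Hypothetical cloud structure definition: one entry per computing node -->
<cloud>
  <node address="192.168.1.10" port="9002" computingPowerIndex="4"/>
  <node address="gt-node-2.lab.local" port="9002" computingPowerIndex="2"/>
  <node address="gt-node-3.lab.local" port="9003" computingPowerIndex="1"/>
</cloud>
```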

The parallel computing and communication protocols used in this implementation are based directly on the existing communication protocols within the Gadgetron. In previous versions of the software, the Gadgetron process itself was seen as a server, but in the current version, a Gadgetron process can also act as a client and connect to other Gadgetron processes. Like the previous version of the Gadgetron software, the Adaptive Communication Environment (22) package is used for TCP/IP network communication between clients and servers. The connection between gateway and computing nodes is maintained using an Adaptive Communication Environment socket on a certain port. Whenever a job package is to be sent to a node, the corresponding CloudWriterTask takes the package and forwards it on the socket to a connected Gadgetron node. The ReaderTask on the gateway node is always listening to the TCP/IP port for reconstruction results. Whenever a result package arrives, the ReaderTask receives it and notifies the GadgetCloudController to process the incoming result package. To ease the thread synchronization between ReaderTask/WriterTask and GadgetCloudController, all three components are implemented as active objects (23), with a message queue attached to each as the buffer for inbound/outbound communication.
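The active object pattern mentioned above decouples method invocation from execution: callers enqueue messages and return immediately, while a dedicated thread drains the queue. A minimal Python sketch (the real implementation uses ACE's C++ task and message-queue classes):

```python
import queue
import threading

class ActiveObject:
    """Owns a message queue and a worker thread that drains it."""
    def __init__(self, handler):
        self._queue = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, message):
        """Called from any thread; returns immediately."""
        self._queue.put(message)

    def stop(self):
        self._queue.put(None)           # sentinel: shut the worker down
        self._thread.join()

    def _run(self):
        while True:
            msg = self._queue.get()
            if msg is None:
                break
            self._handler(msg)          # processed on the worker thread only

# Usage: result packages are handled without blocking the thread that posts them
processed = []
controller = ActiveObject(handler=processed.append)
for result in ["slice-0", "slice-1", "slice-2"]:
    controller.post(result)
controller.stop()
```

Because only the worker thread touches the handler, the poster (here, a stand-in for the ReaderTask) never blocks on processing, which is the synchronization benefit the text describes.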

GT-Plus for 2D+t Reconstruction Tasks

In addition to the software modules for cloud communication, the GT-Plus extensions also include a specific job type to support 2D+t reconstruction tasks (eg, multislice cine imaging). This workflow is used here to illustrate a cloud setup with a dual-layer topology, as illustrated in Fig. 3. The gateway Gadget chain will buffer readouts for a specified ISMRMRD dimension (eg, for the multislice cine, it is usually the slice dimension). Once all data for a slice have arrived, the GtPlusRecon2DTGadgetCloud gadget will forward the job package to a first-layer node to start the reconstruction.

For a 2D+t dynamic imaging task, one slice will have multiple 2D k-spaces. For example, for the cardiac cine acquisition, multiple cardiac phases are usually acquired. The first-layer node responsible for the reconstruction of a given slice can choose to further distribute the dataset to a set of subnodes in the second layer. The first-layer nodes can serve solely to distribute jobs, or they can perform computation as well. In principle, a given reconstruction job can utilize an arbitrary number of node layers to form a more complicated cloud topology.

GT-Plus for 3D Reconstruction Tasks

For 3D acquisitions, a single-layer cloud topology was used in this paper. The gateway node's GtPlusRecon3DTGadget receives the 3D acquisition and performs processing such as coil compression and estimation of the k-space convolution kernel for parallel imaging. It then splits the large 3D reconstruction problem by performing a one-dimensional inverse FFT along the readout direction. Thus, the reconstruction is decoupled



along the readout direction. Every piece of data along the readout direction is then sent to a connected node for nonlinear reconstruction. The gateway node will wait for all jobs to complete and results to return. It then reassembles the 3D volume from all pieces and continues to perform other postprocessing tasks (eg, k-space filtering) and finally returns images to the client.
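The decoupling step can be sketched with NumPy. After a 1D inverse FFT along the readout axis, each readout position is an independent 2D (phase × partition) problem, so chunks of readout positions can be processed on different nodes and reassembled. The per-chunk "reconstruction" below is a placeholder; in GT-Plus it would be the nonlinear solver.

```python
import numpy as np

def decouple_and_reconstruct(kspace, n_nodes, recon_2d):
    """Split a 3D k-space along readout (axis 0), process chunks, reassemble."""
    # Inverse FFT along readout: slices along axis 0 are now independent
    hybrid = np.fft.ifft(np.fft.ifftshift(kspace, axes=0), axis=0)
    chunks = np.array_split(hybrid, n_nodes, axis=0)   # one chunk per node
    results = [recon_2d(c) for c in chunks]            # run in parallel on nodes
    return np.concatenate(results, axis=0)             # reassemble the volume

# Placeholder per-chunk reconstruction: inverse FFT over the remaining axes
recon_2d = lambda c: np.fft.ifft2(c, axes=(1, 2))
kspace = np.random.randn(16, 8, 8) + 1j * np.random.randn(16, 8, 8)
volume = decouple_and_reconstruct(kspace, n_nodes=4, recon_2d=recon_2d)
```

Because the chunks are independent after the readout transform, the chunked result matches what a single node would compute on the whole volume.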

Toolbox Features

The Gadgetron is divided into Gadgets and "toolbox" algorithms that can be called from the Gadgets or stand-alone applications. In addition to the toolbox features described by Hansen and Sørensen (15), the GT-Plus extensions add additional toolboxes. Here is an incomplete list of key algorithm modules:

1. 2D/3D GRAPPA. A GRAPPA implementation for 2D and 3D acquisitions is added. It fully supports the ISMRMRD data format and different parallel imaging modes [embedded, separate, or interleaved autocalibration lines, as defined by Hansen (19)]. For the 3D GRAPPA case, the implementation supports two-dimensional acceleration. To support parallel imaging in interactive or real-time applications, a real-time high-throughput 2D GRAPPA implementation using a GPU is also provided in the Gadgetron.

2. 2D/3D Linear SPIRiT. A linear SPIRiT (7) reconstruction is implemented in the Gadgetron toolbox. Specifically, if the k-space x consists of filled points a and missing points m, then:

$$x = D^T a + D_c^T m \quad [1]$$

Here, x is a vector containing k-space points for all phase encoding lines for all channels, a stores the acquired points, and m is for missing points. D and D_c are the sampling pattern matrixes for acquired and missing points. Linear SPIRiT solves the following equation:

$$(G - I)\,D_c^T m = -(G - I)\,D^T a \quad [2]$$

Here, G is the SPIRiT kernel matrix, which is computed from a fully sampled set of autocalibration data.
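Equation [2] is a linear least-squares problem in the missing samples m. A toy NumPy sketch of that solve, with a random stand-in for the calibrated kernel G (a real SPIRiT kernel would be estimated from the autocalibration data, and x would be complex multichannel k-space):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12                                  # total k-space samples (toy size)
acquired = np.arange(n) % 3 != 0        # boolean sampling pattern (8 acquired)
G = rng.standard_normal((n, n)) * 0.1   # stand-in for the SPIRiT kernel matrix

# Ground-truth k-space, from which we keep only the acquired samples a
x_true = rng.standard_normal(n)
a = x_true[acquired]

# Build D^T a (acquired samples embedded in full k-space) and the system
# (G - I) D_c^T m = -(G - I) D^T a from Eq. [2]
I = np.eye(n)
DTa = np.zeros(n)
DTa[acquired] = a
A = (G - I)[:, ~acquired]               # (G - I) D_c^T: columns of missing points
b = -(G - I) @ DTa
m, *_ = np.linalg.lstsq(A, b, rcond=None)

# Assemble the full k-space estimate x = D^T a + D_c^T m (Eq. [1])
x = DTa.copy()
x[~acquired] = m
```

The acquired samples pass through untouched; only the missing entries are estimated from the kernel consistency condition.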

3. 2D/3D Nonlinear ℓ1-SPIRiT. Equation [2] is extended by the L1 term:

$$\arg\min_m \left\{ \|(G - I)(D^T a + D_c^T m)\|_2 + \lambda \|W \Psi C^H F^H (D^T a + D_c^T m)\|_1 \right\} \quad [3]$$

Another variation of ℓ1-SPIRiT is to treat the full k-space as the unknowns:

$$\arg\min_x \left\{ \|(G - I)x\|_2 + \lambda \|W \Psi C^H F^H x\|_1 + \beta \|Dx - a\|_2 \right\} \quad [4]$$

Here, Ψ is the sparse transform and W is an extra weighting matrix applied on the computed sparse coefficients to compensate for nonisotropic resolution or temporal redundancy. F is the Fourier transform matrix. C is the coil sensitivity.

4. Redundant 2D and 3D wavelet transform. The redundant wavelet transform for 2D and 3D data

FIG. 3. The dual-layer topology of cloud computing for 2D+t applications. Every connected node contains its own GadgetCloudController and can split a reconstruction task for a slice to its subnodes. Compared with the basic cloud topology shown in Fig. 2, this dual-layer example adds extra complexity. This example demonstrates the flexibility of the GT-Plus architecture for composing different computing cloud topologies to fit the needs of different reconstruction tasks.



arrays are implemented in the toolbox. It is used in the L1 regularization term. A fast implementation for the redundant Haar wavelet is also provided.
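A one-level redundant (undecimated) Haar transform keeps the array size at every scale, which makes it convenient as a sparsifying transform in the L1 term. A minimal 1D NumPy sketch with its exact inverse (the toolbox implements the 2D/3D versions):

```python
import numpy as np

def haar_redundant(x):
    """One-level undecimated Haar: no downsampling, same length per band."""
    shifted = np.roll(x, -1)
    low = (x + shifted) / 2.0           # approximation band
    high = (x - shifted) / 2.0          # detail band (sparse for piecewise-constant x)
    return low, high

def haar_redundant_inverse(low, high):
    """Exact inverse: summing the bands recovers the original signal."""
    return low + high

x = np.array([4.0, 4.0, 4.0, 2.0, 2.0, 2.0])
low, high = haar_redundant(x)
recovered = haar_redundant_inverse(low, high)
```

For this piecewise-constant input, the detail band is nonzero only at the two jumps, which is exactly the sparsity that the L1 penalty exploits.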

Example Applications

Corresponding to the two types of cloud topologies described in the previous sections, two in vivo experiments were performed on healthy volunteers. The local Institutional Review Board approved the study, and all volunteers gave written informed consent.

Real-Time Multislice Myocardial Cine Imaging Using ℓ1-SPIRiT

The aim was to make it clinically feasible to assess myocardial function using real-time acquisitions and nonlinear reconstruction covering the entire ventricles. With conventional reconstruction hardware, the reconstruction time for such an application would prohibit clinical adoption. The dual-layer cloud topology was used here, and every slice was sent to a separate node (first layer) that further split cardiac phases into multiple pieces. While processing one piece itself, the first-layer node also sent others to its subnodes for parallel processing. The algorithm workflow was as follows: 1) Undersampled k-space data were acquired with the time-interleaved sampling pattern. 2) The gateway node received the readout data and performed on-the-fly noise prewhitening (5). 3) The data from one slice were sent to one first-layer node. 4) To reconstruct the underlying real-time cine images, the autocalibration signal (ACS) data for a slice were obtained by averaging all undersampled k-space frames at this slice. 5) The SPIRiT kernel was estimated on the assembled ACS data. 6) The data for the slice were split into multiple pieces and sent to subnodes, together with the estimated kernel. The size of a data piece for a node was proportional to its computing power index. 7) Subnodes received the data package and solved Eq. [3]. The linear SPIRiT problem (Eq. [2]) was first solved to initialize the nonlinear solver. 8) Once the processing for a slice was completed, the node sent the reconstructed frames back to the gateway, which then returned them to the scanner. Note that the reconstruction for a given slice started while acquisition was still ongoing for subsequent slices. Thus, data acquisition and processing overlapped in time to minimize the overall waiting time after the data acquisition.
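Step 4 exploits the time-interleaved sampling: with acceleration factor R, consecutive frames acquire shifted subsets of phase-encoding lines, so the union of R adjacent undersampled frames covers every line, and averaging them yields fully sampled ACS data. A NumPy sketch of the sampling masks (illustrative sizes):

```python
import numpy as np

R, n_frames, n_pe = 5, 10, 20   # acceleration factor, frames, phase-encoding lines

# Time-interleaved undersampling: frame t acquires lines t % R, t % R + R, ...
mask = np.zeros((n_frames, n_pe), dtype=bool)
for t in range(n_frames):
    mask[t, t % R::R] = True

# Averaging across frames fills every line: the union of R consecutive
# frame masks covers all phase-encoding lines.
coverage = mask.any(axis=0)
line_weights = mask.sum(axis=0)   # how many frames sampled each line
```

Every line is sampled in exactly n_frames / R of the frames, so a simple average (scaled by R) assembles the ACS k-space used for kernel calibration.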

Figure 4 outlines the ℓ1-SPIRiT reconstruction gadget chain for multislice cine imaging using the dual-layer cloud topology. The raw data first went through the NoiseAdjustGadget, which performed noise prewhitening. After removing the oversampling along the readout direction, the data were buffered in the AccumulatorGadget. Whenever the acquisition for a slice was complete, the data package was sent to the GtPlusRecon2DTGadgetCloud gadget, which established the network connection to the first-layer nodes and sent the data. The first-layer node ran only the GtPlusReconJob2DTGadget gadget, which then connected to the second-layer nodes. The CloudJobReader and CloudJobWriter were used to serialize and deserialize the cloud job package.

FIG. 4. The gadget chain used in the multislice myocardial cine imaging. Here the GtPlusRecon2DTGadgetCloud gadget controls the connection to the first-layer node where the data from a slice were reconstructed. The second-layer subnodes were further utilized to speed up the reconstruction. The TCP/IP socket connections were established and managed by the cloud software module implemented in GT-Plus. The dotted line indicates the internode connection, and the solid line indicates the data flow within the gateway and computing nodes.

6 Xue et al.

Page 7: Distributed MRI reconstruction using gadgetron-based cloud computing

Imaging experiments were performed on a 1.5T clinical MRI system (MAGNETOM Aera; Siemens AG Healthcare Sector, Erlangen, Germany) equipped with a 32-channel surface coil. A healthy volunteer (female, 23.8 y) was scanned. Acquisition parameters for free-breathing cine were as follows: balanced SSFP readout; repetition time/echo time = 2.53/1.04 ms; acquired matrix size = 192 × 100; flip angle = 60°; field of view = 320 × 240 mm²; slice thickness = 8 mm with a gap of 2 mm; bandwidth = 723 Hz/pixel; interleaved acquisition pattern with acceleration factor R = 5. The whole left ventricle was covered by nine slices, and the acquisition duration for every slice was ≈1.5 s with one dummy heartbeat between slices. The scan time (defined as the time to perform data acquisition) to complete all nine slices was 22.6 s.

High-Resolution Neuroimaging Using ℓ1-SPIRiT

The second example aimed to use GT-Plus for nonlinear reconstruction of a high-resolution 3D acquisition. The algorithm workflow was as follows: 1) Undersampled k-space data were acquired with a fully sampled center region. 2) The gateway node received the readout data and performed on-the-fly noise prewhitening. 3) When all data were received, the SPIRiT kernel calibration was performed on the fully sampled ACS region. The kernel was zero-padded only along the readout direction and Fourier-transformed to the image domain. This was done to reduce the maximal memory needed to store the image-domain kernel and to decrease the amount of data transferred over the network to connected nodes. The k-space data were transformed to the image domain along the readout as well. 4) The gateway node split the image-domain kernel and data into multiple pieces along the readout direction and sent them to multiple connected nodes. 5) The connected nodes received the packages, zero-padded the kernels along the remaining two spatial dimensions, and transformed them into the image domain. The image-domain kernel was applied to the aliased images by pixel-wise multiplication. This linear reconstruction was performed to initialize the nonlinear reconstruction. 6) After receiving all reconstructed images from the connected nodes, the gateway assembled the 3D volume, performed postprocessing tasks such as k-space filtering, and then sent the results back to the scanner.
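The memory and bandwidth motivation for step 3, zero-padding the SPIRiT kernel only along the readout direction before distribution, can be illustrated with a back-of-the-envelope calculation. This sketch uses the acquisition dimensions from the neuroimaging protocol below; the 7 × 7 × 7 k-space kernel size and single-precision complex storage are assumed typical values, not figures taken from the paper:

```python
def kernel_bytes(ro, e1, e2, coils, complex_bytes=8):
    """Memory for an image-domain SPIRiT kernel of size ro x e1 x e2
    with coils x coils channel combinations, stored as
    single-precision complex numbers (8 bytes each)."""
    return ro * e1 * e2 * coils * coils * complex_bytes

coils = 20                  # 20-channel head coil
ro, e1, e2 = 256, 256, 192  # acquired matrix
k = 7                       # assumed 7x7x7 k-space kernel

full = kernel_bytes(ro, e1, e2, coils)   # zero-padded in all 3 dimensions
hybrid = kernel_bytes(ro, k, k, coils)   # zero-padded along readout only

print(f"full image-domain kernel: {full / 2**30:.1f} GiB")
print(f"hybrid-space kernel sent over the network: {hybrid / 2**20:.1f} MiB")
print(f"reduction factor: {full / hybrid:.0f}x")
```

Under these assumptions, deferring the phase-encoding zero-padding to the receiving nodes shrinks the transferred kernel by roughly three orders of magnitude, which is why each node zero-pads along the remaining two dimensions locally in step 5.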

The imaging experiments were performed on a 3T clinical MRI system (MAGNETOM Skyra; Siemens AG Healthcare Sector, Erlangen, Germany) equipped with a 20-channel head coil. The acquisition parameters were as follows: gradient echo readout; repetition time/echo time = 10.0/3.11 ms; acquired matrix = 256 × 256 × 192; flip angle = 20°; isotropic spatial resolution = 1 mm³; bandwidth = 130 Hz/pixel; 2D acceleration factor R = 3 × 2. The embedded parallel imaging mode was used with the ACS signal acquired as a 32 × 32 fully sampled region. The total acquisition time was 91.6 s.

Deployment and Scanner Integration

The cloud extension of Gadgetron (GT-Plus) can be deployed on different types of platforms. Three setups have been tested here for in vivo experiments, and the GT-Plus software can be deployed on those setups without any code changes. The first setup (referred to as the "casual cloud") is a heterogeneous collection of computers (eg, as one would find in many MRI research laboratories). These computers have different hardware and operating system configurations. The second setup is the NIH Biowulf cluster, a custom-built cluster with a total of 2300 nodes (>12,000 computing cores); it is a shared system in which users request a specific amount of resources for a given computational task. The third setup is the Amazon Elastic Compute Cloud (EC2), in which an arbitrary amount of computational power can be rented by the hour, providing the flexibility to tailor the cloud configuration to suit a specific application.

Casual Cloud

The first setup is a solution that almost any MRI research laboratory would be able to use by installing the Gadgetron on a set of networked computers and using one of these computers as the gateway node. No specific operating system or special hardware is needed. Here, six personal computers on the NIH intranet (1-Gb/s connections) were used as the cloud nodes, and two more computers were used as gateway nodes for the 2D+t and 3D experiments, respectively. The Gadgetron software was compiled and installed on all computers. The gateway node for the 2D+t test was a desktop computer running Windows 7 Pro 64-bit (four cores of Intel Xeon E5-2670 2.60 GHz, 48 GB DDR3 RAM). The gateway node for the 3D test ran the same operating system (two eight-core Intel Xeon E5-2670 2.60-GHz processors, 192 GB DDR3 RAM).

Among the six cloud nodes, two were running Windows 7 Pro 64-bit (each had four cores of Intel Xeon E5-2670 2.60 GHz and 48 GB DDR3 RAM). Ubuntu Linux 12.04 was running on the other four nodes (two nodes each had six cores of Intel Xeon E5645 2.4 GHz and 24 GB DDR3 RAM; the other two had four cores of Intel Xeon X5550 2.67 GHz and 24 GB DDR3 RAM).

For the dual-layer cloud test, one Windows and two Ubuntu computers served as first-layer nodes. The other three nodes were on the second layer; each of them was connected to one first-layer node. For the 3D reconstruction test, all six nodes were connected directly to the gateway.

NIH Biowulf Cloud

The second cloud setup tested in this study utilized the NIH Biowulf cluster, a GNU/Linux parallel processing system designed and built at the NIH. Biowulf consists of a main login node and a large number of computing nodes, which are connected by a 1-Gb/s network. The MRI scanner used in this study was also connected to the Biowulf system with a 1-Gb/s network connection. For the 2D multislice cine experiments, 37 nodes were requested from the cluster. The gateway node had 16 cores (two eight-core Intel Xeon E5-2670 2.60-GHz processors) and 72 GB DDR3 RAM. All other nodes had identical configurations (two six-core Intel Xeon X5660 2.80-GHz processors, 24 GB DDR3 RAM). For the dual-layer cloud test, nine nodes were used on the first layer to match the number of acquired slices. Each of them was connected to three subnodes on the second layer. For the 3D reconstruction test, the same gateway node and 23 connected nodes were requested.

The cloud topology was selected to balance maximal parallelization against programming complexity. A dual-layer structure is convenient for the multislice cine imaging because it allows overlap between data acquisition and reconstruction computation. For the 3D acquisition, the reconstruction only starts when the data acquisition is completed; therefore, a simple one-layer cloud was used to simplify the synchronization. The number of nodes used in the study was mainly limited by the resources available during the experiments. Although the Biowulf system has a large number of nodes, hundreds of jobs from multiple users can run in parallel at any given time. We also intentionally made the number of nodes different between the 2D+t and 3D tests to challenge the scalability of the Gadgetron-based cloud software.

Amazon EC2-Based Cloud

The last setup utilized the Amazon Elastic Compute Cloud (EC2). Amazon EC2 provides resizable computing capacity in the Amazon Web Services cloud. The user can configure and launch a flexible number of computing nodes, and every node can have a different hardware configuration and operating system. Amazon EC2 provides a broad selection of Windows and Linux operating systems. More details about Amazon EC2 can be found at http://aws.amazon.com/ec2/ (21).

For this study, 19 nodes were launched to build a virtual cloud on Amazon EC2. Every node used the cc2.8xlarge instance type (two eight-core Intel Xeon E5-2670 2.60-GHz processors, 20 MB L3 cache per processor, 60 GB DDR3 RAM), running Ubuntu Server 13.10. All nodes were connected via 10-Gb/s connections. For the dual-layer test, one node was used as the gateway and nine nodes were on the first layer. Each first-layer node was connected to one subnode; thus, data for every slice were processed by two nodes in total. For the 3D reconstruction test, all 18 nodes were connected directly to the gateway. At the time of the study, the price for the nodes used in this test was US $2.40 per hour per node, or US $45.60 per hour for the complete cloud setup. In this case, the number of nodes was chosen as a reasonable balance between cost and performance.
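The hourly rate quoted above translates into a modest cost per exam when the cloud is only occupied for the duration of a scan. The following quick calculation is a sketch only: the function is hypothetical, and billing granularity, idle time, and data-transfer fees are ignored:

```python
def cloud_cost_per_exam(price_per_node_hour, n_nodes, wall_seconds):
    """Cost attributable to one exam, assuming the cloud is billed
    only for the wall-clock time the exam occupies it."""
    return price_per_node_hour * n_nodes * wall_seconds / 3600.0

# 19 cc2.8xlarge nodes at US $2.40/h each -> US $45.60/h for the cloud
hourly = cloud_cost_per_exam(2.40, 19, 3600)
# an exam occupying the cloud for ~255 s (the 3D imaging time reported
# later in the paper) costs only a few dollars
per_exam = cloud_cost_per_exam(2.40, 19, 255)
print(f"${hourly:.2f}/h, ~${per_exam:.2f} per exam")
```

This supports the paper's point that, compared with the cost of a typical patient scan, the per-exam cloud expense is small even though keeping the cloud running 24 hours per day would be prohibitive.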

The connection speed from the MRI scanner to the Amazon EC2 cloud was measured to be 49.8 MB/s (0.39 Gb/s) at the time of the experiments.

Clinical Scanner Integration

As previously described, the Gadgetron enables direct integration with the MRI scanner for online reconstruction (15). The cloud extension of GT-Plus maintains this advantage of online processing. Effectively, this setup enables a seamless connection of cloud computing resources directly to the scanner. From the end-user perspective, the only noticeable difference between reconstruction in the cloud and locally (on the scanner's reconstruction hardware) is the improvement in reconstruction speed.

The integration of cloud computing resources with a clinical scanner raises questions about patient privacy. In this study, multiple steps were taken to mitigate any risks. First, all data sent to the Gadgetron from the scanner were anonymized; no identifying information that could be used to trace back to the patient was sent over the network. The only information sent was the MR raw data and protocol parameters (eg, field of view, matrix size, bandwidth). Second, all data transfer between the scanner and the gateway node was encrypted via a secure shell tunnel (24). All other nodes connected to the gateway were behind firewalls. Besides serving as a security measure, the use of secure shell tunnels to connect to the Gadgetron was a convenient way to test multiple Gadgetron setups on the same scanner: the reconstruction could simply be redirected by pointing the secure shell tunnel to a different Gadgetron instance.

RESULTS

Multislice Myocardial Cine Imaging

Figure 5 shows the reconstruction results generated by the GT-Plus cloud, illustrating the noticeable improvement in image quality with nonlinear reconstruction.

Table 1 shows the total imaging time and computing time. The imaging time is the period from the start of data acquisition to the moment when images of all slices were returned to the scanner. The computing time is defined as the time used to perform the reconstruction computation. On the described Amazon EC2 cloud, the total imaging time was 52.6 s; the computing time was 48.9 s because the computation overlapped with the data acquisition. On the NIH Biowulf cloud, imaging and computing times were 62.1 s and 58.2 s, respectively. If only a single node was used, the computing time was 427.9 s and 558.1 s for the Amazon EC2 and Biowulf setups, respectively. Therefore, the multislice cine imaging with entire-ventricle coverage was completed within 1 min using the nonlinear reconstruction on the GT-Plus cloud. The casual cloud yielded a computing time of 255.0 s; if only one node was used, the computing time went up to 823.7 s. This speedup may be helpful for the development and validation of nonlinear reconstruction in an MRI research laboratory.
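The single-node versus cloud computing times above correspond to the following speedups, a simple ratio over the values reported in Table 1:

```python
def speedup(single_s, cloud_s):
    """Ratio of single-node to cloud computing time."""
    return single_s / cloud_s

# computing times (s) for the multislice cine reconstruction (Table 1)
times = {
    "Casual": (823.7, 255.0),
    "Biowulf": (558.1, 58.2),
    "Amazon EC2": (427.9, 48.9),
}
for name, (single, cloud) in times.items():
    print(f"{name}: {speedup(single, cloud):.1f}x")
```

The casual cloud's speedup is the smallest because its six heterogeneous nodes are both fewer and slower than the homogeneous Biowulf and EC2 allocations, consistent with the load-balancing caveats discussed later.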

High-Resolution 3D Neuroimaging

Figure 6 illustrates the cloud reconstruction results using ℓ1-SPIRiT. Compared with GRAPPA, the nonlinear reconstruction shows noticeable signal-to-noise ratio improvements. Table 2 shows the total imaging time and computing time for the three cloud setups and for single-node processing. The total imaging time on the Biowulf cloud was 278.5 s, which includes the computing time of 186.4 s, since not all computational steps ran in parallel for this 3D reconstruction. Because every computing node of the Amazon EC2 cloud setup had 16 cores and a newer central processing unit (CPU) model, the total computing time was further reduced to 146.0 s. The total imaging time on the Amazon cloud was 255.0 s. Thus, the latency following data acquisition was less than 2 min.

Availability and Platform Support

The entire package is integrated with the currently available version of the Gadgetron, which is distributed under a permissive free software license based on the Berkeley Software Distribution license. The licensing terms allow users to use, distribute, and modify the software in source or binary form. The source code of the entire Gadgetron package (including the GT-Plus extensions) can be downloaded from http://gadgetron.sourceforge.net/. The code base used in this article corresponds to release 2.5.0, or the git code repository commit with SHA-1 hash 415d999aa83860225fdddc0e94879fc9c927d64d. The documentation can be found at the SourceForge open source distribution website for the Gadgetron (http://gadgetron.sourceforge.net/). The software has been compiled and tested on Microsoft Windows 7 64-bit, Ubuntu Linux, CentOS 6.4, and Mac OS X. (Note that this paper does not present examples running on CentOS 6.4 or Mac OS X, but the software has been compiled and tested on those two operating systems.)

DISCUSSION

This study extends the previously published Gadgetron framework to support distributed computing, which makes it possible to distribute a demanding reconstruction task over multiple computing nodes and significantly reduce the processing time. This paper focused on reducing the processing time of nonlinear reconstruction applications, which have so far been difficult to deploy clinically. As demonstrated in the examples, the increased computational power can make the processing time of nonlinear reconstruction feasible for regular clinical use.

The GT-Plus extension provides a set of utility functions to manage and communicate with multiple nodes. Based on these functions, flexible cloud topologies can be realized, as demonstrated in this paper. In the current Gadgetron software release, the dual-layer and single-layer cloud implementations discussed here are made available to the community. They are designed and implemented for general use, although the examples given are specific to cine and 3D neuroimaging. Based on the GT-Plus extensions, new cloud topologies will be explored in the future, and users can design and implement their own cloud structures.

The Gadgetron-based cloud computing was deployed on three different types of cloud computing hardware. This demonstrates the flexibility of both the Gadgetron framework in general and the GT-Plus extensions specifically. The casual cloud setup is the cheapest, most accessible way of using Gadgetron cloud computing. It can be set up quickly and used for algorithm development and validation in a research laboratory. The disadvantage of such a setup is that the computing nodes may be very heterogeneous, and optimal distribution of the work among nodes may become nontrivial. If cloud computing is used in a production-type setting on a clinical scanner, it is also problematic that a node may be busy with other tasks when it is needed by the scanner.

FIG. 5. Reconstruction results of multislice myocardial cine imaging on the GT-Plus cloud. Compared with the linear reconstruction (a), nonlinear reconstruction (b) improves the image quality. With the computational power of a Gadgetron-based cloud, the entire imaging, including data acquisition and nonlinear reconstruction, can be completed in 1 min with whole-heart coverage. (See also Supporting Movies 1a and 1b.)

Table 1
Total Imaging Time (from Start of Data Acquisition to the Moment When All Images Were Returned to the Scanner) and Computing Time for the Multislice Myocardial Cine Imaging

                        Casual            Biowulf          Amazon EC2
                    Single   Cloud    Single   Cloud    Single   Cloud
Imaging time, s        —     259.2      —      62.1       —      52.6
Computing time, s    823.7   255.0    558.1    58.2     427.9    48.9

The in vivo test was only performed with the cloud setups. The single-node computing time was recorded with retrospective reconstruction on the gateway node.

The Amazon EC2 system, on the other hand, provides a professional-grade cloud setup with 24/7 availability. The computational resources can be dedicated to a specific project or scanner, and the nodes can in general have the same configuration, which reduces the complexity of scheduling. This setup is also easy to replicate and share among groups and scanners. Other companies provide the same type of cloud infrastructure service, such as Rackspace (http://www.rackspace.com/). With these vendors, users pay by the hour to rent computers. In the setup described in this paper, the cost of running the cloud was on the order of US $50 per hour. While this cost could be prohibitive if the cloud is enabled 24 hours per day, it is relatively modest compared with the cost of typical patient scans. Moreover, this cost is falling rapidly as more service providers enter the market and more powerful computing hardware becomes available. The main downside of using a remote cloud service such as Amazon EC2 is the need for a high-speed Internet connection to the cloud provider. At large universities, 1-Gb/s connections (which are sufficient for the applications presented here) are becoming commonplace; however, this may not be the case for hospitals in general.

The third setup, the NIH Biowulf cluster, is a dedicated, custom-built cluster system. Although not available to the entire MRI community, it represents a parallel computing platform often found at research institutions (such as universities) or large corporations. For large organizations aiming to provide imaging, reconstruction, and processing services for a large number of scanners, this setup may be the most cost-effective platform. It is, however, important to note that purchasing, configuration, deployment, and management of even modest cloud computing resources is a nontrivial task that requires significant resources.

Several software modules implemented in the Gadgetron framework are GPU-accelerated, although the examples demonstrated in this paper primarily use CPUs. The GT-Plus extension for distributed computing does not conflict with GPU-based computational acceleration in any sense. Every computing node can be equipped with different hardware; for those with capable GPUs or CPU coprocessors (eg, Intel Xeon Phi coprocessors [http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html]), the local processing can utilize those resources, since the Gadgetron framework allows every node to be configured with different Gadget chains. Heterogeneous computational hardware across nodes can be challenging for optimal load balancing. In the current framework, the user can supply computing power indexes for the nodes to indicate their strength, but the framework does not presently implement more sophisticated approaches to determine the computing ability of every node.

Table 2
Total Imaging and Computing Time for the 3D Neuroimaging Acquisition

                        Casual            Biowulf          Amazon EC2
                    Single   Cloud    Single   Cloud    Single   Cloud
Imaging time, s        —     541.9      —     278.5       —     255.0
Computing time, s   1054.3   449.1   1265.1   186.4     983.5   146.0

The single node used for comparison is the gateway node for every cloud setup.

FIG. 6. Reconstruction results of the 1 mm³ brain acquisition on the GT-Plus cloud. The basic cloud topology shown in Fig. 2 was used here. Both the GRAPPA linear reconstruction (a) and the ℓ1-SPIRiT results (b) are shown. The single-node processing time is over 20 min, which prohibits the clinical usage of nonlinear reconstruction. With the Gadgetron-based cloud, a computing time of <2.5 min can be achieved for this high-resolution acquisition.

The Gadgetron framework and its toolboxes are mainly programmed in C++ and make extensive use of generic template programming. Although efforts have been made to provide documentation and examples in the Gadgetron distribution, developers who want to extend the toolboxes still need some proficiency with object-oriented programming and C++ in particular.

CONCLUSION

We have developed the Gadgetron Plus (GT-Plus) extension for the Gadgetron framework to support distributed computing using various cloud setups. The experiments using this distributed computing setup show that the increased computational power in the cloud can accelerate ℓ1-SPIRiT reconstructions significantly and thus make them clinically practical. These results indicate that Gadgetron-based cloud computing can help with clinical testing and adoption of advanced nonlinear reconstruction methods. Furthermore, the open source nature of the implemented software framework can help advance reproducible research by encouraging software sharing.

ACKNOWLEDGMENTS

This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Maryland, USA (http://biowulf.nih.gov).

REFERENCES

1. Blaimer M, Breuer F, Mueller M, Heidemann RM, Griswold MA, Jakob PM. SMASH, SENSE, PILS, GRAPPA: how to choose the optimal method. Top Magn Reson Imaging 2004;15:223–236.
2. Breuer FA, Kellman P, Griswold MA, Jakob PM. Dynamic autocalibrated parallel imaging using temporal GRAPPA (TGRAPPA). Magn Reson Med 2005;53:981–985.
3. Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang J, Kiefer B, Haase A. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med 2002;47:1202–1210.
4. Kellman P, Epstein FH, McVeigh ER. Adaptive sensitivity encoding incorporating temporal filtering (TSENSE). Magn Reson Med 2001;45:846–852.
5. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: sensitivity encoding for fast MRI. Magn Reson Med 1999;42:952–962.
6. Lustig M, Donoho D, Pauly JM. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med 2007;58:1182–1195.
7. Lustig M, Pauly JM. SPIRiT: iterative self-consistent parallel imaging reconstruction from arbitrary k-space. Magn Reson Med 2010;64:457–471.
8. Hansen MS, Atkinson D, Sorensen TS. Cartesian SENSE and k-t SENSE reconstruction using commodity graphics hardware. Magn Reson Med 2008;59:463–468.
9. Sorensen TS, Schaeffter T, Noe KO, Hansen MS. Accelerating the nonequispaced fast Fourier transform on commodity graphics hardware. IEEE Trans Med Imaging 2008;27:538–547.
10. Jung H, Sung K, Nayak KS, Kim EY, Ye JC. k-t FOCUSS: a general compressed sensing framework for high resolution dynamic MRI. Magn Reson Med 2009;61:103–116.
11. Liang D, DiBella EVR, Chen R-R, Ying L. k-t ISD: dynamic cardiac MR imaging using compressed sensing with iterative support detection. Magn Reson Med 2011;68:41–53.
12. Lustig M, Santos JM, Donoho DL, Pauly JM. k-t SPARSE: high frame rate dynamic MRI exploiting spatio-temporal sparsity. In Proceedings of the 14th Annual Meeting of ISMRM, Seattle, Washington, USA, 2006. p. 2420.
13. Vasanawala SS, Alley MT, Hargreaves BA, Barth RA, Pauly JM, Lustig M. Improved pediatric MR imaging with compressed sensing. Radiology 2010;256:607–616.
14. Vasanawala SS, Murphy MJ, Alley MT, Lai P, Keutzer K, Pauly JM, Lustig M. Practical parallel imaging compressed sensing MRI: summary of two years of experience in accelerating body MRI of pediatric patients. March 30–April 2, 2011; Chicago, IL. p. 1039–1043.
15. Hansen MS, Sørensen TS. Gadgetron: an open source framework for medical image reconstruction. Magn Reson Med 2012;69:1768–1776.
16. Feng Y, Song Y, Wang C, Xin X, Feng Q, Chen W. Fast direct Fourier reconstruction of radial and PROPELLER MRI data using the chirp transform algorithm on graphics hardware. Magn Reson Med 2013;70:1087–1094.
17. Simpson R, Keegan J, Gatehouse P, Hansen M, Firmin D. Spiral tissue phase velocity mapping in a breath-hold with non-Cartesian SENSE. Magn Reson Med 2013. doi: 10.1002/mrm.24971.
18. Xue H, Kellman P, LaRocca G, Arai A, Hansen M. High spatial and temporal resolution retrospective cine cardiovascular magnetic resonance from shortened free breathing real-time acquisitions. J Cardiovasc Magn Reson 2013;15:102.
19. Hansen MS. Sharing reconstruction software: raw data format and Gadgetron. In Proceedings of the ISMRM Workshop on Data Sampling & Image Reconstruction, Sedona, Arizona, USA, 2013.
20. National Institutes of Health. NIH Biowulf Cluster System. http://biowulf.nih.gov/. Accessed July 15, 2013.
21. Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/. Accessed July 12, 2013.
22. Schmidt DC. The ADAPTIVE Communication Environment: object-oriented network programming components for developing client/server applications. In Proceedings of the 11th Annual Sun Users Group Conference, San Jose, California, USA, 1993. p. 214–225.
23. Lavender RG, Schmidt DC. Active object: an object behavioral pattern for concurrent programming. In: Vlissides JM, Coplien JO, Kerth NL, editors. Pattern Languages of Program Design 2. London, UK: Addison-Wesley Longman; 1996. p. 483–499.
24. Network Working Group of the IETF. The Secure Shell (SSH) Authentication Protocol. Fremont, CA: Internet Engineering Task Force; 2006.
