libra: thermal and process variation aware reliability...

LIBRA: Thermal and Process VariationAware Reliability Management in Photonic

Networks-on-ChipSai Vineel Reddy Chittamuru , Student Member, IEEE,

Ishan G. Thakkar , Student Member, IEEE, and Sudeep Pasricha , Senior Member, IEEE

Abstract—Silicon nanophotonics technology is being considered for future networks-on-chip (NoCs) as it can enable high bandwidth

density and lower latency with traversal of data at the speed of light. But, the operation of photonic NoCs (PNoCs) is very sensitive to

on-chip temperature and process variations. These variations can create significant reliability issues for PNoCs. For example, a

microring resonator (MR) may resonate at another wavelength instead of its designated wavelength due to thermal and/or process

variations, which can lead to bandwidth wastage and data corruption in PNoCs. This paper proposes a novel run-time framework

called LIBRA to overcome temperature- and process variation- induced reliability issues in PNoCs. The framework consists of

(i) a device-level reactive MR assignment mechanism that dynamically assigns a group of MRs to reliably modulate/receive data in a

waveguide based on the chip thermal and process variation characteristics; and (ii) a system-level proactive thread migration technique

to avoid on-chip thermal threshold violations and reduce MR tuning/ trimming power by dynamically migrating threads between cores.

Our simulation results indicate that LIBRA can reliably satisfy on-chip thermal thresholds and maintain high network bandwidth while

reducing total power by up to 61.3 percent, and thermal tuning/trimming power by up to 76.2 percent over state-of-the-art thermal and

process variation aware solutions.

Index Terms—Thermal variations, process variations, photonic networks-on-chip, reliability

Ç

1 INTRODUCTION

AS the core count in emerging manycore systemsincreases, traditional on-chip communication fabrics,

i.e., electrical networks-on-chip (ENoCs), are unable to scaleup to meet aggressive performance goals. To overcome per-formance bottlenecks with ENoCs, recent advances inCMOS-photonics integration [1] have enabled an excitingsolution in the form of photonic NoCs (PNoCs). PNoCsexcel over their electrical counterparts in performance dueto several advantages, including near light speed transfers,high bandwidth density, and low dynamic power dissipa-tion [2]. Thus, emerging manycore systems may be able toovercome their on-chip communication bottlenecks withPNoCs (e.g., [3], [4], [5], [6], [7], [8]).

Typical PNoC architectures employ several photonic devi-ces such as photonic waveguides, couplers, splitters, andmulti-wavelength laser sources, along with microring resona-tors (MRs) asmodulators, detectors, and switches. In a typicalhigh bandwidth PNoC, an off-chip laser source generatesmulti-wavelength light, which is coupled by an optical cou-pler to an on-chip photonic waveguide. This waveguide

guides the input optical power of multiple dense-wave-length-division-multiplexed (DWDM) wavelengths, via aseries of optical power splitters, to the individual nodes (e.g.,processing cores) on the chip. Each node in the PNoC can con-nect and communicate to multiple other nodes through suchphotonic waveguides that can guide the utilized DWDMwavelengths. Awavelength serves as a carrier to a data signal.Typically, multiple data signals are generated at a sourcenode in the electrical domain as sequences of logical 1 and 0voltage levels. These input electrical data signals are modu-lated onto the DWDM wavelengths using a bank of modula-tor MRs. These data-modulated carrier wavelengths traversea waveguide to a destination node, where an array of detectorMRs filter them and drop them on adjacent photodetectors toregenerate electrical data signals. In general, each node in aPNoC should be able to send and receive data in the opticaldomain on all of the utilized carrier wavelengths. Therefore,each node has a bank of modulator MRs and a bank of detec-tor MRs. Each MR in a bank is structurally itself a loopedwaveguide with a small and unique circumference, whichmakes the MR resonate with and operate on a specific carrierwavelength referred to as the MR’s assigned carrier wave-length. Thus, the excellent wavelength selectivity of MRs andDWDMcapability ofwaveguides enable high bandwidth par-allel data transfers in PNoCs, under ideal conditions.

As advocated by prior works [3], [4], [5], [6], PNoCs areexpected to be 3D-stacked on top of their respective many-core chips. Therefore, the MRs of PNoCs will be placedon top of, and hence in close proximity to, processing cores.Variations in core workloads lead to variations in theirpower dis4sipation, which in turn can alter the temperatures

� The authors are with the Department of Electrical and Computer Engineer-ing, Colorado State University, Fort Collins, CO 80523.E-mail: {sai.chittamuru, ishan.thakkar, sudeep}@colostate.edu.

Manuscript received 9 Oct. 2017; revised 29 May 2018; accepted 31 May2018. Date of publication 12 June 2018; date of current version 29 Jan. 2019.(Coressponding author: Sai Vineel Reddy Chittamuru.)Recommended for acceptance by N. Jha.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TMSCS.2018.2846274

758 IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 4, NO. 4, OCTOBER-DECEMBER 2018

2332-7766� 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See ht _tp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0002-0918-1755

https://orcid.org/0000-0002-0918-1755

https://orcid.org/0000-0002-0918-1755

https://orcid.org/0000-0002-0918-1755

https://orcid.org/0000-0002-0918-1755

https://orcid.org/0000-0002-7289-1530

https://orcid.org/0000-0002-7289-1530

https://orcid.org/0000-0002-7289-1530

https://orcid.org/0000-0002-7289-1530

https://orcid.org/0000-0002-7289-1530

https://orcid.org/0000-0002-0846-0066

https://orcid.org/0000-0002-0846-0066

https://orcid.org/0000-0002-0846-0066

https://orcid.org/0000-0002-0846-0066

https://orcid.org/0000-0002-0846-0066

mailto:

of the cores and MRs in their vicinity. For instance, the tem-perature on a typical manycore chip can easily vary by asmuch as 90�C [13]. Unfortunately, MRs are very sensitive tothese on-chip thermal variations (TV): their effective refrac-tive indices, and hence their resonance wavelengths arealtered if their operating temperatures change. Therefore, ina typical PNoC, the resonance wavelengths of the utilizedmodulatorMRsmay not alignwith, and hencemay notmod-ulate their assigned carrier wavelengths [9]. This may resultin bandwidth wastage, or worse, data corruption whendetector MRs are unable to read from their assigned carrierwavelengths [10].

In addition to TV, MRs are also susceptible to fabricationprocess variations. Process variations (PV) induce variationsin the width, thickness, and doping concentration of MRs(see Section 3.2), causing resonance wavelength shifts inMRs [11], [12]. PV measurements of fabricated MR devicesindicate a standard deviation (s) of 1.3 nm in width, whichtranslates to a 0.76nm shift in an MR’s resonance wave-length [14]. These PV-induced resonance wavelength shiftsin MRs also cause bandwidth wastage and data corruption.

The adverse effects of PV and TV related to resonanceshifts in MRs, and their performance and reliability impacts,can be redressed by realigning the resonant wavelengths ofMRs with their assigned carrier wavelengths using localizedtrimming [15] and thermal tuning [9] mechanisms. Trim-ming alters the free-carrier concentration in an MR core,whereas thermal tuning uses integrated micro-heaters toalter local temperatures at MRs. But these mechanismscome with high power and performance overhead [9].Hence, it is essential to intelligently manage thermal and processvariations in PNoC-based manycore systems, to achieve reliablecommunication with minimal trimming and tuning costs.

In this work, we aim to minimize the need for (and over-heads of) localized thermal tuning and trimming in PNoCswhile coping with process and thermal variations, therebyeasing the adoption of PNoCs in future manycore systems.We propose a novel thermal and process variation awaredynamic reliability management framework called LIBRAthat integrates adaptive MR assignment at the device-leveland dynamic thread migration at the system-level forPNoC-based manycore systems. Our novel contributions aspart of the LIBRA framework are summarized below:

� We design a novel thermal and process variationaware MR assignment (TPMA) mechanism at thedevice-level, which dynamically assigns a set of MRsto the utilized set of carrier wavelengths at run-time.TPMA enables reliable modulation and reception ofdata with minimal overheads, while maintaining themaximum possible bandwidth;

� We propose a novel PV-aware anti wavelength-shiftdynamic thermal management (VADTM) mecha-nism at the system-level, which uses support vectorregression (SVR) based temperature prediction anddynamic thread migration to avoid on-chip thermalthreshold violations and reduce trimming/tuningpower for MRs;

� We evaluate our LIBRA (TPMAþ VADTM) frame-work on a 64-core chip, and compare it with fourstate-of-the-art thermal management solutions: anMR-aware thermal management (RATM) framework[16], an MR PV-aware thermal management (FATM)

framework [17], a predictive dynamic thermal man-agement (PDTM) framework [18], and an MR-awarethermal management (SPECTRA) framework [10];and show significant reduction in maximum temper-ature and trimming/tuning power costs comparedto these solutions.

2 RELATED WORK

Traditional electrical NoC communication fabrics are pro-jected to suffer from cripplingly high power dissipation andseverely reduced performance in future manycore systems[19]. The higher bandwidth density and lower power dissi-pation possiblewith silicon-photonic links, compared to elec-trical wires, hasmade them an attractive option formanycoresystems. Recent research has thus focused on exploring awide spectrum of network topologies and protocols to enableefficient PNoC architectures [3], [4], [5], [6], [7], [8], [32].

PV and TV in silicon-photonic links represent importantchallenges for the widespread adoption of PNoC architec-tures. Several techniques have been proposed to reducethermal hotspots and gradients using DVFS [20], [21], work-load migration [8], [18], [22] and liquid cooling [27]. A fewPV-aware application mapping frameworks have also beenproposed [29], [30] that optimize performance and energy inmanycore systems. In [29] a run-time application-mappingstrategy was presented, which considers the variation pro-file of a manycore processor to maximize performance andreduce leakage-power for a given fixed power budget. In[30] a framework was presented that integrates reliabilityand variation-awareness in a run-time variable degree-of-parallelism (DoP) application-scheduling methodology toenhance manycore performance. However, these techniques donot consider the unique challenges (e.g., MR resonance wavelengthshifts) and constraints (e.g., wavelength match between sender andreceiver MR pairs) that exist in PNoCs.

A few prior works have analyzed the impact of TV andPV on PNoCs at the device-level, link-level, and system-level, and proposed solutions to remedy these variations.

The device-level efforts have mainly proposed variousathermal photonic devices to reduce localized tuning/trim-ming power in MRs. These design-time solutions includeusing materials such as cladding to reduce thermal sensitiv-ity [31] and using heaters and temperature sensors for ther-mal control [33]. An electrical backend capable of bit re-shuffling was proposed in [34] to enhance photonic linkrobustness against TV and PV with lower MR tuningpower. While these device- and link-level techniques are promis-ing, they either possess a high power overhead or require costlychanges in the manufacturing process (e.g., larger device areas)that would decrease bandwidth density and area efficiency. Inaddition, there are device and link-level solutions [23], [24],[25], [26], [55] aim to mitigate crosstalk noise in PNoCs.

At the system-level, the overhead associatedwith localizedtuning of MRs was reduced in [9] using the group shift prop-erty of co-located MRs as part of a method to trim a group ofrings at the same time. In [16], a ring-aware thread schedulingpolicy was proposed to reduce on-chip thermal gradients in aPNoC. In [35], a thread migration mechanism was proposedto minimize on-chip thermal gradients within a PNoC. In[57], an island of heater based thermal management frame-work was proposed to adapt groups or islands of MRs withinPNoCs to on-chip thermal variations. A few prior works have

CHITTAMURU ET AL.: LIBRA: THERMAL AND PROCESS VARIATION AWARE RELIABILITY MANAGEMENT IN PHOTONIC NETWORKS-ON-CHIP 759

also explored the impact of PV on DWDM-based photoniclinks at the system-level [36], [37], [38]. A reliability-awaredesign flow to address variation induced reliability issues isproposed in [36], which uses athermal coating at fabrication-level, voltage tuning at device-level, as well as channel hop-ping at the system-level. In [37], a methodology to salvagenetwork-bandwidth loss due to PV-shifts is proposed, whichreorders MRs and trims them to nearby wavelengths. In[38], power-efficient techniques are proposed, based on inter-channel hopping and variation-aware routing to compensatefor PV effects at runtime. A few system-level works [14], [17],[39] also consider the impact of both TV and PV on opticallinks. In [14], a thermal-tuning approach is presented thatadjusts chip temperature using DVFS to compensate for chip-wide thermal and process variation induced resonance shiftsinMRs and improve system performance. In [17], a PV awareworkload allocation policy is presented to reduce the thermaltuning power of PNoCs. In [39], a tunable laser source designis demonstrated, in which the signal power at the source isadapted to compensate for signal losses due to TV and PVacross optical interconnects.None of these system-level solutionsfor PNoCs considers the impact of the relationship between thermalhotspots and transmission reliability.

To address these shortcomings of prior work, we pro-posed the SPECTRA framework in our prior work [10].SPECTRA is a cross-layer framework that combines twodynamic thermal management mechanisms to reduce maxi-mum on-chip temperature and conserve trimming and tun-ing power of MRs in DWDM-based PNoC architectures. Ourproposed LIBRA framework in this paper improves uponSPECTRA, by (i) considering the impact of PV on dynamicthermal management; (ii) utilizing a new device-level TVand PV aware ring assignment mechanism; and (iii) utilizinga new system-level PV-aware thread migration mechanism.Sections 4, 5, 6 describe our proposed framework which isthen evaluated in Section 7 against prior work.

3 IMPACT OF TV AND PV ON DWDM BASED

PNOCS

In this section, we explain the key impacts of PV and TV onDWDMbased PNoCs. Althoughmost silicon-based photonicdevices exhibit some susceptibility to temperature and pro-cess variations, the highwavelength selectivity ofMRsmakesthem especially susceptible to these variations. Therefore, weprimarily focus on the impacts of TV and PV onMRs.

3.1 Impact of TV on DWDM Based PNoCsIn a DWDM PNoC, the temperatures of the individual com-pute nodes and their associated MR banks follow theworkload-dependent temperatures of the processing coresin the nodes. As the application workload of each core in amanycore system usually differs from that of other cores andalso varies with time, the temperatures of the cores (and thusnodes) of the system differ from one-another and vary withtime. As a result, the temperatures of different MR banks ofthe PNoC also differ from one another and varywith time.

Typically, the MR banks of each PNoC node are designedto resonate with and operate upon their assigned carrierwavelengths at a specific temperature, e.g., room tempera-ture. But due to the time- and workload-dependent temper-ature variations, the resonances of different MR banks shiftaway from their assigned carrier wavelengths by differentamounts.

For example, Fig. 1 depicts an MR bank with MRs R1-Rn

that are designed to resonate with their assigned carrierwavelengths �1-�n, respectively, at temperature T1. As thetemperature increases to T2, the resonance wavelength ofeach MR shifts away from its assigned carrier wavelengthtowards the red end of the spectrum (i.e., red-shift). Thisred-shift is shown in the figure where, at temperatureT2 (T2 > T1Þ, the resonance wavelength �i of MR Ri is inline with the carrier wavelength �i�1. Consequently, the car-rier wavelength �n is not assigned to any of the MRs. Thisresults in bandwidth wastage if the MR bank is a modulatorMR bank, as �n cannot be modulated by any modulator MRnow. This example scenario can also result in data corruptionif the MR bank is a detector MR bank, as �n cannot bereceived by any detector MR. Similarly, if T2 < T1, the res-onance wavelength �i of each Ri shifts towards the blue endof the spectrum (i.e., blue-shift), which may leave �1 unas-signed, causing bandwidth wastage or data corruption.Thus, during the runtime of a PNoC, an increase in an MRbank’s temperature red-shifts the resonances of all its MRs,whereas a decrease in an MR’s temperature blue-shifts theresonances of all its MRs.

The amount of shift in an MR’s resonance not onlydepends on the magnitude of temperature change, but alsoon the MR’s structure and geometry manifested as its effec-tive refractive index neff . Typically, an MR is a loopedwaveguide with a silicon (Si) core and silicon dioxide (SiO2)cladding, irrespective of whether it is used as a modulatoror a detector. The change D�r in the resonance wavelength�r of an MR due to an arbitrary change DT in its local tem-perature is given by the following equation [40]:

D�r

DT¼ dneff

dT

�r

ng¼ GSi

dnSi

dTþ GSiO2

dnSiO2

dT

� ��r

ng; (1)

Here, ng is the group refractive index (ratio of speed oflight to group velocity of all wavelengths traversing thewaveguide) of the MR waveguide. GSi and GSiO2 are themodal confinement factors of the MR’s core (Si) and clad-ding (SiO2), respectively. dnSi=dT and dnSiO2=dT are thethermo-optic coefficients of Si (MR’s core) and SiO2 (MR’scladding) materials, with values of 1:86� 10�4 K�1 and1� 10�5 K�1, respectively [40]. As the thermo-optic coeffi-cient of Si is an order of magnitude greater than that ofSiO2, and as GSi is also greater than GSiO2 for a typical MR,the contributions from the MR’s cladding (SiO2) in Eq. 1 canbe ignored. Consequently, Eq. (1) reduces to:

D�r ¼ GSi:dnSi

dT:�r

ng:DT; (2)

Note that the MRs used in this study are looped channelwaveguides with a cross section of 450 nm� 220 nm. Wemodel theseMRsusing a commercial-grade eigenmode solver[54], based on which the values of GSi and ng at 1550 nm arecalculated to be 0.78 and 4.16, respectively.

Fig. 1. Impact of temperature increase on an MR bank.


3.2 Impact of PV on DWDM Based PNoCsIdeally, without any fabrication-induced PV, a sender or areceiver node can modulate and detect all of the carrier wave-lengths available in the waveguide without any bandwidthloss or error. But in reality, similar to deep submicron elec-tronic devices, photonic devices such as MR modulators, MRdetectors, grating couplers, splitters etc. also suffer from sig-nificant PV [43]. In this work, we mainly focus on the severePV effects in MRs. The MR structure is very sensitive to PV,much like it is to TV. Due to PV effects, the widths, heights,and side wall roughness of MRs can deviate from desiredvalues after fabrication. Consequently, the resonance wave-lengths ð�rÞ of the MRs also deviate from their designedvalues. For example, 1nm of variation in width and height ofanMR can lead to 0.58�1 nm and�2 nm shift in its resonancewavelength, respectively [11].

As discussed earlier, PNoCs employ DWDM-based pho-tonic links with cascaded MRs (i.e., MR banks) in their send-ing and receiving nodes. Unlike TV that induces systematicred or blue shifts in all the MRs of an MR bank, PV can incurrandom shifts in the resonance wavelengths of the MRs of asingle bank, as shown in Fig. 2. From this figure, MRs R1,R4, . . ., Rn-1 have blue shift in their resonance wavelengthsand MRs R2, R3, . . ., Rn have red shift in their resonancewavelengths. Much like with TV, PV can also throw the res-onances of the MRs out of alignment with their assignedcarrier wavelengths, which can ultimately lead to band-width wastage and/or data corruption.

In summary, to enable reliable photonic communication,there is a need to mitigate the combined impact of TV andPV on PNoCs. This paper presents a cross-layer frameworkthat uses device-level and system-level enhancements toremedy the combined impact of TV and PV. Before discus-sing our proposed framework, we present our performance,power, and thermal setup for modeling manycore systemswith PNoCs in the next subsection. We also present a char-acterization of the impact of TV and PV on the MRs of a typ-ical DWDM PNoC based manycore system, in this nextsubsection.

3.3 Modeling TV and PV in PNoC ArchitecturesTo model and characterize TV and PV in a manycore systemwith a PNoC, we developed a simulation framework, whichintegrates performance, power, thermal, and variation simu-lators, as shown in Fig. 3. We considered a three layered 3D-stacked 64-core system as advocated in existing PNoC archi-tectures [3], [4], [5], [6] with a planar die area footprint of 400mm2. Microarchitectural parameters of the manycore systemare presented in Table 1. The top layer is the core-cache layer,where each core along with its private L1 cache, private andinclusive L2 cache and network interface spreads across anarea of 6.25 mm2 (i.e., 2:5 mm� 2:5 mm). For each core, weuse an architecture and layout similar to the IA-32 core fromIntel’s Single-chip Cloud Computer (SCC) [47], scaled to 32

nm. As this core-cache layer dissipates significant portion oftotal power, therefore heat sink is placed closer to this layer.The middle layer is the conversion layer with digital and ana-log circuits that support electrical-to-optical (E/O) and opti-cal-to-electrical (O/E) conversion of data. Finally, the bottomlayer is the photonic layer with photonic components anddevices (e.g., MRs, waveguides, ring heaters, etc.) that com-prise a PNoC.

We use Sniper [44] to simulate the performance of themanycore system while it executes multithreaded applica-tions from SPLASH-2 [45] and PARSEC [46] benchmarksuites. To factor in the varying systemutilizations as a contrib-utor to the dynamic TV in the processing cores and its impacton the associated photonic devices (e.g., MRs), we run eachapplication on a target 64-core system (see Section 6) with 8,16, 32, 48, and 64 threads. To capture runtime behavior of anapplication, we generate performance traces using Sniper,which are fed to MCPAT [46] to generate power traces atcore-level granularity. We use published power dissipationdata from Intel’s Single-Chip Cloud Computer (SCC) [47],scaled to 32 nm, to calibrate our dynamic power data. Thepower traces generated by McPAT are given as inputs to the3D-ICE tool [48] for transient thermal simulations (see Fig. 3).Some of the key materials used in the construction of the 3D-stack in the 3D-ICE tool and their properties are shown inTable 2.

We analyzed the spatial variation in the peak temperaturesof various tiles (at core-level granularity) of the photoniclayer. For the 64-core system each tile has an estimated area of6.25 mm2 (i.e., 2:5� 2:5mm2). We executed 64-threaded ver-sions of the blackscholes (BS), bodytrack (BT), vips (VI), face-sim (FS), fluidanimate (FA), swaptions (SW), barnes (BA), fft(FFT), radix (RX), radiosity (RD), and raytrace (RT) applica-tions from the PARSEC and SPLASH2 benchmark suites on

Fig. 2. Impact of PV on DWDM based PNoCs.

Fig. 3. Simulation framework to analyze TV and PV in a manycore sys-tem with a PNoC architectures. The framework integrates performance,power, thermal, and variation simulators.

TABLE 1Micro-Architectural Parameters for Manycore System

Number of cores 64Threads per core 1

Per Core:

L1 I-Cache size/Associativity 32 KB/Direct Mapped CacheL1 D-Cache size/Associativity 32 KB/Direct Mapped CacheL2 Cache size/ Associativity 256 KB/ Direct Mapped CacheL2 Coherence MOESIFrequency 2 GHzIssue Policy In-orderMemory controllers 8Main memory 8 GB; DDR5@30 ns


the 64-core systemwith one application running at a time.Wemonitored the peak temperature of each part of the photoniclayer for every application and plotted the maximum peaktemperature of each part across all the applications, as shownin Fig. 4a. From this figure,we can observe themaximumpos-sible temperature-rise (above the room temperature) for anypart of the layer, which caps all possible dynamic TV valuesfor that part. From Fig. 4a, higher peak temperatures areobtained at the center of the chip while relatively lower peaktemperatures are achieved at the periphery of the chip. Themain reason for the higher temperature at the center of thechip is the inefficiency of the heat sink to remove heat fromthe center of the chip. Furthermore, using Eqs. (1) and (2), wedetermined the resonance wavelength shifts because of thepeak temperature-rises, which are presented as a histogramin Fig. 4b. As evident from this figure, TV can induce up to a7.4 nm shift inMR resonances.

In addition to TV, we also analyzed PV in PNoCs with thesimulation setuppresented in Fig. 3.We adapted theVARIUStool [51] to model die-to-die (D2D) as well as within-die(WID) process variations in MRs for the PNoC. VARIUS usesa normal distribution to characterize on-chip D2D and WIDprocess variations. The key parameters are mean (m), vari-ance (s2), anddensity (a) of a variable that follows the normaldistribution. Aswavelength variations are approximately lin-ear to the dimension variations of MRs, we assume they fol-low the same distribution. The mean (m) of wavelengthvariation of an MR is its nominal resonance wavelength. ForPNoCs, we considered waveguides with 32 DWDM degreesharing the working band 1530–1625 nm (i.e., C and L bands)with a wavelength channel spacing of 1.48 nm. Hence, thosewavelengths are the means for each MR modeled. The vari-ance (s2) of wavelength variation is determined based on lab-oratory fabrication data [11] and our target die size. For a64-core chip with 400 mm2 size at 32 nm node, we consider aWID and D2D standard deviations of sWID ¼ 0:61 nmsD2D ¼ 1:01 nm, respectively [37]. We also consider a den-sity (a) of 0.5 [37] for this die size. With these parameters, weuse VARIUS to generate 100 process variationmaps.

We depict a PV map in Fig. 5a, which shows a spatialvariation in PV-induced resonance wavelength shifts on thephotonic die. Each PV map contains over one million pointsindicating the PV-induced shifts in MR resonances. Thetotal number of points picked from these maps equal thenumber of MRs in the PNoC. We also present these pointsas a histogram in Fig. 5b. As evident from the histogram,PV can induce resonance wavelength shifts in the range of�1.8 nm to 1.6 nm. However, we observed that this rangecan increase up to �3 nm to 3 nm for other PV maps.

4 OVERCOMING PV/TV INDUCED RESONANCE

WAVELENGTH SHIFTS

The adverse effects of PV and TV, i.e., resonance shifts inMRs and their performance and reliability impacts, can be

overcome by realigning and locking the resonance wave-lengths of the individual MRs with the utilized carrierwavelengths. As PV is a static phenomenon, the PV-inducedresonance shifts need to be overcome only once at systeminitialization. In contrast, due to the dynamic nature of TV,the TV-induced resonance shifts require runtime thermalstabilization of MRs. A stable locking of MR resonanceswith the utilized carrier wavelengths can be achieved usingdevice-level (MR-level) mechanisms, such as localized trim-ming [15] and/or thermal tuning [9], with a dithering signalbased feedback control [40]. However, the localized trim-ming and thermal tuning mechanisms proposed in priorwork come with several challenges, which must be over-come to ease the adoption of PNoCs for future manycoresystems.

First, thermal tuning and localized trimming mecha-nisms cannot provide sufficient tuning range to remedyPV/TV-induced resonance shifts in MRs. For instance, fromSection 3, TV and PV together can induce shifts in MR reso-nance wavelengths of up to 10.4 nm, i.e., 7.4 nm for TV and�3 nm for PV. Therefore, compensating these TV/PV-induced resonance shifts would require a net tuning rangeof 10.4 nm. But localized trimming can provide a tuningrange of only 1.5 nm at most [36]. In contrast, thermal tun-ing can provide a tuning range of about 6.6 nm correspond-ing to the temperature range of up to 60K [40] at 0.11 nm/Ksensitivity [9]. Thus, even the thermal tuning and localizedtrimming together (i.e., 6:6 nmþ 1:5 nm tuning range) can-not provide the required tuning range of �10.4 nm. Anotherchallenge for these mechanisms is their significant poweroverhead. A typical MR may consume 130 mW of trimmingpower or 240 mW of thermal tuning power to remedy 1 nmshift in its resonance wavelength, depending on its size,structure, and integration feasibility [9]. To remedy a larger

TABLE 2Properties of Materials Used By 3D-ICE Tool [49], [50]

Material Thermal Conductivity Volumetric Heat Capacity

Silicon 1.30e-4 W/mmK 1.628e-12 J/mm3 KSilicon di oxide 1.46e-6 W/mmK 1.628e-12 J/mm3 KBEOL 2.25e-6 W/mmK 2.175e-12 J/mm3 KCopper 5.85e-4 W/mmK 3.45e-12 J/mm3 K

Fig. 4. (a) Spatial variation in peak temperatures. (b) Histogram of peakTV-induced resonance wavelength variation across a chip of size400 mm2 using 3D ICE tool while executing 64 threaded PARSEC andSPLASH2 benchmark applications on a 64-core system.

Fig. 5. (a) PV-induced resonance wavelength variation. (b) Histogram ofresonance wavelength variation across a chip of size 400 mm2.


shift of �10.4 nm, a single MR may consume as much as�1.35 mW of trimming power or �2.5 mW of thermal tun-ing power. As a DWDM PNoC may have thousands ofMRs, the total power overhead of PV/TV remedy can easilybe in the range of a few tens of watts, which is a prohibi-tively high power overhead for chip-scale systems andmust be minimized to make the total power costs of large-scale DWDM PNoCs manageable.

Fortunately, due to the periodicity of MR resonances, theresonance of none of theMRs in a PNoC needs to be tuned formore than a single channel gap [34]. This makes the requiredtuning range and the total tuning power more manageable.To understand this, consider Fig. 6. The periodic resonances(R1-R4) of an example bank of four MRs and their assignedcarrier wavelengths (�1-�4) for an ideal case with no PV or TVare shown in Fig. 6a. Due to the absence of PV/TV, the reso-nances of all MRs are aligned with their assigned carrierwavelengths. Fig. 6b shows systematic blue-shifts of over twochannel gaps in the resonances of all four MRs. In this case,the MR resonances can be re-aligned to their nearest carrierwavelengths followed by electrical repositioning of bits usingbackend barrel-shifters or pipelined shift registers [34]. In casethe random PV throw the MR resonances out of order(Fig. 6c), use of bit reordering multiplexers at the backend canstill allow the MR resonances to be re-aligned to their nearestcarrier wavelengths. Thus, due to the periodicity of MR reso-nances, and the use of bit reordering/repositioning techni-ques, the necessary tuning distance for the individual MRsreduces to less than one channel gap.

Our previously proposed SPECTRA framework [10] uses adifferent approach to reduce the required tuning distance andpower overhead of PV/TV remedy. It integrates one system-level and two device-level optimizations. At the device-level,the SPECTRA framework utilizes three more MRs than thenumber of utilized carrier wavelengths, and thus, increasesthe available tuning range by three channel gaps. Thismecha-nism reassigns the extra MRs to operate on nearby carrierwavelengths in the case when the resonances shift by lessthan three channel gaps. The need for remedying resonance

shifts of more than three channel gaps is eliminated by reduc-ing the range of temperature swings of the individual coresbelow the threshold levels that can induce resonance shifts ofgreater than three channel gaps. For that, an adaptive threadmigration policy is used at the system level, which also elimi-nates the need of bit-shifting.Moreover, SPECTRA adaptivelychooses the least power-consuming method from thermal tun-ing and localized trimming as the preferred method for PV/TVremedy. Thus, SPECTRA conserves the total power requiredfor PV/TV remedy with low latency overhead. However, theSPECTRA framework does not deal with PV and its benefitscome with the area and power overheads of the extra MRsand bit-reordering multiplexers [34]. To address these shortcom-ings of the SPECTRA framework, we propose a new TV and PVaware reliability management framework called LIBRA, which isdescribed next.

5 LIBRA FRAMEWORK: OVERVIEW

Our LIBRA framework enables reliability-aware run-timePNoC management while rectifying TV and PV in MRs byintegrating device-level and system-level enhancements.Fig. 7 gives a high-level overview of our framework. Thethermal and process variation aware microring assignment(TPMA) mechanism dynamically assigns each MR to thenearest available carrier wavelength, which enables reliablemodulation and reception of data while maintaining themaximum possible bandwidth. This device-level mecha-nism also adaptively chooses the least power-consumingmethod from thermal tuning and localized trimming as thepreferred method for PV/TV remedy, and thus, reduces thetotal power for PV/TV remedy in the PNoC. However,limiting the peak temperature swings below thresholdlevels is critical to further reduce the total power for PV/TVremedy. To achieve this, we devise a PV-aware anti-wavelength-shift dynamic thermal management (VADTM)scheme that uses support vector regression (SVR) basedtemperature prediction and dynamic thread migration, toavoid on-chip thermal threshold violations, minimize on-chip thermal hotspots, and reduce thermal tuning powerfor MRs. The next two sections present details of the TPMAand VADTM schemes.

6 THERMAL AND PROCESS VARIATION AWARE

MICRORING ASSIGNMENT (TPMA)

6.1 Thermal Variation Aware MR Assignment (TMA)As discussed in Section 3.1, TV shifts MR resonances, whichcan prevent MRs from reading or writing to their assigned

Fig. 6. Periodic resonances ðR1 �R4Þ of an example bank of four MRsand their assigned carrier wavelengths ð�1 � �4Þ for (a) an ideal casewith no resonance shifts, (b) a case with systematic blue-shifts in reso-nances, and (c) a case with random red-shifts in resonances.

Fig. 7. Overview of LIBRA framework that integrates a device-level ther-mal and process variation aware microring assignment mechanism(TPMA) and a system-level variation aware anti wavelength-shiftdynamic thermal management (VADTM) technique.


carrier wavelengths. Fortunately, there is a linear depen-dency between temperature increase and resonance wave-length shift [36], which we exploit in our TV-awaremicroring assignment (TMA) mechanism that dynamicallyassigns each MR to the nearest available carrier wavelength.

Fig. 8 shows how at temperatures Ti and Tiþ1 ðTiþ1 > TiÞ,an MR resonance is in exact alignment with the availablewavelengths �k and �kþ1, respectively. These temperaturesare called ideal resonant temperatures (IRTs). When the MRtemperature is in between IRTs Ti and Tiþ1, as shown inFig. 8, the MR needs to be either trimmed to resonate to �k

(which is the resonance wavelength of anMR at temperatureTi) or thermally tuned to resonate to �kþ1 (which is the reso-nance wavelength of an MR at temperature Tiþ1). To adap-tively choose the least power consuming method fromtrimming and thermal tuning, we divide the temperaturerange between IRTs Ti and Tiþ1 into two parts: trimmingtemperature range ðDtrÞ and tuning temperature range ðDtuÞ.For an MR at temperature T, if ðTi þ DtrÞ > T > Ti we per-form trimming as it takes the least power, else if ðTiþ DtrÞ <T < Tiþ1 we perform tuning as it takes the least power (seeFig. 8). At the boundary of the trimming and tuning tempera-ture ranges, where Tiþ1 � Dtu ¼ Ti þ Dtr, both trimming andtuning consume equal power, and hence, an MR can beeither trimmed or tuned. This temperature is called theboundary temperature ðBTiÞ. It has been shown that for asmall resonance wavelength shift (<1 nm), thermal tuningpower is higher compared to trimming power tomitigate thesame amount of TV-induced shift [9]. Thus, our TMAapproach considers a higher trimming temperature rangecompared to tuning temperature range ðDtr > DtuÞ, to mini-mize total trimming and tuning power.

In TMA, MRs are dynamically shifted (trimmed or tuned)to an appropriate IRT for correct operation based on theircurrent temperature. Figs. 9a, 9b, 9c, 9d show four differentMR wavelength assignment configurations at successiveIRTs T1; T2; T3, and T4, where T4 > T3 > T2 > T1. If the MRgroup temperature T is such that ðT1 � DtuÞ < T < ðT1þDtrÞ then the assignment in Fig. 9a is chosen, otherwise ifðT2 � DtuÞ < T < ðT2 þ DtrÞ; ðT3 � DtuÞ < T < ðT3 þ DtrÞ,orðT4 � DtuÞ < T < ðT4 þ DtrÞ then the assignment inFigs. 9b, 9c, or 9d is chosen, respectively. One critical obser-vation in the assignment shown in Fig. 9a is that MRs R1-Rn

are in resonance with �1 � �n within the same Free SpectralRange (FSRi), whereas, in Fig. 9b at IRT T2,MRsR2 �Rn arein resonance with �1 � �n�1, respectively in FSRi and MR R1

is in resonance with �n of the next FSR (i.e., FSRiþ1). In thisassignment and the ones shown in Figs. 9c and 9d, asexplained in Section 4, there is a need to reposition bits inelectrical domain using backend barrel-shifters or pipelinedshift registers. The assignments shown in Figs. 9b, 9c, and 9d

require one, two, and three bit shifts, respectively, to retrievethe original data. In addition, our TMAmechanism can toler-ate a range of on-chip temperature variations by employingtrimming/tuning for smaller variations and MR reassign-ment for larger variations.

TMA represents a powerful reactive technique to adapt toon-die thermal variations with low overhead while ensuringreliable and high-bandwidth communication in MR basedPNoCs. But there is scope for three further enhancements.First, TMA does not consider the impact of PV on MRs, thusthere is a need to readapt TMA to address the impact of PVon MRs, which is discussed in Section 6.2. Second, there is aneed to proactively control the peak on-chip temperature toreduce the range of on-chip temperature swings, which ulti-mately limits the number of required bit shifts (this workcaps the number bit shifts to three as shown Fig. 9d) andreduces the latency to retrieve the original data. Third, at theBT temperature (Fig. 8), maximum trimming or tuningpower is required to realign theMR resonances to their near-est carrier wavelengths. Thus, avoiding BT temperatures atMRs can reduce trimming and tuning power overhead. Asshown in Fig. 8, we define a boundary temperature zone(BTZ) around each BTi. This zone includes temperatures Tsuch that BTi � DZtr < T < BTi þ DZtu where DZtr andDZtu are designer specified parameters. Cores with corre-sponding MR bank temperatures that are within BTZs arecalled boundary temperature cores (BTCs). As BTCs possessthe highest trimming and tuning power overhead for theircorrespondingMR bank, amechanism that reduces the num-ber of BTCs can save trimming and tuning power. Section 7describes such a mechanism, which also controls the rangeof on-chip temperature swingswithin allowable limits.

6.2 Readapting TMA for Process Variations (PMA)In this subsection, we readapt the TMA mechanism toaddress the impact of PV onMR resonances. When using theTMAmechanism, PV-induced red or blue shift (D�PV ) altersthe resonance wavelength ð�BTiÞ of an MR at BTi to �BTR or�BTB, respectively, as shown in Fig. 10. This violates theactual definition of BT, which is the temperature fromwhicheither trimming to �k (which is the resonance wavelength ofanMR at temperature Ti) or tuning to �kþ1 (which is the reso-nance wavelength of an MR at temperature Tiþ1) dissipatesequal power. For example, in case of a PV-induced blue shift,tuning �BTB to �kþ1 would consume more power than trim-ming it to �k, as �BTB is shifted towards �k from �BTi. There-fore, in the TMA that is readapted for process variations

Fig. 8. Red shift of MR with increase in temperature from IRTs Ti to Tiþ1

with trimming and tuning range of temperatures between these IRTs.

Fig. 9. Thermal aware assignment of microrings ðR1�nÞ to wavelengthsð�1�nÞ at four successive IRTs T1; T2; T3; and T4 in TMAmechanism.


(PMAmechanism), we propose to either increase or decreasethe BTs in line with the PV-induced red or blue shifts in MRresonances, respectively.

In our PMA mechanism, first, the PV-induced resonanceshifts in MRs are gauged in situ at initialization by using adithering signal based control system [40]. The overhead ofthis in-situ PV detection technique is considered in ourresults section as dithering power. In our analysis, wemodel and estimate PV in MRs using the VARIUS tool [51],a description of which is already given in Section 3.3. OncePV-induced red or blue shifts of MRs are determined, weestimate the average resonance shift (in nm) across all MRsof each MR bank. We use each average shift value ðD�PV;aveÞto determine the shift in BT (i.e., DBTi) for all the MRs of thecorresponding MR bank using Eq. (3), where TS is the MRthermal sensitivity obtained from Eq. (2) as D�r=DT .

DBTi ¼ D�PV; ave

TS; (3)

Once the DBTi values for all MR banks of the PNoC areobtained, we revise the BTs of each MR bank by either add-ing or subtracting the corresponding DBTi value from theoriginal BT. Similar to the TMA mechanism, we then buildBTZs around these updated BTs. Note that we cannot shiftthe original BT beyond a particular temperature range (i.e.,DBTi > Dtu and DBTi < �Dtr), especially when the PV-induced resonance wavelength shifts are greater than onechannel gap (CG). Unfortunately, for state-of-the-art fabri-cation processes, the maximum PV-induced wavelengthshifts are around � 3 nm (> one channel gap of 1.48 nm).Shifting BT beyond a certain range to compensate for largerPV-induced shifts will also lead to higher tuning and trim-ming power dissipation.

Fig. 11 shows an example of a larger PV-induced blue shift,which alters the resonance wavelength (�BTi) of an MR at BTto �BTB. One possible solution is to bring back the resonancewavelength to �BTi. But this is not always possible especiallywhen the chip is operating at lower temperatures. Therefore,we propose to shift this �BTB to �BTi-1 instead of �BTi, i.e.,instead of decreasing BT by a larger amount here we increaseBT by a smaller amount. In order to facilitate this shifting, sim-ilar to TMA, we perform ring assignment along with extra bitshifts. At a channel spacing of 1.48 nm, to compensate forpeak PV-induced resonance shift of � 3 nm, two extra bitshifts (forward and backward bit shifts to compensate posi-tive and negative PV induced resonance shift) are needed.

Overheads: Our proposed TPMA scheme is a combinationof the two previously proposed techniques: TMA (Section6.1) and PMA (this subsection). TPMA requires a maximumof five bit shifts, which include three for TMA and two forPMA. These additional bit shifts in TPMA incur latency

overhead. This latency overhead is quantified in more detailin Section 8. Furthermore, with TPMA each MR bankrequires a Read Only Memory (ROM) to store its corre-sponding three BT values, which are determined using PVprofiling at design time, as discussed earlier. This ROM alsostores beginning and ending temperatures of three BTZs ineach MR bank. We have considered 16-bits to store eachtemperature value. As there is a need to store nine differenttemperature values (three BTs, three BTZ start tempera-tures, three BTZ end temperatures) for each MR bank, weneed a ROM that can store 144-bits. Moreover, a 16-bit com-parator circuit is needed for each MR bank to determine therange of operation of MRs (i.e., trimming or tuning tempera-ture range). This comparator is also used to determinewhether an MR bank is in BTZ or not. Therefore, one inputfor this comparator comes from a thermal sensor (i.e., infor-mation on current temperature) and the other input is fromthe ROM. The area and power overhead of the ROM andcomparator is quantified in detail in Section 8.

7 VARIATION AWARE ANTI WAVELENGTH-SHIFT

DYNAMIC THERMAL MANAGEMENT (VADTM)

To proactively reduce thermal hotspots (which in turnwill limit the maximum number of bit shifts to five) andcontrol on-die temperature (to reduce the number of BTCs),we propose a system-level variation aware anti wavelength-shift dynamic thermal management (VADTM) technique,described below.

7.1 ObjectiveThe primary goals with VADTM is to maintain the tempera-ture of all of the cores on a die below a specified thermalthreshold, i.e., for all cores 1 � i � N;Ti < Tt where Ti isthe temperature of core i and Tt is threshold temperature. Weutilize support vector based regression (SVR) to predict thefuture temperature of a core, as SVR achieves better accuracycompared to several previously proposed statistical, learning-based, and regression-based algorithms by ultimately reduc-ing number of false positives [52]. This predicted temperatureis comparedwith a thermal threshold to determine the poten-tial for a thermal emergency. If such a potential exists, threadsare migrated to available BTCs. These BTCs are determinedbased on the PV profile of MRs and ring blocks that are usedto send and receive data from these cores. Migration to a BTChas a twofold benefit. First bymoving the thread away from acore that could suffer a thermal emergency, we limit therequired bit shifts and related performance overhead in theMR groups of that core. Second, by moving the thread to aBTC, the temperature of the BTC will increase resulting inthat core no longer being a BTC (consequently the tempera-ture of the core’s MR groups will also increase, taking them

Fig. 10. Impact of PV-induced red and blue shift on boundary tempera-ture on TMA.

Fig. 11. Boundary temperature adaptation for larger PV-induced blueshifts in PMA.


outside of their BTZ and closer to IRTs, which will reducetrimming/tuning power). The parameters used to describeVADTM are shown in Table 3.

7.2 Temperature Prediction ModelWe designed a support vector regression (SVR) based tem-perature predictor that accepts input parameters reflectingthe workload for the core under consideration, in terms ofinstructions per cycle (IPCi), as well as the current core tem-perature (CTi) and temperatures of surrounding cores(TNi). Once these parameter values are obtained at run-time, the future temperature for the core can be predicted.

Architecture: A typical SVR [52] relies on a predictionmodel that ignores errors that are situated within e distanceof the true value. This type of a prediction model is calledan e-insensitive prediction model. Fig. 12 shows an exampleof a one-dimensional non-linear SVR based predictionmodel with an e-insensitive band. The variables (j and e)measure the cost of the errors on the training points. Theseare zero for all points that are inside the e-insensitive band.

SVR is primarily designed to perform linear regression. Tohandle non-linearity in data, SVR first maps the input xi ontoan m-dimensional space using some fixed (non-linear) map-ping notated as F, and then a linear model is constructed inthis high-dimensional space as shown in eqs. (5) and (6)below. Thus, it overcomes drawbacks of linear and logisticregression towards handling non-linearity in data. This classof SVRs is called kernel based SVRs which use kernel k asshown in Eq. (7) for implicit mapping of non-linear trainingdata (as shown in Fig. 12) into a higher dimensional space.

CF ¼ min1

2WT:W þ C

Xni¼1

�i þ �ið Þ: (4)

Subject to:

yi �WTF xið Þ � b � "þ �i �i 0; i ¼ 1; 2; . . . ; nð Þ (5)

WTF xið Þ þ b� yi � "þ �i �i

0; i ¼ 1; 2; . . . ; nð Þ (6)

k xi; xj� � ¼ FðxiÞTFðxjÞ: (7)

SVR performs linear regression in this high-dimensionspace using e-insensitive loss and, at the same time, tries to

reduce model complexity by minimizing WT :W . This canbe described by introducing (non-negative) slack variables�i and �i ði ¼ 1 to nÞ, to measure the deviation of trainingsamples outside the e-insensitive band. Thus SVR is formu-lated with the goal of minimizing the cost function in Eq. (4)with the constraints shown in eqs. (5) and (6).

As on-chip temperature variation data is non-linear inthe original space, our SVR model employs a kernel basedregression which uses a Radial Basis Function (RBF) [25](Gaussian kernel) as shown in Eq. (8). The RBF kernelimproves the accuracy of SVR when data has non-linearityin the original space. We performed a sensitivity analysis(SA) to determine regularization parameter (C) and‘gamma’ (g) values of the kernel based SVR (see Section 7.1for chosen values). This SA overcomes the possibility ofover fitting of training data and improves accuracy further.

k xi; xj

� � ¼ exp �g xi � xj

�� 2� �: (8)

Training and Accuracy: We trained our SVR model usinga set of multi-threaded applications from the SPLASH-2[45] and PARSEC [46] benchmark suites, specifically: black-scholes (BS), bodytrack (BT), vips (VI), facesim (FS), fluida-nimate (FA), swaptions (SW), barnes (BA), fft (FFT), radix(RX), radiosity (RD) and raytrace (RT) with different threadcounts: 2, 4 and 8. We considered different combinations ofthread mappings on a 9-core (3� 3) floorplan, to train ourpredictor to determine the temperature of the center (target)core. As the future temperature of a target core is dependenton the average temperature of its immediate neighboringcores, we trained our SVR model with temperature inputsfrom the target core running a single thread, as well as itssurrounding cores running a variable number of threads.Simulations for each of these floorplans allowed us to obtaindata to train our SVR model. This data included tempera-ture for the target core and its neighboring core tempera-tures, as well as instructions per cycle (IPC) for the targetcore. IPC is very useful to determine if there is a phasechange in an application and plays a crucial role in main-taining future temperature prediction accuracy, especiallywhen temperatures of a target core and its neighbors aresimilar at a given time. Our training algorithm involved aniterative process that adjusts the weights and bias values inthe SVR shown in eqs. (4), (5) and (6) to fit the training set.

We verified the accuracy of our SVR model for multi-threaded benchmark workloads (we considered 6000 floor-plans,with 70 percent of input data for training and 30 percentfor testing) and found that it has an accuracy of over 95percent. Figs. 13a and 13b show actual and predicted on-chiptemperature variations for a 64-core platform executing32 threads of the FA and RD benchmarks. From these figuresit can be seen that our temperature predictor tracks tempera-ture quite accurately. When predicted temperature exceedsthe thermal threshold our thread migration mechanism(which is discussed next) migrates threads from hotter cores

TABLE 3List of VADTM Parameters and Their Definitions

Symbol Definition

IPCi Instructions per cycle of ith coreCTi Current temperature of ith coreTNi Average temperature of immediate neighboring cores

of ith core; if this core is on chip periphery and missingneighbors, then we consider virtual neighbor cores atambient temperature in lieu of the missing cores

PTi Predicted temperature of ith coreTt Thermal thresholdBTCs Boundary temperature coresNBTCs Non-boundary temperature coresC Regularization parameterW Weight vector for regressionxi and yi Input and outputs in training and test data�i Slack variablesE Error functionB Bias for cost function

Fig. 12. Non-linear support vector based regression prediction model.


to cooler cores to keep overall maximum temperature belowthe threshold.

7.3 Thermal Management FrameworkFig. 14 illustrates the entire VADTM technique. For eachcore, we periodically monitor the IPC value from perfor-mance counters and temperature of each of its MR banksfrom on-chip thermal sensors. A unique digital thermal sen-sor (DTS) is used to monitor temperature of every MR bank.Furthermore, the monitored temperature value of each MRbank is stored in a bank specific register. In addition, eachcore employs an 8-bit adder circuit and a bit shifter, alongwith a mechanism of accessing the current temperature val-ues of all MR banks of neighboring cores from their specificregisters, to determine the average temperature of all MRbanks of neighboring cores. We have considered the latencyand power overheads of these circuits in our results section(i.e., Section 8.2). If a thermal emergency is predicted for acore by the SVR predictor, then VADTM initiates a threadmigration procedure, otherwise no action is taken.

Algorithm 1 shows the pseudo-code for the VADTMthread migration procedure. First, the future temperatureðPTiÞ of the ith core is predicted using the SVR based predic-tor with inputs: core temperature ðCTiÞ, core IPC ðIPCiÞ, andtemperature of neighboring cores ðTNiÞ in steps 1-3. The listof available BTCs (i.e., those that are not currently executingany thread) and available NBTCs is obtained in steps 4-10. Insteps 11-12, a loop iterates over all cores and checks for possi-ble threadmigration conditions (i.e., thermal emergency caseswhere current core predicted temperature ðPTiÞ is greaterthan thermal threshold ðTtÞ). If a threadmigration is required,then in steps 13-21, we check for free BTCs, and if they areavailable then we migrate the thread from the current core tothe BTC with lowest temperature, else we migrate the threadto a freeNBTCwith lowest temperature. ThisVADTM threadmigration procedure is invoked at every epoch (1ms).

8 EXPERIMENTS

8.1 Experimental Setup

We target a 64-core manycore system to evaluate our LIBRA(TPMAþ VADTM) framework. Microarchitecural para-metrs of the many core system is presented in Table 1. Weevaluate LIBRA on two well-known PNoC architectures:Corona [3] and Flexishare [5]. Corona uses a 64� 64 multi-ple write single read (MWSR) crossbar with token slot arbi-tration. Flexishare uses 32 multiple write multiple read(MWMR) waveguide groups with 2-pass token stream

arbitration. Each MWSR waveguide in Corona and MWMRwaveguide in Flexishare is capable of transferring 512 bitsof data from a source node to a destination node. As eachcore in our considered manycore system has private L1 andL2 caches, electrical interconnects are used for communica-tions between the L1 and L2 caches at each core, and a pho-tonic network is used for communications between the L2caches of cores and memory controllers.

Algorithm 1. VADTM thread migration algorithm

Inputs: Current core temperature (CTi), average neighboringcore temperature (TNi), current core IPC (IPCi)1: for each core i do // Loop that predicts future temperature2: PTi ¼ SVR_predict_future_temperature (CTi, TNi, IPCi)3: end for4: for each core i do // Loop that checks for free BTCs and NBTCs5: if CTi in BTZ and IPCi ¼¼ 0 then6: List BTC ¼ Push i //add core to BTC list7: else if IPCi ¼¼ 0 then8: List NBTC ¼ Push i //add core to NBTC list9: end if10: end for11: for each core i do // Loop that performs thread migration12: if PTi Tt then13: if List BTC 6¼ fg then14: Migrated core ¼ Find min temperature coreðList BTC)

15: Do_thread_migration(core_i !Migrated_core)16: List BTC ¼ Pop i17: else if List NBTC 6¼ fg then18: Migrated core ¼ Find min temperature coreðList NBTC)

19: Do_thread_migration(core_i !Migrated_core)20: List NBTC ¼ Pop i21: end if22: end if23: end forOutput: Thread migration to BTC or NBTC cores

We modeled and simulated these architectures with theLIBRA framework for multi-threaded applications from theSPLASH-2 [45] and PARSEC [46] benchmark suites asexplained in Section 3. Simulations were performed forentire regions-of-interest (ROIs) where parallelized versionof these applications were executing on the 64-core system.Power and instruction traces for the benchmark applica-tions were generated using the Sniper 6.0 [44] simulator andMcPAT [47]. We used the 3D-ICE tool [49] for thermal

Fig. 13. Actual and predicted maximum temperature variation with exe-cution time for (a) fluidanimate (FA) and (b) radiosity (RD) benchmarkapplications executed on 64-core platform executing 32 threads.

Fig. 14. Overview of VADTM in LIBRA framework with support vectorregression (SVR) based temperature prediction model.


analysis. The ambient temperature was set to 303K and thethermal threshold (Tt) was set to 353K.

We model and consider area, power, and performanceoverheads for our framework in our analysis. LIBRA consid-ers the power overhead of the dithering signal based controlmechanism to be 385mW per MR [59]. LIBRA with bothCorona and Flexishare PNoCs has an electrical area overheadof 0.34 mm2 and a power overhead of 57 mWusing gate-levelanalysis and the CACTI 6.5 [56] tool for memory and compa-rators. The MR trimming power is set to 130 mW/nm [15] forcurrent injection (blue shift) and tuning power is set to 240mW/nm [9] for heating (red shift). From [58], tuning/trim-ming power values are linear with resonance shifts of up to3.5 nm for various sizes of MRs. Because of this, and as ourTPMA technique requires the use of localized trimming andthermal tuningmethods to compensate for resonance shifts ofless than one channel gap (<1.5 nm) only, we assume that thetuning/trimming power values are linear with resonanceshifts. To compute laser power, we considered detectorresponsivity as 0.8 A/W [53], MR through loss as 0.02 dB,waveguide propagation loss as 0.274 dB/cm, waveguidebending loss as 0.005 dB/900, andwaveguide coupler/splitterloss as 0.5 dB [28], [53]. We calculated photonic loss in compo-nents using these values, which sets the photonic laser powerbudget and correspondingly the electrical laser power. Forenergy consumption of photonic devices, we adapt parame-ters from [53], with 0.42 pJ/bit for every modulation anddetection event, and 0.18 pJ/bit for the driver circuits of MRmodulators and photodetectors. Based on our sensitivityanalysis we get the best accuracywith our SVR-based temper-ature predictor when parameters C and g are set to 1000 and0.1 respectively.

Thread migration model and overhead analysis: We have con-sidered the overheads of transferring architectural registervalues and L1-L2 cache misses incurred due to thread migra-tion. In our thread migration model, the register values (ten32-bitþ six 16-bitþ eight 80-bit registers ¼ 1056-bit register val-ues) of the source core are packed in three 512-bit packets andthen these packets are transferred to the migrated core on thePNoC. At the migrated core, the register values in these pack-ets are unpacked before being used to populate the local regis-ters. We assume that packing and unpacking of registervalues takes 16 cycles each. After this transition phase, duringwhich the register values are transferred between the cores,thread execution begins at themigrated core. The initial phaseof execution at the migrated core results in L1 and L2 cachemisses due to cold start effects. During this cold-start phase,the source core keeps its cache contents intact and the corre-sponding migrated core accesses the source core’s L2 cacheupon cold cache misses. Accessing the source core’s L2 cacheis possible through the PNoC using the source core’s micror-ings, as we do not turn off or lose control of the microrings ofthe inactive source core. In case of a cache miss at the sourcecore during the cold-start phase, the requested line is fetchedfrom the off-chip memory via a memory controller. Thus,upon an L2 cache miss at the migrated core during the cold-start phase, the requested cache line is served either by the L2cache of the source core or the off-chip memory. The over-heads of such cold cache misses at the migrated core dependon several factors, including the underlying PNoC architec-ture, cache microarchitecture, locations of the source andmigrated cores, locations ofmemory controllers, network traf-fic conditions (or program/application type), and program/application phase. The overheads of transferring the register

values from the source core to the migrated core depend onfactors such as the PNoC architecture, locations of cores,and network traffic conditions. Our trace-driven simulator,which we use to model the manycore system operation, cap-tures the effect of all these dependence factors in a cycle-accu-rate manner and accounts for the overheads of threadmigration.

Closed-loop thermal simulation framework: We use a closed-loop thermal simulation framework for implementing, eval-uating, and comparing our LIBRA framework with variousthermal management techniques from prior work. Ourclosed-loop thermal simulation framework utilizes coarse-grained core’s activity traces extracted from the intervalsimulation of the 64-core system based on the Sniper simu-lator [44]. These coarse-grained cores’ activity traces are fedinto McPAT [47] to generate power traces, which in turn arefed into a 3D-ICE [49] based model of the manycore systemwith the PNoC to generate thermal/temperature traces.In addition to these traces, our framework employs acycle-accurate network-centric simulator that models thebehavior and operation of our considered manycore systemwith the PNoC of interest and evaluates the overhead ofthread migration events. These core’s activity traces andrelated temperature traces engage with our SVR-based ther-mal predictor and thread migration scheduler in a closedloop. If the threads are migrated due to thermal emergen-cies, our scheduler updates the activity traces based on therevised threads-to-cores mapping and incurred overheadsthat are evaluated using our cycle-accurate simulator. Then,based on the revised activity traces, new power and temper-ature traces are generated. These traces are used to evaluatetotal power dissipation, energy consumption, and total per-formance overhead in terms of execution time.

As presented in Section 5, to minimize trimming and tun-ing power consumption for a fixed channel gap of 1.48 nm,trimming temperature range ðDtrÞ and tuning temperaturerange ðDtuÞ for TPMA are calculated as 8.73K and 4.72K,respectively.

8.2 Comparison ResultsWe compared the performance of our LIBRA frameworkwith four prior works on manycore thermal management: aring aware policy (RATM) [16], frequency align policy(FATM) [17], a predictive dynamic thermal management(PDTM) framework [18], and the SPECTRA framework fromour prior work [10]. RATM distributes threads uniformlyacross cores that are closer to PNoC nodes first and then dis-tributes the remaining threads in a regular pattern fromouter cores to inner cores. FATM distributes threads acrosscores based on the process variation profile of ring blocksthat are in the proximity of these cores. PDTM uses a leastsquare based regression temperature predictor to determineif the predicted temperature of a core exceeds a thermalthreshold, and if so then thread migration is performed fromthat core to the coolest core that is not executing any threads.SPECTRA performs ring assignment at the device-leveland SVR prediction based proactive thread migration at thesystem-level for thermal reliability management in PNOCs.

Figs. 15a-15b show the maximum temperature obtainedwith the five frameworks across eleven applications from thePARSEC and SPLASH-2 benchmarks suites with 48 and 32thread counts executing on a 64-core system with the CoronaPNoC [3] architecture. As LIBRA and FATM perform thread


management based on the PV profile of MRs, only theseframeworks have confidence intervals in Fig. 15. FromFig. 15a it can be observed that some applications (e.g., FA,SW) with 48 threads exceed the threshold (353K) for all frame-works, as there are insufficient number of free cores on thechip whose temperature is below the thermal threshold tomigrate threads. However, with a more manageable numberof threads, the situation improves. In Fig. 15b, for the casewith 32 threads, our LIBRA framework avoids violating ther-mal thresholds for very small number of benchmark applica-tions with 32 threads. On average, LIBRA has 14.6K and 17.5Klower maximum temperature compared to the RATM policyfor 48 and 32 threads, respectively. In addition, on averageLIBRA has 13.5K and 15.9K lower maximum temperaturecompared to the FATM policy for 48 and 32 threads, respec-tively. LIBRA migrates threads from hotter cores to coolercores to control maximum temperature, whereas no threadmigration is performed in both RATM and FATM when theon-chip thermal threshold temperature (i.e., 353K) is reached,as these mechanisms are simple thread allocation policieswithout control on peak temperature. For most of the bench-marks, maximum temperatures with PDTM, SPECTRA, andLIBRA are below the thermal threshold. However, on averageLIBRA has 3.2K and 3.5K lower maximum temperature com-pared to PDTM for 48 and 32 threads, respectively. This isbecause LIBRA employs a more accurate (less false-positives)SVR based prediction approach, which reduces the numberof mispredictions compared to the low accuracy (more false-positives) of the least square regressionmechanism in PDTM.Lastly, LIBRA has a 0.8K and 1.9K lower maximum tempera-ture compared to SPECTRA for 48 and 32 threads, respec-tively. Even though both LIBRA and SPECTRA prefer tomigrate threads to BTCs, the maximum temperatures withLIBRA are sometimes lower compared to SPECTRA, as LIBRA

is able to perform thread migrations more often to lower tem-perature BTCs compared to SPECTRA.

In the interest of brevity, we do not show maximum tem-perature results for the Flexishare PNoC architecture. Weobserved a similar trend in maximum temperature varia-tions for Flexishare as we did for Corona (Fig. 15).

Fig. 16 shows the power dissipation comparison for thefive frameworks across multiple 48-threaded applicationsfor the Corona [3] and Flexishare [5] PNoC architectures,respectively. One of the main reasons why LIBRA has lowerpower dissipation than RATM, FATM, and PDTM is that itmore aggressively reduces trimming and tuning power inboth Corona and Flexishare PNoCs. From Fig. 16a, LIBRAhas 74.5, 67.4, and 70.8 percent lower trimming and tuningpower on average compared to RATM, FATM, and PDTMfor Corona. Furthermore, from Fig. 16a, LIBRA also has76.2, 68.3, and 72.5 percent lower trimming and tuningpower on average compared to RATM, FATM, and PDTMfor Flexishare. The TPMA technique in LIBRA intelligentlyconserves trimming and tuning power compared to RATM,FATM, and PDTM by performing process variation awareMR reassignment, while our VADTM further improvestrimming and tuning power savings with its intelligentthread migration to BTCs. Lastly, the TPMA mechanism inLIBRA adapts intelligently to the PV profiles of MRs, reduc-ing it’s trimming and tuning power dissipation by 46.3 and48.1 percent, compared to SPECTRA for the Corona andFlexishare architectures, respectively.

Fig. 16 also shows the laser power comparison of the fiveframeworks for the Corona and Flexishare architectures. Itcan be observed that Corona and Flexishare with LIBRAneed similar laser power as Corona and Flexishare

Fig. 15. Maximum temperature comparison for LIBRA with RATM [16],FATM [17], PDTM [18], and SPECTRA [10], for (a) 48 thread, and (b) 32thread PARSEC and SPLASH-2 benchmarks executing on 64-coremanycore system with Corona PNoC. Bars show mean values of maxi-mum temperature across 100 PV maps; confidence intervals show vari-ation in maximum temperature.

Fig. 16. Normalized power dissipation (Laser Power, Dithering Power,Trimming/Tuning power, and Modulating and Detecting (Tx/Rx) Power)comparison for LIBRA with RATM [16], FATM [17], PDTM [18], andSPECTRA [10] for 48 threaded applications of PARSEC and SPLASH-2suites executed on (a) Corona and (b) Flexishare PNoC architecturesfor a 64-core manycore system. Results shown are normalized w.r.tRATM, therefore, RATM does not have confidence intervals. Bars showmean values of power dissipation across 100 PV maps; confidence inter-vals show variation in power dissipation.


architectures with RATM, FATM, and PDTM. Furthermore,LIBRA requires 12.9 and 6.4 percent lesser laser power com-pared to SPECTRA for Corona and Flexishare. The extraMRs used in SPECTRA to compensate for TV-induced reso-nance shifts contribute to the increase in laser power com-pared to LIBRA for both architectures. From these results itcan also be observed that the laser power saving in Coronais higher than for the better performance optimized archi-tecture of Flexishare.

In summary, LIBRA saves considerable trimming/tuningpower to ultimately achieve overall power reduction. Fromthe power analysis in Fig. 16a, LIBRA with Corona has 40.8,34.1, 37.2, and 21.4 percent lower total power dissipationcompared to Corona with RATM, FATM, PDTM, and SPEC-TRA, respectively. Further from Fig. 16b it can be seen thatFlexishare with LIBRA has 61.3, 52.9, 57.4, and 32.8 percentlower power dissipation compared to Flexishare withRATM, FATM, PDTM, and SPECTRA, respectively.

Fig. 17 shows the average execution time comparisonbetween the five frameworks across the 11 48-threaded appli-cations from PARSEC and SPLASH-2 suites, for the Coronaand Flexishare PNoC architectures, respectively. As onlyLIBRA performs thread migration based on the PV profile ofMRs, therefore, this framework has confidence intervals onexecution time shown in Fig. 17. From Fig. 17a it can be seenthat Coronawith LIBRA has 8.3 percent higher execution timecompared to Corona with RATM and FATM. Corona withLIBRA needs extra execution time tomigrate threads betweencores and to reorder bits using shift registers whereas theRATM and FATM policies simply schedule threads withoutany threadmigration and bit reorder, and thus do not possesssuch overheads. Further, Corona with LIBRA has 3.3 percenthigher execution time compared to PDTM. Despite LIBRAusing a faster SVR based temperature predictor compared toa more complex least square based regression predictor inPDTM, the higher number of thread migrations (to adapt toPV profiles of MRs) and bit reordering operations in LIBRAcontribute to an increase in execution time. Similarly, fromFig. 17b Flexishare architecture with LIBRA has 6.3 percenthigher execution time compared to Flexishare with RATM

and FATM. In addition, LIBRA also has 3.1 percent higherexecution time compared to Flexishare with PDTM. The fig-ures also indicate that the execution time overheads for LIBRAare lower when utilizing the faster Flexishare architecturecompared to the slower Corona architecture. Moreover, thebit-shifting overhead in LIBRA increases its execution time by3.9 and 3 percent compared to the SPECTRA framework withCorona and Flexishare PNoCs, respectively. From the execu-tion time results, it can be summarized that the significantpower benefits achieved with LIBRA come at some cost: anincrease in execution time.

Lastly, from the power dissipation and execution timeresults, we obtain energy consumption results for the fiveframeworks, as shown in Fig. 18. On average, for Corona,energy consumption with LIBRA is 36.8, 28.8, 35.3, and18.6 percent lower compared to RATM, FATM, PDTM andSPECTRA, respectively. For the Flexishare architecture,LIBRA has 58.7, 50.3, 55.8, and 30.8 percent lower energy con-sumption compared to RATM, FATM, PDTM, and SPECTRArespectively.

In summary, from the above results, it is apparent thatour proposed PV-aware LIBRA framework outperforms pre-viously proposed approaches for thermal management inmanycore systemswith PNoCs by combining a novel reactivedevice-level technique (TPMA) that improves waveguidechannel utilization with a novel system-level proactive threadmigration technique (VADTM). The excellent power andenergy savings compared to previous approaches stronglymotivate the use of our LIBRA framework in future PNoCbasedmanycore architectures.

9 CONCLUSION

In this paper, we have presented the LIBRA framework thatcombines two novel dynamic thermal management mecha-nisms for the reduction of maximum on-chip temperatureand conservation of trimming and tuning power of MRs inDWDM-based PNoC architectures. These techniques (TPMA

Fig. 17. Normalized average execution time comparison of LIBRA withRATM [16], FATM [17], PDTM [18], and SPECTRA [10] for (a) Coronaand (b) Flexishare PNoCs for 48 threaded applications from PARSECand SPLASH-2 suites executed on 64-core system. Results shown arenormalized wrt RATM. Bars show mean values of execution time across100 PV maps; confidence intervals show variation in execution time.

Fig. 18. Normalized energy consumption comparison of LIBRA withRATM [16], FATM [17], PDTM [18], and SPECTRA [10] for (a) Coronaand (b) Flexishare PNoCs for 48 threaded applications from PARSECand SPLASH-2 suites executed on a 64-core system. Results shownare normalized wrt RATM, therefore, RATM does not have confidenceintervals. Bars show mean values of energy consumption across 100 PVmaps; confidence intervals show variation in energy consumption.


at the device-level, VADTM at the system-level) constitute ahybrid reactive-proactivemanagement framework that dem-onstrated interesting trade-offs between performance andpower/energy across two different state-of-the-art crossbar-based PNoC architectures. Our experimental analysis onthe well-known Corona and Flexishare PNoC architectureshas shown that LIBRA can notably conserve total powerby up to 61.3 percent (trimming and tuning power by up to76.2 percent) and total energy by up to 57.3 percent.

ACKNOWLEDGMENTS

This research is supported in part by grants from SRC, NSF(CCF-1252500), and AFOSR (FA9550-13-1-0110).

REFERENCES

[1] Y. A. Vlasov, “Silicon CMOS-integrated nano-photonics forcomputer and data communications beyond 100G,” IEEE Comm.Mag., vol. 50, no. 2, pp. 67–72, Feb. 2012.

[2] J. D. Owens, et al., “Research challenges for on-chip interconnectionnetworks,” IEEEMicro, vol. 27, no. 5, pp. 96–108, Sep./Oct. 2007.

[3] D. Vantrease, et al., “Light speed arbitration and flow controlfor nanophotonic interconnects,” in Proc. 42nd Annu. IEEE/ACMInt. Symp. Microarchitecture, 2009, pp. 304–315.

[4] S. V. R. Chittamuru, D. Dang, S. Pasricha, and R. Mahapatra,“BiGNoC: Accelerating big data computing with application-specific photonic network-on-chip architectures,” IEEE Trans.Parallel Distrib. Syst., Early Access, p. 1, 2018.

[5] Y. Pan, J. Kim, and G. Memik, “Flexishare: Channel sharing for anenergy efficient nanophotonic crossbar,” in Proc. 16th Int. Symp.High-Perform. Comput. Archit., 2010, pp. 1–12.

[6] S. V. R. Chittamuru, S. Desai, and S. Pasricha, “A reconfigurablesilicon-photonic network with improved channel sharing for mul-ticore architectures,” in Proc. 25th Edition Great Lakes Symp. VLSI,May 2015, pp. 63–68.

[7] I. Thakkar, S. V. R. Chittamuru, and S. Pasricha, “Improving thereliability and energy-efficiency of high-bandwidth photonic NoCarchitectures with multilevel signaling,” in Proc. 11th IEEE/ACMInt. Symp. Netw.-on-Chip, Oct. 2017, pp. 1–8.

[8] S. Bahirat and S. Pasricha, “OPAL: A multi-layer hybrid photonicNoC for 3D ICs,” in Proc. 16th Asia South Pacific Des. Autom. Conf.,Jan. 2011, pp. 345–350.

[9] C. Sun, et al., “Single-chip microprocessor that communicatesdirectly using light,” Nature, vol. 528, pp. 24–31, 2015.

[10] S. V. R. Chittamuru and S. Pasricha, “SPECTRA: A framework forthermal reliability management in silicon-photonic networks-on-chip,” in Proc. 29th Int. Con. VLSI Des./15th Int. Conf. EmbeddedSyst., 2016, pp. 86–91.

[11] S. K. Selvaraja, “Wafer-scale fabrication technology for siliconphotonic integrated circuits,” PhD thesis, Ghent University, 2011.

[12] C. Batten, et al., “Building manycore processor-to-DRAM net-works with monolithic CMOS silicon photonics,” IEEE Micro,vol. 29, no. 4, pp. 8–21, 2009.

[13] A. Kumar, et al., “HybDTM: A coordinated hardware-softwareapproach for dynamic thermal management,” in Proc 43rd Annu.Des. Autom. Conf., 2006, pp. 548–553.

[14] Z. Li, et al., “Reliability modeling and management of nanopho-tonic on-chip networks,” IEEE Trans. Very Large Scale IntegrationSyst., vol. 20, no. 1, pp. 98–111, 2012.

[15] C. Li, et al., “Silicon photonic transceiver circuits with microringresonator bias-based wavelength stabilization in 65 nm CMOS,”IEEE J. Solid-State Circuits, vol. 49, no. 6, pp. 1419–1436, Jun. 2014.

[16] T. Zhang, et al., “Thermal management of manycore systems withsilicon-photonic networks,” in Proc. Des. Autom. Test Europe Conf.Exhib., 2014, pp. 1–6.

[17] J. L.Abellan, et al., “Adaptive tuningof photonic devices in a photonicNoC through dynamic workload allocation,” IEEE Trans. Comput.-AidedDes. IntegratedCircuits Syst., vol. 36, no. 5, pp. 801–814,May2017.

[18] I. Yeo, C. Liu, and E. Kim, “Predictive dynamic thermal manage-ment for multicore systems,” in Proc. 45th ACM/IEEE Des. Autom.Conf., 2008, pp. 734–739.

[19] L. Zhou and A. K. Kodi, “PROBE: Prediction-based optical band-width scaling for energy-efficient NoCs,” in Proc. 17th IEEE/ACMInt. Symp. Netw.-on-Chip, 2013, pp. 1–8.

[20] A. K. Coskun, et al., “Static and dynamic temperature-awarescheduling for multiprocessor SoCs,” IEEE Trans. Very Large ScaleIntegration Syst., vol. 16, no. 9, Sep. 2008.

[21] H. Hanson, et al., “Thermal response to DVFS: Analysis with anIntel Pentium M,” in Proc. Int. Symp. Low Power Electron. Des.,2007, pp. 219–224.

[22] E. Ipek, et al., “Core fusion: Accommodating software diversity inchip multiprocessors,” in Proc. 34th Annu. Int. Symp. Comput.Archit., 2007, pp, 186–197.

[23] S. V. R. Chittamuru, I. Thakkar, and S. Pasricha, “Process variationaware crosstalk mitigation for DWDM based photonic NoCarchitectures,” in Proc. 17th Int. Symp. Quality Elect. Des., Mar. 2016,pp. 57–62.

[24] I. Thakkar, S. V. R. Chittamuru, and S. Pasricha, “Mitigation ofhomodyne crosstalk noise in silicon photonic NoC architectureswith tunable decoupling,” in Proc. Int. Conf. Hardware/Softw. Code-sign Syst. Synthesis, Oct. 2016, pp. 1–10.

[25] S. V. R. Chittamuru and S. Pasricha, “Improving crosstalk resil-ience with wavelength spacing in photonic crossbar-based net-work-on-chip architectures,” in Proc. IEEE 58th Int. Midwest Symp.Circuits Syst., Aug. 2015, pp. 1–4.

[26] S. V. R. Chittamuru, I. Thakkar, and S. Pasricha, “HYDRA:Heterodyne crosstalk mitigation with double microring reso-nators and data encoding for photonic NoCs,” IEEE Trans.Very Large Scale Integration Syst., vo1. 26, no. 1, pp. 168–181,Jan. 2018.

[27] M. M. Sabry, A. Sridhar, and D. Atienza, “Thermal balancing ofliquid-cooled 3D-MPSoCs using channel modulation,” in Proc.Des. Autom. Test Europe Conf. Exhib., 2012, pp. 599–604.

[28] S. V. R. Chittamuru, I. Thakkar, and S. Pasricha, “Analyzing volt-agebias and temperature induced aging effects in photonic in-terconnects for manycore computing,” in Proc. ACM/IEEE Int.Workshop Syst. Level Interconnect Prediction, Jun. 2017.

[29] B. Raghunathan, et al., “Cherry-picking: Exploiting process varia-tions in dark-silicon homogeneous chip multi-processors,” inProc. Des. Autom. Test Europe Conf. Exhib., 2013, pp. 39–44.

[30] N. Kapadia and S. Pasricha, “VARSHA: Variation and reliability-aware application scheduling with adaptive parallelism in thedark-silicon era,” in Proc. Des. Autom. Test Europe Conf. Exhib., 2015,pp. 1060–1065.

[31] B. Guha, et al., “Cmos-compatible athermal silicon microring reso-nators,” in Opt. Express, vol. 18, no. 4, pp. 3487–3493, 2010.

[32] S. V. R. Chittamuru, et al., “SWIFTNoC: A reconfigurable silicon-photonic network with multicast enabled channel sharing formulticore architectures,” ACM J. Emerging Technol. Comput. Syst.,vol. 13, no. 4, 2017, Art. no. 58.

[33] C. T. DeRose, et al., “Silicon microring modulator with integratedheater and temperature sensor for thermal control,” CLEO/QELS:Laser Sci. Photonic Appl., 2010, pp. 1–2.

[34] M. Georgas, et al., “Addressing link-level design tradeoffs forintegrated photonic interconnects,” in Proc. IEEE Custom IntegratedCircuits Conf., 2011, pp. 1–8.

[35] M. V. Beigi and G. Memik, “Therma: Thermal-aware run-timethread migration for nanophotonic interconnects,” in Proc. Int.Symp. Low Power Electron. Des., 2016, pp. 230–235.

[36] M. Mohamed, et al., “Reliability-aware design flow for siliconphotonics on-chip interconnect,” IEEE Trans. Very Large Scale Inte-gration Syst., vol. 22, no. 8, pp. 1763–1776, Aug. 2014.

[37] Y. Xu, J. Yang, and R. Melhem, “Tolerating process variations innanophotonic on-chip networks,” in Proc. 39th Annu. Int. Symp.Comput. Archit., 2012, pp. 142–152.

[38] M. Mohamed, et al., “Power-efficient variation-aware photonicon-chip network management”, in Proc. ACM/IEEE Int. Symp.Low-Power Electron. Des., 2010, pp. 31–36.

[39] R. Wu, et al., “Variation-aware adaptive tuning for nanophotonicinterconnects,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des.,2015, pp. 487–493.

[40] K. Padmaraju and K. Bergman, “Resolving the thermal challengesfor silicon microring resonator devices,” Nanophotonics, vol. 3,no. 4-5, pp. 269–281, 2013.

[41] M. A. Popovic, “Theory and design of high-index-contrastmicrophotonic circuits,” Ph.D. dissertation, MIT, Cambridge,2008.

[42] W. Bogaerts, et al., “Silicon microring resonators,” Laser Photon.Rev., vol. 6, no. 1, pp. 47–73, 2012.

[43] P. P. Absil, et al., “Silicon photonics integrated circuits: amanufacturing platform for high density, low power optical i/os,”Opt. Express, vol. 23, no. 7, pp. 9369–9378, 2015.


[44] T. E. Carlson, et al., “Sniper: Exploring the level of abstraction forscalable and accurate parallel multi-core simulations,” in Proc. Int.Conf. High Perform. Comput. Netw. Storage Anal., 2011, pp. 1–12.

[45] S. C. Woo, et al., “The SPLASH-2 programs: Characterization andmethodological considerations,” in Proc. 22nd Annu. Int. Symp.Comput. Archit., 1995, pp. 24–36.

[46] C. Bienia, et al., “The PARSEC benchmark suite: Characterizationand architectural implications,” in Proc. Int. Conf. Parallel Archit.Compilation Techn., 2008, pp. 72–81.

[47] S. Li, et al., “McPAT: An integrated power, area, and timingmodeling framework for multicore and manycore architectures,”in Proc. 42nd Annu. IEEE/ACM Int. Symp. Microarchitecture, 2009,pp. 469–480.

[48] J. Howard, et al., “A 48-core IA-32 processor in 45 nm CMOS usingon-die message-passing and DVFS for performance and powerscaling,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 173–183,Jan. 2011.

[49] A. Sridhar, et al., “3D-ICE: Fast compact transient thermal model-ing for 3d ICs with inter-tier liquid cooling,” in Proc. IEEE/ACMInt. Conf. Comput.-Aided Des., 2010, pp. 463–470.

[50] Crystal Quartz. (2016). [Online]. Available: http://www.mt-berlin.com/frames_cryst/crystals_frameset1.htm

[51] S. R. Sarangi, et al., “Varius: A model of process variation andresulting timing errors for microarchitects,” IEEE Trans. Semicond.Manuf., vol. 21, no. 1, Feb. 2008, pp. 3–13.

[52] D. Harris, et al., “Support vector regression machines,” in Proc. 9thInt. Conf. Neural Inf. Process. Syst., 1996, pp. 155–161.

[53] P. Grani and S. Bartolini, “Design options for optical ring intercon-nect in future client devices,” ACM J. Emerging Technol. in Comput.Syst., vol. 10, no. 4, 2014, Art. no. 30.

[54] Lumerical Solutions. (2017). [Online]. Available: http://www.lumerical.com/tcad-products/mode/

[55] S. V. R. Chittamuru, I. Thakkar, S. Pasricha, “PICO: Mitigatingheterodyne crosstalk due to process variations and intermodula-tion effects in photonic NoCs,” in Proc. 53nd ACM/EDAC/IEEEDes. Autom. Conf., 2016, pp. 1–6.

[56] CACTI 6.5. (2017). [Online]. Available: http://www.hpl.hp.com/research/cacti/

[57] D. Dang, S. V. R. Chittamuru, R. Mahapatra, and S. Pasricha,“Islands of heaters: A novel thermal management framework forphotonic NoCs,” in Proc. 22nd Asia South Pacific Des. Autom. Conf.,2017, pp. 306–311.

[58] J. E. Cunningham, et al., “Highly-efficient thermally-tuned resonantoptical filters,”Opt. Express, vol. 18, no. 18, pp. 19055–19063, 2010.

[59] K. Padmaraju, et al., “Wavelength locking and thermally stabiliz-ing microring resonators using dithering signals,” in J. Lightw.Technol., vol. 2, no. 3, pp. 505–512, Feb. 2014.

Sai Vineel Reddy Chittamuru (S’14) received theBTech degree in electrical engineering from theIndian Institute of Technology Kharagpur (IIT-KGP), in 2011, and the PhD degree in electricaland computer engineering, from Colorado StateUniversity, Fort Collins, CO, USA, in 2018. His re-search interests include memory system design,embedded system, systems-on-chip, and opticalnetworks-on-chip. He is the recipient of the bestpaper awards from IEEE/ACM SLIP 2016 andACM GLSVLSI 2015 conferences, and a best

paper nomination from the IEEE ISQED 2016 conference for his researchcontributions. He is a studentmember of the IEEE.

Ishan Thakkar (S’14) received the MS degree inelectrical engineering from Colorado State Univer-sity, Fort Collins, CO, USA. He is currently workingtoward the thePhD degree in electrical engineeringfrom Colorado State University, Fort Collins, CO,USA. His research interests include DRAM archi-tectures, non-volatilememories, high-speedopticalinterfaces, and optical networks-on-chip. He is therecipient of the best paper award from IEEE/ACMSLIP 2016 conference and a best paper nomina-tion from the IEEE ISQED 2016 conference for hisresearch contributions. He is a student member ofthe IEEE.

Sudeep Pasricha (M’02-SM’13) received the PhDdegree in computer science from the University ofCalifornia, Irvine, in 2008. He is currently a Monfortand Rockwell-Anderson Professor of Electrical andComputer Engineering at Colorado State Univer-sity. His research interests include energy effi-ciency and fault tolerance for multicore computing.He is on the editorial board of various journals suchas the IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems, IEEETransactions on Multi-Scale Computing Systems,

TECS, etc. He currently is or has been on the organizing committee andTPC of various conferences such as DAC, DATE, NOCS, ESWEEK,VLSID, etc. He has received Best Paper Awards at SLIP’16, GLSVLSI’15,AICCSA’11, ISQED’10, andASPDAC’06.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.


http://www.mt-berlin.com/frames_cryst/crystals_frameset1.htm

http://www.mt-berlin.com/frames_cryst/crystals_frameset1.htm

http://www.lumerical.com/tcad-products/mode/

http://www.lumerical.com/tcad-products/mode/

http://www.hpl.hp.com/research/cacti/

http://www.hpl.hp.com/research/cacti/

libra: thermal and process variation aware reliability...

Documents