
IEEE TRANSACTIONS ON MAGNETICS, VOL. 51, NO. 5, MAY 2015 3400507

Design of a Fast and Low-Power Sense Amplifier and Writing Circuit for High-Speed MRAM

Hochul Lee, Juan G. Alzate, Richard Dorrance, Xue Qing Cai, Dejan Marković, Pedram Khalili Amiri, and Kang L. Wang, Fellow, IEEE

Department of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095 USA

A high-speed and low-power preread and write sense amplifier (PWSA) is presented for magnetoresistive RAM (MRAM). The sense amplifier incorporates a writing circuit for MRAM bits switched via timing of precessional dynamics (∼GHz speed) in a magnetic tunnel junction (MTJ). By combining read and write functions in a single power-efficient circuit, the PWSA allows for fast read and write operations while minimizing the bit error rate after data programming. The PWSA circuit is designed based on a 65 nm CMOS technology, and the magnetic dynamics are captured by a Verilog-A compact model based on macrospin behavior for MTJs. Using the preread and comparison steps in the data program operation, we are able to reduce write power consumption by up to 50% under random data input conditions. Furthermore, using the voltage-controlled magnetic anisotropy effect for precessional switching, more than 10× reduction of write power and transistor size both in the memory cell and the write circuit is achieved, compared with using the spin transfer torque effect. The circuit achieves 2 ns read time, 1.8 ns write time, and 8 ns total data program operation time (consisting of two read steps, one write step, and a pass/fail check step) using this PWSA concept, and a 2× larger sensing margin through the current feedback circuit.

Index Terms— Magnetoresistive RAM (MRAM), nonvolatile memory, precessional switching of magnetic tunnel junction (MTJ) devices, preread and write sense amplifier (PWSA).

I. INTRODUCTION

Magnetoresistive RAM (MRAM) is a promising emerging memory technology that can provide nonvolatility and low write energy with fast read and write speeds, long retention times (>10 years), and endurances greater than 10^16 program cycles [1]–[6]. Magnetic tunnel junctions (MTJs) have become basic building blocks of MRAM, where relatively high tunneling magnetoresistance (TMR) ratios achieve two distinguishable resistive states, i.e., parallel and antiparallel states.

Recently, there has been increasing interest in ultrafast precessional (i.e., resonant) switching of MTJs, using both current [via the spin transfer torque (STT) effect] [5], [7] and voltage [via the voltage-controlled magnetic anisotropy (VCMA) effect] pulses [8]–[12]. In STT devices, precessional switching is achieved by incorporating an orthogonal combination of free and fixed layers into the device, where the large spin torque from the perpendicular fixed layer sets the free layer magnetization into a precessional motion, resulting in resonant switching [5], [7]. Alternatively, in the case of voltage-controlled MTJ devices, the VCMA effect originates from the fact that the interface of oxides with metallic ferromagnets (e.g., CoFeB|MgO) shows a large perpendicular magnetic anisotropy (PMA), which is sensitive to voltages applied across the dielectric layer. This effect is caused by the electric-field-induced modulation of the relative occupancy of d orbitals at the interface [11]. Since the PMA is modulated by the applied voltage, a torque is exerted on the free layer magnetization, setting it into a precessional motion and thereby causing switching [12].

Manuscript received August 23, 2014; revised October 16, 2014; accepted October 30, 2014. Date of publication November 5, 2014; date of current version May 22, 2015. Corresponding author: H. Lee (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMAG.2014.2367130

Precessional switching offers the advantages of very high speed (down to ∼100 ps) and low switching energy (down to ∼1 fJ/bit using the VCMA effect, ∼100 fJ/bit using the STT effect). However, it also presents a number of new challenges. The first one is difficulty in determining the switching direction. In principle, the state of the magnetic bit is always reversed during resonant switching, irrespective of its initial state. Despite this issue, precessional VCMA switching only requires one pulse shape (amplitude and length) to write both the parallel and antiparallel data states. This greatly simplifies the pulse generation circuitry and provides more symmetric writes (which is better for device reliability/endurance).

Due to its high density, a one-transistor, one-MTJ (1T-1MTJ) cell is the most widely used bit cell for MRAM. However, the available sensing margin is small due to the low effective TMR of the 1T-1MTJ memory architecture, i.e., the series bit line (BL) resistance decreases the resistance ratio of MTJs seen by the sensing circuitry. Furthermore, since a bias across the MTJ reduces its resistance, especially the resistance in the antiparallel state, the TMR is diminished compared with the case of zero bias [13]. This reduction further lowers the sensing margin, increasing the possibility of an erroneous read.

The main contributions of this paper are: 1) enabling reliable precessional programming for high-speed operation by reducing the bit error rate while tolerating a large write error rate (WER) (up to 1%) of the precessional switching process; 2) reducing write power by using a preread process to eliminate redundant writes; and 3) increasing the sensing margin for reliable read operation using a current feedback circuit.

II. COMPACT MODEL OF MAGNETIC TUNNEL JUNCTION

An MTJ is composed of two ferromagnetic layers separated by a tunneling oxide, where the magnetic moment of one layer is fixed and the other can change freely based on electrical and magnetic bias conditions. The magnetization of the MTJ's free layer has two energetically stable states. When the magnetic moments of the free and fixed layers are aligned in the same direction, the parallel state (denoted as P), the MTJ device has a low resistance (denoted as RP). In the antiparallel state (denoted as AP), the free layer magnetization is in the opposite direction to the fixed layer, resulting in a high MTJ resistance (denoted as RAP). In this paper, we mainly consider MTJs with out-of-plane (perpendicular) magnetization, which are more scalable than in-plane devices for advanced technology nodes while retaining thermal stability [14]–[16].

0018-9464 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Depending on the required speed of the reversal, switching of MTJs can be performed via precessional or thermally activated switching [17]. Precessional (also referred to as resonant) switching occurs when a perpendicularly magnetized free layer of an MTJ is set into a precessional motion around an in-plane magnetic field (such as a field created by shape anisotropy, a bias field, or an effective magnetic field otherwise built into the device) by either STT or VCMA effects. This switching scheme is typically much faster than the thermally activated process.

To model the magnetic dynamics and verify the performance of the proposed PWSA in Spectre circuit simulations, we use a compact model that allows us to capture precessional switching of MTJs, while including both STT and VCMA effects. During precessional switching, the magnetic moment of the free layer changes its state to the opposite state within nanosecond or subnanosecond time scales. All the switching dynamics are described by the Landau–Lifshitz–Gilbert equation in the macrospin approximation, while accounting for bias dependence of resistance as well as thermal noise effects [18].
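For orientation, the textbook Gilbert form of the Landau–Lifshitz–Gilbert equation for the unit magnetization vector of the free layer is reproduced below. This is only the generic form: the symbols (gyromagnetic ratio γ, Gilbert damping α, effective field H_eff, which would include the voltage-dependent anisotropy field) follow common convention and are assumptions here, and the STT and thermal-noise terms of the compact model in [18] are not shown.

```latex
\frac{d\mathbf{m}}{dt}
  = -\gamma\,\mathbf{m}\times\mathbf{H}_{\mathrm{eff}}
    + \alpha\,\mathbf{m}\times\frac{d\mathbf{m}}{dt}
```

The first term produces the precession around H_eff that the write pulse exploits, and the second term damps that precession toward a stable state.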

Compared with STT-based precessional switching, VCMA-driven switching has two main advantages: 1) more than 10× reduction of switching energy due to lower currents and 2) up to a 10× decrease in the size of the access transistors of the memory cell and those of the write circuit.

Fig. 1 shows simulation results of the compact model for VCMA-induced precessional switching of MTJ devices. A 1.2 ns write pulse is able to switch an MTJ state from P to AP or from AP to P, demonstrating the resonant but nondeterministic characteristics of precessional switching, where the state of the bit is always reversed regardless of its initial state for a given pulse duration (1.2 ns in this case). A 2.4 ns write pulse causes one round trip precession, hence the MTJ remains in the same state after the applied pulse is removed. The probability of precessional switching is a function of the applied pulse width, as shown in Fig. 2, and the switching probability converges toward 50% in the limit of long pulses, where the switching direction is determined by thermal fluctuations. The oscillatory behavior of the switching probability in principle imposes a challenge to obtain low error rates for the data program operation, especially when variations are considered. The next section seeks to address this issue.

Fig. 1. Top: simulation results using the compact model for precessional switching of MTJ devices, driven by the VCMA effect. A 1.2 ns write pulse is able to switch the MTJ state from P to AP or AP to P, using the same amplitude. On the other hand, a pulse width of 2.4 ns causes complete precession, and the MTJ remains in the same state. The small write currents (<15 μA) allow for a low-power write. Bottom: schematic of the precessional switching process, where the free layer precesses around the effective in-plane magnetic field Heff after a voltage VMTJ is applied.

Fig. 2. Simulated switching probability of VCMA-induced precessional switching as a function of write pulse width. The probability peaks whenever the magnetization has completed half a period of precession, and therefore, proper timing of the writing pulse is required to achieve magnetization reversal.
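The qualitative shape described for Fig. 2 (peaks at odd multiples of the half period, no net switching after a full period, and convergence toward 50% for long pulses) can be illustrated with a toy damped-cosine model. This is a sketch only, not the Verilog-A compact model: the functional form and the decay constant `DECAY_NS` are hypothetical choices, and only the 2.4 ns precession period is taken from the text.

```python
import math

PRECESSION_PERIOD_NS = 2.4  # from the text: a 2.4 ns pulse gives one full round trip
DECAY_NS = 20.0             # hypothetical dephasing constant, not from the paper


def switching_probability(pulse_ns, period_ns=PRECESSION_PERIOD_NS,
                          decay_ns=DECAY_NS):
    """Toy damped-cosine model of switching probability vs. pulse width."""
    envelope = math.exp(-pulse_ns / decay_ns)  # coherence lost for long pulses
    phase = 2.0 * math.pi * pulse_ns / period_ns
    return 0.5 * (1.0 - envelope * math.cos(phase))


for width in (0.0, 1.2, 2.4, 3.6, 100.0):
    print(f"{width:6.1f} ns -> P(switch) = {switching_probability(width):.3f}")
```

In this toy picture the write pulse width is chosen at the first peak (half a period), which is the timing requirement stated in the caption of Fig. 2.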

III. PROPOSED PREREAD AND WRITE SENSE AMPLIFIER

A. Data Program Flow

Fig. 3 shows a block diagram of the proposed PWSA. It is composed of a sensing latch (S Latch), a data latch (D Latch), an XNOR logic gate that compares new and current data, a differential amplifier (Diff Amp), and a write and precharge circuit. The circuit is designed to perform a read operation and to compare the current MTJ state to the incoming data, leading to a decision on whether a write pulse should be applied.

Fig. 3. Concept diagram of the proposed sense amplifier. The S Latch stores MTJ data based on the voltage difference between the Ref node and the CE node, which is amplified by the Diff Amp. The D Latch stores external data which will be transferred to the MTJ during write operation if S and D Latch data are mismatched (XNOR = 0). The write and precharge circuit provides write and precharge pulses to the BL, at the write and read steps, respectively. When the circuit successfully completes a data program, the control circuit generates a high pass signal to the external circuit.

Fig. 4. Proposed data program flow for the sense amplifier. During the preread and comparison step, the circuit compares initial MTJ data and external input data. A write pulse is generated based on the comparison result. At the PF step, the PWSA creates a high pass signal when the newly programmed MTJ data is the same as the external data.

Fig. 4 shows the data program flow that includes the preread, comparison, write, read, and pass/fail (PF) check steps. The PWSA reads out the initial MTJ state and stores it in the S Latch during the preread step and determines whether to provide a write pulse to the MTJ, based on the comparison result between the initial MTJ data and the external data. Redundant writes are therefore eliminated whenever the internal and external data match during the preread and comparison steps. Depending on the ratio of the read and write energies of a single bit and on the fraction of matching bits, this leads to a significant reduction of the total power consumption. In this paper, the write energy of an MTJ is ∼10× larger than the read energy. The MTJ state is verified after completion of a write pulse, by comparison with the external data on the D Latch. If the desired data is matched, the operation finishes, and a high pass signal is transferred to the external circuit. Otherwise, the circuit iterates until the MTJ is in the correct state or the maximum number of iterations n is reached.

Fig. 5. Schematic of the proposed sense amplifier and write circuit (PWSA). The S Latch and D Latch store the initial MTJ data and the external data, respectively. During the read step, the differential amplifier amplifies the voltage difference between the Ref node and the CE node, and creates a reliable logic value at the Diff_out node. The XNOR node holds the comparison value when evaluating the initial MTJ data versus the external data, determining whether a write pulse is generated during the write step. A current feedback circuit is used to increase the sensing margin.
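The data program flow of Fig. 4 can be sketched behaviorally as follows. This is a minimal illustration under stated assumptions rather than the circuit itself: the function name `program_bit`, the single-bit MTJ abstraction, and the treatment of each write pulse as an independent toggle that fails with probability WER are all hypothetical simplifications.

```python
import random


def program_bit(mtj_state, new_data, wer, max_iterations, rng):
    """Return (final_state, passed, writes) for one data program operation.

    mtj_state / new_data: 1 = AP (logic high), 0 = P (logic low).
    wer: probability that a single write pulse fails to toggle the bit.
    max_iterations: maximum number of write attempts (n in the text).
    """
    writes = 0
    s_latch = mtj_state            # preread: S Latch captures the stored bit
    d_latch = new_data             # comparison: D Latch captures the external bit
    for _ in range(max_iterations):
        if s_latch == d_latch:     # XNOR = 1: data match, no write pulse needed
            return mtj_state, True, writes
        writes += 1                # XNOR = 0: apply one precessional write pulse
        if rng.random() >= wer:    # successful pulse toggles the bit either way
            mtj_state ^= 1
        s_latch = mtj_state        # read step refreshes the S Latch
    return mtj_state, s_latch == d_latch, writes  # final PF check


rng = random.Random(0)
print(program_bit(1, 0, 0.0, 4, rng))  # AP -> P: one write, pass
print(program_bit(0, 0, 0.0, 4, rng))  # P -> P: redundant write skipped
```

Because the precessional pulse always toggles the bit, the same pulse serves both write directions, and the preread comparison is what makes the final state deterministic.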

B. Simulation Results

The circuit schematic of the proposed PWSA is shown in Fig. 5. The data program operation consists of five consecutive steps: preread, comparison, write, read, and PF check. For the operation, the AP state represents logic value high, and the P state represents logic value low.

To verify the functionality of the PWSA, the VCMA MTJ compact model, describing the precessional switching characteristic of the memory elements, was used in the Cadence Spectre circuit simulator. We employed a 65 nm technology for circuit simulations. The VCMA MTJ is assumed to have 100 kΩ resistance in the parallel state (RP) and 200 kΩ in the antiparallel state (RAP), corresponding to a TMR of 100%. We define TMR as (RAP − RP)/RP. To improve the sensing margin, we use a current feedback circuit and a 150 kΩ reference resistance, as discussed in Section IV. The write pulse, which has a 1.2 V amplitude and a 1.2 ns width, is designed to allow precessional switching with a probability of at least 99% (WER ≤ 1%). In this section, two simulation cases are discussed.


Fig. 6. Simulation of the proposed sense amplifier operation in the case of an initial AP state for the MTJ, with P being the new external (input) data. Since the initial MTJ state (AP) is different from the new external data (P), the output of the XNOR gate goes low, which keeps the S Latch in a high state. Hence, the sense amplifier provides the MTJ with a write pulse and the MTJ state switches from AP to P during the write step. After the read step, the circuit compares the S Latch and the D Latch again to verify PF. In this case, the pass signal remains high due to the identical programmed MTJ state and external input data.

1) AP to P Switching: Fig. 6 shows a simulation result for the case where the initial MTJ data is AP (high = 1) and the external data is P (low = 0). During the preread step, the S Latch stores high due to the AP state of the initial MTJ. At the comparison step, the D Latch goes low because the circuit has P as input for the external data. The data mismatch between the S Latch and the D Latch drives the XNOR node low (Fig. 5), turning transistor M6 OFF, which maintains the CE node at a high potential. Since the potential of the CE node is higher than that of the Ref node, the output of the differential amplifier, Diff_out, drops low, causing M7 to turn OFF. Thus, the S Latch remains in the high state due to the absence of a ground path, even though M8 is turned ON. Under this condition, the circuit can provide a 1.2 V write pulse to the MTJ using transistors M4 and M5, and the MTJ state switches from AP to P during the write step, which can be monitored by the change in the resistance, as observed in Fig. 6. Next, the PWSA reads the MTJ state and the S Latch changes to low, because the MTJ has switched from AP to P. At the final PF check step, the high pass signal is transferred to the external circuit because the MTJ is correctly programmed.

2) P to P Nonswitching: As a second scenario, we consider the case where no MTJ switching is required, i.e., where the initial MTJ has a P state and the external data is also P, causing the S Latch to be low at the end of the comparison step, as shown in Fig. 7. With the S Latch low (turning OFF M4), the circuit does not generate a write pulse to the MTJ and the MTJ remains in its initial P state. Next, the PWSA senses the MTJ resistance again, and the pass signal maintains a logic high at the final PF check step.

IV. PERFORMANCE

A. Sensing Margin

The sensing margin is affected not only by MTJ characteristics such as the TMR ratio but also by circuit design parameters such as the size and the gate voltages of the transistors in the sense amplifier circuit. Previous works have reported sensing circuits with a 1T-1MTJ topology using MTJs with 200% TMR and 65 nm technology transistors, achieving 0.18 V sensing margin [19]. Here, we aim to increase this potential difference to ensure that the differential amplifier can generate a reliable output signal to control CMOS logic.

Fig. 7. Simulation of the proposed sense amplifier operation in the case of an initial P state for the MTJ, with P being the external (input) data. Since the initial MTJ state (P) is the same as the new external data (P), the output of the XNOR goes high, which switches the S Latch to low. Because the S Latch is in a low state, the sense amplifier does not provide a write pulse to the MTJ.

In the memory architecture, sensing margin and read disturbance are sensitive to the bias voltage applied to the BL. Applying a higher voltage to the BL during the read operation generates larger read disturbance, causing reliability issues. On the other hand, applying a low voltage to the BL results in a decreased sensing margin. The sensing margin in the circuit of Fig. 5 is determined by the voltage difference between the Ref node and the CE node. To maximize the sensing margin and minimize the read disturbance, we propose a current feedback circuit that consists of transistors M1, M2, and M3, as shown in Fig. 5, which is based on a current conveyor circuit [20].

The reference resistor (Rref) is connected to M1, and its resistance is (RP + RAP)/2 for a centered sensing margin of AP and P. Such a reference resistor could be implemented, for example, via a serial and parallel combination of MTJs (RP + RP‖RP), or a digitally tunable CMOS-based resistor circuit.
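As a quick numerical check of the reference resistance, the sketch below evaluates the TMR definition, the midpoint reference (RP + RAP)/2, and the series/parallel MTJ combination RP + RP‖RP for the 100 kΩ and 200 kΩ values assumed in Section III. The helper name `parallel` is illustrative.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


R_P = 100e3   # parallel-state resistance in ohms (value from the text)
R_AP = 200e3  # antiparallel-state resistance in ohms (value from the text)

tmr = (R_AP - R_P) / R_P                # TMR as defined in the text
r_ref_target = (R_P + R_AP) / 2         # centered reference resistance
r_ref_mtj = R_P + parallel(R_P, R_P)    # one P-state MTJ in series with two in parallel

print(f"TMR = {tmr:.0%}")
print(f"Rref target = {r_ref_target / 1e3:.0f} kOhm")
print(f"RP + RP||RP = {r_ref_mtj / 1e3:.0f} kOhm")
```

Note that RP + RP‖RP equals 1.5·RP, so this particular MTJ combination matches the midpoint only when RAP = 2·RP, i.e., at 100% TMR.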

The operation of the current feedback circuit is as follows: a read operation is made up of a precharge stage, a BL discharge stage through an MTJ, and a latch stage. During the precharge stage, the sense amplifier charges up both BLref and BLcell to the same potential level because M3 is fully turned ON. Once M3 is turned OFF, the BL discharge stage begins. If the MTJ has a P state (RP), Icell would be larger than Iref (Fig. 5), causing BLcell to have a lower potential than that of BLref. The decreased potential of BLcell slightly turns OFF M1, which reduces Iref further and leads BLref to discharge slowly. Therefore, the circuit is able to have a much larger potential difference between the Ref node, connected to BLref, and the CE node, connected to BLcell. Through circuit simulations, the average sensing margin reached 360 mV with 100% TMR, which is 2× larger than that of conventional sense amplifiers using 200% TMR [19]. As a result, the improved sensing margin through the proposed current feedback circuit guarantees a stable logic swing, as observed in Fig. 8.

Fig. 8. (a) Simulations indicating the sensing margin between the MTJ P state and reference (Rref) resistance values. Sensing margin is defined as the voltage difference between the CE node and the Ref node. In this case, the sensing margin is 0.28 V. (b) Sensing margin between the MTJ AP and reference resistance values. In this case, the sensing margin is 0.44 V. (c) Output of the differential amplifier based on the average sensing margin of 0.36 V is large enough to generate low and high logic states, while the overall read time is estimated to be ∼2 ns.
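The quoted average margin and the 2× comparison follow directly from the per-state margins in Fig. 8 and the 0.18 V figure attributed to [19]. The short check below only restates that arithmetic; the variable names are illustrative.

```python
margin_p_v = 0.28    # sensing margin, P state vs. reference (Fig. 8(a))
margin_ap_v = 0.44   # sensing margin, AP state vs. reference (Fig. 8(b))
prior_margin_v = 0.18  # margin reported for the conventional circuit [19]

average_margin_v = (margin_p_v + margin_ap_v) / 2
improvement = average_margin_v / prior_margin_v

print(f"average margin = {average_margin_v * 1e3:.0f} mV")
print(f"improvement over prior work = {improvement:.1f}x")
```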

B. Speed

To evaluate the speed of the proposed circuit, we constructed the BL RC model based on the value of sheet resistance and metal capacitance of the considered 65 nm technology. The circuit achieved 2 ns read operation time, as shown in Fig. 8. Furthermore, because of transistors M1 and M2, the voltage drop across the MTJ is significantly reduced to ∼70 mV, while sensing current decreases below 0.8 μA, as seen in Figs. 7 and 8, alleviating read disturbance issues.

Write time is determined by the switching characteristics of the MTJ. Since the precessional switching time of MTJs is fast at around 1 ns, we are able to achieve 1.8 ns write time, accounting for both write pulse generation and BL discharging for the next read step, making this approach suitable for high-speed MRAM. Since the PF check step is based on digital circuit operation, it takes only 0.5 ns to generate a PF signal and does not result in a major penalty in terms of speed.

If the circuit fails to write the MTJ due to WER, it executes additional write, read, and PF steps, increasing program operation time (by ∼4.3 ns per iteration). Since the chance of pass and fail is directly related to the magnitude of WER, the full program operation time is a function of WER with a fixed acceptable bit error rate (ABER), as shown in Fig. 9, where it can be observed that even a worst case condition of WER = 0.1 requires an average data program time of only ∼20 ns (the maximum number of write iterations n = 4). Here, we assume that ABER is equal to 3.6 × 10^−4, which is determined by a given error correction code to handle ∼1 Gbit/s read speed [21].

Fig. 9. Full data program operation time as a function of the WER of a particular MTJ device. WER determines the maximum number of write iterations n to achieve a reliable write for a given ABER. Each iteration increases operation time by approximately 4.3 ns.
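The relation between WER, ABER, and the iteration limit n can be sketched as follows, assuming (this is an assumption, not a statement from the paper) that write attempts fail independently, so that the residual error after n attempts is WER^n. The 8 ns base time and the 4.3 ns per extra iteration (1.8 ns write + 2 ns read + 0.5 ns PF check) are taken from the text; the function names are illustrative.

```python
import math

BASE_PROGRAM_NS = 8.0      # single-pass data program time from the text
EXTRA_ITERATION_NS = 4.3   # write (1.8) + read (2.0) + PF check (0.5)


def max_write_iterations(wer, aber):
    """Smallest n with wer**n <= aber, assuming independent write attempts."""
    if not 0.0 < wer < 1.0 or not 0.0 < aber < 1.0:
        raise ValueError("wer and aber must lie strictly between 0 and 1")
    return max(1, math.ceil(math.log(aber) / math.log(wer)))


def worst_case_program_ns(wer, aber):
    """Program time when every allowed write iteration is used."""
    n = max_write_iterations(wer, aber)
    return BASE_PROGRAM_NS + (n - 1) * EXTRA_ITERATION_NS


ABER = 3.6e-4
for wer in (0.01, 0.1):
    n = max_write_iterations(wer, ABER)
    t = worst_case_program_ns(wer, ABER)
    print(f"WER = {wer}: n = {n}, worst-case program time = {t:.1f} ns")
```

Under these assumptions, WER = 0.1 gives n = 4 and a worst-case time of about 20.9 ns, in line with the ∼20 ns and n = 4 quoted above.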

Table I compares the read and write times of previous sensing circuits with the proposal of this paper. Due to the high-speed precessional switching, the proposed circuit achieves a 5× shorter write time than previous ones [22], [23]. Read time is also improved by up to 3× compared with a former work [24], which is mainly attributed to the fact that the current feedback circuit boosts the potential difference between the Ref node and the CE node, as shown in Fig. 8.

C. Power Consumption

The large resistance of the VCMA MTJ devices ensures small write and read currents, reducing dissipated power without impact on data programming speed, as shown previously. The average power consumption of the proposed PWSA is 18 μW (excluding the write power to the BL), which is 13% higher than the power required to write a single VCMA-driven MTJ. However, if the PWSA controls a cell array, the write power is dramatically increased due to RC loading of the BL and a large number of unselected MTJs. Therefore, the total power consumption of the proposed PWSA is significantly reduced by eliminating the redundant write pulses (and their required sequence) if there is a match between old and new data in the MTJs. Under random data, the most frequently occurring matching probability is 50%, as expected, translating into an additional 50% saving in write power consumption under random data pattern conditions.
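A rough per-bit energy estimate illustrates how the preread saving depends on the matching probability and on the write-to-read energy ratio. This is a simplified sketch under assumptions not made in the paper: it counts only the preread and the (possibly skipped) first write, and ignores the verify read, retries, and BL loading. The function name is illustrative.

```python
def relative_program_energy(match_probability, write_to_read_ratio):
    """Energy per programmed bit with preread, relative to always writing.

    Counts one preread plus a write only when old and new data differ.
    """
    e_read = 1.0                              # normalized read energy
    e_write = write_to_read_ratio * e_read    # write energy in read units
    with_preread = e_read + (1.0 - match_probability) * e_write
    return with_preread / e_write


# Random data (50% match) with the ~10x write/read energy ratio from the text.
ratio = relative_program_energy(0.5, 10.0)
print(f"relative energy with preread: {ratio:.2f}")
print(f"write pulses skipped: {0.5:.0%}")
```

With these inputs, half of the write pulses are skipped (the 50% write power saving quoted above), while the net figure including the preread overhead in this simplified model is 0.6 of the always-write energy.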

D. Area

Seventy percent of the transistors, especially in digital circuits (latches, XNORs, and inverters) of the proposed PWSA, have a minimum size (Length = F, Width = 2F, Area = 12F²), where F is the minimum feature size. The analog circuits, such as the differential amplifier and the current feedback circuit, additionally contain transistors 2 to 8 times larger than the minimum size for better performance.

TABLE I

SENSE AMPLIFIER PERFORMANCE COMPARISON WITH PREVIOUS WORKS

The total number of transistors of the PWSA is 37, which occupy ∼800F². It is important to note that the transistors of the write circuit are much smaller than those of a sense amplifier for STT-RAM, since the writing current for VCMA MTJs is below 15 μA. For comparison, three types of sense amplifier for STT-RAM, each consisting of 8 to 9 transistors, were introduced in [25]. However, the average size of those transistors is much larger than in our work due to the large current requirement for STT switching, resulting in a total sense amplifier size of ∼1000F². Hence, compared with STT-RAM, the area overhead is reduced by ∼20% using our proposed PWSA.
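The area comparison reduces to simple arithmetic on the two F² figures. The conversion to μm² below additionally assumes F = 65 nm for the 65 nm technology, which is an illustrative assumption rather than a value stated in the text.

```python
F_NM = 65.0          # assumed minimum feature size for the 65 nm node
PWSA_AREA_F2 = 800   # approximate PWSA area in units of F^2 (from the text)
STT_SA_AREA_F2 = 1000  # approximate STT-RAM sense amplifier area [25]

reduction = (STT_SA_AREA_F2 - PWSA_AREA_F2) / STT_SA_AREA_F2
f_um = F_NM * 1e-3                     # feature size in micrometers
pwsa_area_um2 = PWSA_AREA_F2 * f_um ** 2

print(f"area reduction vs. STT-RAM sense amplifier: {reduction:.0%}")
print(f"PWSA area at F = {F_NM:.0f} nm: {pwsa_area_um2:.2f} um^2")
```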

V. CONCLUSION

A PWSA including a write circuit for high-speed MRAM is proposed and verified through Spectre simulations. The proposed circuit topology increases the sensing margin up to 2× over that of conventional approaches, reducing sensing errors. Due to the preread and comparison steps of the data program operation, the circuit is able to control the possible high WER and nondeterministic characteristics of precessional switching for MTJ devices, resulting in low power, low error rates, and high-speed operation of MRAM suitable for gigahertz applications. The PWSA takes advantage of this ultrafast switching scheme, achieving 2 ns write and read times and 8 ns for the data program time. Furthermore, the size of write circuit transistors and access transistors in the memory cell can be reduced by at least 10× because of VCMA-driven precessional switching.

REFERENCES

[1] B. N. Engel et al., “A 4-Mb toggle MRAM based on a novel bit and switching method,” IEEE Trans. Magn., vol. 41, no. 1, pp. 132–136, Jan. 2005.

[2] A. Driskill-Smith et al., “Latest advances and roadmap for in-plane and perpendicular STT-RAM,” in Proc. 3rd IEEE Int. Memory Workshop (IMW), May 2011, pp. 1–3.

[3] C. Suock et al., “Fully integrated 54 nm STT-RAM with the smallest bit cell dimension for high density memory application,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2010, pp. 12.7.1–12.7.4.

[4] T. Kawahara et al., “2Mb spin-transfer torque RAM (SPRAM) with bit-by-bit bidirectional current write and parallelizing-direction current read,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2007, pp. 480–617.

[5] H. Liu, D. Bedau, D. Backes, J. A. Katine, J. Langer, and A. D. Kent, “Ultrafast switching in magnetic tunnel junction based orthogonal spin transfer devices,” Appl. Phys. Lett., vol. 97, no. 24, p. 242510, 2010.

[6] P. K. Amiri et al., “Low write-energy magnetic tunnel junctions for high-speed spin-transfer-torque MRAM,” IEEE Electron Device Lett., vol. 32, no. 1, pp. 57–59, Jan. 2011.

[7] G. E. Rowlands, T. Rahman, and J. A. Katine, “Deep subnanosecond spin torque switching in magnetic tunnel junctions with combined in-plane and perpendicular polarizers,” Appl. Phys. Lett., vol. 98, no. 10, pp. 102509-1–102509-3, 2011.

[8] J. G. Alzate et al., “Voltage-induced switching of nanoscale magnetic tunnel junctions,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2012, pp. 29.5.1–29.5.4.

[9] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien, “Electric-field-assisted switching in magnetic tunnel junctions,” Nature Mater., vol. 11, pp. 64–68, Nov. 2012.

[10] S. Kanai, M. Yamanouchi, and S. Ikeda, “Electric field-induced magnetization reversal in a perpendicular-anisotropy CoFeB-MgO magnetic tunnel junction,” Appl. Phys. Lett., vol. 101, no. 12, pp. 122403-1–122403-3, 2012.

[11] T. Maruyama et al., “Large voltage-induced magnetic anisotropy change in a few atomic layers of iron,” Nature Nanotechnol., vol. 4, pp. 158–161, Mar. 2009.

[12] Y. Shiota, T. Nozaki, F. Bonell, S. Murakami, T. Shinjo, and Y. Suzuki, “Induction of coherent magnetization switching in a few atomic layers of FeCo using voltage pulses,” Nature Mater., vol. 11, pp. 39–43, Nov. 2012.

[13] G. Feng, S. van Dijken, and J. M. D. Coey, “Influence of annealing on the bias voltage dependence of tunneling magnetoresistance in MgO double-barrier magnetic tunnel junctions with CoFeB electrodes,” Appl. Phys. Lett., vol. 89, no. 16, pp. 162501-1–162501-3, 2006.

[14] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando, “Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions,” Nature Mater., vol. 3, pp. 868–871, Oct. 2004.

[15] S. Ikeda et al., “A perpendicular-anisotropy CoFeB–MgO magnetic tunnel junction,” Nature Mater., vol. 9, pp. 721–724, Jul. 2010.

[16] S. S. P. Parkin et al., “Giant tunnelling magnetoresistance at room temperature with MgO (100) tunnel barriers,” Nature Mater., vol. 3, pp. 862–867, Oct. 2004.

[17] K. L. Wang, J. G. Alzate, and P. K. Amiri, “Low-power non-volatile spintronic memory: STT-RAM and beyond,” J. Phys. D, Appl. Phys., vol. 46, no. 7, p. 074003, 2013.

[18] L. Landau and E. Lifshitz, “On the theory of the dispersion of magnetic permeability in ferromagnetic bodies,” Phys. Zeitschrift Sowjetunion, vol. 8, no. 153, pp. 101–114, 1935.

[19] J. H. Song, J. Kim, and S. H. Kang, “Sensing margin trend with technology scaling in MRAM,” Int. J. Circuit Theory Appl., vol. 39, pp. 313–325, Mar. 2011.

[20] H. Koike and T. Endoh, “A new sensing scheme with high signal margin suitable for spin-transfer torque RAM,” in Proc. Int. Symp. VLSI Technol., Syst. Appl. (VLSI-TSA), 2011, pp. 1–2.

[21] M. Fukuda, K. Higuchi, and K. Takeuchi, “Non-volatile random access memory and NAND flash memory integrated solid-state drives with adaptive codeword error correcting code for 3.6 times acceptable raw bit error rate enhancement and 97% power reduction,” Jpn. J. Appl. Phys., vol. 50, no. 4S, p. 04DE09, 2011.

[22] E. K. S. Au, W.-H. Ki, W. H. Mow, S. T. Hung, and C. Y. Wong, “A novel current-mode sensing scheme for magnetic tunnel junction MRAM,” IEEE Trans. Magn., vol. 40, no. 2, pp. 483–488, Mar. 2004.

[23] D. Halupka et al., “Negative-resistance read and write schemes for STT-MRAM in 0.13 μm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2010, pp. 256–257.

[24] T. Na, J. Kim, J. P. Kim, S. H. Kang, and S.-O. Jung, “An offset-canceling triple-stage sensing circuit for deep submicrometer STT-RAM,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 7, pp. 1620–1624, Jul. 2014.

[25] C. Chia-Tsung, T. Yu-Chang, and C. Kuo-Hsing, “A high-speed current mode sense amplifier for spin-torque transfer magnetic random access memory,” in Proc. 53rd IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2010, pp. 181–184.

Hochul Lee (S’13) received the B.S. degree in electrical engineering from Korea University, Seoul, Korea, in 2005, and the M.S. degree from the Semiconductor Material Device Laboratory, Seoul National University, Seoul. He is currently pursuing the Ph.D. degree with the Device Research Laboratory, University of California at Los Angeles, Los Angeles, CA, USA, with a focus on MTJ-based hybrid CMOS circuits.

He was with the Flash Memory Circuit Design Team, Samsung Electronics Company, Ltd., Suwon, Korea, until 2012.

Juan G. Alzate received the B.S. degrees in electrical engineering and physics from the Universidad de los Andes, Bogotá, Colombia, in 2007. He is currently pursuing the M.S. and Ph.D. degrees with the Device Research Laboratory, University of California at Los Angeles, Los Angeles, CA, USA.

He joined Intel Corporation, Mountain View, CA, USA, in 2010, as a Graduate Research Intern involved in STT-RAM. He is exploring novel schemes for energy-efficient spintronics switches for memory and logic applications.

Richard Dorrance (S’09) received the B.S. degree in electrical engineering and computer science from the University of California at Berkeley, Berkeley, CA, USA, in 2009, and the M.S. degree in electrical engineering from the University of California at Los Angeles, Los Angeles, CA, USA, in 2011, where he is currently pursuing the Ph.D. degree in electrical engineering with the Department of Electrical Engineering.

His current research interests include the modeling and integration of post-CMOS devices for very large scale integration circuit design.

Xue Qing Cai, photograph and biography not available at the time of publication.

Dejan Markovic (S’96–M’06) received the Dipl.-Ing. degree from the University of Belgrade, Belgrade, Serbia, in 1998, and the M.S. and Ph.D. degrees from the University of California at Berkeley, Berkeley, CA, USA, in 2000 and 2006, respectively, all in electrical engineering.

He joined the faculty of the Department of Electrical Engineering at the University of California at Los Angeles, Los Angeles, CA, USA, in 2006, where he is currently an Associate Professor.

Prof. Markovic was a recipient of the CalVIEW Fellow Award in 2001 and 2002 for his excellence in teaching and mentoring of industry engineers from the Distance Learning Program, University of California at Berkeley.

Pedram Khalili Amiri (M’05) received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2004, and the Ph.D. (cum laude) degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in 2008.

He joined the Department of Electrical Engineering at the University of California at Los Angeles, Los Angeles, CA, USA, in 2009, where he is currently an Assistant Adjunct Professor.

Dr. Khalili Amiri’s professional activities have included serving as a Guest Editor of Spin, and serving on the Technical Program Committee of the Joint MMM/International Magnetics (Intermag) Conference. He was a Best Student Paper Award Finalist at the IEEE Intermag in 2008.

Kang L. Wang (F’92) received the B.S. degree from National Cheng Kung University, Tainan, Taiwan, in 1964, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 1966 and 1970, respectively.

He was an Assistant Professor with MIT from 1970 to 1972. From 1972 to 1979, he was with the General Electric Corporate Research and Development Center, Schenectady, NY, USA, as a Physicist/Engineer. In 1979, he joined the Department of Electrical Engineering at the University of California at Los Angeles, Los Angeles, CA, USA, where he is currently a Distinguished Professor and the Raytheon Chair of Engineering. He was also the Dean of Engineering with the Hong Kong University of Science and Technology, Hong Kong, from 2000 to 2002. He served as the Director of the Center on Functional Engineered Nano Architectonics, MARCO Focus, Los Angeles, from 2003 to 2013, an interdisciplinary Research Center funded by the Semiconductor Industry Association and the Department of Defense to address the need of information processing technology beyond scaled CMOS. He is also the Director of the Western Institute of Nanoelectronics, Los Angeles, a coordinated multiproject Research Institute, funded by NRI, Intel, and the State of California.

Prof. Wang has received numerous awards, including the IBM Faculty Award, the Guggenheim Fellowship Award, the TSMC Honor Lectureship Award, the Honoris Causa at the Politecnico di Torino, Turin, Italy, the Semiconductor Research Corporation Inventor Award, the European Material Research Society Meeting Best Paper Award, and the Semiconductor Research Corporation Technical Excellence Achievement Award.