Energy-Efficient Dynamic Comparator with Active Inductor for Receiver of Memory Interfaces
Jae Whan Lee
Seoul National University Seoul 08826 Rep. of Korea
Hyunkyu Park Seoul National University
Seoul 08826 Rep. of Korea
Joo-Hyung Chae Seoul National University
Seoul 08826 Rep. of Korea
Jaekwang Yun
Seoul National University Seoul 08826 Rep. of Korea
Jihwan Park SK Hynix
Icheon 17336 Rep. of Korea
Suhwan Kim
Seoul National University Seoul 08826 Rep. of Korea
ABSTRACT
In this paper, we propose a dynamic comparator that improves the
operating performance of a receiver (RX) while reducing its power
consumption. It is implemented as a double-tail StrongARM latch
comparator with an active inductor, and its power consumption at high
speed is minimized, resulting in better energy efficiency at the
targeted high frequency. The comparator is therefore suitable for
memory-application RXs that must satisfy both low-power and high-speed
requirements. It is applied to a single-ended RX consisting of a
continuous-time linear equalizer, a clock generator, and a quarter-rate
2-tap decision-feedback equalizer, which is appropriate for
high-frequency memory applications. Compared to the conventional
design, our design, fabricated in a 55nm CMOS process, provides an
improvement of 7% in unit interval (UI) margin under the same power
consumption and receives up to 10Gb/s PRBS15 data at BER < 10^-12 with
0.4 UI margin and an energy efficiency of 0.67pJ/bit.
Keywords Dynamic comparator; StrongARM latch; Active Inductor; Decision
Feedback Equalizer; Timing Critical Path
1. INTRODUCTION
The data rate of DDR5 and LPDDR5 has increased to 6.4Gbps, and the
next generation of memory interfaces is expected to operate at data
rates above 10Gbps. When high data rates must be reached without
excessive I/O power dissipation, the channel limits the achievable
speed, since high capacitive load and impedance discontinuities cause
bandwidth limitation and reflective post-cursor intersymbol
interference (ISI). It is well known that an effective way to address
these issues is to use a decision-feedback equalizer (DFE), which
cancels post-cursor ISI at the receiver (RX) side by utilizing
previously decided data [1]. Therefore, the DFE design has a
significant impact on the operation of the memory interface.
The DFE and the comparator are closely interdependent, and the
comparator plays a crucial role in achieving a low-power, high-speed
DFE. Therefore, comparators must meet this need while achieving the
lowest possible power consumption, the best possible performance, and
the smallest possible area [2].
Figure 1. (a) Schematic of the conventional StrongARM latch
comparator and (b) its operation phases
[Figure labels: (a) inputs Vin and Vref, outputs Vo+ and Vo-, clock CK; (b) CK, OUTP, and OUTN waveforms over the Sampling, Regeneration, Decision, and Reset phases.]
There are many techniques for designing comparators, such as the
open-loop comparator, the preamplifier-based comparator, and the
dynamic latched comparator [3, 4]. Among these, dynamic latched
comparators are preferred for high-speed and low-power applications.
In particular, the StrongARM latch comparator shown in Fig. 1 (a) is
the most widely used because of its high speed, zero static power
consumption, high input impedance, and full-swing output [5]. The
operation of the comparator consists of four phases: Sampling,
Regeneration, Decision, and Reset, as illustrated in Fig. 1 (b). When
the clock goes high, the sampling period starts and the parasitic
capacitors discharge at a rate set by the comparator's bandwidth. If
there is not enough discharging time at high speed, it may be
difficult to secure a margin for the decision period. A study of
comparator improvement that considers performance, power consumption,
and size is therefore essential. A double-tail regenerative latch [6],
a charge-steering-based StrongARM latch comparator [7], the Lewis-Gray
comparator [8], and various modified techniques [9] have been
presented to improve the conventional StrongARM latch comparator.
However, they could not alleviate the comparator's speed-power
trade-off.
To solve these issues, this paper presents a comparator that
alleviates the speed-power trade-off. We minimize power consumption
and enhance bandwidth by using the active inductor only during the
desired operation phase of the comparator. Therefore, once our
comparator is applied to a DFE, it helps to relax the most important
factor in the DFE, the timing critical path. This leads to a wider bit
error rate (BER) eye opening with less power consumption.
2. STRONGARM LATCH COMPARATOR WITH ACTIVE INDUCTOR
Fig. 2 (a) shows a structure that connects a double-tail StrongARM
latch comparator to the active inductor in order to shorten the timing
critical path. Large devices are used to spatially average out the
comparator's offset mismatch. The clock inputs are divided into CKL
and CKS, and when CKS is turned off there is no current path through
the active inductor. Fortunately, all CKS has to do is transfer the
signal to the latch during the initial sampling time, because
information sampled after the initial sampling period affects the
latch much less than the values sampled at the beginning. In addition,
the input path needs to be turned off after sampling, since in the
worst case it could adversely affect the operation when noise appears
or the data changes. The same applies to the active inductor: once it
has done its job, its current path should disappear. Therefore, we
minimize power consumption by blocking both the input data path, which
may adversely affect the latch after sampling, and the active
inductor, which has finished its role once the sampling period ends.
We exploit the fact that the active inductor increases the sampling
slope during its operation phase, so that the latch begins operating
earlier than the conventional one while using less power, which helps
to broaden the decision margin.
Fig. 2 (b) shows the CKL and CKS clocks that drive the proposed
StrongARM latch comparator. These two clocks are generated with an AND
gate from the clock phase 90 degrees earlier and the current clock
phase, and they define the period during which the active inductor
operates. This period is the same as the sampling period, and power
consumption increases during it. There is ample room to reduce power
consumption, since sampling can be completed within one inverter delay
[10]. However, in the worst case, when the input signal is very noisy
or runs at ultra-high speed, a sufficient sampling time is also
essential for latch operation. Therefore, we chose CKS to provide a
sufficient sampling period for the latch operation. Because we
compared power at the targeted high frequency, the power efficiency
improves as the speed increases. At low speed the active inductor has
less effect on BER, so if another on-off switch were added, the
current path could be blocked by switching off the active inductor.
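The clock gating described above can be illustrated with a short behavioral sketch. It assumes ideal 50% duty-cycle clocks and models the gated sampling clock as the logical AND of the current phase and the phase that leads it by 90 degrees. The function names, the sample resolution, and the mapping of the result onto CKS are assumptions made for illustration, not details taken from the design.

```python
def square_clock(n_samples, period, delay_deg):
    """Ideal 50% duty-cycle clock; delay_deg delays the rising edge."""
    shift = period * delay_deg / 360.0
    return [1 if ((i - shift) % period) < period / 2 else 0
            for i in range(n_samples)]


def and_gate(a, b):
    """Sample-by-sample logical AND of two binary waveforms."""
    return [x & y for x, y in zip(a, b)]


PERIOD = 400  # samples per clock period (hypothetical resolution)

ck_lead = square_clock(PERIOD, PERIOD, 0)   # phase 90 degrees earlier
ck_now = square_clock(PERIOD, PERIOD, 90)   # current clock phase
cks = and_gate(ck_lead, ck_now)             # assumed model of the gated clock

duty_cycle = sum(cks) / PERIOD
print(f"gated clock is high for {duty_cycle:.0%} of the clock period")
```

Under these assumptions the gated pulse is high for one quarter of the clock period, which for a quarter-rate clock corresponds to a window of 1 UI in which the active inductor draws current.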
Figure 2. (a) Schematic of the proposed StrongARM latch
comparator and (b) input clock phases
[Figure labels: (a) inputs Vin and Vref, outputs Vo+ and Vo-, clocks CKS and CKL, transistors M1 to M4; (b) CKS and CKL waveforms over the Sampling (slope), Regeneration + Decision (power saving), and Reset phases.]
Figure 3. (a) Half circuit and (b) small-signal equivalent
circuit of the proposed StrongARM latch comparator
[Figure labels: (a) Vin, Vout, M1, M2, R, C, and the load -1/gm3 // -1/gm4; (b) vin, vout, vgs2, gm1 vin, gm2 vgs2, ro1, ro2, R, C, and -1/gm3 // -1/gm4.]
Because a large area is required to attain high inductance and
performance, implementing passive inductors on chip is difficult. The
active inductor is instead composed of resistive and capacitive
elements [11]. The active inductor load circuit consists of a PMOS
transistor with a resistor connected between the gate of the PMOS
transistor and a low-impedance node. A half circuit and a small-signal
analysis are used to characterize the behavior of the proposed idea
and to verify its effect on bandwidth, as shown in Fig. 3 (a) and (b).
From Fig. 3 (b), we obtain
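$$ g_{m1} v_{in} + \frac{v_{out}}{r_{o1}} + \frac{v_{out}}{\left(-\frac{1}{g_{m3}}\right) \parallel \left(-\frac{1}{g_{m4}}\right)} + \frac{v_{out}}{r_{o2}} + g_{m2} v_{gs2} + \frac{v_{out} - v_{gs2}}{R} = 0, \qquad (1) $$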
where $r_o$ denotes the small-signal output resistance of a MOS device and
$$ v_{gs2} = \frac{v_{out}}{j\omega RC + 1}. \qquad (2) $$
The impedance of this active inductor is given by
$$ Z(s) = \frac{sCR + 1}{g_m}, \qquad (3) $$
where $g_m$ is the transconductance of the PMOS transistor. The benefit
of the active inductor is the bandwidth improvement achieved by adding
a zero. There have been efforts to increase speed by adding active
inductors to output drivers or to a continuous-time linear equalizer
(CTLE) [12, 13]. However, we noticed the need to add an active
inductor to the DFE itself, and thereby improved the timing critical
path.
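To make the effect of the zero in (3) concrete, the following minimal sketch evaluates Z(s) over frequency. The values of R, C, and g_m are hypothetical and chosen only for illustration; they are not the device values of the fabricated comparator.

```python
import cmath
import math


def active_inductor_impedance(freq_hz, r_ohm, c_farad, gm_siemens):
    """Z(s) = (s*C*R + 1) / gm evaluated at s = j*2*pi*f, as in (3)."""
    s = 1j * 2 * math.pi * freq_hz
    return (s * c_farad * r_ohm + 1) / gm_siemens


# Hypothetical component values (assumptions, not taken from the design).
R = 10e3     # ohm
C = 10e-15   # farad
GM = 1e-3    # siemens

f_zero = 1 / (2 * math.pi * R * C)   # frequency of the zero
l_equiv = R * C / GM                 # Z(s) = 1/gm + s*(R*C/gm)

for f in (1e8, f_zero, 1e10):
    z = active_inductor_impedance(f, R, C, GM)
    print(f"{f / 1e9:6.2f} GHz  |Z| = {abs(z):8.1f} ohm  "
          f"phase = {math.degrees(cmath.phase(z)):5.1f} deg")
```

Expanding (3) as Z(s) = 1/g_m + s(RC/g_m) shows that the load behaves as a resistance 1/g_m in series with an equivalent inductance RC/g_m, so its impedance rises above the zero frequency 1/(2πRC) instead of staying flat.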
Fig. 4 shows the operation of the proposed and conventional
comparators. Because the bandwidth of the comparator is increased
while the data is being sampled, the decision period of the proposed
comparator (DecisionP) is longer than that of the conventional one
(DecisionC). We take advantage of the fact that the active inductor
increases the sampling slope at the right time, which broadens the
decision margin because the latch begins operating earlier than the
conventional latch while using less power during the operation phase
of the StrongARM latch comparator.
Fig. 5 shows the simulated decision margin per unit power consumption
versus sampling frequency. Since a sufficient decision margin must be
achieved with less power, we divide the percentage of decision margin
by the current consumption, define the result as the decision
efficiency, and use it for comparison. The results confirm that the
active inductor improves the bandwidth and secures more critical time
than the conventional design, with better power efficiency at a high
sampling frequency (> 6Gb/s).
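The decision-efficiency metric defined above can be written out as a small sketch. The margin and current values below are hypothetical placeholders, not simulation results from Fig. 5; only the definition (decision margin in percent divided by current consumption in μA) follows the text.

```python
def decision_efficiency(decision_margin_pct, current_ua):
    """Decision margin (%) per unit of current consumption (uA)."""
    if current_ua <= 0:
        raise ValueError("current consumption must be positive")
    return decision_margin_pct / current_ua


# Hypothetical operating points (assumptions for illustration only).
conventional = decision_efficiency(decision_margin_pct=30.0, current_ua=50.0)
proposed = decision_efficiency(decision_margin_pct=36.0, current_ua=50.0)

print(f"conventional: {conventional:.2f} %/uA")
print(f"proposed:     {proposed:.2f} %/uA")
```

With equal current, a larger decision margin translates directly into a higher decision efficiency, which is the comparison made between the two curves in Fig. 5.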
3. QUARTER-RATE RECEIVER
We designed the RX to evaluate the effect of the comparator's
performance on the DFE's timing critical path and power consumption.
In the RX, the single-ended data signal arriving through the channel
is equalized by the CTLE and then enters the DFE, which is suitable
for the memory interface. Fig. 6 shows the quarter-rate RX
architecture. The quadrature clock generator receives differential
clock inputs from an external source, and an IQ divider (IQ DIV)
generates 4-phase quarter-rate clock signals. The 2-tap DFE applies
feedback with the tap weights h1 and h2, which are programmed to fixed
values by current digital-to-analog converters.
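The role of the two feedback taps can be illustrated with a simplified behavioral model. This sketch is a full-rate, direct-feedback 2-tap DFE with ideal timing and no noise; it does not model the quarter-rate clocking or the h1 = +1 / h1 = -1 paths of Fig. 6, and the channel post-cursor values are hypothetical.

```python
def channel(bits, post_cursors, idle=-1):
    """Add post-cursor ISI: x[n] = b[n] + sum(c[k] * b[n-1-k])."""
    history = [idle] * len(post_cursors)
    samples = []
    for b in bits:
        samples.append(b + sum(c * h for c, h in zip(post_cursors, history)))
        history = [b] + history[:-1]
    return samples


def dfe(samples, taps, idle=-1):
    """Subtract tap-weighted previous decisions, then slice to +1/-1."""
    history = [idle] * len(taps)
    decisions = []
    for x in samples:
        y = x - sum(h * d for h, d in zip(taps, history))
        d = 1 if y >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


def count_errors(decided, sent):
    return sum(a != b for a, b in zip(decided, sent))


bits = [-1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1]
post_cursors = (0.7, 0.5)        # hypothetical first and second post-cursor
received = channel(bits, post_cursors)

slicer_only = dfe(received, (0.0, 0.0))   # taps off: plain slicer
with_dfe = dfe(received, post_cursors)    # h1, h2 matched to the channel

print("errors without DFE:", count_errors(slicer_only, bits))
print("errors with 2-tap DFE:", count_errors(with_dfe, bits))
```

In this example the post-cursor ISI is large enough to close the eye for a plain slicer, while feedback with h1 and h2 matched to the two post-cursors recovers every bit, provided each decision is available before the next sample is sliced.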
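Figure 4. Operation of the proposed and conventional
comparator
[Figure labels: CKS, CKL, OUTP, OUTNC, and OUTNP waveforms; conventional phases SamplingC, RegenerationC, DecisionC, and Reset; proposed phases SamplingP, RegenerationP, and DecisionP; the start of latching is marked.]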
Figure 5. Simulation results of decision margin per power
consumption versus sampling frequency
[Plot: decision efficiency (%/μA, 0 to 1.6) versus frequency (GHz, 1 to 6) for the proposed and conventional comparators, with a marked reversal point.]
Figure 6. Proposed quarter-rate RX architecture
[Block diagram labels: Data input; Vref Gen. (VREFSEL); CTLE (capacitive degeneration linear equalizer); current-mode summer; DFE with h1 = +1 and h1 = -1 paths, h2 feedback, and S-R latches; outputs OUTφ0 to OUTφ3; quadrature clock generator with CKP, CKN, S-to-D, CK BUF, IQ DIV, and CK[3:0]; clock pairs CK[1,0], CK[2,1], CK[3,2], and CK[0,3]; the timing critical path is marked.]
A major challenge in the design of a DFE is ensuring that the
feedback signals have settled accurately at the slicer input before
the next data decision is made. Accordingly, the delay of the timing
critical path, along which the signal passes through the comparator,
the mux, and the S-R latch, should be within 1 unit interval (UI), as
expressed in (5), and the timing critical path must therefore be
considered in high-speed applications. This work shortens the timing
critical path by adding the active inductor to the comparator, which
is the most important factor in the DFE, and thereby achieves better
performance than the conventional design without consuming more power.
$$ t_{comparator} + t_{mux} + t_{S\text{-}R\ latch} < 1\,\mathrm{UI} \qquad (5) $$
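The constraint in (5) can be checked numerically with a short sketch. The delay values below are hypothetical and serve only to show how a faster comparator turns a failing timing budget into a passing one at a given data rate; they are not measured delays of this design.

```python
def unit_interval_ps(data_rate_gbps):
    """One unit interval in picoseconds for a given data rate in Gb/s."""
    return 1000.0 / data_rate_gbps


def timing_slack_ps(data_rate_gbps, t_comparator_ps, t_mux_ps, t_sr_latch_ps):
    """Slack of the timing critical path against 1 UI (positive means met)."""
    path_delay = t_comparator_ps + t_mux_ps + t_sr_latch_ps
    return unit_interval_ps(data_rate_gbps) - path_delay


# Hypothetical delays in ps (assumptions for illustration only).
slow = timing_slack_ps(10.0, t_comparator_ps=65.0, t_mux_ps=20.0, t_sr_latch_ps=20.0)
fast = timing_slack_ps(10.0, t_comparator_ps=55.0, t_mux_ps=20.0, t_sr_latch_ps=20.0)

print(f"UI at 10 Gb/s: {unit_interval_ps(10.0):.0f} ps")
print(f"slack with slower comparator: {slow:+.0f} ps")
print(f"slack with faster comparator: {fast:+.0f} ps")
```

Because the mux and S-R latch delays are fixed in this example, any reduction in comparator delay appears directly as additional slack, which is why the comparator dominates the timing critical path.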
4. EXPERIMENTAL RESULTS
Our quarter-rate RX was fabricated in a 55nm CMOS process. Fig. 7
shows the microphotograph and magnified layout of the proposed RX.
The active area is 0.015 mm², including the CTLE, DFE, quadrature
clock generator, and some test circuitry. The chip contains both a
conventional DFE and the proposed DFE, which use the conventional and
proposed comparators respectively, so that the two can be compared
under the same conditions.
Fig. 8 (a) shows the measurement environment, in which a J-BERT
transmits data to the chip and receives the data back to perform the
bit error rate test. The codes that determine the enable signals and
the DFE tap weights are transferred to the chip through I2C. The
equalizer was tested with a 12-inch FR-4 trace channel. Fig. 8 (b)
plots the measured frequency response of the trace, whose loss at
10Gb/s reaches 10 dB.
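As a quick illustration of what a 10 dB channel loss means for the received signal, the sketch below converts the loss to a linear amplitude ratio. Treating the loss "at 10Gb/s" as the loss at the NRZ Nyquist frequency (half the data rate) is an assumption made here, and the helper names are illustrative.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a voltage gain in dB to a linear amplitude ratio."""
    return 10 ** (gain_db / 20.0)


def nyquist_frequency_ghz(data_rate_gbps):
    """Nyquist frequency of an NRZ data stream: half the data rate."""
    return data_rate_gbps / 2.0


loss_db = 10.0
ratio = db_to_amplitude_ratio(-loss_db)

print(f"assumed Nyquist frequency at 10 Gb/s: {nyquist_frequency_ghz(10.0):.1f} GHz")
print(f"amplitude remaining after {loss_db:.0f} dB loss: {ratio:.3f}")
```

A 10 dB loss leaves roughly one third of the original amplitude at that frequency, which is the attenuation the CTLE and DFE have to compensate.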
Fig. 9 plots the measured power consumption of the RX with the
conventional StrongARM latch comparator and of the RX with the
proposed one for a pseudo-random binary sequence (PRBS) 15 pattern.
It shows that our RX has better power efficiency at higher speeds and
that a higher data rate can be measured because of the increased
bandwidth. The proposed RX consumes 6.7mW (0.67pJ/bit) at 10Gb/s from
a 1.2V supply. Although better power efficiency could be achieved at
low speed by cutting off the active inductor, this paper targets high
frequency for both design and comparison. The BER was measured at
7Gb/s, where the two comparators consume the same power.
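The energy-efficiency figure quoted above follows from dividing the power by the data rate. The short sketch below reproduces that arithmetic for the reported 6.7mW at 10Gb/s; the function name is illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/bit: mW divided by Gb/s equals pJ/bit."""
    return power_mw / data_rate_gbps


proposed_pj_per_bit = energy_per_bit_pj(power_mw=6.7, data_rate_gbps=10.0)
print(f"{proposed_pj_per_bit:.2f} pJ/bit")  # 0.67 pJ/bit
```

Since 1 mW divided by 1 Gb/s is exactly 1 pJ/bit, no unit conversion factor is needed.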
We evaluate the BER of the RX to assess the comparator's data decision
margin, since the comparator's performance is expected to directly
affect the timing critical path of the DFE. Fig. 10 (a) shows the
bathtub curves for 7Gb/s PRBS data of length 2^15-1 over the 12-inch
trace. At BER < 10^-12, the conventional equalizer and the proposed
equalizer allow clock phase margins of 0.68 UI and 0.75 UI,
respectively, at a supply voltage of 1.2V. These are the values
estimated for the conventional DFE and the proposed DFE under
comparison. From this result, we can see that the proposed structure
has better performance with
Figure 7. The microphoto and magnified layout of the
proposed receiver
[Figure labels: Conv. DFE, Proposed DFE, CTLE, and Clock Gen.; scale bars of 100μm and 40μm.]
Figure 8. (a) Measurement environment and (b) measured
frequency response of the 12-inch FR-4 channel
[Figure labels: (a) PC (I2C command), J-BERT (Agilent N4903A) with DIN, DOUT, and CLKs, 12 inch FR-4 trace, power source (Agilent E3631A) with PWR/GND, and the quarter-rate receiver; (b) channel gain (dB, -14 to 0) versus frequency (GHz, 1 to 7), with -10dB loss marked at 10 Gb/s.]
Figure 9. Measured data transfer rate versus power
consumption
[Plot: power consumption (mW, 5.5 to 7.5) versus data transfer rate (Gb/s, 4 to 10) for CTLE & proposed DFE and CTLE & conventional DFE; the BER measured point (7 Gb/s) and the targeted frequency range (>5Gb/s) are marked.]
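Table 1. Comparison with previous low-power RXs
| Reference                   | JSSC [14]        | ISSCC [15]       | ASSCC [16]       | This work (Conv. RX) | This work (Prop. RX) |
|-----------------------------|------------------|------------------|------------------|----------------------|----------------------|
| Process (nm)                | 90               | 40               | 65               | 55                   | 55                   |
| Design                      | CTLE + 1-tap DFE | CTLE + 2-tap DFE | CTLE + 1-tap DFE | CTLE + 2-tap DFE     | CTLE + 2-tap DFE     |
| Clocking                    | Half rate        | Quarter rate     | Half rate        | Quarter rate         | Quarter rate         |
| Supply voltage (V)          | 1.0              | 1.15             | 1.0              | 1.2                  | 1.2                  |
| BER                         | 10^-12           | 10^-12           | 10^-12           | 10^-12               | 10^-12               |
| Channel loss (dB)           | 24               | 16               | 25               | 10                   | 10                   |
| Horizontal eye opening (UI) | 0.36             | 0.26             | 0.25             | 0.38                 | 0.4                  |
| Max. data rate (Gb/s)       | 20               | 22               | 21               | 8                    | 10                   |
| Energy efficiency (pJ/bit)  | 2.0              | 0.94             | 0.96             | 0.84                 | 0.67                 |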
a 7% improvement in UI margin at the same data rate and the same
power. This result is reliable because both measurements were made in
the same environment. Fig. 10 (b) shows the bathtub curves for 10Gb/s
PRBS data of length 2^15-1 over the same 12-inch trace. Because the
bandwidth of the proposed comparator is higher, the proposed DFE
achieves a 0.4 UI margin, unlike the conventional one.
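The clock phase margins reported for the bathtub curves can be converted to absolute time, which makes the size of the improvement easier to judge. The sketch below uses only the margins and data rates stated in the text; the function name is illustrative.

```python
def margin_ps(margin_ui, data_rate_gbps):
    """Convert a clock phase margin in UI to picoseconds."""
    return margin_ui * 1000.0 / data_rate_gbps


conv_7g = margin_ps(0.68, 7.0)    # conventional DFE at 7 Gb/s
prop_7g = margin_ps(0.75, 7.0)    # proposed DFE at 7 Gb/s
prop_10g = margin_ps(0.4, 10.0)   # proposed DFE at 10 Gb/s

print(f"7 Gb/s:  conventional {conv_7g:.1f} ps, proposed {prop_7g:.1f} ps, "
      f"gain {prop_7g - conv_7g:.1f} ps")
print(f"10 Gb/s: proposed {prop_10g:.1f} ps")
```

At 7Gb/s one UI is about 142.9 ps, so the 0.07 UI difference between the two DFEs corresponds to roughly 10 ps of additional timing margin.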
The measured parameters of this RX are compared in Table 1 with those
of previous high-speed, low-power RXs that include a CTLE and a DFE.
The table summarizes the performance alongside prior DFEs with similar
power efficiency. It shows that the DFE with the proposed comparator
achieves a higher maximum data rate than the DFE with the conventional
comparator. Because we set a target data rate to compare the two DFEs,
we fixed the gain of the CTLE instead of using its controllability.
With that controllability, there is room to achieve higher data rates.
5. CONCLUSION
In conclusion, a StrongARM latch comparator with an active inductor
exhibits a faster comparison speed and achieves lower power
consumption at the targeted high speed. We applied the comparator to
a DFE, where low power and the timing critical path are crucial in a
high-speed RX. Compared with the conventional DFE, which uses the
most widely used conventional comparator, our DFE provides an
improvement of 7% in UI margin and a higher maximum data rate. Our RX
achieves an energy efficiency of 0.67pJ/bit with a 12-inch FR-4 PCB
trace at a supply voltage of 1.2V, and it is measured to receive up
to 10Gb/s PRBS15 data at BER < 10^-12 with 0.4 UI margin.
Figure 10. Measured RX bathtub curves for (a) 7-Gb/s and (b)
10-Gb/s PRBS15 data
[Plots: BER (10^-14 to 10^0) versus clock phase (UI) for the proposed DFE, the conventional DFE, and DFE off; (a) margins of 0.75 UI and 0.68 UI marked at BER = 10^-12; (b) margin of 0.4 UI marked at BER = 10^-12.]
6. ACKNOWLEDGMENTS
This research was supported by the MOTIE (Ministry of Trade,
Industry & Energy) (10080570) and KSRC (Korea Semiconductor
Research Consortium) support program for the development of the
future semiconductor device.
7. REFERENCES
[1] H. Lim et al., “A 5.8-Gb/s adaptive integrating duobinary DFE receiver for multi-drop memory interface,” IEEE J. of Solid-State Circuits (JSSC), vol. 52, no. 6, pp. 1563-1575, June 2017.
[2] S. J. Bae et al., “A 40 nm 7 Gb/s/pin single-ended transceiver with jitter and ISI reduction techniques for high-speed DRAM interface,” Symp. on Very Large Scale Integration (VLSI), pp. 193-194, June 2010.
[3] S. Choudhary, S. Bhat, J. Selvakumar, “High gain and low power design of preamplifier for CMOS comparator,” 3rd International Conference on Signal Processing and Integrated Networks (SPIN), pp. 63-66, Feb. 2016.
[4] S. Kim, D. Kim, M. Seok, “Comparative study and optimization of synchronous and asynchronous comparators at near-threshold voltages,” IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 1-6, Jul. 2017.
[5] A. Pravin, N. Sarma, F. Basha, M. Satyanarayana, “Design of low power efficient CMOS dynamic latch comparator,” J. of Very Large Scale Integration and Signal Processing (JVSP), vol. 6, no. 6, pp. 343-352, Dec. 2016.
[6] S. Mashhadi, R. Lotfi, “Analysis and design of a low-voltage low-power double-tail comparator,” IEEE Transactions on Very Large Scale Integration (VLSI), pp. 343-352, Feb. 2014.
[7] M. Ayesh, S. Ibrahim, M. Aboudina, “Design and analysis of a low-power high-speed charge-steering based StrongARM comparator,” 28th International Conference on Microelectronics (ICM), pp. 209-212, Dec. 2016.
[8] J. He, S. Zhan, D. Chen, “Analyses of static and dynamic random offset voltages in dynamic comparators,” IEEE Transactions on Circuits and Systems I (TCAS-I), pp. 911-919, May 2009.
[9] V. Sharma, G. Sharma, D. Kumar, “High speed power efficient dynamic comparator designed in 90nm CMOS technology,” International Conference on Communication, Control and Intelligent Systems (CCIS), pp. 368-371, Nov. 2015.
[10] H. Kimura et al., “A 28 Gb/s 560 mW multi-standard SerDes with single-stage analog front-end and 14-tap decision feedback equalizer in 28 nm CMOS,” IEEE J. of Solid-State Circuits (JSSC), vol. 49, no. 12, pp. 3091-3103, Dec. 2014.
[11] F. Yuan, “CMOS active inductors and transformers: Principle, implementation and applications,” Springer, 1st edition, June 2008.
[12] T. Musah et al., “A 4–32 Gb/s bidirectional link with 3-tap FFE/6-tap DFE and collaborative CDR in 22 nm CMOS,” IEEE J. of Solid-State Circuits (JSSC), vol. 49, no. 12, pp. 3079-3088, Dec. 2014.
[13] J. Song, H. Lee, J. Kim, S. Hwang, C. Kim, “1V 10Gb/s/pin single-ended transceiver with controllable active-inductor-based driver and adaptively calibrated cascade-DFE for post-LPDDR4 interfaces,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 320-322, Feb. 2015.
[14] S. Ibrahim and B. Razavi, “Low-power CMOS equalizer design for 20-Gb/s systems,” IEEE J. of Solid-State Circuits (JSSC), vol. 46, no. 6, pp. 1321–1336, May 2011.
[15] K. Jung, A. Amirkhany, K. Kaviani, “A 0.94mW/Gb/s 22Gb/s 2-tap partial-response DFE receiver in 40nm LP CMOS,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 42-43, Feb. 2013.
[16] Y. You, S. Chakraborty, R. Wang, J. Chen, “A 21-Gb/s, 0.96-pJ/bit serial receiver with non-50% duty-cycle clocking 1-tap decision feedback equalizer in 65nm CMOS,” IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 1-4, Nov. 2015.