
UNIVERSIDADE TÉCNICA DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Design and Characterization of CMOS High-Resolution Time-to-Digital Converters

Manuel José dos Reis Gaspar Seabra Mota

(Licenciado)

Dissertation submitted for the degree of Doctor in Electrical and Computer Engineering

Supervisor: Doctor José de Albuquerque Epifânio da Franca

President: Rector of the Universidade Técnica de Lisboa

Members: Doctor Dinis Gomes Magalhães dos Santos

Doctor Moisés Simões Piedade

Doctor José de Albuquerque Epifânio da Franca

Doctor Diamantino Rui da Silva Freitas

Doctor António Manuel da Cruz Serra

Doctor João Paulo Calado Cordeiro Vital

Doctor Alessandro Marchioro

October 2000

UNIVERSIDADE TÉCNICA DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Projecto e Caracterização Experimental de Circuitos Integrados CMOS para Medição de Intervalos de Tempo com Alta Resolução



Abstract

The subject of this thesis is the development and evaluation of high-resolution Time-to-Digital Converter architectures suitable for the measurement of very short time intervals in the context of the Time-of-Flight detector of the ALICE experiment.

The selected architectures are able to measure time intervals with a Root Mean Square (RMS) resolution better than 50ps and a large dynamic range. Apart from the timing characteristics of such TDC’s, their architectures enable the design of highly integrated multi-channel converter ASIC’s operating with low power dissipation.

The developed circuits are based on Delay Locked Loop (DLL) architectures. The feedback control loop of the DLL ensures that the time measurements are permanently calibrated in relation to a reference periodic signal. Schemes to obtain fine time interpolation without penalty in terms of added power dissipation or increased sensitivity to environmental changes (supply voltage or temperature) are investigated and implemented. Two different approaches are selected and their detailed analysis carried out. One uses several phase shifted DLL’s and the other a passive RC delay line. The prototypes that implement these schemes were built in a standard 0.7µm CMOS technology. In the first approach, an RMS resolution of 34.5ps across a dynamic range of 3.2µs was measured. For the second, an RMS resolution of 21ps was obtained.

Keywords

Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration, high-resolution, multi-channel, passive RC delay lines.


Resumo (Summary)

The aim of this thesis is the evaluation and development of Time-to-Digital Conversion architectures with high time resolution, suitable for the measurement of very short time intervals, in the context of the Time-of-Flight detector of the ALICE experiment.

The selected architectures are able to measure time intervals with a resolution better than 50ps (Root Mean Square, RMS) over a wide dynamic range. Besides the timing characteristics of these converters, their architectures allow the implementation of multi-channel application-specific integrated circuits operating with low power dissipation.

The circuits developed are based on closed Delay Locked Loops (DLL). The negative feedback of the DLL guarantees that the time measurements are permanently calibrated against a periodic reference signal. Schemes that allow very fine time interpolation without significantly increasing the power dissipation or the sensitivity of the scheme to changes in environmental conditions (supply voltage or operating temperature) were investigated and implemented. Two of these schemes were selected and analysed in detail. One of the schemes uses several DLL’s with a fixed phase shift and the other uses a passive RC delay line. The prototypes in which these schemes were implemented use a 0.7µm CMOS technology. With these prototypes, resolutions of 34.5ps (RMS) over a dynamic range of 3.2µs and of 21ps (RMS), respectively, were obtained.

Palavras Chave (Keywords)

Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration, high resolution, multi-channel, passive RC delay lines.


Acknowledgements

It goes without saying that I am indebted to all the people whose contributions, small and large, made my work and my life easier during the period that I spent working on this thesis; the list of their names would be too long to write down. However, I wish to acknowledge in particular the help of my colleagues Jorgen Christiansen and Paulo Moreira, who had the kindness and patience to answer all my questions and whose guidance and experience helped me to advance this work in the best direction.

I also wish to acknowledge the help of my supervisor, José Epifânio da Franca, who was always attentive to my requirements, even the most pressing ones.

I thank Gaspar Barreira and Paulo Gomes, who started it all, and Alessandro Marchioro and Mike Letheren, who welcomed me into the microelectronics group at CERN and provided me with the proper means and environment to proceed with my work.

An acknowledgement is also due to JNICT, whose support made it all possible¹, and to LIP, where the brave new world of microelectronics and High Energy Physics was first shown to me.

Since life is not only work, even when that work is exciting, I greet cheerfully the friends I met in Geneva, whose warmth and imagination made life abroad very interesting. A final word is reserved for my family and friends back in Portugal, who always found the right way to let me know they cared, even after I had been away for so long.

¹ The author is supported by a grant from the Junta Nacional de Investigação Científica e Tecnológica (JNICT) under the “Sub-Programa Ciência e Tecnologia do 2º Quadro Comunitário de Apoio”.


Contents.

PART I. Introduction. 1

1. Introduction and Structure of this Work. 3

2. Time Interval Measurements in HEP Experiments – An Introduction. 9

2.1. High Energy Physics experiments. 9

2.1.1. A HEP experiment at CERN: ALICE. 10

2.2. High resolution time interval measurements in ALICE. 13

3. Conversion Basics. 17

3.1. Performance metrics. 18

3.2. Error sources. 21

3.3. Converter calibration. 24

4. Review of TDC Architectures. 27

4.1. Overview of TDC architectures. 27

4.1.1. Current integration techniques. 27

4.1.2. Counter techniques. 29

4.1.3. Delay line-based techniques. 30

4.1.4. Phase Locked Loop (PLL) techniques. 31

4.1.5. Delay Locked Loop (DLL) techniques. 32

4.2. Beyond the limits of the technology: techniques to improve resolution. 33

4.2.1. Analogue time expansion. 33

4.2.2. Vernier differences. 35

4.2.3. Analogue time interpolation. 38

4.2.4. Array of coupled oscillators. 40

4.2.5. Array of Delay Locked Loops. 41

4.2.6. Time interpolation using passive RC delay lines. 43

4.3. Summary of characteristics of the TDC architectures. 44

References for Part I. 45

PART II. A TDC Architecture based on an Array of Delay Locked Loops. 49

5. Architecture Overview. 53

5.1. The Delay Locked Loop (DLL). 53

5.2. The Array of DLL’s (ADLL). 55

5.3. Conversion dynamic range. 57

5.4. Time critical paths. 59

5.5. Measurement acquisition and storage. 59

5.6. Read-out architecture. 60

5.7. The prototype. 62


5.7.1. Performance analysis. 63

6. Analysis of the Limits to the TDC Resolution. 65

6.1. Non-linearity due to cell mismatch. 65

6.1.1. Origins of mismatch. 65

6.1.2. Effects of cell delay mismatch. 66

6.2. Jitter due to internal phase noise. 68

6.3. Non-linearity due to static phase error. 69

6.3.1. Effects of phase detector’s phase error. 70

6.3.2. Effects of phase detector input path’s mismatch. 72

6.3.3. Effects of unbalanced conditions of the cells in the extremes of the delay chain. 72

6.3.4. Effects of propagation delay on the sampling signal path. 74

6.3.5. Overall non-linearity due to static phase error. 76

7. Detailed Implementation. 79

7.1. DLL building blocks. 79

7.1.1. Phase detector. 79

7.1.2. Charge-pump and loop filter. 82

7.1.3. Delay cell. 86

7.1.4. Delay chain. 92

7.1.5. Closed control loop. 93

7.1.6. Initialisation procedure. 94

7.2. The ADLL. 95

7.3. Channel memory. 96

7.3.1. The store sampling signal distribution. 99

8. Experimental Results. 101

8.1. Delay cell range selection and charge-pump current level. 101

8.2. Converter linearity. 102

8.3. Linear time sweeps. 106

8.4. Inter-channel crosstalk. 107

8.5. Double hit resolution. 108

8.6. Power dissipation. 108

8.7. Summary of results. 108

8.8. Conclusion. 109

References for Part II. 111

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line. 113

9. Architecture Overview. 117

9.1. Time interpolation circuit. 118

9.2. Adjustable RC delay line. 119


9.2.1. Adjustable delay line by tap selection. 120

9.2.2. Adjustable delay line by lumped capacitor selection. 121

9.3. Auto calibration. 122

9.4. The prototype. 122

9.4.1. Choice of technology. 122

9.4.2. Prototype characteristics. 123

9.4.3. Performance analysis. 125

10. Adjustable RC Delay Line using a Tap Selection Scheme. 127

10.1. RC delay line. 127

10.1.1. RC delay line simulation model. 129

10.2. Tap selection delay line. 131

10.2.1. Tap selection circuitry. 136

10.3. Auto calibration circuitry. 137

10.3.1. Calibration algorithms. 138

10.3.2. Hardware implementation. 142

11. Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme. 145

11.1. Lumped capacitor delay line. 145

11.1.1. Lumped capacitor selection circuitry. 147

11.2. Auto calibration circuitry. 149

11.2.1. Calibration algorithm. 150

11.2.2. Hardware implementation. 153

11.3. Comparing the two adjustment schemes. 154

12. Experimental Results. 155

12.1. Tap selection scheme. 155

12.1.1. The complete interpolator. 157

12.2. Lumped capacitor scheme. 162

12.3. Conversion time offset. 164

12.4. Power dissipation. 165

12.5. Summary of results. 165

12.6. Conclusions. 165

References for Part III. 167

PART IV. Conclusion. 169

13. Summary of Results. 171

13.1. The ADLL architecture. 171

13.2. The DLL & RC delay line architecture. 172

13.3. TDC characterisation. 173

14. Future Developments. 175


PART V. Appendixes. 179

A. TDC Characterisation Test Bench. 181

B. Analysis of the DLL Closed Loop Behaviour. 187

C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL. 189

D. Number of Random Samples Required for TDC Characterisation. 193

E. TDC Characterisation Hit Frequency. 197

F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition). 201

G. DNL-aware Algorithms for the RC Delay Line Calibration. 203

References for the Appendixes. 209


List of Figures.

PART I. Introduction.

Chapter 1. Introduction and Structure of this Work.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.

Figure 1: The CERN particle accelerator complex (simplified) [4]. 10

Figure 2: Longitudinal and transverse view of ALICE detector [3]. 11

Figure 3: The hierarchical trigger data reduction block diagram of ALICE experiment [3]. 12

Figure 4: Schematic view of the TOF detector front-end. 13

Figure 5: The error propagation chain. 14

Chapter 3. Conversion Basics.

Figure 1: Ideal transfer characteristic of a 3-bit converter. 18

Figure 2: Example of a converter transfer function illustrating the static performance metrics. 20

Chapter 4. Review of TDC Architectures.

Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [3]). 28

Figure 2: Delay line using double inverters as delay elements. 30

Figure 3: Asymmetric ring oscillator [24], able to generate a 2N number of timing signals from an odd-numbered oscillator. 31

Figure 4: Delay Locked Loop and hit registers. 32

Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33]. 34

Figure 6: Time expander circuit and corresponding timing diagram. 35

Figure 7: Time expansion using two delay lines with different cell delay. 35

Figure 8: Circular vernier scheme for dynamic range expansion. 36

Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale (lower) lines up with a tick mark in the reference scale (upper) [36]. 38

Figure 10: Time interpolation using voltage sums. 39

Figure 11: Time to analogue converter using a time interpolation technique [38]. 39

Figure 12: Coupled oscillators (time resolution of td * 2 / 3). 40

Figure 13: Array of DLL’s with phase shifting DLL. 42

Figure 14: A TDC converter based on a DLL and a RC delay line. 44

PART II. A TDC Architecture based on an Array of Delay Locked Loops.

Chapter 5. Architecture Overview.

Figure 1: Delay Locked Loop block diagram. 54

Figure 2: Delay Locked Loop used in a time base application. 54

Figure 3: Array of DLL’s with phase shifting DLL, showing bin definition. 55

Figure 4: Interpolation limits due to cell mismatch. 57


Figure 5: Dynamic range extension using two coarse time counters. 58

Figure 6: Example of the first level of a read-out buffering hierarchy. 61

Figure 7: The prototype block diagram. 62

Figure 8: Prototype circuit showing main functional blocks. 64

Chapter 6. Analysis of the Limits to the TDC Resolution.

Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σcell=1% (ADLL: N=35 and F=4, single DLL: N=140). 68

Figure 2: Standard deviation curve resulting from a closed loop jitter of σjitter=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). 69

Figure 3: Detail of a delay locked loop depicting the important delays within the loop. 70

Figure 4: Illustration of the effect of the phase detector’s phase error (N=5). 71

Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5). 72

Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5). 73

Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5). 73

Figure 8: Illustration of the effect of the propagation delay on the sampling signal path – case of the linear hit signal distribution network (N=5). 74

Figure 9: The T-shaped hit signal distribution network. 75

Figure 10: Illustration of the effect of the propagation delay on the sampling signal path – case of the T-shaped hit signal distribution network (N=5). 75

Figure 11: DNL and INL curves resulting from a phase detector’s phase error (or phase detector input path’s mismatch): DPD(C / K + τdiff)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). 77

Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: Din(δin)=1% and Dout(δout)=1% of the average cell (ADLL: N=35 and F=4, single DLL: N=140). 77

Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). 78

Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). 78

Figure 15: DNL and INL curves resulting from the combination of the previous curves (ADLL: N=35 and F=4, single DLL: N=140). 78

Chapter 7. Detailed Implementation.

Figure 1: D-flip-flop operating as a two-state phase detector. 79

Figure 2: General and D-FF based two-state phase detector transfer characteristic. 80

Figure 3: Balanced D-flip-flop topology. 81

Figure 4: Balanced D-flip-flop topology featuring fast SR#1 operation. 82

Figure 5: Charge-pump and filter capacitor block diagram. 83

Figure 6: Charge-pump topologies (simplified). 84


Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption. 87

Figure 8: The self-biased differential delay cell (from [18]). 88

Figure 9: The current-starved inverter delay cell (simplified version). 88

Figure 10: Cell delay variation due to a 100mV supply voltage step, respectively for the differential and current-starved inverter structure. 89

Figure 11: Simplified representation of the delay range partition. 90

Figure 12: The selectable-range current-starved inverter cell. 91

Figure 13: The selectable delay ranges (simulation). 92

Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals. 93

Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions. 95

Figure 16: The ADLL tap distribution arrangement. 96

Figure 17: Functional diagram of the channel memory controller [3]. 97

Figure 18: The two-level hit register (1 bit). 97

Figure 19: Two-stage synchroniser using D flip-flops. 98

Figure 20: Alternative control signal distribution configurations within a channel memory row. 99

Figure 21: Integrated error histogram for the two proposed distribution configurations (simulation). 100

Chapter 8. Experimental Results.

Figure 1: DNL and INL graphs for the ADLL. 102

Figure 2: Analytical DNL and INL curves (Din=1% and Dout=-1% of the delay cell, DPD=-0.1% and Dhit=0.1% of the reference period). 103

Figure 3: DNL and INL graphs for the different Timing DLLs (LSBDLL=4·LSB). 103

Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSBDLL=5·LSB). 104

Figure 5: The ADLL auto-correlation graph. 105

Figure 6: DNL and INL graphs for the converter along four reference clock periods. 105

Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ=0.39LSB). 106

Figure 8: DNL and INL graphs obtained from the linear delay sweep results. 106

Figure 9: Conversion error histogram for the first Timing DLL (σ=0.30LSBDLL). 107

Figure 10: Delay sweep over the full dynamic range. 107

Figure 11: Measurement error due to crosstalk in the worst configuration. 108

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.

Chapter 9. Architecture Overview.


Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line samples (in this example the number of samples acquired is M=5). 117

Figure 2: Time interpolation circuit. 119


Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified). 120

Figure 4: Adjustable delay line using a tap selection scheme. 121

Figure 5: Adjustable delay line using a variable lumped capacitor scheme. 121

Figure 6: Block diagram of the prototype. 123

Figure 7: Prototype circuit showing main functional blocks. 125

Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.

Figure 1: RC line divided in two segments at access point x. R and C are, respectively resistance and capacitance per unit length. 128

Figure 2: Delay line division into equally sized sections. 129

Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network). 130

Figure 4: Detail of the physical microstrip line and its equivalent simulation model. 130

Figure 5: Delay line segments’ length adjustment. 133

Figure 6: Adjustment function values. 134

Figure 7: Signal’s rise time along the original and the adjusted delay line, in typical conditions (simulated). 134

Figure 8: Delay and cumulative delay of each line segment (from simulations). 135

Figure 9: The leading and trailing adaptation sections. 135

Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs correspond, respectively, to the same line with and without leading and trailing sections. 136

Figure 11: The access point selection circuitry. 137

Figure 12: Calibration procedure for the tap selection adjustment scheme. 140

Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation). 140

Figure 14: Results of calibration using the optimum linearity limit (from simulation). 141

Figure 15: Results of calibration for different conditions (from simulation). 142

Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.

Figure 1: Adjustment function values (calculated and actually implemented). 146

Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variations for the typical process. 147

Figure 3: The unit capacitor bank. 148

Figure 4: The lumped capacitor selection circuitry. 148

Figure 5: The effects of lumped capacitor unit variation in the bin size (from simulation). 149

Figure 6: The coarse calibration procedure. 151

Figure 7: The fine calibration procedure. 152

Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation). 152


Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation). 153

Chapter 12. Experimental Results.


Figure 1: Delay line calibration results: DNL and INL graphs. 156

Figure 2: Spread of the RC line tap delay over the DLL cells. 156

Figure 3: Temperature dependency of the RC delay line. 157

Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line). 157

Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows. 158

Figure 6: Comparison of the INL graphs of the DLL and of the complete converter. 159

Figure 7: Conversion error (σ=0.51LSB). 159

Figure 8: Temperature effects on the conversion error (σ=0.50LSB at 30°C and σ=0.52LSB at 60°C). 160

Figure 9: DLL linear time sweep. 160

Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain. 161

Figure 11: DLL conversion error (σ=0.29LSBDLL). 161

Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme). 162

Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line). 163

Figure 14: Comparison of the INL graphs of the DLL and of the complete converter. 163

Figure 15: Conversion error (σ=0.44LSB). 164

Figure 16: DLL conversion error (σ=0.29LSBDLL). 164

PART IV. Conclusion.

Chapter 13. Summary of Results.

Chapter 14. Future Developments.

Figure 1: A four channel TDC using a DLL based scheme and a single channel TDC with four times smaller LSB, using the same building blocks and an RC delay line. 176

Figure 2: The general purpose TDC architecture. 176

Figure 3: Block diagram of the general purpose TDC. 177

PART V. Appendixes.

Appendix A. TDC Characterisation Test Bench.

Figure 1: The linear passive delay generator block diagram (computer controlled). 183

Figure 2: The linear passive delay generator block diagram (automated). 184

Appendix B. Analysis of the DLL Closed Loop Behaviour.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.

Figure 1: Voltage controlled delay line with fixed length. 189


Appendix D. Number of Random Samples Required for TDC Characterisation.

Figure 1: $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha$. 194

Appendix E. TDC Characterisation Hit Frequency.

Figure 1: The clock multiplying PLL. 199

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).

Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0). 201

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.

Figure 1: Calibration procedure for the tap selection adjustment scheme. 204

Figure 2: The coarse calibration procedure. 206

Figure 3: The fine calibration procedure (first loop). 207

Figure 4: The fine calibration procedure (second loop). 208


List of Tables.

PART I. Introduction.

Chapter 1. Introduction and Structure of this Work.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.

Chapter 3. Conversion Basics.

Chapter 4. Review of TDC Architectures.

Table 1: Comparison between the different architectures discussed in the chapter. 44

PART II. A TDC Architecture based on an Array of Delay Locked Loops.

Chapter 5. Architecture Overview.

Chapter 6. Analysis of the Limits to the TDC Resolution.

Chapter 7. Detailed Implementation.

Table 1: Summary of noise sensitivity and power consumption analysis. 90

Table 2: Summary of noise sensitivity and power consumption analysis for the proposed cell. 92

Chapter 8. Experimental Results.

Table 1: Locking status for each working range, after the initialisation procedure. 101

Table 2: Summary of the linearity obtained for each DLL in the array (LSBDLL=4·LSB and LSBDLL-PS=5·LSB). 104

Table 3: Characteristics of the TDC prototype. 109

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.

Chapter 9. Architecture Overview.

Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.

Table 1: Comparison of the two proposed algorithms. 143

Table 2: Register (accumulator) requirements for the two proposed algorithms. 143

Table 3: Comparator requirements for the two proposed algorithms. 144

Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.

Table 1: Register (accumulator) requirements for the present algorithm. 153

Table 2: Comparator requirements for the present algorithm. 153

Chapter 12. Experimental Results.

Table 1: Characteristics of the TDC prototype. 165

PART IV. Conclusion.

Chapter 13. Summary of Results.

Chapter 14. Future Developments.

Table 1: Timing specification of the general purpose TDC. 178

PART V. Appendixes.

Appendix A. TDC Characterisation Test Bench.

Appendix B. Analysis of the DLL Closed Loop Behaviour.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.

Appendix D. Number of Random Samples Required for TDC Characterisation.

Appendix E. TDC Characterisation Hit Frequency.

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.


Glossary of Acronyms.

ADC Analogue-to-Digital Converter

ADLL Array of Delay Locked Loops

ALICE A Large Ion Collider Experiment

ASIC Application Specific Integrated Circuit

CDT Code Density Test

CERN European Organisation for Nuclear Research

CMRR Common Mode Rejection Ratio

CMOS Complementary Metal-Oxide-Silicon Field Effect Transistor Logic

CUT Channel Under Test

DAQ Data Acquisition System

D-FF D-type Flip-Flop

DLL Delay Locked Loop

DNL Differential Non-Linearity

DUT Device Under Test

HEP High Energy Physics

HMPID High-Momentum Particle Identification

HRTDC High Resolution Time-to-Digital Converter

IC Integrated Circuit

INL Integral Non-Linearity

ITS Inner Tracking System

JLCC J-Leaded Chip Carrier

LADAR Laser Radar

LHC Large Hadron Collider

LIDAR Light Detection and Ranging

LIP Laboratório de Instrumentação e Física Experimental de Partículas


LSB Least Significant Bit

NMOS N-Channel Metal-Oxide-Silicon Field Effect Transistor

PDF Probability Density Function

PECL Positive Emitter Coupled Logic

PHOS Photon Spectrometer

PID Particle Identification

PLCC Plastic Leaded Chip Carrier

PLL Phase Locked Loop

PMOS P-Channel Metal-Oxide-Silicon Field Effect Transistor

RC Resistive-Capacitive

RMS Root Mean Square

TAC Time-to-Amplitude Converter

TDC Time-to-Digital Converter

T/D Time-to-Digital

TOF Time-of-Flight

TPC Time Projection Chamber

VCDL Voltage Controlled Delay Line

VCO Voltage Controlled Oscillator

PART I.

INTRODUCTION.

Chapter 1. Introduction and Structure of this Work.


In this thesis we describe the development and demonstration of architectures adapted for the accurate measurement of short time intervals. High-resolution time measurements have been performed in the past using instruments based on analogue measurement techniques. These instruments were built either from discrete components or as a single Integrated Circuit (IC) in a special high-performance “analogue” technology.

Our goal is to evaluate and demonstrate architectures that are suitable for monolithic integration and that can be built in a standard CMOS technology. The ability to share the same time interpolator between several measurement channels is also a major aim of the work. Furthermore, these architectures are intended to be implemented together with all the digital signal processing circuitry needed to build a fully functional converter.

Although the emphasis of this work is on architecture development, we carried out a detailed analysis of the critical circuitry that determines the timing performance of the converter.

Domain of application of this work.

The work was carried out at the “European Organisation for Nuclear Research” (CERN), in Geneva, as a collaboration between the Microelectronics group and the “Laboratório de Instrumentação e Física Experimental de Partículas” (LIP), Lisbon. Therefore, emphasis is given to the specific requirements of the High-Energy Physics experimental environment. Nevertheless, the conclusions we obtain from the work are applicable in any domain where high-resolution time measurements are required, for example in LIDAR (LIght Detection And Ranging) and LADAR (Laser rADAR) applications. Our work contains contributions that can be useful in the domain of phase and delay synthesis, in applications such as time bases for digital oscilloscopes, phase modulation and demodulation as well as phase synchronisation.

Structure of the thesis.

The structure of this thesis follows naturally the developments achieved over the duration of the work. It is divided into four parts, each describing a major milestone of the work.

In the first part of this thesis, we start with an introduction to the subject. It includes a brief description of the goals of a High-Energy Physics experiment and the systems needed to achieve them. The necessity of high-resolution time measurements is emphasised together with the particular constraints of the experimental environment (Chapter 2.).

A general overview of the relevant characteristics of a Time-to-Digital Converter (TDC) is given in the form of the set of characterisation metrics that we used throughout the work to evaluate the timing performance of T/D converters. A short description of the effects of the quantisation error and of the different noise sources that may be present is also given (Chapter 3.).
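To make these metrics concrete, the following Python fragment is a minimal sketch (not the characterisation code used in this work) that computes the differential and integral non-linearity from a list of bin widths, together with the RMS quantisation error of an ideal converter, LSB/√12. The function names and the example bin widths are illustrative assumptions; the INL is taken here as the running sum of the DNL, which is only one of several conventions in use.

```python
import math

def dnl_inl(bin_widths):
    """Return (dnl, inl) in LSB units for a list of measured bin widths.

    Assumption: the LSB is the mean bin width, and the INL at code k is the
    running sum of the DNL up to and including bin k.
    """
    lsb = sum(bin_widths) / len(bin_widths)
    dnl = [(w - lsb) / lsb for w in bin_widths]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl

def ideal_rms_quantisation_error(lsb):
    """RMS error of an ideal converter whose input is uniformly distributed."""
    return lsb / math.sqrt(12.0)

# Hypothetical bin widths in picoseconds (illustrative values only).
widths_ps = [90.0, 110.0, 100.0, 100.0]
dnl, inl = dnl_inl(widths_ps)
```

For an ideal converter the RMS quantisation error is about 0.29 LSB, which is the floor against which a measured conversion error can be compared.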

We then present a brief review of the common types of time interval measurement systems that have been used in the past, highlighting their advantages and disadvantages. This review includes recent proposals that aim at the same goals as the ones pursued in this work (Chapter 4.).

In the second part of this thesis we develop the analysis carried out to evaluate an architecture based on an Array of Delay Locked Loops (ADLL). As an outcome of this evaluation, a TDC demonstrator was built based on this architecture.

An overview of the time interpolation scheme resulting from the phase shifting of a number of Delay Locked Loops (DLL) is presented. We review the main features of the scheme, emphasising its inherent advantages and difficulties. A block diagram and a short description of the TDC prototype are presented, together with the estimated timing performance (Chapter 5.).
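The interpolation principle can be illustrated with a short Python sketch. The values N=35, F=4, LSBDLL=4·LSB and a phase-shifting step of 5·LSB are taken from the figure and table captions of Part II; the way the taps are indexed and offset in the code (timing DLL f starting f phase-shifting steps later) is an assumption made for illustration, not a description of the actual circuit.

```python
N_CELLS = 35      # cells per timing DLL (captions: N=35)
F_DLLS = 4        # number of timing DLLs (captions: F=4)
TIMING_STEP = 4   # timing DLL cell delay in LSB (captions: LSBDLL = 4 LSB)
PHASE_STEP = 5    # phase-shifting DLL cell delay in LSB (captions: 5 LSB)
BINS = N_CELLS * TIMING_STEP  # 140 LSB per reference period

def tap_positions():
    """Tap times, in LSB modulo one reference period, for each (DLL, tap) pair.

    Assumption: timing DLL f is driven from tap f of the phase-shifting DLL,
    so all of its taps are offset by f * PHASE_STEP.
    """
    return {
        (f, n): (f * PHASE_STEP + n * TIMING_STEP) % BINS
        for f in range(F_DLLS)
        for n in range(N_CELLS)
    }

positions = tap_positions()
covered = sorted(positions.values())
```

Under these assumptions the 4 × 35 taps fall on 140 distinct positions spaced by exactly one LSB, because an offset of 5 LSB is equivalent to 1 LSB modulo the 4 LSB cell delay of the timing DLLs.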

A detailed analysis of the causes of non-linearity that degrade the performance of a DLL-based converter is carried out, and an analytical model that predicts their effects on the conversion characteristic is presented. This analysis is extended to the ADLL-based converter. A similar analysis is carried out for the phase noise generated by the dynamics of the DLL operation (Chapter 6.).
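As an illustration of one such cause, the Python sketch below estimates by Monte Carlo simulation the INL produced by random cell delay mismatch in a single locked DLL and compares it with the first-order expression σcell·√(k(N−k)/N). It is a simplified stand-in, not the analytical model of Chapter 6 or Appendix C: it assumes independent Gaussian cell delays, a loop that rescales all cells by a common factor so that the chain spans exactly one reference period, and the values N=140 and σcell=1% quoted in the figure captions. All function names are hypothetical.

```python
import math
import random

def dll_inl(cell_delays):
    """INL (in LSB) at each tap of a locked DLL with the given cell delays.

    Assumption: the loop scales every cell by a common factor so that the
    total delay equals one reference period; the LSB is then period / N.
    """
    n = len(cell_delays)
    total = sum(cell_delays)
    inl, acc = [], 0.0
    for k, d in enumerate(cell_delays, start=1):
        acc += d
        inl.append(acc / total * n - k)  # actual minus ideal tap position
    return inl

def inl_rms_monte_carlo(n_cells, sigma_cell, trials, seed=0):
    """RMS of the INL at each tap over random mismatch draws."""
    rng = random.Random(seed)
    sums_sq = [0.0] * n_cells
    for _ in range(trials):
        delays = [rng.gauss(1.0, sigma_cell) for _ in range(n_cells)]
        for k, e in enumerate(dll_inl(delays)):
            sums_sq[k] += e * e
    return [math.sqrt(s / trials) for s in sums_sq]

def inl_sigma_first_order(n_cells, sigma_cell, k):
    """First-order estimate sigma_cell * sqrt(k (N - k) / N), in LSB."""
    return sigma_cell * math.sqrt(k * (n_cells - k) / n_cells)

sigma = inl_rms_monte_carlo(n_cells=140, sigma_cell=0.01, trials=2000)
mid_tap_estimate = inl_sigma_first_order(140, 0.01, 70)
```

In this simplified model the INL vanishes at the end of the chain, where the loop enforces the total delay, and is largest near the middle of the delay line.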

Having established a model for the causes and consequences of non-linearity and phase noise, the critical circuit blocks are then described. Ways to improve their performance and ensure that they match the required characteristics are proposed (Chapter 7.).

We then proceed to present the experimental results obtained from the prototype TDC that was built based on this architecture, and demonstrate that these results are in accordance with the analysis carried out (Chapter 8.).

Chapter 1: Introduction and Structure of this Work.


In the third part of this thesis, a new architecture suitable for low power operation is proposed. The basic building block of this architecture is also a DLL, but finer time interpolation is obtained using passive RC delay lines. The principle of operation of this new architecture is described. The main characteristics of the architecture are detailed, with an emphasis on the interesting properties of RC delay lines. Two alternative adjustable delay line schemes are proposed. A block diagram and a short description of the TDC prototype built using this architecture are presented, together with an estimate of the timing performance (Chapter 9.).

We then carry out the detailed analysis of the adjustable RC delay line based on a tap selection scheme. We develop a simulation model of the distributed delay line that includes all the significant devices (lumped or distributed) that contribute to its delay characteristics. We propose a method to derive the dimensions of each of the segments into which the line is divided, based on the delay requirements as well as on the dimensions of the surrounding circuitry. A few calibration algorithms are also proposed and their performance is illustrated based on simulated delay line conditions (Chapter 10.).

The same kind of analysis is performed for the adjustable RC delay line based on a variable lumped capacitor scheme. We present different calibration algorithms (Chapter 11.).

As a corollary of this part of the work we present the experimental results obtained from a demonstrator TDC built using this architecture. Based on these results, we validate our analysis and confirm that this architecture performs as expected (Chapter 12.).

The concluding part of this work is divided into two chapters. In the first, we highlight the contributions and developments carried out during this work (Chapter 13.). In the second, we propose what amounts to the logical conclusion of this work: a general purpose TDC architecture using the DLL / RC delay line based architecture that we developed. This TDC is able to perform alternatively low resolution measurements in a large number of integrated channels or high-resolution time measurements in a small number of integrated channels (Chapter 14.).

Finally a few appendices, complementary to the main text, are included. They expand and complete the explanations given in the main text. Of relevance is the description of the test bench that we developed specifically for TDC characterisation. This test bench was used throughout the work to evaluate the TDC prototypes that were built (Appendix A.).

Main contributions of this work.

As the structure of the thesis makes clear, we will present two integrated circuits that demonstrate two different solutions for the multi-channel, high-resolution time measurement system requirements.

• A four channel high-resolution TDC. This IC implements the Array of Delay Locked Loops (ADLL) architecture. Apart from the extended dynamic range time interpolation core, this circuit also integrates digital logic to perform important functions such as encoding, buffering and read-out management.

• A two channel high-resolution TDC. This IC implements a novel time interpolation architecture, based on a DLL and a passive RC delay line. This architecture allows for higher resolution with lower power operation.

Some important results were obtained while designing these circuits. They are presented in this work:

• A detailed study of the behaviour of a Delay Locked Loop (DLL) was carried out. We show how different error mechanisms affect the accuracy of the time interpolation and propose solutions to minimise these effects.

• These studies are extended to the more complex case of the Array of DLL’s (ADLL). We show that for a given device mismatch level, there is an optimal interpolation factor (number of DLL’s in the array) that results in a consequent improvement of the resolution of a converter built this way.

• An alternative architecture is proposed that avoids some of the limitations identified in the ADLL-based architecture, such as the power dissipation and the maximum resolution that can be obtained.

• A procedure to compensate for technological tolerances in tapped passive RC delay lines is proposed. We proceed to present several methods to characterise and adjust these lines. We then analyse the possibility of integrating the adjustment algorithms in the same IC.

Related publications.

The contributions made during the course of this research led to the following publications:

Mota, M., Christiansen, J., A high-resolution time interpolator based on a Delay Locked Loop and an RC delay line, IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1360-1366, Oct. 1999.

Mota, M., Christiansen, J., A four channel, self-calibrating, high-resolution Time-to-Digital Converter, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 1998.

Mota, M., Christiansen, J., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, London, UK, Sep. 1997.


Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49, ALICE/2000-02 internal note/TOF, Mar. 2000.

Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EP internal note, Geneva, Switzerland, 1997.

Contributions in the field of microelectronics applied to the High-Energy Physics domain led to the following additional publications:

Mota, M., Gomes, P., Christiansen, J., MEC3 – A pipelined zero-suppression and trigger matching chip, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp. 808-811, Aug. 1995.

Gomes, P., Mota, M., Christiansen, J., NANA – An integrated signal processor and record builder for level-2 read-out of asynchronous event-filtering digital pipelines, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp. 849-853, Aug. 1995.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.

High-Energy Physics (HEP), or particle physics, is the discipline that explores and tries to understand the deep structure of matter [1]. As the discipline evolved, some models were developed to explain this structure. As in any scientific endeavour, the particle physicist is not satisfied until his theoretical developments – the models – have been demonstrated by experimental means. His experiments may, however, bring to light finer, and not completely understood, phenomena. The cycle of scientific progress is now closed: new models have to be developed, which require the elaboration of new and higher-performance experiments to verify them.

2.1. High-Energy Physics experiments.

The quest for the structure of matter has been a progressive effort. In parallel with this effort, and enabling it, a big development effort has been dedicated to the design of new and more powerful machines that act as “microscopes” exposing the ever smaller and hidden constituents of matter.

These “microscopes” take the form of particle accelerators, where bunches of particles (for example ions, protons, electrons, etc) accelerated to very high energies are made to collide. The interaction between these particles, due to the bunch collision, results in the conversion of the original particles into a diversity of new particles, in a process akin to the breaking up of a nucleus into its constituent protons and neutrons when bombarded by other energetic particles. It is these new particles that are the object of the attention of the physicist, since they explain how the original particle is made and how it interacts with its environment.

Surrounding the interaction point (where bunches of particles collide) is a complex set of detectors, sensitive to the different kinds of particles generated at the interaction moment. As these resulting particles traverse the detectors, some of their energy is captured by the detector, which converts it into an electrical signal (charge, current or voltage). This signal is then amplified and processed by the front-end electronics, from where it is transferred to powerful computers.

Traditionally, only the pre-amplifier would be mounted close to the respective detector cell. Its function was to optimally shape the detector signal and drive it through 15 to 50 meters of cable up to the electronics hut, where all the front-end processing would be performed. In modern experiments, where very high granularity is needed, with well over 10^6 cells with independent sensors, this topology is no longer applicable. Fortunately, state-of-the-art technology can be used to integrate the required front-end electronics into a limited number of ASICs (Application Specific Integrated Circuits), or even a single one, that can be directly mounted on the detector. In this way, a vast quantity of cables is avoided and a higher function density and lower power dissipation are achieved [2].

All the phenomena that are studied in a HEP experiment obey statistical laws. The quantities that are to be measured with a detector sensor, either the amount of energy deposited or the moment and position of the particle crossing, also include some uncertainty in relation to their exact value. Therefore, multiple similar events must be analysed, the standard deviation of their statistical distribution being of relevance to their identification.

2.1.1. A HEP experiment at CERN¹: ALICE.

One such detector system is being developed in the context of the ALICE collaboration (A Large Ion Collider Experiment) [3]. The main goal of this collaboration is to study experimentally the collision of heavy ions (for example, lead ions) at high energy densities.

Figure 1: The CERN particle accelerator complex (simplified) [4].

These ions are accelerated to very high energies by a group of accelerator machines connected in series that culminate in the Large Hadron Collider (LHC), a circular accelerator with a 27 km perimeter. The LHC will include the interaction point where the ALICE detector will be built to observe the particle collision (see Figure 1).

¹ CERN: European Organisation for Nuclear Research, Geneva, Switzerland.

The LHC accelerator itself is made of two identical rings where bunches of ions (or, alternatively, protons) travel in opposite directions with high energy. At the interaction points, the two rings intersect and the particle bunches are allowed to collide.

The detector system itself is a group of detectors [3], each optimised to observe different ranges of particles emerging from the interaction point. These detectors comprise an Inner Tracking System (ITS) with six layers of high-resolution silicon tracking detectors, a cylindrical Time Projection Chamber (TPC) and finally a large area Particle IDentification (PID) array of Time-Of-Flight (TOF) counters.

The TPC is the main tracking system of the experiment. The ITS is mainly used for detailed reconstruction of the vertex of the interaction very close to its origin. Both of them also aid the PID detector in the identification of particles.

In addition, a few specialised detectors are included: the electromagnetic calorimeter (PHOS – PHOton Spectrometer), the High Momentum PID (HMPID), the muon spectrometer and others. An outer magnet is necessary to bend the trajectory of charged particles, thereby easing their identification (Figure 2).

Particles are identified by two different mechanisms. Low and medium momentum particles are identified, respectively, in the ITS and in the TPC by the dE/dx technique (the rate at which they lose energy as they traverse the detector). Higher momentum particles are identified in the PID detector using the TOF technique (the time that the particle takes to progress from the interaction point to the detector surface).

Figure 2: Longitudinal and transverse view of the ALICE detector [3].

The amount of data generated after each bunch collision (or event) is very large. To reduce the bandwidth requirements on the data acquisition (DAQ) system, and also the amount of memory needed for data storage, on-line data reduction algorithms are applied to the data.

The data reduction algorithms take advantage of the spatial and temporal characteristics of the events: only a limited number of detector cells are actually crossed by an emerging particle. The output of the other, idle, cells can safely be discarded since it contains no information. This operation is called “zero-suppression”. Furthermore, not all the events are interesting to study. It is possible to implement in hardware algorithms that sample the data of selected detectors to decide if an event includes some interesting characteristics that deserve further attention. Otherwise, all data pertaining to that event may be discarded. This operation is called “trigger based data reduction”.
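
The two reduction steps can be illustrated with a minimal software sketch. It is not the hardware implementation used in ALICE: the event layout (a list of per-cell amplitudes), the threshold and the trigger rule are all hypothetical choices made for illustration.

```python
def zero_suppress(cell_values, threshold=0):
    """Keep only (cell index, value) pairs for cells that were actually hit."""
    return [(idx, value) for idx, value in enumerate(cell_values) if value > threshold]


def trigger_select(events, accept):
    """Discard whole events that the (hypothetical) trigger predicate rejects."""
    return [event for event in events if accept(event)]


# Hypothetical events: each is a list of per-cell amplitudes, mostly idle (0).
events = [
    [0, 0, 7, 0, 0, 3],
    [0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 0],
]

# Assumed trigger rule for illustration: accept events with at least one hit.
accepted = trigger_select(events, lambda cells: any(v > 0 for v in cells))
compact = [zero_suppress(cells) for cells in accepted]
print(compact)
```

The trigger step removes the empty second event entirely, and zero-suppression then stores only the hit cells of the remaining events together with their indices.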

In general, several levels of trigger based data reduction are implemented. They correspond to a hierarchy of data reduction algorithms that are progressively more selective. However, they are also more complex and slower.

Figure 3: The hierarchical trigger data reduction block diagram of the ALICE experiment [3].

The principle of the trigger based data reduction hierarchy in ALICE is pictured in Figure 3 [3]. A first level of data reduction (L0) is used simply to signal the existence of an interaction as soon as possible. It is not a very selective filter. The second level of data reduction (L1) already uses information on the quality of the event to produce a large reduction in accepted event rate. Both of these trigger processors produce a decision with a fixed latency. After the L1 trigger decision is taken, the read-out of the data from all detectors is started, pending the more selective decision of the third level trigger (L2). At that moment, the read-out of the detector’s data into the DAQ system can be finalised. Overall, an event rate reduction of the order of 10^3 is obtained. Consequently, the bandwidth of the DAQ system that is needed is proportionally reduced.

2.2. High-resolution time interval measurements in ALICE.

The efficiency of the particle identification using the TOF technique is directly related to its time resolution. This is especially critical in the higher momentum side of the identification range [5]. As a consequence, the TOF detector in the ALICE experiment is an array of sensors (counters) having a high time resolution (from σdet ~ 40 ps to 100 ps, depending on the detector technology chosen).

The detector sensor is only a small part of the system. The front-end electronics also generate some time uncertainties that will add up to the intrinsic detector resolution, limiting the overall time resolution of the system. A simplified view of the front-end electronics proposed for the TOF detector is shown in Figure 4. The time of flight of the particle resulting from the interaction is the difference between the instant when the interaction occurred, t0, which is captured by a specialised detector (the t0 detector), and the instant when the emerging particle traverses the TOF detector surface.

Traditionally, this time interval would be measured in a single device (a Time-to-Digital Converter – TDC). However, the dimensions of the detector system (>150,000 cells distributed over ~100 m²) render impractical the distribution of t0 over the whole system. A better solution is to rely on the reference clock (clkref), which has to be distributed anyway, as the time reference of the measurements. Each limit of the time interval can then be measured individually and later subtracted digitally to obtain the original interval.
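
The digital subtraction can be sketched as follows. This is only an illustration: the function name, the code values, the 50 ps LSB and the optional wrap-around of the time codes at `n_codes` are assumptions, not details taken from the TOF electronics.

```python
def interval_from_codes(start_code, stop_code, lsb_ps, n_codes=None):
    """Time interval (ps) between two time stamps taken against the same clkref.

    start_code, stop_code: TDC output codes for the two limits of the interval.
    lsb_ps: bin size of the converters in picoseconds.
    n_codes: assumed number of codes before the time stamp wraps around
             (hypothetical; None means no wrap-around is modelled).
    """
    difference = stop_code - start_code
    if n_codes is not None:
        difference %= n_codes  # undo a wrap of the stop time stamp
    return difference * lsb_ps


# Hypothetical example: t0 stamped at code 100, TOF hit stamped at code 340.
print(interval_from_codes(100, 340, 50.0))          # 12000.0 ps
# Same idea when the (assumed 12-bit) time stamp wrapped between the two hits.
print(interval_from_codes(4090, 10, 50.0, 4096))    # 800.0 ps
```

Because both time stamps refer to the same distributed clock, any delay common to the two measurement chains cancels in the subtraction.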

Figure 4: Schematic view of the TOF detector front-end. (Diagram labels: interaction, t0 detector, TOF detector cells, pre-amplifier & discriminator, TDC, clkref distribution, time of interaction (bunch ID), time of flight; dimensions 3.5 m and 7 m.)

The actual interaction and crossing instants are reflected in the timing characteristics of the electrical signal that the respective detector generates. These signals are the object of some processing (amplification, discrimination, etc) in order to render them usable by the TDC that converts the timing information they carry into a binary word.

The timing uncertainties created by such processing, and by the digital conversion procedure, must be added to the intrinsic uncertainty of the TOF and t0 detectors (σdet and σt0, respectively) in order to obtain the overall time resolution of the system.

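Figure 5: The error propagation chain. (Diagram labels: detector cell (t0 / TOF) with σt0 or σdet; front-end electronics (pre-amp & discriminator) with σfe; TDC with σTDC; clkref distribution with σclk; the chain appears once for the t0 detector and once for the TOF detector.)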

In such a distributed system, it is reasonable to assume that all the time uncertainties generated in the different blocks are uncorrelated. Therefore, following the error propagation scheme of Figure 5, the time uncertainty of the TOF system is:

$$\sigma_{TOF}^2 = \sigma_{t0}^2 + \sigma_{det}^2 + 2\cdot\sigma_{fe}^2 + 2\cdot\sigma_{TDC}^2 + 2\cdot\sigma_{clk}^2,$$

where, for simplicity, the time uncertainty of the front-end block (σfe), of the T/D converter (σTDC) and of the clock distribution network (σclk) were considered as having the same statistical properties in the two independent chains.

If the intrinsic time resolution of the detector is to be respected, it is important to minimise the time uncertainty created by all the electronic components of the chain. The overall contribution of the electronics should only be a small fraction of the time uncertainty of the complete TOF system. To obtain an overall time uncertainty better than σTOF = 150 ps, as required by the ALICE experiment, the resolution of the T/D converter must be σTDC < 50 ps. It is assumed, as in [6], that the time uncertainty of the TOF counters is σdet = 100 ps, and that the values for σt0, σfe and σclk are, respectively, 50 ps, 10 ps and 50 ps.
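
As a numerical check of this budget, the following sketch evaluates the quadrature sum with the values quoted above and solves it for the largest admissible σTDC. The function names are illustrative; the only inputs are the figures given in the text and the assumption that the uncertainties are uncorrelated.

```python
import math


def tof_resolution(sigma_t0, sigma_det, sigma_fe, sigma_tdc, sigma_clk):
    """Overall TOF uncertainty; fe, TDC and clk enter twice (t0 and TOF chains)."""
    variance = (sigma_t0 ** 2 + sigma_det ** 2
                + 2 * (sigma_fe ** 2 + sigma_tdc ** 2 + sigma_clk ** 2))
    return math.sqrt(variance)


def max_tdc_sigma(sigma_tof, sigma_t0, sigma_det, sigma_fe, sigma_clk):
    """Largest TDC uncertainty that still meets the overall target sigma_tof."""
    remaining = (sigma_tof ** 2 - sigma_t0 ** 2 - sigma_det ** 2
                 - 2 * (sigma_fe ** 2 + sigma_clk ** 2))
    if remaining < 0:
        raise ValueError("target cannot be met even with an ideal TDC")
    return math.sqrt(remaining / 2)


# All values in picoseconds, taken from the text above.
limit = max_tdc_sigma(sigma_tof=150.0, sigma_t0=50.0, sigma_det=100.0,
                      sigma_fe=10.0, sigma_clk=50.0)
print(f"sigma_TDC must stay below about {limit:.1f} ps")
print(f"check: sigma_TOF = {tof_resolution(50.0, 100.0, 10.0, limit, 50.0):.1f} ps")
```

With these inputs the limit evaluates to roughly 49 ps, which agrees with the requirement σTDC < 50 ps stated above.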

Apart from the timing performance of the TOF electronics, the particular physical constraints of these experiments (large number of detector cells, electronics mounted directly on the detector) generate new demands on the electronics to be used. Commercial components and instruments like low noise and fast amplifiers, low time-walk discriminators, and high-resolution T/D converters exist, but their size and power dissipation are seldom adapted to the specific requirements of modern HEP experiments like the one described.

Chapter 3. Conversion Basics.


The remarkable development of computers and other digital means of processing data during the last few decades has enabled the creation of new and more powerful instruments for observing and studying the world that surrounds us. Of course, this is essentially an analogue world since observable quantities may suffer continuous time and amplitude variations. Their translation into electric signals also results in analogue quantities. The interfaces between the analogue domain and the digital domain are performed by the Analogue-to-Digital Converters. They capture the analogue quantities and convert them into their digital representations, which should be the exact counterpart of the respective analogue quantity, independently of the properties of the converter used.

The capture of an analogue quantity in a discrete format by means of an electronic converter is unfortunately not error-free. Indeed, some loss of information is inherent to the amplitude quantising operation¹. Furthermore, given the technological limitations and the environment in which these converters operate, other sources of error will inevitably affect the conversion transfer function, making it different from the idealised one.

Several converter architectures and several implementations of these architectures have been proposed over time. All of them have claimed their advantages by showing different, and sometimes conflicting, performance parameters. A quick scan of the literature [7],[8] and of commercial converter data-sheets shows that even if some performance metrics are commonly used (INL, DNL, etc), their definition may differ. It is therefore important to clarify which metrics will be used throughout this text to characterise the converters, and what their significance is.

Furthermore, most of the performance metrics have been developed and used in the context of conventional A/D converters. Some of these are not directly applicable to the T/D converter characterisation, either because they are meaningless (maximum input frequency, hold time, droop rate, etc), or because their meaning is different (maximum sampling rate). Also, some new performance parameters, adapted to the specific application, must be developed.

¹ Given some restrictions on the signal bandwidth B, the Nyquist criterion assures that the sampling operation preserves all the characteristics of the signal if an appropriate sampling frequency is used (fsample ≥ 2·B).

The performance metrics that will be used throughout this text are presented here. Their meaning and significance will be explained, as well as the way they can be measured, if relevant.

3.1. Performance metrics.

A T/D converter performs the conversion of a time interval (a delay) into a binary word. This operation inevitably includes an amplitude discretisation (quantisation), which means that its transfer function is staircase shaped, as shown in Figure 1.

Figure 1: Ideal transfer characteristic of a 3-bit converter. (Axes: analogue input, digital output; annotations: LSB, dynamic range, tap_i, tap_i+1, bin_i.)

An ideal converter is characterised by its Least Significant Bit (LSB) and the conversion Dynamic Range. The LSB corresponds to the smallest delay that can be discriminated and the Dynamic Range corresponds to the largest delay that can be measured. After conversion, the delay is converted into a discrete number of Codes, each corresponding to a “stair” of the transfer curve. A delay is said to belong to bin_i if its length is smaller than the one corresponding to Code_{i+1} but not smaller than the one corresponding to Code_i. For applications such as the T/D converters based on the architectures developed in this work, the term Code is replaced by the more meaningful term tap.
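
The bin assignment rule of an ideal converter can be written down in a few lines. This is a minimal sketch: the function name and the numerical values (50 ps LSB, 8 bins as in a 3-bit converter) are chosen only for illustration.

```python
def ideal_bin(delay, lsb, n_bins):
    """Bin index of an ideal converter: Code_i <= delay < Code_{i+1}."""
    dynamic_range = lsb * n_bins
    if delay < 0 or delay >= dynamic_range:
        raise ValueError("delay outside the dynamic range")
    return int(delay // lsb)


# Hypothetical 3-bit converter with a 50 ps LSB (dynamic range 400 ps).
for delay in (0.0, 49.9, 50.0, 125.0, 399.9):
    print(delay, "->", ideal_bin(delay, lsb=50.0, n_bins=8))
```

A delay that falls exactly on a code edge (50.0 ps here) belongs to the upper bin, in line with the "not smaller than" wording of the definition.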

Departures from the ideal behaviour of the converters are usually characterised using a given set of metrics, such as Differential and Integral non-linearity, Gain error and Offset. Since some of these static performance metrics have different definitions depending on the application, a set of appropriate definitions is given and briefly discussed:

Differential Non-Linearity (DNL) is the deviation of the output bin size from its ideal value of one least significant bit (LSB). For a given bin_i, the differential non-linearity DNL_i is given by the following equation, where d_i is the measured cumulative delay from the origin to tap_i.

$$\mathrm{DNL}_i = \frac{d_{i+1} - d_i - \mathrm{LSB}}{\mathrm{LSB}}, \quad i = 0..N-1.$$

The result is usually presented as a graph representing all the N bins being characterised, together with the standard deviation of the DNL.
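
A minimal sketch of this calculation is given below. The tap delays are invented for illustration (in picoseconds, with a 50 ps LSB), and the use of the population standard deviation is an assumption about how the spread is reported.

```python
import statistics


def dnl(tap_delays, lsb):
    """DNL_i = (d_{i+1} - d_i - LSB) / LSB for the N bins between N+1 taps."""
    return [(tap_delays[i + 1] - tap_delays[i] - lsb) / lsb
            for i in range(len(tap_delays) - 1)]


# Hypothetical cumulative tap delays d_0..d_4 in ps (one bin too wide, one too narrow).
taps = [0.0, 50.0, 110.0, 150.0, 200.0]
dnl_values = dnl(taps, lsb=50.0)
print(dnl_values)                      # approximately [0.0, 0.2, -0.2, 0.0] LSB
print(statistics.pstdev(dnl_values))   # spread of the DNL over the bins
```

A bin that is 10 ps wider than the 50 ps LSB shows up as a DNL of +0.2 LSB, and the neighbouring bin that absorbs the difference shows up as -0.2 LSB.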

Integral Non-Linearity (INL) is the deviation between the input/output characteristic and a straight line of ideal gain (slope) that best fits the curve, obtained by adding an offset to the ideal transfer characteristic. Using this definition, Gain error is zero, because its effect is included in the INL result. The INL graph is usually presented, together with the standard deviation of the INL.

This definition of INL does not exactly match the usual definitions, as summarised in [7]. However, it satisfies the particular requirements of the T/D converters to which these metrics are applied. The principle of operation of most of the T/D converters that will be presented here relies on the concatenation of repeated images of a transfer function with small LSB along the full dynamic range of the converter, the concatenation being guided by an external reference signal that also serves as the overall reference to the converter.

In this context, it is standard practice to characterise in great detail only a limited section of the dynamic range, corresponding to one or more images of the above mentioned transfer function. The performance measured in this section is then extrapolated to the full dynamic range (which is itself a simple repetition of this section). The definition of INL used must allow for this extrapolation operation, therefore the gain error must be included in the INL measure.

The concatenation of the transfer function must be verified to confirm that the extrapolation of the INL measure is valid. Given the principle of operation of these T/D converters, it is only necessary to check that all of the images are present and are not superimposed. A coarse INL characterisation of the full dynamic range identifies any concatenation error that may be present.

For a given bin i, the integral non-linearity INL_i is given by the following equation, where d_i is the measured cumulative delay from the origin to tap_i and the Offset O_delay is defined below:

$$\mathrm{INL}_i = \frac{d_i - O_{delay} - i \cdot \mathrm{LSB}}{\mathrm{LSB}}, \quad i = 0..N-1.$$

Gain error is the deviation of the slope of the line used in the INL calculation from its ideal value. As stated before, the definition of INL used results in null gain error.

Offset is the vertical intercept of the line to which the transfer function is compared in the INL calculation. The Offset, O_delay, is such that the sum of the squared residuals ε_i is minimised,

$$\varepsilon_i = d_i - O_{delay} - i \cdot \mathrm{LSB}, \quad i = 0..N-1.$$

In our case, this definition results in a relative offset of the transfer curve. An absolute offset would have to take into account the offset due to different signal paths of the reference and hit signals within (and outside) the circuit. Since an absolute offset value depends on the system where the TDC is incorporated and must anyway be measured at system level, no further mention is made of this metric.
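
The Offset and INL definitions can be sketched together. With the slope fixed at its ideal value, minimising the sum of squared residuals over the offset alone gives the mean of d_i − i·LSB; the code below uses that closed form. The tap delays are invented for illustration.

```python
def offset_and_inl(tap_delays, lsb):
    """Least-squares offset for a line of ideal slope, and the resulting INL.

    Minimising sum_i (d_i - O - i*LSB)^2 over O gives O = mean(d_i - i*LSB).
    """
    deviations = [d - i * lsb for i, d in enumerate(tap_delays)]
    offset = sum(deviations) / len(deviations)
    inl = [(dev - offset) / lsb for dev in deviations]
    return offset, inl


# Hypothetical cumulative tap delays in ps: a 5 ps path offset plus one late tap.
taps = [5.0, 55.0, 115.0, 155.0, 205.0]
offset, inl_values = offset_and_inl(taps, lsb=50.0)
print(offset)       # 7.0 ps
print(inl_values)   # approximately [-0.04, -0.04, 0.16, -0.04, -0.04] LSB
```

Because the offset is the mean deviation, the INL values always sum to zero; only the shape of the transfer curve relative to the ideal-slope line remains.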

These static metrics (illustrated in Figure 2) reflect how close the transfer function of the converter is to the ideal curve. They can be obtained using statistical methods such as the histogram method, also known as the Code Density Test (CDT). A more detailed overview of this method and of the test set-up used can be found in [9] and Appendix A.
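
The principle of the histogram method can be sketched in software: when hits are uniformly distributed over the dynamic range, the number of hits collected in each bin is proportional to the width of that bin. The converter model, tap positions and number of hits below are hypothetical; the actual test set-up is the one described in Appendix A.

```python
import bisect
import random


def code_density_widths(codes, n_bins, dynamic_range):
    """Estimate each bin width from the fraction of random hits it collects."""
    counts = [0] * n_bins
    for code in codes:
        counts[code] += 1
    total = len(codes)
    return [dynamic_range * count / total for count in counts]


# Hypothetical converter: bin edges (taps) in ps, i.e. widths of 50, 60, 40 and 50 ps.
edges = [0.0, 50.0, 110.0, 150.0, 200.0]
n_bins = len(edges) - 1

rng = random.Random(1)  # fixed seed so the sketch is repeatable
hits = [rng.uniform(edges[0], edges[-1]) for _ in range(200_000)]
# Model of the converter: find the bin each hit falls into (clamp the top edge).
codes = [min(bisect.bisect_right(edges, t) - 1, n_bins - 1) for t in hits]

widths = code_density_widths(codes, n_bins, edges[-1] - edges[0])
print([round(w, 1) for w in widths])
```

The estimated widths can then be fed into the DNL and INL definitions given above; the statistical error of each estimate shrinks as the number of hits grows.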

Figure 2: Example of a converter transfer function illustrating the static performance metrics. (Axes: analogue input, digital output; annotations: Offset, DNL_i + LSB, INL_i.)

Another important characteristic of the converter, which reflects its behaviour in the presence of random error sources such as loop jitter, electrical noise or quantising noise, is the Conversion error:

Conversion Error is the deviation of the input/output characteristic from a straight line of ideal gain (slope) that best fits the curve. The result is presented as a histogram of the error, and its standard deviation is defined as the RMS Resolution of the converter.

This definition is quite similar to the INL definition given above, the difference being in the method by which the transfer curve is obtained. In this case it is obtained via a linear time sweep over the dynamic range (see Appendix A), while the INL graph is (in our case) obtained using randomly generated hits in code density tests.
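
A minimal sketch of the RMS Resolution calculation from a linear sweep follows. The converter here is an ideal, noiseless one with a hypothetical 50 ps LSB, so the only error left is the quantisation error and the result should approach LSB/√12 (about 14.4 ps).

```python
import math


def conversion_error(inputs, codes, lsb):
    """Errors against the best-fit line of ideal slope, and their RMS value."""
    raw = [code * lsb - t for t, code in zip(inputs, codes)]
    offset = sum(raw) / len(raw)            # best-fit offset for a unit-slope line
    errors = [e - offset for e in raw]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return errors, rms


lsb = 50.0                                   # ps, hypothetical
sweep = [0.5 * k for k in range(800)]        # linear sweep 0 .. 399.5 ps in 0.5 ps steps
codes = [int(t // lsb) for t in sweep]       # ideal, noiseless converter model
errors, rms = conversion_error(sweep, codes, lsb)
print(f"RMS resolution = {rms:.2f} ps, LSB/sqrt(12) = {lsb / math.sqrt(12):.2f} ps")
```

For a real converter the same calculation would also capture jitter and non-linearity, so the measured RMS Resolution is expected to be larger than this quantisation-only value.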



This metric reflects a different way of characterising the circuit, very appropriate for High-Energy Physics experiments, where the response of most of the detectors to a particle crossing includes some time (and amplitude) uncertainty which is reflected in the standard deviation of their transfer function.

Other performance metrics of a converter are included here, for completeness.

Crosstalk between channels reflects the error introduced in the transfer function of a given channel when electric activity occurs in any other channel integrated (or not) in the same circuit. It is presented as a maximum deviation of the transfer function in any coupling conditions.

Double hit resolution is a measure of the minimum time interval between two consecutive samples of the quantity being measured. In the TDC domain this quantity is a time interval. This metric is similar to the maximum sampling frequency used in the context of ADC characterisation. However, it is better adapted to the characterisation of T/D converters due to the random nature of their sampling activity.

The following characteristics do not reflect the timing performance of the converter, but they are important to establish the applicability of one particular converter circuit to the envisaged system.

Number of integrated channels.

Power dissipation per channel.

Calibration requirements.

System-level functionality integrated (memory, etc).

3.2. Error sources.

The performance metrics already discussed describe the observable effects of all the error sources that influence the converter system. In this section the causes of these errors will be briefly exposed. Only the general error causes will be discussed. Particular conversion architectures are affected by different error mechanisms. These will be discussed together with the respective architecture.

Quantisation error.

The quantising operation is inherent to the operation of any converter. It consists of the approximation of the amplitude of the quantity being converted to a level that is part of a limited set of available levels. The resulting signal is a discrete amplitude representation of the sampled signal. It can be directly represented in a binary format.

The effect of the quantising operation is an error in the conversion result. This error is proportional to the LSB of the conversion, varying between –LSB/2 and LSB/2. Quantising is usually seen as a source of additive noise. To formulate its impact on the performance of the converter, this additive noise is assumed to be a random variable with a uniform distribution between –LSB/2 and LSB/2 that is independent of the input amplitude [8]. While these assumptions are not strictly valid, they do result in a reasonable approximation for converters above 4 bits. This random variable has a standard deviation of:

$$\sigma_q = \frac{\mathrm{LSB}}{\sqrt{12}}.$$
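
For reference, this value follows directly from the uniform-distribution assumption stated above. Writing e for the quantisation error, with zero mean and probability density 1/LSB over one bin (a standard result, sketched here rather than taken from the text):

$$\sigma_q^2 = \int_{-\mathrm{LSB}/2}^{\mathrm{LSB}/2} e^2 \, \frac{1}{\mathrm{LSB}} \, de = \frac{1}{\mathrm{LSB}} \left[ \frac{e^3}{3} \right]_{-\mathrm{LSB}/2}^{\mathrm{LSB}/2} = \frac{\mathrm{LSB}^2}{12}$$

As an illustration with a hypothetical bin size of LSB = 50 ps, this gives σ_q ≈ 14.4 ps.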

Reference phase noise (Jitter).

The quality of the reference that the converter uses is crucial to the operation of the converter. Some converter architectures include means of averaging the important properties of the reference over time, thereby filtering out harmful variations of these properties and reducing the conversion errors. However, this filtering function has limited effects and therefore it is safer to rely on a high quality reference that can be used as it is delivered to the converter.

In the context of modern T/D converters, the reference is usually a periodic signal, with its phase noise (or jitter) being the important quality factor. Jitter present in the reference will force the converter to permanently try to adapt to the changing period of the reference. Therefore any jitter on the reference signal will add a random noise component to the conversion function.

Other noise sources.

Several other sources of conversion errors may be present in a T/D converter, just like in any other electronic circuit. A careful design minimises the sensitivity of the transfer function of the converter to these noise sources.

A distinction can be made between intrinsic and extrinsic noise sources. Intrinsic noise is due to random motion of charge carriers in the devices (active or passive) that make up the circuit. It is always present in the signals flowing in the circuit.

The origin of several kinds of intrinsic noise will be briefly described here [10]. However, given the large voltage levels of most of the signals used in the converters discussed in this dissertation, their influence on the performance of the converters is small.

Thermal noise is a temperature dependent noise. It originates from the thermally induced random motion of charge carriers within the device. It has a flat spectral density (white noise) and a gaussian amplitude probability distribution function (PDF) with zero mean. The variance σ²(i) is a function of the temperature T and the resistance value R (k is the Boltzmann constant and Δf is the frequency bandwidth).

$$\sigma^2(i) = 4 \cdot k \cdot T \cdot \frac{1}{R} \cdot \Delta f \quad (\mathrm{A}^2)$$



Shot noise is due to the random passage of charge carriers across a potential barrier in a semiconductor junction. It therefore depends on the direct current flowing through the device. It has the same spectral and amplitude characteristics as thermal noise. The variance σ²(i) is a function of the direct current I_D and of the electronic charge q.

$$\sigma^2(i) = 2 \cdot q \cdot I_D \cdot \Delta f \quad (\mathrm{A}^2)$$

Flicker noise (or 1/f noise) describes the quality of the conductive medium with respect to the direct current flow. Several origins may contribute to this noise. Its amplitude PDF is often non-Gaussian, but the spectral density is proportional to 1/f (hence the name). The expression for the variance σ²(i) of the amplitude of this kind of noise includes two terms that have to be experimentally determined, K and a:

$$\sigma^2(i) = K \cdot \frac{I_D^{a}}{f} \cdot \Delta f \quad (\mathrm{A}^2)$$
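As a rough numerical illustration of the thermal, shot and flicker noise expressions, the following Python sketch evaluates the three variances. The function names and parameters are illustrative assumptions, not part of the original text, and the flicker parameters K and a have to come from measurement.

```python
import math

BOLTZMANN = 1.380649e-23          # Boltzmann constant k, in J/K
ELECTRON_CHARGE = 1.602176634e-19  # electronic charge q, in C


def thermal_noise_variance(temperature_k, resistance_ohm, bandwidth_hz):
    """Thermal noise: sigma^2(i) = 4 k T (1/R) df, in A^2."""
    return 4.0 * BOLTZMANN * temperature_k * bandwidth_hz / resistance_ohm


def shot_noise_variance(direct_current_a, bandwidth_hz):
    """Shot noise: sigma^2(i) = 2 q I_D df, in A^2."""
    return 2.0 * ELECTRON_CHARGE * direct_current_a * bandwidth_hz


def flicker_noise_variance(k_coeff, direct_current_a, exponent_a,
                           frequency_hz, bandwidth_hz):
    """Flicker noise: sigma^2(i) = K I_D^a / f df, in A^2.

    k_coeff (K) and exponent_a (a) are experimentally determined.
    """
    return k_coeff * direct_current_a ** exponent_a / frequency_hz * bandwidth_hz


def rms_current(variance_a2):
    """RMS noise current (A) from a variance (A^2)."""
    return math.sqrt(variance_a2)
```

For example, `rms_current(thermal_noise_variance(300.0, 1e3, 1e6))` gives the RMS thermal noise current of a hypothetical 1 kΩ resistor at 300 K over a 1 MHz bandwidth.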

Other kinds of intrinsic noise with a spectral density that has a higher-order dependency on the frequency, such as popcorn or burst noise (1/f²), mostly reflect the quality of the processing of the material. Their amplitude PDF is not Gaussian.

Finally, avalanche or breakdown noise is caused by the avalanche process just before junction breakdown. Its spectral density is usually flat and its amplitude PDF is not Gaussian.

Extrinsic noise, on the other hand, is a product of the interference of external circuitry with the behaviour of the sensitive circuit [11]. Extrinsic noise requires a path via which the noise source can couple into the sensitive circuit. It is therefore strongly linked to the circuit layout and to the signal distribution topology. This interference may be random or deterministic.

Of the several possible coupling mechanisms, we will only discuss the most relevant in the integrated circuit domain: capacitive coupling, conductive coupling (via shared signal paths) and inductive coupling.

Capacitive coupling is due to the existence of electric fields between any two conductors. The current flowing through the coupling capacitor is a function of the rate of change of the potential difference across its terminals. Therefore any signal variation on one of the plates of the coupling capacitor induces a variation on the other plate. This effect is often known as crosstalk. It may be significant where the coupling capacitor is large (for example, two long parallel lines) or where high-frequency, large-amplitude signal variations occur close to a weak signal path.

Conductive coupling is due to the existence of a direct signal connection between the noise generating circuit and the sensitive circuit. These connections may be the input signals, the common power supply or the ground node.

Power supply and ground distribution within ICs requires complex networks. Although these networks are made of low-resistivity lines, the overall resistance is not negligible. In the presence of switching activity, periodic current surges flow through them, leading to voltage drops or bounces. These voltage variations may affect the sensitive circuit. Noise coupling through the power supply distribution is also known as supply noise.

Inductive coupling is usually not considered in the context of the integrated circuit itself, given its small dimensions. However, the package interconnects and the bond wires that establish the connection between the IC and the rest of the circuit can be sensitive to this coupling effect. It is due to the varying magnetic field around a conductor in which the current varies. Since the magnetic field extends around other conductors in the vicinity, its variation may induce a voltage change in them.

In the case of the bond wires dedicated to power supply and ground, where relatively large current variations may be present due to the switching activity of the circuit, the inductance of the wire may cause a voltage change across the supply network. This effect is also named supply noise.

As mentioned before, extrinsic noise may be of a random or deterministic nature. If it is random, it must be studied using statistical methods. If it is deterministic, circuit analysis methods can be used. In synchronous circuits, supply noise disturbs the sensitive circuit in a systematic (and periodic) way. Knowledge of the characteristics of the noise generating circuit can be used to minimise its effects on the functionality of the sensitive circuit.

Offset variation.

In a TDC system, the conversion offset is determined by the delay that the sampling signals experience as they propagate through the system towards the converter. It is a system-wide characteristic, therefore it only makes sense to discuss it at system level. Typically, offset is calibrated at start-up time by performing a direct measurement of the propagation delay of the sampling signal (using the converter itself).

At the TDC circuit level, it is possible to minimise the temperature sensitivity of the conversion offset by forcing the reference and the sampling signals to have similar delays inside the circuit, and by giving the delays of the two paths the same temperature dependency.

Since the sampling signal will typically traverse a front-end chain consisting of some electronic devices, such as buffers or signal conditioners, its delay will be sensitive to temperature changes. These changes are expected to be larger than the corresponding variations at the TDC level. Periodic system-wide calibrations are therefore required if environment changes are expected.



3.3. Converter calibration.

Any converter system requires a known reference from which the conversion gain (or constant of proportionality) can be derived. The procedure that leads to the adjustment of the transfer function to the idealised characteristics is called Converter Calibration. In a wider sense, the offline determination of the transfer function that relates the digital representation to the measured quantity can also be included in this definition, although it does not influence the converter operation.

The calibration reference can be a set of pre-determined quantities, converted together with the actual signal, which can be used to derive the transfer function of the converter. A single start-up calibration is sufficient if the converter circuit is not sensitive to environment variations. On the other hand, if the constant of proportionality is sensitive to environment changes, this procedure has to be executed periodically and the updated transfer function applied to the data. This calibration procedure does not set any requirements on the converter, since it is executed offline. However, some conversion dead time is incurred due to the conversion time of the reference quantities.
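A minimal sketch of such an offline calibration is given below, assuming a linear transfer function (gain and offset) and a least-squares fit over the converted reference quantities. The linear model, the fitting method and all names are assumptions made for illustration; the text does not prescribe a particular procedure.

```python
def fit_transfer_function(reference_times_ps, measured_codes):
    """Least-squares fit of time = gain * code + offset.

    reference_times_ps: known reference intervals (ps).
    measured_codes: digital codes the converter produced for them.
    Returns (gain in ps per code, offset in ps).
    """
    n = len(reference_times_ps)
    if n < 2 or n != len(measured_codes):
        raise ValueError("need at least two matching reference points")
    mean_code = sum(measured_codes) / n
    mean_time = sum(reference_times_ps) / n
    sxx = sum((c - mean_code) ** 2 for c in measured_codes)
    if sxx == 0:
        raise ValueError("reference codes must not all be identical")
    sxy = sum((c - mean_code) * (t - mean_time)
              for c, t in zip(measured_codes, reference_times_ps))
    gain = sxy / sxx
    offset = mean_time - gain * mean_code
    return gain, offset


def code_to_time_ps(code, gain, offset):
    """Apply the fitted transfer function to a measured code."""
    return gain * code + offset
```

When the constant of proportionality drifts with the environment, the fit would simply be repeated on a fresh set of reference conversions and the new pair (gain, offset) applied to subsequent data.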

The hardware necessary to calculate the transfer function from these reference quantities can be integrated in the converter. The resulting transfer function can then be used to perform internal calibration of the converter. In this case the output data will always be calibrated in relation to the given reference. Conversion dead time is, however, unavoidable.

In all these schemes the calibration procedure is performed periodically, therefore changes that may occur between calibration runs are not accounted for and large conversion errors may develop. To avoid this problem, the best solution is to make the converter perform continuous calibration in a non-intrusive way, so that no dead time penalty is incurred. In these schemes the transfer function is directly derived from a reference signal and does not depend on environment conditions. A consequence of this permanent auto-calibration is that (in normal operation) the conversion error is continuously minimised.


Several methods have been proposed in the past to solve the problem of accurately measuring time. Traditional techniques fall into a few categories [12]: counter based techniques, vernier techniques, pulse overlap techniques and current integration techniques. TDC circuits can be built using discrete, standard components, thereby avoiding the need to develop special purpose monolithic circuits. Recently, however, demand has been pushing for a higher level of system integration and lower power dissipation, domains where traditional methods find it difficult to compete.

The advent of sub-micron digital CMOS technologies, owing to their availability, has enabled the emergence of new TDC architectures. Time interpolation using delay line based architectures can achieve a resolution comparable to the more traditional methods and profit from the new technology’s capabilities in terms of integration and power dissipation.

A historical review of time interval measurement circuits can be found in [12]. Since then several architectures have been described in the literature, but only partial review papers have been published (for example [13]). In this Chapter, a short review of the most relevant architectures is presented, focussing on the topics that are fundamental for this work: time resolution, dynamic range, power dissipation, calibration, the possibility of sharing a common time interpolator block between several integrated channels, and cost. A table summarising the characteristics of each of the architectures described is presented at the end of the chapter.

4.1. Overview of TDC architectures.

4.1.1. Current integration techniques.

Current integration is probably the most common technique used for time interval measurements. In this architecture, a capacitor is charged linearly with a constant current I. The charging of the capacitor is gated on by a “start” pulse at time t1 and off by a “stop” pulse at time t2. The charge stored in the capacitor is thus proportional to the time interval between the “start” and “stop” pulses. Assuming a voltage-independent capacitor, the voltage across its terminals (Vcap) is also proportional to this time interval.

$$V_{cap} = \frac{I \cdot (t_2 - t_1)}{C}.$$

Any kind of ADC can be used to convert Vcap into a suitable digital code. The time resolution of these converters can be made very high. The stability of the current source, the linearity of the capacitor and the resolution of the ADC determine the resolution that can be achieved using this technique. Another important constraint is the high noise sensitivity of the current integrating node. Differential schemes have been developed to reduce noise sensitivity and enable higher resolution measurements ([14] and [15]). Figure 1 shows the basic scheme and timing diagram of one of these techniques. The times elapsing between the “start” signal and the end of the “gate” signal, and between the “stop” signal and the end of the “gate” signal, are measured by two independent Time-to-Analogue Converters (TAC). The difference between these two measurements, given by an analogue voltage at the output of a differential amplifier, corresponds to the original time interval. Mismatches between the capacitors (C) and current levels (I) in the two TACs can be taken into account via appropriate changes to the constant of proportionality of the measurement.

[Figure 1 diagram not reproduced. Blocks: TAC#1, TAC#2. Signals: Reset, Gate, Start, Stop, Vcap (start), Vcap (stop), Vcap (differential), Hit available; intervals tstart and tstop.]

Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [14]).

The time difference being measured is, in this case:

$$T = t_{start} - t_{stop} = \frac{C \cdot \left(V_{cap(start)} - V_{cap(stop)}\right)}{I}.$$

Chapter 4: Review of TDC Architectures.


In current integration techniques, the converter is occupied for as long as the measurement is being acquired. This results in a considerable dead time between measurements. Flash-ADC’s can be used to reduce the analogue-to-digital conversion time. Unfortunately, the cost penalty of using these devices can be prohibitive. Another approach is to rely on the statistical properties of the event arrival time. An analogue memory could then store the measurements before conversion, thus de-randomising the event rate (see [16]). In this way, a single Flash-ADC could be shared between several channels, or slower ADC’s could be used without any throughput penalty.

Another limitation of these techniques is their limited dynamic range. Given a maximum voltage to which a capacitor can be charged (for example, the supply voltage), the only way to increase dynamic range is to decrease the constant of proportionality of the measurement, either by decreasing the current level (I) or by increasing the capacitor (C). In some applications the dynamic range is divided into separate resolution ranges [17]. In this way it is possible to measure long time intervals with a limited resolution, and short time intervals with high resolution. The range to which a measurement belongs is identified by selecting the smallest non-overflowing range.
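The trade-off between dynamic range and constant of proportionality, and the selection of the smallest non-overflowing range, can be sketched as follows. This is an idealised model (ideal current source, voltage-independent capacitor); the function names and the representation of a range as a pair (I, C) are assumptions for illustration only.

```python
def tac_voltage(interval_s, current_a, capacitance_f):
    """Ideal capacitor voltage after charging for interval_s: V = I * t / C."""
    return current_a * interval_s / capacitance_f


def full_scale_interval(current_a, capacitance_f, v_max):
    """Longest interval that keeps the capacitor voltage at or below v_max."""
    return v_max * capacitance_f / current_a


def select_range(interval_s, ranges, v_max):
    """Return the index of the smallest non-overflowing range, or None.

    ranges: list of (current_a, capacitance_f) pairs, in any order.
    """
    by_full_scale = sorted(
        range(len(ranges)),
        key=lambda i: full_scale_interval(ranges[i][0], ranges[i][1], v_max),
    )
    for i in by_full_scale:
        if interval_s <= full_scale_interval(ranges[i][0], ranges[i][1], v_max):
            return i
    return None  # every range overflows
```

With two hypothetical ranges `[(1e-3, 10e-12), (1e-4, 10e-12)]` and `v_max = 2.0`, the full-scale intervals are 20 ns and 200 ns, so a 5 ns interval is assigned to the first (fine) range and a 50 ns interval to the second (coarse) one.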

Low-power operation is possible (disregarding the flash-ADC dissipation). However, large-scale integration is difficult due to the requirement for good analogue process characteristics and the noise sensitivity inherent to the architecture. Current levels and actual capacitance values depend on process, temperature and supply voltage, forcing calibrations of the converter.

4.1.2. Counter techniques.

Counter based time measurement techniques generally rely on a Gray code counter running at very high speed. A “start” and a “stop” pulse mark the moments when the counter is sampled; the difference between these two samples corresponds to the time interval measured. The frequency and stability of the reference clock determine the resolution and accuracy of this scheme [12].

This method offers a very large dynamic range in a highly integrated digital design. However, to obtain high resolution a reference clock frequency in the GHz range (∆tmin<1ns) is required, and thus very fast processes must be used to implement it. It also results in a power consuming system, due to the large toggling rates present.

Alternatively, several counters, synchronous to different phases of the same clock, can be used to increase the resolution using a slower reference clock [18]. The time measurement can be easily interpolated from the results of all the counters. The accuracy of the synthesised clock phases sets the achievable resolution.

These techniques are sensitive to metastability in the counter’s registers. If the sampling “start”/“stop” signals arrive when the counter is toggling, the resulting output may be unpredictable [19]. Simple Gray code counters are less sensitive to this problem, since only one bit toggles for each clock transition. Interpolation between several Gray code counters can worsen the problem, because in that configuration a one-bit toggle in one counter corresponds to more than a single Least Significant Bit (LSB) change.
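The counter principle can be illustrated with a short sketch: two Gray-coded samples of a free-running counter are decoded and subtracted, allowing for one wrap-around of the counter. The function names and the single wrap-around assumption are illustrative and not taken from the text.

```python
def binary_to_gray(value):
    """Encode a non-negative integer as reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Decode a reflected Gray code word back to a plain integer."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def interval_from_samples(start_gray, stop_gray, counter_bits, clock_period_s):
    """Time between two counter samples, assuming at most one wrap-around."""
    start = gray_to_binary(start_gray)
    stop = gray_to_binary(stop_gray)
    cycles = (stop - start) % (1 << counter_bits)
    return cycles * clock_period_s
```

Consecutive Gray code words differ in exactly one bit, which is why a sample taken while the counter toggles can be wrong by at most one count.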

4.1.3. Delay line-based techniques.

The clock rate requirements that limit the use of counter techniques for time interval measurements can be relaxed if the basic CMOS gate delay is used as the time unit. Modern CMOS technologies have gate delays in the order of 100ps, thus the resolution of the conversion can be quite good.

In this technique, several delay elements (usually inverters [20]; alternatively, segments of a transmission line can be used [21][22]) make up a delay line through which a signal pulse is propagated. The progression of the pulse along the delay line reflects the time interval being measured.

An example of such a line is shown in Figure 2. Delay elements made of two inverters make good building blocks for these lines since they preserve the polarity of the input signal at every output tap. Alternatively, differential cells can be used, but they result in higher static power dissipation.

[Figure 2 diagram not reproduced. Input: Pulse. Outputs: Tap 0, Tap 1, Tap 2, Tap 3, Tap 4, ..., Tap N.]

Figure 2: Delay line using double inverters as delay elements.

Since standard CMOS technologies are used, an easy to design and highly integrated monolithic converter can be developed. Complex systems, including the converter and large logic units, can be integrated in a single IC with low power dissipation. However, the delay of a CMOS gate is highly dependent on the process parameters, temperature and supply voltage, therefore requiring frequent calibration. The linearity of the conversion transfer function is determined by the matching of the delay cells. Strict design rules must be followed to reduce device mismatch to acceptable levels.

Large dynamic ranges can only be achieved if very long delay lines are used. Since long lines are difficult to implement, this technique is limited to short dynamic ranges.
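A hedged sketch of how the sampled state of such a delay line could be decoded is shown below. It assumes that tap i is the output of the i-th delay cell, that a tap reads 1 once the pulse has reached it, and that per-cell delays are available from a calibration; none of these details is specified in the text.

```python
def taps_reached(tap_states):
    """Count the leading taps the pulse has already reached (thermometer code)."""
    count = 0
    for state in tap_states:
        if not state:
            break
        count += 1
    return count


def interval_from_taps(tap_states, cell_delays_s):
    """Lower bound of the measured interval: sum of the delays of the cells passed.

    The true interval lies between this value and the value obtained by
    adding the delay of the next cell, so cell mismatch shows up directly
    as non-uniform bin widths.
    """
    if len(tap_states) != len(cell_delays_s):
        raise ValueError("one delay value is required per tap")
    return sum(cell_delays_s[:taps_reached(tap_states)])
```

The dynamic range of this model is simply the sum of all cell delays, which illustrates why long ranges require long lines.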



4.1.4. Phase Locked Loop (PLL) techniques.

Some of the limitations of the delay lines previously discussed can be overcome by continuously adjusting the delay of their elements, using a clock signal as a reference. If the delay line is closed in a voltage controlled ring oscillator (VCO) topology and the oscillation frequency is controlled via a feedback loop, a PLL is obtained. The delay of each element can be controlled by limiting the current available to it [23]. Analogue control loops are common [24], but digital loops have also been implemented [25]. As an alternative to current limitation, the load at the output of each delay element can be controlled [26].

This kind of system is able to generate precisely timed signals that can be used in time interval measurement instruments. The inclusion of the oscillator in a closed loop guarantees self-calibration and, thus, low sensitivity to environmental and process changes. It is interesting to note that the need for dynamic control of the delay of the delay line leads to a slowing of the line in typical operation, meaning that the technology is not pushed to its limits. As in any delay line based architecture, delay cell mismatch limits the linearity of the conversion.

Using asymmetric ring oscillators [24] or differential pairs [27] as the cells of the oscillator, it is possible to obtain the convenient 2^N number of time bins per clock cycle. Measurements performed using this technique are related to the reference clock. If a time interval is to be measured, the two measurements acquired at the beginning and at the end of the time interval must be subtracted.

[Figure 3 diagram not reproduced. Blocks: Phase Frequency Detector, Charge Pump, VCO, Hit registers. Signals: Clkref, Hit.]

Figure 3: Asymmetric ring oscillator [24], able to generate a 2^N number of timing signals from an odd-numbered oscillator.

PLL based circuits have the convenient property of being able (depending on the closed loop properties) to filter out phase noise (jitter) associated with the reference clock, therefore loosening the requirements for the time reference path. Jitter internal to the loop can also be filtered. However, the increased PLL bandwidth required to perform that filtering reduces the filtering of the jitter associated with the reference. Note that phase noise generated within the VCO accumulates between oscillator periods, thus leading to increased output jitter when compared to other delay line based schemes, such as the Delay Locked Loop (DLL) [28].

Large dynamic ranges can be obtained by counting the number of oscillations of the ring oscillator. The less significant bits of the measurement are thus obtained from the PLL and the most significant bits from the counter. Since both parts of the measurement are generated using the same reference signal (the oscillation period), there is no ambiguity in the final result.

PLL’s have been extensively discussed in the literature (for example in [29] and [30]), demonstrating their flexibility, high integration level and low power dissipation. However, they require careful layout design to ensure that all the cell delays are identical and that the interconnection capacitance at the output of each cell is matched. A PLL is a second (or higher) order system, therefore the loop stability must be carefully evaluated.

4.1.5. Delay Locked Loop (DLL) techniques.

If the delay line is not closed and it is included inside a feedback control loop, then a DLL is obtained [13][31]. Various topologies of the control loop have been described, but they typically include a Phase Detector to measure the phase error and a filter that converts this information into a meaningful quantity. In contrast to PLL’s, the reference clock signal is injected directly into the voltage controlled delay line (VCDL) and its phase is compared with the corresponding phase at the output of the line (see Figure 4).

A DLL has some characteristics in common with a PLL, such as the ability to generate precisely timed signals with high resolution, the self-calibration of the system and the large dynamic ranges achievable. In order to guarantee good linearity between consecutive delay elements, matching of devices is critical.

[Figure 4 diagram not reproduced. Blocks: Phase Detector, Charge Pump, VCDL, Hit registers. Signals: Clock, Hit.]

Figure 4: Delay Locked Loop and hit registers.

Self-calibration is based on phase information from the extremes of the delay chain. To guarantee that the delay chain is permanently calibrated, the reference clock must be constantly circulated through it. A constant level of power is thus dissipated, regardless of the rate of the hits being acquired.

Dynamic ranges wider than the reference clock period can be achieved by introducing a counter synchronous to the reference clock. Since both the DLL and the coarse counter measurement are obtained with the same reference, the expansion of the measurement’s dynamic range is unambiguous. Using this technique a time stamp converter is obtained, where the time measurement is referred to the clock signal. In many applications the reference clock can be used as the “start” or “stop” signal. If that is not the case, “start”/“stop” measurements can easily be obtained by subtraction of the time stamps of these two signals.
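The combination of a coarse counter with the fine DLL interpolation, and the subtraction of two time stamps, can be sketched as below. The stamps are kept as integer bin counts to avoid rounding; the names, the number of bins per clock period and the single wrap-around assumption are illustrative choices, not taken from the text.

```python
def time_stamp_bins(coarse_count, fine_bin, bins_per_period):
    """Time stamp in units of one DLL bin: coarse counter plus fine interpolation."""
    if not 0 <= fine_bin < bins_per_period:
        raise ValueError("fine_bin must lie within one clock period")
    return coarse_count * bins_per_period + fine_bin


def interval_bins(start_stamp, stop_stamp, counter_bits, bins_per_period):
    """Start/stop interval in bins, assuming at most one counter wrap-around."""
    span = (1 << counter_bits) * bins_per_period
    return (stop_stamp - start_stamp) % span


def bins_to_seconds(bins, bins_per_period, clock_period_s):
    """Convert a bin count to seconds; one bin is clock_period_s / bins_per_period."""
    return bins * clock_period_s / bins_per_period
```

Because the counter and the delay line share the same reference clock, the coarse and fine parts combine by simple concatenation, which is what makes the range extension unambiguous.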

Unfortunately, this kind of control loop, unlike a PLL, lacks the capability of filtering jitter coupled to the reference signal. Therefore the time critical paths should be designed to be noise insensitive and the reference clock must be stable. Careful design of the delay locked loop is also essential, in order to guarantee that all the delay cells have the same delay characteristics.

DLL’s can be built using standard digital CMOS technologies, which allows for a high integration level and thus lowers system costs. Sensitivity to environmental conditions is factored out by the self-calibration mechanism, and noise sensitivity can be lowered to acceptable levels by careful layout and power distribution.

4.2. Beyond the limits of the technology: techniques to improve resolution.

The schemes previously presented have their time resolution limited to the unit cell delay, usually that of two inverter gates. As the demand for higher resolutions grows, faster technologies must be used. Unfortunately, access to these technologies is, at present, rather expensive. Another possibility to overcome the resolution limit is to devise different techniques that are able to interpolate time within the basic cell delay. Several architectures have been proposed in the literature; some of them are discussed in the next few sections.

4.2.1. Analogue time expansion.

The analogue time expansion technique extends the current integration technique into a scheme where the time interval to be measured is stretched by a factor k dependent on the circuit’s parameters. The expanded time interval thus obtained can be measured by any TDC with coarser resolution.

Several topologies can be used to obtain a time stretcher. The simplest one is in fact similar to the Wilkinson ADC. In this topology the capacitor that was charged during the measurement is discharged with a much smaller current. The ratio between the charge and discharge currents is the stretch factor k. If this factor is big enough, a simple counter based TDC can be used to measure the discharge (stretched) time interval and thus obtain the original time measurement with improved resolution. Even finer resolution can be obtained using DLL based TDC’s.

The dynamic range obtained using this technique can be extended if the start and the stop times are separately measured in relation to a reference clock and the number of clock cycles elapsing from one measurement to the other is also recorded [32]. A refinement of this technique allows for the simultaneous calibration of the stretch mechanism [33].

[Figure 5 diagram not reproduced. Time marks: T1, T2, T3, T4. Signals: clkstretch, pulse, synchronised pulse, integrator voltage, output to TDC.]

Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33].

For each pulse to be measured, the TDC captures two time intervals. The first (T2-T1) reflects the unstretched time difference between the pulse arrival and an edge of the reference stretch clock; the second (T3-T2) reflects the stretched image of this time difference. The stretch factor k is:

$$k = \frac{T_3 - T_2}{T_2 - T_1}.$$

To obtain a high precision measurement of k, an average over several random time difference measurements is performed. Since the normal data is uncorrelated with the stretcher reference clock, this averaging operation will reduce the error to acceptably small levels.
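The averaging step might look like the following sketch, which takes the mean of the individual ratios (T3 - T2)/(T2 - T1) over a set of hits. Whether the original system averages the ratios or the intervals themselves is not stated, so the choice made here, like the function names, is an assumption.

```python
def estimate_stretch_factor(samples):
    """Average k = (T3 - T2) / (T2 - T1) over several (T1, T2, T3) captures.

    Captures with T2 == T1 carry no information on k and are skipped.
    """
    ratios = [(t3 - t2) / (t2 - t1) for t1, t2, t3 in samples if t2 != t1]
    if not ratios:
        raise ValueError("no usable samples")
    return sum(ratios) / len(ratios)


def unstretched_interval(stretched_interval, stretch_factor):
    """Recover the original interval from its stretched image."""
    return stretched_interval / stretch_factor
```

Very short unstretched intervals make the individual ratios noisy, so a practical implementation would probably weight or filter the samples; that refinement is left out of this sketch.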

The previous techniques are sensitive to noise in the integrating node and to non-linearity of the capacitor. This sensitivity can be reduced by the use of two identical capacitors that are discharged by different currents when the Start and Stop signals arrive, respectively. A comparator is used to identify the moment when the voltages on the two capacitors are again the same, as shown in Figure 6.

If the “stop” discharge current is k times the “start” current, then the resulting time expansion is given by the following expression, where tsame is the end of the extended time interval:

$$t_{same} - t_{start} = \frac{k}{k-1} \cdot (t_{stop} - t_{start}).$$
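The expansion expression can be checked with a short derivation, under the assumption (not stated explicitly in the text) that both capacitors C start from the same voltage V_0 and are discharged by ideal constant currents I and k·I from the instants t_start and t_stop respectively:

```latex
\begin{align}
V_{start}(t) &= V_0 - \frac{I}{C}\,(t - t_{start}), \\
V_{stop}(t)  &= V_0 - \frac{k \cdot I}{C}\,(t - t_{stop}), \\
V_{start}(t_{same}) = V_{stop}(t_{same})
  &\;\Rightarrow\; t_{same} - t_{start} = k \cdot (t_{same} - t_{stop}), \\
t_{same} - t_{stop} &= (t_{same} - t_{start}) - (t_{stop} - t_{start}), \\
\Rightarrow\; t_{same} - t_{start} &= \frac{k}{k-1} \cdot (t_{stop} - t_{start}).
\end{align}
```

The closer k is to 1, the larger the expansion: a current ratio of k = 1.1, for instance, stretches the interval by a factor of 11.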

In a differential architecture like this the expanded time is very insensitive to supply noise, or to any non-linearity of the capacitors or current sources used, as long as they affect both branches in the same way. Any mismatch between capacitor values or current levels will only produce a change in the expansion factor k, which can easily be calibrated at set-up time.



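[Figure 6 diagram not reproduced. Circuit: current sources I and k.I, two capacitors C, inputs reset, start and stop, comparator output same. Timing diagram: charge Q versus time t, with the start, stop and same instants marked.]

Figure 6: Time expander circuit and corresponding timing diagram.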

The main disadvantage of this scheme is the demanding requirements it sets on the comparator in terms of offset and propagation delay stability over a considerable common mode range. Its rather short dynamic range and considerable dead time between measurements can also limit its utility, especially in high hit rate applications.

Large dynamic ranges can be obtained if the measurement is in some way synchronised to a reference clock. If the time differences from the start to a clock edge and from the stop to a clock edge are added to the time between these edges, the dynamic range becomes independent of the charge/discharge current levels or capacitor sizes.

4.2.2. Vernier differences.

This technique is an extension of the analogue vernier technique [12], where the two reference signals with slightly different periods are replaced by more convenient delay lines with different delays per cell [34]. The “start” and “stop” pulses are each propagated through one of these lines.

[Figure 7 diagram not reproduced. “Start” delay line with cell delay T1 and “Stop” delay line with cell delay T2 (T1 > T2), five D flip-flops with a common Reset, outputs Tap 0 to Tap 4.]

Figure 7: Time expansion using two delay lines with different cell delay.

The rising edge of the “stop” pulse latches the state of the “start” delay line. If the cell delay T1 of the “start” delay line is slightly bigger than the cell delay T2 of the “stop” delay line, the position of the first flip-flop not set (N) gives the time interval between the “start” and “stop” signals, that is

$$T_{in} = N \cdot (T_1 - T_2)$$
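The behaviour of the two lines can be mimicked with a small idealised model, in which flip-flop number i is set when the “start” edge reaches its data input no later than the “stop” edge reaches its clock input. The tap numbering from 0, the use of integer picoseconds and all names are assumptions made for this sketch; flip-flop set-up time and metastability are ignored.

```python
def vernier_convert(interval_ps, t1_ps, t2_ps, n_cells):
    """Return N, the index of the first flip-flop not set, or None on overflow.

    interval_ps: time from the "start" edge to the "stop" edge.
    t1_ps, t2_ps: cell delays of the "start" and "stop" lines (t1_ps > t2_ps).
    """
    if t1_ps <= t2_ps:
        raise ValueError("the start line must be slower than the stop line")
    for tap in range(n_cells):
        start_arrival = tap * t1_ps               # start edge at data input of flip-flop
        stop_arrival = interval_ps + tap * t2_ps  # stop edge at clock input of flip-flop
        if start_arrival > stop_arrival:          # data still low when clocked
            return tap
    return None  # the stop edge never overtook the start edge within the line


def code_to_interval_ps(code, t1_ps, t2_ps):
    """Interval estimate T_in = N * (T1 - T2)."""
    return code * (t1_ps - t2_ps)
```

With hypothetical cell delays of 110 ps and 100 ps the bin size is 10 ps, and a 16-cell line overflows for intervals of 150 ps or more, which illustrates the limited dynamic range of the scheme.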

Very good time resolution can be obtained with this technique. In order to save silicon area, several improvements can be made: the “stop” delay line can be replaced by the propagation delay of each D flip-flop. Another technique is to use a single delay line with different rise (Tr) and fall (Tf) times in the “stop” path, and to connect the “start” line to logical one [13]. This results in a shrinking pulse, and the position of the first flip-flop that is not set gives the original pulse width in terms of Tr-Tf.

Converters using these schemes have a very limited dynamic range and require very long delay lines for the desired resolution level. Also, while pulses are propagating through the lines no other hits should occur, leading to some dead time between measurements. Another drawback of these schemes is the difficulty of controlling the bin sizes in each line used. Process spreads, temperature and supply voltage influence these delays, therefore frequent calibrations of the circuit are required.

Vernier techniques are very sensitive to the matching of the delay of the cells across the delay lines. The effects of mismatch are amplified by the nature of the time interpolation, where the high resolution is obtained from the small difference between the (comparatively) large delays of the cells in each of the delay lines.

Circular vernier method.

The need for very long delay lines to obtain a reasonable dynamic range can be obviated if the two lines are closed in a ring oscillator-like structure, such as the one shown in Figure 8. Theoretically this configuration corresponds to a line of infinite length, and thus arbitrary dynamic ranges should be obtainable.

[Figure 8 diagram not reproduced. “Start” delay line with cell delay T1 and “Stop” delay line with cell delay T2 (T1 > T2), five D flip-flops with a common Reset, outputs Tap 0 to Tap 4.]

Figure 8: Circular vernier scheme for dynamic range expansion.

Both the “start” and “stop” signals are fed into the respective delay line via a multiplexer. As soon as these signals are progressing within the delay line, the multiplexers are switched, thereby establishing a ring oscillator-like structure. Counting the number of oscillations completed by each of the signals before they coincide enables the correct expansion of the dynamic range. Unfortunately, the inversion of the signal propagating in these ring oscillators makes it difficult to decode the moment when the two signals coincide. Solutions have been proposed where different structures are used to detect the coincidence of the two signals, depending on the number of oscillations that occurred in each oscillator [35]. However, the use of different structures in the time critical circuitry makes it hard to equalise their dynamic response in all conditions. This may produce considerable non-linearity in the conversion transfer function.

Another undesirable side effect of this closed loop topology is that all timing errors that occur during the measurement time (due to noise or any other source) accumulate in the final measurement. In other words, the scheme integrates all the errors present during the measurement time.

Calibration using a PLL-like control around the closed delay line may only be done off-line, when there are no measurements. In a high hit rate environment calibration can only be performed infrequently, which may result in loss of accuracy. Furthermore, coupling between the two closed delay lines may also be a problem. Due to layout considerations they should be implemented close together and, to obtain good resolution, their oscillation frequencies (cell delays) should be very similar. If coupling is present and there is no active control of the lines during measurement, one of the lines may be pulled to oscillate at the frequency of the other line, which would ruin the measurement. To avoid this problem, calibration can be performed using a dummy channel in a double PLL-like structure. Control information derived from it can be used to control the delay of the lines even when measurements are being performed. In this way all the lines are actively pulled to their correct oscillation frequency. The calibration circuitry can be shared between all channels in a circuit, therefore resulting in an efficient use of silicon.

Dual scale vernier method.

There is an alternative implementation of the vernier technique where the dead time between measurements is small and the converter is self-calibrating. Contrary to previous techniques, this technique results in time stamp measurements. The principle of operation is the same as that of the vernier caliper (Figure 9) used to measure length [36].

Two scales are required: the reference scale, which has a time bin T, and the vernier scale, which has a slightly shorter time bin but spans N reference bins. The difference between the two scales determines the bin size of the converter. For example, to obtain a bin of 0.1·T the vernier scale must span 9 reference bins, being divided into 10 time bins. A measurement word is made of two components: the higher order bits are obtained from the reference scale and the lower order bits from the vernier scale.
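The arithmetic of the example above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `vernier_bin` is ours, and exact fractions are used so that no rounding error hides the result.

```python
from fractions import Fraction

def vernier_bin(T, F):
    """Return (vernier bin, converter bin) for a vernier scale that is
    divided into F bins and spans F-1 reference bins of size T."""
    T = Fraction(T)
    vernier = T * (F - 1) / F   # each vernier bin is slightly shorter than T
    resolution = T - vernier    # difference between the two scales
    return vernier, resolution

vernier, resolution = vernier_bin(1, 10)
print(vernier, resolution)  # 9/10 1/10
```

With F = 10 the vernier bin is 0.9·T and the converter bin is 0.1·T, as stated in the text.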

Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale (lower) lines up with a tick mark in the reference scale (upper) [36].

The reference scale can be made with a counter counting cycles of a reference clock. The vernier scale is, for example, a DLL calibrated delay line that spans 9 clock cycles and is divided into 10 time bins¹. When the hit signal is asserted the status of the two scales is captured. The low order bits of the measurement result from the identification of the next bin that will switch. If this bin number is n, then the resulting time measure is:

$$t = \mathrm{Mod}\left(\left(1 - \frac{1}{F}\right) \cdot n \cdot T,\; T\right) + m \cdot T,$$

where Mod(a,b) is the modulus operation, F is the interpolation factor and m is the reference scale measurement. The number of time bins into which the vernier line is divided is equal to the interpolation factor F. The number N of clock cycles that it spans is F-1.

This technique is very sensitive to the accumulation of non-linearity along the vernier delay line. This sensitivity is amplified if a high interpolation factor is implemented, since the length of the line is increased and the LSB is shortened.

4.2.3. Analogue time interpolation.

In a locked DLL, the signals propagating through the delay chain have edges with almost constant slopes, directly related to the delay of the delay elements. By performing an analogue sum of the signals in consecutive time taps, it is possible to obtain a time interpolation between these taps, thereby increasing the resolution to a level that is better than the intrinsic delay of a delay cell (Figure 10).
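The principle can be illustrated with an idealised numerical sketch. It assumes perfectly linear edges whose rise time is at least as long as the cell delay; the values of `td` and `rise` and all function names are hypothetical and are not taken from any of the referenced circuits.

```python
def ramp(t, t0, rise):
    """Idealised edge: linear from 0 to 1, starting at t0 and lasting `rise`."""
    return min(1.0, max(0.0, (t - t0) / rise))

def crossing(signal, lo, hi, threshold=0.5, iters=60):
    """Bisection search for the time at which a rising signal crosses threshold."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if signal(mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

td, rise = 100.0, 200.0  # hypothetical cell delay and edge rise time, in ps

def tap_a(t):
    return ramp(t, 0.0, rise)

def tap_b(t):
    return ramp(t, td, rise)  # next tap: same edge, one cell delay later

def interpolated(t):
    return 0.5 * (tap_a(t) + tap_b(t))  # analogue sum of the two taps, scaled

t_a = crossing(tap_a, 0.0, 1000.0)
t_b = crossing(tap_b, 0.0, 1000.0)
t_i = crossing(interpolated, 0.0, 1000.0)
print(t_a, t_i, t_b)  # close to 100, 150 and 200 ps
```

Under these assumptions the summed signal crosses the threshold half-way between the two taps, which is the extra time tap that the interpolation provides.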

The design of such a system is made difficult by the need to match the delay through the summing circuitry with the direct signal from the taps themselves. An alternative approach is to store all the analogue voltages from each tap when a hit occurs, and later perform the interpolation, either by analogue summing, or by using the stored voltages as inputs to a weighted filter whose output would then be converted using an ADC.

¹ In fact the delay line includes many more delay elements to avoid interactions between leading and trailing edges of the signal that progresses in it.



Figure 10: Time interpolation using voltage sums.

Small ring oscillators, controlled by a PLL structure, can also be used as the basis of the time interpolation [37]. First order equalisation of the delay between different time taps is obtained by including a dummy analogue phase interpolator (weighted sum of the voltage at its two inputs) in the non-interpolating taps. In this scheme the phase interpolator circuit must be calibrated to improve the linearity of the interpolation.

Other interpolation techniques try to generate the voltage ramp typical of current integration schemes in a “digital” form [38]. As the “start” signal progresses along the delay line, a voltage ladder is generated on the summing node. Each step represents the crossing of a new delay cell by the “start” signal. A high order filter can be used to smooth out the edges of the steps, thus obtaining the intended voltage ramp. The “stop” signal forces each delay cell into high impedance and disconnects the hold capacitor at the filter’s output, allowing the resulting measurement to be kept stable for the time necessary to process it via an ADC.

Figure 11: Time to analogue converter using a time interpolation technique [38].

When compared to current integration techniques, this scheme has the advantage of being potentially less sensitive to noise coupling into the summing node. Since the interpolation is done resistively, the node has much less impedance than a capacitive node and there is no integration of noise effects over the measurement period. To convert the measurement into a binary word, an ADC must be used, which will increase power dissipation and system costs.

4.2.4. Array of coupled oscillators.

Some techniques have been proposed to increase the resolution of PLL based time interpolation circuits to time intervals smaller than the intrinsic gate delay. One way of achieving this is to use an array of coupled oscillator rings [39]. Each delay cell is made of a dual input voltage controlled buffer. Both inputs have the same polarity and together they define the output transition time. One of the inputs is used to form the ring oscillator, the other to couple consecutive rings in the array, as shown in Figure 12.

If a fixed phase shift is established between two consecutive oscillators, then the identical coupling between oscillators will create a uniform phase shift between all oscillators. The oscillation frequency remains the same for all oscillators. The time resolution achieved is the cell delay (td in Figure 12) divided by the number of rings in the array.

The fixed phase shift is established by connecting the outputs of the boundary oscillator to the inputs of a cell located in a different position on the oscillator in the opposite extreme of the array. In this architecture the time bin is defined by two closely coupled delay cells that belong to separate ring oscillators. The inter-coupling between consecutive rings forces the size of each time bin to be set by the complete array. This intimate coupling guarantees a good linearity of the conversion function. However, device matching is a critical parameter for this topology.

Figure 12: Coupled oscillators (time resolution of td · 2/3).

At initialisation time several modes of oscillation for which the array’s boundary conditions are met will be present. Each corresponds to the case of having a phase shift between the boundary oscillators that is a multiple of the oscillation period. The locking procedure has to be able to force the circuit into the correct mode, where the phase shift is smaller than one oscillation period. This task may not be trivial.

The resolution achievable with this architecture is defined as:

$$T_{bin} = \frac{x \cdot T + \dfrac{k \cdot T}{2 \cdot N}}{M},$$

where T is the oscillation period set by a PLL control loop, N is the number of delay cells per oscillator and M is the number of oscillators in the array. Variable k reflects the coupling topology of the boundary oscillators (offset in number of delay cells) and x the array’s mode of oscillation. The correct mode of oscillation is x = 0. It results in the smallest bin size.
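As a hedged numerical sketch, the bin size can be evaluated by reading the expression above as Tbin = (x·T + k·T/(2·N))/M, which is our reading of the formula. The example values (N = 5 cells, M = 3 rings, k = 2) are hypothetical, chosen only because they reproduce the td · 2/3 resolution quoted in the caption of Figure 12 when the cell delay is taken as td = T/(2·N).

```python
from fractions import Fraction

def coupled_bin(T, N, M, k, x=0):
    """Bin size of an array of M coupled rings with N cells each.

    Assumes the reading T_bin = (x*T + k*T/(2*N)) / M, where k is the
    boundary coupling offset in cells and x is the oscillation mode.
    """
    T = Fraction(T)
    return (x * T + k * T / (2 * N)) / M

T, N, M, k = 1, 5, 3, 2            # hypothetical array, period normalised to 1
td = Fraction(T) / (2 * N)         # cell delay of an N-cell ring
print(coupled_bin(T, N, M, k), td * 2 / 3)   # both 1/15
print(coupled_bin(T, N, M, k, x=1))          # a wrong mode gives a larger bin
```

Any mode with x > 0 adds a whole period to the phase shift shared by the M rings, which is why only x = 0 yields the intended (smallest) bin.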

Layout of these circuits is critical to their correct behaviour. Every delay cell must drive exactly the same load, if a good linearity of the measurements is to be maintained. Therefore, a good layout of the consecutive rings is essential in order to guarantee that the rings on the extreme of the array are in the same conditions as the rings in the middle and that there is no systematic effect that affects the size of some time bins. The same considerations apply to the delay cells on the extreme of each ring. Interleaving oscillators and the delay cells that make them is, therefore, essential.

This architecture enables high time resolution and large dynamic range in a conveniently dead-timeless converter system. It can be implemented in standard CMOS technologies, thereby allowing for high levels of integration and low system costs. However, it suffers from the same drawbacks as other PLL’s, such as sensitivity to VCO internal noise and error feedback from the end to the beginning of each oscillator ring.

Sharing the array of coupled oscillators between several channels is an effective way to compensate for the higher power dissipation required by the use of several ring oscillators.

4.2.5. Array of Delay Locked Loops.

The use of an array of several uniformly offset DLL’s can increase the resolution of a system to a fraction of the intrinsic gate delay [23][40]. A different DLL (herein referred to as the Phase Shifting DLL), made with a smaller number of delay elements, is used to precisely generate the required offsets.

In order to increase the resolution of the converter, the offset between DLL’s should only be a fraction of the delay of the basic cell. This fraction cannot be obtained directly, but a delay that is a fraction bigger than the basic cell delay is easily obtained using a phase shifting DLL locked to the same reference. With an arrangement like the one in Figure 13, the symmetry of the array makes the DLL’s in the array look as if they were only offset by a fraction of the basic cell delay.

The time bin of such a circuit is

$$T_{bin} = T_m - T_n = \frac{T_{clk}}{M} - \frac{T_{clk}}{N}.$$

If the required time bin size is a fraction 1/F of the basic cell delay of the DLL’s of the array, then the relation between M, N and F can be expressed as

$$M = N \cdot \frac{F}{F + 1},$$

where M, N and F are integers.
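Because M, N and F must all be integers, only some array sizes are usable for a given interpolation factor. The following sketch lists them for a chosen F; the function name and the search range are illustrative choices of ours.

```python
def phase_shift_cells(N, F):
    """Return M = N*F/(F+1) when it is an integer, otherwise None."""
    numerator = N * F
    if numerator % (F + 1) != 0:
        return None
    return numerator // (F + 1)

F = 4
valid = [(N, phase_shift_cells(N, F))
         for N in range(1, 41)
         if phase_shift_cells(N, F) is not None]
print(valid)
# [(5, 4), (10, 8), (15, 12), (20, 16), (25, 20), (30, 24), (35, 28), (40, 32)]
```

For F = 4 the number of cells N in each DLL of the array has to be a multiple of F + 1 = 5, and the Phase Shifting DLL then needs M = 4·N/5 cells.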

One disadvantage of this scheme is its inability to divide the reference period in a number of bins that is a power of two. This means that the measurement obtained will not be in a pure binary unit of 1/2^N, but rather in a unit of 1/(N·F). A special encoder that converts this code into a normal binary code must be used, if it is to be used together with other binary measurements such as dynamic range extension using the coarse time counter results.

Figure 13: Array of DLL’s with phase shifting DLL.

Extensions to this architecture, where the use of auxiliary (controlled) delay lines allows for the realisation of any number of subdivisions of the clock period (including the pure binary number), have been proposed [41]. Unfortunately they increase the complexity of the array and thus render it more difficult to design.

Power dissipation is also a concern in this architecture due to the large number of DLL’s that are continuously active. This drawback can be limited if several channels share the same array.



Like all DLL based techniques, this technique can be implemented in a standard “digital” CMOS technology. It is therefore easy to integrate it with digital processing logic in order to build a complex TDC system in a single IC.

4.2.6. Time interpolation using passive RC delay lines.

Most of the techniques discussed so far use to their advantage a closed control loop to guarantee that the converter is permanently calibrated. Schemes to increase the limited time resolution that can be directly obtained are based on time interpolation. They usually require more closed loops in complex topologies, which invariably lead to higher power dissipation and increased non-linearity.

Minimum delays for a given architecture can only be achieved if the parasitic RC delay lines present in every metal or polysilicon line are used (see [21] for an example). Delay lines built in this way suffer from a big parameter spread due to process constraints, rendering their exact delay difficult to predict. On the other hand they are rather insensitive to supply voltage and temperature variations.

In order to obtain the desired delay from these lines, a calibration procedure must be used. Calibration is mainly needed at start-up; during normal operation the slow temperature variations and supply changes will not substantially affect the behaviour of the lines.

Such a delay line can be used as a stand-alone delay generator, but a converter built this way would have very limited dynamic range. However, when used together with a DLL, this limitation is overcome. This converter adds the high resolution possibility to the other benefits of a DLL based scheme, such as large dynamic range, self-calibration, etc. [22].

The block diagram in Figure 14 depicts the scheme. When the hit signal is asserted, several (M) consecutive samples of the status of the DLL are acquired with a constant time interval between them. If this time interval is made such that it is a fraction 1/M of the cell delay, it is possible to perform time interpolation within the delay of a DLL delay cell by identifying after which sample the reference clock has exited a given cell.

If the reference clock has a period T, and the DLL is made of N delay cells, the bin size of the resulting converter is:

$$T_{bin} = \frac{T}{N \cdot M}.$$

In this scheme there is no restriction to the values of N and M, therefore it is possible to directly obtain the measurements performed in a pure binary format.
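A short numerical sketch of this relation follows. The clock period and the values of N and M are hypothetical and were picked only to show a case where N·M is a power of two; they are not the parameters of the circuit in [22].

```python
def rc_dll_bin(T_ps, N, M):
    """Bin size, in ps, of a DLL with N cells interpolated by M RC-line taps."""
    return T_ps / (N * M)

def is_power_of_two(value):
    """True when value is a positive integer power of two."""
    return value > 0 and (value & (value - 1)) == 0

T_ps, N, M = 25_000.0, 32, 4   # hypothetical: 25 ns clock, 32 cells, 4 taps
print(rc_dll_bin(T_ps, N, M))  # 195.3125
print(is_power_of_two(N * M))  # True: 128 bins, i.e. a 7 bit fine time word
```

Since N and M can be chosen freely, both can be powers of two, and the fine time measurement can then be concatenated directly with a binary coarse count.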

Figure 14: A T/D converter based on a DLL and an RC delay line.

To operate in a truly self-calibrating mode, the circuit that implements this scheme should also include the RC delay line’s start-up calibration hardware. Fortunately a simple code density test is sufficient to characterise the RC delay line. From this characterisation the calibration parameters are obtained and then applied to the line. Any standard CMOS technology can be used to implement this scheme.

4.3. Summary of characteristics of the TDC architectures.

The following table summarises the main characteristics of the architectures that have been discussed in this chapter.

Architecture                  Resolution  Dynamic Range  Dead Time  Auto Calibration  Power Consumption  Time Interpolator Sharing  Technology  Ref.
Current Integration           +           -              -          -                 -                  no                         analogue    [14]
Counter                       -           inf.           no         yes               -                  yes                        digital     [12]
Delay Line                    +           -              -          -                 +                  no                         digital     [20]
PLL                           +           inf.           no         yes               +                  yes                        digital     [24]
DLL                           +           inf.           no         yes               +                  yes                        digital     [13]
Analogue Time Expansion       ++          - / inf.       --         - / +-            -                  no                         analogue    [32]/[33]
Vernier Differences           ++          -              --         -                 +                  no                         digital     [34]
Circular Vernier              ++          inf.           --         -                 +                  no                         digital     [35]
Dual Scale Vernier            ++          inf.           no         yes               +                  yes                        digital     [36]
Analogue Time Interpolation   ++          inf. / -       no / -     +- / -            +                  yes / no                   analogue    [37]/[38]
Array of Coupled Oscillators  ++          inf.           no         yes               +-                 yes                        digital     [39]
Array of Delay Locked Loops   ++          inf.           no         yes               +-                 yes                        digital     [23]
DLL / RC delay line           ++          inf.           no         +-                +                  yes (DLL)                  digital     [22]

Table 1: Comparison between the different architectures discussed in the chapter².

² Inf. (infinite) means that there is no intrinsic limit to the dynamic range that can be implemented. No dead time means that there is no dead time in the time interpolation circuitry. There may be some dead time associated with the read-out of the measurements. + and – mean that the characteristic under consideration is advantageous or disadvantageous, respectively. +– means that the condition is only partially met or that it is only met under certain conditions.

References for Part I.

[1] Rubbia, C., The quest for the infinitesimally small, CERN/PPE 94-15, Feb. 94.
[2] Verweij, H., Electronics for experiments at CERN, CERN/ECP 91-4, Feb. 91.
[3] The ALICE collaboration, ALICE – A large ion collider experiment technical proposal, CERN/LHCC 95-71, Dec. 95.
[4] Gomes, P., On-line algorithms for future HEP data acquisition systems, PhD. thesis, Universidade Técnica de Lisboa, 1995.
[5] Batyunya, B. et al., Influence of the time resolution of the time-of-flight system in ALICE on the measurement of observables, ALICE/SIM 98-08 Internal note, Feb. 98.
[6] Kluge, A., ALICE Time-of-Flight Readout – AFRO, ALICE Internal note, Jun. 99.
[7] Martins, R. C. et al., Taxonomic problems on ADC characterisation, Proceedings of the 5th. IEEE International Conference on Electronics, Circuits and Systems, Vol. 3, pp. 445-448, Sep. 98.
[8] Razavi, B., Principles of data conversion system design, IEEE press, Chapter 6, 1995.
[9] Doernberg, J. et al., Full-speed testing of A/D converters, IEEE Journal of Solid-State Circuits, Vol. 19, No. 6, pp. 820-827, Dec. 84.
[10] Gray, P. R. et al., Analysis and design of analogue integrated circuits, John Wiley & Sons, Inc, Chapter 11, 1993.
[11] Fish, P. J., Electronic noise and low noise design, McGraw Hill, Inc, 1994.
[12] Porat, D. I., Review of sub-nanosecond time-interval measurements, IEEE Transactions on Nuclear Science, Vol. 20, pp. 36-51, 1973.
[13] Rahkonen, T. E. et al., The use of stabilized CMOS delay lines for the digitization of short time intervals, IEEE Journal of Solid-State Circuits, Vol. 28, No. 8, pp. 887-894, Aug. 93.
[14] Tanaka, M. et al., Development of Monolithic Time-to-Amplitude Converter for High precision TOF Measurement, IEEE Trans. on Nuclear Science, Vol. 38, No. 2, pp. 301-305, Apr. 91.
[15] Sasaki, O. et al., A high-resolution TDC in TKO BOX system, IEEE Trans. on Nuclear Science, Vol. 35, No. 1, Feb. 1988.
[16] Stevens, A. E. et al., A Time-to-Voltage Converter and Analog Memory for Colliding Beam Detectors, IEEE Journal of Solid State Circuits, Vol. 24, No. 6, Dec. 89.
[17] Yamrone, B. et al., LeCroy MQT300 charge-to-time converter, Conference Record of the IEEE Nuclear Science Symposium 1996, Vol. 1, pp. 436-438, Nov. 96.
[18] Veneziano, S. et al., Performances of a Multichannel 1 GHz TDC ASIC for the KLOE Tracking Chamber, Proceedings of the Elba conference on Advanced Detectors, 1997.
[19] Kim, L.-S., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.
[20] Bailly, P. et al., A 16-channel digital TDC chip, Conference Record of the IEEE Nuclear Science Symposium 1997.
[21] Gogaet, S. et al., A 10 ps resolution 1.6 ns tuning range CMOS delay line for clock deskewing in data recovery systems, Proc. ESSIRC'95, Lille - France, pp. 54-57, Sep. 95.
[22] Mota, M. et al., A high-resolution time interpolator based on a Delay Locked Loop and an RC delay line, IEEE Journal of Solid-State Circuits, Vol. 34, No. 10, pp. 1360-1366, Oct. 99.
[23] Mota, M. et al., A four channel, self-calibrating, high-resolution Time-to-Digital Converter, Proceedings of the 5th. IEEE International Conference on Electronics, Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 98.
[24] Arai, Y. et al., A time digitizer CMOS gate-array with a 250 ps time resolution, IEEE Journal of Solid-State Circuits, Vol. 31, No. 2, pp. 212-220, Feb. 96.
[25] Dunning, J. et al., An all-digital Phase-Locked Loop with 50-cycle lock time suitable for high-performance microprocessors, IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, pp. 412-422, Apr. 95.
[26] Johnson, M. G. et al., A variable delay line PLL for CPU-coprocessor synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.
[27] Loinaz, M. J. et al., A CMOS multichannel IC for pulse timing measurements with 1 mV sensitivity, IEEE Journal of Solid-State Circuits, Vol. 30, No. 12, pp. 1339-1349, Dec. 95.
[28] Weigland, T. C. et al., Analysis of timing jitter in CMOS ring oscillators, Proceedings of International Symposium on Circuits and Systems (ISCAS), Jun. 94.
[29] Razavi, B. et al., Monolithic phase-locked loops and clock recovery circuits – theory and design, IEEE press, 1996.
[30] Gardner, F. M., Phaselock techniques, John Wiley & Sons, 1979.
[31] Christiansen, J. et al., An integrated 16-channel CMOS time-to-digital converter, Conference Record of the IEEE Nuclear Science Symposium 1993, pp. 625-629, Oct. 93.
[32] Raisanen-Ruotsalainen, E. et al., A time digitiser with interpolation based on Time-to-Voltage Conversion, Proceedings of the 40th. Midwest Symposium on Circuits and Systems (MSCAS), Vol. 1, pp. 197-200, Aug. 97.
[33] Blanar, G. et al., A self-calibrating high-resolution common stop time digitiser circuit, IEEE Transactions on Nuclear Science, Vol. 45, No. 3, Pt. 1, pp. 801-804, Jun. 98.
[34] Bailly, P. et al., A 100 picosecond resolution, 6 microsecond full scale multihit time encoder, in CMOS technology, Proc. of Third International Conference on Electronics for Future Colliders, pp. 57-68, May 93.
[35] Fota, C., Modélisation et étude de faisabilité d’un codeur de temps numérique à haute résolution en technologie intégrée sur Silicium et Arséniure de Gallium, Thèse de Doctorat de l’Université Pierre et Marie Curie (Paris VI), Dec. 96.


[36] Gorbics, M. S. et al., A high-resolution multihit time to digital converter integrated circuit, IEEE Transactions on Nuclear Science, Vol. 44, No. 3, Pt. 1, pp. 379-384, Jun. 97.
[37] Knotts, T. A. et al., A 500MHz time digitiser IC with 15.625ps resolution, Digest of Technical Papers of the IEEE International Solid-State Circuits Conference 1994, Vol. 37, pp. 58-59, Feb. 94.
[38] Neyer, C. et al., Internal Note ALICE 94-07 (CERN).
[39] Maneatis, J. G. et al., Precise delay generation using coupled oscillators, IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, Dec. 93.
[40] Christiansen, J., An integrated high-resolution CMOS timing generator based on an array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol. 31, No. 7, pp. 952-957, Jul. 96.
[41] Chu, H.-C. et al., A General High-Resolution Multiphase Clock Generator, submitted to the IEEE Journal of Solid-State Circuits in Oct. 97.

PART II.

A TDC ARCHITECTURE BASED ON AN ARRAY OF DELAY LOCKED LOOPS.

In this Part of the dissertation we will discuss the work performed in order to develop and demonstrate an architecture suitable for high-resolution time interval measurements in the context of the ALICE Time-of-Flight detector collaboration.

Particle identification in the ALICE experiment requires an accurate measurement of the time that the particles take to cross a cylindrical surface located at a fixed distance from the interaction point. For this purpose a dedicated Time-of-Flight detector will be built. The detector itself is able to resolve time with a resolution between 40ps and 100ps RMS, depending on the technology chosen [1]. All the front-end components must have a better resolution, in order not to compromise the characteristics of the detector.

Given the time uncertainty associated with the response of the detector and with the underlying physical process, the main performance metric used to characterise the front-end electronics is the standard deviation of the error it generates, σ, also known as the RMS (root mean square) resolution. In this application, it is required that the Time-to-Digital converter has an RMS resolution better than 50ps across the full dynamic range.

Depending on the measurement method used (time tagging or start-stop), the dynamic range that is required varies. To avoid any ambiguity, especially when the time tagging method is used, the TDC must allow for a large dynamic range.

Another important feature of the detector is its granularity. In order to differentiate particles crossing the detector close to each other, it is subdivided in a large number of independent detector cells, each having its dedicated front-end. Therefore a large number of electronic channels are required (> 150,000), whose front-end must sit close to the detector. The number of channels involved and the area constraints imply a high level of electronics integration.

It is our goal to demonstrate an architecture that adheres to all the previous requirements in terms of resolution and potential for dynamic range expansion. It allows for start-stop and time tagging measurements and has a low dead time between measurements. We will use a standard “digital” CMOS technology that has a proven digital library available, so that digital functionality can be easily implemented at low cost. The architecture enables the integration of several TDC channels into a single chip and allows the sharing of common data processing and buffering logic. To demonstrate this feature, four conversion channels and a small number of simple system-related functions such as data encoding and buffering are included. In this way the basic functionality required to build a time acquisition system is included in the demonstrator.

The first chapter of this part (Chapter 5) is dedicated to the presentation of the architecture being used. Analytic tools developed to study the different errors that may occur in the conversion circuitry are presented in Chapter 6. Chapter 7 includes a detailed description of the important electronic blocks that define the converter performance, and Chapter 8 concludes this part of the dissertation with the experimental results that were obtained using the prototype TDC.


5.1. The Delay Locked Loop (DLL).

A simple instrument to measure time intervals with fine resolution can be made with a delay line tapped at regular (time) intervals. If a reference signal is progressing along that line and its position is sensed at the limits of the time interval, the measured time is proportional to the number of taps that the signal covered during this interval. The delay between two consecutive taps is the constant of proportionality.

In standard CMOS technologies, the most commonly available delay cell is the logic gate. The usual choice for these cells is the inverter because of its simplicity and speed. Delays of the order of a few hundred picoseconds can currently be obtained under worst case operating conditions in a 0.7µm CMOS technology.

Unfortunately, the gate delay is very sensitive to process parameters, temperature and supply voltage. This means that the circuit has to be characterised periodically in order to measure the delay of each gate. A simpler way of operating this circuit is to build delay elements whose delay can be externally controlled. The delay of the cells is constantly sensed and forced to the desired value, regardless of environmental changes. This is the operating principle of a Delay Locked Loop (DLL).

In a DLL the signal progressing through the delay line is a reference clock. A control loop encloses the delay line and constantly monitors the delay between the reference clock at the beginning and at the end of the line. If this delay is different from one clock period, the control loop adjusts the delay of the delay cells until the correct value is obtained.

When the hit signal is asserted, the status of the line is stored in a set of hit registers. The stored data reflects the time difference from one edge of the reference clock to the moment the data was stored. An arbitrary time interval can be measured if two such time differences are stored. The difference between them is the intended measurement.
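The measurement principle can be sketched as follows. The encoding of the hit registers (a block of ones marking the part of the line already reached by the clock edge) and the use of a coarse clock-cycle count are assumptions made for illustration only; the function names are hypothetical.

```python
def decode_tap(bits):
    """Index of the last tap reached by the clock edge.

    Assumes the hit registers hold a 1 for every tap the edge has already
    passed, so the edge sits at the 1 -> 0 boundary (with wrap-around).
    """
    n = len(bits)
    for i in range(n):
        if bits[i] == 1 and bits[(i + 1) % n] == 0:
            return i
    raise ValueError("no clock edge found in hit register")

def time_stamp(coarse_count, tap, n_taps, t_clk):
    """Hit time: whole clock periods plus the position inside the line."""
    return coarse_count * t_clk + tap * t_clk / n_taps

t_clk, n_taps = 25.0, 8   # hypothetical: 25 ns clock, 8 taps
start = time_stamp(0, decode_tap([0, 0, 1, 1, 1, 1, 0, 0]), n_taps, t_clk)
stop = time_stamp(2, decode_tap([1, 1, 0, 0, 0, 0, 1, 1]), n_taps, t_clk)
print(start, stop, stop - start)  # 15.625 53.125 37.5
```

The interval is obtained purely by subtraction of two time stamps, so any offset common to both measurements cancels.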

The control loop has three main functions: sense the delay difference between the signal at the beginning and at the end of the delay line, convert the error information into a meaningful quantity, and integrate and hold the control information until a new decision is taken.

These functions correspond to the building blocks in Figure 1. The phase detector is used to determine if the delay line is too fast or too slow. A sequential phase detector is usually chosen to perform this function. The resulting (binary) information is then converted by the charge-pump into a “packet” of charge that is stored in (or taken from) a filter capacitor. The capacitor in this example behaves as the loop integrator.

Figure 1: Delay Locked Loop block diagram.
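The behaviour of such a loop can be illustrated with a very simplified behavioural model. It assumes that each charge packet changes the cell delay by a fixed amount (`step`), which lumps together the charge pump, the capacitor and the voltage-to-delay characteristic of the cell; all numerical values are hypothetical.

```python
def simulate_dll(t_clk, n_cells, cell_delay, step, cycles):
    """Bang-bang model of a first order DLL control loop.

    Every clock cycle the phase detector gives a binary decision and the
    charge pump moves the control value by one fixed packet.
    Returns the total line delay after each cycle.
    """
    history = []
    for _ in range(cycles):
        line_delay = n_cells * cell_delay
        too_slow = line_delay > t_clk            # phase detector (binary)
        cell_delay += -step if too_slow else step  # charge packet, integrated
        history.append(n_cells * cell_delay)
    return history

# Hypothetical values: 25 ns clock, 32 cells starting at 900 ps each.
locked = simulate_dll(t_clk=25_000.0, n_cells=32, cell_delay=900.0,
                      step=1.0, cycles=300)
print(locked[0], locked[-1])
```

In this model the line delay approaches one clock period monotonically and then dithers around it by at most one correction step, which reflects the first order nature of the loop described in the text.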

In contrast with Phase Locked Loops (PLL), which have another integrator in the VCO (voltage controlled oscillator), the DLL loop is a first order system. The presence of the second integration and of the proportional term in PLL’s is due to the necessity of tracking both phase and frequency. A DLL only tracks delay (or, equivalently, phase), resulting in a simpler loop which is inherently stable.

The scheme so far described acquires only a limited number of features of the hit signal, like the arrival time and possibly also the pulse length (if the delay between the rising and falling edges is also measured). Alternatively, the DLL can be used to generate a time base for a set of registers that sample the hit signal with a short periodicity (determined by the cell delay) [2], as shown in Figure 2. In this way, a full picture of the timing characteristics of the hit signal can be sampled and stored. Digital signal processing algorithms can then be used to extract the interesting features from the data stream.

Figure 2: Delay Locked Loop used in a time base application.

Chapter 5: Architecture Overview.


With this scheme it is easy to identify glitches (short pulses) or any other undesired pulse characteristics. However, this sampling scheme results in a continuous activity of the hit registers, increasing the power dissipation and, possibly, the noise in the power supply. Also, the data is produced at quite a high rate and therefore a large read-out bandwidth is necessary to ensure that no data is lost. In these conditions, data reduction algorithms must be applied at a very early stage.
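A minimal sketch of such a feature extraction step is given below. It assumes the sampled hit signal is available as a list of 0/1 values, one per cell delay, and that any pulse shorter than a chosen number of samples (`min_len`, a hypothetical parameter) is to be treated as a glitch.

```python
def pulses(samples):
    """Return (start_index, length) for every run of ones in the samples."""
    found, start = [], None
    for i, bit in enumerate(samples):
        if bit and start is None:
            start = i                       # leading edge
        elif not bit and start is not None:
            found.append((start, i - start))  # trailing edge
            start = None
    if start is not None:                   # pulse still high at the end
        found.append((start, len(samples) - start))
    return found

def split_glitches(samples, min_len):
    """Separate valid pulses from glitches shorter than min_len samples."""
    runs = pulses(samples)
    valid = [p for p in runs if p[1] >= min_len]
    glitches = [p for p in runs if p[1] < min_len]
    return valid, glitches

samples = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0]
print(split_glitches(samples, min_len=2))  # ([(5, 4)], [(2, 1)])
```

Each tuple gives the arrival time and width of a pulse in units of the cell delay, which is the kind of reduced information that would be passed on instead of the raw sample stream.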

5.2. The Array of DLL’s (ADLL).

The time resolution of a DLL based converter is determined by the gate delay. To obtain better resolution either a faster technology is selected, which results in shorter gate delays, or an architecture that is able to interpolate time within the gate delay is used.

One way of achieving this interpolation is to use a group of F Timing DLL’s that have a small time offset between them. This offset is precisely determined by a Phase Shifting DLL, which is locked to the same reference clock (see Figure 3) [3].

Figure 3: Array of DLL’s with phase shifting DLL, showing bin definition.

A time offset smaller than the minimum gate delay is, of course, not possible to obtain directly. However, it is possible to obtain an offset that is slightly larger than the minimum gate delay. Assuming that the offset (Tm) is a fraction 1/F bigger than the delay of each delay cell in the Timing DLL’s (Tn) then, as shown in Figure 3, the time offset obtained from corresponding taps in consecutive DLL’s is Toff = Tm = Tn·(1+1/F). If the previous tap of the second DLL is used to define the end of the bin, the resulting bin size will be Tbin = Toff-Tn = Tn·(1+1/F)-Tn = Tn/F, as intended.

Bins in the extremities of the Timing DLL’s are defined from taps in opposite ends of consecutive DLL’s, profiting from the periodicity of the clock.
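The following sketch checks numerically that an ideal array built in this way divides the clock period into N·F equal bins, including the bins that wrap around the ends of the DLL’s. Times are expressed in units of the clock period, the cell delay of the Timing DLL’s is taken as 1/N of the period, and the values N = 35 and F = 4 are used only as an example; the function name is ours.

```python
from fractions import Fraction

def adll_tap_times(n_cells, factor):
    """Sorted tap times of an ideal ADLL, in units of the clock period."""
    t_n = Fraction(1, n_cells)               # Timing DLL cell delay (Tn)
    t_m = t_n * (1 + Fraction(1, factor))    # Phase Shifting DLL delay (Tm)
    # The modulo models the periodicity of the clock: taps that run past
    # the end of one period define bins at the beginning of the next one.
    return sorted((m * t_m + n * t_n) % 1
                  for m in range(factor) for n in range(n_cells))

times = adll_tap_times(35, 4)
widths = {b - a for a, b in zip(times, times[1:])}
widths.add(times[0] + 1 - times[-1])         # bin that wraps around
print(len(times), widths)  # 140 {Fraction(1, 140)}
```

All 140 bins have the same width Tn/F = 1/140 of the clock period, which only holds in this ideal case where every cell delay is exact.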

The size of a bin is defined as the delay difference of taps in the two ends of the bin:

$$\begin{aligned}
\mathrm{bin} &= \mathrm{tap}_{m+1,n-1} - \mathrm{tap}_{m,n} = \mathrm{tap}_{m+1} + \mathrm{tap}_{n-1} - (\mathrm{tap}_{m} + \mathrm{tap}_{n}) \Rightarrow \\
T_{bin} &= (m+1) \cdot T_m + (n-1) \cdot T_n - (m \cdot T_m + n \cdot T_n) \Leftrightarrow \\
T_{bin} &= T_m - T_n,
\end{aligned}$$

where m and n are the position of the taps that define the bin, as shown in Figure 3. Variable m represents the timing DLL and n is the tap number within that DLL (0 ≤ m < F ≤ M and 0 ≤ n < N). Delays Tn and Tm are related to the period of the reference clock (Tclk) by the number of taps M and N of the respective DLL’s:

$$T_n = \frac{T_{clk}}{N}, \qquad T_m = \frac{T_{clk}}{M}.$$

From these equations, the relationship between M, N and F can be defined as:

$$T_{bin} = \frac{T_{clk}}{N \cdot F} = \frac{T_{clk}}{M} - \frac{T_{clk}}{N} \Leftrightarrow M = N \cdot \frac{F}{1 + F}.$$

This definition unfortunately shows that for any given fraction F, the applicable values for M and N do not result in a number of bins N·F that is a pure binary number (N·F ≠ 2^n, for any n). To obtain such a convenient representation, a code conversion should be performed later in the data acquisition chain.
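This statement can be checked by brute force over a limited range of array sizes. The sketch below only searches interpolation factors F ≥ 2 (F = 1 means no real interpolation) and stops at arbitrary limits chosen for illustration.

```python
def is_power_of_two(value):
    """True when value is a positive integer power of two."""
    return value > 0 and (value & (value - 1)) == 0

# Keep (N, F) pairs for which M = N*F/(F+1) is an integer AND the number
# of bins N*F is a power of two.
binary_arrays = [(N, F)
                 for F in range(2, 17)
                 for N in range(1, 513)
                 if (N * F) % (F + 1) == 0 and is_power_of_two(N * F)]
print(binary_arrays)  # []
```

The list is empty because an integer M requires N to be a multiple of F + 1, and F and F + 1 cannot both be powers of two when F ≥ 2.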

Contrary to other vernier techniques that also use delay differences to obtain sub-gate resolution, in this architecture all the delay lines are locked to the same reference signal and only have to span a short length (corresponding to one reference clock period). Also, the ADLL can be shared between several channels, therefore increasing the integration level and decreasing the overall power dissipation.

The relations that have been established show that using this scheme, one can theoretically achieve a bin size that is any fraction of the original cell delay. In practice this is not the case since this interpolating procedure, where a small time difference (Tm-Tn) is extracted from two large delays ((m+1)·Tm+(n-1)·Tn and m·Tm+n·Tn), is very sensitive to any errors present in the array. A small error in the definition of Tm or Tn is amplified by the nature of the interpolation and becomes a significant part of Tbin, therefore limiting the achievable resolution. Bins in the extremities of the DLL’s are also sensitive to the error accumulation, since they are interpolated from taps in opposite extremes of consecutive DLL’s.

This interpolation method sets, therefore, stringent requirements on the DLL’s that make up the array. Minimisation of device mismatch and of phase error are very important design criteria.

An interpolator based on the ADLL scheme can, in principle, be designed in such a way as to minimise reference clock jitter and all static sources of non-linearity. The degradation of the time resolution due to delay cell mismatch is, however, harder to deal with since it is a characteristic inherent to the fabrication of the circuit that cannot be completely eliminated by design. Therefore delay cell mismatch, and ultimately device mismatch, sets the limit to the resolution achievable with these converters.

[Plot: horizontal axis, interpolation factor (F) from 1 to 10; one curve for the ideal case and one for each delay cell mismatch (σ) of 1%, 2%, 3%, 4% and 5%.]

Figure 4: Interpolation limits due to cell mismatch.

The graph of Figure 4¹ shows the root mean square (RMS) resolution that can be achieved using an ADLL based interpolator in the presence of delay cell mismatch² (assuming N=35 delay cells per Timing DLL). As would be expected the effects of mismatch increase as the interpolation factor (F) increases. Therefore, the gain in resolution obtained by increasing the interpolation factor vanishes after a certain level of delay cell mismatch. The maximum interpolation factor that is rewarded by a consequent improvement in resolution varies between F=4 and F=5, depending on the actual delay cell mismatch.

¹ Given a number of cells per Timing DLL, N, some of the interpolation factors, F, displayed do not result in a realistic ADLL. However they are included for completeness.

² As explained in Chapter 6, device parameter mismatch leads to identical delay cells having a different propagation delay. The delay of a cell is seen as a random variable with a normal PDF, having a variance σ².

5.3. Conversion dynamic range.

The use of a periodic reference signal in the array of DLL’s makes it impossible to differentiate two measurements resulting from hit signals arriving separated by multiples of the reference clock period. The dynamic range of such a converter is therefore limited to one reference clock cycle.

The simplest way to increase the dynamic range would be to increase the clock period. However this solution requires the use of a longer delay chain (or, conversely, a coarser resolution). A better solution is to include in the converter a counter synchronous to the reference clock.

[Block and timing diagram: two n-bit counters with Reset and Clk inputs, Register #0 and Register #1 latched by Hit, and a Sel signal that selects the coarse word. The timing diagram shows Clk, Register #0, Register #1, Sel and the coarse word.]

Figure 5: Dynamic range extension using two coarse time counters.

The counter is itself a converter with a coarse resolution (one reference clock cycle) but a large dynamic range (depending on the number of bits implemented). Its results can be appended to the results of the array conversion, which have a fine resolution but a small dynamic range. Since both coarse and fine time words are obtained using the same reference clock, no ambiguity is generated from the dynamic range extension.

The critical moment for such a scheme is when a measurement is performed while the counter is switching and thus not yet stable. In this situation, the captured coarse word is not predictable or may be in an intermediate state and thus induce metastability in the hit registers.

If two counters are used, synchronous to opposite phases of the reference clock, there is at any time one counter with stable outputs (see Figure 5). All the converter has to do is to select the correct counter results in order to obtain the correct coarse measurement.



The selection of the stable counter is done in accordance with the phase of the reference clock at the moment that the hit signal is asserted. Fortunately the status of the DLL, that is acquired at the same moment, accurately reflects the phase of the reference clock, thus it can be used to determine the correct coarse result.
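The selection rule can be illustrated with a small behavioural model. This is a sketch under assumptions that are not taken from the thesis: the counters are ideal, one counts rising edges and the other falling edges, the clock phase is split into quarters, and the function names are hypothetical. In the real circuit the phase information comes from the captured DLL status rather than from the hit time itself.

```python
def coarse_counters(t, period):
    """Ideal contents of the two counters at time t (hypothetical model).

    counter0 counts rising edges (at k*period), counter1 counts falling
    edges (at k*period + period/2); both start from zero at t = 0.
    """
    counter0 = int(t // period)
    counter1 = int((t + period / 2) // period)
    return counter0, counter1


def select_coarse(t, period):
    """Use the counter that is far from its switching edge, then correct it."""
    phase = (t % period) / period      # stands in for the captured DLL status
    counter0, counter1 = coarse_counters(t, period)
    if 0.25 <= phase < 0.75:
        return counter0                # counter0 switches at phase 0, far away
    if phase >= 0.75:
        return counter1 - 1            # counter1 already counted this cycle
    return counter1                    # phase < 0.25: counter1 not yet updated


if __name__ == "__main__":
    period = 12.5  # ns, illustrative 80 MHz reference
    for t in (0.5, 6.0, 12.0, 13.0, 30.0):
        print(t, select_coarse(t, period))
```

Whatever the phase of the hit, the selected word equals the number of complete clock periods elapsed, and the counter that is read is never within a quarter period of its own switching instant.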

Time stamp measurements obtained from such a converter are referred to an initial instant in the beginning of the clock period when the coarse time counters are at zero. One can thus see the counter reset signal as a common Start signal that sets the time zero at the beginning of the clock cycle. In these conditions, the initialisation of the coarse counters is also an important parameter for the performance of the converter. Start-Stop measurements don’t require any special initialisation of the coarse counter, since they are not referred to a particular initial instant.

5.4. Time critical paths.

Timing information is delivered to the converter via two main signals: the reference clock and the hit signals. The reference clock is used as the basis for the measurements and the hit signals set the exact time the measurement is to be acquired. The high frequency spectral components of these signals determine the accuracy of the timing information received.

These signal paths must be handled carefully since any deterioration of the respective signals’ time characteristics will not be regenerated inside the converter and thus will degrade the resolution of the converter. Jitter in the reference clock received by the array must be very small since the DLL loop is unable to filter jitter in its input signal (see Appendix B).

Most of the noise that may couple into the time critical paths can effectively be factored out if differential signalling levels are used. In fact, noise coupling into signal paths at board and bonding level affects close-by paths in the same way, thus it is mainly common mode noise. Several standards are commercially available for differential signalling. Selection should be based on bandwidth, compatibility of supply levels, simplicity of receivers, etc.

These considerations are not so critical inside the converter IC because there the signal paths are short and the noise environment can be designed such that the signals are not very sensitive to noise that is generated in the circuit. Increased noise immunity could be achieved if differential logic was also used throughout the time critical circuitry inside the circuit. However this increased noise immunity would be obtained at the expense of increased power dissipation.

5.5. Measurement acquisition and storage.

The measurement instant is defined by the assertion of the hit signal. At this moment the status of the array and of the coarse counters is captured in a group of hit registers. Data stored in these registers reflects the time that lapses from the beginning of the reference clock period to the instant the hit signal was asserted. The measurement consists only of the storing operation, therefore the time spent on this operation is minimal and the converter has no dead time.

The hit register is the interface between the time measurement circuitry and the timing insensitive digital processing performed afterwards. Its activity has an important contribution to the converter linearity and should therefore be treated as time critical circuitry.

In order to avoid degradation of the linearity of the converter due to the acquisition stage, the latching instant must be well defined and the same for all the tap registers. This requires a matching-minded approach to the design and layout of the registers, since mismatch at this level results in different latching times for each register and thus in increased non-linearity of the converter. Furthermore, the latching signal should arrive at the same time to every register involved in the measurement. If the intended resolution is very high, small propagation delays along the lines that distribute this signal will degrade the measurement accuracy, as will be shown in Chapter 6. In some topological conditions propagation delays may accumulate, resulting in non-negligible non-linearity.

Due to the large number of registers integrated in one circuit, the power dissipation may be important. A side effect of the large instantaneous currents that may be required at the acquisition moment is the noise it induces in the power supply. Power supply noise at this stage may cause crosstalk between channels, if they are performing measurements concurrently. Careful power distribution is therefore necessary to reduce this effect and also the possible deterioration of the DLL’s closed loop dynamic behaviour.

5.6. Read-out architecture.

A converter circuit is not complete only with the time acquisition circuitry. Important functions such as buffering, data encoding, data reduction and handling of the read-out protocol have an impact on the converter performance and enhance its functionality, turning it into an integrated time measurement system.

Data buffering is probably the function that has the biggest effect on the converter’s performance (considering High-Energy Physics applications). Due to the random nature of the assertion time of the hit signal, measurements must be performed at unpredictable times. Usually the data acquisition system down-stream of the converter is only able to handle a limited data rate from any given origin, because the communication medium is shared between several data sources. Measurements acquired with shorter time separation than the read-out period would then be lost, even if the converter itself was fast enough to process them. This would result in an increased converter dead time.

This limitation can be circumvented in several different ways. The read-out rate can be made much higher, thus decreasing the minimum interval between two accepted measurements. Alternatively, a derandomising buffer can be included after the converter. This buffer holds data arriving in quick succession until it can be read-out. Also a data reduction function (trigger based data reduction) may exist that discards measurements that do not qualify in the acceptance criteria. If applied it can reduce the data rate significantly.

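[Block diagram: Channel #0, Channel #1, ..., Channel #N, each with its own hit input and channel buffer(s), followed by a common group buffer and the read-out. The stages are labelled with the rate at which they operate: hit rate, internal clock rate and read-out rate.]

Figure 6: Example of the first level of a read-out buffering hierarchy.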

The first solution is usually not applicable. Increased read-out speed increases system costs and results in an ineffective use of this resource since most of the time the high speed would not be needed. The two other solutions, if used together, are very effective in smoothing the read-out rate so that an effective usage of a low speed read-out channel can be made without increasing the dead time between accepted hits.

Using one large derandomising buffer per channel would however be expensive in terms of silicon usage. A preferred solution is to build a buffering hierarchy, by partitioning the conversion channels into small groups, use a common buffer for each group and a small individual buffer for each channel (as in Figure 6). Each group of channels can then be merged into a larger “super-group” and so on, until the hierarchy that is best adapted to the application has been built.

The size of the channel group and of the individual buffers is defined by the expected acquisition rate and channel occupancy, as well as by the read-out rate and allowed measurement loss. A good knowledge of the application in view is therefore required, prior to defining these buffers.
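The trade-off between buffer depth, hit rate, read-out rate and measurement loss can be explored with a very simple queue simulation. The sketch below is purely illustrative and rests on assumptions not made in the text: hits arrive as a Poisson process, a single buffer is read at a fixed period, and the function name and parameter values are hypothetical.

```python
import random


def simulate_buffer(depth, hit_rate, readout_period, n_hits, seed=1):
    """Count the hits lost by a FIFO of `depth` words read at a fixed period.

    Hits arrive as a Poisson process with mean rate `hit_rate`. The
    read-out needs `readout_period` per word and serves words in order.
    A hit that finds `depth` words still waiting or being read is lost.
    """
    rng = random.Random(seed)
    t = 0.0
    departures = []   # read-out completion times of the words still stored
    lost = 0
    for _ in range(n_hits):
        t += rng.expovariate(hit_rate)
        # words whose read-out completed before this hit free their slot
        departures = [d for d in departures if d > t]
        if len(departures) >= depth:
            lost += 1
            continue
        start = departures[-1] if departures else t
        departures.append(start + readout_period)
    return lost


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        lost = simulate_buffer(depth, hit_rate=1.0, readout_period=0.5, n_hits=20000)
        print(f"depth={depth}  lost={lost}")
```

With the average hit rate at half of the read-out capacity, a buffer of a few words already removes most of the loss seen with a single register, which is the smoothing effect described above.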

5.7. The prototype.

A Time-to-Digital Converter (TDC) based on this architecture was built [4]. The circuit demonstrates the feasibility of the ADLL as a time interpolator. Furthermore, to emphasise the ability to integrate all the required functionality in a single, inexpensive, circuit, the prototype was implemented in a commercial 0.7µm CMOS technology. A block diagram depicting the prototype is shown in Figure 7.

[Block diagram labels: clkref, rsttime, hit<3:0> and hit enable inputs; array of DLL’s with M=28 cells and N=35 cells and their phase detectors (PD); 8 bit counters; 4 channels with 2-word channel buffer; data encoder; 32-word FIFO; read-out interface (clkro); serial interface; program interface; control.]

Figure 7: The prototype block diagram.

The ADLL is made of four (F=4) Timing DLL’s each dividing the reference period in 35 parts (N=35). A 28-tapped Phase Shifting DLL (M=28) is required to achieve the correct adjustment of Timing DLL’s. An 8-bit coarse time counter is used to obtain a dynamic range extension to 256 reference clock cycles.

Using an 80MHz reference clock (T=12,500ps), the bin size, over a full dynamic range of T·256=3.2µs, is

$$T_{bin} = \frac{T}{N \cdot F} = \frac{12{,}500}{140} = 89.3\,\mathrm{ps} .$$



The bin size of the independent DLL’s is Tm=446.4ps and Tn=357.1ps, respectively for the Phase Shifting and for the Timing DLL’s. The reference clock (and the hit signal) receivers are implemented differentially, to avoid common mode noise coupling into these time critical paths.
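The figures quoted for the prototype follow directly from the relations of the previous sections. As a quick check, using only the parameters given in the text (the variable names are illustrative):

```python
F_REF_HZ = 80e6        # reference clock frequency
N, M, F = 35, 28, 4    # Timing DLL taps, Phase Shifting DLL taps, interpolation factor
COUNTER_BITS = 8       # coarse time counter width

t_clk_ps = 1e12 / F_REF_HZ                     # reference period in ps
t_n_ps = t_clk_ps / N                          # Timing DLL cell delay
t_m_ps = t_clk_ps / M                          # Phase Shifting DLL cell delay
t_bin_ps = t_m_ps - t_n_ps                     # interpolated bin size, Tm - Tn
range_us = t_clk_ps * 2**COUNTER_BITS / 1e6    # dynamic range in microseconds

if __name__ == "__main__":
    print(f"Tclk = {t_clk_ps:.0f} ps, Tn = {t_n_ps:.1f} ps, Tm = {t_m_ps:.1f} ps")
    print(f"Tbin = {t_bin_ps:.1f} ps, dynamic range = {range_us:.1f} us")
```

The bin size obtained as Tm-Tn agrees with T/(N·F), as expected from M = N·F/(F+1).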

The demonstrator includes a common data encoder that converts the ‘thermometer’ code in which the fine time measurements are encoded at the output of the ADLL into a binary encoded word. It also merges the correct coarse time word into the final measurement word. The encoding results in a data word reduction from 156-bit to 16-bit.

Four TDC channels were integrated in the IC. Each channel includes a two-word deep asynchronous pipeline buffer (channel buffer). A common 32 word deep derandomising buffer (group buffer) is also included in order to ease the read-out rate requirements. This partition of the buffering hierarchy is well adapted to the low hit rate expected in the application and demonstrates the partition concept. The read-out interface logic, as well as the encoding and common buffering circuitry, work asynchronously to the reference clock (clkref), using a clock (clkro) of up to 40MHz.

A slow, serial read-out interface is also implemented to facilitate the necessary test and debugging tasks. All necessary programming is performed via an independent program port that is adapted for the daisy-chaining of several TDC’s in a single serial line. In fact, the prototype includes sufficient functionality to allow it to be used in the actual working environment, included in the data acquisition chain of a High-Energy Physics experiment.

In the photograph of Figure 8, the main functional blocks of the prototype are highlighted. This circuit is encapsulated in a 68-pin plastic PLCC package.

5.7.1. Performance analysis.

Timing characteristics.

The short analysis that will be made here takes into account only the errors intrinsic to converters built using this architecture. Other sources of errors degrade the resolution of the measurements, but they can be avoided, or at least minimised by careful circuit design.

The LSB (least significant bit) of the converter, in the configuration proposed, is Tbin=89.3ps. The theoretical RMS resolution σq is determined by the quantisation error:

$$\sigma_q = \frac{T_{bin}}{\sqrt{12}} = 25.8\,\mathrm{ps} .$$
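For reference, the factor √12 follows from the usual assumption (not stated explicitly in the text) that the quantisation error e is uniformly distributed over one bin:

$$\sigma_q^2 = \frac{1}{T_{bin}} \int_{-T_{bin}/2}^{T_{bin}/2} e^2 \, de = \frac{T_{bin}^2}{12} \quad \Rightarrow \quad \sigma_q = \frac{T_{bin}}{\sqrt{12}} = \frac{89.3\,\mathrm{ps}}{3.464} \approx 25.8\,\mathrm{ps} .$$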

The resolution is, however, limited by the unavoidable delay cell mismatch. The analysis developed in Chapter 6 shows that the maximum effect of cell mismatch is seen in the middle of the last Timing DLL (m=F-1 and n=N/2). Assuming a mismatch (σmatch) of 1%, the additional RMS error due to the array is:

$$\sigma_{ADLL} = \sigma_{match} \cdot F \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{F-1}{M} \cdot (M-F+1) + \frac{N}{4}} \cdot T_{bin} = 12.8\,\mathrm{ps} .$$

[Chip photograph with the main functional blocks highlighted: array of DLLs, hit registers (4 channels), coarse time counter, read-out FIFO, read-out and encoding logic.]

Figure 8: Prototype circuit showing main functional blocks.

In addition, unavoidable jitter present in the reference clock and intrinsic to the closed loop operation is estimated to be on the order of σjitter=15ps. Adding these contributions quadratically, the overall RMS resolution should be ~32.5ps (0.36LSB).
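The error budget can be reproduced numerically from the three contributions. The sketch below uses only values quoted in the text; the function names are illustrative.

```python
import math


def sigma_quantisation(t_bin):
    """RMS quantisation error of an ideal converter with bin size t_bin."""
    return t_bin / math.sqrt(12)


def sigma_adll_worst(t_bin, sigma_match, n_taps, m_taps, f):
    """Mismatch contribution at m = F-1, n = N/2, in the unit of t_bin."""
    ps_dll = ((f + 1) / f) ** 2 * (f - 1) / m_taps * (m_taps - f + 1)
    timing_dll = n_taps / 4
    return sigma_match * f * math.sqrt(ps_dll + timing_dll) * t_bin


def rms_sum(*terms):
    """Quadratic (root-sum-square) addition of independent contributions."""
    return math.sqrt(sum(x * x for x in terms))


t_bin = 12500 / 140                                  # ps
sq = sigma_quantisation(t_bin)                       # quantisation
sa = sigma_adll_worst(t_bin, 0.01, 35, 28, 4)        # 1% cell mismatch
total = rms_sum(sq, sa, 15.0)                        # plus 15 ps of jitter

if __name__ == "__main__":
    print(f"sigma_q = {sq:.1f} ps, sigma_ADLL = {sa:.1f} ps")
    print(f"total = {total:.1f} ps ({total / t_bin:.2f} LSB)")
```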

This value reflects the expected resolution if a number of converters are measured. Individual converters may have a greater or smaller resolution, depending on their actual matching parameters. Other sources of errors will most likely degrade the converter resolution, therefore this value can be used as a benchmark to evaluate the characteristics of the actual prototypes.

The results of tests carried out with the prototype are detailed in Chapter 8. They show an overall RMS resolution of 34.5ps (0.38LSB), which is in accordance with the expected value previously shown.

Chapter 6. Analysis of the Limits to the TDC Resolution.

In this chapter we will develop mathematical tools to predict and analyse the effects of different error sources on the linearity and on the time resolution of a DLL based converter. The analysis is extended to the more complex case of the ADLL. These analysis tools allow for the translation of important system level performance parameters into design variables that can then be used to judge the design against the expected performance.

All the most important internal error sources are accounted for, namely the delay cell mismatch, the dynamic behaviour of the closed control loop and several causes of phase error.

6.1. Non-linearity due to cell mismatch.

The delay cell defines the LSB of a DLL based converter. Delay differences between cells produce variations of the LSB along the dynamic range. Therefore, the conversion becomes non-linear and the resolution is degraded.

Although all cells have identical layout and are biased in the same conditions, their delay is not the same. If the delay of a large number of these cells is measured, their distribution is found to have mean µ and variance σ². The delay of a cell can, therefore, be seen as a random variable with a normal Probability Density Function (PDF) having a mean µ and variance σ². The mean corresponds to the expected cell delay, and the variance gives a measure of the spread of the actual delays around that value.

6.1.1. Origins of mismatch.

Delay mismatch has its origins in the variation, due to the fabrication process, of the electrical parameters of the devices that constitute the cell. Two kinds of parameter variations can be distinguished: local and global variations [5][6][7]. Local variations affect devices that are immediate neighbours. This kind of random variation is generally called parameter mismatch. Global variations affect devices that are located far away in the same die, in different dies or even in different wafers. At a circuit level, global variations can be seen as static errors that affect the absolute values of the respective parameters. These variations are mainly due to process and temperature gradients, non-uniformity of the photo-lithographic processing caused by proximity effects and different orientation of devices.

Circuit topologies that rely on relative, rather than absolute, device parameters effectively counter global mismatch variations. The DLL structures only rely on relative cell delay, therefore the effects of global parameter variations will be disregarded in this study.

The effects of local variations can be limited by proper layout of the cells, keeping a constant orientation of the devices, avoiding temperature gradients and guaranteeing that each cell has the same “physical” patterns in its vicinity. Local variations result from unavoidable deviations from the intended values of key parameters during fabrication. Thin oxide thickness, bulk doping levels, mobility, etc. suffer statistical variations that affect important electrical parameters, such as the threshold voltage (Vt), the device current factor (β) and the body factor (γ). These random variations are usually assumed to be uncorrelated, having a normal distribution with a variance that is inversely proportional to the gate area.

As devices approach their minimum feature size, especially in deep submicron technologies, mismatch also becomes dependent on gate length, L, and width, W, separately. To guarantee a good matching behaviour, devices should be drawn with an appropriate gate area and using conservative (larger than minimum) gate dimensions.

6.1.2. Effects of cell delay mismatch.

The integral linearity error results from the accumulation of the individual cell delay errors, subject to the limits imposed by the closed control loop of the DLL (the overall delay of the line is the period of the reference clock). The analysis in Appendix C shows that the standard deviation of the integral error (σDLL(i)) in an N-tapped DLL is defined by the following expression, where σcell=σ/µ reflects the matching of the delay of the individual delay cells as a fraction of their mean value µ=T/N.

$$\sigma_{DLL}(i) = \sigma_{cell} \cdot \sqrt{\frac{n}{N} \cdot (N-n)} ,$$

where the timing variable n is defined in accordance with the bin position i along the delay chain by n = Mod(i+1, N)¹, 0 ≤ i < N. This definition of the timing variable n will be used throughout this chapter, in the context of the analysis of isolated DLL’s.

From the previous equation it can be observed that for the same matching between delay cells σcell, the standard deviation of the integral error is bigger for longer delay lines (higher N). Therefore, for a given cell delay, better results can be obtained using a short delay line operating at higher frequency than with a long delay line operating at lower frequency.

¹ The notation Mod(a,b) denotes the modulo operation. It is required to capture the reference periodicity of the DLL timing interpolation: the last bin (N-1) has its limits defined by tap N-1 and tap 0.
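The dependence on the line length can be checked by evaluating the expression directly. A minimal sketch, with illustrative function names, for the same 1% cell matching in a 35-tap and in a 140-tap line:

```python
import math


def timing_variable(i, n_taps):
    """Timing variable n = Mod(i+1, N) for bin i of an isolated DLL."""
    return (i + 1) % n_taps


def sigma_dll(i, n_taps, sigma_cell):
    """Standard deviation of the integral error at bin i, in cell delays."""
    n = timing_variable(i, n_taps)
    return sigma_cell * math.sqrt(n / n_taps * (n_taps - n))


def worst_case(n_taps, sigma_cell):
    """Largest standard deviation along the line (found around n = N/2)."""
    return max(sigma_dll(i, n_taps, sigma_cell) for i in range(n_taps))


if __name__ == "__main__":
    for n_taps in (35, 140):
        print(f"N={n_taps:3d}  worst case = {worst_case(n_taps, 0.01):.4f} cell delays")
```

The error vanishes at the last bin, where the closed loop pins the total delay to the reference period, and peaks in the middle of the line at about σcell·√N/2.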

In an ADLL, time interpolation is obtained using taps from several phase shifted DLL’s. The standard deviation of the overall integral error can be obtained by taking into account the error accumulation along the DLL’s in the delay path. For any bin under consideration, the path from the origin includes delay cells in the Phase Shifting DLL and in the respective Timing DLL. Since delay variations due to mismatch are not correlated between DLL’s, the standard deviation of the integral error is the quadratic sum (root of the sum of squares) of the partial errors:

$$\sigma_{array}(i) = F \cdot \sigma_{cell} \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{m}{M} \cdot (M-m) + \frac{n}{N} \cdot (N-n)} ,$$

where M, N and F are defined in accordance with the allowed combinations for the array. The phase shifting variable m and the timing variable n are, respectively, the timing DLL number and the bin number in the corresponding DLL. They are calculated taking into account the staggering of the DLL’s across the clock period. If i (0 ≤ i < N·F) is the array bin number, then:

$$m = \mathrm{Mod}(i+1, F) ,$$

$$n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i+1}{F}\right) - m, \; N\right) .$$

The Mod(a,b)² and Floor(a) operations are, respectively, the modulo and the integer truncation operations. The definition of n is a generalisation of the one presented for the isolated DLL case, where the interpolation factor was F=1. These definitions of the phase shifting variable m and of the timing variable n will be used throughout this chapter, in the context of the analysis of ADLL structures.
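The index mapping and the resulting standard deviation are easy to evaluate numerically. The following sketch (function names are illustrative) implements the definitions of m and n and the array expression, with the result expressed in bins (LSB):

```python
import math


def array_indices(i, n_taps, f):
    """Map array bin i (0 <= i < N*F) to (m, n): timing DLL and tap number."""
    m = (i + 1) % f
    n = ((i + 1) // f - m) % n_taps
    return m, n


def sigma_array(i, n_taps, m_taps, f, sigma_cell):
    """Standard deviation of the integral error at array bin i, in bins."""
    m, n = array_indices(i, n_taps, f)
    phase_shift_term = ((f + 1) / f) ** 2 * m / m_taps * (m_taps - m)
    timing_term = n / n_taps * (n_taps - n)
    return f * sigma_cell * math.sqrt(phase_shift_term + timing_term)


if __name__ == "__main__":
    n_taps, m_taps, f = 35, 28, 4
    curve = [sigma_array(i, n_taps, m_taps, f, 0.01) for i in range(n_taps * f)]
    worst = max(range(len(curve)), key=curve.__getitem__)
    print(f"worst bin {worst}: {curve[worst]:.4f} LSB at (m, n) = "
          f"{array_indices(worst, n_taps, f)}")
```

Every (m, n) pair is visited exactly once as i runs over the N·F bins, and for F=1 the mapping reduces to the single DLL definition n = Mod(i+1, N).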

In Figure 1 an example of the expected integral error due to cell delay mismatch is shown. It corresponds to the case of an array with N=35 (number of cells per timing DLL), F=4 (interpolation factor) and a cell delay with a standard deviation of 0.01 (1%) of the cell delay.

When several DLL’s are assembled in an array structure, the single DLL’s rounded curve shape (also shown) is distorted by the introduction of the Phase Shifting DLL. There is a strong periodic component with a periodicity of F, corresponding to the folding of the array from the last Timing DLL to the first one.

The larger non-linearity found on the first part of the curve is due to the fact that timing interpolation in this region is performed using cells in different extremities of successive timing DLL’s.

² The use of the modulo operation reflects the folding operation introduced by the ADLL scheme. This results in some bins being defined from the time interpolation of taps in opposite extremes of consecutive DLL’s.

[Plot: INL standard deviation against bin number (0 to 140), with one curve for the ADLL and one for a single DLL.]

Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σcell=1% (ADLL: N=35 and F=4, single DLL: N=140).

6.2. Jitter due to internal phase noise.

In the previous section the DLL was considered as an ideal closed control loop, able to keep the delay of the voltage controlled delay chain (VCDL) exactly at one reference clock period. The deviations from the ideal behaviour found in real control loops can be classified into two categories, in accordance with their origin:

• Deviations of external origin: The reference signal has some phase noise that is propagated, without attenuation, along the VCDL. The control loop tries to track these random reference period variations by constant changes of the delay of each cell.

• Deviations of internal origin: The control loop tries to keep the delay of the VCDL as close as possible to the reference period. In the absence of an ideal feedback loop, the dynamics of the control loop will generate some variation of the VCDL delay around its ideal value. These variations are seen as jitter.

Since we are mainly interested in the study of the DLL internal sources of errors, we will focus on the deviations of internal origin, assuming an ideal reference clock.

The delay oscillation induced by the operation of the control loop translates into jitter in the signal seen at the end of the delay chain. This jitter can be approximated, without loss of generality, by a random delay error with a normal PDF. The mean value of this error is µjitter=0 and the standard deviation, normalised to the reference period, is σjitter.

The error due to jitter affects all delay cells in the same way and, since it is completely correlated, the standard deviation of the integral error increases linearly along the VCDL of a DLL. The resulting standard deviation of the delay σDLL, normalised to the delay of a single delay cell, is (following the same definition of n as before and with σj = σjitter·N):

$$\sigma_{DLL}(i) = \sigma_j \cdot \frac{n}{N} .$$



In the case of the array of DLL’s, the same considerations of the previous section apply and, using the same naming conventions, the resulting standard deviation is:

$$\sigma_{array}(i) = F \cdot \sigma_j \cdot \sqrt{\left(\frac{m}{M}\right)^{2} + \left(\frac{n}{N}\right)^{2}} .$$

Note that the DLL’s in the array have statistically independent jitter, therefore standard deviation components from different DLL’s are added quadratically.
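Both jitter expressions can be evaluated in the same way as the mismatch ones. The sketch below (illustrative function names) takes σjitter as a fraction of the reference period and returns the result in cell delays for the single DLL and in bins for the array:

```python
import math


def sigma_jitter_dll(i, n_taps, sigma_jitter):
    """Jitter standard deviation at bin i of a single DLL, in cell delays."""
    sigma_j = sigma_jitter * n_taps        # normalised to one cell delay
    n = (i + 1) % n_taps
    return sigma_j * n / n_taps


def sigma_jitter_array(i, n_taps, m_taps, f, sigma_jitter):
    """Jitter standard deviation at array bin i, in bins (LSB)."""
    sigma_j = sigma_jitter * n_taps
    m = (i + 1) % f
    n = ((i + 1) // f - m) % n_taps
    # independent jitter in the two DLL's adds quadratically
    return f * sigma_j * math.hypot(m / m_taps, n / n_taps)


if __name__ == "__main__":
    curve = [sigma_jitter_array(i, 35, 28, 4, 0.001) for i in range(140)]
    print(f"largest value: {max(curve):.4f} LSB")
```

For σjitter=0.1% and N=35 the normalised value is σj=3.5% of the cell delay, and the array curve rises along each Timing DLL before folding back, which gives the saw-tooth shape.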

[Plot: standard deviation against bin number (0 to 140), with one curve for the ADLL and one for a single DLL.]

Figure 2: Standard deviation curve resulting from a closed loop jitter of σ=0.1% of the reference period


The curve in Figure 2 describes the effect of jitter with σ=0.1% of the reference clock period (σj=3.5% of the cell delay if N=35). The topology of the ADLL is reflected in the saw-tooth shape of the curve. The same periodic components described in the previous section are present. For comparison, the effect of the same amount of jitter on a single DLL is also shown.

6.3. Non-linearity due to static phase error.

Systematic offsets and unwanted delays present in the converter adversely affect the linearity of the system. They should be carefully identified and minimised. Main sources of non-linearity, identified in Figure 3, are:

• Phase detector’s phase error (F(D1,D2) ≠ 0 for D1=D2-T).

• Mismatch of the propagation delay of the lines carrying phase information from the delay chain to the phase detector (τ1 ≠ τ2).

• Unbalanced load and signal characteristics on the delay cells at the extremes of the delay chain (d0, dN-1 ≠ dn, 1 ≤ n ≤ N-2).

• Propagation delay along the sampling signal distribution for the hit registers (τhit).

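[Diagram: the Clock drives a chain of delay cells d0, d1, d2, ..., dN-2, dN-1 with outputs Tap 0 to Tap N-1; the delays D1 and D2 reach the Phase Detector, F(D1,D2), through lines with propagation delays τ1 and τ2; the Hit signal reaches the tap registers (D) through line segments of delay τhit.]

Figure 3: Detail of a delay locked loop depicting the important delays within the loop.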

6.3.1. Effects of phase detector’s phase error.

The phase detector responds to differences in the phase of its input signals by generating an electrical quantity A (voltage, charge, etc) proportional to the measured phase difference.

$$A(t) = F(\phi_1(t), \phi_2(t)) , \quad \text{where} \quad F(\phi_1(t), \phi_2(t)) = K \cdot (\phi_2(t) - \phi_1(t)) - C$$

and K and C are, respectively, the gain and the phase error of the phase detector. φ1(t) and φ2(t) are the phases of the two signals being compared by the phase detector.

In the context of DLL analysis it is more convenient to discuss the properties of the loop in terms of delay instead of phase. These concepts are equivalent, their relation being given by the transformation 2·π ⇒ T.

The previous equation is therefore transformed into:

$$A(t) = F(D_1(t), D_2(t)) , \quad \text{where} \quad F(D_1(t), D_2(t)) = K \cdot (D_2(t) - (D_1(t) + T)) - C$$

and the 2π phase difference between the two extremes of the delay line is explicitly stated (clock period, T). D1(t) and D2(t) are the two delays being compared.

The loop equilibrium is obtained when A(t)=0, which should correspond to D2(t) = D1(t) + T. However, this is not the case if C ≠ 0: the phase detector error C will be reflected in the effective static delay (phase) error. The origin of C may be attributed to an unbalanced phase detector, resulting in an offset in the output signal.

The following discussion assumes an N-tapped DLL spanning a time interval Dtot that corresponds to a reference clock of period T. It is further assumed that no errors, other than the one under study, are present.

In equilibrium,

$$K \cdot (D_2(t) - D_1(t) - T) - C = 0 \Leftrightarrow D_2(t) - D_1(t) - T = D_{err}(t) = \frac{C}{K} .$$



The total time interval spanned by the delay chain is Dchain = T + C/K. Therefore the length of each bin is

$$d_i = \frac{T}{N} + \frac{1}{N} \cdot \frac{C}{K} , \quad 0 \le i < N-1 .$$

Since the periodicity of the reference clock is T, the total time covered by the delay chain must be Dchain = T. The remaining delay is subtracted from the last bin of the chain (d_i, i = N-1), which is defined from the time difference (modulo T) between two taps on opposite extremes of the delay chain (tap N-1 and tap 0 in Figure 3).

$$d_i = \frac{T}{N} - \frac{N-1}{N} \cdot \frac{C}{K} , \quad i = N-1 .$$

In Figure 4 the effect of this error mechanism is illustrated. Each rectangle corresponds to a bin. For comparison the ideal case is shown in the top of the figure. Notice that due to the periodicity of the scheme (period T), the last bin corresponds to a fraction of the delay of the last cell.
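A short numerical check makes the redistribution explicit. The sketch below (illustrative function name and values) builds the list of bin widths for a given C/K and confirms that they still add up to one reference period:

```python
def bins_with_phase_error(period, n_taps, c_over_k):
    """Bin widths of an N-tapped DLL whose phase detector has an error C/K.

    period and c_over_k are expressed in the same time unit.
    """
    regular = period / n_taps + c_over_k / n_taps
    last = period / n_taps - (n_taps - 1) / n_taps * c_over_k
    return [regular] * (n_taps - 1) + [last]


if __name__ == "__main__":
    widths = bins_with_phase_error(period=12500.0, n_taps=5, c_over_k=100.0)
    print(widths, sum(widths))
```

With N=5, as in the figure, an error of 100ps lengthens the first four bins by 20ps each and shortens the last bin by 80ps.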

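[Diagram: the ideal division of the period T into bins of width T/N (bin 0 to bin N-1), compared with a delay chain spanning T+C/K, where the first N-1 bins have width T/N+1/N·C/K and the last bin, cut short by the clock periodicity, has width T/N-(N-1)/N·C/K.]

Figure 4: Illustration of the effect of the phase detector’s phase error (N=5).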

The error of the phase detector can be referenced to its input, and translated into an added delay to one of the input signals. If we set delay τ’diff = C/K, this delay can be lumped into the propagation delay mismatch of the input paths (τdiff) and the phase detector considered ideal.

The behaviour of digital, two-state phase detectors is quite different, because they don’t extract information on the magnitude of the phase error. However the static phase error of such a phase detector may also be referenced to its input and therefore can be studied in the same way.

6.3.2. Effects of phase detector input paths’ delay mismatch.

If the propagation delay of the signals carrying the phase information from the two extremes of the delay line to the phase detector is different, then this difference will induce conversion non-linearity:

$$D_{chain} = (1 + \tau_2 - \tau_1) \cdot T = (1 + \tau_{diff}) \cdot T ,$$

where τ1 and τ2, the propagation delays shown in Figure 3, are normalised to the reference period T. Therefore,

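$$d_i = \left(\frac{1}{N} + \frac{1}{N} \cdot \tau_{diff}\right) \cdot T , \quad 0 \le i < N-1$$

$$d_i = \left(\frac{1}{N} - \frac{N-1}{N} \cdot \tau_{diff}\right) \cdot T , \quad i = N-1$$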

This effect is illustrated in Figure 5:

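[Diagram: the ideal division of the period T into bins of width T/N (bin 0 to bin N-1), compared with the case of input path mismatch, where the chain is longer by τdiff·T: the first N-1 bins have width T/N·(1+τdiff) and the last bin has width T/N·(1-(N-1)·τdiff).]

Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5).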

Assuming, in the interest of simplicity, that C/K and τdiff are represented as a fraction of the reference period T, the conversion integral non-linearity due to these errors is obtained from the expression:

$$INL_{DLL}(i) = \frac{1}{N}\cdot n\cdot\left(\frac{C}{K} + \tau_{diff}\right), \quad 0 \le i \le N-1.$$
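As a numerical cross-check of the two sections above, the following minimal sketch builds the bin widths for a combined error C/K + τdiff and accumulates them into an INL curve. It assumes n = Mod(i+1, N), as used later in this chapter; the function names and the values N = 5 and 1% of T are illustrative assumptions, not values taken from the design.

```python
def bin_widths_pd_error(n_cells, d_pd, period=1.0):
    """Bin widths when the locked chain covers T*(1 + d_pd) instead of T.

    d_pd is the combined error C/K + tau_diff, as a fraction of the period.
    """
    regular = period / n_cells * (1.0 + d_pd)
    last = period / n_cells * (1.0 - (n_cells - 1) * d_pd)
    return [regular] * (n_cells - 1) + [last]


def cumulative_inl(widths, period=1.0):
    """Accumulated deviation from the ideal bin edges, as a fraction of T."""
    ideal = period / len(widths)
    inl, acc = [], 0.0
    for width in widths:
        acc += width - ideal
        inl.append(acc / period)
    return inl


N_CELLS, D_PD = 5, 0.01  # illustrative values only
widths = bin_widths_pd_error(N_CELLS, D_PD)
inl = cumulative_inl(widths)
# Closed form: INL(i) = (1/N) * n * D_PD with n = Mod(i + 1, N) (assumed).
closed_form = [((i + 1) % N_CELLS) * D_PD / N_CELLS for i in range(N_CELLS)]
```

The widths always add up to one period, and the accumulated error returns to zero at the last bin, which is the periodicity argument made in the text.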

6.3.3. Effects of unbalanced conditions of the cells in the extremes of the delay chain.

Cells in the extremes of the delay chain are subject to different environment conditions. For example, the last cell in the chain drives a smaller load than the internal cells, and the signal arriving at the first cell has a different rise time than the signals inside the delay chain. For simplicity we will consider that these conditions affect only the bins on the extremities of the delay chain. In this case the resulting bin delays due to an increase of δin and of δout in the delay of the first and the last bin are:

$$d_i = \left(1 + \frac{(N-1)\cdot\delta_{in} - \delta_{out}}{N}\right)\cdot\frac{T}{N}, \quad i = 0,$$

$$d_i = \left(1 - \frac{\delta_{in} + \delta_{out}}{N}\right)\cdot\frac{T}{N}, \quad 1 \le i \le N-2,$$

$$d_i = \left(1 + \frac{(N-1)\cdot\delta_{out} - \delta_{in}}{N}\right)\cdot\frac{T}{N}, \quad i = N-1.$$

The effects of unbalanced conditions of the cells in the extremes of the delay chain are separately illustrated in Figure 6 (for the first cell) and in Figure 7 (for the last cell). In both cases the larger first (or last) cell leads to a larger first (or last) bin and to smaller other bins, thus maintaining the clock periodicity of the scheme.

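[Figure: bin diagram over one clock period T. Ideal case: all bins are T/N wide. With a slower first cell: bin 0 is T/N.(1+(N-1)/N.δin) wide and the other bins are T/N.(1-1/N.δin) wide.]

Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5).

[Figure: bin diagram over one clock period T. Ideal case: all bins are T/N wide. With a slower last cell: bin N-1 is T/N.(1+(N-1)/N.δout) wide and the other bins are T/N.(1-1/N.δout) wide.]

Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5).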

The expression for the conversion integral non-linearity due to these errors is, therefore:

$$INL_{DLL}(i) = \delta_{in}\cdot\frac{n'}{N} - \delta_{out}\cdot\frac{n}{N}, \quad 0 \le i \le N-1,$$

where n' = N − 1 − i and n was previously defined.
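The bin delays and the INL expression of this section can be checked against each other with a short sketch. It assumes that δin and δout are fractions of the average cell delay, that n = Mod(i+1, N), and that the INL is expressed in units of the ideal bin T/N; the function names and numeric values are illustrative only.

```python
def bin_widths_end_cells(n_cells, delta_in, delta_out, period=1.0):
    """Bin widths when only the first and last cells are slower than the rest."""
    cell = period / n_cells
    first = cell * (1 + ((n_cells - 1) * delta_in - delta_out) / n_cells)
    inner = cell * (1 - (delta_in + delta_out) / n_cells)
    last = cell * (1 + ((n_cells - 1) * delta_out - delta_in) / n_cells)
    return [first] + [inner] * (n_cells - 2) + [last]


def inl_in_bins(widths, period=1.0):
    """Accumulated bin-edge error, in units of the ideal bin width T/N."""
    ideal = period / len(widths)
    inl, acc = [], 0.0
    for width in widths:
        acc += width - ideal
        inl.append(acc / ideal)
    return inl


N_CELLS, DELTA_IN, DELTA_OUT = 5, 0.02, 0.01  # illustrative values only
end_widths = bin_widths_end_cells(N_CELLS, DELTA_IN, DELTA_OUT)
end_inl = inl_in_bins(end_widths)
# Closed form with n = Mod(i + 1, N) (assumed) and n' = N - 1 - i.
end_closed = [
    DELTA_IN * (N_CELLS - 1 - i) / N_CELLS - DELTA_OUT * ((i + 1) % N_CELLS) / N_CELLS
    for i in range(N_CELLS)
]
```

A slower first cell produces an INL that starts high and decays along the chain, while a slower last cell produces an INL that grows in the negative direction until the final bin restores it to zero.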

6.3.4. Effects of propagation delay on the sampling signal path.

All non-linearity sources within the DLL loop have been covered, but there is also an external source that affects the linearity of a DLL based converter. In fact, due to unavoidable propagation delays in the hit sampling signal distribution, the sampling of the status of the DLL occurs at different times for different taps. The error generated by this effect is a function of the hit register topology.

This effect corresponds to the vernier interpolator configuration previously described (see Chapter 4). Considering, for example, the linear hit sampling signal distribution configuration shown in Figure 3 and a constant³ propagation delay τhit per hit register, the resulting apparent cell delay is:

$$d_i = (1 - \tau_{hit})\cdot\frac{T}{N}, \quad 0 \le i \le N-2,$$

$$d_i = \left(1 + (N-1)\cdot\tau_{hit}\right)\cdot\frac{T}{N}, \quad i = N-1.$$

This effect is illustrated in Figure 8. In this case the last bin is extended to the end of the clock period so that the full period is covered.

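[Figure: bin diagram over one clock period T. Ideal case: all bins are T/N wide. With the hit propagation delay: bins 0 to N-2 are T/N.(1-τhit) wide and bin N-1 is T/N.(1+(N-1).τhit) wide.]

Figure 8: Illustration of the effect of the propagation delay on the sampling signal path: case of the linear hit signal distribution network (N=5).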

The linearity of the conversion is given by:

$$INL_{DLL}(i) = -\tau_{hit}\cdot n, \quad 0 \le i \le N-1.$$

In order to reduce this effect, lines with smaller propagation delays can be used. Alternatively, more complex distribution configurations, such as the T-shaped distribution network, can be used. In this distribution network the hit sampling signal is distributed in two separate branches starting from the middle of the hit register row. In this way, the distance from the source to the register further away is halved, and therefore the propagation delay τhit is reduced. A positive side effect of this network is that in one of the branches the vernier interpolation results in smaller bins and in the other in larger bins. Figure 3 is repeated in Figure 9 for this configuration. This configuration reduces the integral non-linearity (see Chapter 7 for a detailed analysis of this distribution network).

³ A signal propagating along a finite RC delay line does not progress at constant speed. Typically it accelerates along the line, therefore τhit is not constant. However this convenient simplification enables a faster understanding of this effect.

For this particular distribution, the resulting effective cell delay is (assuming, for simplicity, an even number of delay cells, N):

$$d_i = (1 + \tau_{hit})\cdot\frac{T}{N}, \quad 0 \le i < \frac{N}{2},$$

$$d_i = (1 - \tau_{hit})\cdot\frac{T}{N}, \quad \frac{N}{2} \le i \le N-1.$$

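[Figure: DLL delay chain (cells d0 to dN-1, taps 0 to N-1) driven by the clock, with the phase detector F(D1,D2) fed through the paths τ1 and τ2. The hit signal enters in the middle of the hit register row (between tap N/2-1 and tap N/2) and propagates towards both ends with a delay τhit per register.]

Figure 9: The T-shaped hit signal distribution network.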

The illustration of this effect, for the T-shaped sampling signal distribution network, is shown in Figure 10. Notice the larger initial bins and the smaller final bins.

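[Figure: bin diagram over one clock period T. Ideal case: all bins are T/N wide. With the T-shaped distribution: the initial bins are T/N.(1+τhit) wide and the final bins are T/N.(1-τhit) wide.]

Figure 10: Illustration of the effect of the propagation delay on the sampling signal path: case of the T-shaped hit signal distribution network (N=5).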

The linearity of the conversion due to this error source is given by:

$$INL_{DLL} = \tau_{hit}\cdot\left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right), \quad 0 \le n < N.$$
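To make the benefit of the T-shaped network concrete, the sketch below accumulates the apparent bin widths of both distribution schemes and compares their peak INL. It is a simplified model under stated assumptions: widths and INL are in units of the ideal bin T/N, τhit is constant per register, N is even, and the values N = 8 and τhit = 1% are illustrative only.

```python
def widths_linear(n_cells, tau_hit):
    """Apparent bin widths (units of T/N) for the linear hit distribution."""
    return [1 - tau_hit] * (n_cells - 1) + [1 + (n_cells - 1) * tau_hit]


def widths_t_shaped(n_cells, tau_hit):
    """Apparent bin widths (units of T/N) for the T-shaped distribution."""
    half = n_cells // 2
    return [1 + tau_hit] * half + [1 - tau_hit] * (n_cells - half)


def cumulative_inl(widths):
    """Accumulated bin-edge error, in units of the ideal bin."""
    inl, acc = [], 0.0
    for width in widths:
        acc += width - 1.0
        inl.append(acc)
    return inl


N_CELLS, TAU_HIT = 8, 0.01  # illustrative values only
inl_linear = cumulative_inl(widths_linear(N_CELLS, TAU_HIT))
inl_t = cumulative_inl(widths_t_shaped(N_CELLS, TAU_HIT))
peak_linear = max(abs(x) for x in inl_linear)
peak_t = max(abs(x) for x in inl_t)
```

In this model the linear network accumulates error over N-1 bins before the last bin corrects it, whereas the T-shaped network accumulates over N/2 bins only, so its peak INL is roughly half as large.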

6.3.5. Overall non-linearity due to static phase error.

The effects of all static error sources can be included in a single integral non-linearity expression, where i is the bin position, 0 ≤ i < N. Making the following variable substitutions,

$$D_{in} = \delta_{in}, \quad D_{PD} = \frac{C}{K} + \tau_{diff}, \quad D_{out} = \delta_{out}, \quad D_{hit} = -\tau_{hit},$$

and

$$n = \mathrm{Mod}(i+1, N), \qquad n' = N - 1 - i,$$

the overall integral non-linearity expression is obtained:

$$INL_{DLL}(i) = D_{in}\cdot\frac{n'}{N} + \left(D_{PD} - D_{out} + D_{hit}\cdot N\right)\cdot\frac{n}{N},$$

in case the linear hit signal distribution is being used, or

$$INL_{DLL}(i) = D_{in}\cdot\frac{n'}{N} + \left(D_{PD} - D_{out}\right)\cdot\frac{n}{N} - D_{hit}\cdot\left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right),$$

if the alternative T-shaped hit signal distribution is being used.
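The combined single-DLL expression lends itself to a small helper function. The sketch below is one possible transcription, assuming the substitutions Din = δin, DPD = C/K + τdiff, Dout = δout, Dhit = −τhit, n = Mod(i+1, N) and n' = N − 1 − i given above; the function name, its signature and the test values are hypothetical.

```python
def inl_single_dll(i, n_cells, d_in=0.0, d_pd=0.0, d_out=0.0, d_hit=0.0,
                   t_shaped=False):
    """Overall INL at bin i of a single DLL for the given static error terms."""
    n = (i + 1) % n_cells
    n_prime = n_cells - 1 - i
    inl = d_in * n_prime / n_cells
    if t_shaped:
        half = n_cells / 2
        inl += (d_pd - d_out) * n / n_cells
        inl -= d_hit * (half - abs(n - half))
    else:
        inl += (d_pd - d_out + d_hit * n_cells) * n / n_cells
    return inl


N_CELLS = 8  # illustrative value only
# With only the hit delay present (tau_hit = 1%, so D_hit = -0.01):
hit_linear = [inl_single_dll(i, N_CELLS, d_hit=-0.01) for i in range(N_CELLS)]
hit_t = [inl_single_dll(i, N_CELLS, d_hit=-0.01, t_shaped=True)
         for i in range(N_CELLS)]
```

Setting all but one term to zero recovers the individual expressions of the previous sections, which is a convenient way to check a transcription of the combined formula.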

In the case of the array of DLL’s, the integral non-linearity along the delay path is added linearly. We assume that, regardless of the actual detailed hit signal distribution, each of the Timing DLL’s has a corresponding set of hit registers that are driven through a separate signal path. In this context, Figure 3 and Figure 9 correspond to one of the Timing DLL’s that make up the array. The Phase Shifting DLL is not directly sampled, therefore this effect is only visible in the Timing DLL’s.

Taking into account the staggering of the multiple Timing DLL’s we define the following variables as a function of the position of the bin i (0 ≤ i < N⋅F):

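$$m = \mathrm{Mod}(i+1, F),$$

$$n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i+1}{F}\right) - m,\; N\right),$$

$$n' = \mathrm{Mod}\left(m - \mathrm{Floor}\left(\frac{i+1}{F}\right),\; N\right).$$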

The overall integral non-linearity expression is:

nFDN

n

F

F

M

mFD

N

n

M

mFD

N

n

F

F

M

mFDiINL

hitout

PDinarray

⋅⋅+

++⋅⋅⋅−

+

+⋅⋅+

′

++⋅−⋅⋅=

1

1)(

,

if the linear hit signal distribution is being used or



−−⋅⋅−

++⋅⋅⋅−

+

+⋅⋅+

′

++⋅−⋅⋅=

2

2

1

1)(

Nn

NFD

N

n

F

F

M

mFD

N

n

M

mFD

N

n

F

F

M

mFDiINL

hitout

PDinarray

,

if the alternative T-shaped distribution is chosen.

The curves in Figure 11, Figure 12, Figure 13 and Figure 14 are intended to illustrate the shape of the INL curve resulting from the indicated sources of linearity errors. No attempt is made to compare them, since they don’t reflect an actual value. For completeness, the corresponding DNL graphs are also shown. They are directly obtained from the respective INL curve.
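Obtaining a DNL curve from an INL curve amounts to a first difference. The following sketch shows one way to do it, assuming that the INL is sampled at the end of each bin and that the scheme is periodic, so the value preceding bin 0 is the INL of the last bin; the function name and the example data are illustrative.

```python
def dnl_from_inl(inl):
    """First difference of a periodic INL curve (same units as the input)."""
    count = len(inl)
    return [inl[i] - inl[(i - 1) % count] for i in range(count)]


# Example: INL of a 5-bin chain with the linear hit distribution and
# tau_hit = 1% of a bin, i.e. INL(i) = -tau_hit * Mod(i + 1, N).
example_inl = [-0.01 * ((i + 1) % 5) for i in range(5)]
example_dnl = dnl_from_inl(example_inl)
```

Because the INL returns to its starting value after one period, the DNL values obtained this way always sum to zero: the bins that are too small are compensated by the ones that are too large.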

[Figure: two plots versus bin number (0 to 140), each comparing the ADLL with a single DLL.]

Figure 11: DNL and INL curves resulting from a phase detector’s phase error (or phase detector input path’s delay mismatch): DPD (C/K + τdiff) = 0.1% of the reference period


[Figure: two plots versus bin number (0 to 140), each comparing the ADLL with a single DLL.]

Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: Din (δin) = 1% and Dout (δout) = 1% of the average cell


[Figure: two plots versus bin number (0 to 140), each comparing the ADLL with a single DLL.]

Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): Dhit (−τhit) = 0.1% of the reference period


[Figure: two plots versus bin number (0 to 140), each comparing the ADLL with a single DLL.]

Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): Dhit (−τhit) = 0.1% of the reference period


In Figure 15, the combined effect of all these sources of non-linearity, when using the T-shaped hit signal distribution network, is shown.

[Figure: two plots versus bin number (0 to 140), each comparing the ADLL with a single DLL.]

Figure 15: DNL and INL curves resulting from the combination of the previous curves



The circuitry included in the ADLL, as well as the channel buffers, are the critical circuit blocks responsible for the performance of the converter. Their implementation will be analysed in detail, highlighting the advantages expected from the design options taken.

7.1. DLL building blocks.

7.1.1. Phase detector.

The DLL closed loop operation is, in normal conditions, only required to track variations of the delay between the two extremes of the VCDL, the frequency of the reference signal being constant. In these conditions a simple two-state phase detector can be effectively employed. This phase detector presents some advantageous characteristics, such as implementation simplicity and a ±T/2 operating range¹.

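[Figure: D-flip-flop (terminals D, Q and Qb) with the signals VCDL_in, VCDL_out, VCDL_fast and VCDL_slow, and its two-state diagram (Q=1, Q=0), where at each clock D=1 leads to Q=1 and D=0 leads to Q=0.]

Figure 1: D-flip-flop operating as a two-state phase detector.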

A D-flip-flop (D-FF) connected as in Figure 1 behaves as a two-state phase detector. It samples the signal coming out of the DLL delay chain (VCDL_out) at the rising edge of the reference clock entering the chain (VCDL_in). Therefore, the phase detector output reflects only the sign of the delay difference (referred to as the phase error), not its magnitude.

When a zero phase error situation is approached, the output of the phase detector will permanently shift from one state to the other, resulting in what is called a “bang-bang” behaviour of the closed loop it controls. Therefore the average phase error of the closed loop is zero, but its instantaneous value oscillates around this ideal value without ever settling into it. The oscillation amplitude is independent of the phase detector. It is set by other loop parameters.

¹ The standard notation to describe phase detector operation refers to phase instead of period. Following that notation the operating range would be termed ±π, instead of ±T/2. However these are equivalent notations and in the context of DLL’s and TDC’s it seems more adequate to deal with time and delay instead of frequency and phase. Some exceptions to this rule are made, for example, we use the usual nouns Phase Error and Phase Detector, instead of Delay Error and Delay Detector.

[Figure: two transfer characteristics, Vpd versus phase error φe over the range −T/2 to T/2.]

Figure 2: General and D-FF based two-state phase detector transfer characteristic.

The transfer curve of a general two-state phase detector is shown in Figure 2. The bi-stable characteristic of the D-FF based phase detector is also shown. It does not carry quantitative information about the phase error; however, when integrated over time, the general transfer curve is obtained.

Optionally, a 3-state sequential phase-frequency detector (PFD) [8],[9] could have been used, and the “bang-bang” behaviour avoided. However, 3-state PFDs are more complex devices and must be carefully designed to avoid developing a dead-band around the zero phase error. The main application of 3-state PFDs is in PLL control loops, where their ability to capture frequency error information is required. Furthermore, since the main function of a PLL is to track frequency, they can usually tolerate small phase errors. This is not the case for a DLL, whose main function is to track delay (phase). Since the amplitude of the “bang-bang” oscillation can be made arbitrarily small by setting the corresponding loop parameters, it is preferable to use the simpler 2-state phase detector configuration.

The information on the amplitude of the phase error carried by the PFD output also enables it to perform faster corrections to the VCDL, in case of severe reference clock period variations. However, this feature is not necessary in a TDC, where the reference clock is, by definition, stable. It is therefore more important to avoid the dead-band and obtain a better discrimination around zero phase error, which is easier to achieve if the 2-state phase detector is used.

D-flip-flop implementation.

In Chapter 6, the degradation of the converter linearity due to a phase error generated in the phase detector was discussed. It is therefore important to understand which phase detector characteristics generate a phase error, in order to be able to counteract them.

Chapter 7: Detailed Implementation.


Two conditions may generate a phase error in a D-FF based phase detector:

• Sampling moment shifted from the input signal’s arrival time (for example, due to unbalanced loads in internal nodes).

• Metastability conditions.

To avoid these conditions, the sampling uncertainty of the D-FF must be limited to a very narrow time window exactly centred on the arrival time of the input signal rising edge (the sampling instant). These characteristics should not change under any operating conditions and should be immune to process variations or device mismatch.

The configuration we will study is the balanced implementation of a D-FF, as described in [10] and shown in Figure 3. In this topology, all internal nodes have the same fanout and all gates have the same driving capability. A very balanced circuit is obtained and therefore no shift should be seen in the sampling instant.

The critical nodes that define the speed of the data latching are included in the SR#1 block highlighted in Figure 3. This latch should reach its final state very quickly after a change in the inputs. In these conditions, the sampling time is well resolved under any operating conditions.

[Figure: gate-level schematic of the balanced D-flip-flop, with input D, the highlighted SR#1 latch and two dummy gates.]

Figure 3: Balanced D-flip-flop topology.

Metastability will affect the phase detector operation by delaying the phase detector decision. This, in turn, will limit the amplitude of the corrections the closed control loop can perform in one reference clock period. If the delay is large enough, the decision may not be taken at all, resulting in the absence of a correct control loop decision during that period (corresponding to one clock cycle). If the metastability probability is large, a “dead band”, where the loop is unable to react to delay differences, will appear around the zero phase error point. To avoid this situation the D-FF must be able to get out of the metastable condition very quickly. Again, the critical SR#1 latch must be designed with this problem in mind [11].

This D-FF topology does not produce any hysteresis in its transfer function, since the state of the critical SR#1 latch is independent of the output state of the flip-flop. Therefore no “dead band” related to hysteresis can exist.

The D-FF implemented is a variation of the one shown in Figure 3, where maximum priority was given to the correct operation of the critical latch. For this purpose, the inherently slow 3-input SR latch was replaced by a faster 2-input latch, as shown in Figure 4. The two AND gates that had to be introduced in the decision path have identical layouts and are placed in close proximity, so that their delay matching is optimised and they are simultaneously affected by supply noise. In this way, these two gates only affect the latency of the phase detector and not its timing resolution or its static phase error.

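[Figure: gate-level schematic of the balanced D-flip-flop with input D, the SR#1 latch, the added gates and#1 and and#2, and two dummy gates.]

Figure 4: Balanced D-FF topology featuring fast SR#1 operation.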

Device matching also affects the performance of the circuit, by making the delay of identical gates different from each other. All devices have, therefore, large gate area and their layout is done following matching-minded rules [12],[13]. The width of the gate is also determined by the speed requirements. Simulations have shown that, for the technology used, a 3:1 ratio between effective gate sizes of the PMOS and the NMOS branch of the gates results in an improved phase detector accuracy and a smaller dependency on environment variations.

The accuracy of the phase detector, obtained from simulations, is better than 12ps under any environment or process conditions. In the presence of large mismatch (simulated by varying the gate length of selected devices) a maximal degradation of the accuracy to 22ps was observed.

7.1.2. Charge-pump and loop filter.

The behaviour of closed control loops built with a sequential phase detector, a charge-pump and a filter has been analysed in detail [14],[15] and numerical simulation models have been built [16]. These loops present several advantages in comparison with the conventional loops built with a combinatorial phase detector and filter. The main advantage for our application is their ability to obtain zero static phase error using a passive loop filter.

The charge-pump, together with the loop filter, converts the logic state of the phase detector into an analogue quantity that can be used to control the delay chain. Since the control loop is only required to track delay variations between the two extremes of the VCDL, the loop filter can be made of a simple capacitor. The resulting closed control loop is a first order system, therefore it is inherently stable.

The charge-pump is made of a current source and a current sink that, depending on the state of the phase detector, will either deliver a “packet” of charge to, or extract a “packet” of charge from, the loop filter capacitor. The capacitor behaves as an integrator of the charge, converting it into the control voltage for the VCDL.

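[Figure: block diagram with two Icp current branches, controlled from the phase detector, acting on the filter capacitor Cfilter; the capacitor voltage Vctrl goes to the VCDL.]

Figure 5: Charge-pump and filter capacitor block diagram.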

This configuration of charge-pump and 2-state phase detector leads to the “bang-bang” behaviour of the closed control loop. After delay lock has been achieved, the actual delay of the delay chain will be permanently oscillating around the zero phase error delay. This oscillation translates into loop jitter. Assuming an otherwise ideal loop behaviour, the amplitude ∆Vctrl of the oscillation corresponds to the charging (discharging) of the filter capacitor (Cfilter) by a constant current (Icp) during the reference period (T):

$$\Delta V_{ctrl} = \frac{I_{cp}\cdot T}{C_{filter}}.$$

Therefore, given a fixed reference period, the only way to decrease the amplitude of the oscillation and the loop jitter is to reduce the charge-pump current and/or increase the filter capacitance.
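The “bang-bang” behaviour can be reproduced with a very small behavioural model. The sketch below is an idealised illustration, not the simulation model used in this work: it tracks only the control voltage error with respect to the lock point, applies one step of fixed size per reference period, and uses only the sign of the error, as a two-state phase detector would. The function name and the numeric values are assumptions.

```python
def simulate_bang_bang(v_error_start, step, cycles):
    """Control voltage error after each reference period of an ideal loop.

    step is the voltage change produced by one charge packet
    (I_cp * T / C_filter in the notation of the text).
    """
    v_error = v_error_start
    trace = []
    for _ in range(cycles):
        # Two-state decision: only the sign of the error is available.
        v_error += -step if v_error > 0 else step
        trace.append(v_error)
    return trace


coarse = simulate_bang_bang(v_error_start=0.05, step=0.004, cycles=200)
fine = simulate_bang_bang(v_error_start=0.05, step=0.001, cycles=200)
coarse_ripple = max(abs(v) for v in coarse[-50:])
fine_ripple = max(abs(v) for v in fine[-50:])
```

In this model the residual oscillation never exceeds one step, so a smaller charge packet gives a smaller ripple, at the cost of a longer acquisition time (more cycles are needed to reach the lock point).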

The current on the two branches of the charge-pump is assumed matched. However, this is not a very critical parameter if only low amplitude ∆Vctrl oscillations are allowed, since the static phase error it may entail is very small (smaller than the amplitude of oscillation).

Charge-pump implementation.

The implementation of the charge-pump is driven by the need to accurately switch current sources into a capacitive node. In this context, the current switches are critical to the correct behaviour of the circuit. Gate signal feedthrough in these switches results in unwanted changes in the amount of charge stored in the filter capacitor. If these changes are comparable to changes due to normal loop function, the behaviour of the loop becomes unpredictable and a large static phase error may develop.

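[Figure: two simplified charge-pump schematics. The first shows the switch transistors Mswp and Mswn, with their gate-drain capacitances Cgdp and Cgdn, connecting the Icp branches to Vctrl. The second shows M:1 current mirrors with the output transistors Mop and Mon and the diode-connected transistors Mdp and Mdn driving Vctrl. The control inputs are VCDLfast and VCDLslow.]

Figure 6: Charge-pump topologies (simplified).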

In the first schematic of Figure 6 the feedthrough mechanism is illustrated. The gate-drain overlap capacitance of the switch transistors (Msw) and the filter capacitor work as a capacitive voltage divider. Therefore, when the switch of a charge-pump branch opens, Vctrl will experience a variation proportional to:

$$\Delta V_{ctrl} = \frac{C_{gd}\cdot\Delta V_g}{C_{gd} + C_{filter}}.$$

The gate voltage swing ∆Vg is, in this case, the supply voltage.

To guarantee that the Vctrl variation due to the control loop is bigger than the parasitic variation due to feedthrough, the charge-pump current should be:

$$I_{cp} \gg \frac{\Delta V_g}{T}\cdot\frac{C_{gd}\cdot C_{filter}}{C_{gd} + C_{filter}} \approx \frac{\Delta V_g\cdot C_{gd}}{T}.$$
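The feedthrough bound can be evaluated numerically to get a feeling for the orders of magnitude involved. In the sketch below, the overlap capacitance (5 fF) and the gate swing (2.5 V) are purely hypothetical values chosen for illustration; the filter capacitance and the reference period are the values quoted later in this chapter. The function names are also assumptions.

```python
def feedthrough_step(c_gd, c_filter, dv_gate):
    """Vctrl disturbance caused by the capacitive divider Cgd / Cfilter."""
    return c_gd * dv_gate / (c_gd + c_filter)


def min_pump_current(c_gd, c_filter, dv_gate, period):
    """Current at which one charge packet equals the feedthrough step."""
    return dv_gate / period * (c_gd * c_filter) / (c_gd + c_filter)


C_GD = 5e-15          # hypothetical overlap capacitance, in farad
DV_GATE = 2.5         # hypothetical full-swing gate voltage, in volt
C_FILTER = 47.7e-12   # filter capacitance quoted in this chapter, in farad
PERIOD = 12.5e-9      # reference period quoted in this chapter, in second

i_min = min_pump_current(C_GD, C_FILTER, DV_GATE, PERIOD)
i_min_approx = DV_GATE * C_GD / PERIOD
loop_step_at_i_min = i_min * PERIOD / C_FILTER
```

With these assumed values the bound is close to 1 µA, and the approximation that drops Cgd from the denominator differs from the exact value by about 0.01%, since Cfilter is roughly four orders of magnitude larger than Cgd.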

The second schematic in Figure 6 shows the circuit used to reduce the feedthrough into the Vctrl node. In this circuit the switching activity is mixed with the current mirroring. Switching is limited to moving the Vgs of the output transistors (Mo) to just below their threshold voltage, reducing ∆Vg to a small swing. Cgd is also reduced, since these transistors are made narrow to obtain low charge-pump currents.



A diode-connected transistor (Md) defines the lower limit to the ∆Vg swing. To make its Vgs voltage lower than the threshold voltage of the output transistor, it is designed very wide and short, which results in a smaller threshold voltage. The output transistor, on the other hand, is conveniently narrow and long, therefore it has a slightly higher threshold voltage, as intended.

Since the output transistors (Mo) are only lightly switched off, the sub-threshold current is not completely eliminated. However, this current is substantially smaller than the “on” current, therefore it does not affect the operation of the charge-pump.

When the charge-pump operates at low current levels, the mirror transistor operates with a Vgs only a few hundred millivolts higher than the threshold voltage, resulting in an order of magnitude reduction in the ∆Vg swing. Overall, a 20 to 50 times reduction in the minimum usable charge-pump current can be obtained using this scheme.

However, the switching speed of this charge-pump scheme is low. When a branch is released, the gate of the output transistor must be charged using the limited current available from the current mirror source. In order to increase the switching speed a current dividing mirror should be used. The switching speed limits the reduction of Icp that can be achieved, since the effective time T’ in which the charge-pump current is available to act on Vctrl is smaller than the period T.

Using this configuration, current levels as low as 200nA can be used. Taking into account the limited speed of the switch at these low current levels and other design constraints, the charge-pump implemented was designed to deliver a (programmable) current between 10µA and 100µA.

Filter capacitor.

The filter capacitor was made as an n-well isolated PMOS transistor working in accumulation mode [17]. In this mode of operation, a majority carrier channel is always present under the gate. This results in a voltage independent capacitance across the transistor gate² and, due to the ready availability of carriers, it also has good high frequency characteristics. A capacitor built this way has the back plate always tied to ground. Therefore the control voltage Vctrl is defined having the ground node as a reference.

Using a large transistor gate area, a capacitance of ~47.7pF is obtained. If minimum charge-pump current levels are used, the resulting control voltage step is 2.6mV per reference clock period (T=12,500ps).
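The quoted control voltage step follows directly from the charge delivered in one reference period. The short sketch below repeats the arithmetic for both ends of the programmable current range, using only values stated in this section (10µA to 100µA, 47.7pF, 12,500ps); the function name is an assumption.

```python
def control_voltage_step(i_cp, period, c_filter):
    """Vctrl change caused by one charge packet: I_cp * T / C_filter."""
    return i_cp * period / c_filter


C_FILTER = 47.7e-12   # farad
PERIOD = 12.5e-9      # second (T = 12,500 ps)

step_min = control_voltage_step(10e-6, PERIOD, C_FILTER)    # about 2.6 mV
step_max = control_voltage_step(100e-6, PERIOD, C_FILTER)   # about 26 mV
```

The step scales linearly with the programmed current, so the upper end of the range gives a step ten times larger than the 2.6mV obtained at the minimum setting.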

² This statement holds true for most of the applicable gate voltage range with the exception of a narrow, very low gate voltage range, where a depletion region subsists underneath the gate oxide and the gate capacitance is voltage dependent.

7.1.3. Delay cell.

The VCDL is made of a number of identical delay cells. In these cells the control voltage generated in the closed control loop is translated into a propagation delay. The ADLL is made of two different types of DLL’s: the Timing DLL, which requires a cell delay of T/N = 357.1ps, and the Phase Shifting DLL, which requires a delay of T/M = 446.4ps per cell. These DLL’s are built using the same building blocks (but a different number of delay cells), therefore the delay cell operating range must cover the two distinct operating points, in any conditions. Using four Timing DLL’s in an ADLL architecture, a time interpolation F=4 times better than the simple Timing DLL is obtained, leading to stringent matching requirements for the delay cells.
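The relation between the two cell delays and the interpolated bin size can be checked with a few lines of arithmetic. In the sketch below, the cell counts N and M are not quoted in this section; they are inferred by rounding T divided by each cell delay, using T = 12,500ps from the previous section, and should be read as an assumption. The observation that the difference between the two cell delays equals the interpolated bin is likewise a consistency check on the quoted numbers, not a statement taken from the text.

```python
T_PS = 12_500.0            # reference period, in picoseconds
TIMING_CELL_PS = 357.1     # Timing DLL cell delay, T/N
PHASE_CELL_PS = 446.4      # Phase Shifting DLL cell delay, T/M
F = 4                      # number of Timing DLLs in the array

n_cells = round(T_PS / TIMING_CELL_PS)   # inferred N
m_cells = round(T_PS / PHASE_CELL_PS)    # inferred M

bin_ps = T_PS / (n_cells * F)                      # interpolated bin size
stagger_ps = T_PS / m_cells - T_PS / n_cells       # difference of cell delays
```

With these values the inferred counts are N = 35 and M = 28, and both expressions give a bin of about 89.3ps, which matches the 140 bins per period shown in the linearity plots of Chapter 6.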

The ADLL architecture uses a large number of fast identical cells. Furthermore, the delay matching required between these cells leads to the specification of large sized devices, which results in high gate capacitance. To drive these high loads at the necessary speed, large power dissipation is required. It is therefore important to choose a cell structure that reduces the dissipation, for given speed and matching requirements.

The delay of a cell is sensitive to temperature and supply voltage variations. It also depends on the process parameters. The correct operation of the DLL closed control loop therefore requires that a sufficient delay range is available to cover any operating conditions.

Choice of cell structure.

In summary, the choice of delay cell structure must conform to the following criteria:

• Power dissipation.

• Noise sensitivity.

• Device matching.

• Cell delay control range.

Two structures were compared having in mind the particular operation of a DLL. These structures were the differential cell using symmetric loads, as developed in [18], and the single-ended cell, based on a current-starved inverter structure.

The sudden supply current variations due to the switching activity of the single-ended delay cell structures entail noise in the power supply network. Supply noise translates into changes in the instantaneous decision threshold of each inverter and therefore in the time characteristics of the other cells in the delay line. Differential delay cells enjoy an apparent advantage in this respect, since their large common mode rejection ratio (CMRR) ensures good supply noise immunity. Also their constant power dissipation generates less supply noise. On the other hand, the constant tail current used in the differential delay cell significantly increases the power dissipation of the ADLL structure.

One important characteristic of the operation of a locked DLL is that all switching activity in the delay line occurs evenly spread along the reference period, as illustrated in Figure 7. As a consequence, the instantaneous current requirements are averaged over time and, therefore, the inductive supply voltage variations are strongly reduced. In these conditions, the delay cells that make up the DLL are not adversely affected by the switching activity and a careful distribution of the power supply, separating the DLL from any noisy digital circuitry, will suffice to obtain a good noise performance. Simple and more power-conservative single-ended delay cells are, therefore, a viable alternative to differential logic.

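[Figure: DLL block diagram (clock, delay line taps 0 to N-1, phase detector, charge pump) with the voltage waveform at each tap i, whose rising edges are spaced T/N apart over the period T, and the current drawn from the supply, which stays around Iave.]

Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption.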

In order to obtain a high CMRR [18], a differential amplifier must have a linear resistive load in each branch. Furthermore, the impedance of the tail current mirror must be high. In the delay cell shown in Figure 8, a variable linear load is obtained using the symmetric load structure. If correctly biased, this structure guarantees a first order linearity of a high impedance load around the half-swing output voltage. Automatic bias is derived from the control voltage using the self-biasing circuitry (also shown).

Delay control is obtained by variation of the load impedance. Simultaneous variation of the tail current ensures that the symmetrical load remains linear throughout the range of operation.

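[Figure: schematic of the differential delay cell (inputs in and inb, outputs out and outb) with its self-biasing circuit driven by Vctrl, replicated for N cells.]

Figure 8: The self-biased differential delay cell (from [18]).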

Single-ended architectures traditionally rely on the current starvation of two series CMOS inverters (Figure 9). Current starvation is usually performed on both branches (NMOS and PMOS) of the inverters in order to guarantee a perfect symmetry of operation. However, this is not a limiting requirement, since the two inverters in series already guarantee the correct operation of the delay cell. The cell delay is defined by the amount of current available to charge the load at the output of each inverter. The matching characteristics of the current-starving transistors are, therefore, critical to ensure the matching of the cell delay. These transistors must have large gate areas.

The matching characteristics of the switching transistors are not critical, since they are sized in such a way that they don’t limit the current available to charge the output load.

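[Figure: schematic of the current-starved inverter delay cell (input in, output out) with its bias derived from Vctrl, replicated for N cells.]

Figure 9: The current-starved inverter delay cell (simplified version).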

The delay cells are isolated from the hit registers by a tap buffer. In the case of current-starved inverter based cells, it is recommended to also implement a dummy buffer at the output of the first inverter, in order to guarantee symmetry of the propagation delay of the rising and the falling edge.



These two delay cell structures were analysed in detail to verify their power dissipation and noise immunity. Simulations were used extensively, in order to accurately capture the delay variations due to noise. Only power supply noise was considered in this study, since it was found to be the dominant effect. Other noise sources, such as thermal noise, are completely hidden by supply noise.

The procedure followed in this study was to simulate the two VCDL’s (one for each of the structures) with a square signal of a given amplitude modulated into the power supply voltage. The phase of the square noise signal was made to vary in relation to the phase of the signal propagating within the delay line. In this way it is possible to identify a time window where the delay cell is sensitive to supply noise and also the maximum delay shift.

The same procedure was also used to analyse the delay sensitivity to noise in the control node. Noise can couple into this node via two different paths, the substrate and capacitive coupling with the switching nodes. In a locked DLL, there are always two opposite edges of the signal propagating inside the delay line, therefore their opposite effects should keep the control node balanced. However, since the sensitivity of this node is high, it is important to minimise any coupling into it.

The resulting supply noise delay sensitivity graphs are shown in Figure 10, where all delay cells are tuned for a 390ps delay. A window of increased sensitivity, corresponding to the cell switching moment (time=0ns), can be identified. A summary of the sensitivity of each structure, within the sensitive window, is given in Table 1. The average power dissipation obtained when the cells are biased to operate with the required delay is also shown. The single-ended structure also shows (time<0ns) a noticeable delay variation due to slow (or DC) changes in supply. However, it should be noticed that the amplitude of these slow variations depends on the control voltage applied and they are effectively countered by the closed control loop.

[Figure: two plots of cell delay variation versus time (−1.5ns to 0.5ns, vertical axis from −6 to 10), one for the differential delay cell and one for the current-starved delay cell.]

Figure 10: Cell delay variation due to a 100mV supply voltage step, respectively for the differential and the current-starved inverter structure.

The differential structure needs 5.6 times more current than the single-ended CMOS inverter structure, for the same propagation delay.

                               Step noise sensitivity      Power dissip.
                               Supply        Control       (average/cell)
amplitude                      100mV         20mV
Symmetric load differential    3ps           11ps          4.2mW
Current-starved inverter       5ps           15ps          0.74mW

Table 1: Summary of noise sensitivity and power dissipation analysis.

Offset and gain selection.

Apart from gate area, the matching characteristics of a device also depend on its operating point. As the gate voltage approaches the threshold voltage (Vth) and the operation of the device moves closer to weak inversion, its matching characteristics are severely degraded [6],[19]. Therefore, the operating point of the devices that make up the delay cell should be kept reasonably far from Vth under all conditions.

However, depending on the process parameters and on the specific conditions under which the cell is being used, the closed control loop may force the current-starving devices to operate in disadvantageous matching conditions. The current-starved inverter structure was changed in order to force the cell to operate in optimal matching conditions under any circumstances.

[Figure 11 plot: delay versus Vctrl for the original and the partitioned transfer curves, with the operating points Vo and Vp marked.]

Figure 11: Simplified representation of the delay range partition.

The principle of operation of this cell is to divide the delay range into small and partially overlapping ranges, as shown in Figure 11. These delay ranges are wide enough to enable the DLL to track delay variations due to changes in the environment conditions that may occur during operation. The selection of the operating range is performed at start-up. It is a function of the device matching, the delay tracking coverage and the particular operating conditions found. To enable the automation of the range selection algorithm, the range partition is made such that in any conditions lock can be achieved in at least three ranges. By selecting the appropriate delay range, the cell can be made to operate at a point Vp further away from the threshold voltage of the current-starving transistors than would be the case in the original cell (point Vo).

Another advantage gained from partitioning the operation range is the reduced cell gain (the slope of the cell transfer curve in s/V). Therefore the forward gain of the control loop is smaller and a finer adjustment of the delay is possible. In the “bang-bang” configuration used, it translates into a smaller amplitude of the periodic delay oscillation. Alternatively, the filter capacitor can be made smaller without degrading the closed loop performance. The sensitivity to noise in the control node is also reduced.

The proposed cell topology is shown in Figure 12. The selection of the operating range is done using the offset signal. The offset control is implemented in the NMOS and PMOS branches of the inverter. It generates a fixed delay offset in the transfer curve.

To improve the cell flexibility, the gain of the current-starving transistor connected to the loop control node can be changed, using the slope signal. It is, therefore, possible to increase the tracking coverage (range length) of each range, if it is necessary for a specific application. The slope control is only implemented in the NMOS branch of the inverter.

The offset selection signal is obtained from the digital-to-analogue conversion of a digital control signal and the slope selection is performed digitally, therefore they correspond to discrete settings. Figure 13 shows the actual delay ranges. Depending on the offset and slope selection, the cell gain will be different. A method for automatic selection of the range will be described in Section 7.1.6.

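[Figure 12 schematic: delay line of N cells (2 inverters per cell) between in and out, with the control signals Vctrl, offsetP, offsetN and slope<0:1>; an inset plots delay versus Vctrl, annotated with the effect of offset and slope.]

Figure 12: The selectable-range current-starved inverter cell.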

The simulation results shown in Figure 13 indicate that the maximum delay cell gain for a given range varies from 50ps/V to 713ps/V depending on the selection of offset and slope.

[Figure 13 plot: cell delay (ps) versus control voltage (V).]

Figure 13: The selectable delay ranges (simulation).

The same noise sensitivity analysis was also performed for this cell. The results, for the sensitivity window (time=0ns), are given in Table 2. When compared to the differential cell structure, substantial power savings (3.3 times) can be obtained using this cell, if a similar increase in supply noise sensitivity is accepted. In relation to the simple current-starved inverter, better matching and closed loop characteristics can be obtained at the expense of increased power dissipation.

                    Step noise sensitivity      Power dissip.
                    Supply        Control       (average/cell)
amplitude           100mV         20mV
Range partition     8ps           3ps           1.29mW

Table 2: Summary of noise sensitivity and power dissipation analysis for the proposed cell.

In summary, the advantages of such a delay cell are:

• Lower power dissipation.

• Smaller device matching sensitivity.

• Variable cell gain.

• Increased immunity to noise in the control node.

7.1.4. Delay chain.

The delay cell is a part of a chain of cells whose overall delay is the clock period. To achieve maximum delay matching between cells, all cells should have the same physical and electrical environment. This consideration is especially true for the cells in the extremities of the delay chain. Physically they have no cell on one of their sides, therefore their matching is worse [6],[7]. Electrically the last cell does not have to drive the load due to the input of the next cell and the first cell is driven with a signal that does not have the same timing characteristics (namely slew rate) as the other cells.



In order to equalise the environment of all the cells, additional dummy delay cells are implemented at both extremes of the delay line. The purpose of these cells is to force the environment of all cells to be the same, and therefore improve their delay matching. They have no other timing functionality.

7.1.5. Closed control loop.

The implementation of the delay chain, the phase detector, the charge-pump and the filter capacitor has been discussed. Together they make up the closed control loop. The layout of the complete DLL should follow conservative layout rules, with special care being given to the power supply distribution network and to the transport of the signals carrying phase information to the phase detector.

If the propagation delay of the two feedback signals going to the phase detector is not the same, then the delay difference ∆tpd translates into a closed loop static phase error, with consequences similar to those of a phase error generated in the phase detector. The origin of this delay error is depicted in Figure 14.

Since the delay chain of the Timing DLL’s is physically long (~2mm, in this prototype), the propagation delay of the feedback signals is considerable, and a large ∆tpd may arise. This situation was analysed in detail [20] to derive the topology that minimises the delay difference while not imposing a heavy area penalty on the design. The propagation delay of the two transmission lines was made as small as possible by their careful sizing. Also the load at the output of each of the drivers was equalised to keep their slew rate similar. In this way it was possible to keep the delay error under 20ps. The Phase Shifting DLL has a shorter delay chain, therefore the delay difference is even smaller.

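[Figure 14 schematic: delay chain with dummy cells at both ends, clkref, phase detector & charge pump, filter capacitor C, and the feedback delay mismatch ∆tpd.]

Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals.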

The variation of the delay of the chain within a clock period can be estimated from the following equation, where Kcell is the gain of each delay cell:

$$\Delta t_{DLL}(T) = \Delta V_{ctrl} \cdot K_{cell} \cdot N = \frac{I_{cp} \cdot T}{C_{filter}} \cdot K_{cell} \cdot N .$$

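Assuming the minimum charge-pump current level is being used and the cell delay range with minimum gain is selected, the delay variation of the Timing DLL is:

$$\Delta t_{DLL}(12{,}500\,\mathrm{ps}) = \frac{10\,\mu\mathrm{A} \cdot 12{,}500\,\mathrm{ps}}{47.7\,\mathrm{pF}} \cdot 50\,\mathrm{ps/V} \cdot 35 = 4.6\,\mathrm{ps}$$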

The “bang-bang” oscillation amplitude is half this variation (~2.5ps). If, on the other hand, the delay range with the maximum gain is selected, ∆tDLL(T) may become as big as 65.4ps, resulting in an amplitude of oscillation of ~33ps.
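The two figures quoted above can be checked numerically. The following is a minimal sketch of the delay variation estimate, using the values read from the numerical example (10µA charge-pump current, 12,500ps period, 47.7pF filter capacitor, 35 cells) and the two extreme cell gains of 50ps/V and 713ps/V. The function and variable names are illustrative only.

```python
def dll_delay_variation_ps(i_cp_a, period_s, c_filter_f, k_cell_ps_per_v, n_cells):
    """Delay variation of the chain over one clock period, in ps.

    The charge pump moves the control voltage by I_cp * T / C_filter per
    period; each of the n_cells cells converts that into K_cell ps/V.
    """
    delta_v_ctrl = i_cp_a * period_s / c_filter_f  # volts per clock period
    return delta_v_ctrl * k_cell_ps_per_v * n_cells


# Values taken from the numerical example in the text.
I_CP = 10e-6        # minimum charge-pump current (A)
PERIOD = 12.5e-9    # reference clock period (s)
C_FILTER = 47.7e-12 # filter capacitor (F)
N_CELLS = 35        # delay cells per Timing DLL

variation_min_gain = dll_delay_variation_ps(I_CP, PERIOD, C_FILTER, 50.0, N_CELLS)
variation_max_gain = dll_delay_variation_ps(I_CP, PERIOD, C_FILTER, 713.0, N_CELLS)

print(f"minimum gain range: {variation_min_gain:.1f} ps")
print(f"maximum gain range: {variation_max_gain:.1f} ps")
```

Halving either result gives the corresponding “bang-bang” oscillation amplitude.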

7.1.6. Initialisation procedure.

The loop initialisation is the procedure by which the loop acquires initial lock to the reference period. If the loop natural delay is close enough to the reference period, lock is acquired without any external help. Since this cannot be guaranteed in all circumstances, ways to pull the loop to within its locking range must be implemented.

In the case of the loop architecture using the delay range partitioning, the best operating range must also be selected.

Achieving lock.

The transfer function of the phase detector has a periodicity of T, which means that it is unable to distinguish signals whose delay is a multiple of a period T. Therefore, it may try to lock the DLL into a state where the VCDL delay is a multiple of T. An initialisation procedure must be used to force the closed loop to lock to the correct delay.

One way to resolve this ambiguity is to initialise the VCDL with a delay that is known to be smaller than the reference period T. In this situation it is possible to qualify (the correctness of) the error information generated in the phase detector. Starting from this point, regardless of the phase information generated by the phase detector, the loop is constrained to slowly increase the delay of the VCDL until the phase detector is within its locking range (±T/2). This range can be identified by the generation of the correct error information by the phase detector, when it recognises that the VCDL delay is too short. At this point the loop is released to proceed with the locking acquisition.

Since the forward open loop gain is small, the lock acquisition is a slow procedure. One way to improve the loop initialisation speed is to increase the charge-pump current levels before lock is achieved. In this way the lock acquisition time can be decreased without compromising the dynamic behaviour of the loop.

Range selection.

The range selection is an iterative procedure. In a first step, the tracking range width necessary for the application is selected using the slope signal. Typically the smallest width is selected, because it results in the minimum forward open loop gain. However, other range widths can be selected if a wider tracking range is desired. The second step corresponds to the actual range selection. This step can be automated and included in the loop locking procedure. It uses the offset signal.

The range selection is performed by sequentially scanning the ranges for lock, starting with the fastest range (smallest offset). After having identified the ranges where lock can be achieved, the middle range³ is selected, because it corresponds to an operating point in the middle of the respective range, leaving a wide delay tracking margin. This property is depicted in Figure 15, where the viable initialisation regions within each range are identified with a heavier line. Note that these viable regions correspond to the initial locking range. They are a small part of the full range (thinner line) that is available for tracking of environment variations, after the initialisation has been completed.

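[Figure 15 plot: delay versus Vctrl for the partitioned ranges, with the viable locking regions drawn with a heavier line.]

Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions.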
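The selection rule described above, together with the special cases of footnote 3, can be sketched as a small function. This is one possible reading and not the implemented logic: the function name is hypothetical, “extreme range” is interpreted as the first or last range of the partition, and the choice made for an even number of locked ranges is an assumption.

```python
def select_range(lock_ok):
    """Pick the operating range from the per-range lock results.

    lock_ok[i] is True when lock was achieved in range i. Index 0 is the
    fastest range (smallest offset), which is scanned first.
    """
    locked = [i for i, ok in enumerate(lock_ok) if ok]
    if not locked:
        raise ValueError("lock was not achieved in any range")
    if len(locked) == 1:
        # Only one locking range: the selection is evident.
        return locked[0]
    if len(locked) == 2:
        # Footnote 3: choose the extreme range of the partition.
        last = len(lock_ok) - 1
        if locked[0] == 0:
            return 0
        if locked[-1] == last:
            return last
        # Assumption: neither range is at an extreme, keep the first one.
        return locked[0]
    # Three or more locking ranges: take the middle one.
    # Assumption: for an even count the upper of the two middle ranges is used.
    return locked[len(locked) // 2]


print(select_range([False, True, True, True, False]))
```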

7.2. The ADLL.

Fine time interpolation is obtained by accurately phase shifting each of the Timing DLL’s by a fraction of their cell delay. The Phase Shifting DLL is used for this purpose.

The ADLL taps result from the distribution of the phase shifted Timing DLL taps in accordance with the arrangement in Figure 16, where each rectangle represents the size of a DLL bin. The shaded bins represent a copy of the actual bin introduced to make the time interpolation on the extremes of the Timing DLL’s clearer. Due to the clock periodicity, the copy and the original bin occupy exactly the same time interval.

3 In extreme conditions, corresponding to the extreme ranges, lock may only be obtained for one or two ranges (see Figure 15). If only one locking range is identified, range selection is evident. If two locking ranges are identified, the extreme range should be chosen, because it results in the widest delay tracking margin.

An ADLL bin is defined from the difference between two taps in consecutive Timing DLL’s. This distribution of bins highlights some of the potential sources of non-linearity inherent to the architecture:

• Some bins are defined by taps in opposite extremes of consecutive Timing DLL’s (see, for example, bin 5 in Figure 16). Potential phase errors in any DLL will accumulate in these bins, resulting in large non-linearity.

• There is a potential F (=4) bin periodicity in the linearity error due to the folding of the tap distribution. Non-linearity of any DLL will increase this error (see, for example, bin 23 in Figure 16).

• There is another potential F+1 (=5) periodicity in linearity error which corresponds to the spacing between two taps driven directly by the Phase Shifting DLL. Non-linearity of this DLL determines this error.

The non-linearity generated by these errors is limited by reduction of any source of phase error and cell mismatch caused by the DLL building blocks. Coupling between DLL’s can also be a source of conversion errors. It can be reduced by proper electrical isolation of individual DLL’s, using careful supply distribution and providing guard-rings to isolate them from capacitive and substrate noise coupling.

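[Figure 16 diagram: distribution of the 140 ADLL taps over one period T. The Phase Shifting DLL (PS-DLL) taps ps0 to ps4 are spaced by T/28 = 5·∆T. The taps of each Timing DLL (T-DLL 0 to T-DLL 3) are spaced by T/35 = 4·∆T: T-DLL 0 carries taps 0, 4, 8, 12, 16, ..., 136; T-DLL 1 taps 5, 9, 13, 17, ...; T-DLL 2 taps 10, 14, 18, ...; T-DLL 3 taps 15, 19, 23, .... The resulting ADLL bin is T/140 = ∆T. Bins 5 and 23 are marked.]

Figure 16: The ADLL tap distribution arrangement.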
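The folding of the tap distribution can be made concrete with a short enumeration. The sketch below assumes an idealised array in which cell c of Timing DLL d sits at position (5·d + 4·c) mod 140, in units of ∆T, which is how the spacings T/28 = 5·∆T and T/35 = 4·∆T of Figure 16 are read here. The names are illustrative, and implementation details such as the physical placement of tap 0 are ignored.

```python
N_DLL = 4                  # Timing DLLs in the array (interpolation factor F)
N_CELLS = 35               # delay cells per Timing DLL
N_BINS = N_DLL * N_CELLS   # 140 ADLL bins per reference period


def tap_position(dll, cell):
    """Position of a tap in units of the ADLL bin (idealised model)."""
    return (5 * dll + 4 * cell) % N_BINS


def build_tap_map():
    """Map each ADLL tap number to the (dll, cell) pair that generates it."""
    tap_map = {}
    for dll in range(N_DLL):
        for cell in range(N_CELLS):
            tap_map[tap_position(dll, cell)] = (dll, cell)
    return tap_map


def cell_separation(tap_map):
    """For each bin k, distance along the chains between taps k and k+1."""
    separation = {}
    for k in range(N_BINS):
        _, cell_a = tap_map[k]
        _, cell_b = tap_map[(k + 1) % N_BINS]
        separation[k] = abs(cell_b - cell_a)
    return separation


tap_map = build_tap_map()
separation = cell_separation(tap_map)
worst_bins = sorted(k for k, s in separation.items() if s == N_CELLS - 1)
print("taps bounding bin 5:", tap_map[5], tap_map[6])
print("bins bounded by first and last cells:", worst_bins)
```

In this model every one of the 140 positions is produced exactly once, and bin 5 is bounded by the first cell of one Timing DLL and the last cell of the next, which is the situation described in the first item of the list above.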

7.3. Channel memory.

The channel memory is made of a two-word deep pipeline. In order to reduce the hit rejection rate to acceptable levels, an asynchronous state machine controls the pipeline. This state machine generates the latching signals (store) for the two pipeline levels and controls the interface with the subsequent logic blocks [3]. The functional diagram is shown in Figure 17.



[Figure 17 diagram: channel memory controller, with the signals store reg. level #1, store reg. level #2, data available, write, write enable, read, clear and rst, and two ∆t delay elements.]

Figure 17: Functional diagram of the channel memory controller [3].

When the store signal is asserted, the data is stored in the level #1 register. If the level #2 register is free, the data is moved to this register, where it becomes available to be passed on to the digital processing unit. A data available flag is asserted to signal the existence of data in the channel memory. If the two register levels are full, further hits will be lost, until memory space becomes available again. The channel memory was designed to store data corresponding to two consecutive hits separated by at least 6ns.
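The behaviour of the two-level pipeline can be illustrated with a purely functional model. This is a simplified sketch: it ignores all timing (including the 6ns minimum hit separation) and the asynchronous nature of the controller, and the class and method names are hypothetical.

```python
class ChannelMemory:
    """Behavioural model of the two-word deep channel memory."""

    def __init__(self):
        self.level1 = None   # register written when store is asserted
        self.level2 = None   # register read by the processing logic
        self.lost_hits = 0   # hits rejected because both levels were full

    @property
    def data_available(self):
        """Flag signalling that data can be read out."""
        return self.level2 is not None

    def _advance(self):
        # Move data from level #1 to level #2 when level #2 is free.
        if self.level2 is None and self.level1 is not None:
            self.level2, self.level1 = self.level1, None

    def store(self, tap_data):
        """Capture a hit; returns False when the hit is lost."""
        if self.level1 is not None:
            self.lost_hits += 1
            return False
        self.level1 = tap_data
        self._advance()
        return True

    def read(self):
        """Pass the oldest word to the processing unit, or None if empty."""
        if self.level2 is None:
            return None
        data, self.level2 = self.level2, None
        self._advance()
        return data


memory = ChannelMemory()
results = [memory.store("hit A"), memory.store("hit B"), memory.store("hit C")]
first = memory.read()
second = memory.read()
third = memory.read()
print(results, first, second, third, memory.lost_hits)
```

In this model the third hit is rejected because both register levels are occupied, and the two stored hits are read out in arrival order.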

The hit register itself is required to capture the data present at the DLL taps at the instant that the store signal is asserted. Mismatch between the hit registers can generate a spread in the register acquisition time, which translates into an increased differential non-linearity of the converter. The effects of tap register mismatch are not distinguishable from the effects of delay cell mismatch.

The reduction of acquisition time mismatch can be done in two different ways:

• Increase of device matching by increasing the gate area of critical devices.

• Increase of acquisition speed by increasing the transconductance of critical devices so that delay variations from register to register are smaller.

Since the time critical data sampling is performed only on the level #1 register, only the performance of this register is critical. The gate level diagram of a single bit of the hit register is shown in Figure 18.

[Figure 18 diagram: one bit of the hit register, with the signals store reg. level #1, store reg. level #2, tap data, enable and output (inverted).]

Figure 18: The two-level hit register (1 bit).

The load on the ADLL tap output node must be kept low, in order to reduce the power necessary to drive it. It is therefore important to make the register’s input inverter smaller. The adverse effects of the increased device mismatch are limited by keeping the propagation delay of this gate low. On the other hand, the back-to-back inverters that make up the memory can be made bigger, so that their matching properties and their driving characteristics are good.

In order to achieve a good accuracy of the acquisition time, the level #1 register is transparent until the acquisition of a hit. Since the tap outputs are switching at the reference clock frequency, it is necessary to limit the activity of the register by blocking the level #2 register until data has been acquired in the previous level. For the same reasons tri-statable gates are used, instead of pass-gates. This approach leads to slower signal propagation, but it halves the number of switching devices when the circuit is idle. A corresponding reduction in supply noise and power dissipation is obtained.

In this application, the data signal is changing asynchronously to the store signal, therefore there is a finite probability that metastability conditions will occur. However, it should be noted that this condition only affects one register, where the transitions on the data and store signals occur “simultaneously”. Whichever logic level the register ends up resolving leads only to a measurement error that is at most equal to the metastability window width. Since this window is very small, the measurement error is also small.

In order to synchronise the clkro synchronous read-out and processing control logic with the asynchronous tap register control state machine, and to prevent metastability from disturbing the correct circuit functionality, two-stage synchronisers [21] were implemented in the signal paths interfacing the two domains (see Figure 19). Using two-stage synchronisers greatly reduces the probability of the synchroniser output entering a metastable condition. In addition, the latency introduced between the moment data is available in the tap register and the moment it can be passed on to the processing logic is sufficient to resolve any metastability that may have occurred in the tap registers.

[Figure 19 diagram: two D flip-flops in series, clocked by clkro, with input signal and output signal_sync.]

Figure 19: Two-stage synchroniser using D flip-flops.

When a measurement is performed, the status of the 140 taps that make up the ADLL must be accurately captured. The effect of this activity on the accuracy of the measurement is limited by the fact that it affects all measurements performed in a given channel in the same way; it only contributes an offset to the measurement.



Noise generated from activity in a neighbouring channel, due to its random nature, may disturb the other channels, generating crosstalk. To limit channel to channel crosstalk and obtain an acceptable performance out of these registers, the supply and control distribution must be carefully designed.

7.3.1. The store sampling signal distribution.

The organisation of the individual tap registers follows naturally the organisation of the ADLL. Therefore four rows of 35 two-bit deep tap registers make up the channel memory. Four similar registers are appended to each of these rows, to store the coarse counter results (half of each counter word width per row).

These register rows are quite long (>2mm), therefore the store signals arrive at the individual registers with a time difference proportional to the propagation delay of the line that distributes them. Two distribution configurations are shown in Figure 20. The linear distribution configuration corresponds to the vernier time interpolation scheme described in Chapter 4. The resulting bin size is the difference between the delay of the bin defined by two consecutive taps and the difference between the arrival times of the store signal at the corresponding tap registers. This error accumulates in the bin that is defined by taps in both extremes of the row (which correspond to registers in the opposite extremities of the register row). This error is equivalent to a static phase error in the Timing DLL’s.

Alternatively, the T shaped distribution configuration can be used. In this case the error distribution is somewhat more complex. Depending on the branch of the distribution T network, the bins become larger or smaller than the corresponding delay cell (see Chapter 6 for a detailed analysis).

[Figure 20 diagrams: an N-bit register row (registers 0, 1, 2, ..., N/2-1, N/2, ..., N-2, N-1) with the store signal from the control state machine distributed either linearly (linear distribution) or through a T network (T distribution).]

Figure 20: Alternative control signal distribution configurations within a channel memory row.

However, two advantages are obtained from this configuration. The first is that since each branch of the T is half as long as the complete row (and is loaded by half the number of cells), the propagation delay along the branch is smaller, resulting in a smaller difference between the store signal arrival time at each register.

The second advantage is that the accumulation of the error is only relative to one branch of the T, corresponding to half the number of registers in the row, therefore the accumulated error is smaller than in the linear configuration.

In Figure 21, a comparison between the integrated error obtained when using the two configurations is shown. The actual register row configurations are simulated, including the lumped loads connected to the lines due to the registers. They also include the registers needed to store the coarse time measure, which explains the imbalance of the two branches of the T configuration.

[Figure 21 plot: integrated error (ps) versus register (0 to 35), for the Linear and T-shape configurations.]

Figure 21: Integrated error for the two proposed distribution configurations (simulation).

Using the T shaped configuration, it is possible to obtain a 6-times reduction of the integrated error, as shown in Figure 21. The non-linearity of the ADLL due to the propagation delay of the store signal is improved correspondingly.
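A rough feel for why the T shaped distribution helps can be obtained from a first-order RC ladder model. The sketch below is not the simulation behind Figure 21: it uses the Elmore delay approximation (introduced here only for illustration), assumes an ideal driver, a balanced T and identical wire resistance and register load per segment, and all numerical values are hypothetical unit values.

```python
def elmore_delays(n_segments, r_seg, c_seg):
    """Elmore delay from an ideal driver to each node of an RC ladder.

    Each segment has series resistance r_seg and a lumped capacitance
    c_seg (wire plus register load) at its far end.
    """
    delays = []
    total = 0.0
    for k in range(1, n_segments + 1):
        downstream_c = (n_segments - k + 1) * c_seg  # capacitance charged through segment k
        total += r_seg * downstream_c
        delays.append(total)
    return delays


def linear_skew(n_registers, r_seg, c_seg):
    """Arrival-time spread when the line is driven from one end."""
    delays = elmore_delays(n_registers, r_seg, c_seg)
    return delays[-1] - delays[0]


def t_skew(n_registers, r_seg, c_seg):
    """Arrival-time spread along one branch when driven from the middle.

    With an ideal driver the two branches do not load each other, so one
    branch of n_registers // 2 segments is representative.
    """
    delays = elmore_delays(n_registers // 2, r_seg, c_seg)
    return delays[-1] - delays[0]


N_REGISTERS = 36  # hypothetical even row length
skew_linear = linear_skew(N_REGISTERS, 1.0, 1.0)
skew_t = t_skew(N_REGISTERS, 1.0, 1.0)
print(f"linear / T skew ratio: {skew_linear / skew_t:.2f}")
```

Even this idealised model gives a reduction by a factor of about four, because the delay of a distributed RC line grows roughly with the square of its length. The simulated figure differs since it includes the actual loads and the unbalanced branches.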


The performance of the demonstrator of the ADLL architecture described in this part of the dissertation is summarised in this Chapter. Only the relevant timing characteristics will be discussed here; a detailed test report is included in the HRTDC users manual [4]. The test bench used to characterise the converter is explained in Appendix A.

8.1. Delay cell range selection and charge-pump current level.

Selection of the delay cell working range is an important feature of the architecture, because it allows adapting the cells to the specific operating environment. The initialisation procedure was tried for every working range, using an 80MHz reference clock. The ranges for which lock was obtained are shown in Table 1¹.

working range, offset: 0 1 2 3 4
working range, slope (for each offset): 2 1 0
Phase Shifting DLL: ok OK ok ok ok ok
Timing DLL: ok ok ok ok ok ok ok OK ok ok ok ok

Table 1: Locking status for each working range, after the initialisation procedure.

Following the range selection algorithm explained in Chapter 7, the ranges highlighted in Table 1 are chosen. This selection was used throughout the tests performed.

The smallest possible current level was selected for the charge-pump, since it results in the smallest closed loop jitter. The cycle to cycle jitter measured at the output of the last delay cell in the fourth Timing DLL is σjitter=15.6ps, for the selected range. It does not vary substantially with the current level of the charge-pump (σjitter=19.4ps at maximum settings), confirming that the charge-pump operation does not adversely affect the performance of the converter.

1 Each offset and slope selection pair corresponds to a working range. Offset selection is divided into five options, ranging from 0 (maximum range offset) to 4 (minimum range offset). Slope selection is divided into 3 options, from 0 (minimum range slope) to 2 (maximum range slope).

8.2. Converter linearity.

The measurement of the converter’s linearity required the collection of 840,000 random hits generated from an external pulse generator. The results obtained with this Code Density Test (CDT) are, with a 98% confidence level (1-α=0.98, therefore α=0.02), within a tolerance of 3% (DNL) and 17.7% (INL) of the actual values (respectively β=0.03 and β=0.17). If individual DLL’s are evaluated using the same data, tolerances of 1.5% and 4.4% are obtained, respectively for the DNL and INL, with the same confidence level (see Appendix D for details on how to measure the tolerance and confidence level of the test results).
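For reference, the way a code density test turns a hit histogram into linearity figures can be sketched as follows. This is a generic formulation and not the analysis code used for these measurements: it assumes uniformly distributed random hits, takes the DNL of a bin as its relative deviation from the mean count, and takes the INL as the running sum of the DNL. The function name is hypothetical.

```python
def code_density_linearity(hist):
    """Return (dnl, inl) in LSB from a histogram of hits per bin.

    With uniformly distributed hits the count in a bin is proportional to
    the bin width, so count / mean_count - 1 is the DNL of that bin.
    """
    n_bins = len(hist)
    total = sum(hist)
    if n_bins == 0 or total == 0:
        raise ValueError("histogram must contain hits")
    expected = total / n_bins
    dnl = [count / expected - 1.0 for count in hist]
    inl = []
    running = 0.0
    for value in dnl:
        running += value      # INL taken as the cumulative sum of the DNL
        inl.append(running)
    return dnl, inl


# Small illustrative histogram: the first bin is 10% too wide, the second 10% too narrow.
dnl, inl = code_density_linearity([110, 90, 100, 100])
print(dnl)
print(inl)
```

With 840,000 hits spread over 140 bins the expected count is 6,000 hits per bin.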

In an architecture such as the one used in this converter, the conversion transfer function is made of successive replications of the fine time interpolation transfer curve along the dynamic range. The coarse time counter is responsible for the correct fine interpolation repetition. Therefore, the linearity of the fine time interpolator, made by the array of DLL’s (ADLL), has the largest contribution to the overall linearity. The ADLL will be characterised in great detail, whereas a simpler verification will be performed for the extended dynamic range mechanism.

[Figure 1 plots: DNL (LSB) and INL (LSB) versus bin (0 to 140).]

Figure 1: DNL and INL graphs for the ADLL.

The graphs in Figure 1 show the differential and integral non-linearity of the ADLL. A DNLmax of 0.71LSB (σDNL=0.17LSB) and an INLmax of 0.67LSB (σINL=0.19LSB) are obtained. The main feature of these graphs is the significant non-linearity found in the first few bins in the array. These errors occur in the bins whose limits are defined by taps in opposite extremes of consecutive Timing DLL’s. They are the result of the presence of phase errors and of the delay cell mismatch on the Timing and Phase Shifting DLL’s. These phase errors² originate from any of the mechanisms previously described in Chapter 6.

2 The phase error must be understood in its wider sense. It may be caused by an actual phase error in the phase detector, by different propagation delays of the phase detector’s input signals or by a significant propagation delay in the distribution of the sampling signal to the hit registers.

Chapter 8: Experimental Results.


The DNL and INL graphs can be compared to the curves in Figure 2. These curves were obtained from the analytical studies that were carried out³ in Appendix F. It can be seen that, using the analytical model and reasonable assumptions about the direction of the static errors that affect the converter, it is possible to estimate the main characteristics of the actual non-linearity graphs (the amplitude of each error is normalised). Note that delay cell mismatch was not taken into account in the analytical results shown here.

[Figure 2 plots: analytical DNL (LSB) and INL (LSB) versus bin (0 to 140).]

Figure 2: Analytical DNL and INL curves (Din=1% and Dout=-1% of the cell delay, DPD=-0.1% and Dhit=0.1% of the reference period).

From the same set of data, the characteristics of the four Timing DLL’s can be extracted. These graphs are shown in Figure 3. The relevant feature is the presence of a phase error⁴ apparent in the large first bin of each DLL. It turns out to be significant in one of the Timing DLL’s (DLL0). A summary of the characteristics of the individual Timing DLL’s is presented in Table 2. From the data presented in the table, the delay cell mismatch obtained for these DLL’s is estimated to be ~4%, a larger value than expected.

[Figure 3 plots: DNL (LSBDLL) and INL (LSBDLL) versus binDLL (0 to 35), for DLL0, DLL1, DLL2 and DLL3.]

Figure 3: DNL and INL graphs for the different Timing DLL’s (LSBDLL=4·LSB).

3 Due to implementation details, tap 0 of each of the Timing DLL’s was placed at the end of the respective delay chain. This position is delayed by one reference clock cycle from the original position, therefore their timing is the same. However, in a non-ideal converter the non-linearity graphs corresponding to the two cases are different. The analytical results shown here are obtained taking this into account.

4 See footnote 2 on page 102.

There may be two origins for this larger value. It may be an effect of the actual device mismatch, a technological property seldom disclosed with adequate accuracy by the vendors, or it may be due to electrical noise coupling into the channel buffers or the delay cells. Note that device mismatch may also affect channel registers and that this effect is not distinguishable from the delay cell mismatch.

Timing DLL   DNL           σDNL          INL           σINL          unit
0            0.21 (0.84)   0.06 (0.23)   0.18 (0.71)   0.06 (0.24)   LSBDLL (LSB)
1            0.13 (0.52)   0.05 (0.18)   0.10 (0.41)   0.05 (0.19)   LSBDLL (LSB)
2            0.12 (0.49)   0.04 (0.17)   0.11 (0.44)   0.04 (0.16)   LSBDLL (LSB)
3            0.11 (0.46)   0.04 (0.18)   0.11 (0.43)   0.04 (0.15)   LSBDLL (LSB)
PS scheme    0.06 (0.28)   0.04 (0.21)   0.04 (0.22)   0.03 (0.15)   LSBDLL-PS (LSB)

Table 2: Summary of linearity obtained for each DLL in the array (LSBDLL=4·LSB and LSBDLL-PS=5·LSB).

The Phase Shifting DLL can also be characterised using the same data set. The graphs in Figure 4 show the non-linearity of the first few cells of the Phase Shifting DLL.

[Figure 4 plots: DNL (LSBDLL-PS) and INL (LSBDLL-PS) versus binDLL-PS (1 to 4).]

Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSBDLL-PS=5·LSB).

The non-linearity of the Phase Shifting DLL and the phase error accumulated in the first bin of each Timing DLL add up to a large ADLL non-linearity, particularly in the ADLL bins number 4, 9, 14 and 139 and their neighbours, as would be expected.

The auto-correlation function was applied to the DNL graph of the ADLL (Figure 5). It reveals peaks in the auto-correlation factor with a periodicity of 4·λ, which corresponds to the interpolation factor F used. Secondary peaks at λ=5 and 10 can also be identified, corresponding to the phase shifting performed by the Phase Shifting DLL, which introduces a delay of F+1=5 (LSB) between consecutive Timing DLL’s.



[Figure 5 plot: auto-correlation versus λ coefficient (0 to 36).]

Figure 5: The ADLL auto-correlation graph.
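The kind of periodicity analysis described above can be sketched with a normalised auto-correlation. The exact estimator used for Figure 5 is not stated, so the version below is an assumption: it removes the mean, normalises by the lag-0 value, and wraps around the end of the array because the ADLL bins repeat with the clock period. The test vector is synthetic.

```python
def autocorrelation(values, max_lag):
    """Normalised circular auto-correlation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    energy = sum(c * c for c in centred)
    if energy == 0:
        raise ValueError("auto-correlation is undefined for a constant sequence")
    result = []
    for lag in range(max_lag + 1):
        acc = sum(centred[i] * centred[(i + lag) % n] for i in range(n))
        result.append(acc / energy)
    return result


# Synthetic DNL-like pattern over 140 bins with a period of 4 bins.
synthetic_dnl = [0.1, -0.05, 0.0, -0.05] * 35
coefficients = autocorrelation(synthetic_dnl, 12)
peaks = [lag for lag, r in enumerate(coefficients) if r > 0.99]
print(peaks)
```

A pattern with a period of F=4 bins produces peaks at every lag that is a multiple of 4, which is the signature looked for in the measured graph.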

Although the extension of the dynamic range beyond the reference clock period is achieved by successive translations of the ADLL transfer curve, it is important to verify the correct behaviour of this operation. In the graphs of Figure 6 only four reference clock periods are analysed. This dynamic range is judged sufficient for the test being carried out. A detailed characterisation of the full dynamic range would be impractical because of the large number of hits that would have to be collected. The result of a specific test enabling the verification of the correctness of the dynamic range extension across the full dynamic range is described later in this chapter. The periodicity of the non-linearity graphs is evident over the extended dynamic range.

[Figure 6 plots: DNL (LSB) and INL (LSB) versus bin (0 to 560).]

Figure 6: DNL and INL graphs for the converter along four reference clock periods.

For this test 1,680,000 hits were collected; its results therefore have a tolerance of 4.2% and 50% respectively for the DNL and INL curves (β=0.04 and β=0.5) with a confidence level of 98% (α=0.02). It is impractical to collect more hits, due to the long time it would require, so the tolerance in the INL measurements is wide. However, the values obtained for DNLmax and INLmax, respectively 0.73LSB (σDNL=0.18LSB) and 0.78LSB (σINL=0.21LSB), are similar to those obtained for the array itself, the differences being well within the tolerances accepted for such tests.

8.3. Linear time sweeps.

The nature of statistical tests such as the CDT results in the averaging of random effects like phase noise and electrical noise. Phase noise (or jitter) can be present in the reference clock received, in the hit signal path, or may be due to the closed loop behaviour of the DLL’s. Electrical noise may couple into the DLL’s or into the hit sampling registers through the power supply or the substrate. To evaluate the effect of such random noise on the conversion error, a linear delay sweep is performed, using the test bench described in Appendix A. The following graphs result from a linear delay sweep where 42,000 samples were collected, corresponding to the accumulated effect of 5 samples collected for each delay interval of 3ps (5 ‘trombone’ delay steps of ~0.6ps).

In Figure 7 the error graph resulting from a linear delay sweep spanning two reference clock cycles is shown. The RMS resolution of the converter is determined from the standard deviation of the error histogram. Its value is σ=0.39LSB (34.5ps). The maximum observed error is 1.62LSB (144.9ps).
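The way an RMS resolution is extracted from a linear sweep can be sketched as follows. This is only an illustration under stated assumptions: the converter is an ideal, noiseless quantiser with the 89.3ps LSB of Table 3, the measured time is taken at the bin centre, and a constant offset is removed before computing the statistics. The function name and the synthetic sweep are hypothetical.

```python
import numpy as np


def sweep_error_stats(true_delay_ps, codes, lsb_ps):
    """Return (sigma, max |error|) of the conversion error, both in LSB."""
    measured = (np.asarray(codes) + 0.5) * lsb_ps      # bin centre as estimate
    err = (measured - np.asarray(true_delay_ps)) / lsb_ps
    err = err - err.mean()                             # remove constant offset
    return err.std(), np.abs(err).max()


LSB_PS = 89.3                                          # LSB from Table 3
delays = 3.0 * np.arange(8400)                         # 3 ps steps, ~two clock periods
codes = np.floor(delays / LSB_PS).astype(int)          # ideal converter (assumption)

sigma, max_err = sweep_error_stats(delays, codes, LSB_PS)
print(sigma, max_err)
```

For such an ideal converter the standard deviation approaches the quantising limit of 1/√12 LSB; the larger value measured on the prototype reflects noise and non-linearity added to that limit.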

Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ=0.39LSB). [Plots: error (LSB) versus delay step (0 to 42000); counts versus error (LSB).]

From the linear delay sweep results, the linearity of the conversion can also be characterised. As shown in Figure 8, the DNLmax and INLmax measured using this method are, respectively, 0.73LSB (σ=0.18LSB) and 0.61LSB (σ=0.22LSB).

Figure 8: DNL and INL graphs obtained from the linear delay sweep results. [Plots: DNL (LSB) and INL (LSB) versus bin, bins 0 to 140, vertical range -1 to 1.]



This alternative method enables the confirmation of the results obtained using the statistical CDT test. The difference found is within the expected tolerance limits for this test. It can be justified by the sensitivity of the linear time sweep to the accumulation of errors generated during the delay generator alignment step (see Appendix A).

The conversion error of a single Timing DLL (the first one) was also evaluated using the same set of data. The error histogram is shown in Figure 9. The measured RMS resolution of this DLL is σ=0.30LSBDLL (105.5ps), very close to the quantising limit (0.29LSBDLL). The maximum error observed was 0.67LSBDLL (239.3ps).

Figure 9: Conversion error histogram for the first Timing DLL (σ=0.30LSBDLL). [Plot: counts versus error (LSBDLL), -1 to 1.]

Figure 10: Delay sweep over the full dynamic range. [Plot: bin (0 to 40000) versus delay (ns), 0 to 3500.]

The correctness of the dynamic range extension up to 3.2µs is confirmed in the graph of Figure 10. It shows a coarse delay sweep over the conversion dynamic range. The delay step is, in this case, only 1ns. The limit of the dynamic range is clearly identified by the step visible in the transfer function after bin number 35,839.
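The figures quoted above can be cross-checked with a short calculation. The sketch assumes 140 bins per reference clock period (the ADLL bins are numbered up to 139 earlier in this chapter) and the 80MHz reference clock of Table 3; the variable names are illustrative.

```python
T_CLK_NS = 12.5            # reference clock period at 80 MHz
BINS_PER_PERIOD = 140      # ADLL bins per clock period (assumed from bins 0..139)
LAST_BIN = 35_839          # last bin before the wrap-around step in Figure 10

total_bins = LAST_BIN + 1
periods = total_bins // BINS_PER_PERIOD            # whole clock periods covered
dynamic_range_us = periods * T_CLK_NS / 1000.0     # ns -> µs

print(total_bins, periods, dynamic_range_us)
```

The bin count is an exact multiple of 140, and the number of clock periods times 12.5ns reproduces the 3.2µs dynamic range.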

8.4. Inter-channel crosstalk.

Crosstalk between channels is an important characteristic of a multi-channel converter. The (almost) simultaneous acquisition of hits in several channels should not affect the individual channel’s performance. Evaluation of the channel performance in the presence of activity in other channels was done following the procedure described in Appendix A. All channels in the IC (except one) were excited simultaneously, and the time difference between the hit arrival into the channel being evaluated and these channels was varied so that it covered the whole reference clock cycle. In this way a pessimistic, worst case, crosstalk sensitivity value is obtained. The measurements performed showed that, even in the presence of the most unfavourable conditions, the crosstalk is smaller than ±2LSB. This situation is shown in Figure 11. Notice that the measurement error is larger than ±1LSB only when the skew between the reference channel and the three crosstalk channels is within a time window of ~0.5·T (6.25ns).

Figure 11: Measurement error due to crosstalk in the worst configuration. [Plot: measurement error (-5 to 5) versus time skew (T), 0 to 3.]

8.5. Double hit resolution.

To verify the correct functionality of the asynchronous channel buffers, and their ability to capture hits arriving in quick succession, a double hit resolution test was performed in accordance with the procedure in Appendix A. Bursts of two pulses (the same as the depth of the channel buffer) with a separation down to 8.5ns (limited by the instrument used) were correctly acquired, as intended.

8.6. Power dissipation.

The power dissipation of the fully operational circuit was measured to be 800mW. It includes the activity of the encoding, buffering and read-out logic integrated in the same IC. The demonstrator was built using a technology that requires a 5V supply voltage.

8.7. Summary of results.

A summary of the relevant timing features observed in the prototype’s test is shown in Table 3. A full description of the converter characteristics may be found in [4].



LSB                               89.3 ps
DNL     max                       0.71 LSB / 63.4 ps
        σ                         0.17 LSB / 15.2 ps
INL     max                       0.67 LSB / 59.8 ps
        σ                         0.19 LSB / 17.0 ps
RMS resolution (σ)                0.38 LSB / 34.5 ps
dynamic range                     3.2 µs
crosstalk                         < 2 LSB
double hit resolution             < 8.5 ns
reference clock                   80 MHz
number of channels                4
power dissipation                 0.8 W
technology                        0.7µm CMOS
area    timing circuitry          6.1 mm2
        IC                        23 mm2
package                           68 pin PLCC

Table 3: Characteristics of the TDC prototype.

8.8. Conclusion.

This implementation of the ADLL scheme demonstrates that it is possible to obtain a high-resolution time measurement system using cheap commercial CMOS technologies. The timing characteristics measured on the Time-to-Digital Converter match well with what had been predicted during the analysis and development of the circuit.

Four TDC channels were integrated in the IC, together with the necessary encoding and buffering logic. Therefore, sufficient functionality is included to allow it to be used in real high-resolution time measurement systems. A batch of 1,000 TDC circuits was produced in order to be used in the preliminary system tests necessary for the development of the ALICE TOF detector [1] and also in the front-end of the PesTOF detector [22][23] used in the NA49 experiment running at CERN.

The drawback of timing interpolator architectures based on the ADLL principle is the large power necessary to drive a significant number of active DLL delay elements. Since the time interpolator is shared between all the channels in the IC, the power dissipation per channel would be reduced if more channels were integrated in the same circuit. However, the overall IC power dissipation would increase, which could render impossible the utilisation of standard plastic packages.

References for Part II.

[1] ALICE collaboration, A large ion collider experiment – technical proposal, CERN/LHCC 95-71, Dec. 95.

[2] Arai, Y. et al., A CMOS four-channel x 1K time memory LSI with 1ns/b resolution, IEEE Journal of Solid-State Circuits, Vol. 27, No. 3, pp. 359-364, Mar. 92.

[3] Christiansen, J., An integrated high-resolution CMOS timing generator based on an Array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol. 31, No. 7, pp. 952-957, Jul. 96.

[4] Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EP-MIC.

[5] Lakshmikumar, K. et al., Characterisation and modeling of mismatch in MOS transistors for precision analogue design, IEEE Journal of Solid-State Circuits, Vol. 21, No. 6, pp. 1057-1066, Dec. 86.

[6] Pelgrom, M. et al., Matching properties of MOS transistors, IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, pp. 1433-1440, Oct. 89.

[7] Nekili, M. et al., Spatial characterisation of process variations via MOS transistor time constants in VLSI and WSI, IEEE Journal of Solid-State Circuits, Vol. 34, No. 1, pp. 80-84, Jan. 99.

[8] Kaenel, V. et al., A 320MHz, 1.5mW @ 1.35V CMOS PLL for microprocessor clock generation, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1715-1722, Nov. 96.

[9] Maneatis, J., Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 96.

[10] Johnson, M. et al., A variable delay line for CPU co-processor synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.

[11] Kim, L. et al., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.

[12] Vittoz, E., The design of high performance analogue circuits on digital CMOS chips, IEEE Journal of Solid-State Circuits, Vol. 20, No. 3, pp. 657-665, Jun. 85.

[13] Bastos, J. et al., Matching of MOS transistors with different layout styles, Proceedings of the IEEE International Conference on Microelectronic Test Structures, pp. 17-18, Mar. 96.

[14] Gardner, F., Charge-pump Phase-Lock Loops, IEEE Transactions on Communications, Vol. 28, No. 11, pp. 1846-1858, Nov. 80.

[15] Gardner, F., Phase accuracy of charge-pump PLL’s, IEEE Transactions on Communications, Vol. 30, No. 10, pp. 2362-2363, Oct. 82.

[16] Paemel, M., Analysis of a charge-pump PLL: a new model, IEEE Transactions on Communications, Vol. 42, No. 7, pp. 2490-2498, Jul. 94.

[17] Behr, A. T. et al., Harmonic distortion caused by capacitors implemented with MOSFET gates, IEEE Journal of Solid-State Circuits, Vol. 27, No. 10, pp. 1470-1475, Oct. 92.

[18] Maneatis, J. et al., Precise delay generation using coupled oscillators, IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, pp. 1273-1282, Dec. 93.

[19] Forti, F. et al., Measurements of MOS current mismatch in the weak inversion region, IEEE Journal of Solid-State Circuits, Vol. 29, No. 2, pp. 138-142, Feb. 94.

[20] Mota, M. et al., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, pp. 338-342, Oct. 97.

[21] Horstmann, J. U. et al., Metastability behaviour of CMOS ASIC flip-flops in theory and test, IEEE Journal of Solid-State Circuits, Vol. 24, No. 1, pp. 146-157, Feb. 89.

[22] Pestov, Y., Timing below 100ps with spark counters: work principle and applications, Invited talk at the 36th International Winter Meeting on Nuclear Physics, Bormio, 98.

[23] Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49, ALICE/2000-02 internal note/TOF, Mar. 00.

PART III.

A TDC ARCHITECTURE BASED ON A DLL AND A PASSIVE RC DELAY LINE.

Future High-Energy Physics experiments will require complex electronic systems in order to handle the millions of data channels that constitute them. A significant part of these systems will be housed within the respective detectors’ structure.

Given the large number of electronic circuits close to the detector, overall power dissipation is an issue. Increased detector temperature due to power dissipation is usually unacceptable and the weight and area that the power network occupies put a heavy burden on the detector infrastructure. It is therefore essential to reduce the power dissipation of the individual circuits to minimal levels.

In the Array of Delay Locked Loops (ADLL) architecture previously discussed, resolution improvements can be obtained if faster delay cells are used, or if the interpolation factor is increased using extra timing DLLs. Both methods result in higher power dissipation.

In this part of the dissertation, an alternative time interval measurement architecture is introduced. This architecture uses a different time interpolation principle, which results in higher time resolution and lower power dissipation. This architecture offers the same potential for integration as the ADLL and has the ability to perform automatic self-calibration, thus addressing all the requirements set forward by the ALICE TOF collaboration.

In Chapter 9 the proposed architecture is introduced and the method used to obtain increased time interpolation is explained. Two time interpolation schemes are presented, together with the means necessary to achieve correct operation. Chapters 10 and 11, respectively, include a detailed look into the performance of these two schemes and into their calibration requirements. Finally, in Chapter 12 the results of the tests performed on a prototype IC that implements these two schemes are reported.


The advantageous characteristics of the DLL’s have already been described in this dissertation and their use in the context of time interval measurements shown. An alternative architecture, which takes advantage of these characteristics to build a high-resolution time interpolator, is now introduced.

The basis of the time interpolator is a single DLL. Finer time interpolation can be achieved either by further dividing the clock period, using extra phase-shifted timing DLL’s as was done in the ADLL or, alternatively, by sampling the status of the DLL several times with a small time interval between samples. In the latter case, after determining which sample of the DLL has the reference clock edge arriving at the output of a given cell, it is possible to derive the hit arrival time with a resolution that is equal to the sample interval. To get full time coverage over the clock period, the samples must be obtained at uniform intervals over the full delay of a single DLL delay cell. This interpolation method is clarified in Figure 1.

Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line samples (in this example the number of samples acquired is M=5). [Diagram: delay cells with taps n-1 to n+2 controlled by Vcontrol; waveforms of tap n, tap n+1 and tap n+2 versus t, sampled at s0 (= thit) and at s1 to s4, spaced by Tcell/5 across Tcell.]

If a single sample of the DLL status (cell delay is Tcell) is acquired at hit signal arrival time (s0), a transition 1 to 0 is found between the data corresponding to tap(n) and tap(n+1) of the status word. In this case, the hit time referenced to the clock is¹:

$$ t_{hit} = T_{cell} \cdot n . $$

Therefore, the resolution of the measurement is the intrinsic resolution of the DLL, Tcell.

However, if several (M) uniformly spaced samples are acquired across the cell delay, the number of samples (m) elapsed before the reference edge appears in the output of the cell (no transition found) improves the time measurement accuracy:

$$ t_{hit} = T_{cell} \cdot \left( n + \frac{M - m}{M} \right), \qquad 1 \le m \le M . $$

The resolution of this measurement is Tcell/M, where the interpolation factor M corresponds to the number of cell delay sub-divisions created by multiple sampling. Considering an N-tapped DLL, the overall resolution, related to the reference clock period, Tclk, is:

$$ T_{bin} = \frac{T_{clk}}{N \cdot M} . $$

Parameters N and M are, in this scheme, independent. This means that there is no numerical limit to the ratio to which the reference clock can be divided. Chiefly, it is possible to divide the reference period into a binary number of bins (N·M = 2^n, with n being an integer). This division was not possible in the ADLL scheme.
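The two expressions above translate directly into code. The following sketch uses the M=5 example of Figure 1; the cell delay of 400ps, the 16-tap DLL and the function names are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def hit_time(n, m, M, t_cell):
    """Hit time from the coarse tap index n and the sample count m (1 <= m <= M)."""
    if not 1 <= m <= M:
        raise ValueError("m must satisfy 1 <= m <= M")
    return t_cell * (n + (M - m) / M)


def bin_size(t_clk, N, M):
    """Overall bin size for an N-tapped DLL with interpolation factor M."""
    return t_clk / (N * M)


T_CELL_PS = 400.0                         # hypothetical cell delay
coarse = hit_time(2, 5, 5, T_CELL_PS)     # m = M: same result as a single sample
fine = hit_time(2, 1, 5, T_CELL_PS)       # edge seen after the first extra sample
t_bin = bin_size(16 * T_CELL_PS, 16, 5)   # hypothetical 16-tap DLL

print(coarse, fine, t_bin)
```

When m equals M the result reduces to the single-sample expression, and each smaller value of m moves the estimate later by one sample interval Tcell/M.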

9.1. Time interpolation circuit.

A time interpolation circuit based on this principle is shown in Figure 2. It includes an N-tapped DLL and M rows of hit registers in order to store the M samples of the DLL status that are acquired for each measurement. The multiple sampling signals are defined at fixed time intervals from the moment the hit signal arrives. It is, therefore, natural to generate these signals using taps of an open-ended delay line through which the hit signal is propagated. However, guaranteeing short delays with high precision is not easily done. Active devices (even if they were fast enough) have timing characteristics that vary significantly with operating temperature, supply voltage and process parameters. Continuous calibration schemes similar to that of the DLL are not applicable, since no reference signal exists; a different delay line should therefore be used.

Passive RC delay lines have been used in the past for timing generation [1], because of their low sensitivity to supply and temperature changes. Typically, a sensitivity of around 500ppm per Volt or per °C is found in standard technologies. On the other hand, their delay is strongly dependent on the circuit processing, since the characteristics of parasitic devices, such as resistivity, capacitance and even physical dimensions, are only weakly controlled in digital CMOS technologies. Large circuit to circuit delay variations are thus expected, which makes start-up calibration of the lines essential to the performance of the proposed architecture. However, frequent calibration is not needed due to the low supply and temperature dependencies.

¹ By convention, the limits of bin n are tap n and tap n+1.

Figure 2: Time interpolation circuit. [Block diagram: reference clock driving a DLL of N delay cells with PD; hit signal driving a controllable delay line, set from calibration, that clocks the hit registers (M rows).]

9.2. Adjustable RC delay line.

In order to be able to perform start-up calibration, the delay line should be made adjustable. Continuous and discrete adjustment schemes are possible; the choice between them must take into account the linearity requirements and the scheme’s complexity.

Continuous adjustment schemes can achieve maximal interpolation linearity, at the expense of circuit complexity and higher noise sensitivity. For example, it is possible to vary the depth of the depletion region along the length of a diffused resistor (see Figure 3) by changing the voltage drop across the parasitic junction. This results in a change of the cross-section of the resistor, and therefore of its distributed resistance. The depletion region across the junction also acts as the dielectric of a distributed capacitor. Therefore a change in its depth affects its capacitance. These resistance and capacitance variations have opposite effects on the time constant of the delay line, but the overall result is a continuous control of the line’s propagation delay.

However, this method presents some drawbacks that render its implementation impractical. The depletion region extends mostly into the less doped n-well, leaving little control of the line resistivity. The control voltage range limits the amplitude of the progressing signal. The signal, in fact, also influences the depletion region width, making the time constant of the line a complex function of the signal itself.

Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified). [Cross-section: p+ diffusion with in and out terminals inside an n- well with n+ contact, on a p- substrate, showing the depletion region.]

Discrete adjustment methods provide a better solution for our application. Their implementation can be simple and the noise sensitivity of the adjustment scheme can be quite low. A time interpolator that uses these methods has, by their discrete nature, lower linearity. Fortunately their non-linearity can be limited to very low levels by a careful choice of adjustment range. Of the several possible schemes for discrete adjustment, two will be described shortly.

9.2.1. Adjustable delay line by tap selection.

One implementation of the discrete adjustment scheme for an RC delay line is to divide it into a large number of small segments. Their extremities are made accessible via buffered outputs, as shown in Figure 4. Calibration of the line consists in selecting the outputs that best approximate the interpolation linearity criteria. Since the delay line time constant has a strong dependency on parasitic technological parameters, and these are only weakly controlled during IC production, wide delay variations are expected from one circuit to the other. This leads to some overlap between the adjustment range of consecutive taps. Therefore, it must be possible to connect some of the segment outputs to various taps.



Figure 4: Adjustable delay line using a tap selection scheme. [Diagram: chain of ∆R, ∆C segments whose buffered outputs are selected by the calibration word to form tap n-1, tap n and tap n+1.]

All the output buffers are identical and, due to the symmetry of their operation, their delays can, to a first approximation, be subtracted and factored out. However, device mismatch and temperature gradients will affect them differently, contributing to the degradation of the interpolation linearity. These effects can be minimised by careful buffer design and are in fact taken into account when the line is calibrated.
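The tap selection described in this section can be sketched as a nearest-delay search. This is a simplified illustration, not the calibration algorithm of the prototype: it assumes the delay at each segment output is already known (in the circuit it would have to be inferred from a characterisation of the interpolator), it ignores any restriction on which outputs a given tap can reach, and the delay values are hypothetical.

```python
def select_taps(segment_delays, t_cell, M):
    """For each of the M-1 interpolation taps, pick the index of the segment
    output whose delay is closest to the ideal value k * t_cell / M."""
    targets = [k * t_cell / M for k in range(1, M)]
    return [min(range(len(segment_delays)),
                key=lambda i: abs(segment_delays[i] - target))
            for target in targets]


# Hypothetical line: 40 segment outputs, 10 ps apart (delays in ps)
segment_delays = [10.0 * i for i in range(1, 41)]
taps = select_taps(segment_delays, t_cell=400.0, M=8)
print(taps)  # indices of the selected segment outputs
```

A finer segmentation of the line reduces the residual error of each selected tap, at the cost of more buffered outputs and a wider calibration word.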

9.2.2. Adjustable delay line by lumped capacitor selection.

Another implementation of the discrete adjustment scheme is to insert a variable lumped capacitor in selected positions along the delay line, as in Figure 5. These capacitors participate in the definition of the line’s time constant, therefore changes in their capacitance affect the delay of the line.

The variable capacitors can be made of a bank of unit-sized capacitors that can be selectively connected to an RC delay line node in order to obtain the best interpolation linearity. As before, the effects of delay mismatch of the tap buffers are factored out during calibration.

Contrary to the previous scheme, where the adjustment of the position of one tap does not affect any other tap, in this scheme calibration is obtained by changing the delay properties of the line. Therefore the adjustment of the delay of one tap affects the delay of the whole delay line. An iterative adjustment procedure will adequately take into account this effect.
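The interaction between taps, and the reason an iterative procedure is needed, can be illustrated with a toy model. The sketch below is built on assumptions that are not taken from the prototype: the line is a lumped RC ladder, tap delays are approximated by the Elmore delay, each node carries a fixed capacitance plus a bank of unit capacitors, and all component values are hypothetical. The adjustment is a simple coordinate-wise search repeated until no setting changes.

```python
def elmore_delays(r_seg, caps):
    """Elmore delay at every node of an RC ladder with equal series resistors.
    The resistance shared by the paths to nodes j and k is r_seg*(min(j, k)+1)."""
    n = len(caps)
    return [sum(r_seg * (min(j, k) + 1) * caps[j] for j in range(n))
            for k in range(n)]


def cost(r_seg, c_fixed, c_unit, units, targets):
    """Sum of squared tap delay errors for a given capacitor bank setting."""
    caps = [c_fixed + c_unit * u for u in units]
    return sum((d - t) ** 2
               for d, t in zip(elmore_delays(r_seg, caps), targets))


def calibrate(r_seg, c_fixed, c_unit, max_units, targets, max_passes=20):
    """Adjust one capacitor bank at a time; repeat because every change
    shifts the delay of all the other taps as well."""
    units = [0] * len(targets)
    for _ in range(max_passes):
        changed = False
        for k in range(len(units)):
            current = cost(r_seg, c_fixed, c_unit, units, targets)
            for u in range(max_units + 1):
                trial = units[:k] + [u] + units[k + 1:]
                trial_cost = cost(r_seg, c_fixed, c_unit, trial, targets)
                if trial_cost < current:
                    units, current, changed = trial, trial_cost, True
        if not changed:
            break
    return units


# Hypothetical values; kOhm times fF gives ps
R_SEG, C_FIXED, C_UNIT, MAX_UNITS = 1.0, 5.0, 1.0, 15
TARGETS = [48.8 * k for k in range(1, 8)]          # ideal tap delays in ps
start_cost = cost(R_SEG, C_FIXED, C_UNIT, [0] * 7, TARGETS)
settings = calibrate(R_SEG, C_FIXED, C_UNIT, MAX_UNITS, TARGETS)
final_cost = cost(R_SEG, C_FIXED, C_UNIT, settings, TARGETS)
print(settings, start_cost, final_cost)
```

Because a trial setting is only accepted when it lowers the total error, the error never increases from one pass to the next, which is the property that makes the iteration safe to repeat.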

Figure 5: Adjustable delay line using a variable lumped capacitor scheme. [Diagram: chain of ∆R, ∆C segments with outputs tap n-1, tap n and tap n+1; independent calibration per tap.]

9.3. Auto calibration.

The automatic self-calibration of the time interpolator is a key part of the architecture. The DLL closed control loop is able to perform constant self-calibration, tracking temperature and supply variations. The passive RC time interpolator, on the other hand, requires initial calibration so that its delay matches the delay of a DLL delay cell. The calibration could be performed at production test time, either by laser trimming or by pre-programming calibration parameters in a ROM-like structure. However, this method would be expensive and would limit the correct interpolator operation to a very specific reference frequency, leaving the user with no flexibility to adapt the circuit to his particular needs.

A more flexible calibration procedure can be obtained if internal means are provided for in-situ start-up calibration. Collection of hits generated at random time intervals offers an accurate method of characterising the interpolator [2] (see Appendix D). If the hits are collected into time bins corresponding to the output codes and these are histogrammed, the resulting count differences accurately represent the size of the bins. Using this simple procedure, the whole interpolator can be characterised. The characterisation obtained can be used to identify the calibration corrections necessary.
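The histogram-based characterisation can be sketched as follows. The example assumes hits uniformly distributed over one cell delay of 390.6ps divided into M=8 bins; the bin widths are hypothetical, and the DNL and INL definitions are the usual code density ones, which may differ in detail from those of Appendix D.

```python
import numpy as np


def bin_widths_from_histogram(counts, span):
    """Estimate bin widths from hit counts: each count is proportional to the width."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * span


def dnl_inl(counts):
    """DNL and INL in LSB, taking the mean count as the ideal bin."""
    counts = np.asarray(counts, dtype=float)
    dnl = counts / counts.mean() - 1.0
    return dnl, np.cumsum(dnl)


# Hypothetical interpolator: 8 unequal bins spanning one 390.6 ps cell delay
TRUE_WIDTHS = np.array([40.0, 55.0, 48.0, 50.0, 45.0, 52.0, 49.0, 51.6])
inner_edges = np.cumsum(TRUE_WIDTHS)[:-1]          # boundaries between bins

rng = np.random.default_rng(1)
hits = rng.uniform(0.0, TRUE_WIDTHS.sum(), 200_000)
codes = np.searchsorted(inner_edges, hits, side="right")
counts = np.bincount(codes, minlength=len(TRUE_WIDTHS))

est = bin_widths_from_histogram(counts, TRUE_WIDTHS.sum())
dnl, inl = dnl_inl(counts)
print(np.round(est, 1))
```

Only counters and comparisons are needed to build the histogram, which is consistent with the simple arithmetic unit mentioned in the text.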

This procedure requires a random hit generator and a simple arithmetic unit. Hits generated from a simple, slow oscillator can be used for characterisation. The main requirement is that the oscillation frequency is such that it does not beat with the reference clock. A sufficient condition to satisfy this requirement is that the ratio of its frequency and the frequency of the reference clock is a rational number given by the ratio of two prime numbers [3] (Appendix E). The arithmetic unit needs only a few accumulators and comparators. The calibration can be performed in an iterative fashion, thereby improving its accuracy.
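The effect of the frequency ratio on the coverage of the clock period can be illustrated with integer arithmetic. In this sketch the hit period is assumed to be (p/q)·Tclk, so that the k-th hit falls at phase (k·p mod q) in units of Tclk/q; p and q are hypothetical values, and the sketch does not reproduce the derivation of Appendix E.

```python
def distinct_phases(p, q):
    """Number of distinct hit phases, in units of T_clk/q, visited by hits
    spaced by (p/q) * T_clk."""
    return len({(k * p) % q for k in range(q)})


good = distinct_phases(1009, 127)   # two different primes: every phase is visited
bad = distinct_phases(1000, 125)    # 1000 is a multiple of 125: hits always coincide
print(good, bad)
```

With two different primes the hits step through all q phases before repeating, so they sample the clock period uniformly; in the second case every hit lands on the same phase and the histogram would carry no information about the bin sizes.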

9.4. The prototype.

9.4.1. Choice of technology.

A demonstrator circuit was implemented in order to explore the capabilities of the proposed architecture. A major goal of this work is to define architectures that are well suited for high-resolution time measurements, independently of the technology in which they are produced. Therefore no special features should be required apart from the ones available in standard CMOS technologies.

Since actual results are partially determined by technological properties such as transistor transconductance, gate capacitance, parasitic resistance and capacitance, a fair comparison of the capabilities of the architecture is best obtained if the technology used for the ADLL demonstrator is also used to build the demonstrator of this architecture. Furthermore, to emphasise the suitability of the architecture to standard technologies, the same 0.7µm CMOS technology was used for this prototype.

9.4.2. Prototype characteristics.

The prototype includes all the blocks necessary to demonstrate the proposed architecture: the complete time interpolator, together with the respective hit registers, a simplified read-out control unit, a serial programming interface and wide bandwidth differential receivers.

The key feature that should be verified with this demonstrator is the ability to perform internal calibration. The adjustment algorithm is made only of registers and combinatorial logic. These are easily implemented using standard cell libraries available for most commercial technologies. It was, therefore, decided that it could be implemented in software, allowing for a higher flexibility of the demonstrator. The calibration hit generator, on the other hand, should be implemented in the circuit to evaluate the correctness of the assumption that a hit frequency with the required characteristics can be generated inside the circuit.

The prototype is schematically represented in the block diagram of Figure 6:

Figure 6: Block diagram of the prototype. [Block diagram: ref. clock, channel 0 and channel 1 inputs; two sets of hit registers, each with an RC delay line, one using the tap selection adjustment scheme and the other the lumped capacitor adjustment scheme; hit generator; calibration interface; read-out controller.]

In this prototype, the two interpolation schemes previously described were implemented using a single shared DLL. Together they form a two-channel Time-to-Digital Converter. Each interpolator channel is made of the differential receiver, the signal selection multiplexer, the adjustable RC delay line, the hit registers and the shared DLL. It was shown in Chapter 6 that the integral error due to cell mismatch in a single DLL is a function of the cell mismatch σcell and of the number of cells N that make up the delay chain. The maximum standard deviation of this error has the following expression, at the centre of the delay chain:

$$ \sigma_{DLL} = \frac{\sigma_{cell}}{\mu_{cell}} \cdot \frac{\sqrt{N}}{2} . $$

It is therefore important to build the delay chain with a small number of delay cells. In this prototype, we chose N=16 cells as the best compromise between reducing the integral error due to mismatch and keeping the reference clock frequency within the limits imposed by the technology.

The DLL delay cells to be used in this circuit have the same time characteristics as the ones in the previous (ADLL) circuit. The same cells are therefore used, together with the same control loop building blocks. Reusing the cells is advantageous since they have already proved to have the necessary characteristics in terms of control range, matching and noise sensitivity. Since some implementation details are common, the comparison between architectures also becomes easier.

To obtain the intended ~390ps time interpolation in the outputs of the DLL, a reference period of 6.25ns, corresponding to a frequency of 160MHz, was used. As shown in the block diagram, the reference clock is only used in the DLL; the read-out and calibration interfaces are asynchronous to this clock and work at lower frequencies.

The high interpolation factor is obtained using either of the two adjustable RC delay line schemes already described. In both schemes, the delay of the DLL delay cell is divided into M=8 similar time intervals, resulting in a LSB of ~48.8ps. A total of M·N=128 hit registers are required to achieve full reference period coverage for each channel.
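The timing figures of the prototype follow from the reference frequency and the two division factors, as the short calculation below shows. The variable names are illustrative; the values of the frequency, N and M are those given in the text.

```python
F_CLK_HZ = 160e6          # reference clock frequency
N = 16                    # DLL delay cells
M = 8                     # samples per cell delay (interpolation factor)

t_clk_ps = 1e12 / F_CLK_HZ            # reference period in ps
t_cell_ps = t_clk_ps / N              # DLL cell delay
lsb_ps = t_cell_ps / M                # converter bin size
hit_registers = N * M                 # registers per channel
is_binary = hit_registers & (hit_registers - 1) == 0   # power-of-two check

print(t_clk_ps, t_cell_ps, lsb_ps, hit_registers, is_binary)
```

The number of bins per reference period is a power of two, which is the binary division that the independence of N and M makes possible in this scheme.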

The hit signal integrity is of paramount importance, since the critical time information is mostly contained in the high frequency components of the signal. Differential receivers are used in all external time critical signal paths so as to reject the common-mode noise coupled to these signals as they traverse the system outside the circuit.

A simple hit generator is also included. It is built as a slow, free running, five-inverter ring oscillator whose output frequency is further divided by an 8-bit ripple counter. The oscillation frequency is selectable via a program word. In a final circuit the oscillator frequency must have a fixed relation to the reference clock frequency, defined by the relations established in Appendix E. Since the clock frequency may change depending on the application, it is reasonable to generate the calibration hit signal based on the reference clock (see also Appendix E).

The externally generated calibration parameters are fed to the delay lines via a calibration interface. Changes to these parameters are only performed at start-up time, when calibration is being performed. Therefore, a slow, serial interface is used.

In the photograph of Figure 7 the main functional blocks of the prototype are highlighted. The circuit uses 10.7mm2 of silicon and was packaged in a 68 pin ceramic JLCC package.

Figure 7: Prototype circuit showing main functional blocks. [Photograph labels: Tap Selection, Lumped Capacitor, Hit Registers (two blocks), DLL, Oscillator.]

9.4.3. Performance analysis.

Timing characteristics.

The configuration proposed for this converter results in a LSB of Tm=48.8ps. The theoretical RMS resolution σq is determined by the quantising performed during conversion:

$$ \sigma_q = \frac{T_m}{\sqrt{12}} = 14.1\,\mathrm{ps} . $$

Matching limitations of the DLL degrade the conversion resolution. The maximum cumulative effect of cell mismatch is seen in the middle of the DLL delay chain (see Chapter 6). Assuming a mismatch (σmatch) of 1%, the additional RMS error due to the DLL is:

$$ \sigma_{DLL} = \sigma_{match} \cdot \frac{\sqrt{N}}{2} \cdot M \cdot T_m = 7.8\,\mathrm{ps} . $$

The calibration of the RC delay lines acts on their integral non-linearity in such a way as to limit it to acceptable values. A worst case ±0.5LSB delay line non-linearity results in an additional RMS error of:

$$ \sigma_{dl} = \frac{0.5 \cdot T_m}{\sqrt{12}} = 7.1\,\mathrm{ps} . $$

Jitter intrinsic to the closed control loop of the DLL is estimated to be of the order of σjitter=8ps. Adding all these contributions quadratically, the estimated intrinsic RMS resolution should be ~19.3ps (0.40LSB). External sources of error, such as reference clock jitter, are not included in this estimation.
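The quadratic sum can be reproduced with a few lines of code. The sketch evaluates each contribution from the expressions of this section, using Tm=48.8ps, N=16, M=8, a 1% mismatch and 8ps of loop jitter; the variable names are illustrative, and small differences in the last digit with respect to the rounded values in the text are to be expected.

```python
from math import sqrt

T_M = 48.8            # LSB in ps
N, M = 16, 8          # DLL cells and interpolation factor
SIGMA_MATCH = 0.01    # assumed relative cell mismatch (1%)
SIGMA_JITTER = 8.0    # estimated DLL loop jitter in ps

sigma_q = T_M / sqrt(12)                          # quantisation
sigma_dll = SIGMA_MATCH * sqrt(N) / 2 * M * T_M   # DLL cell mismatch
sigma_dl = 0.5 * T_M / sqrt(12)                   # RC delay line non-linearity

total = sqrt(sigma_q**2 + sigma_dll**2 + sigma_dl**2 + SIGMA_JITTER**2)
print(round(total, 1), round(total / T_M, 2))     # in ps and in LSB
```

Quantisation is the dominant term; the three remaining contributions are of similar size, so none of them can be neglected when estimating the intrinsic resolution.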

The tests performed with this prototype, and the measurement results, will bediscussed in Chapter 12.

Power dissipation.

In DLL based converters, power is mainly dissipated in the DLL itself. Reduction ofthe power needed for the switching of its delay cells is mainly hampered by the devicemismatch. Since the matching requirements are quite high for these architectures, reducedpower dissipation per cell would come at the price of reduced resolution.

In this architecture, power dissipation is reduced by minimisation of the number ofDLL’s. The fine time interpolation is obtained using a passive delay line. Since the DLLis built with the same building blocks used in the ADLL circuit, the power dissipated bythe DLL in this circuit can be estimated from what was measured in the previousprototype to be of the order of 180mW.

Chapter 10. Adjustable RC Delay Line using a Tap Selection Scheme.

In this chapter, the implementation of the tap selection adjustment scheme is described. We start with a general analysis of how to build and analyse high accuracy RC delay lines. These lines must abide by some layout constraints: the line dimensions must match the dimensions of the circuits to which it interfaces and the delays are generated by parasitic devices. The particular characteristics of this scheme are then described, together with the calibration algorithm required to obtain uniform time intervals.

10.1. RC delay line.

An integrated microstrip RC delay line can be built from any of the interconnection layers available in the chosen technology. Diffused layers usually suffer from a high temperature and supply voltage dependency, due to carrier mobility degradation and to the variation of the depth of the junctions’ depleted region [4]. A polysilicon layer, on the other hand, has a lower temperature dependency and negligible supply dependency (if built over the thick oxide layer). Metal (or silicided polysilicon) layers have even smaller environment dependency; however, their small resistivity renders them impractical for delay generation applications. The polysilicon layer will, therefore, be used to build the delay line.

The interpolating microstrip line spans a fraction (M-1)/M of the delay of a DLL delay cell, where M is the interpolation factor, regardless of the operating conditions. This delay is generally short and, traditionally, the line would be analysed as a lumped electrical element. However, such analysis would lack accuracy, since most of the critical time information is contained in the rising edge of the propagating signal. Accurate delay estimation must take into account the large bandwidth of the signal and thus the long electrical length of the line at high frequencies. In these conditions, transmission line analysis methods must be used.

Several analytical and numerical methods to perform the transient analysis of a complex network of distributed RC lines have been proposed [5][6][7], resulting in equally complex expressions for the propagation delay along the network. A voltage step injected in an open-ended distributed RC line of length L propagates according to the following equation [8], where x is an arbitrary position along the line:

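$$\frac{v(x,t)}{v_{cc}} = 1 + \sum_{k=1}^{\infty} \frac{2\,(-1)^{k}}{\left(k-\frac{1}{2}\right)\pi}\,\cos\!\left[\left(k-\frac{1}{2}\right)\pi\left(1-\frac{x}{L}\right)\right]\exp\!\left[-\left(k-\frac{1}{2}\right)^{2}\pi^{2}\,\frac{t}{RC}\right].$$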
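As an illustration only, the step response above can be evaluated numerically by truncating the series. The following Python sketch is not part of the original design flow. It assumes the standard solution for an ideal step driving a uniform, unloaded, open-ended line, works with the normalised variables x/L and t/RC, and uses function names, a number of terms and a 50% threshold that are arbitrary choices made here.

```python
import math


def step_response(x_over_l, t_over_rc, terms=200):
    """Normalised voltage v(x,t)/vcc on an open-ended distributed RC line.

    The infinite series is truncated after `terms` terms (an assumption).
    """
    total = 1.0
    for k in range(1, terms + 1):
        a = (k - 0.5) * math.pi
        total += ((-1) ** k) * (2.0 / a) \
            * math.cos(a * (1.0 - x_over_l)) \
            * math.exp(-a * a * t_over_rc)
    return total


def crossing_time(x_over_l, threshold=0.5, lo=1e-4, hi=5.0):
    """Normalised time t/RC at which the edge crosses `threshold`.

    Found by bisection; valid for 0 < x/L <= 1, where the response
    rises monotonically from 0 to 1.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if step_response(x_over_l, mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, comparing crossing_time(0.5) with crossing_time(1.0) shows that the second half of the line is crossed in less time than the first half, an effect of the kind discussed in this section.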

The total resistance R and capacitance C of the line are obtained from the distributed resistivity rsq and from the plate and fringing capacitances, respectively cplate and cfringing:

$$RC = r_{sq}\,\frac{L}{W}\cdot\left(c_{plate}\,W\,L + 2\,c_{fringing}\,L\right) = r_{sq}\,c_{plate}\,L^{2} + \frac{2\,r_{sq}\,c_{fringing}\,L^{2}}{W}.$$
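The expression for the RC product can be checked with a few lines of Python. This is a minimal sketch: the function name and argument names are chosen here for illustration, and any numerical values passed to it are the user's own technology parameters, none of which are given in the text.

```python
def line_rc(r_sq, c_plate, c_fringing, width, length):
    """Return (R, C, R*C) for a uniform microstrip line.

    r_sq       : sheet resistance of the layer (ohm per square)
    c_plate    : plate capacitance per unit area
    c_fringing : fringing capacitance per unit edge length
    width, length : line dimensions, in units consistent with the above
    """
    resistance = r_sq * length / width
    # Plate term over the whole area, fringing term along both edges.
    capacitance = c_plate * width * length + 2.0 * c_fringing * length
    return resistance, capacitance, resistance * capacitance
```

The returned product agrees term by term with the right-hand side of the expression above, so it scales with the square of the line length and has only a weak dependence on the width.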

An important characteristic of RC lines that determines the dimensions of the interpolator is not evident from the propagation delay equation above: as the signal propagates along the delay line it experiences an apparent increase of propagation velocity. The reasons for this counter-intuitive effect can be found in the slow slope of the input pulse when compared to the propagation delay of the RC delay line. In such a short open-ended line, the reflected pulse travelling back along the line catches up with the forward pulse before its level has crossed the logic threshold. Looking at Figure 1, if the overall pulse amplitude is observed at position x, the closer x is to the end of the line, the earlier the reflected pulse is superposed on the original pulse and, thereby, the faster the edge of the overall pulse crosses the threshold.

Figure 1: RC line divided in two segments at access point x. R and C are, respectively, resistance and capacitance per unit length. [Diagram: a source (~) drives a first segment of length x (R.x, C.x) followed by a second segment of length L-x (R.(L-x), C.(L-x)).]

The delay line interfaces with the rest of the interpolator through output buffers. Efficient layout style requires that these buffers have the same physical design so that the resulting structure is regular and no layout related mismatches occur. Since the signal edge does not propagate along the line at a constant velocity, a uniform delay division of the line is obtained only if the line is accessed at irregular distances. To accommodate these contradictory demands, the line is divided into equal delay segments that are positioned with a pitch similar to the pitch of the output buffers, as shown in Figure 2. The gaps opened in the line are filled with a spacer¹ made of a conductor whose parasitic resistance and capacitance are small. These spacers can be built in the metal1 layer. They are included in the signal path; therefore, their contribution to the total line delay must be correctly evaluated.

¹ The distinction made between microstrip delay line and spacer reflects only a functional difference. In reality they are both microstrip lines, made of different materials but embedded in the same silicon oxide dielectric and having the IC substrate as reference plane. In consequence they are both modelled as devices with distributed parameters.


Other solutions based on non-uniform lines are difficult to implement because of the small dependency of the delay on the line width, and the limited number of interconnection layers available.

Figure 2: Delay line division into equally sized sections. [Diagram labels: in, tap, segment of equal delay and equal length, segment of equal delay, polysilicon microstrip layer, metal1 spacer layer.]

10.1.1. RC delay line simulation model.

The complex propagation delay expression shown in the previous section does not lend itself to easy analysis. Approximate delay estimation methods have been developed for applications in the design automation domain [9][10]. Unfortunately, they tend to reflect a particular network geometry and the accuracy of the delay estimations is generally limited. In order to obtain an accurate estimation of the interpolator’s time characteristics, a simulation model was developed that includes all the elements that influence them. These include the polysilicon microstrip delay line segments, the metal1 spacers, the connection lines, the inter-layer contacts and the devices that make up the driver, output buffers and capacitors.

The simplest and most accurate model of a uniform line (polysilicon or metal1) is obtained by dividing it into small segments. The number of segments should be large enough that each of them can be correctly modelled using a network of lumped elements. The overall behaviour of a complex line can be obtained by connecting the uniform line segments through the equivalent circuit of the discontinuities present in the network.

HSPICE [11] has internal models for transmission lines (U-model) which internally divide the line into multiple T-network sections such as the one in Figure 3. However, in our work we chose to explicitly use T-network sections as the basis of the model. It is thus possible to avoid any dependency on the particular implementation of the simulator. In a microstrip line with the characteristics of the one under study, inductance Ll and dielectric conductance Gl are very small and, therefore, are not considered. The reference plane is modelled as a single node. In reality this plane is the lightly doped IC substrate; however, its resistivity Rref can be minimised if some layout rules are followed. These will be explained later.

Inter-layer contacts are modelled as single resistors whose values are extracted from the technology parameters. In reality their resistivity depends on factors such as current flow, and a small capacitance to the reference plane is present. However, the total contact resistance can be made small by increasing its area, rendering its variation negligible. All other (lumped) circuit elements can be directly modelled using their equivalent circuit.

Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network). [Diagram: line element of length δx, with series elements 0.5.Rl.δx and 0.5.Ll.δx on each side of the shunt elements Cl.δx and Gl.δx, and 0.5.Rref.δx on each side of the reference plane.]

A detail of a section of polysilicon line together with the metal1 spacers and contacts is shown in Figure 4. The distributed electrical parameters are highlighted, for illustration purposes. The inter-layer contact is modelled as a resistor to which the capacitors corresponding to the ends of the connected layers are added, since they turn out to be significant for the line width being considered.

Figure 4: Detail of the physical microstrip line and its equivalent simulation model. [Diagram: cross-section showing metal1, contact, polysilicon, thick oxide and substrate, and the equivalent chain of T-networks with elements Rml, Cml(plate+fringe), Rpl, Cpl(plate+fringe), Rc, Ct and Rsub.]

A sample of the Spice model of a delay line with dimensions W (width) and L (length), divided into N small lumped elements, is shown in the next lines. It includes a single T-element plus the contact.



* One of the N lumped sections of a line of width W and length L
.subckt T-element in out ref (layer parameters, N)
r1 in 1 'Rsq_layer*L/(W*N*2)'
r2 1 out 'Rsq_layer*L/(W*N*2)'
c1 1 ref 'Cpl_layer*L*W/N+Cfr_layer*2*L/N'
.ends T-element

* Inter-layer contact, including the end capacitances of the two connected layers
.subckt Contact in out ref (layer parameters)
c1 in ref 'Cfr_metal1*W'
c2 out ref 'Cfr_polysilicon*W'
r1 in out 'Rcontact/(W/2)'
.ends Contact

Rsq, Cpl and Cfr are, respectively, the resistivity, the plate and the fringing capacitance of the respective layer. Rcontact is the resistance of the contact.

The parameter spread inherent to the fabrication process is included in the model through three different sets of technology parameters. Each set corresponds to a representative corner of the process distribution: the centre and the two tails. The effects of temperature and supply variation are correctly taken into account in the active device models. This is not the case for parasitic devices, such as the microstrip delay line. However, since this dependency is small, it can safely be disregarded.
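The element values of the T-element subcircuit can be mirrored outside the simulator to check the discretisation. The Python sketch below is only an illustration under stated assumptions: the function names are invented here, and the Elmore delay (the sum of each node capacitance multiplied by its upstream resistance) is used as a simple first-order figure of merit that is not mentioned in the original text.

```python
def t_section(r_sq, c_plate, c_fringe, width, length, n):
    """Element values of one T-section, as in the T-element subcircuit."""
    r_half = r_sq * length / (width * n * 2)
    c_node = c_plate * length * width / n + c_fringe * 2 * length / n
    return r_half, c_node


def elmore_delay(r_sq, c_plate, c_fringe, width, length, n):
    """Elmore delay at the open end of a chain of n identical T-sections."""
    r_half, c_node = t_section(r_sq, c_plate, c_fringe, width, length, n)
    delay = 0.0
    r_upstream = 0.0
    for _ in range(n):
        r_upstream += r_half          # first half resistor of the section
        delay += r_upstream * c_node  # capacitor at the centre node
        r_upstream += r_half          # second half resistor of the section
    return delay
```

For this symmetric T-network the Elmore delay equals RC/2 for any number of sections, so this metric alone does not show how many sections are needed. That choice has to be based on the simulated waveform itself.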

10.2. Tap selection delay line.

The design of an RC delay line conforming to the requirements of the built converter starts with the definition of the general dimensions of the line and of the number of access points needed. At this stage only the overall properties, such as the total delay, the total length and the width of the line, are important. Each line segment is made identical, for simplicity. After having defined these properties, individual line segments can be adjusted so that their delay becomes identical but the overall line delay does not change, resulting in the desired RC line characteristics.

In the following lines, the general design guidelines that were followed are described in more detail.

Definition of the line width:

The microstrip line should be made wide so that its distributed characteristics dominate the interpolator’s behaviour. If this were not the case, the temperature and supply sensitivity of the lumped loads connected to the line could undermine its behaviour. The line should also be wide enough to minimise the dimensional uncertainties due to IC processing.

In the tap selection adjustment scheme, the buffers are the only significant loads connected to the line. These are simple two-stage buffers made of static inverters. The input inverter transistors’ gate area defines the lumped loads attached to the RC line. Mismatch considerations lead to the utilisation of large gate areas for these transistors. First order calculations based on technological parameters result in a total gate capacitance of ~33fF.

Since the gate capacitance has only a weak dependency on temperature and supply voltage, it is enough that the distributed capacitance of each line segment has a larger value than the lumped capacitor. A line width of 52µm is sufficient to obtain these characteristics.

Definition of the number of access points:

The number of access points is determined by the adjustment scheme followed. For the tap selection scheme, the criterion is to define the maximum allowed time interval between access points that results in an acceptable linearity after line calibration.

Given an LSB of 48.8ps, a maximum non-linearity of ~15ps (less than 1/3 LSB) is accepted. Therefore the maximum delay between access points has been set to 30ps. Using the simple definition of time constant, τ=RC, as a rough approximation of the microstrip line delay, a time constant variation of ±30% is found as process parameters are changed, for the selected technology. Consequently, to obtain a worst case access interval of 30ps, the separation in typical conditions should not be bigger than ~21ps. Dividing the line into 32 segments (defining 33 access points) more than covers this requirement.

Definition of the line length:

A total delay of 350ps (~LSB·(M-1)) must be achieved regardless of the process corner. The same considerations as before show that in the fast corner, the time constant is ~30% smaller than in typical conditions. Consequently, if the line covers 350/0.7=500ps in typical conditions, then the initial condition is met in any operating conditions.
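The budget arithmetic of the last two design steps can be restated as a short script. This is only a sketch of the first-order reasoning given in the text. The variable names are chosen here, M=8 is taken from the interpolation factor used later in this chapter, and the ±30% spread is applied as a simple scaling factor, as the text does.

```python
LSB_PS = 48.8          # converter LSB, from the text
M = 8                  # interpolation factor (p = 1/M = 0.125 in the text)
SPREAD = 0.30          # time constant variation over process corners
N_SEGMENTS = 32        # segments between the 33 access points

# Span that the line must cover in every corner, ~LSB*(M-1).
exact_span_ps = LSB_PS * (M - 1)
required_span_ps = 350.0                      # rounded value used in the text

# Span needed in typical conditions so that the fast corner still covers it.
typical_span_ps = required_span_ps / (1.0 - SPREAD)

# Limit on the typical access interval for a 30 ps worst-case interval.
worst_case_interval_ps = 30.0
typical_interval_limit_ps = worst_case_interval_ps * (1.0 - SPREAD)

# Average segment delay in typical conditions for 32 identical segments.
typical_segment_ps = typical_span_ps / N_SEGMENTS

print(exact_span_ps, typical_span_ps, typical_interval_limit_ps, typical_segment_ps)
```

With these numbers the average typical segment delay stays below the typical interval limit, which is the sense in which 32 segments more than cover the requirement.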

The line length is determined from parametric simulations of the complete interpolator model, including all the devices connected to it. The output buffer pitch defines the length of the line segments between access points. During simulations the length of all the microstrip segments is simultaneously varied until the total required delay is obtained.

Assuming a buffer pitch of 31.2µm, the required overall line delay is obtained when each of the 32 segments includes a polysilicon microstrip line 7.4µm long and a metal1 spacer of 23.8µm. The resulting total distributed capacitance in each line segment is larger than the buffer input capacitance, as desired.



Adjustment of the delay of the line segments:

With identical segments all over the line, the delay between access points is smaller towards the end of the line. Adjusting these delays could be done following a trial and error procedure, but instead a simpler approach was used:

The previous step resulted in a constant microstrip length vs. segment curve, and in the corresponding non-linear delay vs. segment curve. If an analytical function that transforms the delay curve into a constant curve is found, the corresponding microstrip length curve can be obtained using the same transformation (see Figure 5).

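Figure 5: Delay line segments’ length adjustment. [Diagram: delay vs. segment and length vs. segment curves for segments 0 to m, before adjustment (delay described by f(segment)) and after applying f-1(segment); the adjusted line is drawn with its microstrip line sections, metal1 spacers and taps.]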

This transformation is valid if the microstrip line is uniform and the edge propagating along the line has constant characteristics. This is not the case for the line under study, since metal1 spacers interrupt the microstrip line and the output buffers load the line at discrete points. However, the uniform line approximation is accurate enough since the metal1 spacers, due to their low resistivity, have little effect on the delay characteristics of the line. The first design criterion also guarantees that the characteristics of the output buffers can be neglected in this analysis. The signal characteristics along the line are, to a large extent, invariant.

The original delay vs. tap curve can be accurately described by a high order polynomial. In this case a fifth order polynomial is accurate enough:

$$\mathrm{delay}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5,$$

where the polynomial constants are obtained from a least squares fit to the delay curve. The inverse (reciprocal) of this function converts the curve into a constant value. It is sufficient to multiply this result by the desired segment delay (delayave) to obtain the required transformation:

$$F(x) = \mathrm{delay}_{ave} \cdot \mathrm{delay}^{-1}(x) = \frac{\mathrm{delay}_{ave}}{a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5}.$$
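Once the polynomial coefficients are known, the multiplying factor of each segment follows directly from the transformation above. The Python sketch below assumes that the coefficients a0 to a5 have already been fitted elsewhere. The function names are chosen here, and no coefficient values are given in the text, so any values passed in are the user's own.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + a5*x**5 with Horner's rule.

    coeffs is the list [a0, a1, ..., a5] from the least squares fit.
    """
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def adjustment_factors(coeffs, delay_ave, n_segments):
    """Multiplying factor F(x) = delay_ave / delay(x) for each segment x."""
    return [delay_ave / poly_eval(coeffs, x) for x in range(n_segments)]
```

Each factor multiplies the original microstrip length of the corresponding segment. Segments whose original delay is shorter than the desired average receive a factor larger than one, which is what happens towards the end of the line.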

The multiplying factors obtained for each segment of the actually implemented line are shown in Figure 6. The factor that corresponds to the buffer pitch is also shown. The transformation results in three segments being larger than the buffer pitch. This leads to a longer delay line, which in turn affects the total line delay.

Figure 6: Adjustment function values. [Plot: adjustment function (vertical axis 0 to 8) vs. segment (0 to 32), with the factor corresponding to the buffer pitch shown for reference.]

The lengthening of the line after application of the transformation stems from the (limited) inaccuracy of the uniform line approximation used. In particular, the assumption that the characteristics of the signal propagating along the line do not change is not true. The rise time of this signal is longer towards the end of the line, as shown in Figure 7.

Figure 7: Signal’s rise time along the original and adjusted delay line, in typical conditions (simulated). [Plot: rise time (ns), 1.75 to 2.15, vs. segment (0 to 32), for the original and the adjusted line.]

However, the deviation caused by this assumption is small and it is effectively countered by designing the original line with a shorter delay range. Simulations of the adjusted line result in the graphs shown in Figure 8. A maximum segment delay non-linearity of 4ps is found under typical conditions. Only minor linearity degradation is observed as operating conditions are varied. These simulations confirm that the maximum segment delay is 31.3ps and that the line spans a minimum of 378ps, thus abiding by all design criteria.



Other considerations:

An RC line has a low-pass filter behaviour. The attenuation of the high frequency signal components as the signal progresses along the line contributes to the delay characteristics of the line, due to the degradation of the edge slope it provokes. This effect should be kept small, so that the uniform line approximation we have been considering is valid. Therefore, the line should be made such that the edge slope along the line segments used for time interpolation is constant or has only a small degradation, regardless of variations of the input signal due to temperature or supply variations.

Figure 8: Delay and cumulative delay of each line segment (from simulations). [Plots: delay (ps), 0 to 35, and cumulative delay (ps), 0 to 1000, vs. segment (0 to 32), for typical, fast and slow conditions.]

The inclusion of a leading adaptation section at the beginning of the line is a simple way of achieving this goal. This section is not directly used for time interpolation, but it adapts the signal bandwidth to the delay line’s characteristics. The signal delay due to the input adaptation section results in an added offset to the measurements; however, it does not influence the time interpolation function.

The signal’s velocity increase along the line is very marked in the last interpolation segments. The adjustment function would thus generate very large multiplication factors for these segments. The resulting long microstrip segments would make an inefficient interpolator layout. The use of a trailing adaptation section to behave as a load to the last segments of the line allows for a smaller spread of the segment delays and thus, shorter final segments are possible. The length of the adaptation sections is limited by the driving capability of the input driver. The use of these adaptation sections is illustrated in Figure 9.

Figure 9: The leading and trailing adaptation sections. [Diagram labels: leading section, taps, spacer, trailing section.]

The graphs in Figure 10 clearly show the effects of the inclusion of a leading and a trailing section of 79µm length. They result from simulations of the complete interpolator, including input driver and output buffers. The segment delay sensitivity to operating conditions is minimal if these sections are included, whereas it increases if they are excluded. The absence of a trailing section also generates very small segments towards the end of the line.

Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs correspond, respectively, to the same line with and without leading and trailing sections. [Plots: segment delay (vertical axis 6 to 22) vs. segment (0 to 32), at 5V/25C, 4.5V/100C and 5.5V/0C.]

All the graphs presented so far are obtained from simulations of the complete model of the RC delay line. This model assumes an ideal reference plane for the distributed line, which is only roughly approximated by the lightly doped p-substrate used. In order to reduce the reference plane resistance, a wide guard-ring structure connected to ground is implemented enclosing the RC line. This way the path of the charges displaced in the substrate as the signal progresses through the line is reduced and its effective resistance is small. The guard-ring also collects charges that are coupled to the substrate by other devices on the circuit, therefore obtaining a better isolation of the RC delay line.

10.2.1. Tap selection circuitry.

The selection of access points for the taps is performed after the output buffers, so that it doesn’t influence the delay line. To achieve maximum design flexibility, it was decided that all access points be accessible to all the taps. This results in a somewhat complex connectivity and in a long serial selection chain, as shown in Figure 11.

The selection of the actual access point is performed by the assertion of the respective programmable selection bit. This closes the adjoining transmission gate switch, establishing the intended connection. No hard-wired restriction exists to the parallel connection of a tap to more than one access point, which would result in a finer time interpolation. However, this option will not be used since it would require an unnecessarily complex calibration algorithm, leading to increased silicon consumption.



The programmable serial chain is quite long, having 256 bits. It should be noticed that the program word can be loaded at a low rate and that once the final calibration parameters are established, all activity in this circuitry is stopped, reducing the power dissipation and eliminating potential noise sources. The full selection circuitry shown in Figure 11 uses 0.85mm² of silicon. The area occupied by this block can be reduced by limiting the selectivity of the access points.

Figure 11: The access point selection circuitry. [Diagram: RC delay line access points 0 to 32 connected to taps 0 to 7 through the serial selection chain, with sel. in, sel. out and sel. strobe signals.]

10.3. Auto calibration circuitry.

The adjustable line requires some means of automatic calibration in order to be complete. The calibration procedure can be divided into two major steps. In a first step the delay line is characterised (characterisation step). These characteristics can then be used to compute the access points that tune the taps to the required position (tuning step).

Characterisation step.

The characterisation of each segment of the RC delay line could be done using the delay of one DLL cell as a reference. However, the delay of these cells suffers some variation due to mismatch and, therefore, they are not a good reference. Since the number of bins into which the reference period is divided is fixed, this knowledge can be used to derive the size of the ideal bin (LSB) and use it as a reference for calibration.

A statistical code density test (CDT) [2] offers a characterisation method with the required properties and, furthermore, is easily implemented on chip. The code density test applied to a time interpolator requires the collection of a large set of random hits. These hits are registered and the number of hits collected for each possible output code (or bin) is histogrammed. The difference between the bin contents is a direct measure of the relative size of each time bin.

The histogramming can be performed for all the individual time bins in the circuit (N·M bins) to obtain a detailed characterisation of the combination of the DLL, the RC delay line and the hit registers. However, only the RC delay line must be characterised. Therefore, the values corresponding to the same RC line bin can be summed across the DLL, effectively obtaining an average measure of the line across the DLL. An added advantage is that the effects of hit register mismatch are also averaged; therefore, an accurate characterisation of the size of the RC line bins is obtained.

The size difference between bins due to mismatch of the output buffers is indistinguishable from the difference due to mismatch of the actual line segments. It is, in fact, lumped together with it and so the line characteristics obtained reflect this increased error. However, this is advantageous, since in this way the buffer delays are also calibrated.
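The characterisation step can be mimicked in software to see how the summed histogram recovers the relative bin sizes. The Python sketch below is a simplified model, not the on-chip implementation. It assumes ideal, identical DLL cells, hits uniformly distributed over the reference period, and RC line bin edges expressed as fractions of one cell delay. All names are chosen here for illustration.

```python
import bisect
import random


def code_density_test(rc_edges, n_cells, n_hits, seed=0):
    """Estimate the relative size of the M RC line bins from random hits.

    rc_edges : M+1 increasing bin edges as fractions of a DLL cell delay,
               starting at 0.0 and ending at 1.0
    n_cells  : number of DLL cells (N)
    Returns a list of M fractions that sum to 1.
    """
    rng = random.Random(seed)
    m = len(rc_edges) - 1
    hist = [[0] * m for _ in range(n_cells)]      # N*M individual bins
    for _ in range(n_hits):
        t = rng.random() * n_cells                # hit time in cell delays
        cell = min(int(t), n_cells - 1)
        frac = t - cell
        rc_bin = min(bisect.bisect_right(rc_edges, frac) - 1, m - 1)
        hist[cell][rc_bin] += 1
    # Sum the bins that belong to the same RC line bin across the DLL.
    summed = [sum(hist[c][b] for c in range(n_cells)) for b in range(m)]
    return [count / n_hits for count in summed]
```

Multiplying each returned fraction by M·LSB gives the estimated bin size in time units, since the M bins together span one DLL cell delay.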

Tuning step.

In the tuning step the measured line characteristics are analysed. Non-linearity surpassing a given limit is identified and correction measures computed. These measures are then translated into a calibration word that is serially programmed into the adjustable delay line. Computation complexity, and therefore the amount of hardware needed, depends on the amount of information that can be extracted from the characterisation step.

A trade-off can be established between these two steps. A faster calibration requires a larger hardware block and a slower calibration can be performed with little hardware. Two calibration algorithms, representing the two extreme cases, will be presented. In the first one, an iterative procedure is established where a global line characterisation is used to make small adjustments to the line. The procedure is repeated until the line has been tuned to the desired linearity range. This algorithm requires a small calibration hardware block, but it may result in a long calibration time for extreme parameter deviations. In the second algorithm, a lengthy, but complete, characterisation of the line is performed. From this the calibration parameters are obtained in one step at the expense of significant hardware requirements.

10.3.1. Calibration algorithms.

The RC delay line adjustment allows only for a discrete number of adjustment options; therefore, the accuracy of the calibration results is limited by the adjustment quantising step. In the calibration algorithms that we developed for this purpose, the concept of tolerance, or of non-linearity limit, is used to express the maximum calibration tolerance allowed for a given application. In the case of the iterative algorithm, the calibration tolerance can be traded off for calibration time.



The algorithms presented here must be simple to implement in hardware; therefore, INL was chosen as the only accuracy criterion. Integral non-linearity error, due to its cumulative action, is the limiting factor in the overall linearity of the converter. Algorithms that also take into account DNL as an accuracy criterion are presented in Appendix G. Their hardware implementation is more complex, and their convergence slower.

Iterative algorithm.

The starting point of this algorithm is the bin size histogram, obtained after running the characterisation step with the calibration parameters extracted from simulations corresponding to the typical process and environment conditions. Each iteration of the algorithm consists of the sequential analysis of a bin to verify if it conforms to the non-linearity limits. If this is not the case, new calibration parameters, corresponding to the addition or subtraction of one delay segment to the respective tap, are calculated. The same variation is applied to all the taps in front of it so that the time difference between these taps (the bin size) is unchanged. These steps (characterisation, analysis and tuning) are repeated until the bin linearity conditions are met. The procedure is then repeated for the next bin in the sequence.

The analysis of the linearity of a bin is based on the bin cumulative histogram ch[bin]. It is compared to the ideal histogram (developed from the knowledge of the ideal converter’s bin size LSB). The following operations check if the line conforms to the integral linearity limit and take corrective measures for the offending bins.

for i = 0 to M-1
    tap[i] = segment_from_simulation_of_typical_conditions;
for bin = 0 to M-2
    repeat until no_changes
        Characterisation step;
        if (ch[bin] < LSB·(bin+1-limINL))
            for i = 0 to M-bin-2
                tap[bin+i+1] = tap[bin+i+1] + 1;
        else
            if (ch[bin] > LSB·(bin+1+limINL))
                for i = 0 to M-bin-2
                    tap[bin+i+1] = tap[bin+i+1] - 1;
            else
                no_changes

In Figure 12 the algorithm is illustrated. The acceptable limit of the integral non-linearity is limINL. This limit must be chosen in accordance with the calibration steps available. Limits in the order of 0.5LSB guarantee sufficient linearity and only require a limited number of iterations per tap. The access point selection for each tap is captured in tap[i].
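The iterative procedure can be exercised in software with a simple line model. The Python sketch below follows the operations listed above, but replaces the code density test with a noise-free lookup of the cumulative delay at each selected access point. The uniform 11ps segments, the starting taps and all names are hypothetical values chosen only to make the example run. They are not the parameters of the implemented line.

```python
def characterise(tap, cum_delay):
    """Noise-free stand-in for the characterisation step.

    Returns ch[bin], the delay in ps from tap 0 to tap bin+1.
    """
    return [cum_delay[tap[b + 1]] - cum_delay[tap[0]]
            for b in range(len(tap) - 1)]


def calibrate(tap, cum_delay, lsb, lim_inl, max_steps=100):
    """Iterative INL calibration; returns the adjusted access points."""
    tap = list(tap)
    m = len(tap)
    last = len(cum_delay) - 1
    for b in range(m - 1):                      # bin = 0 .. M-2
        for _ in range(max_steps):
            ch = characterise(tap, cum_delay)
            if ch[b] < lsb * (b + 1 - lim_inl):
                shift = 1
            elif ch[b] > lsb * (b + 1 + lim_inl):
                shift = -1
            else:
                break                           # bin within limits
            if tap[-1] + shift > last or tap[b + 1] + shift < 0:
                raise ValueError("adjustment range of the line exceeded")
            for i in range(b + 1, m):           # move this tap and all later ones
                tap[i] += shift
        else:
            raise RuntimeError("no convergence for bin %d" % b)
    return tap


# Hypothetical example: 33 access points, uniform 11 ps segments.
LSB = 48.8
line = [11.0 * k for k in range(33)]
start = [0, 3, 6, 9, 12, 16, 19, 22]            # assumed "typical" starting taps
result = calibrate(start, line, LSB, 0.3)
```

The guard on the number of steps reflects the remark made later in this section that convergence is not guaranteed when the linearity limit is tighter than the segment delay allows.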

Figure 12: Calibration procedure for the tap selection adjustment scheme. [Flowchart: tap[all] = typical conditions; for bin = 0..M-2, repeat until changes = 0: run the CDT and compare cumulative histogram[bin] with (bin+1-limINL).LSB and (bin+1+limINL).LSB; if it is below the lower limit, tap[bin+i+1] = tap[bin+i+1]+1 for i = 0..M-bin-2; if it is above the upper limit, tap[bin+i+1] = tap[bin+i+1]-1 for i = 0..M-bin-2.]

In Figure 13 the results of a simulated calibration run using the proposed algorithm with limINL=0.3LSB are shown. The interpolation non-linearity is kept within the established limits (0.3LSB). By construction, the algorithm doesn’t search for the optimal calibration parameters; it stops immediately after the non-linearity limits have been achieved. The calibration of the particular line conditions shown required only 10 and 8 characterisation steps, respectively for the “fast” and for the “slow” parameter conditions.

Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation). [Two plots, each with a vertical axis from -0.5 to 0.5, vs. binRC (0 to 6), for typical, fast and slow conditions.]



The definition of the calibration starting point as being the typical calibration parameters reflects the probability of starting the iteration close to the final result. In fact, any starting point could be used since it would only affect the speed of convergence of the algorithm.

If tighter linearity limits are enforced, it is possible to obtain better results. The graphs in Figure 14 were obtained with limINL=0.1LSB for the “fast” conditions. However, in worst case conditions (“slow”) this limit cannot be enforced, since the delay line segments are longer than that limit. If the linearity limit is set too tight, then the convergence of the simple algorithm proposed here may not be guaranteed. A simple way to solve this problem is not to allow the algorithm to oscillate between two calibration settings for any bin.

The DNL graphs obtained after calibration was performed are also shown. They emphasise the fact that, due to the regularity of the structure, the maximum DNL is smaller than what would theoretically be its limit (2·limINL).

Figure 14: Results of calibration using the optimum linearity limit (from simulation). [Two plots, each with a vertical axis from -0.5 to 0.5, vs. binRC (0 to 6), for typical, fast and slow conditions.]

Single step algorithm.

The first step of this algorithm is a detailed characterisation of the RC delay line,where the size of all the line segments are histogrammed. It is then possible to select thetap access points that lead to the best interpolation linearity.

To characterise the 32 line segments into which the delay line is divided using onlythe 8 taps available, 5 characterisation steps are needed. The small overlap between therange that each covers is required to guarantee that also the segments in the extremities ofeach range are covered. After these 5 characterisation steps, all information required tobuild a cumulative histogram of the segment size is available and it is sufficient tocompare this histogram with the ideal cumulative bin size curve to derive the desiredaccess points.

In the next few lines, an algorithm that finds the best possible calibration parameters for the line, regardless of the particular conditions, is schematically presented. The algorithm finds the tap access points that result in the nearest approximation to the ideal cumulative bin size curve.

tap[0] = 0;
for i = 1 to M-1
    for segment = 0 to 31
        if (ch[segment] < LSB·i & ch[segment+1] > LSB·i)
            if (LSB·i - ch[segment] < ch[segment+1] - LSB·i)
                tap[i] = segment;
            else
                tap[i] = segment+1;
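The selection rule above can be sketched in Python as follows. This is only an illustration: it picks, for each target i·LSB, the cumulative histogram entry nearest to it, which gives the same answer as the bracketing test whenever ch is monotonically increasing. The function name `select_taps` and the 12ps uniform segments are hypothetical.

```python
def select_taps(ch, lsb, m):
    """Pick, for each tap i, the index whose cumulative delay is nearest i*lsb.

    ch[k] is the cumulative segment delay histogram (same units as lsb).
    """
    taps = [0]
    for i in range(1, m):
        target = lsb * i
        nearest = min(range(len(ch)), key=lambda k: abs(ch[k] - target))
        taps.append(nearest)
    return taps


# Made-up line: 32 segments of 12 ps each, LSB = 48.8 ps, M = 8.
ch = [12.0 * (k + 1) for k in range(32)]
taps = select_taps(ch, 48.8, 8)
```
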

In Figure 15 the results of a simulated calibration of the delay line using this algorithm are shown. The emphasis on minimising the integral non-linearity of the line is clearly seen in the graphs. The differential non-linearity is nevertheless kept within the accepted limits for any conditions. The same simulation conditions as before were used.

Comparison with the results obtained using the iterative algorithm shows that, if the linearity limits enforced when using that algorithm are tight enough, then similar results are obtained, as would be expected.

[Plots: two graphs of non-linearity versus binRC (bins 0 to 6) for the typical, fast and slow conditions.]

Figure 15: Results of calibration for different conditions (from simulation).

10.3.2. Hardware implementation.

Two variables determine the silicon area required to implement these calibration algorithms: the amount of memory needed and the complexity of the calculations needed. These may be traded off for calibration time.

To determine the amount of memory needed, the number of hits n that must be collected is determined from the formula developed in Appendix D:

$$ n \ge \frac{z_{\alpha/2}^{2}}{\beta^{2}} \cdot \left( \frac{1}{p} - 1 \right). $$

We will consider the same tolerance (β=5% of the final bin) and confidence level (98%, corresponding to α=2%) for both cases, so that the number of required hits depends only on the bin size that is to be characterised. In the iterative procedure the bin to be characterised corresponds to the interpolator’s LSB, with a hit probability p=1/M=0.125. In the single step procedure all the line segments must be characterised, regardless of the particular working conditions. The minimum bin that must be accurately characterised is then ~10ps wide, corresponding to a hit probability p=10/(LSB·M)=0.0256. The following table summarises the relevant numbers obtained when these calculations are carried out. The tolerance for the INL measurements is obtained using the expressions that were also developed in Appendix D.

algorithm     confidence   tolerance    number    number of hits      tolerance
              level        (DNL)        of bins   count     n (2^n)   (INL)
iterative     98%          5% (LSB)     7         <16383    14        13% (LSB)
single step   98%          11% (seg.)   32        <16383    14        62% (seg.)

Table 1: Comparison of the two proposed algorithms.

In this table the tolerance is measured as a fraction of the quantity being measured: one LSB (~48.8ps) for the iterative algorithm and one minimum segment delay (~10ps) for the single step algorithm. The same reasoning used for the determination of the tolerance of the INL measurements leads to the conclusion that the addition of a number of line segments to obtain the calibrated bin results in similar final DNL and INL measurement tolerances, expressed in LSB, for both algorithms.
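The hit count quoted for the iterative algorithm can be checked numerically. The sketch below assumes that the Appendix D bound has the form n ≥ z²(α/2)·(1/p − 1)/β², which reproduces the figures in Table 1; the function name `required_hits` is illustrative and not taken from the thesis.

```python
import math
from statistics import NormalDist


def required_hits(p, beta, alpha):
    """Hits needed to measure a bin of hit probability p within tolerance beta.

    Assumed form of the bound: n >= z_{alpha/2}^2 / beta^2 * (1/p - 1).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    return z * z / (beta * beta) * (1.0 / p - 1.0)


# Iterative algorithm: p = 1/M = 0.125, beta = 5 %, alpha = 2 %.
n_iter = required_hits(p=1 / 8, beta=0.05, alpha=0.02)
counter_bits = math.ceil(math.log2(n_iter))       # accumulator width in bits
```

Under this assumption the iterative case needs roughly 15,000 hits, which fits in the 14 bit counter listed in the table.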

The register requirements for bin storage and histogram build-up for the two architectures are shown in Table 2. To each of these registers corresponds an accumulator of equal length.

algorithm     histogram               cumulative histogram    total
              number   size (bits)    number   size (bits)    (bits)
iterative     7        12             1        14             98
single step   32       11             1        14             366

Table 2: Register (accumulator) requirements for the two proposed algorithms.

The other comparison item is the complexity of the computing needed for each algorithm. The iterative algorithm, as shown in Figure 12, requires only a few comparators (see Table 3), one accumulator per bin, and a small amount of decision logic. The single step algorithm needs a larger arithmetic unit, capable of performing the more complex decisions required. The silicon area that it uses is therefore much bigger than in the case of the simple iterative algorithm.

algorithm number size (bits)

iterative 2 14

single step 4 14

Table 3: Comparator requirements for the two proposed algorithms.

The time used by each calibration algorithm is, to a large extent, determined by the hit collection time. The iterative algorithm does not have a fixed number of characterisation runs, so the calibration time will vary with the actual conditions found. However, if the number of iterations is f, then the time is proportional to f·2^14, whereas the single step algorithm takes a time proportional to 5·2^14, where the constant of proportionality is the collection time of a single hit. It is therefore clear that only if more than 5 characterisation steps are required (f>5) does the iterative algorithm take longer than the single step algorithm.

Chapter 11. Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.

In this chapter an RC delay line adjustment scheme using banks of selectable capacitors will be analysed in detail. We follow the same analysis method that was pursued for the tap selection adjustment scheme. We will concentrate only on the features that differ from the previous chapter, referring to it for the relevant common topics.

11.1. Lumped capacitor delay line.

In the lumped capacitor adjustable delay line scheme, the adjustment of the RC line is performed by lumped load variation. This load is an important contributor to the overall delay; therefore the uniform line approximation previously used is no longer valid. A slightly different set of design rules applies to this line:

Definition of the line width:

The width of the microstrip line is mainly defined by layout considerations. It should result in a good compromise between two conflicting requirements. The line should be kept wide enough to render dimensional uncertainties due to IC processing small¹ and to lower the contact resistivity. However, it should be made narrow so that its overall capacitance is small and that small selectable capacitors can be used to adjust its delay. The capacitance of the unit capacitor used is ~37.5fF, therefore a line width of 40µm results in an acceptable calibration sensitivity.

Definition of the number of access points:

The number of access points is predefined by the intended interpolation factor M. The number of required access points is M=8, corresponding to M-1=7 line segments. The RC line dimensions must match the dimensions of the output buffer and associated delay adjustment circuitry. Each of the 7 segments into which the line is divided includes a polysilicon microstrip line and a metal1 spacer.

¹ This condition is not strictly necessary since any delay mismatch due to these uncertainties can be corrected during calibration. However, to enable the utilisation of the calibration parameters derived for one channel in several channels, it is convenient to minimise the mismatch between delay lines.

As will be shown later, the last taps along the line have smaller adjustment sensitivity (see Figure 5), since their delay can only be adjusted by varying the capacitors in front of them. To extend the adjustment range of the last taps, an extra adjustment point is introduced after the last access point. For reasons of symmetry of the timing characteristics of the line, this adjustment point is treated as another access point. Therefore the number of access points implemented is M+1=9, the line being divided into 8 segments.

Definition of the line length:

The total line length is defined as for the previous scheme. A total delay of ~350ps, corresponding to 7 segments of 48.8ps, must be covered, regardless of operating conditions. A parametric simulation of the complete interpolator model was again used to obtain the correct overall delay. However, since similar segments are used, the delay of each of the line segments changes considerably along the line.

Given a pitch of the adjustment circuitry of 50µm and typical working conditions, the required overall line delay is obtained when each segment is made of a polysilicon microstrip 35µm long and a metal1 spacer of 15µm. The middle calibration parameters are used, resulting in a capacitance of ~150fF connected to each adjustment point.

Adjustment of the delay of the line segments:

The distributed line parameters are not dominant in this scheme; therefore the procedure previously used is not accurate. It results in a rough first approximation that should be improved by means of parametric simulations. These simulations include the lumped capacitors that make up the calibration scheme. The calibration is performed by addition, or subtraction, of ~37.5fF unit capacitors from a bank middle capacitance value of ~150fF.

The multiplication factors obtained from the transformation function previously developed, applied to this line, and the ones actually implemented are shown in Figure 1.

[Plot: adjustment function value versus segment (0 to 7); series: Calculated, Actual, Buffer Pitch.]

Figure 1: Adjustment function values (calculated and actually implemented).


The size of the RC line bins, after the adjustment has been performed, is shown in Figure 2. The effects of the parameter spread due to IC processing are clearly visible in the first graph. In the second graph only the environment conditions are changed; the calibration parameters are the same for all conditions. It demonstrates that only minor variation of the delay is caused by extreme environment conditions.

Other considerations:

The same considerations developed for the previous scheme lead to the inclusion of leading and trailing sections 210µm long. A longer leading section would reduce the effect of varying input signal characteristics (due to environment changes) on the delay line. However, the driver capabilities would be unnecessarily stretched by this increase in output load.

[Plots: bin size versus binRC (bins 0 to 6). First graph: typical, slow and fast corners. Second graph: 5V/25C, 4.5V/100C and 5.5V/0C.]

Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variation for the typical process.

11.1.1. Lumped capacitor selection circuitry.

The variable capacitors implemented in each of the 9 access points are made of a bank of 7 unit sized capacitors that can be selectively connected to the RC delay line. The selection of the number of bank capacitors that are connected to the line is binary encoded. It is therefore possible to select 8 discrete capacitance levels, resulting in a ±3 level selection range.

The capacitor bank is schematised in Figure 3. Each capacitor is made of a square 16µm² PMOS device working in accumulation mode. This mode of operation results in a more linear and faster capacitor, since the accumulation of charges under the gate guarantees their immediate availability. The temperature and supply voltage sensitivity of devices operating in accumulation mode is very low. Furthermore, the n-well in which they are built increases their isolation from substrate noise. In typical conditions, each of these capacitors has ~37.2fF of capacitance. Unit-sized capacitors are used instead of scaled single capacitors to guarantee good matching of their values.

[Schematic: capacitor groups 1x, 2x and 4x selected by cal<0>, cal<1> and cal<2>, placed between the line and the hit registers.]

Figure 3: The unit capacitor bank.
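The binary encoded selection can be illustrated with a short Python sketch. It assumes that the three control bits cal<0>, cal<1> and cal<2> switch groups of 1, 2 and 4 unit capacitors, as the figure labels suggest, and it uses the ~37.5fF unit value quoted earlier; the function name `bank_capacitance_ff` is hypothetical and the fixed buffer and diffusion capacitance is ignored.

```python
UNIT_CAP_FF = 37.5            # approximate unit capacitor value from the text
GROUP_WEIGHTS = (1, 2, 4)     # unit capacitors switched by cal<0>, cal<1>, cal<2>


def bank_capacitance_ff(cal):
    """Capacitance connected to the line for a 3 bit calibration code."""
    if not 0 <= cal <= 7:
        raise ValueError("calibration code must fit in 3 bits")
    units = sum(w for bit, w in enumerate(GROUP_WEIGHTS) if (cal >> bit) & 1)
    return units * UNIT_CAP_FF


levels = [bank_capacitance_ff(code) for code in range(8)]
```

With these assumptions code 4 gives 150fF, which matches the middle bank value used in the design.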

The selection of capacitors is made using an NMOS pass-transistor. This transistor is sized to have a high source-drain conductance. The conductance of a device is sensitive to temperature and supply variations: the quantity of thermally generated carriers and the saturation velocity of the carriers in the channel are a function of the device temperature. The electric field across the channel is a function of the gate voltage, itself proportional to the supply voltage. The conductance of the pass-transistor must be high enough to minimise the effects of these variations.

The pass-transistor is cut off during a part of the signal excursion. In fact, as the input signal rises and the voltage on the gate of the capacitor follows, the Vgs of the pass-transistor is reduced. When it is smaller than the threshold voltage Vth, the pass-transistor cuts its channel, therefore isolating the line from the adjustment capacitor. This, however, does not affect the timing characteristics of the line since it occurs well after the threshold voltage of the output buffer has been crossed. The signal edge progressing towards the following taps is not affected by the variations of the line characteristics occurring in the section of the line already crossed. In addition to the bank capacitance, a fixed capacitance due to the output buffer and to the diffusions of the pass transistors is also connected to the line.

[Schematic: R-C delay line access points, taps and serial selection chain.]

Figure 4: The lumped capacitor selection circuitry.



In Figure 4 the lumped capacitor selection circuitry is shown. Each capacitor bank is represented by a variable capacitor. The capacitor bank connected to tap0 is included only for layout symmetry purposes, since it does not affect the tap delay. This adjustment scheme gives a compact layout, the selection circuitry of each tap requiring only 6620µm² of silicon, resulting in a total area of 0.25mm².

11.2. Auto calibration circuitry.

The auto calibration procedure follows the same basic steps previously described. It starts by characterising the line and proceeds to tune the calibration parameters in order to make the integral and differential non-linearity of the line smaller than a pre-determined limit.

In this scheme the sensitivity of the delay between two taps (bin size) to a unit variation in a given capacitor bank is a complex function of the distance between the tap and the capacitor bank being changed and of the position of the capacitor bank within the line. The graph in Figure 5 summarises the tap delay sensitivity to a unit change in each capacitor bank. It is not practical to identify all combinations of bin size sensitivity in all environment conditions. The calibration procedure must therefore be able to tune the size of the bin without this knowledge. An iterative procedure that follows the two step characterisation/tuning scheme is proposed to obtain the correct calibration parameters.

[Plot: change in bin size versus binRC (bins 0 to 6) for a unit change in cap 1 to cap 8 and in all banks together.]

Figure 5: The effects of lumped capacitor unit variation in the bin size (from simulation).

The adjustment capacitor banks (cap 1-7) are located, respectively, in tap1-7 and an extra capacitor bank (cap 8) is included at the end of the line to enable a wider tuning range of the last tap. The graph in Figure 5 shows that the sensitivity of the bin size increases as the varying capacitor is closer to it and that the cumulative effect of a unit variation in all capacitor banks is quite independent of the bin under consideration.

The graph also shows that capacitor variations occurring before the bin under consideration do not change its size. The reason for this is that the properties of a signal propagating on an RC line are dominated by the characteristics of the section of the line that lies ahead of it. There is a small contribution from the line section behind it through signal attenuation and edge slope degradation. However, for the short line under consideration, these effects are small.

11.2.1. Calibration algorithm.

The starting point of the algorithm is the delay histogram obtained after running the characterisation step using the smallest capacitor selection in every bank. With these calibration settings the overall delay of the line and of the individual bins is guaranteed to be shorter than the required delay, regardless of the operating conditions.

The calibration sequence tries to tune the delay line to the linearity limits following two procedures sequentially. In the coarse tuning procedure, the overall line delay is increased until it is close to the desired delay. The following fine tuning procedure individually adjusts the delay of each tap to make them conform to the linearity requirements. Delay tuning using this sequence is preferred to the use of the fine tuning procedure alone because it results in faster convergence and, therefore, in better results.

Coarse tuning procedure.

In this procedure the capacitance of all the banks is simultaneously incremented by one unit capacitor, resulting in a uniform increase of the size of all bins. The procedure is repeated until the cumulative bin size is smaller than the ideal delay by less than a determined limit limcoarse, which is set to 1LSB. In the following lines the procedure is schematically described:

for bank = 1 to M
    cap[bank] = 0;
repeat until (ch[M-2] ≥ LSB·(M-1-limcoarse))
    for bank = 1 to M
        cap[bank] = cap[bank]+1;

The calibration parameters for each capacitor bank are described by cap[bank] and ch[M-1] is the cumulative bin size histogram. A block diagram of the algorithm is shown in Figure 6, where the characterisation step is represented by the Code Density Test it performs.

When coarse tuning has been completed, the size of each bin is similar for all bins in the line, to the extent of its matching characteristics. The delay error is therefore evenly divided among all the bins. The average differential non-linearity is then small and so the fine tuning procedure can mainly concentrate on adjusting the integral non-linearity.
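The coarse tuning loop can be sketched in Python as below. The measurement is passed in as a callable that stands for one characterisation run and returns the cumulative delay of the M-1 bins; `toy_line`, its 240ps base delay and its 2ps per unit capacitor gain are purely hypothetical stand-ins for the real line, and the `max_code` guard is an addition that is not part of the procedure described above.

```python
def coarse_tune(measure_total, m, lsb, lim_coarse=1.0, max_code=7):
    """Raise all banks together until the line is within lim_coarse LSB of ideal."""
    cap = [0] * m                                  # banks 1..M at the smallest setting
    target = lsb * (m - 1 - lim_coarse)
    while measure_total(cap) < target and max(cap) < max_code:
        cap = [c + 1 for c in cap]                 # one more unit capacitor per bank
    return cap


def toy_line(cap, base_ps=240.0, ps_per_unit=2.0):
    """Hypothetical linear model: total delay grows with the summed bank codes."""
    return base_ps + ps_per_unit * sum(cap)


cap = coarse_tune(toy_line, m=8, lsb=48.8)
```
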



[Flow chart: initial calibration, then CDT; if cumulative histogram[M-2] < (M-1-limcoarse)·LSB, cap[bank]=cap[bank]+1 for bank=1..M and changes=1; repeat until changes=0.]

Figure 6: The coarse calibration procedure.

Fine tuning procedure.

After coarse delay tuning, the fine tuning procedure can be used. Each bin is evaluated one by one and a new set of calibration parameters is iteratively determined to adjust the line delay. The fine tuning procedure builds on the results obtained with the coarse procedure. Each bin is sequentially evaluated to determine if it adheres to the linearity limits. If that is not the case, the capacitance of the respective capacitor bank is increased by one unit. This unit increase is repeated for all subsequent banks until a satisfactory result is obtained.

Changing the capacitance of a capacitor bank affects all the bins that are located before it in the line. However, since the coarse adjustment step guarantees that the line is shorter than the ideal line and that all the bins have similar size, this effect contributes to improving the linearity of the line.

The fine calibration algorithm is schematically presented in the next few lines. limINL is the differential and integral linearity limit.

for bin = 0 to M-2
    bank = bin+1;
    repeat until (no_changes | bank > M)
        if (ch[bin] < LSB·(bin+1-limINL))
            cap[bank] = cap[bank]+1;
            bank = bank+1;
        else
            no_changes

This algorithm approaches the final calibration solution by small increases in the size of the bin; therefore only the lower linearity limits need to be checked. The tap delay increase per fine characterisation step is not enough to surpass the upper linearity limits, in any conditions. In Figure 7 a diagram of the fine calibration algorithm is shown.
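A Python sketch of the fine tuning loop follows, under stated assumptions. The callable `measure_ch` stands for one characterisation run and returns the cumulative histogram ch[0..M-2]; `toy_ch` is a hypothetical line model in which each bin grows by 1ps for every unit capacitor located at or after its end tap, which mimics the observation that banks before a bin do not change its size. The bank list uses indices 1..M, with index 0 unused, and no upper limit on the bank code is enforced here.

```python
def fine_tune(measure_ch, cap, m, lsb, lim_inl):
    """Check only the lower linearity limit of each bin, adding capacitance as needed."""
    cap = list(cap)
    for b in range(m - 1):                         # bins 0..M-2
        bank = b + 1
        while bank <= m:
            ch = measure_ch(cap)                   # one characterisation run
            if ch[b] >= lsb * (b + 1 - lim_inl):
                break                              # bin within limit: no changes
            cap[bank] += 1                         # lengthen the bin by one unit
            bank += 1                              # next attempt uses the next bank
    return cap


def toy_ch(cap, base_ps=44.0, ps_per_unit=1.0):
    """Hypothetical model: bin i is lengthened by every bank j >= i+1."""
    m = len(cap) - 1
    ch, total = [], 0.0
    for i in range(m - 1):
        total += base_ps + ps_per_unit * sum(cap[i + 1:])
        ch.append(total)
    return ch


start = [0, 1, 1, 1, 1]                            # index 0 unused, M = 4 banks
tuned = fine_tune(toy_ch, start, m=4, lsb=48.8, lim_inl=0.1)
```

In this toy case only the last bin violates its lower limit, so a single unit is added to bank 3.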

[Flow chart: from coarse calibration; for bin=0..M-2, bank=bin+1; CDT; if cumulative histogram[bin] < (bin+1-limINL)·LSB, bank=bank+1 and changes=1; repeat until changes=0 | bank>M.]

Figure 7: The fine calibration procedure.

In the graphs of Figure 8, the results of the coarse calibration step are shown for different simulation conditions. Since calibration started from the shortest possible line configuration, all the bins are smaller than intended and the expected downward slope of the INL curve is found.

[Plots: two graphs of non-linearity versus binRC (bins 0 to 6) for the typical, fast and slow conditions.]

Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation).

Using restrictive limits in the fine calibration steps, an optimised calibration can be obtained. In Figure 9 the results of the fine calibration step are shown. The linearity limit limINL was set to 0.1LSB. In extreme conditions this limit proves to be too strict for the simple algorithm proposed. However, the linearity of the line after calibration is better than 0.2LSB, in any conditions.

[Plots: two graphs of non-linearity versus binRC (bins 0 to 6) for the typical, fast and slow conditions.]

Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation).

The number of calibration steps required for each of these conditions was 12, 15 and 7 steps, respectively for the “typical”, “fast” and “slow” simulation conditions. Since the calibration algorithm begins with the calibration settings resulting in the fastest possible RC delay line, the simulation conditions that lead to a slower starting point (“slow” conditions) require fewer calibration steps to converge to the final calibration settings.

11.2.2. Hardware implementation.

This calibration algorithm is quite similar to the iterative algorithm proposed for the tap selection implementation of the line. The hardware requirements are also similar, since the number of taps to tune and the number of hits that should be collected for line characterisation are the same. The following tables summarise the hardware requirements in terms of registers (and respective accumulators) and comparators. Requirements in terms of control logic are similar to the iterative calibration algorithm proposed for the tap selection adjustment scheme.

algorithm     histogram               cumulative histogram    total
              number   size (bits)    number   size (bits)    (bits)
iterative     7        12             1        14             98

Table 1: Register (accumulator) requirements for the present algorithm.

algorithm number size (bits)

iterative 1 14

Table 2: Comparator requirements for the present algorithm.

11.3. Comparing the two adjustment schemes.

A simple comparison of the two adjustment schemes proposed in this part of the thesis shows that it is possible to adjust the linearity of the RC delay line to the desired values. Although the calibration aims at obtaining a small integral non-linearity, the differential non-linearity that is achieved under any simulation conditions is also small. Simulations show that using the lumped capacitor scheme leads to better final results. However, these results are obtained at the expense of a longer calibration time.

Due to the independence of the calibration of each tap, the calibration principle of the tap selection scheme is simple. The limit for the linearity that can be achieved with the RC delay line is determined by the number of access points that are implemented.

In the lumped capacitor scheme, the calibration of each tap is not independent: a change in one bank affects several taps differently. The calibration algorithm takes into account all these effects, therefore its working principle is more complex and the calibration time is longer. However, due to the multiple combinations of effects that can be used, the final RC delay line linearity is potentially better.


In this chapter the results of tests performed on the TDC prototype are reported. The test procedure followed is very similar to the one detailed for the ADLL prototype in the previous part of this work, so it will not be described again. The performance of the two interpolation topologies will be shown separately. Their evaluation follows the same criteria: linearity, temperature sensitivity, power dissipation and timing resolution.

The calibration algorithms for the RC delay line were implemented in software, which has the advantage of allowing for high flexibility. For example, the calibration limits can be easily adjusted to the performance required. To generate the random hits, both an external pulse generator and an internal oscillator were used, without any noticeable difference. A set of 600,000 random hits is used to characterise the delay line. According to the calculations in Appendix D, this results in a 98% confidence level that the measured results are correct within a tolerance of 0.8% (DNL) and 2.2% (INL). It should be noted that when characterising the complete converter, a tolerance of 3.4% (DNL) and 19.4% (INL) is obtained for the same confidence level.
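The DNL tolerances quoted above can be reproduced with a few lines of Python, assuming that the Appendix D bound can be solved for the tolerance as β = z(α/2)·sqrt((1/p − 1)/n). This form is an assumption made here, as are the hit probabilities p=1/8 for one RC line bin and p=1/128 for one converter bin; the function name `dnl_tolerance` is illustrative. The INL tolerances depend on further Appendix D expressions and are not covered.

```python
import math
from statistics import NormalDist


def dnl_tolerance(p, n, alpha):
    """Relative bin size tolerance after n hits for a bin with hit probability p."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    return z * math.sqrt((1.0 / p - 1.0) / n)


hits = 600_000
tol_line = dnl_tolerance(p=1 / 8, n=hits, alpha=0.02)         # one RC line bin
tol_converter = dnl_tolerance(p=1 / 128, n=hits, alpha=0.02)  # one converter bin
```

Under these assumptions the results round to 0.8% and 3.4%, in agreement with the values stated in the text.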

12.1. Tap selection scheme.

The graphs in Figure 1 illustrate the results of the calibration of the delay line implementing the tap selection adjustment scheme, obtained using the iterative calibration algorithm. The graph labelled “before” represents the state of the line before calibration. In this situation the calibration parameters resulting from simulations of the typical conditions are used. The other graph (labelled “after”) is the final result of the calibration. Differential and integral non-linearity of the RC delay line better than ±0.2LSB is achieved.

It is noteworthy that the linearity of the line prior to calibration is close to the traditional 0.5LSB acceptance limit, which shows that the models used to describe the delay line are quite accurate.

[Plots: DNL and INL versus binRC, before and after calibration.]

Figure 1: Delay line calibration results: DNL and INL graphs.

The DNL graph is repeated in Figure 2, together with the maximum and minimum delay measured for each tap in every hit register column. The spread in the measured delay results from timing mismatch of the hit registers corresponding to the same tap. It shows a maximum timing error spread of ~0.55LSB (27ps). The delay of the last tap (tap 8) is defined by the difference between the propagation delay of the hit signal along the RC delay line and the propagation delay of the clock signal along one DLL delay cell. The variation of its delay includes, therefore, a contribution from the delay mismatch of the DLL delay cells, which cannot be distinguished from the other contributions. Therefore it is not shown in this graph.

[Plot: tap delay versus binRC (bins 0 to 7); series: min, max, ave.]

Figure 2: Spread of the RC line tap delay over the DLL cells.

The RC delay line was measured at several temperature conditions to verify its immunity to temperature changes. The results are shown in the graphs of Figure 3.

The circuit was heated up to the specified temperatures using a heat source that could be moved closer or further away from the circuit. The temperature was measured directly on the package using an electronic thermometer. Only after the selected temperature stabilised was the characterisation performed. A different chip was used in this test, therefore the linearity graphs have different shapes from the ones previously shown. However, it is clear that the calibration procedure used also resulted in good RC delay line linearity.


[Plots: two graphs of non-linearity versus binRC at 30C, 40C, 50C and 60C.]

Figure 3: Temperature dependency of the RC delay line.

The delay variation of the complete line is measured from the variation of the delay of the last tap. This method is valid since the last tap is defined at one extreme by the temperature independent delay of the DLL delay cell. Any variation of the delay of the line will be reflected in a symmetric variation of the delay of the last tap. A total variation of 17.3% of an LSB is observed for a temperature increase of 30°C, which means that the delay of each RC line tap increased on average by ~2.5%. This result can be extrapolated to the complete temperature range, resulting in a temperature sensitivity of only 0.83% per 10°C.
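The arithmetic behind these two figures can be written out as a short Python check. It assumes that the 17.3% shift of the last tap is shared equally by the 7 RC line bins, each nominally one LSB wide; this reading is inferred from the numbers rather than stated explicitly, and the variable names are illustrative.

```python
total_shift_lsb = 0.173      # measured shift of the last tap, in LSB
rc_bins = 7                  # assumed number of RC line bins sharing the shift
delta_t_celsius = 30.0       # temperature increase used in the test

per_tap = total_shift_lsb / rc_bins               # about 2.5 % per tap
per_10_celsius = per_tap * 10.0 / delta_t_celsius # about 0.8 % per 10 degrees
```

Without intermediate rounding the second value comes out near 0.82%; the 0.83% in the text follows from rounding the per tap figure to 2.5% first.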

Voltage supply sensitivity was also investigated. The procedure used was to characterise the delay line at different supply levels, within the allowed range for the technology. No significant delay variation was observed.

12.1.1. The complete interpolator.

The RC delay line is an integral part of the time interpolator. Their correct integration is proven by the linearity graphs of the time-to-digital converter built from it. The graphs of Figure 4 correspond to the DNL and INL of the converter.

[Plots: DNL and INL of the converter versus bin (0 to 128).]

Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line).

A maximum integral non-linearity INLmax=1.12LSB and differential non-linearity DNLmax=0.72LSB were measured. The non-linearity is a result of the delay mismatch of the DLL cells. Consequently, as shown in the graphs, it is found in taps corresponding to the transitions between successive DLL delay cells. The measured DLL delay cell mismatch is 3-4% (RMS), slightly larger than expected. It was shown in Chapter 6 that the contribution of the DLL cell mismatch $\sigma_{DNL}^{DLL}$ to the converter non-linearity $\sigma_{INL}^{convert.}$ is determined by the following expression:

$$ \sigma_{INL}^{convert.} = \sigma_{DNL}^{DLL} \cdot M \cdot \frac{\sqrt{N}}{2}. $$

Therefore, disregarding the contribution of the RC delay line, a maximum converter non-linearity of 0.5LSB (50%) requires a DLL cell mismatch smaller than 3.1%. Since this matching level has not been obtained, the integral non-linearity of the interpolator is larger than the goal of ±0.5LSB.
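The 3.1% requirement can be reproduced with a short Python sketch. It assumes that the Chapter 6 expression has the form σINL = σDNL·M·√N/2 and that the DLL has N=16 delay cells; the cell count is inferred here from the 128 bin converter plots with M=8 and is not stated in this section. The function names are illustrative.

```python
import math


def converter_inl_sigma(sigma_dnl_dll, m, n):
    """Converter INL (in LSB) caused by a relative DLL cell mismatch.

    Assumed form: sigma_inl = sigma_dnl_dll * M * sqrt(N) / 2.
    """
    return sigma_dnl_dll * m * math.sqrt(n) / 2.0


def max_cell_mismatch(inl_limit_lsb, m, n):
    """Largest DLL cell mismatch that keeps the converter INL within the limit."""
    return inl_limit_lsb / (m * math.sqrt(n) / 2.0)


limit = max_cell_mismatch(0.5, m=8, n=16)                # 0.03125, i.e. ~3.1 %
inl_at_measured = converter_inl_sigma(0.035, m=8, n=16)  # 3.5 % mismatch case
```

With the measured 3-4% mismatch the same expression predicts an INL contribution above 0.5LSB, consistent with the conclusion drawn in the text.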

[Plot: INL versus binDLL (bins 0 to 15); series: max, min, ave.]

Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows.

The integral non-linearity graph of the DLL is shown in Figure 5. The spread of the DLL tap delay along the eight hit register rows is also shown. The delay difference between these eight samples of the DLL is due to the mismatch of the hit registers, which leads to different sampling times for each tap in different rows. The maximum spread that was observed is 0.06LSB(DLL) (~25ps), which corresponds to 0.51LSB. This result agrees with what was previously obtained from the different samples of the RC delay line (see Figure 2).

In Figure 6 the integral non-linearity graph of the interpolator is superimposed on that of the DLL. The interpolator closely follows the DLL non-linearity, as would be expected since the non-linearity of the RC delay line can only accumulate along its limited length.


[Plot: INL of the converter versus bin (0 to 128) superimposed on the INL of the DLL versus binDLL (0 to 16); series: converter, DLL.]

Figure 6: Comparison of the INL graphs of the DLL and of the complete converter.

A statistical test such as the code density test just described is, by its nature, insensitive to random effects. This is an advantage when static characteristics are being measured. However, it is important to verify that none of the random noise mechanisms, such as electrical noise or phase noise (jitter), degrades significantly the dynamic characteristics of the converter. The effects of clock correlated noise can be interpreted as a static degradation mechanism, since they interact with the measurement the same way every reference period. They are, therefore, captured by the statistical tests.

[Histogram: number of samples versus conversion error (LSB), from -2 to 2.]

Figure 7: Conversion error (σ=0.51LSB).

A linear time sweep covering the complete clock period was performed. During this test 26,000 samples were collected, corresponding to 10 samples per step of ~2.4ps. The histogram of Figure 7 represents the conversion error along the full dynamic range of the interpolator. The distribution of the error has an RMS of σ=0.51LSB, with tails extending to ~1.5LSB. The same test was performed at different temperatures, to prove that the conversion error is not affected by temperature variations. The resulting histograms, displayed in Figure 8, show that only minimal temperature sensitivity is found. Temperature sensitivity of the RMS error is ~1.3% per 10°C.

[Histograms: conversion error in LSB, horizontal axis from -2 to 2, one curve at 30°C and one at 60°C.]

Figure 8: Temperature effects on the conversion error (σ=0.50LSB at 30°C and σ=0.52LSB at 60°C).
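The sweep size and the temperature sensitivity quoted for this test can be cross-checked with a few lines of arithmetic. The sketch below assumes that the sweep spans one reference period of 6.25ns (128 bins of 48.8ps, see Table 1) and takes the two RMS values from the caption of Figure 8; the variable names are illustrative only.

```python
# Cross-check of the sweep size and of the temperature sensitivity.
# Assumption: the sweep spans one reference period of 6.25 ns (128 bins of 48.8 ps).
REF_PERIOD_PS = 6250.0
STEP_PS = 2.4
SAMPLES_PER_STEP = 10

steps = REF_PERIOD_PS / STEP_PS
total_samples = steps * SAMPLES_PER_STEP

# RMS conversion error at the two test temperatures (caption of Figure 8).
sigma_30c = 0.50  # LSB
sigma_60c = 0.52  # LSB
relative_change = (sigma_60c - sigma_30c) / sigma_30c
sensitivity_per_10c = relative_change / ((60 - 30) / 10)

print(f"steps per sweep     : {steps:.0f}")
print(f"samples per sweep   : {total_samples:.0f}")
print(f"sensitivity per 10 C: {100 * sensitivity_per_10c:.1f} %")
```

Under these assumptions the sweep holds roughly 26,000 samples and the RMS error grows by about 1.3% per 10°C, in line with the figures given in the text.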

It may be interesting to evaluate the dynamic performance of the DLL itself, to understand its contribution to the overall conversion error. In Figure 9 the characteristic step-wise transfer function that results from a DLL time sweep is shown. Phase noise (jitter), whether present on the reference clock itself or due to the dynamics of the DLL, forces the output code transitions to jitter around their average value. If a number of DLL samples are taken close to the expected transition time, the output will vary between the two codes due to jitter. Variations due to the test set-up, such as small changes of the sampling time itself, are also included in this result, since they are indistinguishable from variations due to intrinsic jitter.

This test enables the measurement of the DLL’s internal jitter. The time interval in which the output code uncertainty occurs corresponds to the peak-peak jitter seen on that transition. The maximum uncertainty is expected in the last transition. This can be verified in the graphs of Figure 10, which show the two code transitions occurring at the opposite extremes of the delay chain¹.

[Plot: DLL output code (0 to 16) versus sweep step.]

Figure 9: DLL linear time sweep.

¹ Tap 0 was implemented at the end of the delay chain; it is therefore the tap with the worst jitter. For convenience, bin 15 is renamed bin –1.



The second graph in that figure is a magnification of the transition from code –1 to 0, representing the jitter at tap 0. The “trend” line in that graph represents the relative number of samples in the two consecutive codes. From this curve, the average transition instant can be extracted, and so the deviation of the transition occurrence (the jitter) is readily obtained.
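One way of extracting the average transition instant and its spread from such a curve is sketched below. It treats the fraction of samples that returned the upper code as a cumulative distribution of the transition instant. This is only an illustration: the function name and the sample data are hypothetical, and the procedure actually used for the measurements may differ.

```python
import math

def transition_stats(delays_ps, upper_fraction):
    """Mean and RMS spread of a code transition.

    delays_ps      : swept delays, in increasing order
    upper_fraction : fraction of samples that returned the upper code at each
                     delay (0 well before the transition, 1 well after it)
    """
    # The increase of the fraction between two consecutive steps is the
    # probability that the transition instant falls inside that step.
    weights, centres = [], []
    for i in range(1, len(delays_ps)):
        weights.append(upper_fraction[i] - upper_fraction[i - 1])
        centres.append(0.5 * (delays_ps[i] + delays_ps[i - 1]))
    total = sum(weights)
    mean = sum(w * c for w, c in zip(weights, centres)) / total
    var = sum(w * (c - mean) ** 2 for w, c in zip(weights, centres)) / total
    return mean, math.sqrt(var)

# Synthetic example (not measured data): 2.4 ps steps, transition spread over four steps.
delays = [i * 2.4 for i in range(7)]
fraction = [0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 1.0]
mean_ps, sigma_ps = transition_stats(delays, fraction)
print(f"average transition instant: {mean_ps:.2f} ps, RMS jitter: {sigma_ps:.2f} ps")
```

The sketch ignores the finite width of the sweep step and assumes a monotonic curve; with noisy data the fractions would first have to be smoothed, which is presumably the role of the “trend” line.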

The peak-peak jitter for these two transitions was measured to be, respectively, 14.4ps and 19.2ps. To perform these measurements, 100 samples were taken for each time step of 2.4ps (equivalent to 4 “trombone” steps). The maximum jitter is measured to be σjitter […] ([…]LSBDLL).

The jitter that is observed in the first cell (σref […]) corresponds to the jitter of the reference clock as it arrives at the delay chain. At the end of the chain the dynamics of the DLL increase the uncertainty of the transition time. Assuming (optimistically) that these two sources of jitter are uncorrelated, the jitter generated by the activity of the DLL closed loop is σloop […].
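If the two jitter sources are uncorrelated, their variances add, so the contribution of the closed loop follows from a subtraction in quadrature. The sketch below illustrates this with the peak-peak figures quoted above (14.4ps at the start and 19.2ps at the end of the chain). Applying the rule to peak-peak values assumes that both scale with their RMS values by the same factor, so the result is indicative only and is not a value taken from the measurements.

```python
import math

def loop_jitter(sigma_end, sigma_first):
    """Jitter added along the chain, assuming uncorrelated contributions."""
    return math.sqrt(sigma_end ** 2 - sigma_first ** 2)

# Illustration with the peak-peak figures quoted in the text.
pp_loop_ps = loop_jitter(19.2, 14.4)
print(f"loop contribution: {pp_loop_ps:.1f} ps peak-peak")
```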

[Plots: DLL output code versus sweep step around the two code transitions; the second panel magnifies the transition from code –1 to 0 and shows the measured data together with the “trend” line.]

Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain.

The DLL conversion error histogram in Figure 11 is obtained from the same set of data as the one in Figure 7. It shows that the conversion error of the DLL considered independently has an RMS of σDLL=0.29LSBDLL, with very small tails.

[Histogram: DLL conversion error in LSBDLL, horizontal axis from -1 to 1.]

Figure 11: DLL conversion error (σ=0.29LSBDLL).

The conversion error stems from several contributions, which add up to the total error. The main contributors to the conversion error are the quantising mechanism ($\sigma_{quant} = \frac{1}{\sqrt{12}}\,\mathrm{LSB_{DLL}}$), the integral non-linearity (measured to be σINL=0.05LSBDLL) and the reference clock jitter (σjitter=0.01LSBDLL).

$$\sigma_{DLL} = \sqrt{\sigma_{quant}^{2} + \sigma_{INL}^{2} + \sigma_{jitter}^{2}} = \sqrt{12^{-1} + 0.05^{2} + 0.01^{2}} = 0.29\,\mathrm{LSB_{DLL}}.$$

The measured RMS error (σDLL=0.29LSBDLL) is in accordance with the expected value, demonstrating that no major error source was left unaccounted for.
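The error budget can be verified numerically. The short sketch below adds the three contributions in quadrature, using the values quoted in the text (all in LSBDLL); the variable names are illustrative.

```python
import math

sigma_quant = 1 / math.sqrt(12)  # ideal quantisation error, in LSB_DLL
sigma_inl = 0.05                 # integral non-linearity, in LSB_DLL
sigma_jitter = 0.01              # reference clock jitter, in LSB_DLL

# Uncorrelated contributions add in quadrature.
sigma_dll = math.sqrt(sigma_quant ** 2 + sigma_inl ** 2 + sigma_jitter ** 2)
print(f"expected sigma_DLL = {sigma_dll:.2f} LSB_DLL")
```

The quantisation term dominates: the non-linearity and the jitter together change the result only in the third decimal place.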

12.2. Lumped capacitor scheme.

The tests previously described were also applied to the channel using the RC delay line that implements the lumped capacitor adjustment scheme. Only the relevant results that highlight the differences between the two adjustment schemes will be discussed.

[Two plots versus binRC (1 to 8), vertical axes from -0.5 to 0.5.]

Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme).

The graphs in Figure 12 represent the results of a calibration run of the RC delay line that uses the lumped capacitor adjustment scheme. The linearity obtained is well within the limits set forth for calibration. The DNL graph shows that the predicted matching characteristics of the line were not obtained. This also leads to a worse INL than was predicted in simulations.

The linearity graphs corresponding to measurements performed on the full converter are shown in Figure 13 and Figure 14. The integral non-linearity of the converter closely follows the appropriately scaled DLL non-linearity. This shows that the converter’s characteristics are limited by the DLL, as was also seen in the previous scheme.



[Two plots versus bin (axis ticks from 1 to 113).]

Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line).

The differential non-linearity and the integral non-linearity are measured to be 0.70LSB and 1.03LSB, respectively. The main non-linearity errors are, again, found in the taps corresponding to DLL delay cell transitions. Comparison of the INL graphs of the DLL in Figure 6 and Figure 14 reveals the limitation of the DLL topology used. The different behaviour of the DLL when used together with each channel is most likely due to clock-related noise coupling into the converter. The small non-linearity of the DLL is an important fraction of the bin, at the level of the time interpolation implemented in this converter.

[Plot: INL of the complete converter (horizontal axis: bin, 1 to 129) superimposed on the INL of the DLL (horizontal axis: binDLL, 1 to 17).]

Figure 14: Comparison of the INL graphs of the DLL and of the complete converter.

The conversion error was measured using the same set-up as before. These measurements resulted in the histogram of Figure 15. An RMS error of 0.44LSB (~21.5ps) is obtained, and the maximum observed error is smaller than 1.5LSB. The resolution measured with this adjustment scheme is slightly better than with the previous scheme. The improvement is a consequence of the better DLL linearity obtained.

[Histogram: conversion error in LSB, horizontal axis from -2 to 2.]

Figure 15: Conversion error (σ=0.44LSB).

The DLL behaviour is expressed in the histogram of Figure 16. It confirms the correct dynamic behaviour of the DLL.

[Histogram: DLL conversion error in LSBDLL, horizontal axis from -1 to 1.]

Figure 16: DLL conversion error (σ=0.29LSBDLL).

12.3. Conversion time offset.

In the measurements presented so far, the conversion time offset has not been investigated. Conversion time offset is a characteristic that cannot be considered independently from the extrinsic offsets generated by the acquisition circuitry before the converter. Offset variations internal to the converter, due to temperature changes or any other origin, can be measured and circuit techniques may be applied to reduce them. However, more important offset variations will be present, namely in the sensor, discrimination, signal shaping and driving circuitry. It is, therefore, important to characterise the whole acquisition chain together when changes in the environmental conditions are observed and absolute time measurements are to be acquired. This test is outside the scope of these studies.

The maximal temperature dependency of the internal offset was measured using the tap selection scheme. It is ~124ps per 10°C. This dependency is a consequence of the variation of the delay of the long internal path of the hit signal. In this circuit, no effort was made to compensate this variation with a similar variation in the reference clock path.



12.4. Power dissipation.

One important design goal of this circuit is to achieve reduced power dissipation per channel. An overall power dissipation of 0.22W was measured, which compares favourably with the results obtained with the ADLL architecture.

12.5. Summary of results.

The main characteristics measured during the tests are summarised in Table 1. Some important properties of a TDC, such as multi-hit capability, were not characterised, since no effort was made to optimise the prototype for them. Crosstalk between channels was also not investigated, because the two channels are very dissimilar and cannot be used simultaneously. However, it is possible to extrapolate the results obtained with the ADLL based prototype to be confident that multi-hit capability is easily obtained and that small crosstalk is possible.

LSB: 48.8 ps

adjustment scheme            tap selection        lumped capacitor

R-C line
  area                       0.85 mm²             0.25 mm²
  DNL / INL                  0.15 LSB / 7 ps      0.21 LSB / 10 ps
  (row order as extracted)   0.21 LSB / 10 ps     0.30 LSB / 15 ps

converter
  INL (max)                  1.12 LSB / 55 ps     1.04 LSB / 51 ps
  INL (σ)                    0.44 LSB / 21 ps     0.39 LSB / 19 ps
  DNL (max)                  0.72 LSB / 35 ps     0.68 LSB / 33 ps
  DNL (σ)                    0.18 LSB / 9 ps      0.22 LSB / 11 ps
  RMS res. (σ)               0.51 LSB / 25 ps     0.44 LSB / 21 ps

common characteristics
  temperature sensitivity    1.3% / 10°C
  power dissipation          0.22W
  number of channels         2
  technology                 0.7µm CMOS
  area                       10.7 mm²
  package                    68 pin JLCC

Table 1: Characteristics of the TDC prototype.

12.6. Conclusions.

The experimental results obtained with this prototype demonstrate that it is possible to build a low cost high-resolution integrated TDC using the proposed architecture. The converter has lower power dissipation than was measured with the ADLL-based prototype previously described.

It was shown that the performance of the converter is mainly limited by the DLL characteristics, essentially its non-linearity. The DLL used in this circuit was built using the same circuit blocks as in the ADLL TDC, which use single-ended signalling levels. A more linear DLL can, most likely, be obtained if a less noise-sensitive, differential topology is used.

The two adjustable RC delay line schemes proved to work according to the design goals, both in terms of calibration and of temperature sensitivity. Furthermore, the model developed to study the delay line proved to be accurate, even with the limited technological information available.

References for Part III.

[1] Gogaert, S. et al., A 10ps resolution 1.6ns tuning range CMOS delay line for clock deskewing in data recovery systems, Proceedings of the ESSCIRC’95, pp. 54-56, Sep. 95.

[2] Doernberg, J. et al., Full speed testing of A/D Converters, IEEE Journal of Solid-State Circuits, Vol. 19, no. 6, pp. 820-827, Dec. 84.

[3] Bossche, M. V. et al., Dynamic testing and diagnostics of A/D converters, IEEE Transactions on Circuits and Systems, Vol. 33, no. 8, pp. 775-785, Aug. 86.

[4] Tsividis, Y., Mixed Analog-Digital VLSI devices and technology – an introduction, McGraw-Hill 1996, Chapter 5.

[5] Elmore, W., The transient response of damped linear networks with particular regard to wideband amplifiers, Journal of Applied Physics, Vol. 19, pp. 55-63, Jan. 48.

[6] Dvorak, V., On the transient analysis of distributed RC networks, International Journal of Electronics, Vol. 33, no. 4, pp. 385-391, 1972.

[7] Antinone, R. et al., The modeling of resistive interconnectors for integrated circuits, IEEE Journal of Solid-State Circuits, Vol. 18, no. 2, pp. 200-203, Apr. 83.

[8] Sakurai, T., Approximation of wiring delay in MOSFET LSI, IEEE Journal of Solid-State Circuits, Vol. 18, no. 4, pp. 418-426, Aug. 83.

[9] Rubinstein, J. et al., Signal delay in RC tree networks, IEEE Transactions on Computer-Aided Design, Vol. 2, no. 3, Jul. 83.

[10] Lee, M., A multilevel parasitic interconnect capacitance modeling and extraction for reliable VLSI on-chip clock delay evaluation, IEEE Journal of Solid-State Circuits, Vol. 33, no. 4, pp. 657-661, Apr. 98.

[11] The HSPICE user’s manual, Meta-Software 1996.

PART IV.

CONCLUSION.


Chapter 13: Summary of Results.

In this work we studied the problem of building integrated Time-to-Digital Converters featuring very high resolutions. Our main goal was to demonstrate the ability to perform these time measurements in a single, low cost, monolithic circuit produced in standard commercial CMOS technologies. Stand-alone operation was envisaged, therefore the selected architectures are able to perform self-calibration. The possibility of including digital signal processing functionality in the same circuit was also pursued.

Several architectures were analysed, of which one was selected for a more detailed analysis that led to the construction of a demonstrator IC. Furthermore, a novel high-resolution time interpolation architecture was proposed and the analysis carried out confirmed a good time resolution and low power operation.

13.1. The ADLL architecture.

The study of a time interpolation technique using an array of phase shifted DLL’s was pursued. In this study, we analysed:

• The origins of non-linearity in a DLL based converter. We have shown the effects of delay cell mismatch and how it accumulates along the delay chain. We have also highlighted the diverse causes of phase errors intrinsic to a DLL and the effect of these errors along the delay chain. An additional source of non-linearity was revealed, which has similar effects on the converter non-linearity as a phase error. The source of this delay error was shown to be the different propagation delays of the sampling signal towards the individual registers.

• The origins of phase noise in a DLL. We have analysed the effects of phase noise due to the “bang-bang” operation of the closed control loop.

The results of the analysis carried out for the single DLL case were extended to the case of an array of phase shifted DLL’s (the ADLL).

• The non-linearity of an ADLL based converter. We have developed an analytical model of the ADLL that makes it possible to establish the effects of independent delay error sources on the overall converter non-linearity. The presence of the phase shifting DLL is also accounted for in the model. We have highlighted the most important modes of delay error accumulation, in particular showing that there is an intrinsic periodicity in the non-linearity curves (periodicity F and also F+1, where F is the interpolation factor) due to the interpolation scheme.

• The optimal interpolation factor F. Based on the expected conversion integral non-linearity due to delay cell mismatch, we have established a relation between the mismatch level and the resolution of the converter. It shows that, depending on the actual mismatching characteristics of the delay cells, the maximum interpolation factor F that corresponds to a consequent increase in resolution is limited to F=4 or 5.

Using the analysis tools developed, we were able to translate the performance goals into circuit requirements. We then proposed simple ways of constraining the individual blocks to the requirements by optimisation of the critical performance parameters.

• Minimisation of the phase error. We have proposed solutions to reduce all the phase error sources, including an alternative topology for the distribution of the sampling signal to the individual registers. The need to use distributed parameter techniques when studying signal distribution in the time critical circuitry is highlighted.

• Minimisation of the delay cell mismatch. A method for reducing the delay mismatch of a current-starved delay cell regardless of the operating conditions was proposed.

• Noise sensitivity minimisation. The noise sensitivity of the scheme was analysed and minimised using simple circuit layout rules.

A multi-channel high-resolution TDC based on these studies was built in a standard 0.7µm CMOS technology. It demonstrated the correctness of the conclusions of the analysis. In particular an RMS resolution of 34.5ps was obtained throughout the full 3.2µs dynamic range. This performance, which has been confirmed in several applications, is obtained in an IC that also includes processing and buffering logic.

13.2. The DLL & RC delay line architecture.

We have proposed a new interpolation technique for Time-to-Digital Converters. The possibility of designing adjustable RC delay lines in a “digital” technology was demonstrated and we have also shown how a self-calibrating scheme can be implemented in the same circuit. In this study, we have analysed:

• Adjustment methods for RC delay lines. We have proposed two discrete adjustable RC delay line schemes.

• The characteristics of RC delay lines. We have proposed a methodology to partition such a delay line so that it complies with the timing and layout requirements of a particular design. We give guidelines for the design of circuits that interface with the delay line without increasing its sensitivity to variations of the environmental conditions.

• Calibration procedure. A Code Density Test based calibration scheme was proposed. This simple scheme can be implemented in hardware and integrated in the converter IC. It requires a pulse generator uncorrelated with the reference clock and a calibration logic block.

• Calibration algorithms. We have proposed several calibration algorithms. Their advantages and disadvantages were discussed.

Based on these studies, and on the DLL building blocks developed for the ADLL based TDC, we have built a TDC prototype. Two different channels, each implementing one of the adjustable RC delay lines proposed, were included. Dividing these delay lines in M=8 segments, an interpolation factor F=8 is obtained. The technology used is also a 0.7µm CMOS technology. Using the calibration algorithms that we have proposed, we were able to calibrate the two delay lines, obtaining an INLmax better than 0.21LSB in each of them. The RMS resolution of the converter was measured to be as low as 21ps. We have also shown that the performance of the converter is very insensitive to variations of the environmental conditions. Furthermore, the use of passive RC delay lines to perform time interpolation results in low power operation, as was demonstrated with the prototype.

13.3. TDC characterisation.

We have developed a consistent methodology to characterise the timing performance of a T/D converter. With this methodology we were able to evaluate the static and the dynamic characteristics of the converter.

• Define a consistent set of performance metrics. These metrics, adapted from the ADC world, are well matched to the TDC environment.

• Build a comprehensive test set-up. We have developed an automated test set-up that is able to perform very linear time sweeps across an extended dynamic range. This set-up is critical for the evaluation of the dynamic characteristics of the converters that we have developed.


Chapter 14: Future Developments.

The major goal of the work described in this dissertation was to demonstrate the possibility of using standard digital CMOS technologies to build integrated, multi-channel, time measurement systems with high resolution. Having established this possibility, by means of two different successful architectures, a wide range of fully integrated systems can be developed to match the specific requirements of the several interested users within the High-Energy Physics community. Alternatively a single “universal” system could be designed to fulfil all these separate requirements.

During this work, although only cursory attention was given to the actual implementation of the system level functionality, its presence was always accounted for and the architectures proposed are adapted to operate in that environment.

Two logical development paths may now be followed:

• Profit from the short gate delays available in the new, sub-micron technologies to demonstrate the “ultimate” performance that can be extracted following the architectures presented here (or any other having the same capabilities).

• Develop a general purpose T/D converter. This IC would cover the entire resolution spectrum envisaged for the near future, from the “low” 250ps range to the “high” 25ps range. It should also allow for different buffering strategies and also for intelligent data filtering.

Although the first path is scientifically stimulating and poses some interesting design challenges, it is the second path that results in a better engineering compromise between single-minded performance and overall functional flexibility. It is also a more “multi-discipline project”, requiring the convergence of multiple design techniques (full custom / standard cell) and therefore including important challenges.

Such a converter has been envisaged and preliminary studies have been carried out. The enabling architecture, the interpolator based on a DLL and on an RC delay line, was developed and proven during this work. Most of the system level functionality has been demonstrated elsewhere in the context of lower resolution converters.

In a conventional, DLL based converter, all the channels integrated in the same IC perform their time interpolation by sampling the status of a common DLL, as is schematically described in Figure 1. To obtain a higher resolution TDC, using the scheme based on a DLL and an RC delay line that was proposed in this work, a number of equally spaced samples of the status of the DLL must be stored. The scheme is also pictured in Figure 1.

[Two diagrams. Labels of the first: PD, clkref, hit<0> to hit<3>. Labels of the second: PD, clkref, hit, RC delay line.]

Figure 1: A four channel TDC using a DLL based scheme and a single channel TDC with four times smaller LSB, using the same building blocks and an RC delay line.

A close look at this figure already gives a hint on how to obtain high resolution from what is intrinsically a lower resolution converter (the DLL). By the simple addition of an adjustable RC delay line (and the calibration hardware), it is possible to obtain a higher resolution converter channel, using for that purpose a small number of lower resolution conversion channels. By proper selection of the hit signal origin, a single IC can be used as a high channel density, low resolution T/D converter or as a low channel density, high-resolution T/D converter, depending on the user needs (see Figure 2).

[Diagram labels: PD, clkref, hit, RC delay line, hit<0> to hit<3>.]

Figure 2: The general purpose TDC architecture.

Timing information can be carried in one or in both edges of the hit signal. It would therefore be convenient for the converter to be able to measure these two instants in the same channel. This feature will be implemented in this converter.

Modern CMOS technologies, for example with a 0.25µm minimum feature size, result in very small gate delays. It is, therefore, possible to build a very compact time conversion block and integrate it with a large processing logic block.

It is envisaged to include a more complete buffering hierarchy. Each channel will have a dedicated four measurement deep pipelined memory (to store two pairs of rising-falling edge measurements). The second level of hierarchy will group 8 channels (2 in high-resolution mode) in a deeper FIFO memory. Each of these groups includes a separate pre-processing logic block that performs encoding, coarse time selection, etc. The groups are then multiplexed into a single data stream.

An optional, trigger based, data reduction processor will also be included. This processor receives commands from a central processor used to identify time windows of interest. Measurements occurring outside these time windows are deemed uninteresting and, therefore, are filtered out of the data stream.

The function of the local data reduction processor is to compare the time measurements acquired in each channel with the interesting time window, which is identified by a “trigger” time-tag. Measurements that are accepted by this criterion are stored in a common read-out FIFO memory.
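The selection performed by such a data reduction processor can be sketched as a simple window filter. The example below is a software illustration only: the function name and the `latency` and `window` parameters are hypothetical (the text only states that the window is identified by a trigger time-tag), times are expressed in integer LSB counts, and counter wrap-around is ignored.

```python
from typing import Iterable, List

def trigger_match(hits: Iterable[int], trigger_tag: int,
                  latency: int, window: int) -> List[int]:
    """Keep the hits inside [trigger_tag - latency, trigger_tag - latency + window).

    latency and window are hypothetical parameters describing where the
    interesting time window lies relative to the trigger time-tag.
    """
    start = trigger_tag - latency
    stop = start + window
    return [t for t in hits if start <= t < stop]

# Hypothetical time measurements of one channel, in LSB counts.
hits = [100, 480, 510, 560, 900]
accepted = trigger_match(hits, trigger_tag=1000, latency=520, window=100)
print(accepted)
```

Only the accepted measurements would be forwarded to the common read-out FIFO; the others are dropped from the data stream.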

[Block diagram labels: PD, clkref, RC delay line, hit<0> to hit<3>, hit<31:0>, mux., coarse counter, channel buffer (4 words), encoding & offset adjust, channel arbitration, group buffer (256 words), trigger matching, super-group buffer (256 words), x2, x4, 1 low resolution channel, 4 low resolution channels (1 high resolution channel), read-out interface, trigger interface & control, PLL, JTAG interface (testing / programming), RC delay calibration (& hit oscillator), calibrate R-C delay.]

Figure 3: Block diagram of the general purpose TDC.

A simplified block diagram of the general purpose TDC is shown in Figure 3. A clock multiplying PLL is included to generate the required reference period for the high-resolution option.

The timing specification of this TDC is shown in the next table. Three resolution levels can be obtained with the specified 40MHz reference clock: 224.5ps, 56.4ps and 14.2ps. These values correspond to the standard deviation of the quantisation error (σq) of an ideal converter. In reality other sources of time uncertainty will add up. They will affect the higher resolution options more. The experience gained during this work allows for a preliminary estimation of the RMS resolution (σTDC) of ~226ps, ~61ps and ~25ps, respectively.

ref. frequency     40       160      MHz
ref. period        25       6.25     ns
DLL LSB            781.3    195.3    ps     32 cells / DLL
RC line LSB        -        48.8     ps     using 4 channels
dynamic range      102.4    102.4    µs

Table 1: Timing specification of the general purpose TDC.
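The entries of this table and the ideal quantisation errors quoted in the text can be reproduced from the reference frequencies alone. The sketch below assumes, as stated in the table, 32 delay cells per DLL and an RC line that divides the DLL bin by 4; the function and variable names are illustrative.

```python
import math

CELLS_PER_DLL = 32        # from the table
RC_LINE_DIVISION = 4      # high-resolution option, using 4 channels

def dll_lsb_ps(ref_mhz):
    """DLL bin size in ps for a given reference frequency in MHz."""
    period_ps = 1e6 / ref_mhz
    return period_ps / CELLS_PER_DLL

lsb_low = dll_lsb_ps(40)               # DLL locked to the 40 MHz clock
lsb_mid = dll_lsb_ps(160)              # DLL locked to the PLL output
lsb_high = lsb_mid / RC_LINE_DIVISION  # after RC line interpolation

# Standard deviation of the quantisation error of an ideal converter.
sigma_q = [lsb / math.sqrt(12) for lsb in (lsb_low, lsb_mid, lsb_high)]
for lsb, sq in zip((lsb_low, lsb_mid, lsb_high), sigma_q):
    print(f"LSB = {lsb:7.1f} ps  ->  sigma_q = {sq:6.1f} ps")
```

The computed values agree with the quoted 224.5ps, 56.4ps and 14.2ps to within roughly 1ps.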

PART V.

APPENDIXES.


Appendix A: TDC Characterisation Test Bench.

The evaluation of the high-resolution TDC prototypes produced during this work required the development of a specific test bench. This test bench allows for the measurement of several important timing characteristics of the converter:

• Conversion linearity (differential and integral).

• Conversion error, from a linear time sweep.

• Crosstalk between channels.

• Double hit resolution.

In particular, the linear delay generator used for the characterisation of the conversion error required the development of an adequate instrument.

Given the fine time characteristics that this test bench is intended to measure, special attention was given to the integrity of the time critical signals. High performance PECL logic is used wherever the reference or the hit signals are handled. Controlled impedance (50Ω) micro-strips and cables are used to transport, or delay, these signals.

Conversion linearity.

The static characteristics of the converter (INL, DNL) are measured using a standard Code Density Test (CDT) that has been extensively described in the literature (in the context of ADC testing) [1],[2],[3]. Other methodologies have been used to characterise converters, for example using Walsh Functions [4], but their complexity does not seem required for the test of TDC’s, which typically require that only a limited number of bins be characterised in great detail. The resulting characterisation includes some uncertainty, which can be limited as discussed in Appendix D.

In a CDT, the device under test (DUT) collects a large number of hits generated with a random time interval. Due to the randomness of the hit arrival time, they are uniformly distributed along the dynamic range of the DUT. Therefore, if the conversion result of each hit pulse is read out and accumulated in a histogram whose bins correspond to an LSB of the converter, the number of hits collected in each of the histogram bins is proportional to the size of the actual converter bin. The DNL graph is obtained directly from the test. The INL graph is derived from the cumulative histogram of the bin sizes, which is obtained by adding up consecutive bin sizes. Unfortunately, the uncertainty of the size of each bin is also accumulated in this operation. Therefore, for the same number of collected hits, the accuracy of the differential characterisation is greater than the accuracy of the integral characterisation.
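The processing of the histogram can be sketched as follows. This is a minimal illustration, assuming that the bins cover exactly one period of the interpolator (so that the average count corresponds to one ideal LSB) and that the INL is evaluated at the upper edge of each bin; the function name and the histogram values are hypothetical.

```python
def code_density(hist):
    """DNL and INL (in LSB) from a code density histogram."""
    total = sum(hist)
    ideal = total / len(hist)  # expected count for an ideal bin
    # The relative excess of hits in a bin is its size error.
    dnl = [count / ideal - 1.0 for count in hist]
    # Bin-size errors accumulate into the INL.
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl

# Hypothetical histogram of an 8-bin interpolator (not measured data).
hist = [1000, 1100, 900, 1050, 950, 1000, 980, 1020]
dnl, inl = code_density(hist)
print("DNL:", [round(d, 3) for d in dnl])
print("INL:", [round(i, 3) for i in inl])
```

Because each INL point is a running sum of DNL values, the statistical error of every bin count propagates into all subsequent INL points, which is the accumulation of uncertainty mentioned above.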

The CDT test requires a random pulse generator or, instead, a pulse generator whose frequency is selectable (the choice of the sampling frequency is made in accordance with Appendix E) and a computer to collect and histogram the measurements obtained. In our set-up we used a Hewlett-Packard 8012B pulse generator. Data is collected in a computer that also controls the test bench.

Since this is a statistical test, no information is obtained on the dynamic characteristics of the converter. Chiefly, random errors due to reference clock jitter or to the dynamics of the DLL and random noise due to other activity within the circuit are averaged out. In order to observe these effects, a linear time sweep is performed across a significant segment of the dynamic range.

Conversion error.

The linear time sweep is performed with a very short delay step (more than an order of magnitude shorter than the LSB of the converter under consideration), over a range of a few reference clock cycles. This range is wide enough to characterise the fine time interpolation scheme and also to verify that the dynamic range extension scheme does not interfere with the interpolation performance.

Standard (active) delay generators do not have the linearity required to perform a linear time sweep suitable for this application. Therefore a computer controlled passive delay generator, using a step-motor driven coaxial phase shifter (also known as “trombone”), was used. Although no direct measurement of the “trombone” linearity was performed, the measurements obtained and the mechanics of the instrument give a high degree of confidence in its linearity. In order to expand the small dynamic range of the “trombone”, a selectable delay box was used. When the “trombone” reaches the end of its dynamic range, it is rewound to the initial position and a corresponding delay is added in a delay box, by proper selection of the internal cable length.
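The combination of the two instruments can be sketched as a decomposition of the target delay into a coarse and a fine setting. The example below assumes ideal instruments with the nominal figures quoted elsewhere in this appendix (0.6ps trombone step, 2ns trombone range, 0.5ns delay box step) and a sweep step of 2.4ps; the function name is hypothetical and the real set-up additionally relies on the alignment measurement after each rewind.

```python
TROMBONE_STEP_PS = 0.6       # minimum trombone delay step
TROMBONE_RANGE_PS = 2000.0   # trombone dynamic range
BOX_STEP_PS = 500.0          # delay box step
SWEEP_STEP_PS = 2.4          # sweep step (4 trombone steps)

def delay_setting(target_ps):
    """Split a target delay into (delay box steps, trombone steps)."""
    # Each time the trombone range is exhausted it is rewound and the
    # equivalent delay is added in the delay box.
    rewinds = int(target_ps // TROMBONE_RANGE_PS)
    box_steps = rewinds * round(TROMBONE_RANGE_PS / BOX_STEP_PS)
    residual_ps = target_ps - rewinds * TROMBONE_RANGE_PS
    trombone_steps = round(residual_ps / TROMBONE_STEP_PS)
    return box_steps, trombone_steps

# Settings for a sweep covering a little more than 6.25 ns.
settings = [delay_setting(i * SWEEP_STEP_PS) for i in range(2605)]
print(settings[1], settings[834])
```

After a rewind the residual delay is in general not a whole number of trombone steps, so a rounding error of up to half a step remains even with ideal instruments.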

The accuracy of this alignment procedure is a concern. Even a small difference between the delay of the apparatus before and after the adjustment step will accumulate into a sizeable error after a few adjustment steps. To guarantee an adequate alignment of the delay generator, its delay is measured prior to adjustment, using an adjustment TDC, and again after adjustment. The two measurements are compared and a fine adjustment is performed (changing the “trombone” delay), if required. The adjustment TDC does not have to be linear, since the two measurements it has to perform are identical. However, it must have a resolution better than the “trombone” delay step. Averaging many hits is an easy way of achieving high resolution in commercial delay measurement instruments.


In Figure 1, a block diagram of the computer controlled linear delay generator is shown, illustrating its connection to the device under test (DUT). Since the DUT is a time stamp TDC, the hit signal was synchronised with the reference clock (clkref) before progressing through the trombone and the selectable cable delay box. The adjustment TDC was mounted in parallel with the DUT, in such a way that in normal operation it does not influence the test.

In our test bench, we used the Sage model 6709 coaxial phase shifter driven by a computer controlled stepper motor, to obtain a minimum delay step of ~0.6ps in a dynamic range of 2ns. The CAEN programmable delay box N-146A, which has a minimum delay step of 0.5ns and a dynamic range of ~80ns, was used to extend the dynamic range of the apparatus. The adjustment TDC used was the Stanford Research SR620 universal time interval counter, whose quoted resolution is ~2ps if 1000 hits are averaged.

[Block diagram labels: clkref, hit signal, trombone (fine adjustment), selectable cable delay (coarse adjustment), adjustment TDC, adjustment control (from computer), DUT.]

Figure 1: The linear passive delay generator block diagram (computer controlled).

This apparatus is rather cumbersome and requires an external adjustment TDC.Therefore a simpler, but more reliable method was developed to perform the delayadjustment. If the two extremes of a delay line are connected to each other by means of aninverting amplifier, the frequency of oscillation of the oscillator thus generated is given bythe following expression:

$$f = \frac{1}{2 \cdot (D_{line} + A_{delay})},$$

where Dline is the delay of the line and Adelay is the propagation delay of the amplifier that closes the loop. Therefore it is possible to derive the delay of the line from the measurement of the oscillation frequency (given the delay of the amplifier).
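The relation between oscillation frequency and line delay can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values in the usage note are assumptions, not part of the test bench software.

```python
def line_delay_from_frequency(f_osc_hz: float, amp_delay_s: float) -> float:
    """Invert f = 1 / (2 * (D_line + A_delay)) to obtain D_line in seconds."""
    if f_osc_hz <= 0.0:
        raise ValueError("oscillation frequency must be positive")
    return 1.0 / (2.0 * f_osc_hz) - amp_delay_s


def delay_drift(f_before_hz: float, f_after_hz: float) -> float:
    """Change of the line delay between two frequency measurements.

    The amplifier delay cancels in the difference, so only its
    invariance matters, not its absolute value.
    """
    return 1.0 / (2.0 * f_after_hz) - 1.0 / (2.0 * f_before_hz)
```

For instance, a hypothetical 40 ns line closed by a 1 ns inverter would oscillate at 1/(2·41 ns), and `line_delay_from_frequency` recovers the 40 ns from that frequency.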

As explained before, the absolute value of Dline is not necessary, since it is used only for the comparison between the delay of the line before and after alignment. Therefore, the only important property of Adelay is its invariance and not its absolute value. A fast PECL inverter guarantees this invariance (within acceptable limits).

The block diagram of this scheme is shown in Figure 2. When the delay of the delay generator is to be measured, a set of relays is switched in such a way that the oscillator loop is closed and the DUT is disconnected from the generator. The oscillation frequency is measured before the adjustment step and again after it. If these frequencies are different, the ‘trombone’ delay is again adjusted until the frequency agrees with the one measured before the adjustment step.

A simple procedure to measure frequency is to count the number of oscillation cycles completed in a given time interval. The longer the time interval, the better the accuracy of the measurement. The oscillation period of a stable oscillator (or a multiple of it) can be used to set the counting time interval. This simple delay generation scheme was implemented in a 9U VME board that also includes all the alignment logic required.

It is not practical to extend this test to the full dynamic range of the converter, due to its duration and to the possible accumulation of errors generated in the successive alignment steps. Fortunately, the verification of the correctness of the dynamic range extension over its full dynamic range does not require the generation of small delay steps. For this application, it is more convenient to perform a coarse time sweep with delay steps of ~1ns. Since the requirements in terms of jitter and linearity of the hit signal are relaxed, an active instrument can be used as a delay generator, resulting in a faster characterisation. In our test bench, the Stanford Research model DG535 digital delay generator was used.

Figure 2: The linear passive delay generator block diagram (automated). [Diagram labels: clkref, hit signal, trombone (fine adjustment), selectable cable delay (coarse adjustment), oscillator, cycle counter, adjustment control, VME interface, DUT.]


Other characteristics of the converter, such as crosstalk, double hit resolution and sensitivity to the activity of the digital circuitry, can be evaluated with this test bench (they are applicable only to the converter based on an array of DLL’s).

Crosstalk.

The characterisation of the crosstalk between channels was performed in accordance with the following procedure:

A double delay sweep is generated using the Stanford Research model DG535 digital delay generator. One channel (the channel under test - CUT) is stimulated independently from all the other channels in the circuit (the offending channels - OC). For each delay step in the CUT, a delay sweep spanning three reference clock cycles is simultaneously performed on all the OC. In this way, the worst correlation between the simultaneous hits in the OC, a hit in the CUT and the phase of the reference clock can be found. The comparison between the peak error obtained using this procedure and the error obtained for the same delay in the CUT, but with the OC inactive, gives a measure of the worst case, maximum error due to crosstalk.

Double hit resolution.

Double hit resolution is measured using the Philips PM5786 pulse generator to generate bursts of pulses. This pulse generator is able to generate pulses with a minimum separation of ~8.5ns, corresponding to the maximum double hit resolution that can be measured. The bursts are generated asynchronously to the reference clock so that any correlation between the reference clock and the activity in the channel buffer can be identified.


The control operation of a DLL is based on the integration of the phase error resulting from the comparison of the phase of the periodic reference signal and of the VCDL output. The negative feedback control loop adjusts the delay of the VCDL in order to minimise the phase error.

The DLL configuration is a first order loop; therefore, if the sampling operation inherent to the phase detector is ignored, a simple continuous time approximation can be used to analyse its frequency response. This approximation can be used for loop bandwidths a decade or more smaller than the operating frequency.

Following the naming conventions established in [5], we define output delay Do(s) as the delay established by the VCDL and input delay Di(s) as the delay to which the phase detector compares the output delay. These two quantities are related by the following expression:

$$D_o(s) = \left( D_i(s) - D_o(s) \right) \cdot \frac{I_{CP} \cdot K_{VCDL}}{s \cdot C_F \cdot T},$$

where ICP is the charge-pump current, KVCDL is the gain of the VCDL, CF is the loop filter capacitance and T is the period of the reference signal. The average charge-pump current is given by the fraction of the reference period in which the charge-pump is activated, (Di(s) − Do(s))/T, times its peak current (ICP)¹. It is, therefore, proportional to the phase (delay) error.

The closed loop response is then:

$$\frac{D_o(s)}{D_i(s)} = \frac{1}{1 + s / w_n},$$

where wn is the loop bandwidth:

$$w_n = \frac{I_{CP} \cdot K_{VCDL}}{C_F \cdot T}.$$

¹ If the loop is built in a “bang-bang” configuration, using a two-state phase detector, the average charge-pump current can be evaluated over a large number of reference periods.
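As a numerical illustration of the first order model, the short Python sketch below evaluates the loop bandwidth wn = ICP·KVCDL/(CF·T) and the corresponding step response. The function names and the component values in the usage note are hypothetical and chosen only to show the orders of magnitude involved.

```python
import math


def dll_loop_bandwidth(i_cp: float, k_vcdl: float, c_f: float, t_ref: float) -> float:
    """Loop bandwidth in rad/s.

    i_cp   : charge-pump peak current (A)
    k_vcdl : VCDL gain (s/V)
    c_f    : loop filter capacitance (F)
    t_ref  : reference period (s)
    """
    return i_cp * k_vcdl / (c_f * t_ref)


def dll_step_response(w_n: float, t: float) -> float:
    """Normalised response of D_o to a unit step in D_i for the
    first order loop D_o/D_i = 1 / (1 + s/w_n)."""
    return 1.0 - math.exp(-w_n * t)
```

With assumed values of 1 µA, 10 ns/V, 10 pF and a 25 ns reference period, `dll_loop_bandwidth` returns about 4·10⁴ rad/s, several decades below the reference frequency, which is the regime in which the continuous time approximation is stated to hold.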

Since a first order loop is inherently stable, the only stability criterion of interest is to avoid the influence of the higher order poles introduced by the delay around the sampled feedback loop. In our application, the reference signal has a known and stable frequency, therefore it doesn’t require a high tracking bandwidth. It is, therefore, interesting to reduce the bandwidth of the loop by increasing the filter capacitor and decreasing the charge-pump current and the gain of the VCDL. In this way the phase noise inherent to the “bang-bang” loop operation can be minimised.

The nature of the loop, where a reference signal is propagating along a VCDL, means that variations of the input signal’s phase will also propagate through the VCDL and thus reduce the measurement accuracy. Therefore, although internal phase noise can be minimised and the delay of the VCDL stabilised at one reference period T, the phase noise carried by the reference signal must be eliminated at its origin, if the reduction of the measurement accuracy is to be minimised.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-Linearity of a DLL.

A DLL is a closed feedback control loop with a somewhat complex dynamic behaviour. The object of this study is the static behaviour of the DLL that results from averaging the dynamics of the control loop over a long period. Without loss of generality, we will assume an ideal control loop that is able to keep the delay along the DLL stable and equal to one clock period T. The following analysis broadly follows the method developed in [6] for resistor strings in flash ADC’s.

For the purpose of this analysis, we will focus only on random mismatch effects. The delay of each cell in the DLL can be seen as an independent random variable with a normal probability density function (PDF) G of mean $\mu_m = T/N$ and variance $\sigma_m^2$ (N is the number of cells that make up the DLL). The mean corresponds to the expected cell delay, and the variance gives a measure of the spread of the actual delays around the mean.

In these conditions one can see the DLL as a delay chain whose delay at the origin is D=0 and at the other extreme is D=T.

Figure 1: Voltage controlled delay line with fixed length. [Taps 0, 1, …, j, …, N-1, N at nominal delays 0, T/N, …, j·T/N, …, (N-1)·T/N, T, with 0 ≤ j < N.]

The delay Di of each cell is defined as a random variable with a normal PDF $G(T/N, \sigma_m^2)$. The delay from the origin to the output of cell j can be expressed as a fraction of the total delay of the delay chain:

$$u_j(X, Y) = \frac{X}{X + Y},$$

where $X = \sum_{i=1}^{j} D_i$ and $Y = \sum_{i=j+1}^{N} D_i$.

Since the Di have normal PDF’s, X and Y are also random variables with normal PDF’s:

X: $G(\mu_1, \sigma_1^2)$, with $\mu_1 = j \cdot \mu_m$ and $\sigma_1 = \sqrt{j} \cdot \sigma_m$

Y: $G(\mu_2, \sigma_2^2)$, with $\mu_2 = (N - j) \cdot \mu_m$ and $\sigma_2 = \sqrt{N - j} \cdot \sigma_m$.

Using the variable transformations

$$u = \frac{X}{X + Y} \quad \text{and} \quad v = X,$$

we have

$$g(u, v) = f\left( X(u, v), Y(u, v) \right) \cdot |J|,$$

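where |J|, the Jacobian of the function, is defined as:

$$|J| = \begin{vmatrix} \dfrac{\partial X(u,v)}{\partial u} & \dfrac{\partial X(u,v)}{\partial v} \\[2ex] \dfrac{\partial Y(u,v)}{\partial u} & \dfrac{\partial Y(u,v)}{\partial v} \end{vmatrix} = \frac{\partial X(u,v)}{\partial u} \cdot \frac{\partial Y(u,v)}{\partial v} - \frac{\partial X(u,v)}{\partial v} \cdot \frac{\partial Y(u,v)}{\partial u}.$$

From $Y = v \cdot \frac{1-u}{u}$ and $X = v$ we get

$$|J| = \begin{vmatrix} 0 & 1 \\ -\dfrac{v}{u^2} & \dfrac{1-u}{u} \end{vmatrix} = \frac{v}{u^2} = \frac{(X+Y)^2}{X}$$

and thus

$$g(u, v) = f(X, Y) \cdot \frac{(X+Y)^2}{X}.$$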

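Considering X and Y independent variables, their joint PDF is:

$$f(X, Y) = f(X) \cdot f(Y).$$

X and Y have normal PDF’s,

$$f(X) = \frac{1}{\sqrt{2\pi} \cdot \sigma_1} \cdot \exp\left( -\frac{(X - \mu_1)^2}{2 \cdot \sigma_1^2} \right),$$

$$f(Y) = \frac{1}{\sqrt{2\pi} \cdot \sigma_2} \cdot \exp\left( -\frac{(Y - \mu_2)^2}{2 \cdot \sigma_2^2} \right),$$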

thus

$$g(u, v) = \frac{1}{2\pi \cdot \sigma_1 \cdot \sigma_2} \cdot \frac{v}{u^2} \cdot \exp\left( -\frac{\sigma_2^2 \cdot (v - \mu_1)^2 + \sigma_1^2 \cdot \left( v \cdot \frac{1-u}{u} - \mu_2 \right)^2}{2 \cdot \sigma_1^2 \cdot \sigma_2^2} \right).$$

The PDF for u is, by definition,

$$g(u) = \int_{-\infty}^{\infty} g(u, v) \, dv,$$

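thus

$$g(u) = \frac{1}{2\pi \cdot \sigma_1 \cdot \sigma_2 \cdot u^2} \cdot \exp\left( -\frac{1}{2 \cdot \sigma_2^2 \cdot u^2} \cdot \left( C - \frac{B^2}{A} \right) \right) \cdot \int_{-\infty}^{\infty} v \cdot \exp\left( -\frac{A}{2 \cdot \sigma_2^2 \cdot u^2} \cdot \left( v - \frac{B}{A} \right)^2 \right) dv$$

with $A = r \cdot u^2 + (1-u)^2$, $B = (r \cdot \mu_1 - \mu_2) \cdot u^2 + \mu_2 \cdot u$, $C = (r \cdot \mu_1^2 + \mu_2^2) \cdot u^2$ and $r = \sigma_2^2 / \sigma_1^2$.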

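If the substitution $u = u_0 + u_1$ ($u_0 = j/N$) is made, the following equation is obtained:

$$g_{u_0}(u_1) = \frac{\sqrt{N}}{\sqrt{2\pi} \cdot C_m \cdot \sqrt{u_0 (1-u_0)}} \cdot \frac{1}{\left( 1 + \dfrac{u_1^2}{u_0 (1-u_0)} \right)^{3/2}} \cdot \exp\left( -\frac{N}{2 \cdot C_m^2 \cdot u_0 (1-u_0)} \cdot \frac{u_1^2}{1 + \dfrac{u_1^2}{u_0 (1-u_0)}} \right),$$

where $C_m = \sigma_m / \mu_m$.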

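Since $u_1^2 / (u_0 \cdot (1-u_0)) \ll 1$, the following equations are obtained:

$$\sigma_{u_0} = C_m \cdot \sqrt{\frac{u_0 \cdot (1-u_0)}{N}} \;\Leftrightarrow\; g_{u_0}(u_1) = \frac{1}{\sqrt{2\pi} \cdot \sigma_{u_0}} \cdot \exp\left( -\frac{u_1^2}{2 \cdot \sigma_{u_0}^2} \right),$$

$$g(u) = \frac{1}{\sqrt{2\pi} \cdot \sigma_{u_0}} \cdot \exp\left( -\frac{(u - u_0)^2}{2 \cdot \sigma_{u_0}^2} \right).$$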

Thus, u (the delay division ratio) has a normal probability density with average $\mu_{u_0} = u_0$ and a standard deviation $\sigma_{u_0}$. The standard deviation of the integral error is obtained if $\sigma_{u_0}$ is normalised to the (average) cell delay:

$$\sigma_{DLL} = \sigma_{u_0} \cdot \frac{N \cdot \mu_m}{\mu_m} = N \cdot \sigma_{u_0} \;\Leftrightarrow\; \sigma_{DLL} = C_m \cdot \sqrt{\frac{j \cdot (N - j)}{N}}.$$

The maximum standard deviation of the integral error is found in the middle of the delay chain, with a value σDLL(max) of:

$$\sigma_{DLL(max)} = C_m \cdot \frac{\sqrt{N}}{2},$$

which compares favourably with the maximum standard deviation of the integral error in an open (not enclosed in a control loop) delay chain σDC(max), found at the end of the delay chain:

$$\sigma_{DC(max)} = C_m \cdot \sqrt{N}.$$

Therefore, the inclusion of a delay line inside a closed control loop such as the DLL improves the standard deviation of the integral linearity error by a factor of two.
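The factor of two can be checked numerically with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not part of the original analysis: cell delays are drawn from a normal distribution with unit mean and standard deviation Cm, the closed loop is idealised by rescaling the chain to exactly N cell delays, and the function name and default parameters are hypothetical.

```python
import random
import statistics


def inl_sigma(n_cells: int, c_m: float, tap: int, closed_loop: bool,
              trials: int = 4000, seed: int = 1) -> float:
    """Standard deviation of the integral error at `tap`, in units of
    the average cell delay.

    closed_loop=True  : total delay forced to n_cells cell delays (ideal DLL).
    closed_loop=False : open delay chain, errors simply accumulate.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        cells = [rng.gauss(1.0, c_m) for _ in range(n_cells)]
        x = sum(cells[:tap])            # delay from the origin to the tap
        if closed_loop:
            position = n_cells * x / sum(cells)
        else:
            position = x
        errors.append(position - tap)   # deviation from the ideal position
    return statistics.pstdev(errors)
```

With N = 16 and Cm = 0.02, the analysis predicts about 0.04 cell delays for `inl_sigma(16, 0.02, 8, True)` and about 0.08 for `inl_sigma(16, 0.02, 16, False)`, and the simulated values should agree within the statistical scatter of the finite number of trials.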

Appendix D. Number of Random Samples Required for TDC Characterisation.

A hit arriving at a time interpolator at a random time has equal probability p of being collected by each of the bins into which the reference period is divided (assuming identical bins). This probability is a function of the total number of subdivisions (Nbins), given by $p = 1/N_{bins}$.

To estimate the size of a given bin, an experiment can be devised where random hits are generated (trials). The possible outcomes of a trial are success, if a hit is collected in the bin, or failure, if not. After a large number of trials have been executed, the ratio of the number of successes over the number of trials is a direct measure of the bin size.

The accuracy of the estimation is, of course, related to the number of trials. It is, therefore, important to know the minimum number of trials that should be executed to obtain the required accuracy.

The experiment just described has the following properties:

• It consists of a number (n) of repeated trials.

• Each trial has an outcome that may be classified as a success or as a failure.

• The probability of success (p) remains constant from trial to trial.

• The repeated trials are independent.

It therefore qualifies as a set of n Bernoulli trials and, therefore, the number of successes has a Binomial probability distribution with mean $\mu = n \cdot p$ and variance $\sigma^2 = n \cdot p \cdot (1-p)$. It is known that the distribution of a Binomial random variable can be approximated by the normal distribution, having the same mean and variance, if the number of trials is large. In a normal distribution, the probability that a random variable X will assume a value that deviates from its average µ by less than zα/2·σ is 1-α:

$$P\left( \mu - z_{\alpha/2} \cdot \sigma \le X \le \mu + z_{\alpha/2} \cdot \sigma \right) = 1 - \alpha.$$

The variable zα/2 is the standard normal distribution z-value that bounds an area of α/2 under the tail of the (standard) normal curve (see Figure 1 for clarification of these definitions). It can be obtained from any table of areas under the normal distribution curve (for example [7]).

Figure 1: P(-zα/2 < Z < zα/2) = 1-α. [Standard normal curve n(z; µ=0, σ=1) with central area 1-α between -zα/2 and zα/2 and an area of α/2 in each tail.]

The result of the experiment, x successes representing the measured size of the bin, is a sample of a normal random variable X with mean µ and variance σ². From the previous probability limit it is, therefore, possible to conclude that the measured bin size lies within zα/2 standard deviations (σ) of its true value µ, with 100·(1-α) percent confidence. If the accepted tolerance to which the bin size is to be determined is set to β·µ, and µ and σ are substituted by their actual values, we get the following expression for the number of trials needed n:

$$z_{\alpha/2} \cdot \sigma \le \beta \cdot \mu \;\Leftrightarrow\; z_{\alpha/2} \cdot \sqrt{n \cdot p \cdot (1-p)} \le \beta \cdot n \cdot p \;\Leftrightarrow\; n \ge \left( \frac{z_{\alpha/2}}{\beta} \right)^2 \cdot \left( \frac{1}{p} - 1 \right).$$

The probability p is defined as 1/Nbins. Therefore the number of hits required to obtain the bin size with a tolerance 100·β% and a confidence 100·(1-α)% in the measurement is

$$n \ge \left( \frac{z_{\alpha/2}}{\beta} \right)^2 \cdot (N_{bins} - 1).$$
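The bound on n is easy to evaluate numerically. The following Python sketch is an illustration only: the function name is hypothetical, and zα/2 is obtained from the standard library's normal distribution instead of a printed table.

```python
import math
from statistics import NormalDist


def required_hits(n_bins: int, beta: float, alpha: float) -> int:
    """Minimum number of random hits so that each bin size is known to a
    tolerance of 100*beta % with a confidence of 100*(1 - alpha) %."""
    if n_bins < 2 or not 0.0 < alpha < 1.0 or beta <= 0.0:
        raise ValueError("need n_bins >= 2, 0 < alpha < 1 and beta > 0")
    # z-value leaving an area of alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z / beta) ** 2 * (n_bins - 1))
```

As an example with assumed numbers, an interpolator with 101 bins characterised to a 10% tolerance at 95% confidence needs `required_hits(101, 0.1, 0.05)` hits, roughly 3.8·10⁴.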

With the same set of hits, a similar estimation of the size of each bin can be obtained. Therefore the DNL characteristics of the line are obtained.

In principle, the INL characteristics of the line are directly obtained by cumulating the DNL histogram. It should be noticed that while performing this operation, the uncertainty of the results (described by the variance) must also be added. For an open ended line, the worst variance is measured in the last bin to be:

$$\sigma_c = \sqrt{N_{bins}} \cdot \sigma.$$

The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to

$$n_c = N_{bins} \cdot n = \left( \frac{z_{\alpha/2}}{\beta} \right)^2 \cdot (N_{bins} - 1) \cdot N_{bins}.$$

Conversely, for the same number of samples, the tolerance of the INL estimation is

$$\beta_c = \sqrt{N_{bins}} \cdot \beta.$$


If an enclosed line, for example within the DLL closed loop, is considered, then the worst variance is measured in the middle bin to be:

$$\sigma_c = \frac{\sqrt{N_{bins}}}{2} \cdot \sigma.$$

The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to

$$n_c = \frac{N_{bins}}{4} \cdot n = \left( \frac{z_{\alpha/2}}{\beta} \right)^2 \cdot (N_{bins} - 1) \cdot \frac{N_{bins}}{4}.$$

Conversely, for the same number of samples, the tolerance of the INL estimation is

$$\beta_c = \frac{\sqrt{N_{bins}}}{2} \cdot \beta.$$


Interpolator characterisation requires that the reference clock period be sampled at random times. However, sampling at random, by its strict definition, would be impossible. What must be done is to guarantee that the reference clock is not sampled repeatedly at the same phase (beating effect). By choosing a sample frequency that is non-harmonically related to the clock frequency, we are assured of this [8]. Therefore, when a sufficient number of equidistant samples has been acquired, a uniform distribution of the samples along the clock period is obtained.

The sample frequency must, of course, be stable in order to guarantee that it doesn’t wander into a beating frequency during the characterisation procedure. Fortunately, very accurate and stable oscillators are common. They can be used directly or as a reference for a clock multiplying PLL, enabling the generation of basically any frequency ratio. It is, for example, possible to generate the sample frequency from the clock frequency, thus guaranteeing correct characterisation regardless of the actual clock present.

Any jitter present in the sampling frequency will only contribute to further randomise the sampling time, which benefits the characterisation. In this context, the requirements for a PLL can be quite relaxed.

The relation between the sample period Tsample and the clock period Tclk may be generally described by the following equation, where A and B are integers that have no common divider¹:

$$T_{sample} = \frac{A}{B} \cdot T_{clk}.$$

This relation merits a closer look to identify aids to the choice of the sample frequency. If we expand A, by letting

$$A = \left( \left( C + \frac{1}{M} \right) \cdot B \pm D \right) \cdot S,$$

then the previous equation can be expanded to:

$$T_{sample} = \left( C + \frac{1}{M} \pm \frac{D}{B} \right) \cdot S \cdot T_{clk}$$

¹ It is commonly found in the literature that the integers A and B should be prime numbers [8]. However, this is only a sufficient condition to generate a non-beating frequency, corresponding to a sub-set of the possible integer ratios that satisfy the absence of beating effect requirement.

The constants in this equation are all related to identifiable characteristics of the sampling frequency:

B is the number of intervals into which the sampling divides the clock period. It should be large enough so that the sampling coverage is compatible with the expected characterisation accuracy.

S reflects the possible existence of sub-sampling, where only every nth sample out of the ones generated is collected. It is now clear that the sub-sampling rate cannot be chosen arbitrarily, because the definition of the constant A restricts it. If S and B have common dividers, then the effective number of intervals B’ is reduced to B divided by them.

C is the number of integer Tclk periods contained in Tsample (or one more if $1/M \pm D/B < 0$). This constant must also abide by the rules of the definition of A.

M gives a measure of the spread between consecutive samples (normalised to Tclk). The actual sample spread is $(1/M \pm D/B) \cdot S \cdot T_{clk}$. The constant M should be the same as the number of sub-divisions (bins) of the interpolator being characterised. In this way, a more uniform sample distribution is obtained along the time that the test is being performed. Since it is included in the definition of constant A, it must also obey the required restrictions.

D is a small perturbation that actually defines the constant A. There is no real restriction on this constant, except for the rules defining A, but it should be made smaller than B/M, to keep these definitions coherent.

Typically, when determining the sampling frequency, B, S, C and M are defined by system requirements, and then D is determined so that the resulting A and B do not have any common dividers. The existence of common dividers between these two constants results in a decreased effective number of intervals B’.
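The selection of D can be sketched as a short search. The Python fragment below is an illustration under explicit assumptions: it takes the expansion of A to be A = ((C + 1/M)·B ± D)·S with B an integer multiple of M, and the function names and example numbers are hypothetical.

```python
from math import gcd


def choose_perturbation(b: int, c: int, m: int, s: int = 1, sign: int = 1):
    """Return the smallest D in 1 .. B/M - 1 for which A and B share no
    common divider, together with the resulting A.

    Assumed form: A = (C*B + B/M + sign*D) * S, with B divisible by M.
    """
    if b % m != 0:
        raise ValueError("B must be an integer multiple of M")
    for d in range(1, b // m):
        a = (c * b + b // m + sign * d) * s
        if gcd(a, b) == 1:
            return d, a
    raise ValueError("no D below B/M makes A and B coprime")


def distinct_phases(a: int, b: int) -> int:
    """Number of different sampling phases visited in B consecutive
    samples when T_sample = (A/B) * T_clk (phases counted in units of T_clk/B)."""
    return len({(k * a) % b for k in range(b)})
```

For B = 1024, C = 2 and M = 32 with S = 1, the search returns D = 1 and A = 2081, and all 1024 phases are visited. Without the perturbation (A = 2080), A and B share the divider 32 and only B’ = 32 distinct phases are sampled, which is the beating effect described above.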

The clock multiplying system required to perform these operations is graphically described in Figure 1. The critical operation is the clock multiplication (by B) on the return path of the PLL control loop. The delay introduced by this operation influences the stability of the closed control loop and, therefore, should be carefully analysed and minimised.

It is interesting to note that, in a noiseless system, after collecting B samples generated this way, the interpolator is completely described, with a measurement tolerance of ±1/(2·B)·100% of the clock period. In the presence of inevitable noise, it is safer to assume a random uniform sample distribution and collect the conservative number of samples determined in Appendix D.

Appendix E: TDC Characterisation Hit Frequency.

Figure 1: The clock multiplying PLL. [Diagram labels: Fclk, PLL (PD, LPF, VCO, B on the return path), a frequency ratio block defined by the constants B, C, M, D and S, Fsample.]

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).

This Appendix complements Chapter 6. It considers the case where tap 0 of each of the Timing DLLs is located at the end of the delay chain (Figure 1). Since the delay chain spans exactly one clock period, this alternative definition doesn’t change the performance of the converter. However, the shape of the non-linearity histograms is altered, so we present here the corresponding non-linearity histogram expressions for a single DLL (F=1) and for an ADLL.

Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0). [Diagram labels: Clock, Phase Detector, Tap 0, Tap 1, Tap 2, …, Tap N-2, Tap N-1, d0 … dN-1, τ1, τ2, D1, D2, F(D1,D2), D, τhit, Hit.]

The alternative timing and phase shifting variables m, n and n’ as a function of the bin position i ($0 \le i < F \cdot N$) are defined as:

),1(Mod Fim += ,

+

−= NF

imn ,

1FloorMod ,

−

+

=′ NmF

in ,

1FloorMod .

The following expressions reflect, respectively, the standard deviation of the integral non-linearity error due to cell delay mismatch and loop jitter:

( ) ( )nNNn

mMMm

FF

Fi cellarray −⋅+−⋅⋅

+

⋅σ⋅=σ21

)( .

22

)(

+

⋅⋅σ=σ

Nn

Mm

Fi jarray .

The integral non-linearity due to the combined effect of all static errors is given by the following expressions, respectively for the case where the hit sampling signal is distributed via a linear network or via the T-shaped network.

nFDNn

FF

Mm

FD

Nn

Mm

FDNn

FF

Mm

FDiINL

hitout

PDinarray

⋅⋅−

′

++

⋅⋅⋅−

+

+−⋅⋅−

+

+⋅⋅⋅=

1

1)(

.

−−⋅⋅−

′

++

⋅⋅⋅−

+

+−⋅⋅−

+

+⋅⋅⋅=

2

2

1

1)(

Nn

NFD

Nn

FF

Mm

FD

Nn

Mm

FDNn

FF

Mm

FDiINL

hitout

PDinarray

,

where, as before, the following variable transformations are used:

ininD δ= , diffPD KC

D τ+= , outoutD δ= and hithitD τ−= .

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.

The calibration algorithms presented so far used integral non-linearity as the only criterion for judging the correctness of the calibration results. If differential non-linearity is also to be used, more complex calibration algorithms are needed. Since these algorithms try to optimise two variables simultaneously, their convergence may be at risk when the two goals require contradictory directions. The logic controlling the execution of the algorithm must be able to decide which goal is more important and pursue the calibration taking that decision into account.

In the following lines the algorithms previously described are modified so that they can also set limits to differential non-linearity.

Tap selection adjustment scheme.

Iterative algorithm.

The analysis of the linearity of a bin is based on the bin histogram h[bin]. A cumulative histogram ch[bin] is built from it and both are compared to the ideal histograms (developed from the knowledge of the ideal converter’s bin size LSB). The following operations check whether the line conforms to the differential and integral linearity limits and take corrective measures for the offending bins.

for i= 0 to M-1
    tap[i]= segment_from_simulation_of_typical_conditions;

for bin= 0 to M-2
    repeat until no_changes
        Characterisation;
        if ( ch[bin]< LSB·( bin+1-limINL ) & h[bin]< LSB·( 1+limDNL ) |
           | h[bin]< LSB·( 1-limDNL ) )
            for i= 0 to M-bin-2
                tap[bin+i+1]= tap[bin+i+1]+1;
        else
            if ( ch[bin]> LSB·( bin+1+limINL ) & h[bin]> LSB·( 1-limDNL ) |
               | h[bin]> LSB·( 1+limDNL ) )
                for i= 0 to M-bin-2
                    tap[bin+i+1]= tap[bin+i+1]-1;
            else
                no_changes

In Figure 1 the algorithm is clarified. The acceptable limits of the integral and differential non-linearity are, respectively, limINL and limDNL. These linearity limits must be chosen in accordance with the size of the calibration steps. The access point selection for each tap is captured in tap[i].

Figure 1: Calibration procedure for the tap selection adjustment scheme. [Flowchart: starting from tap[all] = typical conditions, for bin = 0..M-2, repeat until changes = 0: run the CDT, compare histogram[bin] with (1±limDNL)·LSB and cumulative histogram[bin] with (bin+1±limINL)·LSB, and increment or decrement tap[bin+i+1] for i = 0..M-bin-2.]

The accepted limits to integral and differential non-linearity do not have to be the same. Setting different values for limINL and limDNL is a simple way to force the algorithm to give priority to one of the goals pursued.


Single step algorithm.

This algorithm finds the tap access points that result in the nearest approximation to the ideal cumulative bin size curve. It also checks that the specified limit to the differential non-linearity, limDNL, is not surpassed.

tap[0]= 0;

for i= 1 to M-1
    for segment= 0 to 31
        if ( ch[segment]< LSB·i & ch[segment+1]> LSB·i )
            if ( LSB·i-ch[segment]< ch[segment+1]-LSB·i &
               & 1-limDNL< ch[segment]-ch[tap[i-1]]< 1+limDNL )
                tap[i]= segment;
            else
                tap[i]= segment+1;
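The single step selection can be written as a runnable Python sketch. It follows the pseudocode above with two assumptions that are not stated in the text: the differential check is normalised to the LSB, and a segment whose cumulative delay coincides exactly with the target is accepted as the lower candidate. The function name and the example data are hypothetical.

```python
from typing import List, Sequence


def single_step_taps(ch: Sequence[float], lsb: float, n_taps: int,
                     lim_dnl: float) -> List[int]:
    """Select one access point per tap from the measured cumulative
    segment delays ch[0..len(ch)-1] (same unit as lsb)."""
    taps = [0]
    for i in range(1, n_taps):
        target = lsb * i
        chosen = None
        for seg in range(len(ch) - 1):
            # find the pair of segments that brackets the ideal position
            if ch[seg] <= target < ch[seg + 1]:
                step = (ch[seg] - ch[taps[i - 1]]) / lsb   # bin size in LSB
                lower_is_nearer = target - ch[seg] < ch[seg + 1] - target
                if lower_is_nearer and 1 - lim_dnl < step < 1 + lim_dnl:
                    chosen = seg
                else:
                    chosen = seg + 1
                break
        if chosen is None:
            raise ValueError("delay line too short for the requested tap")
        taps.append(chosen)
    return taps
```

As a usage example, a line of 33 access points spaced by 30 ps, calibrated for eight taps with an LSB of 100 ps and limDNL = 0.2, is handled by `single_step_taps([30 * k for k in range(33)], 100, 8, 0.2)`.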

Lumped capacitor adjustment scheme.

Coarse tuning procedure.

In this procedure the capacitance of all the banks is simultaneously incremented by one unit capacitor, resulting in a uniform increase of the delay of all taps. The procedure is repeated until the cumulative bin size is smaller than the ideal delay by less than a determined limit limcoarse. In the following lines the procedure is schematically described:

for bank= 1 to M
    cap[bank]= 0;

repeat until ( ch[M-2]>= LSB·( M-1-limcoarse ) )
    Characterisation;
    for bank= 1 to M
        cap[bank]= cap[bank]+1;
The calibration parameters for each capacitor bank are described by cap[bank] and ch[M-1] is the cumulative bin size histogram. A block diagram of the procedure is shown in Figure 2, where the Characterisation step is represented by the Code Density Test it performs.

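Figure 2: The coarse calibration procedure. [Flowchart: starting from the initial calibration, repeat until changes = 0: run the CDT, compare cumulative histogram[M-2] with (M-1-limcoarse)·LSB and, while it is smaller, increment the capacitor setting for bank = 1..M.]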
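The coarse tuning loop can be sketched in Python as follows. The callable `measure_ch_last` is a hypothetical stand-in for the Characterisation step (a Code Density Test returning the cumulative bin size ch[M-2] for the current capacitor settings), and `max_units` is an assumed safety limit on the number of unit capacitors per bank.

```python
from typing import Callable, List


def coarse_tune(measure_ch_last: Callable[[List[int]], float],
                n_banks: int, lsb: float, lim_coarse: float,
                max_units: int = 64) -> List[int]:
    """Increase all capacitor banks together, one unit at a time, until
    the measured cumulative bin size reaches LSB * (M - 1 - lim_coarse)."""
    caps = [0] * n_banks
    target = lsb * (n_banks - 1 - lim_coarse)
    for _ in range(max_units):
        if measure_ch_last(caps) >= target:
            return caps
        caps = [c + 1 for c in caps]   # uniform increase of all banks
    raise RuntimeError("capacitor banks exhausted before reaching the target")
```

With a toy linear model of the line, for example `lambda caps: 200 + 10 * sum(caps[:3])` for four banks, an LSB of 100 and limcoarse = 0.25, the loop stops with three unit capacitors in every bank.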

Fine tuning procedure.

The fine tuning procedure builds on the results obtained with the coarse procedure. Each bin is sequentially evaluated to determine if it adheres to the linearity limits. If that is not the case, the capacitance of the respective capacitor bank is increased by one unit. This unit increase is repeated until a satisfactory result is obtained.

The fine calibration algorithm is schematically presented in the next few lines. The bin size histogram is h[bin] and limDNL and limINL are the differential and integral linearity limits.

for bin= 0 to M-2
    bank= bin+1;
    repeat until ( no_changes | bank> M )
        Characterisation;
        if ( ch[bin]< LSB·( bin+1-limINL ) & h[bin]< LSB·( 1+limDNL ) |
           | h[bin]< LSB·( 1-limDNL ) )
            cap[bank]= cap[bank]+1;
            bank= bank+1;
        else
            no_changes

The algorithm approaches the final calibration solution by small increases in the bin size, therefore only the lower limits to the linearity need to be checked. In this version of the algorithm, a second loop (shown below) can be used to perform a final adjustment to the calibration settings. This loop may be required in case the pursuit of one linearity parameter goal forces the RC delay line to surpass the upper limit of the other linearity parameter. Since the bin size increase/decrease per fine characterisation step is very small, this situation only occurs if the linearity limits are too narrow.

for bin= M-2 to 0
    bank= bin+1;
    repeat until ( no_changes | bank< 1 )
        Characterisation;
        if ( ch[bin]> LSB·( bin+1+limINL ) & h[bin]> LSB·( 1-limDNL ) |
           | h[bin]> LSB·( 1+limDNL ) )
            cap[bank]= cap[bank]-1;
            bank= bank-1;
        else
            no_changes

In Figure 3 and Figure 4, the diagrams of the two fine calibration algorithm loops are shown.

Figure 3: The fine calibration procedure (first loop). [Flowchart: from the coarse calibration, for bin = 0..M-2, with bank = bin+1, repeat until changes = 0 or bank > M: run the CDT, compare the cumulative histogram with (bin+1-limINL)·LSB and the histogram with (1±limDNL)·LSB, adjust the capacitor bank and set bank = bank+1.]

Figure 4: The fine calibration procedure (second loop). [Flowchart: from the fine calibration (first loop), for bin = M-2..0, with bank = bin+1, repeat until changes = 0 or bank < 1: run the CDT, compare the cumulative histogram with (bin+1+limINL)·LSB and the histogram with (1±limDNL)·LSB, set cap[bank] = cap[bank]-1 and bank = bank-1.]

References for the Appendixes.


[2] Ginetti, B. et al., Reliability of code density test for high-resolution ADCs, Electronics Letters, Vol. 27, No. 24, pp. 2231-2233, Nov. 91.

[3] Bossche, M. V., et al., Dynamic testing and diagnostics of A/D converters, IEEE Transactions on Circuits and Systems, Vol. 33, No. 8, pp. 775-785, Aug. 86.

[4] Brandolini, A. et al., Testing Methodologies for analogue-to-digital converters, IEEE Transactions on Instrumentation and Measurement, Vol. 41, No. 5, pp. 595-603, Oct. 92.

[5] Maneatis, J. G., Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 96.

[6] Kuboki, S. et al., Nonlinearity analysis of resistor string A/D converters, IEEE Transactions on Circuits and Systems, Vol. 29, No. 6, pp. 383-390, Jun. 82.

[7] Walpole, R. E. et al., Probability and statistics for engineers and scientists - fifth edition, MacMillan Publishing Company, 93.

