# TIME-MODE ANALOG CIRCUIT DESIGN FOR NANOMETRIC TECHNOLOGIES

A Dissertation

by

# MOHAMED MOSTAFA ELSAYED

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

# DOCTOR OF PHILOSOPHY

December 2011

Major Subject: Electrical Engineering

Time-Mode Analog Circuit Design for Nanometric Technologies

Copyright 2011 Mohamed Mostafa Elsayed

Approved by:

Chair of Committee, Edgar Sánchez-Sinencio

Committee Members, Aydin Karsilayan; Alexander Parlos; Jun Zou

Head of Department, Costas N. Georghiades

#### ABSTRACT

Time-Mode Analog Circuit Design for Nanometric Technologies.

(December 2011)

Mohamed Mostafa Elsayed, B.Sc., Cairo University, Egypt;

M.Sc., Cairo University, Egypt

Chair of Advisory Committee: Dr. Edgar Sánchez-Sinencio

Rapid scaling in technology has introduced new challenges in the realm of traditional analog design. Scaling of supply voltage directly impacts the available voltage-dynamic-range. On the other hand, nanometric technologies with  $f_T$  in the hundreds of GHz range open opportunities for time-resolution-based signal processing. With reduced available voltage-dynamic-range and improved timing resolution, it is more convenient to devise analog circuits whose performance depends on edge-timing precision rather than voltage levels. Thus, instead of representing the data/information in the voltage-mode, as a difference between two node voltages, it should be represented in time-mode as a time-difference between two rising and/or falling edges. This dissertation addresses the feasibility of employing time-mode analog circuit design in different applications. Specifically:

1) Time-mode-based quantizer and feedback DAC of a $\Sigma\Delta$ ADC.

2) Time-mode-based low-THD 10MHz oscillator.

3) A spur-frequency boosting PLL with -74dBc reference-spur rejection in 90nm digital CMOS.

In the first project, a new architectural solution is proposed in which the DAC and the quantizer are replaced by a time-to-digital converter. The architecture has been fabricated in 65nm CMOS and shows that this technology node is capable of achieving a time-matching of 800fs, which had not been reported previously. In addition, a competitive figure-of-merit is achieved.

In the low-THD oscillator, a new architectural solution is proposed for synthesizing a highly linear sinusoidal signal using a novel harmonic-rejection approach. The chip is fabricated in a 130nm technology and shows outstanding performance compared to the state of the art. The design consumes 80% less power, occupies less area, and provides a much higher amplitude, while being composed purely of digital circuits and passive elements.

Last but not least, the spur-frequency boosting PLL employs a novel technique that eliminates the reference spurs. Instead of adding filtering at the reference frequency, the spur frequency is boosted to a higher frequency, which is naturally subject to stronger filtering. The prototype is fabricated in 90nm digital CMOS and is shown to provide the lowest normalized reference spurs reported to date.

# DEDICATION

To my beloved parents, to my dear wife Shaimaa, to my sons Adam and Musa, and to all my family members for their love and support

#### ACKNOWLEDGEMENTS

First and foremost, I would like to thank Allah for all His bounties that He provided me through my life and my PhD. Without His guidance, I would have been lost in this life.

"My Lord! Grant me the power and ability that I may be grateful for Your Favors which You have bestowed on me and on my parents, and that I may do righteous good deeds that will please You, and admit me by Your Mercy among Your righteous slaves."

#### [Al-Naml: 27]

I would like to present my deep appreciation to my advisor Dr. Edgar Sánchez-Sinencio. In addition to being an academic advisor, Dr. Sánchez-Sinencio is a personal advisor for all his students. The environment that he provided me during my studies was really one of the great factors in the success of this dissertation. I would like also to thank Dr. José Silva-Martínez for providing precious advice and sharing his time and experience with me in the time-domain ADC project.

Special thanks also go to my ex-roommate, Mohammed Mohsen Abdul-Latif, not only for sharing an apartment together, but also for sharing a 13-year long journey of undergraduate and graduate studies. I thank him for all his help and support. Many thanks go also to all my colleagues especially Faisal Hussien and Mohamed Mobarak who made my transition from Egypt to College Station smooth. Special thanks are due to Mohammed El-Nozahi, Ahmed Amer, Ehab Abdulghany, Ahmed Helmy, Ramy Saad, Ayman Ameen and Ahmed Ragab for their numerous technical discussions and the good environment that they all helped to maintain in our group. Thanks also to my research colleagues Vijaykumar Dhanasekaran, Manisha Gambhir and Erik Pankratz.

I would like to present my sincere gratefulness to my wife, Shaimaa, for her continuous love, support, encouragement and patience and for her care of our little babies Adam and Musa. She provided me with the emotional support that I was really missing after leaving my family in Egypt.

Finally, words cannot express my endless thanks and gratitude for my parents, Prof. Mostafa Elsayed and Prof. Bothina Abdulfattah for all their love, help and support. May Allah reward them for all the good that they did for me.

# TABLE OF CONTENTS

| CHAPTER | Title |
|---------|-------|
|         | ABSTRACT |
|         | DEDICATION |
|         | ACKNOWLEDGEMENTS |
|         | TABLE OF CONTENTS |
|         | LIST OF FIGURES |
|         | LIST OF TABLES |
| I       | INTRODUCTION |
|         | 1.1 Motivation |
|         | 1.2 Organization |
| II      | A 67DB DYNAMIC RANGE TIME-TO-DIGITAL CONVERTER FOR TIME-MODE-BASED ΣΔ MODULATOR |
|         | 2.1 Introduction to Time-to-Digital Converters |
|         | … |
|         | 2.2.2 Dive of FDC's Time Steps |
|         | … |
|         | 2.4.4 SR Latch |
|         | 2.4.5 Reset Unit |
|         | 2.4.6 Calibration Circuit |
|         | 2.5 Jitter and Data Dependent Delay |
|         | 2.6 Layout Considerations and Experimental Results |
|         | 2.7 Summary |
| III     | A LOW THD, LOW POWER, HIGH OUTPUT-SWING TIME-MODE-BASED OSCILLATOR VIA DIGITAL HARMONIC-CANCELLATION TECHNIQUE |
|         | 3.1 Introduction |
|         | 3.2 Background of Low THD Oscillators |
|         | 3.3 Harmonic Cancellation Technique |
|         | 3.3.1 Harmonic Cancellation Theory |
|         | 3.3.2 System Level Design |
|         | 3.4 Circuit Implementation |
|         | 3.5 Performance Limitations |
|         | 3.5.1 Amplitude and Phase Error Analysis |
|         | 3.5.2 Even Harmonic Distortion Analyses |
|         | 3.6 Measurement Results |
|         | 3.7 Summary |
| IV      | A SPUR-FREQUENCY BOOSTING PLL FOR LOW SPUR FREQUENCY SYNTHESIZER |
|         | 4.1 Introduction |
|         | 4.2 PLL Dynamics |
|         | 4.3 PLL Reference Spurs |
|         | 4.4 Low Spurs PLLs |
|         | 4.4.1 Gear-Shifting PLL |
|         | 4.4.2 Dead-Zone Controlled PLL |
|         | 4.4.3 Variable-K<sub>VCO</sub> PLL |
|         | 4.4.4 Multi-Path PFD-CP PLL |
|         | 4.4.5 Spur Suppression PLL Based on Sample-Reset Filter |
|         | 4.5 Spur-Frequency Boosting PLL |
|         | 4.5.1 System Level Design |
|         | 4.5.2 Transistor Level Implementation |
|         | 4.5.2.1 PFD |
|         | 4.5.2.2 Divider |
|         | 4.5.2.3 Voltage-Controlled Oscillator |
|         | 4.5.2.4 Charge Pump |
|         | 4.5.2.5 Time-to-Voltage Converter |
|         | 4.5.2.6 Voltage-to-Time Converter |
|         | 4.5.3 PLL Implementation |
|         | 4.5.4 Measurement Results |
|         | 4.6 Summary |
| V       | CONCLUSIONS AND FUTURE WORK |
|         | REFERENCES |
|         | VITA |

# LIST OF FIGURES

| FIGURE | Caption |
|--------|---------|
| 1.1    | Signal representation in the different modes |
| 2.1    | Inverter chain based TDC |
| 2.2    | Vernier line-based TDC |
| 2.3    | Local passive interpolation TDC |
| 2.4    | Multistage TDC and its timing diagram |
| 2.5    | Reference recycling TDC [17] |
| 2.6    | (a) Open-loop time-mode-based ADC. (b) Timing diagram of the PWM |
| 2.7    | Block diagrams of a) Voltage-mode $\Sigma\Delta$ modulator. b) Time-mode $\Sigma\Delta$ modulator |
| 2.8    | TDC-based $\Sigma\Delta$ ADC: a) Block diagram. b) Timing diagram and $p_q(t)$ generation |
| 2.9    | M-Level feedback DAC: a) DAC output signal. b) Thermometric DAC architecture |
| 2.10   | Block diagram of the output code generation of the TDC |
| 2.11   | Uniform-delay wired-NOR architecture for the feedback pulse, $p_q(t)$, generation |
| 2.12   | Block diagram of the different architectures for implementing the 25-input *OR* gate of Fig. 6. a) 1-Level *OR* gate. b) 2-Level *OR* gate. c) 3-Level *OR* gate |
| 2.13   | Relative jitter of the different *OR* gate architectures versus transistor's width |
| 2.14   | Block diagram of the TDC with the feedback pulse generator |
| 2.15   | Dynamic-logic-based delay cell (D): Transistor level implementation |
| 2.16   | Dynamic-logic-based flip flop (*FF*). a) Transistor level implementation. b) Timing diagram when the input is captured. c) Timing diagram when the input is missed |
| 2.17   | Dynamic-logic-based OR gate |
| 2.18   | SR latch for the feedback pulse generation |
| 2.19   | Block diagram of the reset architecture of the TDC |
| 2.20   | Block diagram of the phase detector |
| 2.21   | Effect of flip flop transition on the delay of the delay elements |
| 2.22   | Effect of flip flop transition on OR gate response |
| 2.23   | Layout of the TDC |
| 2.24   | Chip micrograph |
| 2.25   | Test setup for the measurement of the ADC |
| 2.26   | Output spectrum for a -6dB input signal |
| 2.27   | SNR and SNDR of the ADC versus the input amplitude |
| 2.28   | HDk and the THD of the ADC versus the input amplitude |
| 3.1    | THD of the oscillator versus the quality factor of a second order filter |
| 3.2    | Block diagram of (a) Conventional BPF-based oscillator. (b) Multilevel-comparator-based oscillator |
| 3.3    | Block diagram of the system |
| 3.4    | Block diagram of the proposed harmonic-cancellation-based oscillator |
| 3.5    | Harmonics attenuation contributed by the different attenuation components of the oscillator system |
| 3.6    | Block diagram of the harmonic cancellation block |
| 3.7    | Flow chart of the design procedure and the search algorithm for finding the appropriate time shifts |
| 3.8    | Detailed block diagram of the oscillator |
| 3.9    | Clock routing and resistor-summer layout diagram |
| 3.10   | Different types of the added signal errors |
| 3.11   | Effect of timing and amplitude mismatches on HD<sub>3</sub> suppression. a) 3D plot. b) HD<sub>3</sub> contour plot |
| 3.12   | PMOS/NMOS on-resistance switches mismatch |
| 3.13   | Axis change from $\phi$ to $\gamma$ |
| 3.14   | On-resistance effect of the NMOS/PMOS transistor switches on HD<sub>2</sub>. a) 3D plot. b) HD<sub>2</sub> contour plot |
| 3.15   | Chip micrograph and area budgeting |
| 3.16   | Output spectrum of the pseudo-differential version of the oscillator at 10MHz |
| 3.17   | Output spectrum of the single-ended version of the oscillator at 10MHz |
| 3.18   | HD<sub>2</sub>, HD<sub>3</sub> and THD of the differential output of the oscillator versus the output frequency |
| 4.1    | Block diagram of a conventional PLL |
| 4.2    | Block diagram of the charge-pump-based PLL (CP-PLL) |
| 4.3    | Block diagram of the CP-PLL model |
| 4.4    | Block diagram of the CP-PLL with the modified loop filter |
| 4.5    | Block diagram of the CP-PLL with the smoothing capacitor $C_2$ |
| 4.6    | Bode plot of a PLL |
| 4.7    | Transistor level implementation of the charge pump |
| 4.8    | Timing diagram of the current mismatches in CP-PLL |
| 4.9    | Filtering characteristics of the loop filter |
| 4.10   | Block diagram of the dead-zone controlled PLL |
| 4.11   | Block diagram of the variable-$K_{VCO}$ PLL |
| 4.12   | Block diagram of the multi-path PFD-CP PLL [58] |
| 4.13   | Sample-based loop filter |
| 4.14   | Transmission gate with dummy switches |
| 4.15   | Block diagram of the proposed SFB-PLL |
| 4.16   | SFB-PLL model |
| 4.17   | Transistor level implementation of the phase-frequency detector |
| 4.18   | PFD minimum pulse width versus the control voltage |
| 4.19   | Typical integer-N divider architecture in PLLs |
| 4.20   | Block diagram of a prescaler with $M=2$ |
| 4.21   | Block diagram of a 16/17 prescaler |
| 4.22   | Transistor level implementation of the TSPC D-FF |
| 4.23   | Transistor level implementation of the VCO |
| 4.24   | VCO gain versus the control voltage |
| 4.25   | VCO gain versus the supply voltage |
| 4.26   | Transistor level implementation of the CP |
| 4.27   | Transistor level implementation of the TVCs |
| 4.28   | Block diagram of a negative edge detector |
| 4.29   | Degradation of the phase margin versus the TVC capacitors ratio |
| 4.30   | Timing diagram for the effect of the current source noise on the performance of the TVC |
| 4.31   | Transistor level implementation of the delay line based VTC |
| 4.32   | Pseudo-differential delay-line-based VTC and its timing diagram |
| 4.33   | Gain of the inverter-based VTC versus the control voltage $V_{cnt}$ |
| 4.34   | Block diagram of the TVC-based VTC |
| 4.35   | Complete block diagram of the VTC |
| 4.36   | Block diagram of the comparator with hysteresis |
| 4.37   | SFB-PLL model |
| 4.38   | SFB-PLL chip micrograph |
| 4.39   | SFB-PLL's output spectrum showing the reference spur at 6MHz offset |
| 4.40   | Phase-noise plot of the SFB-PLL |
| 4.41   | Transient response of the SFB-PLL showing a settling time of 11.8µsec |

# LIST OF TABLES

| TABLE | Caption |
|-------|---------|
| 2.1   | Comparison of the different TDC techniques |
| 2.2   | TDC measurement results compared with the state-of-the-art TDCs |
| 3.1   | Comparison of performance of the proposed approach with the state of the art |
| 4.1   | Comparison between the different spur suppression techniques |
| 4.2   | Truth table of a 2/3 prescaler |
| 4.3   | SFB-PLL performance summary compared to the state-of-the-art PLLs |

#### CHAPTER I

#### INTRODUCTION

#### **1.1 Motivation**

Analog circuit design has passed through many phases over the last century. Motivated by the invention of CMOS integrated circuits, many design techniques appeared to overcome the new challenges of CMOS technology, in which device dimensions scale down, allowing the integration of larger systems. Gordon Moore predicted in 1975 that MOS device dimensions would continue to scale down by a factor of two every three years and that the number of transistors per chip would double every one to two years [1]. As technology scales down by a factor $\alpha$, the supply voltage, theoretically, scales down by the same factor. The aggressive reduction in the supply voltage and the moderate reduction in the device threshold voltage of CMOS technology have greatly affected the performance of CMOS voltage-mode circuits, typically reflected in a reduced dynamic range, increased propagation delay and reduced noise margins [2]. Such scaling motivated designers in the sixties to introduce current-mode circuits as a promising substitute for their voltage-mode counterparts [3].

This dissertation follows the style of IEEE Journal of Solid-State Circuits.

Signal representation in current-mode circuits depends on the value of the current flowing through a certain branch. Consequently, its upper limit is not constrained by the supply voltage as in the voltage-mode case, in which the voltage signal must be less than the supply voltage except in some rare applications. In addition, high-frequency operation favors current-mode circuit topologies, which are characterized by low-impedance nodes compared to their voltage-mode counterparts. Thus, high-frequency applications usually employ current-mode analog circuits and current-mode logic (CML) to extend the operating bandwidth. On the other hand, low-frequency applications have continued to employ voltage-mode circuits since they usually consume less power.

However, rapid scaling in technology has introduced new challenges in the realm of traditional analog design. Scaling of the supply voltage directly impacts the available voltage dynamic range. On the other hand, nanometric technologies with a cutoff frequency $f_T$ in the hundreds of GHz range open opportunities for time-resolution-based signal processing, which was not a viable option in previous technology nodes [4]. With reduced available voltage dynamic range and improved timing resolution, it is more convenient to devise analog circuits whose performance depends on edge-timing precision rather than voltage levels. Thus, instead of representing the data/information in the voltage mode, as a difference between two node voltages, or in the current mode, as a current flowing through a certain branch, it should be represented in the time mode as a time difference between two rising and/or falling edges. Fig. 1.1 shows the representation of the signals in voltage mode, current mode and time mode. A major advantage of processing signals encoded in the time mode is the digital-friendly nature of the system, which scales down with the technology. In addition, migrating to smaller technologies is expected to improve the performance of the same time-mode-based design as timing resolution improves.

The main circuit blocks in time-mode systems are the time-to-voltage and voltage-to-time converters, which interface the time-mode circuits to the voltage-mode ones. The former can be considered a charge pump that produces an output voltage proportional to the input pulse width, while the latter can be considered a pulse-width modulator (PWM) that generates an output pulse whose width is proportional to the input voltage.



Fig. 1.1 Signal representation in the different modes.
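
As a minimal sketch of the two interface blocks described above, the Python snippet below models an ideal time-to-voltage converter as a constant current charging a capacitor for the duration of the input pulse, and an ideal voltage-to-time converter as a linear PWM. The function names and the current, capacitance and gain values are illustrative assumptions, not parameters taken from this work.

```python
def time_to_voltage(pulse_width_s, current_a, capacitance_f):
    """Ideal charge-pump model: V = I * T / C.

    Assumes a constant current source and a capacitor that starts discharged.
    """
    return current_a * pulse_width_s / capacitance_f


def voltage_to_time(voltage_v, gain_s_per_v):
    """Ideal PWM model: output pulse width proportional to the input voltage."""
    return gain_s_per_v * voltage_v


# Hypothetical values chosen only for illustration.
CURRENT_A = 100e-6       # 100 uA charging current
CAPACITANCE_F = 1e-12    # 1 pF integration capacitor

# Convert a 1 ns input pulse to a voltage.
v_out = time_to_voltage(1e-9, CURRENT_A, CAPACITANCE_F)

# A PWM whose gain is C/I inverts the charge-pump mapping.
t_out = voltage_to_time(v_out, CAPACITANCE_F / CURRENT_A)

print(f"TVC output: {v_out:.3f} V, recovered pulse width: {t_out * 1e9:.3f} ns")
```

With these assumed values, a 1 ns pulse maps to about 0.1 V, and a PWM with the reciprocal gain maps that voltage back to a 1 ns pulse.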

#### **1.2 Organization**

The dissertation presents the novel design and implementation of three different blocks employing the time-mode concept discussed above. A time-to-digital converter (TDC) is proposed in Chapter II to replace the multi-bit quantizer and the multi-bit feedback DAC of the traditional voltage-mode $\Sigma\Delta$ modulator. The proposed time-mode TDC makes the multi-bit $\Sigma\Delta$ ADC digital-friendly and more suitable for nanometric technologies. A pulse-width modulator (PWM) converts the sampled-and-held voltage sample to a time pulse, and the TDC generates a digital code corresponding to its width. Simultaneously, the TDC provides a time-quantized feedback pulse for the $\Sigma\Delta$ modulator, emulating the voltage DAC in a conventional $\Sigma\Delta$ ADC. Measurements show that the $\Sigma\Delta$ modulator achieves a dynamic range of 68dB and the TDC consumes 5.66mW at a 250MHz event rate while occupying 0.006mm<sup>2</sup>.

Chapter III proposes an architectural solution for designing and implementing low-THD oscillators. A digital harmonic-cancellation block is used to suppress the low-frequency harmonics while a passive, inherently linear filter is used to suppress the high-frequency ones. The proposed technique eliminates the need for the typical analog high-Q band-pass filter (BPF) to suppress the harmonics. Thus, by relying on a purely digital solution, it eradicates the effect of the increasing device nonlinearities in nanometric technologies. In addition, eliminating the high-Q BPF releases the output swing from the constraints imposed by the linearity of the filter. Measurement results show -72dB THD at 10MHz along with a differential output swing of 228mV<sub>pp</sub>. As the performance depends solely on the timing precision of digital signals, the proposed oscillator is considered the best time-mode-based oscillator in the literature.

Chapter IV proposes an architectural solution for designing and implementing a low-reference-spur PLL. A spur-frequency booster block is inserted between the phase-frequency detector (PFD) and the charge pump (CP) to boost the charge-pump input frequency. Hence, the reference spurs theoretically vanish. The proposed technique adds degrees of freedom to the design of PLLs such that the spur level can be reduced without sacrificing either the loop bandwidth or the voltage-controlled oscillator's gain. The prototype achieves -74dBc reference-spur suppression along with a ($K_{VCO}/f_{ref}$) ratio of 17 at an ($f_{BW}/f_{ref}$) ratio of 1/20.
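
One way to see why moving the spur to a higher frequency helps is to compare the attenuation of a simple low-pass response at the reference frequency and at a boosted frequency. The sketch below is only a rough illustration under stated assumptions: the loop is approximated by cascaded identical real poles at the loop bandwidth, the 6 MHz reference is taken from the spur offset quoted in the Fig. 4.39 caption, and the boosting factor of 4 is hypothetical. It is not the loop model developed in Chapter IV.

```python
import math


def lowpass_attenuation_db(freq_hz, corner_hz, order=1):
    """Attenuation in dB of `order` cascaded identical real poles at corner_hz."""
    return 10.0 * order * math.log10(1.0 + (freq_hz / corner_hz) ** 2)


F_REF_HZ = 6e6               # assumed reference frequency (6 MHz spur offset)
CORNER_HZ = F_REF_HZ / 20    # assumption: corner placed at f_BW = f_ref / 20
BOOST = 4                    # hypothetical spur-frequency boosting factor

att_ref = lowpass_attenuation_db(F_REF_HZ, CORNER_HZ)
att_boosted = lowpass_attenuation_db(BOOST * F_REF_HZ, CORNER_HZ)
extra_db = att_boosted - att_ref

print(f"Extra attenuation from boosting by {BOOST}x: {extra_db:.2f} dB")
```

For frequencies well above the corner, each pole contributes roughly $20\log_{10}$ of the boosting factor in additional attenuation, so a higher-order filter benefits proportionally more.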

Finally, Chapter V concludes the work and explores the areas for future work.

#### CHAPTER II

# A 67DB DYNAMIC RANGE TIME-TO-DIGITAL CONVERTER FOR TIME-MODE-BASED $\Sigma\Delta$ MODULATOR\*

## 2.1 Introduction to Time-to-Digital Converters

The basic block that manipulates the data in time mode is the TDC. The TDC can be defined as a block that provides a digital code corresponding to the width of a pulse (which is the data in the time-mode case). Measuring the width of a digital pulse has become an important technique for many applications. Although the very first work on CMOS TDCs targeted high energy physics, TDCs have become widely used in other electronic applications such as PLLs and time-of-flight-based systems [5-7]. The main TDC performance metrics are:

<sup>\*</sup>Reprinted, with permission, from M. M. Elsayed, V. Dhanasekaran, M. Gambhir, J. Silva-Martinez and E. Sánchez-Sinencio, "A 0.8ps DNL Time-to-Digital Converter with 250MHz Event Rate in 65nm CMOS for Time-Mode-Based ΣΔ Modulator" *IEEE J. Solid-State Circuits.*, vol. 46, no. 9, pp.2048-2098, Sept 2011. © 2011 IEEE and from V. Dhanasekaran, M. Gambhir, M. M. Elsayed, E. Sánchez-Sinencio, J. Silva-Martinez, C. Mishra, L. Chen, E. Pankratz "A 20MHz BW 68dB DR CT ΔΣ ADC Based on a Multi-Bit Time-Domain Quantizer and Feedback Element," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2009, pp. 174-175. © 2009 IEEE and from V. Dhanasekaran, M. Gambhir, M. M. Elsayed, E. Sánchez-Sinencio, J. Silva-Martinez, C. Mishra, L. Chen, E. Pankratz "A Continuous Time Multi-Bit ΔΣ ADC Using Time-Domain Quantizer and Feedback Element," *IEEE J. Solid-State Circuits.*, vol. 46, no. 3, pp.639-650, Mar 2011. © 2011 IEEE.

- 1- Differential nonlinearity (DNL): defined as the difference between the actual time-step size and the ideal one.
- 2- Resolution: the minimum time-step the TDC can resolve.
- 3- Latency (or conversion time): defined as the time the TDC requires after the signal is tracked until the digital output is available.
- 4- Dead time: the minimum time required between two acquisitions.
- 5- Silicon area.
- 6- Power consumption.

Since different topologies were developed to improve the performance of TDCs, a short description of each topology will be presented followed by a comparison table for the TDC tradeoffs.

#### 2.1.1 Inverter-Chain-Based TDC [8-11]

The block diagram of the inverter-chain-based TDC is shown in Fig. 2.1. The basic idea is to propagate the start signal, the rising edge of the pulse to be measured, through a delay line whose elements consist of CMOS inverters, each connected to the input of a flip flop. While the signal propagates through the line, the output of each inverter toggles and changes its state. When the stop signal triggers the flip flops, the state of the line is captured and the position of the propagating start signal is identified by applying the *XOR* function to every two successive flip flop outputs. The output of the *XOR* is zero only at the position of the start signal.



Fig. 2.1 Inverter chain based TDC.

By knowing the position of the propagating start signal and the delay of a single cell, the width of the pulse can be calculated. The main advantages of the inverter-chain-based TDC are the simplicity of the design and the conversion speed. On the other hand, the time step cannot be made smaller than one inverter delay, which puts a fundamental limit on the resolution of the circuit.
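The readout described above can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the circuit of [8-11]: the cells are ideal and identical, flip-flop setup time is ignored, node 0 stands for the start input itself, and the function names and the 50 ps cell delay are hypothetical.

```python
def chain_state(width_ps, cell_delay_ps, n_cells):
    """Node values of the chain (node 0 is the START input) when STOP arrives."""
    # Number of cells the start edge has crossed by the time STOP fires.
    passed = min(width_ps // cell_delay_ps, n_cells)
    # Before the edge, node i idles at i % 2; the edge inverts every node it reaches.
    return [(1 - i % 2) if i <= passed else (i % 2) for i in range(n_cells + 1)]


def locate_edge(state):
    """Index n where nodes n and n+1 are equal (XOR == 0), or None on overflow."""
    for n in range(len(state) - 1):
        if state[n] ^ state[n + 1] == 0:
            return n
    return None


def measure_width(width_ps, cell_delay_ps=50, n_cells=16):
    """Quantized pulse width in ps, or None if the edge ran off the line."""
    n = locate_edge(chain_state(width_ps, cell_delay_ps, n_cells))
    return None if n is None else n * cell_delay_ps
```

With these assumed values, a 230 ps pulse is reported as 200 ps, which shows the one-cell-delay quantization that limits this topology.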

#### 2.1.2 TDC Based on Vernier Delay Line [12-14]

The second most commonly used TDC is the Vernier TDC. The Vernier delay line overcomes the basic limitation of the inverter-chain-based TDC, the resolution, by employing two delay lines of different delays per inverter cell, $t_1 > t_2$, as shown in Fig. 2.2. The start signal propagates through the first line and changes the inputs of the flip flops while the stop signal propagates through the second line and captures the inputs of the flip flops. As the stop signal propagates faster than the start signal, at a certain time and a certain flip flop it will precede the start signal and trigger that flip flop to capture an unchanged input (because the start signal is lagging the stop signal). Consequently, at the stage where the stop edge catches up with the start edge, two successive flip flop outputs will be the same, indicating that the stop signal precedes the start signal at this stage. In this case the pulse width will be equal to:

$$T_{pulse} = N\,(t_1 - t_2) \tag{2.1}$$

where N is the number of flip flops that captured a changed input.

It can be seen from this equation that the resolution of the Vernier-based TDC is the difference between two inverter delays,  $(t_1-t_2)$ , instead of one inverter delay. In other words, the resolution enhancement factor is given by:

$$\alpha = \frac{t_2}{t_1 - t_2} \tag{2.2}$$

On the other hand, the number of stages required to accommodate the same maximum pulse width is increased by  $\alpha$  compared to inverter-chain-based TDC. The number of stages required to accommodate a pulse width of *T* is given by  $N=T/(t_1-t_2)$ . In addition, having two active delay lines at the same time and increasing the delay line length by the resolution enhancement factor,  $\alpha$ , categorizes this technique as a power hungry one. One last disadvantage of this technique is the conversion latency. The latency is given by:

$$L = N\,t_2 \tag{2.3}$$
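The relations (2.1) to (2.3) can be collected into a short calculation. The sketch below is illustrative only: the function name and the 60 ps / 50 ps cell delays are hypothetical, and the latency is taken as the worst case in which the stop edge traverses all $N$ stages.

```python
import math


def vernier_figures(t1_ps, t2_ps, max_width_ps):
    """Resolution, enhancement factor, stage count and latency per (2.1)-(2.3)."""
    if t1_ps <= t2_ps:
        raise ValueError("Vernier operation needs t1 > t2")
    resolution = t1_ps - t2_ps                      # one time step, from (2.1)
    alpha = t2_ps / resolution                      # enhancement factor, (2.2)
    stages = math.ceil(max_width_ps / resolution)   # N = T / (t1 - t2)
    latency = stages * t2_ps                        # L = N * t2, (2.3)
    return resolution, alpha, stages, latency


# Hypothetical example: 60 ps and 50 ps cells covering a 1 ns pulse.
print(vernier_figures(60, 50, 1000))
```

For these assumed delays the step shrinks from 50 ps to 10 ps, but 100 stages are needed and the worst-case latency grows to 5 ns, which is the tradeoff described above.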



Fig. 2.2 Vernier line-based TDC.

#### 2.1.3 Local Passive Interpolation TDC

This technique increases the resolution of the inverter-chain TDC while avoiding the latency of the Vernier-based TDC. A conventional TDC is combined with local passive interpolators (LPI) between successive delay elements to improve the resolution, as shown in Fig. 2.3 [15]. Two delay lines are excited using a differential input pulse while a four-level passive interpolator is used to generate the intermediate crossings. The original crossings as well as the interpolated ones drive flip flop inputs that are triggered by the STOP signal. As shown in Fig. 2.3, four resistors are used to interpolate between two signals shifted by one inverter delay, which improves the resolution by a factor of four.



Fig. 2.3 Local passive interpolation TDC

Area wise, this technique needs fewer inverters, one fourth of those in a Vernier TDC, but there is an area overhead from the resistors and flip flops. Moreover, to obtain accurate interpolation, poly-silicon resistors are used, which results in a large area compared to the Vernier TDC. From the mismatch point of view, the interpolation accuracy depends on the ratio between the four resistors rather than on their absolute values. Since global mismatch does not affect the resistor ratios, this topology is considered a robust one. Finally, using fewer inverters in the delay line reduces the power consumption.

#### 2.1.4 Multistage TDC

This technique overcomes the disadvantage of having long Vernier line by using multistage pulse-quantization. A chain of buffers, similar to the conventional TDC, performs the coarse quantization while the fine quantization is performed using Vernier delay line as shown in Fig. 2.4 [16].

Since the maximum input-pulse-width to the Vernier line is set by the resolution of the conventional TDC, this technique is considered area and power efficient compared to the Vernier-line-based TDC. Moreover, this technique surpasses the Vernier TDC by having a latency that is always less than one buffer delay. On the other hand, it suffers from two disadvantages. First, two delay-locked loops (DLLs) are required, instead of one, to calibrate the delay of each stage. Second, the MUX used to multiplex the first-level signals induces a dead zone in the signal, leading to degradation in the overall resolution.

#### 2.1.5 Reference Recycling TDC

This technique overcomes the mismatch problem in long Vernier delay lines by employing a short delay line and recycling the input pulse in the line many times, as shown in Fig. 2.5 [17]. The signal, CLK, enters the delay line through the MUX and propagates through the delay line; the MUX then recycles it until another input pulse is applied. A DLL is used to adjust the total delay of the delay line such that the input clock period is an integer multiple of the line's delay.



Fig. 2.4 Multistage TDC and its timing diagram.



Fig. 2.5 Reference recycling TDC [17].

Since the signal is recycled in the same delay line again and again, the layout is expected to be compact with reduced mismatch effect. In addition, this technique can be used concurrently with the LPI technique to improve the resolution.

#### 2.1.6 Time Shrinking Delay Line TDC

In this technique a single delay line digitizes the signal. The delay cells are designed such that the pulse shrinks while propagating through the line. The pulse also triggers flip flops connected to the delay element outputs and changes their state. As the pulse propagates through the line, the pulse width decreases until it vanishes. When the pulse vanishes, the remaining flip flops are not triggered and keep their old state, indicating that the pulse has vanished. The resolution in this technique is set by the amount by which the pulse width shrinks when it propagates through one delay element. This allows a high-resolution digitization of the input pulse. Similar to the Vernier delay-line-based TDC, this technique suffers from large latency [18].

#### 2.1.7 Pulse Stretching Converter

This technique improves the resolution of the conventional TDC by stretching the pulse then propagating it through the inverter chain delay line-based TDC. In this case the gain in the resolution is the stretching factor used to increase the pulse width. The stretching is performed by applying the input pulse to a CMOS switch that discharges a pre-charged capacitor with constant current until the pulse finishes. The capacitor is charged afterwards using smaller current. In this case the charging time will be proportional to the original pulse width and the proportionality factor, or the stretching factor, is the ratio between the discharging and charging currents. Unfortunately this technique leads to increased latency and poor DNL [19].

Table 2.1 summarizes the pros and cons of the aforementioned techniques. From the table it can be observed that there is a tradeoff between the resolution and the latency. Resolution defines the minimum time-step that the TDC can resolve while the latency limits the event rate of the input. Attaining high resolution sacrifices the event rate and vice versa. Consequently, the resolution and the bandwidth of the input must be traded off against each other.

| TDC topology                    | Resolution | Latency  | Area            | Power    |
|---------------------------------|------------|----------|-----------------|----------|
| Inverter chain based TDC [8-11] | low        | low      | small           | small    |
| Vernier TDC [12-14]             | high       | large    | large           | large    |
| Local interpolation TDC [15]    | moderate   | low      | large           | small    |
| Multistage TDC [16]             | high       | moderate | moderate        | moderate |
| Reference recycling TDC [17]    | low        | low      | very compact    | small    |
| Time shrinking TDC [18]         | high       | large    | large           | large    |
| Pulse stretching TDC [19]       | moderate   | large    | large           | large    |

Table 2.1 Comparison of the different TDC techniques.

#### 2.2 TDC-Based ADC

The diagram in Fig. 2.6 (a) shows how a sinusoidal input of peak-to-peak amplitude of  $V_{pp}$  centered at  $V_{FS}/2$ , where  $V_{FS}$  is the full scale voltage, is digitized using time-mode circuits. The input signal is sampled-and-held at frequency  $f_s$  then a PWM block transforms the voltage-sample into a pulse-width which is digitized using the TDC. The transformation of the analog signal into the digital format is performed in the time-mode. The only part that is performed in voltage-mode is the pulse width modulation. In general, the same PWM design can be utilized as a front-end in time-mode systems to convert the signal from voltage-mode to time-mode prior to further processing.

Since the input signal is sampled at a frequency $f_s$, the zero-voltage input is mapped to a pulse width of $T_s = 1/f_s$ while the full scale voltage input, $V_{FS}$, is mapped to a pulse of zero width. Consequently, the conversion gain of the PWM is given by $-T_s/V_{FS}$ (s/V). Since the amplitude of the voltage-sample ranges from $\frac{V_{FS}}{2}\left(1-\frac{V_{pp}}{V_{FS}}\right)$ to $\frac{V_{FS}}{2}\left(1+\frac{V_{pp}}{V_{FS}}\right)$, the width of the output pulse will range from $\frac{T_s}{2}\left(1+\frac{V_{pp}}{V_{FS}}\right)$ to $\frac{T_s}{2}\left(1-\frac{V_{pp}}{V_{FS}}\right)$, i.e. centered around $T_s/2$, as illustrated in Fig. 2.6 b) at points "B" and "A", respectively. Since the output pulse is symmetric and centered around $T_s/2$, the change in the output pulse width corresponding to the change of the input can be modeled as if there are two virtual sinusoids around $T_s/2$, as shown in Fig. 2.6 b), and the pulse width is modulated with the amplitude of the pulse. Such a model will be useful when analyzing the effect of transistor mismatches on the TDC performance. The amplitude of the PWM output pulse is $V_{dd}$ in all cases; it is drawn differently at points "A" and "B" in Fig. 2.6 b) only for clarity. The output pulse is then digitized using a TDC.
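The voltage-to-pulse-width mapping described above can be written out directly. The following is a minimal sketch of an ideal, linear PWM; the function names and the example values (1 V full scale, 4 ns period) are hypothetical and chosen only for illustration.

```python
def pwm_width(v_in, v_fs, t_s):
    """Pulse width for a held sample: 0 V maps to t_s and v_fs maps to 0."""
    if not 0.0 <= v_in <= v_fs:
        raise ValueError("sample outside [0, v_fs]")
    return t_s * (1.0 - v_in / v_fs)   # conversion gain is -t_s / v_fs


def width_range(v_pp, v_fs, t_s):
    """(widest, narrowest) pulse for a sinusoid of swing v_pp centred at v_fs/2."""
    v_lo = 0.5 * v_fs * (1.0 - v_pp / v_fs)
    v_hi = 0.5 * v_fs * (1.0 + v_pp / v_fs)
    return pwm_width(v_lo, v_fs, t_s), pwm_width(v_hi, v_fs, t_s)


# Hypothetical example: V_pp = V_FS / 2 with a 4 ns sampling period.
widest, narrowest = width_range(0.5, 1.0, 4.0e-9)
```

For this example the pulse width swings between $3T_s/4$ and $T_s/4$, symmetric about $T_s/2$.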

A major issue that hinders the full utilization of the aforementioned time-mode ADC architecture is the trade off between the resolution of the TDC and both latency and dead-time. Latency, or the conversion time, is defined as the time the TDC requires after the signal is tracked till the digital output is available. The dead time is the minimum time required between two acquisitions. Both latency and dead time limit the maximum sampling frequency that can be used and, as a consequence, put an upper limit on the bandwidth of the ADC. On the other hand, the resolution is the minimum time step the TDC can resolve.



Fig. 2.6 a) Open-loop time-mode-based ADC. b) Timing diagram of the PWM.

To achieve a high ADC dynamic range and a high signal-to-quantization-noise ratio (SQNR), a time resolution of less than one gate delay may be required. This can be achieved using different techniques but only at the expense of the latency and the dead time. Hence, the ADC bandwidth, which is limited by the latency, and the SQNR, which is limited by the minimum time-resolution that the TDC can resolve, are traded against each other. To resolve this tradeoff, the concept of the $\Sigma\Delta$ modulator is adopted in time-mode designs [20]. Fig. 2.7 shows the analogy between the voltage-mode $\Sigma\Delta$ modulator and its time-mode counterpart. The multi-bit quantizer and the multi-bit voltage-mode feedback DAC are replaced by a PWM and a modified TDC. The PWM transforms the input voltage-sample into a pulse and the TDC generates a multi-bit digital output, $D_{out}$, that corresponds to the pulse-width and provides a time-quantized feedback pulse, $p_q(t)$, which emulates the DAC output in traditional $\Sigma\Delta$ modulators.

Owing to the over-sampling and noise shaping offered by the $\Sigma\Delta$ loop architecture, the quantizer (TDC) is not required to have a number of levels in the order of the targeted SNR. On the other hand, the timing-accuracy of the feedback signal (the feedback pulse-width $p_q(t)$) should be in the order of the targeted SNR or better. In other words, the error in the feedback signal $p_q(t)$ should be less than a single bit as it will not be shaped by the loop filter. These two remarks can be mapped to specifications for the TDC as follows. First, the number of quantization steps of the TDC can be decreased compared to the open loop case shown in Fig. 2.6 a), which means that time-quantization steps larger than one gate delay can be used. That directly allows the use of TDC architectures with low latency and low dead-time. Thus, a wide-bandwidth ADC can be achieved. Second, the accuracy of the feedback signal, i.e. the width of the $p_q(t)$ pulses, is set by the targeted SNR, which puts a constraint on the DNL (timing mismatch) of the TDC steps. As a numerical example, assume a 10-bit ADC that is designed using the open loop architecture shown in Fig. 2.6 a). The time-quantization step of the TDC will be given by $T_s/2^{10}$. On the other hand, if a third-order ($L = 3$) $\Sigma\Delta$ architecture with an over-sampling ratio (OSR) of $M = 6$ is used, then the enhancement of the SQNR due to the loop filter is given by:

$$SQNR|_{\Sigma\Delta} = 10\log\left(\frac{3(2L+1)M^{(2L+1)}}{2\pi^{2L}}\right) = 10\log\left(\frac{3 \cdot 7 \cdot 6^{7}}{2\pi^{6}}\right) = 34.86\,\text{dB} \tag{2.4}$$

Consequently, a 4.5-bit quantizer can be used to attain a 10-bit output. In other words, the time-quantization step of the quantizer can be as coarse as $T_s/2^{4.5}$, which relaxes the resolution of the TDC.
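The arithmetic behind (2.4) and the 4.5-bit figure can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and the conversion to bits assumes the usual $6.02N + 1.76$ dB rule for the 10-bit target and 6.02 dB per bit for the remainder, which is one way to arrive at the quoted value rather than a statement of how it was derived.

```python
import math


def sqnr_gain_db(order, osr):
    """SQNR enhancement of an L-th order loop with oversampling ratio M, eq. (2.4)."""
    n = 2 * order + 1
    return 10.0 * math.log10(3.0 * n * osr ** n / (2.0 * math.pi ** (2 * order)))


def quantizer_bits(target_bits, order, osr):
    """Quantizer resolution left after the loop gain (assumed 6.02N + 1.76 dB rule)."""
    target_db = 6.02 * target_bits + 1.76
    return (target_db - sqnr_gain_db(order, osr)) / 6.02


gain = sqnr_gain_db(3, 6)          # close to the 34.86 dB quoted in (2.4)
bits = quantizer_bits(10, 3, 6)    # close to 4.5 bits under the stated assumption
```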

On the other hand, the linearity of the feedback-pulse-width should be maintained higher than 10 bits which means that the DNL of the TDC should be better than 10 bits. In conclusion, by replacing the multi-bit voltage-mode feedback DAC by a TDC, the performance bottleneck is transformed to the timing-precision of the feedback pulse of the TDC,  $p_q(t)$ , rather than the absolute voltage levels of the DAC in a conventional voltage-mode  $\Sigma\Delta$  ADC. Since a reduced number of time-quantization steps is required compared to the open loop case, along with small latency to minimize the excess loop delay, inverter-chain-based TDC [8-11] is a suitable choice for time-mode based  $\Sigma\Delta$  modulators.






Fig. 2.7 Block diagrams of a) Voltage-mode  $\Sigma\Delta$  modulator. b) Time-mode  $\Sigma\Delta$  modulator.

Fig. 2.8 shows how the time-quantized feedback pulse, $p_q(t)$, is generated using an inverter-chain-based TDC. The full scale input, $[0, T_s]$, is split, in time, into N time-quantization steps using inverter-chain-based delay cells which are driven by the sampling clock. Assuming perfect matching between the delay cells, the time-quantization step $T_Q$ is given by $T_s/N$. The edges of the time-quantized feedback pulse, $p_q(t)$, should be aligned to the edges of the delay cell outputs as shown in Fig. 2.8 b). Since the TDC directly drives the single-bit DAC, two relevant specifications are the signal-to-jitter ratio (SJR) of the feedback pulse and the maximum DNL of the TDC time-steps. Note that the 'signal' in this case corresponds to the pulse-width while the 'noise' corresponds to its jitter.

#### 2.2.1 Signal-to-Jitter Ratio (SJR)

Assuming a maximum input signal of  $-6dBFs (V_{pp} = V_{FS}/2)$ , where dBFs refers to the full scale input, and assuming that it is centered around  $V_{FS}/2$  and the PWM conversion gain is  $T_s/V_{FS}$  then the maximum feedback-pulse-width,  $p_{qmax}(t)$ , will be given by  $\frac{T_s}{2} \left[ 1 + \frac{V_{pp}}{V_{FS}} \right] = \frac{3T_s}{4}$  as shown in Fig. 2.8 b). Since the maximum pulse width corresponds to the peak-to-peak signal in the voltage domain, by analogy, its RMS value will be given by  $\frac{3T_s}{4} \cdot \frac{1}{2\sqrt{2}} = \frac{3T_s}{8\sqrt{2}}$ . Consequently, the signal-to-(pulse-width-jitter) ratio is given by:


$$SJR = 10\log\left(\frac{\left(3T_s/8\sqrt{2}\right)^2}{\sigma_{j\_pw}^2}\right) \tag{2.5}$$

where $\sigma_{j\_pw}$ is the standard deviation of the pulse-width jitter of the feedback pulse integrated over the frequency band of interest.
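Equation (2.5) can be inverted to obtain the jitter budget for a given SJR target. The following sketch does this for the $-6$ dBFs case analyzed above; the function names are hypothetical, and the example uses the 4 ns sampling period and 72 dB target quoted in Section 2.3.

```python
import math


def rms_pulse_width(t_s):
    """RMS value of the maximum feedback pulse width, 3*T_s / (8*sqrt(2))."""
    return 3.0 * t_s / (8.0 * math.sqrt(2.0))


def sjr_db(t_s, sigma_j):
    """Signal-to-jitter ratio of eq. (2.5) in dB."""
    return 20.0 * math.log10(rms_pulse_width(t_s) / sigma_j)


def max_jitter(t_s, sjr_target_db):
    """Largest pulse-width jitter (same unit as t_s) that meets the SJR target."""
    return rms_pulse_width(t_s) / 10.0 ** (sjr_target_db / 20.0)


sigma_max = max_jitter(4.0e-9, 72.0)   # roughly 266 fs
```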

#### 2.2.2 DNL of TDC's Time Steps

If perfect matching between the delay cells in Fig. 2.8 b) is assumed, the DNL should be zero. Due to inevitable mismatches, the delays of the different steps will vary, introducing nonlinearities in the time-quantized feedback pulse $p_q(t)$. Since the input voltage-sample ranges from $V_{FS}/4$ to $3V_{FS}/4$ (assuming $V_{pp} = V_{FS}/2$), the pulse width will change from $T_s/4$, $p_{qmin}(t)$, to $3T_s/4$, $p_{qmax}(t)$, as shown in Fig. 2.8 b). The delay cells that are excited by the reference clock during the time intervals $[0,T_s/8]$ and $[7T_s/8,T_s]$ will seldom interact with the output pulse width (black delay cells in Fig. 2.8 b)). Hence, their mismatch does not affect the performance.







Fig. 2.8 TDC-based  $\Sigma \Delta$  ADC: a) Block diagram. b) Timing diagram and  $p_q(t)$  generation.

On the other hand, the mismatch of the cells excited during the time interval $[3T_s/8, 5T_s/8]$ (gray cells on the timing diagram) will contribute equally to all the pulse widths, so their contribution appears as an offset in the pulse width rather than as harmonic distortion. Consequently, the time mismatches that contribute to THD are those of the cells that are active during the time intervals $[T_s/8, 3T_s/8]$ and $[5T_s/8, 7T_s/8]$ (white cells on the timing diagram). Since the output pulse is symmetric and centered around $T_s/2$ as shown in Fig. 2.6 b), in the following analysis a single side of the pulse-width will be considered, and then a 3 dB improvement in the THD will be included to account for having the two sides at the output. Another observation is that the width of the time-quantized feedback pulse is the sum of the individual delays of the different delay cells, where each delay cell represents one LSB. Consequently, the TDC can be considered as a time-mode thermometric DAC, where each delay cell corresponds to a time-mode 1-bit DAC. This observation will be useful when analyzing the harmonic distortion of the TDC.

For simplicity of the harmonic distortion analysis, an M-level voltage-mode thermometric DAC consisting of M 1-bit DACs is considered first, and the mismatch specification of the time-mode counterpart is then obtained by analogy. Fig. 2.9 a) shows the output of an M-level (12 levels as an example) DAC for one period of a digital sinusoidal signal of frequency $\omega_{sin}$ with an OSR of 15. Since the DAC architecture is a thermometric one, it can be represented as M 1-bit DACs followed by a summer as shown in Fig. 2.9 b). Each square wave signal, i.e. the output of each 1-bit DAC, can be expanded in terms of its harmonics using Fourier series:

$$x_{sq.(i)}(t) = \sum_{k=1}^{\infty} \frac{2}{\pi k} A \sin(k\phi_i) \cos(k\omega_o t) \tag{2.6}$$

where $x_{sq.(i)}(t) = 1$ for $|t| \le \frac{\phi_i}{\omega_{sin}}$, $i = 1, ..., M$, $0 < \phi_i < \pi$.

Consequently, the M-level DAC output is given by the summation of the different square waves:

$$x_{DAC}(t) = \sum_{k=1}^{\infty} \left\{ \frac{2}{\pi k} \sum_{i=1}^{M} \left[ A \sin\left(k\phi_{i}\right) \right] \cos\left(k\omega_{o}t\right) \right\} \tag{2.7}$$

where *A* represents the step height in voltage-mode DAC, *k* is the harmonic index and  $\phi_i$  represents the phases of the different 1-bit signals.

To estimate the effect of the 1-bit DAC mismatches on the THD, a third-harmonic-free and fifth-harmonic-free digital sinusoidal signal similar to the one shown in Fig. 2.9 a) is generated using the technique proposed in [21]. The condition to generate such an input signal is

$$\sum_{i=1}^{M} \sin(k\phi_i) = 0 \text{ for } k = 3,5 \tag{2.8}$$

If the above condition is satisfied, the third and fifth harmonics will vanish according to the equation of  $x_{DAC}(t)$  derived above in (2.7).
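Condition (2.8) is easy to test numerically for a candidate set of phases. The sketch below evaluates the harmonic amplitudes of (2.7); the function names are hypothetical, and the two-phase set $\{13\pi/30, 7\pi/30\}$ is an illustrative choice that happens to satisfy (2.8), not the phase set generated by the technique of [21].

```python
import math


def harmonic_amplitude(phases, k, step=1.0):
    """Amplitude of the k-th harmonic of the summed 1-bit DAC outputs, eq. (2.7)."""
    return 2.0 * step / (math.pi * k) * sum(math.sin(k * p) for p in phases)


def is_harmonic_free(phases, harmonics=(3, 5), tol=1e-9):
    """True when condition (2.8) holds for every listed harmonic."""
    return all(abs(sum(math.sin(k * p) for p in phases)) < tol for k in harmonics)


# Illustrative phases (an assumption, not taken from [21]).
phases = [13 * math.pi / 30, 7 * math.pi / 30]
print(is_harmonic_free(phases), harmonic_amplitude(phases, 1))
```

The same check applied to a single phase of $\pi/3$ fails, because that choice cancels the third harmonic but leaves the fifth.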



Fig. 2.9 M-Level feedback DAC: a) DAC output signal. b) Thermometric DAC architecture.

Assuming amplitude mismatches $\Delta A_i$ in the different 1-bit DACs, the M-level DAC output, given in the ideal case by (2.7), becomes

$$x_{DAC}(t) = \sum_{k=1}^{\infty} \left\{ \frac{2}{\pi k} \sum_{i=1}^{M} \left[ \left( A + \Delta A_i \right) \sin\left( k\phi_i \right) \right] \cos\left( k\omega_o t \right) \right\} \tag{2.9}$$

Thus, its third and fifth harmonic are no longer zero. The harmonic distortion of  $x_{DAC}(t)$  becomes:

$$HD_{k} = \frac{1}{k} \frac{\sum_{i=1}^{M} \left[ \left( 1 + \Delta A_{i} / A \right) \sin\left( k \phi_{i} \right) \right]}{\sum_{i=1}^{M} \left[ \left( 1 + \Delta A_{i} / A \right) \sin\left( \phi_{i} \right) \right]} \tag{2.10}$$

Since $\sum_{i=1}^{M} \sin(k\phi_i) = 0$ for $k = 3,5$ and $\Delta A_i / A \ll 1$, the expression can be simplified to

$$HD_k \cong \frac{1}{Ak \sum_{i=1}^{M} \sin(\phi_i)} \sum_{i=1}^{M} \left[ \Delta A_i \sin(k\phi_i) \right], \quad k = 3,5 \tag{2.11}$$

Assuming that the amplitude mismatches $\Delta A_i$ are independent and Gaussian with zero mean and standard deviation $\sigma_{\Delta A_i}$, the distribution of $HD_k$ will also be Gaussian with zero mean and standard deviation $\sigma_{HD_k}$, as it is a weighted sum of Gaussian random variables:

$$\sigma_{HD_k} = \frac{\sigma_{\Delta A_i}}{Ak \sum_{i=1}^{M} \sin(\phi_i)} \sqrt{\sum_{i=1}^{M} \sin^2(k\phi_i)} \tag{2.12}$$

To guarantee $HD_k < B$ with a 95% yield, $\sigma_{HD_k}$ should be less than $B/2$ (the $\pm 2\sigma$ interval of a Gaussian distribution covers about 95% of the area under the curve). Thus, the standard deviation of the mismatch must satisfy:

$$\sigma_{\Delta A_i} < \frac{B}{2} Ak \frac{\sum_{i=1}^{M} \sin(\phi_i)}{\sqrt{\sum_{i=1}^{M} \sin^2(k\phi_i)}} \tag{2.13}$$

Consequently, once the system level simulations of the $\Sigma\Delta$ modulator are performed and the maximum distortion of the DAC, *B*, and the time-quantization step in the TDC, $T_Q$, which corresponds to *A* in voltage-mode DACs, are specified, both can be plugged into (2.13) to obtain the maximum tolerable mismatch between the different time-steps, which corresponds to the DNL of the TDC.
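Equations (2.12) and (2.13) translate into a few lines of code once a phase set is chosen. The sketch below is a hedged illustration: the function names are hypothetical, the two phases are an arbitrary set satisfying (2.8) rather than the 50-cell phase set of the actual design, and the 80 ps step and 65 dB distortion target are the values quoted in Section 2.3. It therefore shows the mechanics of the bound and is not expected to reproduce the mismatch figures reported there.

```python
import math


def _sums(phases, k):
    """Fundamental weight and RMS-style harmonic weight used in (2.12)-(2.13)."""
    fund = sum(math.sin(p) for p in phases)
    spread = math.sqrt(sum(math.sin(k * p) ** 2 for p in phases))
    return fund, spread


def sigma_hd(phases, k, sigma_rel):
    """Std. dev. of HD_k per (2.12); sigma_rel is sigma_dA / A."""
    fund, spread = _sums(phases, k)
    return sigma_rel * spread / (k * fund)


def max_mismatch(phases, k, hd_limit, step):
    """Largest sigma_dA (unit of step) keeping HD_k < hd_limit at 2 sigma, (2.13)."""
    fund, spread = _sums(phases, k)
    return 0.5 * hd_limit * step * k * fund / spread


phases = [13 * math.pi / 30, 7 * math.pi / 30]   # illustrative assumption
b = 10.0 ** (-65.0 / 20.0)                       # 65 dB distortion target
sigma_t = max_mismatch(phases, 3, b, 80e-12)     # tolerable step mismatch in seconds
```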

#### 2.3 System Level Design

The design of a TDC for the time-mode-based $\Sigma\Delta$ ADC requires system level simulations of the $\Sigma\Delta$ modulator in order to extract the TDC specifications. System level simulations of the $\Sigma\Delta$ modulator were performed targeting 10+ bits resolution over a bandwidth of 100kHz to 20MHz [20]. An OSR of 6.25 is used, which corresponds to a sampling rate of 250MHz (sampling period $T_s$=4nsec). The signal-to-noise ratio of the TDC and its digital output buffer should be better than 72dB over the signal bandwidth. The TDC is required to have 50 quantization steps, which corresponds to a step size of 80ps. Since $T_s$ is 4nsec, the standard deviation of the jitter of the time-quantized feedback pulse width, $\sigma_{j\_pw}$, must be kept below 266fs for SJR > 72dB according to equation (2.5). Finally, targeting 65dB THD assuming a -3dBFs input implies a standard deviation of the delay of the different TDC steps of around 480fs according to equation (2.13), with $\pm 40$ fs variations depending on the choice of the harmonic-free digital input signal (the choice of $\phi_i$). However, system level simulations of the $\Sigma\Delta$ modulator along with the TDC model indicate that the mismatch specification can be relaxed to around 800fs for a -3dBFs input.

The TDC in the $\Sigma\Delta$ modulator performs two main functions. First, it identifies the position of the rising and falling edges of the PWM signal with respect to the falling edge of the PWM reference clock. Second, it emulates the multi-bit DAC operation by providing a time-quantized feedback pulse $p_q(t)$. Fig. 2.8 shows the timing diagram of a 16-level TDC-based $\Sigma\Delta$ modulator. The TDC delivers the code representing the position of the rising and falling edges of the PWM signal, 4 and 12 on the diagram, and provides a time-quantized feedback pulse $p_q(t)$ whose edges are aligned to the time-quantization steps as shown in Fig. 2.8 b). Some modifications are required for the TDC to generate the code with respect to the falling edge of the reference clock and to generate $p_q(t)$.

Fig. 2.10 shows the block diagram of the output code generation part of the TDC. A 50-cell inverter-based delay line is employed whose first half captures the rising edge of the input pulse while the second half captures its falling counterpart. The input signal, p(t), is applied to the inputs of the flip flops while the falling edge of the clock is applied to the delay line. As the clock propagates through the digital delay line it triggers the flip flops sequentially in 80ps steps to capture the input pulse in a thermometric fashion. To maintain the symmetry of the design, p(t) is inverted before being applied to the flip flops of the second half of the TDC such that the second set of flip flops captures the rising edge of $\overline{p(t)}$. A thermometer-to-binary converter is used to encode the thermometric output before it is further processed. Since the timing-precision of the feedback pulse, $p_q(t)$, is critical, the feedback-pulse-generation block must fulfill very demanding specifications. First, the delay of the feedback pulse should be minimized to have minimum effect on the excess loop delay of the $\Sigma\Delta$ loop. Second, the delay must be constant to avoid data-dependent delay which distorts the signal. Finally, the jitter of the feedback pulse should be minimized to fulfill the targeted SNR of the system since the jitter is reflected as noise in the ADC output spectrum.

Since the flip flops in Fig. 2.10 are clocked through the delay elements, their outputs are aligned to the delay element outputs, which are 80ps apart. The rising and falling edges of the feedback pulse can be generated by applying the *OR* operator to the first and second sets of 25 flip flop outputs ($C_1$ through $C_{25}$ and $C_{26}$ through $C_{50}$), respectively. In this case the feedback pulse edges are aligned to the delay-element-output edges with a delay of one flip flop and one *OR* gate. To ensure uniform delay from all inputs ($C_1$ through $C_{25}$ and $C_{26}$ through $C_{50}$) to the output of the *OR* gate, the wired-NOR structure shown in Fig. 2.11 is used to generate the quantized signal $p_q(t)$.
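The capture and feedback-pulse generation described above can be mimicked with a behavioural model. This is a minimal sketch under stated assumptions, not the transistor-level design: flip flop $i$ is assumed to sample exactly $i \times 80$ ps after the clock edge, flip-flop and gate delays are ignored, the rising edge of p(t) is assumed to fall in the first half of the period and the falling edge in the second, and all function names are hypothetical.

```python
def tdc_capture(t_rise_ps, t_fall_ps, n_cells=50, step_ps=80):
    """Return the (rise_code, fall_code) pair for one input pulse p(t)."""
    half = n_cells // 2

    def p(t):
        # Input pulse: high between its rising and falling edges.
        return t_rise_ps <= t < t_fall_ps

    # First half samples p(t); second half samples the inverted pulse.
    c = [p(i * step_ps) if i <= half else not p(i * step_ps)
         for i in range(1, n_cells + 1)]
    # The OR of each half goes high when its first flip flop output goes high.
    rise_code = next((i for i in range(1, half + 1) if c[i - 1]), None)
    fall_code = next((i for i in range(half + 1, n_cells + 1) if c[i - 1]), None)
    return rise_code, fall_code


def feedback_pulse(rise_code, fall_code, step_ps=80):
    """Edges of p_q(t) in ps, aligned to the delay-line taps."""
    return rise_code * step_ps, fall_code * step_ps


codes = tdc_capture(1000, 3000)    # pulse from 1.0 ns to 3.0 ns
edges = feedback_pulse(*codes)     # quantized edges of p_q(t)
```

In this example the edges snap to taps 13 and 38, so the 2 ns input pulse yields a feedback pulse whose edges lie on the 80 ps grid.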

Many techniques can be used to implement the 25-input *OR* operation in a wired structure. The simplest one is to use a single *NOR* gate with 25 inputs as shown in Fig. 2.12 a). Other techniques are to use a 2-level or 3-level *OR* operation as shown in Fig. 2.12 b) and c), respectively. The 2-level *OR* is implemented using a *NOR-NAND* structure while the 3-level *OR* is implemented using a *NAND-NOR-NAND* structure. Three factors should be considered when choosing the optimum *OR* implementation: delay, jitter and power consumption. For a fair comparison of these factors, all transistors are assumed to have the same current driving capability (charging/discharging currents are the same for all of them).



Fig. 2.10 Block diagram of the output code generation of the TDC (left side captures p(t) falling edge while the right side captures the rising edge).

The delay of a digital circuit is roughly estimated by $(V_{dd}/I)\,C_{load}$, where $I$ is the current driving capability of the transistor. Since the design in Fig. 2.12 b) consists of two levels and the design in Fig. 2.12 c) consists of three levels, the corresponding delays of the three designs are approximately estimated to be:

$$t_{d1} = \frac{V_{dd}}{I} \left( 25C_{gdn} + C_{gdp} + C_{gg} \right) \approx \frac{V_{dd}}{I} C_{gd} \left( 26 + \frac{C_{gg}}{C_{gd}} \right)$$

$$t_{d2} = \frac{V_{dd}}{I} \left( 5C_{gd(n/p)} + C_{gd(p/n)} + C_{gg} \right) \times 2 \approx \frac{V_{dd}}{I} C_{gd} \left( 12 + 2\,\frac{C_{gg}}{C_{gd}} \right)$$

$$t_{d3} = \frac{V_{dd}}{I} \left( 3C_{gd(n/p)} + C_{gd(p/n)} + C_{gg} \right) \times 3 \approx \frac{V_{dd}}{I} C_{gd} \left( 12 + 3\,\frac{C_{gg}}{C_{gd}} \right) \tag{2.14}$$

where  $C_{gd}$  is the drain capacitance of the input and load transistors (assuming  $C_{gdp} = C_{gdn}$ ) and  $C_{gg}$  is the gate capacitance of the next stage. It can be observed that  $t_{d2}$  is always less than  $t_{d3}$  and is less than  $t_{d1}$  as long as  $C_{gg} < 14C_{gd}$ . From the power consumption point of view, the second design consumes minimum power as it achieves the least time delay and *I* is the same for the three designs.
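The comparison that follows from (2.14) can be checked with a few lines of code. The sketch below expresses the three delays in units of $V_{dd}C_{gd}/I$ as a function of the ratio $r = C_{gg}/C_{gd}$; the function name is hypothetical.

```python
def or_delays(r):
    """Relative delays of the 1-, 2- and 3-level OR for r = C_gg / C_gd, eq. (2.14)."""
    return 26 + r, 12 + 2 * r, 12 + 3 * r


for r in (1, 5, 14, 20):
    t1, t2, t3 = or_delays(r)
    print(r, t1, t2, t3, "2-level fastest" if t2 < min(t1, t3) else "")
```

The 2-level delay stays below the 3-level one for any positive ratio and matches the 1-level delay exactly at $r = 14$, in line with the condition stated above.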



Fig. 2.11 Uniform-delay wired-*NOR* architecture for the generation of the feedback pulse,  $p_q(t)$ .

For the jitter analysis we assume that all transistors approximately have equal voltage-noise with standard deviation  $\sigma_n$ . The timing jitter is related to the voltage noise through the slew rate as follows:

$$J = \frac{v_n}{SR} = t_d \left(\frac{v_n}{\Delta V}\right)$$
(2.15)

Since the jitter is proportional to the delay, it is proportional to the load capacitance as follows:

$$J_{1} \propto \left(26 + \frac{C_{gg}}{C_{gd}}\right), \quad J_{2} \propto \sqrt{2}\left(6 + \frac{C_{gg}}{C_{gd}}\right), \quad J_{3} \propto \sqrt{3}\left(4 + \frac{C_{gg}}{C_{gd}}\right)$$
(2.16)

From the above equations, it is clear that the jitter of the three designs depends on the ratio between the gate capacitance of the next stage,  $C_{gg}$ , and the drain capacitance,  $C_{gd}$ .
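To illustrate how the ranking in (2.16) changes with the capacitance ratio, the sketch below evaluates the three proportionality terms. It assumes only the expressions in (2.16); the function name and the sample ratios are hypothetical.

```python
import math


def relative_jitter(r):
    """Relative jitter terms of (2.16) for the ratio r = Cgg/Cgd."""
    j1 = 26 + r                      # 1-level OR gate
    j2 = math.sqrt(2) * (6 + r)      # 2-level OR gate, two uncorrelated stages
    j3 = math.sqrt(3) * (4 + r)      # 3-level OR gate, three uncorrelated stages
    return j1, j2, j3


# Sample ratios chosen only for illustration.
for r in (1, 5, 10):
    j1, j2, j3 = relative_jitter(r)
    print(f"Cgg/Cgd = {r:>2}: J1 ~ {j1:.2f}, J2 ~ {j2:.2f}, J3 ~ {j3:.2f}")
```

Under these expressions alone, the 2-level and 3-level terms cross at a ratio of $2\sqrt{6} \approx 4.9$, which is why the ratio has to be known before ranking the designs.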

Since the aforementioned analyses are approximate, transistor level simulations were performed to confirm that the quantitative analysis matches the trend of simulations. Fig. 2.13 shows the relative delay, jitter and power consumption of the three designs versus transistor width. It is clear that the 2-level architecture provides the best performance.



Fig. 2.12 Block diagram of the different architectures for implementing the 25-input *OR* gate of Fig. 6. a) 1-level *OR* gate. b) 2-level *OR* gate. c) 3-level *OR* gate.

## 2.4 Transistor Level Implementation

Fig. 2.14 shows the block diagram of the TDC with the feedback pulse generator. As shown in the figure, the TDC can be conceptually split into two main parts: one that generates the output codes corresponding to the rising and falling edges of the input signal, and a second that generates the time-quantized rising and falling edges of the feedback pulse. The TDC block consists of six sub-blocks: delay cells (D), flip-flop (FF), 2-level OR gate, SR latch and reset unit (R). In addition, a phase detector (PD) is used to adjust the total delay of the line to be 4ns. The main design specifications are the timing jitter of the feedback pulse width and the timing mismatches of the time-quantized steps. In the following sections, the implementation of the different blocks is addressed such that the mismatch and jitter specifications are fulfilled.

## 2.4.1 Delay Cell

The bottleneck of the TDC is its jitter and mismatch performance. The jitter includes the jitter of the delay line (50 delay cells), 2-level *OR* gate, *SR* latch, feedback buffer and supply noise. On the other hand, the mismatch of the different delay elements is mapped as a distortion in the output spectrum.

As stated in section 2.3, 800fs of timing-mismatch/DNL of the delay cell and the *OR* gate is required to attain THD better than 65dB. This mismatch includes process variations, systematic design mismatch, periodic noise from the supply and metastability or data dependent delay. For critical sections, the transistor length is set to double the minimum length of the process to reduce the effect of PVT variations. Monte Carlo simulations were used to provide a lower limit on the width such that the variation in the unit time-step is less than the targeted DNL.



Fig. 2.13 Relative jitter of the different OR gate architectures versus transistor's width.

Transistor noise can be decreased by increasing the width of the transistor. Doubling the transistor width yields a 3dB improvement in the noise as well as in the timing jitter. Unfortunately, the power consumption also doubles due to the increased capacitance in the circuit. In addition, the *SR* does not improve significantly, as both the driving capability of the transistor and the capacitive loading double. Moreover, increasing the power consumption increases the supply noise and affects the timing precision of the neighboring circuits, ending up with a degraded overall performance.



Fig. 2.14 Block diagram of the TDC with the feedback pulse generator  $(p_q(t))$ .

To overcome these drawbacks, innovative circuit techniques are employed to maintain the jitter within the required specifications. Assuming an *N*-inverters delay line and assuming uncorrelated noise sources, the total jitter of the line is given by:

$$J = \sqrt{N} \frac{v_n}{SR}$$
(2.17)

By increasing the *SR* by a factor *m* the number of required inverters to produce the same total delay will roughly increase by the same factor leading to:

$$J = \sqrt{N} \frac{v_n}{SR} \Longrightarrow J_{new} = \sqrt{mN} \frac{v_n}{m \cdot SR} = \frac{1}{\sqrt{m}} \left(\sqrt{N} \frac{v_n}{SR}\right)$$
(2.18)

As a rule of thumb, increasing the SR of the transitions by a factor m improves the jitter per unit delay by a factor  $\sqrt{m}$ . For example, having 8 transitions within 80ps provides about 30% less jitter compared to having only 4 transitions. To increase the SR without doubling the width of the transistor, dynamic logic techniques are used. A dynamic-logic-based delay cell is shown in Fig. 2.15 a). The main advantage of dynamic logic is that the driver transistor drives one load transistor instead of two. Theoretically, the slew rate is improved by a factor of two and the number of stages per unit delay is doubled compared to the static CMOS delay line. Although the number of stages is doubled, the power consumption increases only slightly because the pre-charging/pre-discharging transistors are designed using minimum length. For example, if  $L=2L_{min}$  is used for matching purposes in the static CMOS delay line, the same length can be used in the signal path of the dynamic-logic-based cell while the pre-charging/pre-discharging transistors are designed using  $L_{min}$ . Thus, their width is also divided by two without losing their current driving capability. Consequently, the total capacitance of the delay line, as well as the power consumption, increases by only 25% per unit delay compared to the static CMOS delay line.
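The rule of thumb following (2.18) can be reproduced with a short calculation. This is a minimal sketch assuming uncorrelated, equal noise per stage as in (2.17); the function names and the unit values of noise and slew rate are illustrative only.

```python
import math


def line_jitter(n_stages, v_n, slew_rate):
    """Total jitter of an n-stage delay line with uncorrelated noise, per (2.17)."""
    return math.sqrt(n_stages) * v_n / slew_rate


def jitter_ratio(m, n_stages=4):
    """J_new / J when the slew rate and the stage count both scale by m, per (2.18)."""
    # Unit noise and unit slew rate are placeholders; they cancel in the ratio.
    baseline = line_jitter(n_stages, 1.0, 1.0)
    faster = line_jitter(m * n_stages, 1.0, m * 1.0)
    return faster / baseline


# 8 transitions versus 4 transitions in the same total delay (m = 2).
reduction = 1.0 - jitter_ratio(2)
print(f"jitter reduction for m = 2: {reduction:.1%}")
```

For m = 2 the ratio is $1/\sqrt{2}$, i.e. a reduction of roughly 29%, consistent with the approximately 30% quoted above.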



Fig. 2.15 Dynamic-logic-based delay cell (D) a) Transistor level implementation. b) Timing diagram.

## 2.4.2 Flip Flop

Fig. 2.16 a) shows the dynamic-logic implementation of the flip flop while Fig. 2.16 b) and c) show the timing diagrams for the cases of capturing and missing the data, respectively. The input signal, p(t), is normally low while the *CLK* is normally high. If the CLK falling edge leads p(t), as shown in Fig. 2.16 c), the output will not change. However, if the rising edge of p(t) leads the falling edge of the CLK, as shown in Fig. 2.16 b), the output node discharges and the flip flop captures the data. Static inverters,  $I_1$  and  $I_2$ , are designed to be unbalanced such that the transition of the output, Q, from low to high is enhanced.

Since the timing-precision of the feedback pulse is required to be better than 0.8ps, the metastability of the flip flop should be within that limit which is impractical. To overcome this problem, the output of the flip flop, that is triggered by  $CLK_n$ , is gated by another clock,  $\overline{CLK_{n+2}}$ , to leave time for the flip flop output to settle as shown in Fig. 2.14. In this case the flip flop output edges are still aligned to the outputs of the delay elements but the feedback pulse will be delayed by two time steps, 160ps, which adds to the total excess loop delay. To minimize the additional loop delay due to the *AND* stage, the *AND* gate is embedded in the two-levels *OR* gate.

## 2.4.3 2-Levels OR Gate

The two-level *OR* gate is implemented using a modified *NOR-NAND* structure that includes the *AND* operation gating the flip flop outputs, as shown in Fig. 2.17. Dynamic logic ensures uniform and minimum delay for the different inputs. Transistor  $M_1$  acts as an *AND* gate while  $M_2$  is driven by the flip flop output. Five such branches are connected in parallel to implement the 5-input *NOR* gate. The second-level *NAND* gate is implemented using five *PMOS* transistors in parallel to ensure uniform delay.



Fig. 2.16 Dynamic-logic-based flip flop (*FF*). a) Transistor level implementation. b) Timing diagram when the input is captured. c) Timing diagram when the input is missed.

Since the delay of the FF depends on the time difference between the rising edge of the FF input and the falling edge of the clock, under certain conditions the FF output will change from low to high while settling in more than 160ps, leading to a data dependent delay at the output of the OR gate. Simulation results indicate that this condition occurs when the time difference between the FF input and the clock edge is within 40fs. Consequently, the SNR due to metastability, assuming a maximum input signal of -5dBFs, is given by:

$$SNR = 20 \log \left( \frac{4ns}{2\sqrt{2} * 40 fs} \right) - 5 = 86 dB$$
 (2.19)

where 4ns is the full scale of the TDC.
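The arithmetic in (2.19) can be reproduced directly. The sketch below uses only the values quoted in the text; the variable names are illustrative.

```python
import math

FULL_SCALE_S = 4e-9          # full scale of the TDC
METASTABILITY_WINDOW_S = 40e-15  # input-to-clock window from simulation
INPUT_BACKOFF_DB = 5.0       # maximum input signal of -5dBFs

# Ratio of the full scale to 2*sqrt(2) times the window, as written in (2.19).
ratio = FULL_SCALE_S / (2 * math.sqrt(2) * METASTABILITY_WINDOW_S)
snr_db = 20 * math.log10(ratio) - INPUT_BACKOFF_DB
print(f"SNR due to metastability: {snr_db:.1f} dB")
```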

Monte Carlo simulations were used to provide the minimum limit on the transistor sizes such that DNL specifications are fulfilled. Simulations indicate that the variations in the time step due to transistor mismatch in the different paths, including the delay cell, are around 0.78ps, which fulfills the DNL specifications. A 1.2V supply was used in the design of the circuit.



Fig. 2.17 Dynamic-logic-based OR gate.

## 2.4.4 SR Latch

The *SR* latch provides the feedback pulse of the *ADC*. Fig. 2.18 shows the block diagram of the circuit. The *set* input is driven by the *OR* gate of the first half of the TDC, which detects the rising edge of the input signal, while the *reset* input is driven by the *OR* gate of the second half, which detects the falling edge. An edge detector detects the *set/reset* signals and generates a 150ps pulse that charges/discharges the high impedance output node to generate the feedback pulse. The output node is buffered using a static CMOS inverter to eliminate the leakage effect of the high impedance node.

## 2.4.5 Reset Unit

The reset cell provides the reset signals to the dynamic-logic circuits. The TDC is split into two halves, one that handles the rising edge of the input pulse and one that handles its falling counterpart. Employing the fact that the two halves do not operate at the same time, the reset signal can be applied to the second half while the first half is handling the rising edge and vice versa as shown in Fig. 2.19. To prevent large instantaneous supply currents, the reset signal is applied sequentially to the 25 cells of each branch incorporating a static minimum-size delay-line as shown in Fig. 2.14.



Fig. 2.18 SR latch for the feedback pulse generation.

## 2.4.6 Calibration Circuit

Since the TDC is a time-mode circuit, it requires calibration in order to ensure that the total delay of the delay line is 4ns regardless of PVT variations. The delay is controlled through a voltage regulator that adjusts the supply voltage of the delay line. Fig. 2.20 shows the block diagram of the phase detector that detects the phase difference between the input and output clocks of the TDC. The input and the output clocks of the delay line are passed through two flip flops in order to ensure a 50% duty cycle, then three NAND gates are used to implement an XOR function. The outputs of the first two NAND gates indicate whether the output clock is leading or lagging the input clock. The output of the XOR gate is used to discharge a high impedance node, Q, forcing it to toggle as long as the delay is not calibrated. The output is then divided by two in order to have a 50% duty cycle if the delay of the line is not 4ns. Simulations indicate that the accuracy of calibration is  $\pm 30$ ps, which, divided over the 50 delay cells, means that the gain error of each conversion level is less than  $\pm 600$ fs.



Fig. 2.19 Block diagram of the reset architecture of the TDC.



Fig. 2.20 Block diagram of the phase detector.

## 2.5 Jitter and Data Dependent Delay

The clock and delay line jitter are mapped as noise floor in the ADC output spectrum and are required to be limited to 266fs as stated in section 2.3. The worst case jitter occurs with a full scale input signal. In this case the rising edge is captured by the first stages of the TDC while the falling edge is captured by the very last stages. Consequently, the jitter of the 50 cells of the delay line affects the pulse width. On the other hand, the jitter of the *OR* gate, *SR* latch and the output buffer affects both rising and falling edges. Simulation results indicate that the integrated jitter of the 50-cell delay line is around 160fs while the *OR* gate, *SR* latch and feedback buffer contribute 90fs. Hence, the worst case RMS jitter is given by:

$$J_{tot} = \sqrt{J_{Delay\_line}^2 + \left(\sqrt{2} * J_{OR\_SR\_buf}\right)^2} = \sqrt{160^2 + \left(90\sqrt{2}\right)^2} = 205 fs$$
(2.20)

This timing jitter meets the system specs and leaves a margin for supply-induced jitter and noise injected from the substrate.
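The jitter budget of (2.20) can be verified with a root-sum-square calculation. This minimal sketch uses the simulated values quoted above and the 266fs limit from section 2.3; the variable names are illustrative.

```python
import math

J_DELAY_LINE_FS = 160.0   # integrated jitter of the 50-cell delay line
J_OR_SR_BUF_FS = 90.0     # OR gate, SR latch and feedback buffer
J_SPEC_FS = 266.0         # jitter limit stated in section 2.3

# The OR/SR/buffer path contributes on both the rising and the falling edge,
# hence the sqrt(2) factor on that term.
j_total_fs = math.sqrt(J_DELAY_LINE_FS ** 2 + (math.sqrt(2) * J_OR_SR_BUF_FS) ** 2)
margin_fs = math.sqrt(J_SPEC_FS ** 2 - j_total_fs ** 2)

print(f"worst case RMS jitter: {j_total_fs:.1f} fs")
print(f"remaining uncorrelated jitter budget: {margin_fs:.1f} fs")
```

The margin line assumes that supply-induced and substrate jitter add in a root-sum-square sense, which is an assumption of this sketch rather than a statement from the measurements.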

The second major issue in time-mode analog systems is the data-dependent delay, which is a variation in the time-quantized feedback pulse-width depending on the timing of the input signal. One cause of this phenomenon is the flip flop metastability. Another source of data-dependent delay is the use of dynamic logic gates. Feed-through from nearby switching circuits affects the high impedance nodes, causing their voltages to change. To overcome this problem, static CMOS circuits are required to buffer the critical high impedance nodes. One example of this problem occurs when the flip flop outputs toggle and the switching feeds through to the high impedance nodes of the delay elements, as shown in Fig. 2.21. When the flip flop output changes from low to high, the signal passes through the gate-source capacitance of  $M_2$  of the OR gate and then through the drain-gate capacitance of  $M_1$ . Since the gate of  $M_1$  is a high impedance node, the resulting change in the voltage on the capacitance connected to this node shifts its switching time by  $\Delta t_{cell}$  when the clock reaches this stage in the delay line. Consequently, if the rising edge of the input data happens at code  $C_{data}$ , the delay of the next (25- $C_{data}$ ) delay cells will change by:

$$\Delta t_{total} = (25 - C_{data}) \Delta t_{cell} \tag{2.21}$$

Consequently, the feedback pulse width will change by the same value. Since the timing error,  $\Delta t_{total}$ , is a function of  $C_{data}$ , it results in a data dependent error in the feedback pulse, which appears as harmonic distortion of the signal. To overcome this problem, a static CMOS inverter is inserted between the delay cell and the *OR* gate to isolate the high impedance node of the delay element from the feed-through of the flip flop transition. The additional inversion added by the inverter is compensated in the delay line.

The high impedance node at the output of the first level of the OR gate also induces data-dependent delay. To ensure uniform delay for all the inputs, the voltage at the output node should be  $V_{dd}$  before the node is discharged. Due to the feed-through of the gate-drain capacitance, the output voltage changes by  $\Delta V$  when the flip flop output, which drives the upper transistor of the OR gate, changes from low to high. Depending on the number of flip flops that catch the input data, the output voltage changes by at least  $\Delta V$  and at most  $5\Delta V$  before discharging. This change induces unequal delay for the different inputs of the OR gate, leading to harmonic distortion in the time-quantized feedback pulse. To prevent this effect, the flip flop outputs are gated using a static CMOS AND gate which is controlled by the delay line and spaced in time by one quantization step (80ps), as shown in Fig. 2.22. Consequently, the flip flop outputs are applied sequentially to the OR gate instead of being applied at the same time. Assume that flip flop number 3 is the first flip flop to catch the input signal; then the outputs of the first two flip flops will not toggle. When the third flip flop toggles, the output voltage drops by  $\Delta V$  and the clock of that branch turns on before the output of flip flop number four affects the OR gate output. Consequently, the output node of the OR gate always drops by one  $\Delta V$  before discharging, leading to a uniform delay for the different inputs.



Fig. 2.21 Effect of flip flop transition on the delay of the delay elements.



Fig. 2.22 Effect of flip flop transition on OR gate response.

## 2.6 Layout Considerations and Experimental Results

The TDC is implemented in TI 65nm CMOS technology with a sound layout that matches the interconnects and the boundary conditions for the different signals. In addition, N-well guard rings are used to reduce the effect of substrate noise. Since recent research shows that asymmetric metal filling on different transistors can cause mismatches up to 30% [22]-[23], metal filling is performed manually such that each cell and each signal see the same pattern of metal coverage. Fig. 2.23 shows the layout of the TDC while Fig. 2.24 shows the chip micrograph. The TDC occupies 0.006mm<sup>2</sup> with a gate count higher than 2.5K.

A low jitter 500MHz clock is generated using an on-board oscillator. In addition, a 4MHz sinusoidal signal with THD<-75dB is generated with the help of a high-Q passive LC filter that filters out the harmonics and the noise of the signal generator. The ADC output is captured using an Agilent 16950B logic analyzer. Fig. 2.25 shows the test setup of the ADC.

The output spectrum for a -6dBFs input signal is shown in Fig. 2.26. The THD of the signal is -66dB and the SNDR is 60dB which corresponds to an effective number of bits, ENOB, of 9.67. This THD is the upper limit of the harmonic distortion of the TDC as the THD includes the effect of the HD of the filter and the PWM generator.
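The quoted ENOB follows from the measured SNDR through the standard relation ENOB = (SNDR - 1.76)/6.02. A minimal sketch, where the function name is illustrative:

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB, using the standard
    full-scale sine-wave relation SNDR = 6.02*ENOB + 1.76."""
    return (sndr_db - 1.76) / 6.02


measured_sndr_db = 60.0  # SNDR reported for the -6dBFs input
print(f"ENOB = {enob_from_sndr(measured_sndr_db):.2f} bits")
```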



Fig. 2.23 Layout of the TDC.



Fig. 2.24 Chip micrograph.



Fig. 2.25 Test setup for the measurement of the ADC.



Fig. 2.26 Output spectrum for a -6dB input signal.

Fig. 2.27 shows the measured SNR and SNDR of the  $\Sigma\Delta$  modulator versus the input signal level for 20MHz bandwidth. The measured THD corresponds to a DNL better than 0.8ps including mismatch and systematic power supply noise. In addition, the dynamic range, defined as the range of SNR>0dB, is higher than 68dB since the noise floor is dominated by the other blocks of the ADC. Fig. 2.28 shows the different harmonic distortions,  $HD_k$ , versus the input level. It is clear that the significant increase of the THD is mainly due to  $HD_2$  and  $HD_3$ . Since the trend of  $HD_2$  and  $HD_3$  is monotonic, it is expected to be contributed from the filter whose linearity degrades with the increase of the input level. On the other hand,  $HD_4$  and  $HD_5$  are non-monotonic indicating that they are generated from the TDC (which operates as the quantizer and the DAC). Since  $HD_4$  and  $HD_5$  are consistently below -66dBc we conclude that the distortion performance is not limited by TDC.

The TDC consumes 4.2mA from a 1.33V supply at a conversion rate of 250MHz and provides an ENOB of 5.64 bits at the output and 9.67 bits at the feedback node. For the power-consumption figure-of-merit given by:

$$FOM_P = \frac{P}{f_{conv} \cdot 2^{ENOB}}$$
(2.22)

the modulator output and the feedback pulse shaper achieve 450fJ/conv-step and 27.8fJ/conv-step, respectively. For the area figure-of-merit defined as:

$$FOM_A = \frac{A}{f_{conv} \cdot 2^{ENOB}}$$
(2.23)

the modulator output and the feedback PWM shaper efficiencies are  $0.505 \text{nm}^2/\text{conv-step}$  and  $0.0309 \text{nm}^2/\text{conv-step}$ , respectively. Table 2.2 compares the measurement results with the state-of-the-art TDCs. It is observed that the proposed TDC incorporated in the  $\Sigma\Delta$  loop outperforms the existing state-of-the-art TDCs in DNL and event rate, along with  $FOM_A$  and  $FOM_P$ . However, the proposed TDC is designed to be incorporated in a time-mode based  $\Sigma\Delta$  ADC. Thus, it cannot be used in open loop architectures.
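The figures of merit in (2.22) and (2.23) can be recomputed from the reported numbers. The sketch below assumes the power (5.66mW) and area (0.0063mm<sup>2</sup>) values listed in Table 2.2; the function and variable names are illustrative.

```python
def figure_of_merit(quantity, f_conv_hz, enob_bits):
    """Generic FOM = quantity / (f_conv * 2**ENOB), as in (2.22) and (2.23)."""
    return quantity / (f_conv_hz * 2 ** enob_bits)


F_CONV_HZ = 250e6              # conversion rate
POWER_W = 5.66e-3              # power consumption listed in Table 2.2
AREA_NM2 = 0.0063 * 1e12       # 0.0063 mm^2 in nm^2 (1 mm^2 = 1e12 nm^2)
ENOB_OUTPUT = 5.64             # ENOB at the modulator output
ENOB_FEEDBACK = 9.67           # ENOB at the feedback node

fom_p_output_fj = figure_of_merit(POWER_W, F_CONV_HZ, ENOB_OUTPUT) * 1e15
fom_p_feedback_fj = figure_of_merit(POWER_W, F_CONV_HZ, ENOB_FEEDBACK) * 1e15
fom_a_output_nm2 = figure_of_merit(AREA_NM2, F_CONV_HZ, ENOB_OUTPUT)
fom_a_feedback_nm2 = figure_of_merit(AREA_NM2, F_CONV_HZ, ENOB_FEEDBACK)

print(f"FOM_P: {fom_p_output_fj:.0f} fJ/conv-step (output), "
      f"{fom_p_feedback_fj:.1f} fJ/conv-step (feedback)")
print(f"FOM_A: {fom_a_output_nm2:.3f} nm^2/conv-step (output), "
      f"{fom_a_feedback_nm2:.4f} nm^2/conv-step (feedback)")
```

With these inputs the results land within about 1% of the values reported in the text; the small difference at the output node comes from rounding.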



Fig. 2.27 SNR and SNDR of the ADC versus the input amplitude.



Fig. 2.28  $HD_k$  and the THD of the ADC versus the input amplitude.

|                                            | [24]        | [25]-[26]        | [27]            | [28]       | This work                         |
|--------------------------------------------|-------------|------------------|-----------------|------------|-----------------------------------|
| Topology                                   | Coarse-fine | 4x interpolation | Pulse-shrinking | Cyclic SAR | Inverter-chain                    |
| Technology                                 | 90nm        | 90nm             | -               | 0.35µm     | 65nm                              |
| ENOB                                       | 9 bits      | 7 bits           | 11 bits         | 13         | 9.67 @ feedback                   |
| LSB                                        | 1.25ps      | 4.7ps            | 50ps-1ns        | 1.22ps     | 80ps                              |
| Input range                                | 640ps       | 601ps            | >100ns          | 327µs*     | 4ns                               |
| Event rate (MHz)                           | 66          | 180              | 1               | 5          | 250                               |
| DNL                                        | 1ps         | 2.82ps           | 0.5LSB          | ~1.2ps     | Better than 0.8ps                 |
| Supply voltage                             | 1V          | 1.2V             | -               | 3.3V       | 1.33V                             |
| Power consumption                          | 3mW @ 10MHz | 3.6mW            | 10mW @ 100kHz   | 33mW       | 5.66mW @ 250MHz                   |
| Area (mm<sup>2</sup>)                      | 0.6000      | 0.0200           | -               | 4.4500     | 0.0063                            |
| FOM<sub>A</sub> (nm<sup>2</sup>/conv-step) | 17.755      | 0.868            | -               | 108.64     | 0.505 @ output, 0.0309 @ feedback |
| FOM<sub>P</sub> (fJ/conv-step)             | 88          | 156.25           | 49407           | 805.66     | 450 @ output, 27.8 @ feedback     |

Table 2.2 TDC measurement results compared with the state of the art TDCs.

\* Input range depends on the jitter of an external clock.

## 2.7 Summary

The first TDC in 65nm technology is proposed. The TDC is capable of providing sub-ps DNL (better than 800fs) while employing a simple inverter chain without calibration, which is considered the best in the literature. By combining the  $\Sigma\Delta$  architecture with the dynamic logic implementation, an event rate of 250MHz was achieved. Measurement results show that sub-ps timing precision in 65nm technology is possible without calibration. Migrating to newer technologies is expected to improve the jitter and to provide higher timing precision, which allows a higher dynamic range. The TDC shows competitive area and power consumption, which makes it suitable for mobile devices where long battery life and low cost are required.

#### CHAPTER III

# A LOW THD, LOW POWER, HIGH OUTPUT-SWING TIME-MODE-BASED OSCILLATOR VIA DIGITAL HARMONIC-CANCELLATION TECHNIQUE\*

### **3.1 Introduction**

High linearity (pure) sine wave oscillators play an important role in many applications like built-in-self-testing (BIST) and ADC characterization [29-32]. On-chip spectrum analyzers in BIST require a low total-harmonic-distortion (THD) signal to test the systems on chip [33-35]. On the other hand, characterizing the distortion of a 10 bit ADC requires a sine wave with THD < -68dB [36, 37], which is hard to achieve in a fully integrated system with low power, small area and large output swing. Moreover, the increased device nonlinearity and the reduced voltage headroom associated with technology scaling make it harder to attain the same performance with the same power consumption in newer technologies. All these challenges raise the need for a new technique that is low power, compatible with nanometric technologies and has competitive harmonic distortion.

<sup>\*</sup>Reprinted, with permission, from M. Elsayed, and E. Sánchez-Sinencio, "A Low THD, Low Power, High Output-Swing Time-Mode-Based Tunable Oscillator Via Digital Harmonic-Cancellation Technique" *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp.1061-1071, May 2010. © 2010 IEEE.

# **3.2 Background of Low THD Oscillators**

The basic idea of low THD BPF-based oscillators is to incorporate a band pass filter (BPF) along with a limiter (a comparator) in a positive feedback loop [38-39]. The oscillation frequency is set by the center frequency of the filter while the amplitude is set by the limiter.

Assuming an infinite-gain comparator, the THD is set directly by the quality factor of the loop filter and improves as Q increases. Fig. 3.1 shows the THD of the oscillator versus the quality factor of a  $2^{nd}$  order low-pass filter whose transfer function is given by:

$$H(s) = \frac{1}{1 + \frac{1}{Q}\frac{s}{\omega_o} + \frac{s^2}{\omega_o^2}}$$

Consequently, the  $HD_k$  of its output for a square wave input is given by:

$$HD_{k} = \frac{1}{k\sqrt{\left(1 - k^{2}\right)^{2}Q^{2} + k^{2}}}$$

Achieving a linearity better than 62dB requires a very high-Q filter (Q>70). Implementing such a high-Q filter requires a large op-amp gain-bandwidth product as well as a large spread of the capacitor values, and ends up with a larger silicon area [40].
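The dependence of the distortion on Q can be explored by evaluating the  $HD_k$  expression above. The following is a minimal sketch: it assumes that only odd harmonics of the square wave are present and truncates the sum at an arbitrary harmonic index, so the values are indicative of the trend only. The function names and the sample Q values are illustrative.

```python
import math


def hd(k, q):
    """Relative level of the k-th harmonic after the 2nd order filter,
    from the HD_k expression in the text."""
    return 1.0 / (k * math.sqrt((1 - k ** 2) ** 2 * q ** 2 + k ** 2))


def thd_db(q, k_max=99):
    """THD in dB from odd harmonics 3..k_max (the truncation is an assumption)."""
    power = sum(hd(k, q) ** 2 for k in range(3, k_max + 1, 2))
    return 10 * math.log10(power)


# Sample quality factors chosen only for illustration.
for q in (1, 10, 100):
    print(f"Q = {q:>3}: THD = {thd_db(q):.1f} dB")
```

For large Q the third harmonic dominates and scales as  $1/(24Q)$ , so each tenfold increase in Q buys roughly 20dB of THD.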

One way to improve the performance of the oscillator without using a high-Q filter is to employ a multi-level comparator in the feedback path instead of the conventional two-level one [41], as shown in Fig. 3.2 b). The key idea behind this technique is that the amplitudes of the harmonics at the comparator output are functions of the clamping levels and the thresholds of the comparator. Consequently, an optimum set of clamping levels and thresholds can be chosen such that the THD is improved. However, this approach is sensitive to process variations. The ratio between clamping levels should be  $1:\sqrt{2}$ , which is hard to implement accurately on silicon. Implementing such a ratio using capacitors limits the THD to -55dB assuming 1% capacitor matching [42].



Fig. 3.1 THD of the oscillator versus the quality factor of a second order filter.



Fig. 3.2 Block diagram of (a) Conventional BPF-based oscillator. (b) Multilevel-comparator-based oscillator.

One common problem in all the previously mentioned techniques is the trade-off between BPF linearity, output swing and power consumption. Targeting small THD along with high output swing ends up with a power-hungry filter. Moreover, the problem worsens when migrating from an older technology to a newer one. The lower supply voltage, the higher noise level and the increased device nonlinearities associated with newer technologies require more power to be spent in the filter to achieve the same performance that can be easily obtained in the older technology.

One technique that avoids the use of a BPF is presented in [34]. A sigma-delta loop is used to generate the required tone, and the loop suppresses the harmonics by pushing them out-of-band using a concept similar to noise shaping in ADCs. Fig. 3.3 shows the block diagram of the system. A digital LPF is used in the loop to overcome the nonlinearity effects introduced by its analog counterpart. Although this technique eliminates the need for the analog BPF, it still suffers from high power consumption, 36.5mW at 40MHz, and a large area overhead due to the use of a RAM.

# **3.3 Harmonic Cancellation Technique**

Harmonic cancellation is one of the linearization techniques that has been used for decades. It has been in use since the invention of differential amplifiers and differential signaling, in which two signals of 180° phase shift are subtracted to suppress the even harmonics [39]. Further work has been done in this area in push-push and triple-push oscillators, in which the fundamental frequency of the signal is cancelled to double/triple the oscillating frequency [43-44]. Harmonic cancellation was also used to suppress the third harmonic of a square wave signal to improve the THD of sinusoidal oscillators [45].



Fig. 3.3 Block diagram of the system.

The main drawback of the previous harmonic cancellation techniques is their focus on cancelling one harmonic only at a time without being able to suppress the different harmonics simultaneously. In this section a theory is established in order to be able to manipulate different harmonics of periodic signals at the same time. Contrary to the previous work, the theory is not limited to single tone periodic signals like an oscillator but can be extended to be applied to any periodic signal of any spectrum. For the simplicity of the analysis, the square wave signal was used as a case study while deriving the theory.

Fig. 3.4 shows the block diagram of the proposed technique along with the level of the harmonics associated with each node. The input to the system is a high frequency,  $Nf_o$ , square wave signal provided through an on-chip ring oscillator. The harmonic cancellation block manipulates the input signal and generates an output signal at the required frequency of oscillation,  $f_o$ , with significantly suppressed low-frequency harmonics. The high frequency harmonics are later suppressed using a passive RC filter, ending up with a low THD sinusoidal signal at the output. One point to note here is that the high frequency clock can be provided off-chip in order to operate at different frequencies. In addition, the quality of the high frequency clock, in both amplitude and frequency, affects the noise floor of the circuit without affecting the harmonic cancellation.

#### 3.3.1 Harmonic Cancellation Theory

The key idea of harmonic cancellation is to introduce a new degree of freedom in the system to separately control the levels of the different harmonics of the signal. One way to introduce such a degree of freedom is to add square wave signals of the same frequency but of different time-shifts. The symmetric square wave signal consists of an infinite number of harmonics whose amplitudes can be obtained through Fourier series expansion:

$$x(t)|_{sq} = \frac{4}{\pi} \sum_{k=1,3,5...}^{\infty} \frac{1}{k} \cos(k\omega_o t)$$
(3.1)

where k is the harmonic index,  $\omega_o$  is the fundamental frequency of the signal and a 50% duty cycle is assumed.



Fig. 3.4 Block diagram of the proposed harmonic-cancellation-based oscillator.

If a time-shifted version of the signal is assumed, the coefficients of the odd and the even parts of the harmonics will be functions of the time shift, as shown below:

$$x(t+\Delta t)\Big|_{sq} = \frac{4}{\pi} \sum_{k=1}^{\infty} \left[\frac{1}{k}\cos(k\omega_o t)\right] \cos(k\phi) - \frac{4}{\pi} \sum_{k=1}^{\infty} \left[\frac{1}{k}\sin(k\omega_o t)\right] \sin(k\phi)$$
(3.2)

where  $\phi = \omega_o \Delta t$  is the phase shift of the fundamental component of the signal.

In order to maintain the symmetry of the signal around the origin, two signals of opposite time shifts, ( $\Delta t \& -\Delta t$ ), are added.

$$x(t+\Delta t)\Big|_{sq} + x(t-\Delta t)\Big|_{sq} = \frac{4}{\pi} \sum_{k=1}^{\infty} \left[\frac{1}{k}\cos(k\omega_o t)\right] * \left[2\cos(k\phi)\right]$$
(3.3)

Consequently, by adding n signals of time shifts  $\Delta t_1 \dots \Delta t_n$  along with their negative counterparts,  $\sum_{i=1}^{n} \left[ x\left(t + \Delta t_i\right)\big|_{sq} + x\left(t - \Delta t_i\right)\big|_{sq} \right]$ , each harmonic will be multiplied by a coefficient given by:

$$\left[2\sum_{i=1}^{n}\cos\left(k\phi_{i}\right)\right] \text{ if } \phi_{i} \neq 0 \text{ for all } i, \text{ or } \left[1+2\sum_{i=2}^{n}\cos\left(k\phi_{i}\right)\right] \text{ if } \phi_{1}=0, \quad \phi_{i}=\omega_{o}\Delta t_{i}$$
(3.4)

By choosing  $\phi_i$  such that the coefficients are very small, the harmonics will be significantly suppressed. For example, adding three signals of phases  $\left(0, \pm\frac{2\pi}{9}\right)$  eliminates the third harmonic of the output signal. If four signals of phases  $\left(\pm\frac{2\pi}{15}, \pm\frac{7\pi}{15}\right)$  are added, the third and fifth harmonics of the signal vanish, as shown in (3.5):

$$\cos\left(3\omega_{o}t\right)\left(2\cos\left(3*\frac{2\pi}{15}\right)+2\cos\left(3*\frac{7\pi}{15}\right)\right)=0$$

$$\cos\left(5\omega_{o}t\right)\left(2\cos\left(5*\frac{2\pi}{15}\right)+2\cos\left(5*\frac{7\pi}{15}\right)\right)=0$$
(3.5)
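The two examples above can be checked numerically. The following is a minimal sketch that evaluates the coefficient in (3.4); the function name `harmonic_coefficient` and the convention of listing only the non-negative phases are assumptions made for illustration.

```python
import math

def harmonic_coefficient(k, phases):
    """Coefficient multiplying harmonic k, following (3.4).

    `phases` lists the non-negative phase shifts phi_i in radians. Each
    non-zero phase is assumed to come with its negative counterpart, while
    a zero phase contributes a single unshifted signal.
    """
    total = 0.0
    for phi in phases:
        if phi == 0.0:
            total += 1.0                      # unshifted signal, added once
        else:
            total += 2.0 * math.cos(k * phi)  # +phi and -phi pair
    return total

three_signals = [0.0, 2 * math.pi / 9]              # phases (0, +/-2*pi/9)
four_signals = [2 * math.pi / 15, 7 * math.pi / 15]  # phases (+/-2*pi/15, +/-7*pi/15)

for name, phases in (("three signals", three_signals), ("four signals", four_signals)):
    for k in (1, 3, 5, 7):
        print(f"{name}: k={k}, coefficient={harmonic_coefficient(k, phases):+.4f}")
```

The printed coefficients for k=3 (both sets) and k=5 (four-signal set) are zero to within floating-point rounding, while the fundamental is scaled rather than cancelled.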

The fundamental component of the output will be scaled by the coefficient given by (3.4), with k=1, and will be slightly attenuated by the LPF. The main advantage of this technique is that it eliminates the low-frequency harmonics without using an analog active filter. Consequently, the use of a BPF can be avoided, and a simple passive, inherently linear, RC-LPF can be used to eliminate the high-frequency harmonics. As a result, a significant amount of power is saved and the output swing is no longer limited by the linearity of the analog filter. Moreover, the theory implies that the harmonics of any periodic signal x(t), of any shape and not necessarily a square wave, can be eliminated as long as time-shifted versions of the signal are available. Consequently, a harmonic-truncated square wave can be used instead of a sharp-transition square wave, as it consumes less power. The technique also works for other periodic signals such as sawtooth or triangular waves.
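The claim that the cancellation does not depend on the waveform shape can be illustrated numerically. The sketch below sums time-shifted copies of a sampled triangular wave and compares its harmonics before and after the summation. The sample count `M` and the helper names are assumptions chosen for illustration, and the phases are the four-signal set from (3.5).

```python
import cmath
import math

M = 600  # samples per period (assumed; chosen so both shifts are whole samples)

def triangle(n):
    """One period of a zero-mean triangular wave sampled at M points."""
    return 1.0 - 4.0 * abs((n % M) / M - 0.5)

def harmonic(signal, k):
    """Complex Fourier coefficient of harmonic k of one sampled period."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / M)
               for n, v in enumerate(signal)) / M

# Phases 2*pi/15 and 7*pi/15 expressed as a number of samples.
shifts = [M * 2 // 30, M * 7 // 30]

original = [triangle(n) for n in range(M)]
summed = [sum(triangle(n + s) + triangle(n - s) for s in shifts) for n in range(M)]

for k in (1, 3, 5, 7):
    before = abs(harmonic(original, k)) / abs(harmonic(original, 1))
    after = abs(harmonic(summed, k)) / abs(harmonic(summed, 1))
    print(f"k={k}: relative amplitude before={before:.2e}, after={after:.2e}")
```

The third and fifth harmonics of the summed waveform drop to the rounding floor even though the input is not a square wave, while harmonics that the chosen phases do not target are only rescaled.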

# 3.3.2 System Level Design

To attain a very low THD, the ratio of the power of the fundamental component of the signal to the power of the harmonics should be maximized. This ratio is composed of three different parts. The first part is the natural difference between the harmonics' power and the fundamental power of a square wave signal as given by (3.1). The second and third contributors are the harmonic cancellation block and the passive LPF. Consequently, by setting a targeted THD and assuming a certain transfer function for the filter (3<sup>rd</sup> order in the proposed prototype), the required cancellation from the harmonic cancellation block can be calculated. Fig. 3.5 shows the contribution of the three attenuation components and their summation (the fourth bar) at the output of the proposed oscillator for each harmonic.

The harmonic cancellation block plays an important role in suppressing the near harmonics, contributing 75% of the 3<sup>rd</sup> harmonic attenuation, while the linear filter suppresses the far ones, contributing 88% of the 11<sup>th</sup> harmonic rejection. A THD better than -70dB is targeted in the calculations.

Fig. 3.6 shows the block diagram of the digital circuit that generates the required time-shifted signals for harmonic cancellation. A digital divider is used to divide the high-frequency square wave,  $Nf_{osc}$ , to obtain an output frequency of  $f_{osc}$ . The divider consists of a counter and control logic that resets the counter every N clock cycles. The output bits are fed to digital logic circuits that generate square wave signals with the required time shifts. Each time shift is a multiple of the period of the master clock due to the digital nature of the block.



Fig. 3.5 Harmonics attenuation contributed by the different attenuation components of the oscillator system (Relative to the fundamental).



Fig. 3.6 Block diagram of the harmonic cancellation block.

The performance of the harmonic cancellation block depends on two design variables: the division ratio between the counter clock frequency and the output sinusoidal frequency,  $N = \frac{f_{master\_clock}}{f_o}$ , and the number of added shifted signals,  $n_{\phi}$ , together with their time-shift values,  $\Delta t_i$ . Since there are N possible shift steps within one period of the output signal and  $n_{\phi}$  signals to be added, the search space contains  $N^{n_{\phi}}$  points. Small values for N and  $n_{\phi}$  are chosen at the beginning, then a search for a point in the space that satisfies the THD requirements is conducted. If no point is found, N and  $n_{\phi}$  are increased to enlarge the search space and, as a consequence, increase the probability of finding a point that satisfies the THD requirement. Either N or  $n_{\phi}$  or both can be increased to enlarge the search space; however, there is a trade-off between them. As N increases, the frequency of the master clock increases and, consequently, so does the power consumption of the system. On the other hand, increasing  $n_{\phi}$  increases the complexity and the layout area. Fig. 3.7 shows a flowchart of the design procedure and the algorithm used to obtain N,  $n_{\phi}$  and the phases  $\phi_1...\phi_{n_{\phi}}$ . The algorithm scans the space ( $N^{n_{\phi}}$  points) to find the optimum point that provides the lowest THD. In the current design the output frequency is chosen to be 10MHz. A division ratio, N, of 116 and time shifts of 2, 7, 12 and 19 steps were found to provide the best near-harmonic suppression. Selecting  $N=2^n$  may facilitate the design, but no time shifts were found that provide competitive THD. Employing a third-order passive RC-LPF ensures a THD better than -70dB.
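The search described above can be sketched as an exhaustive scan. This is not the algorithm of Fig. 3.7 itself: the cost function `residual`, the list of harmonics it considers, the restriction to distinct shifts in the first quarter period, and the omission of the LPF are all assumptions made to keep the example short. Only N=116 and the shifts 2, 7, 12, 19 are taken from the text.

```python
import itertools
import math

def coefficient(k, shifts, N):
    """Coefficient of harmonic k from (3.4) for shifts given in master-clock steps."""
    return sum(2.0 * math.cos(2.0 * math.pi * k * s / N) for s in shifts)

def residual(shifts, N, harmonics=(3, 5, 7, 9)):
    """Root-sum-square level of the listed harmonics relative to the fundamental.

    Each harmonic carries the 1/k weight of the square wave in (3.1); the
    LPF contribution is deliberately left out of this sketch.
    """
    fundamental = coefficient(1, shifts, N)
    if abs(fundamental) < 1e-12:
        return math.inf
    power = sum((coefficient(k, shifts, N) / (k * fundamental)) ** 2 for k in harmonics)
    return math.sqrt(power)

def search(N, n_phi, harmonics=(3, 5, 7, 9)):
    """Exhaustively scan distinct shifts within the first quarter period."""
    candidates = range(1, N // 4 + 1)
    return min(itertools.combinations(candidates, n_phi),
               key=lambda shifts: residual(shifts, N, harmonics))

reported = (2, 7, 12, 19)  # time shifts reported in the text for N = 116
hd3 = abs(coefficient(3, reported, 116) / (3 * coefficient(1, reported, 116)))
print(f"reported point: HD3 before the LPF = {20 * math.log10(hd3):.1f} dB")

best = search(116, 4)
print("best point under this sketch's cost function:", best,
      f"residual={residual(best, 116):.2e}")
```

Because the cost function here differs from the THD criterion used in the design, the point it returns need not coincide with the reported shifts; the sketch only shows how the space can be scanned and how the reported point can be evaluated.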



Fig. 3.7 Flow chart of the design procedure and the search algorithm for finding the appropriate time shifts.

Although the power consumed to generate the master clock increases as the master-clock frequency increases, it is still less than the power needed to implement the linear active BPF of BPF-based oscillators in deep-submicron technologies. In addition, the power consumed in smaller technologies will be lower as both the supply voltage and the node capacitances decrease, since the power in digital circuits is proportional to  $CV_{dd}^2$ .

# **3.4 Circuit Implementation**

One of the main advantages of the proposed system is its digital-friendly nature, which makes it robust to process variations and allows easy migration to smaller technologies. Fig. 3.8 shows a detailed block diagram of the system.



Fig. 3.8 Detailed block diagram of the oscillator.

A static CMOS inverter-based ring oscillator is used to generate the master clock running at  $Nf_o$ , which is N times the output frequency. The high-frequency clock triggers a counter that divides the clock frequency by N and generates output bits that cover the range from 0 to N. The output bits are then fed to pairs of digital control circuits (digital time-shifter pairs) that generate the rising and falling edges of the shifted signals at the specified time instants. The outputs of each pair drive an S-R flip-flop whose output is the shifted signal. In order to eliminate the effect of the non-uniform delay of the different paths, a transmission gate is used to align all the signals to the falling edge of the master clock. To avoid metastability problems, the counter and the FF are both designed to operate on the positive edge of the clock. Consequently, the delay of the counter, the control circuit and the setup time of the FF can be as long as one complete clock cycle without affecting the performance of the system. In addition, as the transmission gates are activated by the negative edge of the clock, the FF will have half a clock cycle to settle.
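The counter, time-shifter pair and S-R flip-flop chain can be modelled behaviorally. The sketch below is an idealized, delay-free model written for illustration only; the function name, the modulo-N counter convention and the warm-up period are assumptions, and the transmission gates and inverters are not modelled.

```python
def shifted_square_wave(N, shift, cycles=1):
    """Behavioral model of one time-shifter pair feeding an S-R flip-flop.

    A modulo-N counter is compared against a 'set' count and a 'reset' count.
    The flip-flop output goes high at the set count and low at the reset
    count, giving a 50% duty-cycle square wave delayed by `shift` clock steps.
    """
    set_count = shift % N
    reset_count = (shift + N // 2) % N
    q = 0
    samples = []
    for tick in range((cycles + 1) * N):   # the first period is a warm-up
        count = tick % N                   # modulo-N counter value
        if count == set_count:
            q = 1                          # S input asserted
        elif count == reset_count:
            q = 0                          # R input asserted
        if tick >= N:
            samples.append(q)
    return samples

N = 116
steps = (2, 7, 12, 19)
waves = [shifted_square_wave(N, s) for step in steps for s in (step, -step)]
staircase = [sum(column) for column in zip(*waves)]   # ideal resistive sum
print(staircase)
```

The summed staircase steps through the levels 0 to 8 over one output period, which is the multi-level waveform the passive LPF then smooths.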

The digital signals are added using passive components (resistors), then the output signal is fed to a third-order passive RC LPF to filter out the high-frequency harmonics. As the output node of the transmission gate will be floating for half the period, a CMOS inverter is inserted to keep this node a high-impedance node and to drive the summing resistors. Other alternatives for summing the different signals include active summing, which uses active elements to sum the signals, and capacitive summing. Using active elements would limit the swing of the output signal, as the summing circuit itself would have to meet the THD spec of the system, which is very hard to achieve while maintaining a large output swing. The point that favors resistors over capacitors, as both are inherently linear, is the power consumption. Ensuring good matching in the summing network implies the use of larger capacitors, which in turn require more current, and power, to charge and discharge. On the other hand, resistor matching depends mainly on the resistor area, which is independent of the resistance value: the same area can be used while interchanging the width and the length of the resistor to obtain a different resistance.

As the transmission gates align the signals to the master clock, the parts that should be considered carefully in the layout are the transmission gates themselves, the inverters, the summing resistors and the clock distribution. Interdigitated and common-centroid techniques were incorporated in the layout of the resistors in order to ensure good matching. As there are  $2n_{\phi}$  resistors to be matched,  $n_{\phi}$  for  $\phi$  and  $n_{\phi}$  for  $-\phi$ , a  $2n_{\phi}$  by  $2n_{\phi}$  array of resistors is used to obtain an interdigitated and common-centroid layout, as shown in Fig. 3.9. In addition, an H-tree routing technique was used in the clock distribution to prevent relative clock skews between the different branches [46]. Similarly, another H-tree is used to construct the summing node on the other side of the resistor array before feeding the LPF.

The LPF is a third-order passive RC filter whose component values are chosen such that the harmonic rejection is maximized. An important point to add is that large rise or fall times of the digital square waves will not affect the THD of the oscillator as long as the rise times are identical for the different added signals and the fall times are also identical. This fact is implied by the harmonic-cancellation theory derived above, in which the signal x(t) is not restricted to be an ideal-transition square wave. One source of systematic mismatch that affects the performance is the mismatch between the falling and rising edges of the inverter output and the on-resistance mismatch between the NMOS and the PMOS transistors. Such mismatch affects the duty cycle of the signal and results in even harmonics at the output. One way to overcome this problem is to size the transistors so that the falling and rising times match as closely as possible.



Fig. 3.9 Clock routing and resistor-summer layout diagram.

Another solution is to use a pseudo-differential version of the circuit. In other words, another copy of the circuit is added with the S and R inputs of the S-R flip-flop interchanged. In this case, if one inverter is switching high, its counterpart will be switching low. Consequently, the differential output will see both the rising and falling effects of the inverter at the two edges of the signal, leading to a perfect 50% duty cycle. A drawback of this approach is the doubled area and the increased power consumption. The proposed prototype was found to consume 21% more power in the differential mode compared to the single-ended version. In the proposed design the pseudo-differential solution is used, so two sets of eight resistors each are used along with two low-pass filters. Simulations indicate that a standard deviation of the resistor mismatch of up to 1% and a time-shift mismatch as large as 60ps, 6.96% of the master-clock period, can be tolerated in the design while attaining essentially the same THD. A quantitative analysis of the effect of mismatch on the performance of the system is presented in the next section.
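The quoted percentage is consistent with the master-clock period implied by the design values N=116 and  $f_o$ =10MHz given earlier. As a worked check:

$$T_{clk} = \frac{1}{N f_o} = \frac{1}{116 \times 10\,\text{MHz}} \approx 862\,\text{ps}, \qquad \frac{60\,\text{ps}}{862\,\text{ps}} \approx 6.96\%$$

Relative to the 100ns output period, the same 60ps corresponds to only 0.06%.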

# **3.5 Performance Limitations**

As the performance depends mainly on timing precision, the THD will depend on timing mismatch of the added signals. There are three different types of errors that affect the square wave signal as shown in Fig. 3.10.

The first one is the time-shift error which affects both rising and falling edges of the signal. The main contributor to this error is the mismatch between the different transmission gates and between the different inverters. The second type of errors is the duty cycle error which is introduced by the mismatch between the NMOS and PMOS driving capabilities of the inverter driving the summing resistors. The effect of this error will appear only in the single ended version of the circuit. The last type of errors is the amplitude error which is introduced by summing-resistors mismatch and the PMOS/NMOS on-resistance mismatch. Resistor mismatch can be mapped as adding signals of different amplitudes using ideal summer.



Fig. 3.10 Different types of the added signal errors.

In the following subsections the effects of the amplitude and phase errors are analyzed together, followed by an analysis of the sources of even harmonic distortion. The analysis focuses on the second and third harmonics, as they are the most susceptible to these errors.

# 3.5.1 Amplitude and Phase Error Analysis

Assuming an amplitude error of  $\Delta A$  and phase error of  $\Delta \varphi$ , the output signal, y(t), is given by:

$$\begin{aligned} y(t) &= \sum_{i=1}^{2n} (A + \Delta A_i) \cos(k\omega_o t + k(\varphi_i + \Delta \varphi_i)) \\ &\approx \sum_{i=1}^{2n} \Big[ A \cos(k\omega_o t + k\varphi_i) - k\Delta \varphi_i \Delta A_i \sin(k\omega_o t + k\varphi_i) \\ &\qquad\quad + \Delta A_i \cos(k\omega_o t + k\varphi_i) - Ak\Delta \varphi_i \sin(k\omega_o t + k\varphi_i) \Big] \end{aligned}$$
(3.6)

Since the first term already satisfies the targeted THD while the second term is negligible compared to the third and fourth terms, the equation can be simplified to:

$$\begin{aligned} y(t) &\approx \sum_{i=1}^{2n} \left[ \Delta A_i \cos(k\omega_o t + k\varphi_i) - Ak\Delta\varphi_i \sin(k\omega_o t + k\varphi_i) \right] \\ &= A\cos(k\omega_o t)\, X + A\sin(k\omega_o t)\, Y \end{aligned}$$
(3.7)

where

$$X = \sum_{i=1}^{2n} \left[ \frac{\Delta A_i}{A} \cos(k\varphi_i) - k\Delta \varphi_i \sin(k\varphi_i) \right]$$
$$Y = -\sum_{i=1}^{2n} \left[ \frac{\Delta A_i}{A} \sin(k\varphi_i) + k\Delta \varphi_i \cos(k\varphi_i) \right]$$

Consequently, the amplitude of the third harmonic is given by:

$$\text{Amp.}(3^{rd}\ \text{harmonic}) = A\sqrt{X^2 + Y^2}\,\Big|_{k=3}$$
(3.8)

while the amplitude of the fundamental component is given by:

$$A\sum_{i=1}^{2n}\cos(\omega_{o}t+\varphi_{i})=\cos(\omega_{o}t)A\sum_{i=1}^{2n}\cos(\varphi_{i}), \text{ where } \varphi_{i}=-\varphi_{i+n}$$
(3.9)

Consequently  $HD_3$  gain of the harmonic cancellation block is given by:

$$HD_{3} gain|_{dB} = 20 \log \left( \frac{\sqrt{(X^{2} + Y^{2})}|_{k=3}}{\sum_{i=1}^{2n} \cos(\varphi_{i})} \right)$$
(3.10)
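The linearized result in (3.10) can be compared against a direct phasor sum. The sketch below is an illustration only: the function names, the deterministic error pattern and the Monte Carlo settings are assumptions, and the ideal four-signal phase set from (3.5) is used so that the nominal third harmonic is exactly zero.

```python
import cmath
import math
import random

def hd_gain_linear(phases, amp_err, phase_err, k=3):
    """HD_k gain in dB from the linearized X and Y of (3.7)-(3.10).

    phases: nominal phases (rad) of all 2n signals; amp_err: Delta A_i / A;
    phase_err: Delta phi_i (rad).
    """
    x = sum(a * math.cos(k * p) - k * d * math.sin(k * p)
            for p, a, d in zip(phases, amp_err, phase_err))
    y = -sum(a * math.sin(k * p) + k * d * math.cos(k * p)
             for p, a, d in zip(phases, amp_err, phase_err))
    fundamental = sum(math.cos(p) for p in phases)
    return 20.0 * math.log10(math.hypot(x, y) / fundamental)

def hd_gain_exact(phases, amp_err, phase_err, k=3):
    """Same ratio from the full phasor sum, without linearization."""
    def phasor(h):
        return sum((1.0 + a) * cmath.exp(1j * h * (p + d))
                   for p, a, d in zip(phases, amp_err, phase_err))
    return 20.0 * math.log10(abs(phasor(k)) / abs(phasor(1)))

# Ideal four-signal set from the text: +/-2*pi/15 and +/-7*pi/15.
phases = [2 * math.pi / 15, -2 * math.pi / 15, 7 * math.pi / 15, -7 * math.pi / 15]

# One deterministic error pattern (hypothetical values) to compare both forms.
amp_err = [0.01, 0.0, 0.0, 0.0]
phase_err = [0.0, 0.002, 0.0, 0.0]
print(f"linearized: {hd_gain_linear(phases, amp_err, phase_err):.2f} dB")
print(f"exact:      {hd_gain_exact(phases, amp_err, phase_err):.2f} dB")

# Monte Carlo with sigma_A = 1% and sigma_dt = 60 ps at f_o = 10 MHz.
random.seed(0)
f_o = 10e6
runs = [hd_gain_linear(phases,
                       [random.gauss(0.0, 0.01) for _ in phases],
                       [2 * math.pi * f_o * random.gauss(0.0, 60e-12) for _ in phases])
        for _ in range(1000)]
print(f"mean linearized HD3 gain over {len(runs)} runs: {sum(runs) / len(runs):.1f} dB")
```

For small errors the two forms agree closely, which supports dropping the second-order term in (3.6). The Monte Carlo figure depends on the assumed phase set and is not a substitute for Fig. 3.11.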

Fig. 3.11 a) shows a 3D plot of the effect of both amplitude and phase mismatches on the HD<sub>3</sub> gain, while Fig. 3.11 b) shows a contour plot indicating constant HD<sub>3</sub> gains of -60dB to -45dB in 5dB steps.



Fig. 3.11 Effect of timing and amplitude mismatches on HD<sub>3</sub> suppression. a) 3D plot. b) HD<sub>3</sub> Contour plot.

Phase mismatch is mapped to timing mismatch in the figure. To achieve an output HD<sub>3</sub> of -70dB, the HD<sub>3</sub> gain of the harmonic cancellation block should be below the -45dB level, as shown in Fig. 3.5. It can be concluded from the figure that if the standard deviation,  $\sigma_{\Delta t}$ , of the timing mismatch of the transmission gate and the inverter is kept below 60ps, which is easy to attain in UDSM technologies, the standard deviation of the amplitude mismatch,  $\sigma_A$ , can be as high as 1% while attaining an output THD of -70dB or better.

# 3.5.2 Even Harmonic Distortion Analyses

There are two different effects for the mismatch between the NMOS and PMOS transistors of the CMOS inverters driving the summing resistors. The first effect is the difference in the rising and falling times which introduces duty cycle error. The second effect is the difference between the on-resistance of the PMOS and NMOS transistors leading to systematic amplitude errors. The two effects will be addressed separately.

In order to address the duty cycle error, the signal is decomposed into two parts. The first part is the ideal signal while the second part is the  $\Delta D$  part shown in Fig. 3.10. The duty cycle distortion effect will clearly appear in the second harmonic. Using Fourier series, the mismatch,  $\Delta D$ , can be expanded into its harmonics as follows

$$C_{k} = \frac{1}{T_{p}} \int_{t_{o}}^{t_{o}+\Delta D} A e^{-j\omega_{o}kt} dt = \frac{A}{\pi k} \sin\left(\pi k \frac{\Delta D}{T}\right) e^{-j\omega_{o}k\left(t_{o}+\frac{\Delta D}{2}\right)}$$
(3.11)

Adding the different signals:

$$\begin{aligned} total &\approx A \frac{\Delta D}{T} \sum_{i=1}^{2n} e^{-j\omega_o k \left( t_{oi} + \frac{\Delta D}{2} \right)}, \quad \text{where } \sin\left( \pi k \frac{\Delta D}{T} \right) \approx \pi k \frac{\Delta D}{T} \\ &= A \frac{\Delta D}{T} e^{-j\omega_o k \frac{\Delta D}{2}} \sum_{i=1}^{2n} e^{-j\omega_o k t_{oi}} \end{aligned}$$
(3.12)

As for each signal starting at time  $t_{oi}$  there is a corresponding one starting at  $-t_{oi}$ , then

$$\therefore total = A \frac{\Delta D}{T} e^{-j\omega_o k \frac{\Delta D}{2}} \sum_{i=1}^{n} 2\cos(\omega_o k t_{oi}) = \left[ 2A \frac{\Delta D}{T} \sum_{i=1}^{n} \cos(k\varphi_i) \right] e^{-j\omega_o k \frac{\Delta D}{2}}$$
(3.13)

Consequently, the second harmonic distortion is given by:

$$HD_{2} = \frac{\left[ 2A\frac{\Delta D}{T}\sum_{i=1}^{n}\cos(2\varphi_{i})\right]}{2A\sum_{i=1}^{n}\cos(\varphi_{i})} = \frac{\Delta D}{T}\frac{\sum_{i=1}^{n}\cos(2\varphi_{i})}{\sum_{i=1}^{n}\cos(\varphi_{i})}$$
(3.14)

where T is the period of the output signal. Note that the passive-third-order LPF will improve the distortion by about 9.5 dB.
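Equation (3.14) is straightforward to evaluate for a given set of phases. The following is a minimal sketch; the function name and the 10ps duty-cycle error are hypothetical values chosen for illustration, while N=116, the shifts 2, 7, 12, 19 and the 100ns output period come from the design described earlier. The LPF improvement mentioned above is not included.

```python
import math

def hd2_duty_cycle(delta_d_over_t, phases):
    """HD2 from (3.14); `phases` holds the n positive phases phi_i in radians."""
    numerator = sum(math.cos(2.0 * p) for p in phases)
    denominator = sum(math.cos(p) for p in phases)
    return delta_d_over_t * numerator / denominator

N = 116
phases = [2.0 * math.pi * s / N for s in (2, 7, 12, 19)]

delta_d = 10e-12   # hypothetical duty-cycle error of 10 ps
period = 100e-9    # output period at f_o = 10 MHz
hd2 = hd2_duty_cycle(delta_d / period, phases)
print(f"HD2 before the LPF: {20 * math.log10(abs(hd2)):.1f} dB")
```

The ratio of the two cosine sums acts as a fixed scaling set by the chosen phases, so HD<sub>2</sub> scales linearly with  $\Delta D/T$  for a given design.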

The second source of even harmonic distortion is the systematic mismatch in the output amplitudes resulting from the difference in the on-resistances of the CMOS inverter transistors. Fig. 3.12 shows a schematic diagram of the circuit including the on-resistance mismatch.

Assume a total of  $2n_{\phi}$  input signals, of which  $n_h$  are connected to  $V_{dd}$  through PMOS transistors while the rest are connected to ground through NMOS transistors. The output voltage in this case is given by:

$$V_{out} = \frac{V_{dd}n_h}{n_h + (2n - n_h)\left(\frac{1 + x_p}{1 + x_n}\right)}, x_p = \frac{R_p}{R}, x_n = \frac{R_n}{R}$$
(3.15)

where R is the summing resistor,  $R_p$  and  $R_n$  are the on-resistances of the PMOS and NMOS transistors respectively.
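Equation (3.15) is a resistive divider between the signals pulled high and those pulled low. A minimal sketch follows; the function and parameter names are assumptions, and the 1.2V default supply is taken from the prototype described later.

```python
def summer_output(n_high, n_total, x_p, x_n, v_dd=1.2):
    """Output of the resistive summer from (3.15).

    n_high signals are pulled to v_dd through R + R_p and the remaining
    n_total - n_high signals are pulled to ground through R + R_n,
    with x_p = R_p / R and x_n = R_n / R.
    """
    ratio = (1.0 + x_p) / (1.0 + x_n)
    return v_dd * n_high / (n_high + (n_total - n_high) * ratio)

# Matched on-resistances give evenly spaced levels; a mismatch bends them.
for n_high in range(9):
    matched = summer_output(n_high, 8, 0.02, 0.02)
    mismatched = summer_output(n_high, 8, 0.03, 0.02)
    print(f"n_h={n_high}: matched={matched:.4f} V, mismatched={mismatched:.4f} V")
```

With matched on-resistances the levels are exact multiples of  $V_{dd}/2n$ ; any mismatch makes the level spacing non-uniform, which is the mechanism behind the even harmonics analyzed next.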

In order to simplify the analysis and make use of the symmetry, the axis is shifted by  $\pi/2$  as shown in Fig. 3.13. Consequently, the phases of the waveforms relative to the new reference are given by  $\gamma_i = \phi_i - 0.5\pi$ . The Fourier coefficients of the output signal are given by:

$$C_{k} = \frac{1}{T_{p}} \int_{T_{p}} V_{out}(t) e^{-jk\omega_{o}t} dt.$$
(3.16)



Fig. 3.12 PMOS/NMOS on-resistance switches mismatch.

Since the output is symmetric around the new reference, the output amplitude is the same for the time intervals given by  $\left[\frac{-\gamma_i}{\omega_o}, \frac{-\gamma_{i+1}}{\omega_o}\right]$  and  $\left[\frac{\gamma_{i+1}}{\omega_o}, \frac{\gamma_i}{\omega_o}\right]$ . Consequently, Fourier coefficients can be written as:

$$\begin{aligned} C_{k} &= \frac{1}{T_{p}} \sum_{i=1}^{2n} V_{out\_i(i+1)} \left[ \int_{\frac{-\gamma_{i}}{\omega_{o}}}^{\frac{-\gamma_{i+1}}{\omega_{o}}} e^{-jk\omega_{o}t} dt + \int_{\frac{\gamma_{i+1}}{\omega_{o}}}^{\frac{\gamma_{i}}{\omega_{o}}} e^{-jk\omega_{o}t} dt \right] \\ &= \frac{1}{T_{p}} \cdot \frac{1}{-jk\omega_{o}} \sum_{i=1}^{2n} V_{out\_i(i+1)} \left[ e^{jk\gamma_{i+1}} - e^{jk\gamma_{i}} + e^{-jk\gamma_{i}} - e^{-jk\gamma_{i+1}} \right] \\ &= \frac{1}{\pi k} \sum_{i=1}^{2n} V_{out\_i(i+1)} \left[ \sin(k\gamma_{i}) - \sin(k\gamma_{i+1}) \right] \end{aligned}$$
(3.17)

where  $V_{out\_i(i+1)}$  is the output during the time intervals  $\left[\frac{-\gamma_i}{\omega_o}, \frac{-\gamma_{i+1}}{\omega_o}\right]$  and  $\left[\frac{\gamma_{i+1}}{\omega_o}, \frac{\gamma_i}{\omega_o}\right]$ . Consequently, the second harmonic distortion is given by:

$$HD_{2} = \frac{C_{2}}{C_{1}} = \frac{1}{2} \frac{\sum_{i=1}^{2n} V_{out\_i(i+1)} [\sin(2\gamma_{i}) - \sin(2\gamma_{i+1})]}{\sum_{i=1}^{2n} V_{out\_i(i+1)} [\sin(\gamma_{i}) - \sin(\gamma_{i+1})]}$$
(3.18)

where  $\gamma_i = \phi_i - 0.5\pi$  and  $V_{out\_i(i+1)}$  is the output amplitude, given by (3.15), during the time interval  $\left[\frac{\gamma_i}{\omega_o}, \frac{\gamma_{i+1}}{\omega_o}\right]$ .
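The ratio  $C_2/C_1$  can also be estimated numerically instead of through the closed form in (3.17). The sketch below builds one period of the summed staircase on the master-clock grid, maps the number of high signals to a voltage with (3.15), and takes a discrete Fourier sum. The function name, the ideal zero-delay edges and the normalization  $V_{dd}=1$  are assumptions made for illustration.

```python
import cmath
import math

def hd2_on_resistance(N, steps, x_p, x_n):
    """Numerically estimate |C_2| / |C_1| of the summed staircase waveform."""
    shifts = [s for step in steps for s in (step, -step)]
    n_total = len(shifts)
    ratio = (1.0 + x_p) / (1.0 + x_n)
    v_out = []
    for tick in range(N):
        # A signal shifted by s clock steps is high for half a period starting at s.
        n_high = sum(1 for s in shifts if (tick - s) % N < N // 2)
        # Output level from (3.15) with V_dd normalized to 1.
        v_out.append(n_high / (n_high + (n_total - n_high) * ratio))

    def coeff(k):
        return sum(v * cmath.exp(-2j * math.pi * k * t / N)
                   for t, v in enumerate(v_out)) / N

    return abs(coeff(2)) / abs(coeff(1))

matched = hd2_on_resistance(116, (2, 7, 12, 19), 0.02, 0.02)
mismatched = hd2_on_resistance(116, (2, 7, 12, 19), 0.024, 0.02)  # R_p 20% above R_n
print(f"matched:    HD2 = {matched:.2e}")
print(f"mismatched: HD2 = {20 * math.log10(mismatched):.1f} dB (before the LPF)")
```

With equal on-resistances the staircase is half-wave symmetric and the second harmonic sits at the rounding floor; a mismatch breaks that symmetry and a finite HD<sub>2</sub> appears.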



Fig. 3.13 Axis change from  $\phi$  to  $\gamma$ .

Fig. 3.14 shows the second harmonic distortion, in dB, versus the ratio between the on-resistances of the two transistors,  $R_p/R_n$ , and the ratio between the on-resistance of the NMOS transistor and the summing resistance,  $R_n/R$ . Similar to the rising/falling time mismatch, this distortion is improved by about 9.5dB by the passive LPF. Assuming  $R_n/R$  of 0.02, the standard deviation of the mismatch between the on-resistances of the NMOS and the PMOS transistors can be as high as 20% while attaining an output HD<sub>2</sub> better than -70dB.

System-level simulations indicate that the effect of the on-resistance mismatch dominates over the rising/falling time mismatch for large  $R_n/R$ . Increasing the summing resistor, R, is one way to decrease that distortion, but it increases the mismatch in the rising/falling times due to the increased transition time. Consequently, the optimum design is the one in which the contributions of the two effects are equal. Another way to improve the performance is to decrease the on-resistance of the transistors by using wider transistors. Unfortunately, this improvement comes at the expense of the power consumption of the system.



Fig. 3.14 On-resistance effect of the NMOS/PMOS transistors switches on HD<sub>2</sub>. a) 3D plot. b) HD<sub>2</sub> Contour plot.

From the design limitation analysis we can conclude:

- 1- Harmonic rejection is a function of the delay mismatches of the different inverters. The smaller the timing mismatch, the smaller the phase error between the different signals. Thus, targeting a higher oscillation frequency,  $f_o$ , is expected to be tougher since the same timing mismatch translates into a larger phase error in the added signals.
- 2- Resistor areas, rather than the resistor values, are set according to the required THD and the matching requirements to achieve such THD.
- 3- The on-resistances of the PMOS and the NMOS transistors should be matched as closely as possible in order to minimize the even-order harmonics. A fully differential architecture can be used to avoid this problem; however, the area will be doubled and the power consumption will increase as well.

### **3.6 Measurement Results**

The proposed oscillator prototype was fabricated in UMC 0.13µm technology. The chip micrograph is shown in Fig. 3.15. Interdigitated and common-centroid techniques were incorporated in the chip layout to minimize the local mismatch. Wide unit resistors of 20µm length and 10µm width were used to decrease the effect of process variations on the performance of the summer. In addition, guard rings were used to suppress substrate noise injected from other blocks on the same die. Since the dummy metal used for fulfilling the density requirements can significantly affect the matching of CMOS devices, manual metal filling is used to meet the density requirements without violating the symmetry of the layout [23]. An external highly linear buffer is used to drive a  $50\Omega$  spectrum analyzer. To ensure the linearity of the buffer, an attenuation factor of 9.54dB is included on the PCB for the pseudo-differential and single-ended outputs.



Fig. 3.15 Chip micrograph and area budgeting.

Fig. 3.16 shows the spectrum of the differential output at 10MHz. The proposed approach achieves a THD of -72dB while providing an output swing of 228mV<sub>pp</sub> using a 1.2V supply. As the circuit is completely digital, the output swing is supposed to be rail to rail at the summing point. Due to the attenuation imposed by the LPF, the pads, the parasitic capacitance of the PCB and the external buffer, the measured output swing dropped to 20% of the rail-to-rail value. Despite this large loss in the swing, the ratio of the output amplitude to the supply voltage,  $V_{pp}/V_{dd}$ , is 190mV/V, which is the best reported in the literature. The spectrum of the single-ended version is shown in Fig. 3.17. Although its linearity is degraded by the even harmonics, the circuit still shows a good THD of -57.9dB. One point to note in the experimental results is that the step in the noise floor at 10MHz is due to the spectrum analyzer itself and is not generated by the circuit; this was verified by disconnecting the circuit and plotting the noise floor of the analyzer alone.



Fig. 3.16 Output spectrum of the pseudo-differential version of the oscillator at 10MHz (output amplitude in dBm versus frequency).

The oscillation frequency can be tuned by changing the supply voltage of the ring oscillator. Measurement results show that the proposed oscillator covers the range from 4.35MHz to 11MHz by sweeping the supply voltage of the ring oscillator, not the entire circuit, from 0.75V to 1.3V while achieving competitive performance. The supply of the control circuits and the driving inverters can be tuned as well to decrease the power consumption, but it has a lower limit set by the overdrive voltage needed to control the transmission gates. Fig. 3.18 shows HD<sub>2</sub>, HD<sub>3</sub> and the THD of the differential output signal versus frequency. It is clear that  $HD_2$  dominates as the supply of the ring oscillator decreases because the overdrive voltage of the transmission gates, which are controlled by the ring oscillator, decreases. As a consequence, the timing mismatches that introduce the even harmonics increase.



Fig. 3.17 Output spectrum of the single-ended version of the oscillator at 10MHz (output amplitude in dBm versus frequency).

Table 3.1 compares the proposed prototype with state-of-the-art oscillators. The proposed oscillator shows better THD than the best in the literature while consuming 3.37mA from a 1.2V supply. The power consumption includes the power consumed by the ring oscillator, the digital circuits and the driving inverters. Simulations indicate that a 10% process variation in the summing resistor values results in less than 1% variation in the power consumption. Although the power consumption of conventional highly linear analog oscillators increases when migrating from an old technology to a newer one, the proposed design shows outstanding power-consumption performance in UDSM technologies. Compared to the small-feature-size design reported in the literature, reference [42], the proposed design saves 80% of the power while achieving better THD. In addition, because the performance of the proposed architecture depends solely on timing accuracy rather than voltage-level accuracy, this technique can potentially exhibit better THD in nanometric technologies while consuming less power. Moreover, process and temperature variations will not affect the THD, as the performance depends on the matching of the components rather than on their absolute values.



Fig. 3.18 HD<sub>2</sub>, HD<sub>3</sub> and THD of the differential output of the oscillator versus the output frequency.

A figure of merit, FOM, is proposed to include the design factors other than the THD and to allow a fair comparison between the different designs. The FOM is given by:

$$FOM = \frac{f_o(MHz).(V_{pp}/V_{dd})}{\text{Area}(mm^2).\text{power}(mW)}$$
(3.19)

One point to consider is that implementing the same oscillator design with the same THD in a smaller technology results in:

- 1- More power consumption since the linearity of the filter will degrade and more power will be pumped in the filter to attain the performance obtained in an older technology node.
- 2- Smaller output swing due to the use of a smaller supply voltage.

Thus, the FOM would vary with the technology node if the technology were not represented in it. To overcome this problem, the proposed FOM includes the area as a representation of the cost and of the technology used in the implementation. This way, the FOM provides a fair comparison between oscillators implemented in different technologies. Table 3.1 shows that the proposed technique achieves an outstanding FOM along with a very low THD compared to the state of the art.
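The FOM in (3.19) can be evaluated directly from the entries of Table 3.1. In the sketch below the normalization to reference [49] is an assumption inferred from its table entry of 1, and the function and dictionary names are chosen for illustration. The single-ended entry is left out because the rounded area in the table does not reproduce its listed FOM closely.

```python
def fom(f_o_mhz, vpp_over_vdd_mv_per_v, area_mm2, power_mw):
    """Figure of merit from (3.19)."""
    return f_o_mhz * vpp_over_vdd_mv_per_v / (area_mm2 * power_mw)

# (f_o in MHz, V_pp/V_dd in mV/V, area in mm^2, power in mW), copied from Table 3.1
designs = {
    "This work (pseudo differential)": (10, 190.1, 0.186, 4.04),
    "[42]": (10, 24.1, 0.2, 20.1),
    "[47]": (18.7, 68, 1.47, 174),
    "[48]": (25, 14.3, 0.63, 1.58),
    "[49]": (1.56, 29.5, 1.4, 8),
}

reference = fom(*designs["[49]"])  # assumed normalization point (table entry of 1)
for name, values in designs.items():
    print(f"{name}: normalized FOM = {fom(*values) / reference:.1f}")
```

Under this assumption the computed values agree with the normalized FOM row of the table to within rounding of the listed inputs.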

# 3.7 Summary

A highly linear oscillator based on a harmonic cancellation technique is proposed. The oscillator shows a 17.2dB improvement in THD and a 9.5dB higher output level while consuming 80% less power than the latest small-feature-size design, [42]. In addition, the performance depends solely on the timing resolution of the technology instead of the voltage resolution, which makes it, to the best of the authors' knowledge, the first time-mode-based oscillator in the literature. Moreover, the performance of the proposed approach improves with device timing resolution, making it suitable for nanometric technologies. Finally, the technique's emphasis on digital blocks allows for faster migration to new nodes and facilitates design automation.

|                                           | This work (single ended) | This work (pseudo differential) | [34]                | [42]                | [47]                | [48]                | [49]               |
|-------------------------------------------|--------------------------|---------------------------------|---------------------|---------------------|---------------------|---------------------|--------------------|
| $f_{out}$ (MHz)                           | 10                       | 10                              | 40                  | 10                  | 18.7                | 25                  | 1.56               |
| Power consumption                         | 3.34mW                   | 4.04mW                          | 36.5mW              | 20.1mW              | 174mW               | 1.58mW              | 8mW                |
| V<sub>pp</sub>/V<sub>dd</sub> ratio (mV/V) | 95.3                     | 190.1                           |                     | 24.1                | 68                  | 14.3                | 29.5               |
| Area                                      | 0.1mm<sup>2</sup>        | 0.186mm<sup>2</sup>             | 0.83mm<sup>2</sup>  | 0.2mm<sup>2</sup>   | 1.47mm<sup>2</sup>  | 0.63mm<sup>2</sup>  | 1.4mm<sup>2</sup>  |
| Technology (CMOS)                         | 0.13µm                   | 0.13µm                          | 0.8µm*              | 0.35µm              | 0.35µm              | 0.8µm               | 0.5µm              |
| Supply voltage                            | 1.2V                     | 1.2V                            |                     | 3.3V                | 3.3V                | 2V                  | 2.7V               |
| FOM (normalized)                          | 659.1                    | 615.9                           |                     | 14.61               | 1.21                | 87.4                | 1                  |
| THD                                       | -57.9 dB                 | -72 dB                          | 65.4dB**            | -54.8 dB            | -55dB               | -43.6 dB            | -42.1dB            |
| Performance bottleneck                    | Timing accuracy          | Timing accuracy                 |                     | Capacitor matching  | Voltage accuracy    |                     |                    |

 Table 3.1 Comparison of performance of the proposed approach with state-of-the-art.


\* BiCMOS process is used but only CMOS devices were incorporated in the design.

\*\*SFDR measured at  $f_o = 1$ MHz.

#### **CHAPTER IV**

# A SPUR-FREQUENCY BOOSTING PLL FOR LOW SPUR FREQUENCY SYNTHESIZER\*

# **4.1 Introduction**

Phase-locked loops (PLLs) play an important role in any communication system. In order to transmit a signal on a certain channel frequency, a PLL must be used to continuously tune the oscillator frequency to the targeted channel and to compensate for temperature variations, process variations, aging and low-frequency noise of the oscillator. Fig. 4.1 shows a basic block diagram of a conventional PLL.



Fig. 4.1 Block diagram of a conventional PLL.

<sup>\*</sup>Reprinted, with permission, M. Elsayed, M. Abdul-Latif, and Edgar Sánchez-Sinencio, "A Spur-Frequency-Boosting PLL with a -74dBc Reference-Spur Rejection in 90nm Digital CMOS", in *IEEE RFIC Symp. Dig.*, June 2011, pp. 405-408. © 2011 IEEE.

The key block in a PLL is the oscillator which generates the carrier frequency for the communication system. In order to overcome any phase/frequency deviation of the oscillator due to the factors mentioned above, the oscillation frequency is compared to a reference frequency using a phase-frequency detector (PFD) which detects the phase difference between the oscillator and the reference signal. The reference signal is usually generated using a high-Q crystal oscillator whose frequency stability is very high compared to the silicon-based oscillators. The error signal, the PFD output, is low-pass-filtered to suppress the high frequency components that disturb the oscillator and then delivered to the control input of the oscillator. Since in most communication systems the transceiver is required to be programmable to operate at different channels, a frequency divider is usually used in the feedback path whose output is compared to the reference frequency which corresponds to the channel spacing in integer-N synthesizers.

One important modification to the PLL is the use of a charge pump in the feedforward path to drive the control input of the voltage-controlled oscillator (VCO). Fig. 4.2 shows a block diagram of the charge-pump-based PLL (CP-PLL).



Fig. 4.2 Block diagram of the charge-pump-based PLL (CP-PLL).

The main advantages of the charge-pump PLL compared to the conventional one are its immunity to supply variation and the digital-friendly nature of the block. The PFD provides two output pulse signals, UP and DN, which together encode the sign and the magnitude of the phase error. If the feedback signal is lagging the reference input, the UP output generates a pulse whose width is proportional to the phase difference between the reference and the feedback signal. On the other hand, if the feedback signal is leading the reference input, the DN output of the PFD generates a pulse whose width is proportional to the phase difference. The UP and DN signals control the current flow of a PMOS and an NMOS current source, respectively, to the output capacitance. The voltage across the output capacitor corresponds to the tuning voltage of the VCO.

Since the PLL is a feedback system, it needs to be mathematically modeled to check its stability. The PLL is a highly nonlinear system; however, it can be linearized around the operating point which corresponds to the steady state or the lock state. Large-signal dynamics and stability can be checked using nonlinear stability analysis such as Lyapunov stability, but it is more accurate to check them with simulations. In the following section, the small-signal model of the different blocks will be derived to gain insight into the tradeoffs of the PLL design and reach a general design procedure for PLLs.

# 4.2 PLL Dynamics

The VCO output is related to its control voltage by the following relation:

$$V_{out} = A\cos\left(\omega_{o}t + K_{vco}\int_{0}^{t}V_{c}(t)dt\right)$$

Consequently, the change of the phase,  $\Delta \phi$ , due to the change in the control input is given by:

$$\Delta \phi = K_{vco} \int_0^t V_c(t) dt$$

Applying Laplace transform to the two sides, we can get the transfer function of the VCO:

$$\frac{\Delta\phi(s)}{V_c(s)} = \frac{K_{vco}}{s}$$

Since the output frequency of the VCO will be divided by *N* before being compared with the reference input, the phase of the VCO output will also be divided by *N*. With regard to the PFD and charge pump, they convert the phase difference between the feedback signal and the reference input to an output current pumped in the loop filter (the cap  $C_1$ ). Consequently, their gain is given by:

$$\frac{I_{out}}{\Delta \phi_{in}} = K_{PD} I_{CP}, \quad \text{where } K_{PD} = \frac{1}{2\pi}$$

Fig. 4.3 shows the block diagram of the CP-PLL model.



Fig. 4.3 Block diagram of the CP-PLL model.

The open loop gain (feed-forward gain) in that case is given by:

$$G(s) = K_{PD}I_{CP}.F(s).\frac{K_{VCO}}{s}$$

where F(s) is the loop filter trans-impedance (V/A).

On the other hand, the loop gain is given by

$$Loop \, Gain = \frac{G(s)}{N} = \frac{K_{PD}I_{CP}K_{VCO}}{N} \cdot \frac{F(s)}{s}$$

Assuming a single-capacitor loop-filter, the loop gain will be given by:

$$Loop \, Gain = \frac{K_{PD} \, I_{CP} \, K_{VCO}}{N \, C} \cdot \frac{1}{s^2}$$

The Bode diagram of such a loop gain corresponds to two lossless integrators, with a -40dB/dec magnitude response and a constant $-180^{\circ}$ phase. The magnitude crosses 0dB at

$$\omega_n = \sqrt{\frac{K_{PD} I_{CP} K_{VCO}}{NC}}$$

which corresponds to the natural frequency of the system. Such magnitude and phase responses result in an oscillatory system. This is clear from the closed-loop transfer function of the system given by:

$$C.L.T.F. = \frac{G(s)}{1+G(s)/N} = N \cdot \frac{\omega_n^2}{\left(s^2 + \omega_n^2\right)}$$

which is the transfer function of an oscillator running at  $\omega_n$ . Intuitively, the reason is that the two lossless integrators in the system contribute a 180° phase shift in the loop, resulting in an oscillatory response. To overcome this problem, one of the two lossless integrators must be replaced by a lossy one. Since the VCO naturally acts as an integrator for the phase, it cannot be made lossy. On the other hand, the loop filter capacitor can be converted to a lossy integrator by adding a resistor in series with the capacitor.



Fig. 4.4 Block diagram of the CP-PLL with the modified loop filter.

From the control theory perspective, introducing that resistor corresponds to adding a zero to the loop at  $\omega_z = 1/RC$ . Fig. 4.4 shows the block diagram of the PLL with the modified loop filter.

The transfer function of the filter, the impedance, is given by:

$$Z_{in} = \frac{\left(1 + sRC\right)}{sC}$$

Consequently, the loop gain is given by:

$$Loop \, Gain = \frac{K_{PD} \, I_{CP} \, K_{VCO}}{N \, C} \cdot \frac{(1 + sCR)}{s^2}$$

Consequently, the closed-loop transfer function is given by:

$$C.L.T.F. = N. \frac{\frac{K_{PD} I_{CP} K_{VCO}}{NC} . (1 + sCR)}{s^2 + \frac{K_{PD} I_{CP} K_{VCO}}{NC} . (1 + sCR)}$$
$$= N. \frac{\frac{K_{PD} I_{CP} K_{VCO}}{NC} . (1 + sCR)}{s^2 + \frac{s}{\omega_z} . \frac{K_{PD} I_{CP} K_{VCO}}{NC} + \frac{K_{PD} I_{CP} K_{VCO}}{NC}}, \ \omega_z = \frac{1}{RC}$$

Since the system is a second order one, its transfer function can be expressed in terms of its natural frequency and the damping factor as:

$$C.L.T.F. = N. \frac{\omega_n^2 \left(1 + \frac{s}{\omega_z}\right)}{s^2 + \frac{\omega_n^2}{\omega_z}s + \omega_n^2} = N. \frac{\omega_n^2 \left(1 + \frac{s}{\omega_z}\right)}{s^2 + 2\zeta\omega_n s + \omega_n^2},$$
  
Where  $\omega_n = \sqrt{\frac{K_{PD} I_{CP} K_{VCO}}{NC}}$ , and  $\zeta = \frac{\omega_n}{2\omega_z}$ 
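As a quick numerical illustration of these definitions, the following Python sketch evaluates $\omega_n$ and $\zeta$ from the loop parameters and locates the closed-loop poles. The function names and the component values are illustrative assumptions, not values from this design, and $K_{VCO}$ is assumed to be expressed in rad/s/V.

```python
import cmath
import math


def second_order_params(i_cp, k_vco, n_div, c, r):
    """Return (omega_n, zeta) for the series-RC loop filter.

    i_cp: charge-pump current (A); k_vco: VCO gain (rad/s/V, assumed unit);
    n_div: divider ratio; c: capacitor (F); r: series resistor (ohm).
    """
    k_pd = 1.0 / (2.0 * math.pi)
    omega_n = math.sqrt(k_pd * i_cp * k_vco / (n_div * c))
    # zeta = omega_n / (2 * omega_z) with omega_z = 1 / (R * C)
    zeta = 0.5 * omega_n * r * c
    return omega_n, zeta


def closed_loop_poles(omega_n, zeta):
    """Roots of s^2 + 2*zeta*omega_n*s + omega_n^2 = 0."""
    disc = cmath.sqrt(zeta ** 2 - 1.0)
    return omega_n * (-zeta + disc), omega_n * (-zeta - disc)


if __name__ == "__main__":
    # Hypothetical values chosen only to give round numbers.
    params = dict(i_cp=100e-6, k_vco=2 * math.pi * 100e6, n_div=100, c=100e-12)
    for r in (0.0, 10e3):
        w_n, zeta = second_order_params(r=r, **params)
        print(f"R={r:8.1f} ohm  omega_n={w_n:.3e} rad/s  zeta={zeta:.3f}  "
              f"poles={closed_loop_poles(w_n, zeta)}")
```

With `r = 0` the damping factor is zero and both poles sit on the imaginary axis, which is the oscillatory single-capacitor case; any positive `r` moves them into the left half-plane.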

For second-order systems, the natural frequency is an indicative parameter for the unity-gain bandwidth of the system while the damping factor represents the transient behavior. As mentioned before, the natural frequency equals the unity-gain frequency if no zero exists in the loop; however, the system is then oscillatory. In the current case, the damping factor prevents the system from sustaining the oscillation, as long as  $\zeta > 0$ , but the transient response can still exhibit decaying oscillations depending on the value of  $\zeta$ . Consequently, the settling time and the phase margin are set through the damping factor,  $\zeta$ , while the unity-gain bandwidth is set through the natural frequency  $\omega_n$ . Generally speaking, top-level system design provides the required unity-gain bandwidth,  $\omega_{GBW}$ , based on the reference frequency and the spur rejection requirements. In addition, a phase margin, *PM*, is specified to secure an acceptable settling time. The following analysis derives the relation between these four parameters (*PM*,  $\omega_{GBW}$ ,  $\zeta$ ,  $\omega_n$ ). The loop gain is given by:

$$Loop \, Gain = \frac{\omega_n^2}{s^2} \cdot \left(1 + \frac{s}{\omega_z}\right)$$

At the unity-gain frequency, the magnitude of the loop gain is 1:

$$1 = \frac{\omega_n^2}{\omega_{GBW}^2} \cdot \sqrt{\left(1 + \frac{\omega_{GBW}^2}{\omega_z^2}\right)}$$

If  $\frac{\omega_{GBW}^2}{\omega_z^2} \gg 1$ , then  $\omega_n^2 = \omega_{GBW} \omega_z$ ; if not, then

$$\omega_{GBW}^4 - \frac{\omega_n^4}{\omega_z^2} \omega_{GBW}^2 - \omega_n^4 = 0$$
$$\therefore \omega_{GBW}^2 = \omega_n^2 \left[ \frac{1}{2} \frac{\omega_n^2}{\omega_z^2} \pm \sqrt{\frac{\omega_n^4}{4\omega_z^4} + 1} \right]$$

Since  $\zeta = \frac{\omega_n}{2\omega_z}$ ,

$$\therefore \omega_{GBW}^2 = \omega_n^2 \left[ 2\zeta^2 \pm \sqrt{4\zeta^4 + 1} \right]$$

Since  $\omega_{GBW}$  must be real and positive, the positive root is taken:

$$\therefore \omega_{GBW} = \omega_n \cdot \sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}$$

It is clear that if  $\zeta$  equals zero, then  $\omega_{GBW}$  equals  $\omega_n$ , which conforms to the case of the PLL with two lossless integrators. On the other hand, when  $\zeta$  is larger than zero, the unity-gain bandwidth increases beyond  $\omega_n$ , reflecting the additional gain contributed by the zero.
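The closed-form unity-gain frequency can be checked numerically. The sketch below, with hypothetical function names and a normalized $\omega_n = 1$, evaluates the expression for several damping factors and confirms that the loop-gain magnitude is 1 at the computed frequency.

```python
import math


def unity_gain_freq(omega_n, zeta):
    """omega_GBW = omega_n * sqrt(2*zeta^2 + sqrt(4*zeta^4 + 1))."""
    return omega_n * math.sqrt(2 * zeta ** 2 + math.sqrt(4 * zeta ** 4 + 1))


def loop_gain_mag(omega, omega_n, zeta):
    """|loop gain| = (omega_n/omega)^2 * sqrt(1 + (omega/omega_z)^2),
    written with omega/omega_z = 2*zeta*omega/omega_n so zeta = 0 is allowed."""
    return (omega_n / omega) ** 2 * math.sqrt(1 + (2 * zeta * omega / omega_n) ** 2)


if __name__ == "__main__":
    omega_n = 1.0  # normalized natural frequency (assumption for illustration)
    for zeta in (0.0, 0.3, 0.5, 0.707, 1.0):
        w_gbw = unity_gain_freq(omega_n, zeta)
        mag = loop_gain_mag(w_gbw, omega_n, zeta)
        print(f"zeta={zeta:5.3f}  w_gbw/w_n={w_gbw:6.3f}  |LG|={mag:.6f}")
```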

To derive the phase margin of the loop, the phase of the loop gain will be calculated at the unity-gain frequency calculated above.

$$PM = 180 + \left[ -180 + \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_z} \right) \right]$$

Since  $\omega_{GBW} = \omega_n \cdot \sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}$ 

$$PM = \tan^{-1} \left( \frac{\omega_n \sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}}{\omega_z} \right)$$
$$PM = \tan^{-1} \left( 2\zeta \cdot \sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}} \right)$$

The previous equation clearly indicates that the phase margin and the damping factor of the PLL are directly related. For a given phase margin, the damping factor can be calculated using the equation above. As a rough estimate, the phase margin in degrees can be approximated as  $100\zeta$  for  $\zeta < 0.65$ . Since  $\zeta$  depends on the location of the zero, its value can be set through the resistor, which does not affect the natural frequency  $\omega_n$ . On the other hand, since  $\omega_{GBW} = \omega_n \sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}$ , the unity-gain frequency of the PLL should be set by setting the natural frequency, which corresponds to the DC gain of the system. In other words, changing the value of the resistor changes the phase plot of the loop gain such that the phase at the targeted unity-gain frequency corresponds to the required *PM*. On the other hand, the loop gain factor,  $\omega_n^2$ , shifts the magnitude plot up or down such that the gain at the targeted unity-gain frequency equals 0dB.
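The relation between *PM* and $\zeta$, and the quality of the $100\zeta$ estimate, can be tabulated with a short Python sketch. The function names are illustrative, and the bisection-based inverse is an added convenience that relies only on *PM* increasing monotonically with $\zeta$; it is not part of the original derivation.

```python
import math


def phase_margin_deg(zeta):
    """PM = atan(2*zeta*sqrt(2*zeta^2 + sqrt(4*zeta^4 + 1))), in degrees."""
    x = math.sqrt(2 * zeta ** 2 + math.sqrt(4 * zeta ** 4 + 1))
    return math.degrees(math.atan(2 * zeta * x))


def zeta_for_phase_margin(pm_deg, lo=0.0, hi=10.0, iters=60):
    """Invert phase_margin_deg by bisection (valid for PM below about 89 degrees)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phase_margin_deg(mid) < pm_deg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for zeta in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
        print(f"zeta={zeta:.1f}  PM={phase_margin_deg(zeta):6.2f} deg  "
              f"100*zeta={100 * zeta:5.1f}")
    print("zeta for PM = 60 deg:", zeta_for_phase_margin(60.0))
```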

One important note here is that the unity-gain frequency of the loop transfer function does not correspond to the -3dB frequency of the closed-loop one, due to the zero in the system. The -3dB frequency of the PLL can be calculated as follows:

$$C.L.T.F. = N. \frac{\left(1 + \frac{2\zeta}{\omega_n}s\right)}{\frac{s^2}{\omega_n^2} + \frac{2\zeta}{\omega_n}s + 1},$$

$$\therefore \frac{1}{2} = \frac{\left(1 + 4\zeta^2 \frac{\omega_{3dB}^2}{\omega_n^2}\right)}{\left(1 - \frac{\omega_{3dB}^2}{\omega_n^2}\right)^2 + 4\zeta^2 \frac{\omega_{3dB}^2}{\omega_n^2}}$$
$$\therefore \omega_{3dB} = \omega_n \sqrt{1 + 2\zeta^2 + \sqrt{4\zeta^4 + 4\zeta^2 + 2}}$$
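A minimal numerical check of the $-3$dB expression is sketched below: it evaluates the squared magnitude of the closed-loop transfer function, normalized to $N^2$, at the computed $\omega_{3dB}$ and confirms that it equals one half. Function names are hypothetical.

```python
import math


def omega_3db(omega_n, zeta):
    """omega_3dB = omega_n * sqrt(1 + 2*zeta^2 + sqrt(4*zeta^4 + 4*zeta^2 + 2))."""
    return omega_n * math.sqrt(
        1 + 2 * zeta ** 2 + math.sqrt(4 * zeta ** 4 + 4 * zeta ** 2 + 2)
    )


def closed_loop_mag_sq(omega, omega_n, zeta):
    """|C.L.T.F.|^2 normalized to N^2 for the second-order loop with a zero."""
    x = (omega / omega_n) ** 2
    return (1 + 4 * zeta ** 2 * x) / ((1 - x) ** 2 + 4 * zeta ** 2 * x)


if __name__ == "__main__":
    for zeta in (0.3, 0.5, 0.707, 1.0):
        w3 = omega_3db(1.0, zeta)
        print(f"zeta={zeta:5.3f}  w_3dB/w_n={w3:6.3f}  "
              f"|H|^2={closed_loop_mag_sq(w3, 1.0, zeta):.6f}")
```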

One remaining problem in the CP-PLL of Fig. 4.4 is the ripple on the control line of the VCO. The instantaneous current pulse of the CP injected into the RC filter induces an instantaneous voltage drop across the resistor, leading to voltage ripples on the control line. In order to alleviate this effect, an additional smoothing capacitor,  $C_2$ , is added in parallel with the RC network as shown in Fig. 4.5. This capacitor smooths the ripples on the control line by suppressing their fundamental component, which is located at  $f_{ref}$ . To maintain the stability of the loop, the additional pole, due to  $C_2$ , should be located at a frequency higher than the unity-gain frequency of the PLL.

The new loop gain is given by:

$$Loop \ Gain = \frac{K_{PD} I_{CP} K_{VCO}}{N(C_1 + C_2)} \cdot \frac{1}{s^2} \cdot \frac{(1 + s/\omega_z)}{(1 + s/\omega_p)}$$
  
where  $\omega_z = \frac{1}{RC_1}$  and  $\omega_p = \frac{1}{R \cdot (C_1 \parallel C_2)} = \frac{1}{R} \cdot \frac{C_1 + C_2}{C_1 \cdot C_2}$

Consequently, the natural frequency of the new system is given by:

$$\omega_n = \sqrt{\frac{K_{PD} I_{CP} K_{VCO}}{N(C_1 + C_2)}}$$



Fig. 4.5 Block diagram of the CP-PLL with the smoothing capacitor  $C_2$ .

By introducing  $C_2$ , the phase of the loop gain at very low and very high frequencies will be  $-180^{\circ}$ , while in between it will have a hump due to the effect of the zero. The peak value of the hump depends on the distance between the zero and the pole. In order to have a larger peak, which corresponds to a better phase margin, the pole should be pushed to a high frequency. On the other hand, ripple suppression favors a low-frequency pole. The phase margin is given by:

$$PM = 180 + \left[ -180 + \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_z} \right) - \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_p} \right) \right]$$
$$PM = \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_z} \right) - \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_p} \right)$$

In order to maximize the phase margin, the unity-gain frequency,  $\omega_{GBW}$ , should be set at the peak of the *PM* equation. The frequency of the peak can be obtained by differentiating the *PM* equation with respect to  $\omega_{GBW}$ .

$$\frac{d(PM)}{d\omega_{GBW}} = \frac{1}{1 + \left(\frac{\omega_{GBW}}{\omega_z}\right)^2} \cdot \frac{1}{\omega_z} - \frac{1}{1 + \left(\frac{\omega_{GBW}}{\omega_p}\right)^2} \cdot \frac{1}{\omega_p} = 0$$
$$\therefore \omega_{GBW} = \sqrt{\omega_z \omega_p}$$

In other words, the optimum design for the PLL is to have the unity-gain frequency to be the geometrical mean of the pole and zero frequencies. The phase margin in that case will be given by:

$$PM = \tan^{-1}\left(\sqrt{\frac{\omega_p}{\omega_z}}\right) - \tan^{-1}\left(\sqrt{\frac{\omega_z}{\omega_p}}\right)$$
$$PM = \tan^{-1}\left(X\right) - \tan^{-1}\left(\frac{1}{X}\right), X = \sqrt{\frac{\omega_p}{\omega_z}}$$

The above equation indicates that in the optimally designed PLL, in which  $\omega_{GBW} = \sqrt{\omega_z \omega_p}$ , the phase margin will solely depend on the ratio between the pole and the zero frequencies. Consequently, for a given *PM*, the ratio between the zero and the pole frequencies can be calculated by incorporating the trigonometric identity:

$$\tan(a\pm b) = \frac{\tan(a)\pm\tan(b)}{1\mp\tan(a)\tan(b)},$$
  
$$\therefore \tan(PM) = \frac{1}{2}\left(X - \frac{1}{X}\right)$$
  
$$\therefore X = \sqrt{\frac{\omega_p}{\omega_z}} = \tan(PM) + \sec(PM)$$
  
$$\therefore \omega_p = \omega_z \cdot (\tan(PM) + \sec(PM))^2$$

Since the unity-gain frequency of the PLL,  $\omega_{GBW}$ , is given by the geometrical mean of the zero and the pole frequencies,  $\omega_p$  and  $\omega_z$  are given by:

$$\omega_{p} = \omega_{GBW} \cdot \left( \tan \left( PM \right) + \sec \left( PM \right) \right)$$
$$\omega_{z} = \frac{\omega_{GBW}}{\left( \tan \left( PM \right) + \sec \left( PM \right) \right)}$$

Consequently, for given PLL specifications, *PM* and  $\omega_{GBW}$ , the previous equations can be used to optimally locate the zero and the pole. Finally, the natural frequency, or the gain of the loop, can be obtained from the fact that the gain is 0dB at the unity-gain frequency.

$$Loop \, Gain = 1 = \left| \frac{K_{PD} I_{CP} K_{VCO}}{N(C_1 + C_2)} \cdot \frac{1}{s^2} \cdot \frac{(1 + s/\omega_z)}{(1 + s/\omega_p)} \right|_{\omega = \omega_{GBW}}$$
$$1 = \frac{\omega_n^2}{\omega_{GBW}^2} \cdot \sqrt{\frac{(1 + \omega_{GBW}^2/\omega_z^2)}{(1 + \omega_{GBW}^2/\omega_p^2)}} = \frac{\omega_n^2}{\omega_{GBW}^2} \cdot \sqrt{\frac{\omega_p}{\omega_z}}$$

$$\therefore \omega_n = \frac{\omega_{GBW}}{\sqrt{\left(\tan\left(PM\right) + \sec\left(PM\right)\right)}}$$
  
where  $\omega_n = \sqrt{\frac{K_{PD} I_{CP} K_{VCO}}{N(C_1 + C_2)}}$ 

Fig. 4.6 shows a Bode plot for the PLL. From the previous derivation and Fig. 4.6 we can conclude that  $\omega_{GBW}$  is the geometrical mean of  $\omega_z$  and  $\omega_p$  while  $\omega_n$  is the geometrical mean of  $\omega_z$  and  $\omega_{GBW}$ . In addition, the proportionality factor between the different frequencies is a function of the phase margin  $(\tan(PM) + \sec(PM))$ . Finally, the ratio between the reference frequency,  $f_{ref}$ , and  $\omega_p$  controls the spur suppression which is discussed in the next section.

One important point to mention here is that all design equations derived before can be intuitively obtained from Fig. 4.6. For example, since the phase due to the zero and the pole changes by  $\pm 45^{\circ}/dec$  and it is required to have the unity-gain frequency at the peak of the total phase plot, it is expected that the peak will be at the midpoint of  $\omega_z$  and  $\omega_p$  on the log scale. In other words,

$$\log(\omega_{GBW}) = \frac{1}{2} \left( \log(\omega_z) + \log(\omega_p) \right) = \frac{1}{2} \log(\omega_p \omega_z) = \log(\sqrt{\omega_p \omega_z})$$
  
$$\therefore \omega_{GBW} = \sqrt{\omega_p \omega_z}$$
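The geometric-mean result can also be confirmed by brute force. The following sketch scans log-spaced frequencies between the zero and the pole and reports where the phase margin peaks; the function names and the example zero and pole values are arbitrary assumptions for illustration.

```python
import math


def phase_margin_deg(omega, omega_z, omega_p):
    """PM = atan(omega/omega_z) - atan(omega/omega_p), in degrees."""
    return math.degrees(math.atan(omega / omega_z) - math.atan(omega / omega_p))


def peak_phase_margin(omega_z, omega_p, points=1001):
    """Scan log-spaced frequencies in [omega_z, omega_p]; return (omega, PM) at the peak."""
    ratio = omega_p / omega_z
    grid = [omega_z * ratio ** (k / (points - 1)) for k in range(points)]
    best = max(grid, key=lambda w: phase_margin_deg(w, omega_z, omega_p))
    return best, phase_margin_deg(best, omega_z, omega_p)


if __name__ == "__main__":
    w_z, w_p = 1.0, 16.0  # normalized example values
    w_peak, pm_peak = peak_phase_margin(w_z, w_p)
    print("peak found at      :", w_peak)
    print("sqrt(w_z * w_p)    :", math.sqrt(w_z * w_p))
    print("PM at the peak, deg:", pm_peak)
```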

For the natural frequency, the slope of the magnitude plot is -40dB/dec for the triangle that is constructed of  $\omega_z$  and  $\omega_n$  while it is -20dB/dec for the triangle that is constructed of  $\omega_z$  and  $\omega_{GBW}$ . Consequently, the natural frequency is expected to be in the mid point of  $\omega_z$  and  $\omega_{GBW}$  which, for a log scale, corresponds to their geometrical mean.

$$\omega_n = \sqrt{\omega_z \omega_{GBW}}$$

Finally, since the phase contributions of the pole and the zero are identical at  $\omega_{GBW}$ , the total phase contribution will be twice the contribution of each of them. In other words, the total phase lag will be  $90 + 2 \tan^{-1} \left( \frac{\omega_{GBW}}{\omega_p} \right)$  degrees. Thus the phase margin is given by

$$PM = 180 - \left(90 + 2\tan^{-1}\left(\frac{\omega_{GBW}}{\omega_p}\right)\right)$$
  
$$\therefore \omega_p = \omega_{GBW} \tan\left(\frac{90 + PM}{2}\right)$$

which is identical to the expression derived before, since

$$\tan\left(\frac{90+PM}{2}\right) = \tan\left(PM\right) + \sec\left(PM\right)$$



Fig. 4.6 Bode plot of a PLL.

In conclusion, for a given *PM* and  $\omega_{\rm GBW}$ , the design procedure is as follows:

1- Calculate the frequency of the loop filter zero:  $\omega_z = \frac{\omega_{GBW}}{\tan\left(\frac{PM+90}{2}\right)}$ .

2- Calculate the frequency of the loop filter pole:  $\omega_p = \omega_{GBW} \cdot \tan\left(\frac{PM + 90}{2}\right)$ .

3- Calculate the natural frequency of the PLL:  $\omega_n = \frac{\omega_{GBW}}{\sqrt{\tan\left(\frac{PM+90}{2}\right)}}$ .

4- If the system is approximated as a second-order one, the damping factor given by  $\zeta = \frac{\omega_n}{2\omega_z}$  can be approximated as  $\zeta = \frac{1}{2}\sqrt{\tan\left(\frac{PM+90}{2}\right)}$ .
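The four steps above can be sketched as a small Python routine. The component sizing at the end is not spelled out in the text; it is obtained here by inverting the expressions for $\omega_n$, $\omega_z$ and $\omega_p$ given earlier, and the function name, the dictionary keys and the example numbers are all illustrative assumptions ($K_{VCO}$ is taken in rad/s/V).

```python
import math


def design_cp_pll(pm_deg, f_gbw_hz, i_cp, k_vco, n_div):
    """Place the zero and pole and size R, C1, C2 for a target PM and f_GBW.

    i_cp in A, k_vco in rad/s/V (assumed unit), n_div is the divider ratio.
    """
    k_pd = 1.0 / (2.0 * math.pi)
    w_gbw = 2.0 * math.pi * f_gbw_hz
    x = math.tan(math.radians((pm_deg + 90.0) / 2.0))  # equals tan(PM) + sec(PM)

    w_z = w_gbw / x              # step 1: loop filter zero
    w_p = w_gbw * x              # step 2: loop filter pole
    w_n = w_gbw / math.sqrt(x)   # step 3: natural frequency
    zeta = 0.5 * math.sqrt(x)    # step 4: second-order approximation

    # Invert omega_n^2 = K_PD*I_CP*K_VCO / (N*(C1+C2)) for the total capacitance.
    c_total = k_pd * i_cp * k_vco / (n_div * w_n ** 2)
    # omega_p / omega_z = (C1 + C2) / C2, and omega_z = 1 / (R*C1).
    c2 = c_total * w_z / w_p
    c1 = c_total - c2
    r = 1.0 / (w_z * c1)
    return {"w_gbw": w_gbw, "w_z": w_z, "w_p": w_p, "w_n": w_n,
            "zeta": zeta, "c1": c1, "c2": c2, "r": r}


if __name__ == "__main__":
    # Hypothetical specification: PM = 60 degrees, f_GBW = 100 kHz.
    d = design_cp_pll(60.0, 100e3, 100e-6, 2 * math.pi * 100e6, 100)
    for key, value in d.items():
        print(f"{key:6s} = {value:.4e}")
```

Substituting the returned values back into the phase-margin expression reproduces the requested *PM*, which is a convenient sanity check on any hand calculation.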

### **4.3 PLL Reference Spurs**

As mentioned before, the VCO in a PLL is expected to be controlled through a DC signal that corresponds to the phase error between the feedback signal and the reference one. Due to the non-idealities in the loop, the voltage controlling the VCO suffers from ripples at the reference frequency, which translate to the output as spurs around the VCO frequency. There are two main sources of output spurs. The first is the non-idealities in the PLL blocks, especially the charge pump. The second is the reference signal leakage through the substrate and the supply lines to the VCO. Although the reference signal leakage can be alleviated by careful layout and guard rings, the same techniques are of little benefit against the CP non-idealities. Fig. 4.7 shows a simple transistor-level implementation of the charge pump.

Ideally, the input to the charge pump in the locked state should be two pulses of equal width. The two pulses, UP and DN, turn on transistors  $M_3$  and  $M_4$ , and the current flows from  $M_2$  to  $M_5$  without pumping any current into the output. Consequently, the charge pump output voltage, the VCO control line, will not change. Using the conservation of charge:

$$I_{UP}\Delta t_{UP} = I_{DN}\Delta t_{DN}$$

where  $\Delta t_{UP}$  and  $\Delta t_{DN}$  are the widths of the UP and DN pulses, respectively.



Fig. 4.7 Transistor level implementation of the charge pump.

Due to the mismatches in the charge pump, the net output current of the charge pump will not be zero, resulting in a change in the control line voltage. To compensate for this mismatch, the loop stabilizes at lock such that the pulse widths of the UP and DN signals offset the current mismatch to maintain a zero average output current. Fig. 4.8 shows a timing diagram of the UP and DN currents and the current ripples injected into the loop filter. As mentioned before, the mismatch in the currents is compensated by changing the pulse widths such that the average of the output current is zero. In other words, the DC component, the area under the total current curve, is zero. Although this mechanism eliminates the DC current injected into the filter, it induces a disturbance on the control line at  $f_{ref}$  and its harmonics. Such ripples translate to the output as reference spurs. Since the fundamental component is the one that has the highest power, the higher-order spurs are usually not important. In addition, since the PLL is usually used in communication systems and transceivers, the spectrum is expected to be attenuated more at the far spurs than at the fundamental one.



Fig. 4.8 Timing diagram of the current mismatches in CP-PLL.

There are many sources of non-idealities in the CP, but they can be classified into two categories: non-idealities attributed to the current sources and non-idealities attributed to the switches. The main contributor to the output spur is the mismatch between the NMOS and the PMOS current sources,  $M_2$  and  $M_5$  in Fig. 4.7. Since the output voltage of the charge pump depends on the channel selection, the  $V_{DS}$  of the two current sources will also depend on the channel selection, leading to a channel-selection-dependent current mismatch. In other words, output-resistance mismatch between the current sources leads to a mismatch in their currents. In addition, when the switches are turned off, the parasitic capacitors at the drain terminals of the current sources are connected to VDD and GND. Once the switches turn on, the two capacitors share charge with the loop filter capacitor, leading to an instantaneous disturbance of the output voltage. For the switches themselves, the mismatch between their ON resistances, clock feed-through and charge injection induce ripples on the VCO control line. Since the spur starts as a current pulse, the following analysis first translates it to a voltage signal,  $V_{cl}$ , on the control line, then translates the voltage signal to output spurs,  $A_{sp}$ , and finally addresses the different factors that affect the spur suppression in PLLs. Denoting the fundamental component of the current pulse shown in Fig. 4.8 by  $i_{CP}$ , the control line voltage will be given by:

$$\begin{aligned} V_{cl} &= i_{CP} \left| \frac{1 + sC_1R}{s^2 C_1 C_2 R + s(C_1 + C_2)} \right|_{s = \omega_{ref}} \\ &= i_{CP} \left| \frac{1}{s(C_1 + C_2)} \frac{1 + s/\omega_z}{1 + s/\omega_p} \right| \\ &= i_{CP} \frac{1}{\omega_{ref} (C_1 + C_2)} \sqrt{\frac{1 + (\omega_{ref} / \omega_z)^2}{1 + (\omega_{ref} / \omega_p)^2}} \end{aligned}$$

Since  $(\omega_{ref} / \omega_p)^2 >> 1$  and  $C_2 << C_1$  then

$$\begin{split} V_{cl} &\approx i_{CP} \, \frac{1}{\omega_{ref} \left(C_1 + C_2\right)} \frac{\omega_p}{\omega_z} \\ &\approx i_{CP} \, \frac{1}{\omega_{ref} \left(C_1 + C_2\right)} \frac{\omega_p}{\frac{1}{RC_1}} \\ &\approx i_{CP} R. \frac{\omega_p}{\omega_{ref}} \end{split}$$

Note that this approximate result can be viewed as if the current pulse flows into the  $RC_1$  branch, creating a voltage drop across the resistor given by  $i_{CP}R$ . Due to the filtering characteristics of the loop filter, shown in Fig. 4.9, this voltage disturbance is attenuated by a factor of  $\frac{\omega_p}{\omega_{ref}}$ .
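To gauge how good the approximation $V_{cl} \approx i_{CP} R \, \omega_p / \omega_{ref}$ is, the sketch below evaluates the exact loop-filter impedance at $\omega_{ref}$ with complex arithmetic and compares it with the approximate form. The function name and the component values are hypothetical and were chosen only so that $(\omega_{ref}/\omega_p)^2 \gg 1$ and $C_2 \ll C_1$ hold.

```python
import math


def control_line_ripple(i_cp_fund, r, c1, c2, f_ref_hz):
    """Return (exact, approximate) amplitude of V_cl at the reference frequency.

    i_cp_fund: fundamental component of the charge-pump current pulse (A).
    """
    w_ref = 2.0 * math.pi * f_ref_hz
    s = 1j * w_ref
    # Exact trans-impedance of R in series with C1, in parallel with C2.
    z = (1 + s * c1 * r) / (s ** 2 * c1 * c2 * r + s * (c1 + c2))
    w_p = (c1 + c2) / (r * c1 * c2)
    exact = i_cp_fund * abs(z)
    approx = i_cp_fund * r * w_p / w_ref
    return exact, approx


if __name__ == "__main__":
    # Hypothetical values: 1 uA fundamental, 10 kohm, 100 pF, 5 pF, 40 MHz reference.
    exact, approx = control_line_ripple(1e-6, 10e3, 100e-12, 5e-12, 40e6)
    print(f"exact  V_cl = {exact * 1e6:.1f} uV")
    print(f"approx V_cl = {approx * 1e6:.1f} uV")
    print(f"relative error = {abs(approx - exact) / exact * 100:.1f} %")
```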



Fig. 4.9 Filtering characteristics of the loop filter.

Applying the control line voltage,  $V_{cl}$ , to the oscillator:

$$\begin{aligned} V_{out} &= A_{VCO} \cos\left(\omega_o t + K_{vco} \int_0^t V_{cl} \cos\left(\omega_{ref} t\right) dt\right) \\ &= A_{VCO} \cos\left(\omega_o t + \frac{V_{cl} K_{vco}}{\omega_{ref}} \sin(\omega_{ref} t)\right) \\ &\approx A_{VCO} \cos(\omega_o t) - A_{VCO} \frac{V_{cl} K_{vco}}{\omega_{ref}} \sin(\omega_{ref} t) \sin(\omega_o t) \\ &\approx A_{VCO} \cos(\omega_o t) - A_{VCO} \frac{V_{cl} K_{vco}}{2\omega_{ref}} \cos\left((\omega_o - \omega_{ref}) t\right) + A_{VCO} \frac{V_{cl} K_{vco}}{2\omega_{ref}} \cos\left((\omega_o + \omega_{ref}) t\right) \end{aligned}$$

Hence, the control line voltage disturbance is translated as output spur with a gain of

$$A_{SP\_VCO} = A_{VCO} \frac{K_{vco}}{2\omega_{ref}}$$

Consequently, the fundamental component of the current mismatch is translated to the VCO output as:

$$A_{SP\_total} = A_{VCO}.i_{CP}R.\frac{K_{vco}}{2\omega_{ref}}\frac{\omega_p}{\omega_{ref}}$$
$$= A_{VCO}.i_{CP}R.\frac{K_{vco}}{2\omega_{ref}}.\frac{\omega_{GBW}}{\omega_{ref}}.\frac{\omega_p}{\omega_{GBW}}$$

This in dB will be given by:

$$A_{SP\_total}\Big|_{dBc} = 20\log(i_{CP}R) + 20\log\left(\frac{\omega_p}{\omega_{GBW}}\right) + 20\log\left(\frac{K_{vco}}{2\omega_{ref}}\right) + 20\log\left(\frac{\omega_{GBW}}{\omega_{ref}}\right)$$

From the above equation we can notice that the main factors that affect spur suppression are:

1- The ratio between the unity-gain bandwidth of the PLL  $\omega_{GBW}$  and the reference frequency  $\omega_{ref}$ . The smaller the ratio, the better the suppression of the spurs.

2- The ratio between the VCO gain,  $K_{VCO}$ , and the reference frequency  $\omega_{ref}$ . The smaller the ratio, the better the suppression of the spurs.

3- The ratio between the pole frequency  $\omega_p$  and the unity-gain bandwidth of the PLL  $\omega_{GBW}$ . As derived before, this ratio, for an optimally designed PLL, is solely controlled by the phase margin of the loop.

$$\frac{\omega_p}{\omega_{GBW}} = \tan\left(\frac{PM + 90}{2}\right)$$

4- The value of the loop filter resistor, which is inversely proportional to the loop-filter capacitors  $C_1$  and  $C_2$  for fixed filter characteristics.

The above equation is very important when comparing different spur reduction techniques. In most low-reference-spur PLLs reported in the literature, small  $\frac{K_{VCO}}{\omega_{ref}}$  [50-52] and/or small  $\frac{\omega_{GBW}}{\omega_{ref}}$  [53-54] ratios are used for better spur suppression. However, decreasing these ratios degrades the settling time of the PLL, increases the silicon area and/or reduces the tuning range. In order to have a fair comparison of the different spur reduction techniques, the different PLLs should be normalized with respect to these two ratios,  $\frac{K_{VCO}}{\omega_{ref}}$  and  $\frac{\omega_{GBW}}{\omega_{ref}}$ . In other words, the following FOM can be used:

$$FOM = A_{Spurs}\Big|_{dBc} - 20\log\left(\frac{K_{vco}}{2\omega_{ref}}\right) - 20\log\left(\frac{\omega_{GBW}}{\omega_{ref}}\right) - 20\log\left(\frac{\omega_{p}}{\omega_{GBW}}\right)$$

The smaller the FOM, the better the performance. Note that this figure of merit corresponds to the voltage ripples on the control line,  $20\log(i_{CP}R)$ .
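The spur-level expression and the FOM can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the function names and the numerical values are hypothetical, $K_{VCO}$ and all frequencies are in rad/s based units, and $i_{CP}$ denotes the fundamental component of the charge-pump current ripple.

```python
import math


def db20(x):
    """20*log10(x)."""
    return 20.0 * math.log10(x)


def reference_spur_dbc(i_cp_fund, r, k_vco, w_ref, w_gbw, w_p):
    """Reference spur level relative to the carrier, term by term as in the text."""
    return (db20(i_cp_fund * r) + db20(w_p / w_gbw)
            + db20(k_vco / (2.0 * w_ref)) + db20(w_gbw / w_ref))


def spur_fom(spur_dbc, k_vco, w_ref, w_gbw, w_p):
    """FOM: spur level with the three normalizing ratios removed."""
    return (spur_dbc - db20(k_vco / (2.0 * w_ref))
            - db20(w_gbw / w_ref) - db20(w_p / w_gbw))


if __name__ == "__main__":
    two_pi = 2.0 * math.pi
    # Hypothetical loop: 1 uA ripple, 10 kohm, K_VCO = 2*pi*100 Mrad/s/V,
    # f_ref = 10 MHz, f_GBW = 100 kHz, f_p = 400 kHz.
    args = dict(k_vco=two_pi * 100e6, w_ref=two_pi * 10e6,
                w_gbw=two_pi * 100e3, w_p=two_pi * 400e3)
    spur = reference_spur_dbc(1e-6, 10e3, **args)
    print(f"spur level = {spur:.2f} dBc")
    print(f"FOM        = {spur_fom(spur, **args):.2f} dB")
```

As the text notes, the FOM reduces to $20\log(i_{CP}R)$, so two PLLs with very different bandwidths or VCO gains can be compared on the control-line ripple alone.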

### **4.4 Low Spurs PLLs**

Many techniques have been developed in order to minimize the spur effects in PLLs. Some of these techniques employ gear-shifting mechanisms that change the loop dynamics before and after locking. On the other hand, some techniques minimize the spurs by employing different methods of matching the CP currents. In this section, a quick survey of the different techniques, along with their advantages and disadvantages, is presented.

#### 4.4.1 Gear-Shifting PLL

This type of PLL [55] achieves both fast settling and low spurs by changing the unity-gain frequency of the loop after locking. Basically, a lock detector detects whether the loop has gone out of lock and then programs the loop filter passives to increase the unity-gain bandwidth of the PLL. Once the loop is locked, the lock detector re-programs the passives such that the bandwidth is smaller and, as a consequence, the spurs are suppressed. Although a fast settling time is achieved along with a low spur level, the VCO noise suppression is degraded since the loop bandwidth is smaller. In addition, the additional spur suppression comes at the expense of silicon area, which is usually dominated by the passives.

#### 4.4.2 Dead-Zone Controlled PLL

This technique [56] reduces the spur by reducing the dead-zone delay of the PFD, which is directly related to  $i_{CP}$ . Fig. 4.10 shows the block diagram of this PLL. The main idea of this technique is to use a variable-delay element in the reset path of the PFD and program it such that the PFD output pulse width is minimized. As shown in Fig. 4.10, the PFD and CP are replicated to avoid interfering with the loop dynamics. The two inputs of the additional PFD are connected to the reference input to mimic the lock condition, while its DN output controls an NMOS transistor that discharges the  $V_{delay}$  node. Since this node is charged through  $R_d$ , the new loop, consisting of the replica PFD, the CP and the  $R_dC_d$  network, will stabilize such that the charge injected from the supply into  $C_d$  through  $R_d$  equals the charge withdrawn by the transistor and its tail current source.

Assuming that the time constant  $\tau_d = R_d C_d$  is much larger than the reference period, the pulse width can be approximately derived as

$$Q = I.\Delta t = \frac{V_{dd} - V_d}{R_d} . T_{ref} = I_{tail}.\Delta t_{PFD}$$
$$\therefore \Delta t_{PFD} = \frac{1}{I_{tail}} \frac{(V_{dd} - V_d)}{R_d} . T_{ref}$$

Although this technique can control the dead-zone pulse width, and hence the spurs, it suffers from the dependence on the absolute value of the resistor, which can vary significantly with the process, leading to the risk of running into a dead-zone condition. In other words, the resistor value should be chosen large enough such that the process variations do not harm the PLL operation, which is the same problem that exists in traditional PLLs in which the delay of the PFD feedback should be large enough to avoid the dead zone.
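The sensitivity to the absolute resistor value can be illustrated numerically. The sketch below evaluates the pulse-width expression for a nominal $R_d$ and for a $\pm 20\%$ spread; it treats $V_d$ as fixed, which is a simplification, and the function name, the spread and all numerical values are hypothetical.

```python
def dead_zone_pulse_width(v_dd, v_d, r_d, i_tail, t_ref):
    """Delta_t_PFD = (V_dd - V_d) / (R_d * I_tail) * T_ref."""
    return (v_dd - v_d) / (r_d * i_tail) * t_ref


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    nominal = dict(v_dd=1.2, v_d=0.6, i_tail=100e-6, t_ref=100e-9)
    r_nom = 1e6
    for scale in (0.8, 1.0, 1.2):  # assumed +/-20 % process spread of R_d
        dt = dead_zone_pulse_width(r_d=scale * r_nom, **nominal)
        print(f"R_d = {scale * r_nom / 1e6:.1f} Mohm  ->  pulse width = {dt * 1e9:.3f} ns")
```

Under these assumptions the pulse width scales inversely with $R_d$, so a resistor that comes out larger than nominal shortens the pulse and moves the PFD toward the dead zone.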



Fig. 4.10 Block diagram of the dead-zone controlled PLL.

#### 4.4.3 Variable-K<sub>VCO</sub> PLL

Similar to the gear shifting technique, this technique [57] depends on changing the VCO gain and the charge pump current after locking. Fig. 4.11 shows the block diagram of the system.



Fig. 4.11 Block diagram of the variable-K<sub>VCO</sub> PLL.

Two charge pumps are used whose currents are dumped into the loop filter capacitors  $C_1$  and  $C_2$ . In addition, the VCO is controlled through two terminals,  $V_{c1}$  and  $V_{c2}$ . Upon locking, the lock detector disconnects one path, the one that controls  $C_1$ , and keeps the other path. In other words, the total effective charge pump current is reduced as well as the VCO gain. Decreasing the charge pump current directly decreases the spur level. In addition, decreasing the VCO gain decreases the gain with which the spurs are translated to the output. Moreover, it decreases the overall unity-gain bandwidth, which directly attenuates the spurs. However, reducing the bandwidth degrades the suppression of the VCO noise.

# 4.4.4 Multi-Path PFD-CP PLL

Different from the other techniques that try to decrease the PLL unity-gain bandwidth to attenuate the spurs, this technique [58] instead increases the spur frequency to obtain more attenuation from the loop filter. Fig. 4.12 shows a block diagram of this technique. The main idea is to repeat the spur at a higher rate, *N* times higher, by generating time-shifted versions of it and summing all of them at the loop filter node. This is performed using different delay lines, each with a delay that is an integer multiple of the smallest one,  $t_d = \frac{T_{ref}}{N}$ . Consequently, the outputs of the charge pumps, as well as the spurs, will be spaced in time by  $\frac{T_{ref}}{N}$ . In other words, the spur frequency is boosted by a factor *N*. According to the FOM derived above, the spurs will therefore improve by *40logN*. On the other hand, the spur amplitude increases by a factor of *N*, i.e. by *20logN*. Thus, the total improvement in spur reduction will be *20logN*.



Fig. 4.12 Block diagram of the multi-path PFD-CP PLL [58].
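The dB bookkeeping of the multi-path technique can be written out as a small sketch. The function name is illustrative, and the calculation assumes ideal delay and charge pump matching, as in the text.

```python
import math


def multipath_spur_improvement_db(n_paths):
    """Net spur improvement when the spur is repeated N times per reference period."""
    filter_gain_db = 40 * math.log10(n_paths)       # spur moved from f_ref to N*f_ref
    amplitude_penalty_db = 20 * math.log10(n_paths)  # spur component grows by N
    return filter_gain_db - amplitude_penalty_db


print(multipath_spur_improvement_db(4))
```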

The main drawback of such an architecture is its dependence on the accuracy of the time delays of the different paths and on the matching of the spurs of the different charge pumps. Any mismatch in the delays will end up with an increased spur level at  $f_{ref}$  rather than up-converting the spur frequency. In addition, if the different charge pumps are not perfectly matched, this technique will up-convert the common spurs of the different charge pumps but will leave the mismatched portions at  $f_{ref}$ . Consequently, the reference spurs will not be fully cancelled. On the contrary, the spurs may increase if the mismatch exceeds a certain limit.

# 4.4.5 Spur Suppression PLL Based on Sample-Reset Filter

This technique overcomes the spur problem by sampling the control line voltage after the charge pump is turned off and its output voltage settles. Fig. 4.13 shows one implementation of this technique [59]. The charge pump current is pumped into the capacitor  $C_1/2$  while  $S_1$  is off. Once the charge pump is turned off and the voltage across  $C_1/2$  has settled,  $S_1$  is turned on to update the loop filter. Consequently, the CP mismatches are isolated from the loop filter and the VCO control line, which significantly reduces the output spurs. The only sources of spurs in this technique are the clock feed-through and the charge injection of the signal controlling the switch  $S_1$ , both of which are signal dependent.



Fig. 4.13 Sample-based loop filter.

One technique to overcome this problem is to use a transmission gate instead of a single-transistor switch and to size the PMOS and NMOS transistors equally. Consequently, upon turning off the transmission gate, the electrons injected from the NMOS will equal the holes injected from the PMOS and electron-hole recombination occurs. However, this cancellation is perfect only if the input signal is around the mid-rail of the supplies. In addition, charge injection cancellation depends on having perfectly complementary clocks, which is hard to attain. Also, changing the input signal level will change the turn-off time of both transistors, leading to spurs at the output. Finally, an input that is far from the mid-rail will induce more holes/electrons in the PMOS/NMOS channel than electrons/holes in the NMOS/PMOS channel, ending up with a net charge injection. In conclusion, charge injection cancellation usually operates perfectly at one level of the input signal and degrades as the input voltage moves away from it.

Another approach that is used mainly to overcome the clock feed-through disturbance is the use of dummy switches that are controlled by the complementary clock of the main switch as shown in Fig. 4.14. Similar to the transmission gate, the performance of this technique depends on the perfection of the complementary clocks and their rise/fall times. Moreover, this technique requires the size of the dummy switch to be half that of the sampling switch which makes it harder to ensure a perfect complement for their driving clocks. Table 4.1 provides a comparison between the different techniques and the drawbacks of each one.



Fig. 4.14 Transmission gate with dummy switches.

| Technique                           | Characteristics                                                                                       |
|-------------------------------------|-------------------------------------------------------------------------------------------------------|
| Gear Shifting PLL                   | <ul><li>Fast settling.</li><li>Less suppression for VCO noise.</li></ul>                              |
| Dead-zone-controlled PLL            | <ul><li>Performance degrades with PVT variations.</li></ul>                                           |
| Variable <i>K<sub>VCO</sub></i> PLL | <ul><li>Fast settling.</li><li>Less suppression for VCO noise.</li></ul>                              |
| Multi-path PFD-CP PLL               | <ul><li>Very prone to process variations.</li></ul>                                                   |
| Sample-Reset filter PLL             | <ul><li>Good spur rejection.</li><li>Channel-selection-dependent spur suppression.</li></ul>          |

Table 4.1 Comparison between the different spur suppression techniques.

## **4.5 Spur-Frequency Boosting PLL**

As mentioned for the last two techniques in the previous section, the multi-path PFD-CP PLL suffers from the mismatches of the delay lines and of the different charge pumps, while the sampled loop filter approach suffers from the non-idealities of the switch, which cannot be compensated except at one level of the input voltage, making the performance channel-selection-dependent. In this section a novel technique is proposed to obtain the advantages of the two techniques while avoiding their disadvantages. In addition, the proposed technique can operate over a very wide frequency range without affecting the loop dynamics. In other words, the loop dynamics are independent of the divider ratio.

### 4.5.1 System Level Design

Fig. 4.15 shows the block diagram of the proposed spur-frequency boosting PLL (SFB-PLL). Similar to the conventional PLL, a PFD detects the phase error between the feedback signal and the reference input and applies the error to the charge pump to tune the VCO control voltage. However, the frequency of the PFD output is boosted by a boosting factor, *B*, before being applied to the charge pump. Consequently, the reference spurs theoretically vanish. The spur-frequency booster block consists of two sub-blocks: a time-to-voltage converter (TVC) and a voltage-to-time converter (VTC). The TVC converts the PFD output pulse widths into voltage samples, which are then converted back to pulse widths but at a higher frequency, *f*<sub>B</sub>, using a VTC. Fig. 4.16 shows the model of the PLL.



Fig. 4.15 Block diagram of the proposed SFB-PLL.

Compared to the conventional PLL, three additional gains should be considered. First, the gain of the TVC, expressed in Volt/second (V/s), which corresponds to the ratio of the change of its output voltage to the change of its input pulse width. Second, the gain of the VTC, expressed in second/Volt (s/V), which corresponds to the ratio of the change of its output pulse width to the change of its input voltage. Finally, the boosting factor, B(s), which corresponds to the rate at which the PFD output pulse is regenerated. The magnitude of B(s) is given by  $f_B/f_{ref}$ . Consequently, the loop transfer function is given by:

$$G(s)H(s) = \frac{K_{PD} I_{CP} K_{VCO}}{N(C_1 + C_2)} \frac{(1 + s/\omega_z)}{s^2(1 + s/\omega_p)} * K_{TVC}(s)K_{VTC}(s)B(s)$$

where G(s) is the feed-forward gain, H(s) is the feedback gain,  $K_{TVC}(s)$  is the TVC gain and  $K_{VTC}(s)$  is the VTC gain.

Two different options for providing  $f_B$  are available. The first option is to provide it through an independent source. The second option is to use feedback from the PLL's VCO through a divider. The main advantage that favours the second option is the wide operating range that it provides without affecting the loop dynamics. Assuming that the ratio between  $f_{VCO}$  and  $f_B$  is M, the loop transfer function will be given by:

$$G(s)H(s) = \frac{K_{PD}I_{CP}K_{VCO}K_{TVC}(s)K_{VTC}(s)}{N(C_1+C_2)}\frac{f_{VCO}/M}{f_{ref}}\frac{1}{s^2}\frac{(1+s/\omega_z)}{(1+s/\omega_p)}*phase(B(s))$$

Since  $f_{VCO} = N f_{ref}$ , then

$$G(s)H(s) = \frac{K_{PD}I_{CP}K_{VCO}K_{TVC}(s)K_{VTC}(s)}{M(C_1 + C_2)} \frac{1}{s^2} \frac{(1 + s/\omega_z)}{(1 + s/\omega_p)} * phase(B(s))$$

The previous equation indicates that the loop dynamics and the phase margin are independent of the divider ratio; thus, the PLL can operate at any operating frequency as long as the VCO can cover that range.
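The cancellation of the divider ratio can be checked numerically with a minimal sketch. It evaluates only the frequency-independent gain constant of the loop transfer function; all numerical values are hypothetical and the TVC and VTC gains are treated as constants.

```python
import math


def loop_gain_constant(k_pd, i_cp, k_vco, k_tvc, k_vtc, c_total, n, f_ref, m):
    """Gain constant of G(s)H(s) when f_B is derived from the VCO through a divide-by-M."""
    f_vco = n * f_ref        # lock condition
    f_b = f_vco / m          # boosting clock
    boost = f_b / f_ref      # |B(s)|
    return k_pd * i_cp * k_vco * k_tvc * k_vtc * boost / (n * c_total)


# Hypothetical loop parameters; only the divider ratio N is swept.
params = dict(k_pd=1 / (2 * math.pi), i_cp=50e-6, k_vco=2 * math.pi * 1e9,
              k_tvc=1e8, k_vtc=1e-8, c_total=100e-12, f_ref=10e6, m=8)
gains = [loop_gain_constant(n=n, **params) for n in (50, 100, 400)]
print(gains)
```

All three entries agree to within floating-point rounding, because the factor N introduced by the boosting clock cancels the 1/N of the feedback divider.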

From the spurs point of view, the new spurs will appear at  $f_B$  instead of  $f_{ref}$ . Consequently, the output spurs will be given by:

$$A_{SP\_total} = A_{VCO} i_{CP\_B} R \frac{K_{vco}}{2\omega_B} \frac{\omega_p}{\omega_B}$$



However, the fundamental component of the spur,  $i_{CP\_B}$, will increase compared to the traditional PLL. According to Fig. 4.8, the timing mismatch,  $\tau$, will not change when the input frequency of the CP changes from  $f_{ref}$  to  $f_B$ . However, the period will change from  $\frac{1}{f_{ref}}$  to  $\frac{1}{f_B}$; consequently, the fundamental component,  $i_{CP\_B}$, will be given by  $i_{CP\_conv} \frac{f_B}{f_{ref}}$, where  $i_{CP\_conv}$  is the fundamental component of the current mismatch in the case of the conventional PLL. Thus, the spurs at the output will be given by:

$$A_{SP\_total} = A_{VCO} \frac{f_B}{f_{ref}} i_{CP\_conv} R \frac{K_{vco}}{2\omega_B} \frac{\omega_p}{\omega_B}$$
$$A_{SP\_total} = A_{VCO} i_{CP\_conv} R \frac{K_{vco}}{2\omega_{ref}} \frac{\omega_p}{\omega_{ref}} \left(\frac{\omega_{ref}}{\omega_B}\right)$$

Compared to the conventional case, the new spurs at  $f_B$  will experience an additional attenuation of  $20\log\left(\frac{\omega_{ref}}{\omega_B}\right)$.
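As a numerical illustration of this result, the sketch below evaluates the spur expression once at $f_{ref}$ and once at $f_B$ with the mismatch component scaled by $f_B/f_{ref}$. The default parameter values are hypothetical; only the ratio of the two results matters.

```python
import math


def spur_amplitude(a_vco, i_cp, r, k_vco, w_p, w_spur):
    """Spur expression from the text: A_VCO * i_CP * R * K_VCO / (2 w) * (w_p / w)."""
    return a_vco * i_cp * r * k_vco / (2 * w_spur) * (w_p / w_spur)


def boosting_change_db(f_ref, f_b, a_vco=1.0, i_cp_conv=1e-6, r=10e3,
                       k_vco=2 * math.pi * 1e9, w_p=2 * math.pi * 4e6):
    """Spur level at f_B relative to the conventional spur at f_ref, in dB."""
    w_ref, w_b = 2 * math.pi * f_ref, 2 * math.pi * f_b
    conventional = spur_amplitude(a_vco, i_cp_conv, r, k_vco, w_p, w_ref)
    # The mismatch fundamental grows by f_B / f_ref when the CP is clocked at f_B.
    boosted = spur_amplitude(a_vco, i_cp_conv * f_b / f_ref, r, k_vco, w_p, w_b)
    return 20 * math.log10(boosted / conventional)


print(boosting_change_db(10e6, 80e6))  # equals 20*log10(1/8), about -18 dB
```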

One important point to remember here is that the aforementioned analysis considers the CP current mismatch as the main source of the output spurs. However, clock feed-through, charge injection and substrate leakage will affect the output spurs as well. Specifically, as  $f_B$  increases, clock feed-through will start dominating over the current mismatch and the output spurs level may increase. To overcome this problem, additional non-dominant poles can be added to the loop filter to further suppress the spurs. Since  $f_B > f_{ref}$ , the location of the additional poles can be chosen such that it does not affect the dynamics of the loop while achieving a significant suppression for the spurs at  $f_B$ .

### 4.5.2 Transistor Level Implementation

The spur-frequency boosting PLL consists of six main blocks: PFD, divider, VCO, CP, TVC and VTC. The main specification in the transistor level design is to minimize the leakage of the reference spurs from the different blocks to the charge pump in order to provide a reference-frequency-clean control voltage to the VCO. That includes using separate supplies for the blocks that manipulate the reference frequency (PFD, divider and TVC) and the other blocks (VTC, CP and VCO). In the following subsections the implementation of each block will be analyzed.

### 4.5.2.1 PFD

Fig. 4.17 shows the implementation of the PFD. It consists of standard logic gates: NAND, NOR and INV. In addition, voltage-controlled inverter cells are used to tune the PFD minimum pulse width. Fig. 4.18 shows post-layout simulations of the PFD minimum pulse width versus the control voltage. The PFD minimum pulse width ranges from 150 ps to 410 ps for  $V_c$  ranging from 0 V to 0.8 V.  $V_c$  is used to overcome any process variations that may shorten the pulse width to an extent that it is not able to maintain the charge pump in the correct region of operation.

### 4.5.2.2 Divider

The divider in a PLL is responsible for providing an output frequency that is lower than the input frequency by a factor N where N is a digital programmable input corresponding to the targeted channel. Fig. 4.19 shows a typical integer divider architecture. The divider consists of 3 blocks: prescaler, divide-by-P and divide-by-S. The prescaler is a programmable divider with one-bit control input,  $M_C$ , that sets the division ratio to M or M+1.



Fig. 4.17 Transistor level implementation of the phase-frequency detector.



Fig. 4.18 PFD minimum pulse width versus the control voltage.



Fig. 4.19 Typical integer-N divider architecture in PLLs.

Assuming that initially the *P* and *S* counters are reset and the prescaler is programmed to divide by M+1, after *S* counts at node *X*, the output of the *S* counter will change the programming bit of the prescaler such that it divides by *M*. After another (P-S) counts at node *X*, one count will be delivered to the output,  $f_{out}$ , and the counters will be reset to repeat the cycle. The output period,  $T_{out} = 1/f_{out}$ , is the sum of the time required for the first *S* counts at node *X*, which is given by  $S(M+1)T_{osc}$ , and the time required for the next (*P*-*S*) counts at node *X*, which is given by  $(P-S)MT_{osc}$ . Consequently, the total period is given by:

$$T_{out} = S(M+1)T_{osc} + (P-S)MT_{osc} = (MP+S)T_{osc}$$

$$\therefore f_{out} = \frac{f_{osc}}{(MP+S)}$$
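The counting sequence described above can be sketched behaviourally. The function below counts oscillator cycles over one output period by stepping through the *P* counts at node *X*; it is an illustrative model, not the implemented circuit, and the example values of *M*, *P* and *S* are arbitrary.

```python
def pulse_swallow_period(m, p, s):
    """Number of oscillator cycles in one output period of an M/(M+1) pulse-swallow divider."""
    if not 0 <= s <= p:
        raise ValueError("S must lie between 0 and P")
    input_cycles = 0
    for x_count in range(p):                      # P counts at node X per output cycle
        modulus = m + 1 if x_count < s else m     # prescaler divides by M+1 for the first S counts
        input_cycles += modulus
    return input_cycles


print(pulse_swallow_period(16, 10, 3))  # 16*10 + 3 = 163
```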

Since the input frequency to the *P* and *S* counters is relatively low, the main block that requires careful attention is the prescaler. Fig. 4.20 shows a block diagram of a prescaler with *M* equal to 2. When the control signal,  $M_C$ , is high, the NOR gate output will always be '0' and  $Q_1$  will always be '0'. Thus, the *OR* gate will be transparent and  $D_2$  will be equal to  $\overline{Q_2}$. Consequently, the prescaler will act as a divide-by-2. On the other hand, if  $M_C$  is low, the outputs,  $Q_1$  and  $Q_2$ , will be given by:

$$Q_1 = \overline{Q_2}, \quad Q_2 = Q_1 + \overline{Q_2}$$

which follows the truth table given in Table 4.2.



Fig. 4.20 Block diagram of a prescaler with M=2.

| $Q_{1\_old}$ | $Q_{2\_old}$ | $Q_{1\_new}$ | $Q_{2\_new}$ |
|--------|--------------|--------|--------------|
| 0      | 0            | 1      | 1            |
| 1      | 1            | 0      | 1            |
| 0      | 1            | 0      | 0            |

Table 4.2 Truth table of a 2/3 prescaler.

It is clear that the circuit cycles through three states only and operates as a divide-by-3. A prescaler with M=2 is usually the first block of higher-order prescalers. In order to extend the prescaler to a 4/5, 8/9 or 16/17 one, the main idea is to add one, two, or three divide-by-2 blocks at the output of the 2/3 prescaler and to use digital gates such that the control input of the 2/3 prescaler is normally equal to "1" except at one state of the counter, as shown in Fig. 4.21. If  $M_C$  is low, the 2/3 block will divide by 2 most of the time and will divide by 3 only once during the cycle. Consequently, the input will be divided by  $2^n + 1$  where *n* is the number of sub-dividers in the prescaler. On the other hand, if  $M_C$  is high, the prescaler will divide by  $2^n$ .



Fig. 4.21 Block diagram of a 16/17 prescaler.
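The state sequence of the 2/3 prescaler can be reproduced with a few lines of behavioural code. The gate equations used here, $D_1 = \overline{M_C + Q_2}$ and $D_2 = Q_1 + \overline{Q_2}$, are inferred from the description of Fig. 4.20 and Table 4.2 and should be read as an assumption about the schematic.

```python
def next_state(q1, q2, m_c):
    """One clock edge of the assumed 2/3 prescaler logic."""
    d1 = int(not (m_c or q2))    # NOR(M_C, Q2)
    d2 = int(q1 or not q2)       # OR(Q1, NOT Q2)
    return d1, d2


def division_ratio(m_c, start=(0, 0)):
    """Count clock edges until the state sequence returns to its starting state."""
    state, cycles = next_state(*start, m_c), 1
    while state != start:
        state = next_state(*state, m_c)
        cycles += 1
    return cycles


print(division_ratio(m_c=1), division_ratio(m_c=0))  # 2 3
```

With $M_C$ low the states follow 00, 11, 01 and back to 00, which matches the three rows of Table 4.2.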

Many techniques can be used to implement a divide-by-2. The main tradeoff is between the power consumption and the speed of the divider. Ultrahigh-speed dividers use injection locking and phase switching architectures, while high-frequency dividers usually utilize current-mode-logic (CML) dividers. Moderate-frequency dividers employ true-single-phase-clocking (TSPC) dividers, while low-speed ones employ static dividers. In the current design, the prescaler is implemented using TSPC cells while the *S* and *P* counters are implemented using static cells. Fig. 4.22 shows the transistor level implementation of a TSPC D-FF.



Fig. 4.22 Transistor level implementation of the TSPC D-FF.

### 4.5.2.3 Voltage-Controlled Oscillator

The voltage-controlled oscillator is the block that converts the voltage ripple to reference spurs around the carrier frequency. As mentioned before, the VCO is vulnerable to two sources of voltage ripple: one through the control line voltage and another through the supply lines. In order to check the immunity of the PLL to the reference spurs, a VCO with a low power-supply-rejection ratio (PSRR) should be used such that the low spurs measured at the output are ascribed to the proposed architecture rather than to a high PSRR of the VCO. In other words, if a low-PSRR VCO is employed and the reference spurs are low, they are expected to be much lower when using a high-PSRR one. To attain a low-PSRR VCO, a single-ended oscillator with minimum channel length, shown in Fig. 4.23, is used. Fig. 4.24 shows the VCO gain versus the input control voltage. The gain changes significantly at large control voltages since the supply voltage is 0.8 V. However, the PLL should maintain a good phase margin across the variations of the VCO gain. Fig. 4.25 shows the VCO gain versus the supply voltage. The VCO supply gain is around 7 GHz/V.



Fig. 4.23 Transistor level implementation of the VCO.

### 4.5.2.4 Charge Pump

The main problem in charge pump design is the matching between the UP (PMOS) current and its DN (NMOS) counterpart. To maintain the matching across a wide output voltage range, the drain voltages of the current mirror transistors should be equal. Fig. 4.26 shows the transistor level implementation of the charge pump. An NMOS-input differential pair is used to adjust the voltage at node *X* to follow the output voltage  $V_C$ . When  $V_C$  increases, the OpAmp input becomes more negative. Thus, its output goes down to increase the source-gate voltage of the PMOS to compensate the drop of the source-drain voltage. Consequently, the PMOS current will be, ideally, independent of the output voltage level.



Fig. 4.24 VCO gain versus the control voltage.

One important note is the polarity of the OpAmp. Since it is involved in two loops and one of them is positive, it is important to ensure that the negative loop, the one on the left in Fig. 4.26, is faster than the positive one. Since the positive loop is connected to the loop filter while the negative does not include any low frequency poles (compared to the loop filter), the system will be stable. In addition, since at steady state the UP switch will be open most of the time, the positive loop will be disconnected. However, the total phase margin of the two loops should be maintained at acceptable level, i.e.  $PM>45^{\circ}$ , to ensure stable transient performance. Since the negative loop constructs a two stage amplifier, the OpAmp and the PMOS current source, the capacitor  $C_c$  is used to ensure that the loop is stable.



Fig. 4.25 VCO gain versus the supply voltage.

One important consideration with the shown architecture is the start-up condition. Assuming that at start-up node X and  $V_C$  are at zero voltage, the NMOS input transistors of the OpAmp will be completely off and the OpAmp output will follow the ramp-up of  $V_{dd}$. Consequently, when the supply stabilizes at  $V_{dd}$ , the PMOS current sources will be completely off. While the PLL is trying to attain lock, the pulse width of the UP input will be larger than that of its DN counterpart. However, the UP current source will be off and the PLL will not start up. To overcome this problem, one of the following can be used:

1. Use a separate supply for the OpAmp that is lower than the CP supply such that the PMOS current source is not totally off.
2. Use an additional PMOS current source to provide an OpAmp-independent UP current. For matching purposes, this current source should be turned off once the PLL starts up.
3. Use a start-up circuit to set the voltages of nodes X and  $V_C$  to be within the input common-mode range of the OpAmp.



Fig. 4.26 Transistor level implementation of the CP.

### 4.5.2.5 Time-to-Voltage Converter

The TVC is a block that translates the pulse width of the input pulse to a voltage output. Since the TVC is the last block that manipulates  $f_{ref}$  before the VCO, the main specification for it is to provide a spur-free output voltage. Two TVCs handle the UP and DN pulses of the PFD. In this architecture, both pulses are active low and hence their circuit operation is identical. The circuit diagram of the two TVCs is shown in Fig. 4.27 along with a timing diagram for their operation. When the PFD (UP or DN) pulse goes low, transistors  $M_2$  are turned on to charge the capacitors  $C_1$ , producing a voltage proportional to the PFD pulse width. A positive edge detector,  $ED_1$ , detects the positive edge of the PFD pulse, which indicates the end of the charging phase, and generates an output pulse that turns on the transmission gates, TG, to allow charge sharing between  $C_1$  and  $C_2$ . Charge sharing changes the voltage on  $C_2$  by an amount proportional to the change in the PFD pulse width.  $ED_2$  detects the end of the charge-sharing phase and generates an output pulse to discharge  $C_1$  to a voltage  $V_b$  through  $M_3$ . Since the voltage generated by the TVC is proportional to the PFD pulse width, the output voltages of both TVCs are always around  $V_b$  at steady state, independent of the channel selection. In addition, the transmission gates  $TG_{UP}$  and  $TG_{DN}$  are triggered using the same edge detector,  $ED_1$ , which guarantees equal feed-through and charge injection for the two TVCs and hence achieves better performance than the sample-reset approach in [51]. Although a fully differential architecture is used in [51], different voltage levels at the differential outputs induce mismatches in the injected and fed-through charges, leading to leakage of the reference frequency to the output.

Fig. 4.28 shows the block diagram of the negative edge detector. The basic concept of the edge detectors is to apply the input along with its inversion to an AND gate while delaying the inverted signal. Thus, the output will be normally low except for the time that the inverted version of the signal takes to trigger the AND gate. Consequently, the pulse width of the output signal is given by the delay of the inverted signal.
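The edge-detector concept can be illustrated with a sampled-time sketch in which the delay of the inverted path is an integer number of samples. This is a behavioural model under that assumption, not the gate-level circuit of Fig. 4.28. In the AND form described here the pulse follows a rising input edge; replacing the AND with a NOR would produce the pulse after a falling edge instead.

```python
def edge_detector(samples, delay):
    """AND of the input with its inverted copy delayed by `delay` samples."""
    out = []
    for k, x in enumerate(samples):
        # Before `delay` samples have elapsed, assume the line held its initial value.
        delayed = samples[k - delay] if k >= delay else samples[0]
        out.append(int(x and not delayed))
    return out


waveform = [0] * 5 + [1] * 10 + [0] * 5
pulse = edge_detector(waveform, delay=3)
print(pulse)
```

The output is high for exactly `delay` samples after the edge, in line with the statement that the output pulse width is set by the delay of the inverted signal.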



Fig. 4.27 Transistor level implementation of the TVCs.



Fig. 4.28 Block diagram of a negative edge detector.

Denoting the voltages across the capacitors  $C_1$  and  $C_2$  as  $V_1$  and  $V_2$  respectively, the system time domain equation will be:

$$V_{2}(nT) = \frac{C_{1}}{C_{1} + C_{2}} V_{1}(nT) + \frac{C_{2}}{C_{1} + C_{2}} V_{2}((n-1)T)$$

Thus, the transfer function in the Z-domain is given by:

$$\frac{V_2(z)}{V_1(z)} = \frac{\frac{C_1}{C_1 + C_2}}{1 - \frac{C_2}{C_1 + C_2}z^{-1}}$$

It is clear that the additional pole is inside the unit circle,  $z_p = \frac{C_2}{C_1 + C_2} < 1$, which means that the system is stable. However, the phase margin will decrease. Using the bilinear transformation, the s-domain transfer function is found to be:

$$\frac{V_2(s)}{V_1(s)} = \frac{V_2(z)}{V_1(z)}\Big|_{z=\frac{1+sT/2}{1-sT/2}} = \frac{\left(1+\frac{s}{2/T}\right)}{1+\frac{s}{\left(\frac{C_1}{C_1+2C_2}\right)\frac{2}{T}}}$$

which corresponds to a zero at  $\omega_z = \frac{\omega_{ref}}{\pi}$  and a pole at  $\omega_p = \frac{\omega_{ref}}{\pi} \left( \frac{C_1}{C_1 + 2C_2} \right)$ . Since the pole frequency is lower than the zero frequency, it is expected that the phase margin of the PLL will degrade. The degradation in the phase margin is given by:

$$\Delta\phi_{PM} = \tan^{-1}\left(\pi \frac{\omega_{GBW}}{\omega_{ref}}\right) - \tan^{-1}\left(\pi \frac{\omega_{GBW}}{\omega_{ref}}\left(1 + \frac{2C_2}{C_1}\right)\right) = -\tan^{-1}\left(\frac{\pi \frac{\omega_{GBW}}{\omega_{ref}} \cdot \frac{2C_2}{C_1}}{1 + \left(\pi \frac{\omega_{GBW}}{\omega_{ref}}\right)^2 \left(1 + \frac{2C_2}{C_1}\right)}\right)$$

Fig. 4.29 shows the degradation of the phase margin of the PLL versus the capacitor ratio  $\frac{C_2}{C_1}$  for different  $\frac{\omega_{ref}}{\omega_{GBW}}$  values.



Fig. 4.29 Degradation of the PLL phase margin versus the TVC capacitors ratio.
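The phase-margin expression can be evaluated directly, for instance to regenerate curves of the kind plotted in Fig. 4.29. The sketch below is a plain evaluation of the formula; the sample capacitor and frequency ratios are arbitrary.

```python
import math


def pm_degradation_deg(c2_over_c1, wref_over_wgbw):
    """Phase contributed at w_GBW by the TVC zero/pole pair, in degrees (negative = loss)."""
    x = math.pi / wref_over_wgbw                      # pi * w_GBW / w_ref
    zero_lead = math.atan(x)                          # zero at w_ref / pi
    pole_lag = math.atan(x * (1 + 2 * c2_over_c1))    # pole at (w_ref / pi) * C1 / (C1 + 2 C2)
    return math.degrees(zero_lead - pole_lag)


for ratio in (10, 100):
    print(ratio, pm_degradation_deg(1.0, ratio))
```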

It is clear that a higher  $\frac{\omega_{ref}}{\omega_{GBW}}$  ratio improves the performance. This follows from the pole and zero frequencies derived above, as both are proportional to  $\omega_{ref}$; thus, having a higher  $\frac{\omega_{ref}}{\omega_{GBW}}$  ratio will decrease their effect on the phase margin of the loop. On the other hand, as  $\frac{C_2}{C_1}$  decreases, the degradation in the phase margin decreases. Consequently, decreasing  $C_2$  and increasing  $C_1$  improves the performance. However, decreasing  $C_2$  will increase the effect of the charge injection and clock feed-through of the transmission gates at the output. In addition, leakage current will change the voltage of the capacitor, which will be corrected at each reference cycle, leading to voltage ripples at frequency  $f_{ref}$ . On the other hand, increasing  $C_1$  will decrease the gain of the TVC since

$$V_1(nT) = \frac{I_{CS}}{C_1} \Delta t_{PFD}(nT)$$

where  $I_{CS}$  is the charging current and  $\Delta t_{PFD}$  is the pulse width of the PFD output.

$$\therefore \frac{V_2(s)}{\Delta t_{PFD}(s)} = \frac{I_{CS}}{C_1} \frac{\left(1 + \frac{s}{\omega_{z\_TVC}}\right)}{\left(1 + \frac{s}{\omega_{p\_TVC}}\right)}$$

In order to increase the gain and compensate the increase in  $C_1$ , the charging current,  $I_{CS}$ , should increase. In conclusion, the TVC gain, the reference spur rejection, the power consumption along with the phase margin of the PLL should be considered when choosing the values of  $C_1$  and  $C_2$ .
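The charge-sharing recursion for $V_2$ derived earlier in this subsection can be simulated sample by sample to see how the capacitor ratio sets the settling speed. The sketch assumes ideal charge sharing with no leakage or charge injection; the function name and capacitor values are illustrative.

```python
def tvc_output(v1_samples, c1, c2, v2_initial=0.0):
    """Iterate V2(nT) = C1/(C1+C2) * V1(nT) + C2/(C1+C2) * V2((n-1)T)."""
    a = c1 / (c1 + c2)
    v2 = v2_initial
    out = []
    for v1 in v1_samples:
        v2 = a * v1 + (1 - a) * v2
        out.append(v2)
    return out


step = [1.0] * 50                      # unit step on V1
fast = tvc_output(step, c1=1.0, c2=1.0)
slow = tvc_output(step, c1=1.0, c2=3.0)
print(fast[:3], slow[:3])
```

A larger $C_2/C_1$ ratio moves the pole closer to the unit circle, so the output takes more reference cycles to settle, which is the time-domain view of the phase-margin loss discussed above.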

From noise analysis point of view, the architecture shown in Fig. 4.27 provides high rejection for the noise of the current source  $M_{CS}$ . Fig. 4.30 shows a timing diagram of the TVC at steady state along with the effect of the current source noise on the performance.

The PFD output pulse width is usually given by a fixed minimum pulse width, used to avoid the dead zone of the PFD, in addition to a small additional width,  $\Delta \phi$ , in one of the two outputs to compensate the phase error, phase noise, of the PLL. Assuming that the VCO phase is leading the reference input as shown in Fig. 4.30, the PFD pulse width is given by:

$$\begin{split} \Delta t_{UP} &= \Delta t_{\min} \\ \Delta t_{DN} &= \Delta t_{\min} + \frac{\Delta \phi}{2\pi} T_{ref} \end{split}$$



Fig. 4.30 Timing diagram for the effect of the current source noise on the performance of the TVC.

Denoting the noise current of the current source by  $i_{n\_cs}$, the charges pumped into  $C_1$  will be given by:

$$\begin{aligned} Q_{UP} &= \left(I_{cs} + i_{n\_cs}\right) \Delta t_{\min} \\ Q_{DN} &= \left(I_{cs} + i_{n\_cs}\right) \Delta t_{\min} + \left(I_{cs} + i_{n\_cs}\right) \frac{\Delta \phi}{2\pi} T_{ref} \end{aligned}$$

Thus, the differential charges pumped in  $C_1$  will be given by:

$$\Delta Q = Q_{DN} - Q_{UP} = \left(1 + \frac{i_{n\_cs}}{I_{cs}}\right) I_{cs} \frac{\Delta \phi}{2\pi} T_{ref}$$

$$\therefore \Delta V = \frac{\Delta Q}{C_1} = \left(1 + \frac{i_{n\_cs}}{I_{cs}}\right) I_{cs} \frac{\Delta \phi}{2\pi} \frac{T_{ref}}{C_1}$$

where  $I_{cs} \cdot \frac{\Delta \phi}{2\pi} T_{ref}$  is the noiseless output of the TVC. In conclusion, the noise is independent of the PFD dead-zone period  $\Delta t_{min}$  due to the pseudo-differential nature of the TVC. The most interesting point is that the same conclusion can be reached about the spurs. Since the circuit is identical for both the UP and DN inputs and the same pulse controls the UP and DN transmission gates, any spur due to feed-through or charge injection of the transmission gates is converted to a common-mode signal and is independent of the PFD output pulse width. On the other hand, the main source of noise in the TVC is the current sources  $M_{1UP}$  and  $M_{1DN}$  since their contribution will not be common mode. The voltage changes on  $C_{1UP}$  and  $C_{1DN}$  are given by:

$$\Delta V_{C1\_UP} = \left(I_{cs} + i_{n\_M1UP}\right) \frac{\Delta t_{\min}}{C_1}$$
$$\Delta V_{C1\_DN} = \left(I_{cs} + i_{n\_M1DN}\right) \frac{1}{C_1} \left(\Delta t_{\min} + \frac{\Delta \phi}{2\pi} T_{ref}\right)$$

Thus, the differential voltage across  $C_1$  will be given by:

$$\Delta V_{C1} = \left(1 + \frac{i_{n\_M1DN}}{I_{cs}}\right) \frac{I_{cs}}{C_1} \frac{\Delta \phi}{2\pi} T_{ref} + \frac{\Delta t_{\min}}{C_1} \left(i_{n\_M1DN} - i_{n\_M1UP}\right)$$

Since  $M_{1UP}$  and  $M_{1DN}$  are matched and are subject to the same operating conditions, their noise powers are the same. Thus, the differential voltage across  $C_1$  can be represented in terms of its mean and variance as follows:

$$\Delta V_{C1} = \mu_{\Delta V_{C1}} + i_{n\_M1}\sigma_{\Delta V_{C1}} = \left(\frac{I_{cs}}{C_1}\frac{\Delta\phi}{\omega_{ref}}\right) + i_{n\_M1}\left(\frac{\sqrt{2}\Delta t_{\min}}{C_1}\sqrt{1 + \frac{\Delta\phi}{\Delta t_{\min}\omega_{ref}}}\right)$$

Contrary to the  $M_{cs}$  case, the above equation indicates that the voltage noise will depend on the minimum pulse width of the PFD,  $\Delta t_{min}$ , in addition to the phase difference between the input signal and the feedback pulse,  $\Delta \phi$. Consequently, the PFD minimum pulse width should be as small as possible since the contribution of  $M_1$  dominates the noise performance of the TVC. One solution to alleviate this problem is to use one transistor,  $M_1$ , instead of two,  $M_{1UP}$  and  $M_{1DN}$ . In that case the  $\frac{\sqrt{2}i_{n\_M1}}{C_1}\Delta t_{min}$  term in the above equation will be cancelled out and the effect of the  $M_1$  noise will be equivalent to that of  $M_{cs}$.
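The difference between a shared and a separate noise source can be made concrete with a small sketch that evaluates the UP and DN capacitor voltages for a single, frozen noise sample. The numerical values are hypothetical, and treating the noise as a constant offset over one cycle is a simplification made only for illustration.

```python
import math


def differential_voltage(i_cs, c1, dt_min, dt_phase, i_n_up=0.0, i_n_dn=0.0):
    """V_C1_DN - V_C1_UP for one cycle, with per-branch noise currents."""
    v_up = (i_cs + i_n_up) * dt_min / c1
    v_dn = (i_cs + i_n_dn) * (dt_min + dt_phase) / c1
    return v_dn - v_up


# Hypothetical values: 100 uA charging current, 1 pF, 10 ps phase-error pulse, 1 uA noise sample.
common = [differential_voltage(100e-6, 1e-12, dt, 10e-12, i_n_up=1e-6, i_n_dn=1e-6)
          for dt in (150e-12, 400e-12)]
separate = [differential_voltage(100e-6, 1e-12, dt, 10e-12, i_n_up=-1e-6, i_n_dn=1e-6)
            for dt in (150e-12, 400e-12)]
print(common, separate)
```

With a shared source the result does not change with the minimum pulse width, whereas with separate sources the error grows with it, as the expressions above indicate.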

### 4.5.2.6 Voltage-to-Time Converter

The voltage-to-time converter is a block that produces a square wave whose pulse width changes proportionally to the change of the input voltage. The simplest implementation of a VTC is a chain of CMOS inverters in which the delay of every other inverter is controlled by the input voltage, while the input is a square wave with the minimum possible pulse width. Fig. 4.31 shows the transistor level implementation of this VTC. Since the delay of every other inverter is controlled by the input voltage, the delays of the rising and falling edges of the input pulse will be different, ending up with a change in the pulse width that is proportional to the input voltage.



Fig. 4.31 Transistor level implementation of the delay line based VTC.

The number of inverters in the line is set by the required gain of the VTC: the higher the required gain, the larger the number of inverters. The gain of the VTC is given by:

$$A_{VTC} = \frac{\Delta \tau_{pw}}{\Delta V_{cnt}} = \frac{\tau_{pw1} - \tau_{pw2}}{V_{cnt1} - V_{cnt2}}$$

where  $V_{cnt(1,2)}$  and  $\tau_{pw(1,2)}$  are the control voltage and the pulse width of the output respectively.

Although this technique is simple, it has one drawback, which is the offset in the output pulse (or the minimum output pulse width). Since the inverter chain is unbalanced, the delays of the rising and falling edges are not equal and the minimum output pulse width is large, which is not preferable. The output pulse width affects the spur performance of the PLL, and having a small pulse width at steady state is the classical way of decreasing the output spurs. In addition, having a large pulse width at steady state will increase the power consumption of the charge pump. To overcome this problem, a pseudo-differential version of the inverter line is used as shown in Fig. 4.32. The first line is controlled by the input voltage while its counterpart is controlled by a constant bias voltage whose value is less than the minimum expected control voltage of the first line. Since the minimum input control voltage is given by  $V_b$  of the TVC, the bias voltage of the second delay line should be less than  $V_b$ . As shown in the timing diagram in Fig. 4.32, the delay line outputs will suffer from the offset but the differential output, after the NOR gate, will be offset free. Consequently, a pulse on the order of 100 ps can be generated.

One major problem with the inverter-based VTC is the linearity of its transfer function. Fig. 4.33 shows the gain of the VTC versus the control voltage. The gain varies little at small values of $V_{cnt}$ (<0.5V) and changes considerably as $V_{cnt}$ increases. To overcome the nonlinearity problem, a gear-shifting mechanism is used to switch between the pseudo-differential inverter-based VTC and a TVC-based VTC. Although the latter can operate linearly over a wide range of the input control voltage, it cannot generate output pulses on the order of 100ps, which are required at steady state. Fig. 4.34 shows the TVC-based VTC. An edge detector, *ED*, detects the rising edge of the high-frequency clock, $f_B$, and generates a short pulse that sets the SR latch (whose *QB* is normally high). The current source, $M_I$, and the capacitor operate as a TVC. When *QB* goes low, transistor $M_I$ turns on and the current source starts charging the capacitor *C*. Once the voltage across *C* exceeds $V_{in}$, the SR latch is reset, generating an output pulse whose width is proportional to the VTC input voltage, $V_{in}$. The gain of the TVC-based VTC is given by:

$$A_{VTC} = \frac{C}{I_b}$$
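
The gain expression can be recovered from the charging behavior described above, under idealizing assumptions that are not stated explicitly in the text: the current source delivers a constant current $I_b$, the capacitor *C* starts fully discharged, and the comparator and latch delays are negligible. Here $V_C(t)$ denotes the capacitor voltage, a symbol introduced only for this sketch:

$$V_C(t) = \frac{I_b}{C}\,t, \qquad V_C(\tau_{pw}) = V_{in} \;\Rightarrow\; \tau_{pw} = \frac{C}{I_b}\,V_{in}, \qquad A_{VTC} = \frac{d\tau_{pw}}{dV_{in}} = \frac{C}{I_b}$$

Under these assumptions the pulse width is linear in $V_{in}$, which is why this path is preferred over the inverter-based one when the input range is wide.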



Fig. 4.32 Pseudo-differential delay-line-based VTC and its timing diagram.

Since the VTC is preceded by a TVC, the nonlinearities of the TVC and the TVC-based VTC will cancel out, leading to a linear transfer function from the PFD output to the CP input. The only design concern is that the comparator should be able to handle a large common-mode input range.



Fig. 4.33 Gain of the inverter-based VTC versus the control voltage  $V_{cnt}$ .



Fig. 4.34 Block diagram of the TVC-based VTC.

Fig. 4.35 shows the complete block diagram of the VTC. Path 1 corresponds to the TVC-based VTC while path 2 is the inverter-based one. A path selector, implemented as a comparator with hysteresis, chooses the suitable path for the input level $V_{cnt}$. The hysteresis prevents ringing in the gear-shifting mechanism. Fig. 4.36 shows the block diagram of the comparator. Two differential pairs are used to compare the input signal to the hysteresis voltages, $V_{ref1}$ and $V_{ref2}$, and an SR latch is used to store the state. Since the accuracy of the hysteresis does not affect the performance, $V_{ref1}$ and $V_{ref2}$ can be generated using simple voltage-level shifters.
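
The path-selection behavior can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name, the threshold values, and the mapping of the higher input level to path 1 (TVC-based) and the lower level to path 2 (inverter-based) are assumptions made for illustration, based on the observation that the inverter-based VTC is used for the small steady-state inputs.

```python
class PathSelector:
    """Behavioral model of a comparator with hysteresis and an SR latch.

    v_ref_low and v_ref_high stand in for the two hysteresis voltages;
    which of V_ref1 / V_ref2 is the higher one is an assumption here.
    """

    def __init__(self, v_ref_low, v_ref_high, use_tvc_path=True):
        if v_ref_low >= v_ref_high:
            raise ValueError("v_ref_low must be below v_ref_high")
        self.v_ref_low = v_ref_low
        self.v_ref_high = v_ref_high
        self.use_tvc_path = use_tvc_path  # latch state, True selects path 1

    def update(self, v_cnt):
        """Update the latch for a new input level and return the path number."""
        if v_cnt > self.v_ref_high:
            self.use_tvc_path = True    # "set": large input, TVC-based VTC
        elif v_cnt < self.v_ref_low:
            self.use_tvc_path = False   # "reset": small input, inverter-based VTC
        # Between the two thresholds the latch simply holds its state.
        return 1 if self.use_tvc_path else 2


# Hypothetical thresholds of 0.40 V and 0.50 V.
selector = PathSelector(0.40, 0.50)
trace = [selector.update(v) for v in (0.60, 0.45, 0.35, 0.45, 0.55)]
print(trace)
```

An input that sits between the two thresholds never toggles the latch, which is the property that prevents ringing in the gear-shifting mechanism.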

From the noise point of view, the voltage-controlled edge-detector path, path 2, is more critical than the TVC-based VTC, path 1. Since the TVC-based VTC will be idle at steady state, its noise will not contribute to the overall PLL noise. However, path 2 will contribute jitter to the VTC output pulse. The jitter of the upper delay line in Fig. 4.32 will change the instant of the rising edge of the pulse, while the jitter of the lower line will modulate the instant of the falling edge. Thus, the width of the output pulse, which corresponds to the oscillator's phase noise to be corrected, will change. Since the two lines are almost identical (one line has one inversion more than the other), their jitter powers are almost the same but independent. Consequently, the jitter in the pulse width will be given by:

$$\Delta t_{pw\_jitter} = \sqrt{2}\,\Delta t_{delay\_line}$$

where  $\Delta t_{delay\_line}$  and  $\Delta t_{pw\_jitter}$  are the jitter of the delay line and the jitter of the pulse width of the output respectively.
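
The factor of $\sqrt{2}$ follows from adding two independent, equal jitter powers. A minimal numeric sketch, in which the function name and the 1 ps RMS value are hypothetical and used only for illustration:

```python
import math


def pulse_width_jitter(rise_jitter_rms, fall_jitter_rms):
    """RMS jitter of a pulse width whose two edges jitter independently.

    Independent jitter powers (variances) add, so the RMS values
    combine as the square root of the sum of squares.
    """
    return math.hypot(rise_jitter_rms, fall_jitter_rms)


line_jitter = 1e-12  # hypothetical delay-line jitter, 1 ps RMS
pw_jitter = pulse_width_jitter(line_jitter, line_jitter)
ratio = pw_jitter / line_jitter
print(f"pulse-width jitter / delay-line jitter = {ratio:.4f}")
```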





Fig. 4.35 Complete block diagram of the VTC.



Fig. 4.36 Block diagram of the comparator with hysteresis.

Since there are two VTCs with independent noise, each contributing a pulse-width jitter of $\sqrt{2}\Delta t_{delay\_line}$, the total noise pumped into the filter will be proportional to $2\Delta t_{delay\_line}$. One way to alleviate this problem is to share the reference line, the upper delay line in Fig. 4.32, between the two VTCs. Consequently, the jitter of the rising edge will be the same for the UP and the DN signals and is converted to common mode. However, the jitter of the lower delay lines still affects the performance. The total noise pumped into the filter in that case will be proportional to $\sqrt{2}\Delta t_{delay\_line}$, which is 3dB better than the previous implementation. One important point to mention here is that the jitter of the boosting clock, $f_B$, does not affect the noise performance of the VTC. The variations in the rising edge position of $f_B$ vary the position of the VTC output pulse rather than its width. Consequently, it does not add noise to the phase error signal.

### 4.5.3 PLL Implementation

Since some of the PLL blocks introduce additional poles and zeros in the loop, the conventional PLL transfer function should be changed accordingly. Fig. 4.37 shows a block diagram of the SFB-PLL model. The gains of the PFD, CP, VCO and the dividers are similar to those of the conventional PLL. However, the spur-frequency booster provides three additional gains associated with the TVC, the VTC and the spur regeneration rate.

The TVC introduces a zero at $\omega_{z\_TVC} = \frac{\omega_{ref}}{\pi}$ and a pole at $\omega_{p\_TVC} = \frac{\omega_{ref}}{\pi} \left( \frac{C_{1\_TVC}}{C_{1\_TVC} + 2C_{2\_TVC}} \right)$, along with a gain of $K_{TVC}$. Thus, the loop gain will be given by:

$$G(s)H(s) = \frac{K_{PFD} I_{CP} K_{VCO} K_{TVC} K_{VTC}}{N(C_1 + C_2)} B(s) \frac{(1 + s/\omega_z)}{s^2 (1 + s/\omega_p)} \frac{(1 + s/\omega_{z\_TVC})}{(1 + s/\omega_{p\_TVC})}$$



Fig. 4.37 SFB-PLL model.

In addition, the regeneration factor, whose DC value is given by $B = \frac{f_B}{f_{ref}}$, has a phase associated with it, which represents the delay of each output pulse of the VTC within a reference period. In other words, the output pulses of the VTC are spaced in time by $T_B = T_{ref} / B$, which corresponds to a train of pulses. The Fourier transform of such a signal is given by:

$$B(s) = 1 + e^{-sT_B} + e^{-s2T_B} + \dots + e^{-s(B-1)T_B} = e^{-\frac{s\pi}{\omega_{ref}} \left(\frac{B-1}{B}\right)} \frac{\sinh\left(\frac{\pi s}{\omega_{ref}}\right)}{\sinh\left(\frac{\pi s}{\omega_B}\right)}$$
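
The closed form can be checked numerically against the direct sum. The sketch below is only a sanity check under assumed values: the boosting ratio of 8 is hypothetical, and the evaluation point $s = j\omega_{ref}/20$ with a 6MHz reference is taken from the prototype values quoted elsewhere in this chapter. The function names are introduced here for illustration.

```python
import cmath
import math


def regen_sum(s, f_ref, boost):
    """Direct sum 1 + e^{-s T_B} + ... + e^{-s (B-1) T_B}."""
    t_b = 1.0 / (boost * f_ref)
    return sum(cmath.exp(-s * k * t_b) for k in range(boost))


def regen_closed_form(s, f_ref, boost):
    """Closed form: delay term times sinh(pi s / w_ref) / sinh(pi s / w_B)."""
    w_ref = 2 * math.pi * f_ref
    w_b = boost * w_ref
    delay = cmath.exp(-s * math.pi / w_ref * (boost - 1) / boost)
    return delay * cmath.sinh(math.pi * s / w_ref) / cmath.sinh(math.pi * s / w_b)


f_ref = 6e6   # reference frequency in Hz
boost = 8     # hypothetical boosting ratio B = f_B / f_ref
s = 1j * 2 * math.pi * f_ref / 20  # s = j * w_GBW with w_ref / w_GBW = 20

direct = regen_sum(s, f_ref, boost)
closed = regen_closed_form(s, f_ref, boost)
print(abs(direct), cmath.phase(direct))
```

For these values the magnitude stays close to *B* while the phase is $-\frac{\pi}{20}\left(\frac{B-1}{B}\right)$, which is the behavior the approximation that follows relies on.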

Since the unity gain frequency,  $\omega_{GBW}$ , must be less than the reference frequency,  $\omega_{ref}$ , by at least one order of magnitude and since  $\omega_B > \omega_{ref}$ , B(s) can be approximated as:

$$B(s)\Big|_{s=j\omega_{GBW}} \approx e^{-j\frac{\pi}{(\omega_{ref}/\omega_{GBW})}\left(\frac{B-1}{B}\right)}\frac{\omega_{GBW}/\omega_{ref}}{\omega_{GBW}/\omega_{B}} = Be^{-j\frac{\pi}{(f_{ref}/f_{GBW})}\left(\frac{B-1}{B}\right)}$$

The above equation indicates that the regeneration maintains a gain of *B*. However, it affects the overall phase margin of the PLL. Since the regeneration process generates *B* copies of the PFD output pulse, spaced in time by $T_B$ across $T_{ref}$, the average additional delay of the PFD output is half of $T_{ref}$, which corresponds to a loss in the phase margin of $2\pi/(T_{GBW}/(T_{ref}/2)) = \pi\omega_{GBW}/\omega_{ref}$, in agreement with the equation above. The overall open-loop transfer function is given by:

$$G(s)H(s) = \frac{K_{PFD}I_{CP}K_{VCO}K_{TVC}K_{VTC}}{N(C_1+C_2)}\frac{(1+s/\omega_z)}{s^2(1+s/\omega_p)}\frac{(1+s/\omega_{z\_TVC})}{(1+s/\omega_{p\_TVC})}e^{-\frac{s\pi}{\omega_{ref}}\left(\frac{B-1}{B}\right)}\frac{\sinh\left(\pi s/\omega_{ref}\right)}{\sinh\left(\pi s/\omega_B\right)}$$

In the current prototype, the ratio between the reference frequency and the unity gain bandwidth,  $\omega_{ref} / \omega_{GBW}$ , is chosen to be 20 along with a phase margin of 45°. In addition, equal TVC capacitors are used. Consequently, the contribution of the TVC, VTC and the regeneration process to the loop dynamics is given by:

$$\begin{aligned} A_{TVC,VTC} &= \left. K_{TVC} K_{VTC} \frac{1 + \frac{s}{\omega_{ref}/\pi}}{1 + \frac{s}{\omega_{ref}/\pi} \left(\frac{C_{1\_TVC} + 2C_{2\_TVC}}{C_{1\_TVC}}\right)} e^{-\frac{s}{2f_{ref}} \left(\frac{B-1}{B}\right)} \frac{\sinh\left(\pi s/\omega_{ref}\right)}{\sinh\left(\pi s/\omega_{B}\right)} \right|_{s=j\omega_{GBW}} \\ &= 0.915\, K_{TVC} K_{VTC} B\, e^{-j0.284}\, e^{-j\frac{\pi}{20} \left(\frac{B-1}{B}\right)} \end{aligned}$$

Assuming $\left(\frac{B-1}{B}\right) \approx 1$, the above equation yields

$$A_{TVC,VTC} = 0.915 * K_{TVC} K_{VTC} B * e^{-j0.44}$$
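
The numeric factors can be reproduced with a few lines of complex arithmetic. This sketch assumes, as stated above, equal TVC capacitors (so that $(C_{1\_TVC} + 2C_{2\_TVC})/C_{1\_TVC} = 3$), $\omega_{ref}/\omega_{GBW} = 20$ and $(B-1)/B \approx 1$; the variable names are introduced here for illustration only.

```python
import cmath
import math

ref_to_gbw = 20   # w_ref / w_GBW
cap_ratio = 3.0   # (C1_TVC + 2*C2_TVC) / C1_TVC for equal capacitors

# s / (w_ref / pi) evaluated at s = j * w_GBW
x = 1j * math.pi / ref_to_gbw

# Zero/pole pair introduced by the TVC
tvc_factor = (1 + x) / (1 + cap_ratio * x)
magnitude = abs(tvc_factor)
tvc_phase = cmath.phase(tvc_factor)

# Regeneration delay term with (B - 1) / B taken as 1
regen_phase = -math.pi / ref_to_gbw
total_phase = tvc_phase + regen_phase

print(f"|TVC factor| = {magnitude:.3f}")
print(f"TVC phase    = {tvc_phase:.3f} rad")
print(f"total phase  = {total_phase:.2f} rad = {math.degrees(total_phase):.1f} deg")
```

The total phase of roughly 0.44 rad is about 25°, which is the phase-margin loss that the loop filter has to absorb.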

Consequently, the open loop gain of the PLL will be given by:

$$G(s)H(s)\Big|_{s=j\omega_{GBW}} = \frac{0.915\,B K_{PFD}I_{CP}K_{VCO}K_{TVC}K_{VTC}}{N(C_1+C_2)}\frac{1}{s^2}\frac{(1+s/\omega_z)}{(1+s/\omega_p)}e^{-j0.44}$$

Since the last term in the above equation indicates a loss in the phase margin of 25°, the loop filter should be designed targeting a phase margin of 70° to end up with 45°. Consequently, the ratios  $\omega_{GBW}/\omega_z$ ,  $\omega_p/\omega_{GBW}$  and  $(\omega_n/\omega_z)^2$  are given by:

$$\alpha = \tan\left(\frac{70^\circ+90^\circ}{2}\right) = 5.67$$

Since  $\omega_{ref} / \omega_{GBW} = 20$ , then

$$\omega_z = \frac{1}{\alpha} \frac{\omega_{ref}}{20} = \frac{1}{RC_1}$$

$$\omega_p = \alpha \frac{\omega_{ref}}{20} = \frac{1}{R\frac{C_1C_2}{C_1 + C_2}}$$

$$\omega_n = \frac{1}{\sqrt{\alpha}} \frac{\omega_{ref}}{20} = \sqrt{\frac{0.915 * B K_{PFD} I_{CP} K_{VCO} K_{TVC} K_{VTC}}{N(C_1 + C_2)}}$$

It is clear from the design equations that there will be degrees of freedom in the design.  $\omega_p$  and  $\omega_z$  can be solved together to get reasonable values for the loop filter components while the gains of the different blocks can be used to fulfill the  $\omega_n$  equation.

In the current prototype,  $\omega_{ref}$  is chosen to be 6MHz,  $K_{VCO}$  is set to 100MHz/V and N is around 600.
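
The design equations can be turned into a short calculation. The sketch below is illustrative only: the function name and the value of $C_1$ are hypothetical, the 6MHz reference is interpreted as $f_{ref}$ and converted to rad/s, and $C_2$ is obtained from $\omega_p/\omega_z = (C_1 + C_2)/C_2 = \alpha^2$, which follows from the expressions for $\omega_z$ and $\omega_p$ above.

```python
import math


def loop_filter_design(f_ref, ref_to_gbw, pm_target_deg, c1):
    """Solve the w_z and w_p design equations for R and C2, given C1."""
    alpha = math.tan(math.radians((pm_target_deg + 90) / 2))
    w_gbw = 2 * math.pi * f_ref / ref_to_gbw
    w_z = w_gbw / alpha
    w_p = w_gbw * alpha
    w_n = w_gbw / math.sqrt(alpha)
    r = 1 / (w_z * c1)           # from w_z = 1 / (R * C1)
    c2 = c1 / (alpha ** 2 - 1)   # from w_p / w_z = (C1 + C2) / C2
    return {"alpha": alpha, "w_z": w_z, "w_p": w_p, "w_n": w_n, "R": r, "C2": c2}


# Prototype ratios from the text; C1 = 100 pF is a hypothetical choice.
design = loop_filter_design(f_ref=6e6, ref_to_gbw=20, pm_target_deg=70, c1=100e-12)
for name, value in design.items():
    print(f"{name} = {value:.4g}")
```

Choosing a different $C_1$ scales *R* and $C_2$ without changing the pole and zero locations, which is one of the degrees of freedom mentioned above; the block gains are then chosen to satisfy the $\omega_n$ equation.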

## 4.5.4 Measurement Results

The proposed SFB-PLL is fabricated in a 90nm UMC digital CMOS process. Fig. 4.38 shows the chip micrograph. The PLL occupies 0.063mm<sup>2</sup> and consumes 1.5mW at 3.6GHz.

Voltage regulators are used to ensure a clean supply voltage for the circuit, while supply capacitors covering a wide frequency range are used to suppress the supply noise. Small resistors are inserted in series between the regulator output and the supply capacitors to form a low-pass filter for further noise suppression. Resistor values in the range of $10\Omega$ to $100\Omega$ were used depending on the DC current of the supply and how critical it is. The bottom side of the PCB contains the test setup for the stand-alone test circuits (TVC, VTC, CP, VCO and passives) along with the controls that choose the block to be tested. The measured VCO gain was found to be around 100MHz/V.

The 6MHz reference clock is provided by a signal generator (HP8648C), while the boosting clock, $f_B$, is provided by another signal generator (Agilent E8267D). An Agilent E4446A spectrum analyzer is used to monitor the PLL output spectrum and an Agilent Infiniium oscilloscope is used to check the divider output. Finally, the circuit is powered using an Agilent E3631A power supply.



Fig. 4.38 SFB-PLL chip micrograph.

Fig. 4.39 depicts the PLL output spectrum showing that the PLL provides a reference spur suppression of -74.025dBc. Fig. 4.40 shows the phase noise of the PLL which indicates that the PLL bandwidth is around 300kHz. Fig. 4.41 shows the settling time of the PLL for a frequency jump of 7 channels. The settling time is around 11.82µsec.



Fig. 4.39 SFB-PLL's output spectrum showing the reference spur at 6MHz offset.



Fig. 4.40 Phase-noise plot of the SFB-PLL.



Fig. 4.41 Transient response of the SFB-PLL showing a settling time of 11.8µsec.

Table 4.3 presents a comparison with the state-of-the-art published work. For a fair comparison, the FOM for the reference spur derived earlier is used to compare the different PLLs. The ratio $\left(\frac{\omega_p}{\omega_{GBW}}\right)$ is assumed to be 4 for all designs, which secures a phase margin of 60° with minimum loop filter area. Since the reference frequency is 6MHz, the VCO gain is 100MHz/V and the unity-gain frequency of the PLL is 300kHz, the FOM is given by:

$$\begin{aligned} FOM &= A_{Spurs} \Big|_{dBc} - 20 \log\left(\frac{K_{VCO}}{2\omega_{ref}}\right) - 20 \log\left(\frac{\omega_{GBW}}{\omega_{ref}}\right) - 20 \log\left(\frac{\omega_{p}}{\omega_{GBW}}\right) \\ &= -74.025 - 20 \log\left(\frac{100}{2 \times 6}\right) - 20 \log\left(\frac{0.3}{6}\right) - 20 \log(4) = -78.46 \, dBc \end{aligned}$$
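
The arithmetic can be reproduced directly. In this sketch the function and parameter names are introduced for illustration; the inputs are the values quoted above, entered in the same units (MHz/V and MHz) that the numeric evaluation uses.

```python
import math


def spur_fom(spur_dbc, kvco_mhz_per_v, f_ref_mhz, f_gbw_mhz, wp_over_wgbw=4.0):
    """Evaluate the reference-spur FOM with the same terms as the equation."""
    return (
        spur_dbc
        - 20 * math.log10(kvco_mhz_per_v / (2 * f_ref_mhz))
        - 20 * math.log10(f_gbw_mhz / f_ref_mhz)
        - 20 * math.log10(wp_over_wgbw)
    )


fom = spur_fom(spur_dbc=-74.025, kvco_mhz_per_v=100, f_ref_mhz=6, f_gbw_mhz=0.3)
print(f"FOM = {fom:.2f}")
```

The second term penalizes designs that obtain a low spur simply by lowering the VCO gain, and the third penalizes those that do so by narrowing the loop bandwidth, which is what makes the comparison in Table 4.3 fair.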

Table 4.3 SFB-PLL performance summary compared to the state-of-the-art PLLs.

|                      | This work          | [50]              | [51]                                   | [51]    | [52]                 | [53]                 | [54]                 |
|----------------------|--------------------|-------------------|----------------------------------------|---------|----------------------|----------------------|----------------------|
| Output frequency     | 3.6GHz             | 2.21GHz           | 0.8GHz                                 | 2.4GHz  | 2.4GHz               | 5.2GHz               | 5.4GHz               |
| Reference frequency  | 6MHz               | 55.25MHz          | 100MHz                                 | 300MHz  | 12MHz                | 10MHz                | 10MHz                |
| $f_{BW}/f_{ref}$     | 1/20               | 1/20              | 1/10                                   | 1/30    | 1/12                 | 1/50                 | 1/400                |
| $K_{VCO}/f_{ref}$    | 16.67              | 0.9               | 5                                      | 3.33    | 5                    | 30                   | 22                   |
| Spur (dBc)           | -74                | -80 to -86        | -68.5                                  | -48.3   | -70.45               | -68.5                | -70                  |
| FOM (dBV)            | -78.46             | -59.09 to -65.09  | -68.46                                 | -35.19  | -68.83               | -70.04               | -50.79               |
| Power consumption    | 1.5mW              | 3.8mW             | -                                      | 23mW    | 48.78mW              | 19.8mW               | 13.5mW               |
| VCO topology         | Single-ended ring  | LC-VCO            | Pseudo-differential cross-coupled ring |         | LC-VCO               | LC-VCO               | LC-VCO               |
| Phase noise (dBc/Hz) | -60 @ 10kHz        | -121 @ 200kHz     | -119.8 @ 100kHz (measured @ 2GHz)      |         | -103 @ 100kHz        | -76 @ 20kHz          | -63 @ 10kHz          |
| Area                 | 0.063mm<sup>2</sup> | 0.200mm<sup>2</sup> | 0.070mm<sup>2</sup>                  |         | 4.800mm<sup>2</sup>  | 0.640mm<sup>2</sup>  | 0.490mm<sup>2</sup>  |
| Technology           | 90nm digital CMOS  | 0.18µm CMOS       | 0.13µm digital CMOS                    |         | 0.18µm CMOS          | 0.18µm CMOS          | 0.25µm CMOS          |

# 4.6 Summary

A spur-frequency-boosting PLL (SFB-PLL) that provides a low reference spur is proposed. The proposed architecture eliminates the need for decreasing the loop bandwidth and/or the VCO gain. In addition, it mitigates the channel-selection-dependent-sampler problem suffered by the previously published sampled-loop-filter PLLs. Moreover, it eliminates the mismatch problem suffered by the previously published multi-path PFD/CP architectures. Measurements of the fabricated prototype show an 8.4dB improvement in spur suppression, as measured by the FOM of Table 4.3, compared with the best state-of-the-art published work [53], with a VCO-gain-to-reference-frequency ratio, $\frac{K_{VCO}}{f_{ref}}$, of 17 and a loop-bandwidth-to-reference-frequency ratio, $\frac{f_{GBW}}{f_{ref}}$, of 1/20. The PLL consumes 1.5mW in a 90nm digital CMOS process provided through UMC and occupies only 0.063mm<sup>2</sup>.

#### CHAPTER V

### CONCLUSIONS AND FUTURE WORK

New architectures for implementing analog circuits in time-mode have been proposed. Sub-ps matching in 65nm technology is proved to be feasible on silicon through the TDC, which provides attractive opportunities for designing analog circuits in time-mode. In addition, with the increased resolution of the CMOS process in the emerging 28nm and 12nm technologies, the dynamic range of the systems will be much higher, and the power consumption is expected to scale down with the technology. Moreover, the performance of time-mode analog circuits is expected to be better in the newer nanometric technologies. One area for future research is to explore other TDC architectures that can provide much higher resolution than the inverter-chain architecture used in the current project, which will open opportunities for much higher resolution TDC-based $\Sigma\Delta$ ADCs. Since matching might be problematic at higher resolution, a reference-recycling TDC could be a better option for future research.

The proposed oscillator proves that a high-Q band-pass filter is no longer needed. In addition, the lower THD, lower power and higher output swing, along with the compact design, prove that time-mode designs can provide very competitive solutions. Moreover, scaling the technology is expected to improve the performance even more, which is an advantage that voltage-mode solutions lack. However, more research can be performed on how to obtain the optimum time-shifts for certain THD specifications. In addition, the trade-off between the number of signals to be added and the resolution of the time-shift should be investigated to obtain an optimum design that consumes minimum power.

Finally, the novel PLL architecture exploits the voltage-mode and time-mode converters by using a time-to-voltage converter followed by a voltage-to-time converter to boost the spur frequency of the PLL, ending up with a competitive low-reference-spur PLL. Since the proposed architecture is novel, a lot of research can be done on the optimization of the TVC and the VTC blocks. In addition, since the proposed architecture provides new degrees of freedom in the PLL design, the system-level design should be investigated to incorporate the phase noise into the design equations. The use of an LC-VCO is a must in any future work in order to provide a PLL that is free of reference spurs and has good phase-noise performance. Finally, far-frequency poles can be used in the loop filter to suppress the boosted spurs without affecting the dynamics of the loop.

#### REFERENCES

- [1] B. Razavi, *Design of Analog CMOS Integrated Circuits*, McGraw-Hill, Boston, MA, 2001.
- [2] K. Smith, A. Sedra, "The Current Conveyor a New Circuit Building Block" Proc. IEEE, vol. 56, no. 8, pp. 1368-1369, Aug. 1968.
- [3] F. Yuan, CMOS Current-Mode Circuits for Data Communications, Springer, New York, 2007.
- [4] R. Tsuchiya, K. Ohnishi, M. Horiuchi, S. Tsujikawa, Y. Shimamoto, N. Inada, J. Yugami, F. Ootsuka, T. Onai, "Femto-Second CMOS Technology With High-k Offset Spacer and SiN Gate Dielectric with Oxygen-enriched Interface," in *IEEE Symposium on VLSI Technology*, 2002, pp. 1510-151.
- [5] Y. Arai, T. Ohsugi, "TMC-a CMOS Time to Digital Converter VLSI," *IEEE Trans. Nuclear Science*, vol. 36, no. 1, pp. 528-531, Feb. 1989.
- [6] F. Amorini, A. Anzalone, R. Bassini, C. Boiano, G. Cardella, S. Cavallaro, et al., "Digital Signal Processing for Mass Identification in a 4π-Detector, Using Time of Flight Measurement," *IEEE Trans. Nuclear Science*, vol. 55, no. 2, pp. 717-722, Apr. 2008.
- [7] Y. Liu, U. Vollenbruch, Y. Chen, C. Wicpalek, L. Maurer, et al., "A 6ps Resolution Pulse Shrinking Time-to-Digital Converter as Phase Detector in Multi-Mode Transceiver" in *Proc. IEEE Radio and Wireless Symposium*, Jan. 2008, pp. 163-166.

- [8] Y. Arai, M. Ikeno and T. Matsumura, "Development of a CMOS Time Memory Cell VLSI and a CAMAC Module with 0.5 ns Resolution," *IEEE Trans. on Nucl. Sci.*, vol. 39, no.4, pp.784-788, Aug. 1992.
- [9] T. Rahkonen, J. T. Kostamovaara, "The Use of Stabilized CMOS Delay Lines for the Digitization of Short Time Intervals," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 887-894, Aug. 1993.
- [10] A. Rothermel, F. Dell'ova, "Analog Phase Measuring Circuits for Digital CMOS IC's", *IEEE J. Solid-State Circuits*, vol. 28, no.7, pp. 853-856, Jul. 1993
- [11] M. Santos, "A CMOS Delay Locked Loop and Sub-Nanosecond Time-to-Digital Converter Chip," *IEEE Trans. Nuclear Science*, vol. 43, no. 3, pp. 1717-1719, June 1996.
- [12] X. Kang, S. Wang, Y. Liu, X. Sun, R. Zhou, et al. "A Simple Smart Time-to-Digital Converter Based on Vernier Method for a High Resolution LYSO MicroPET," in *Proc. IEEE Nuclear Science Symposium*, Oct. 2007, vol. 4, pp. 2892-2896.
- [13] P. Chen, C. C. Chen, J. C. Zheng, Y. S. Shen, "A PVT Insensitive Vernier-Based Time-to-Digital Converter with Extended Input Range and High Accuracy," *IEEE Trans. Nuclear Science*, vol. 54, no. 2, pp. 294-302, Apr. 2007.
- [14] X. Kang, Y. Liu, X. Sun, S. Wang, Y. Xia, et al, "Front-end Electronics Design based on Vernier Method for a High Resolution MicroPET," in *Proc. International Conference on Biomedical Engineering and Interface*, May 2008, pp. 800-803.

- [15] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, et al, "Variation Tolerant High Resolution and Low Latency Time-to-Digital Converter," in *Proc.* 33rd European Solid State Circuits Conference (ESSCIRC), Sept. 2007 pp. 194 – 197.
- [16] V. Ramakrishnan, P. T. Balsara, "A Wide-Range, High-Resolution, Compact, CMOS Time To Digital Converter," in *Proc. 19th International Conference on VLSI Design*, Jan. 2006, pp. 197-202.
- [17] J. P. Jansson, A. Mantyniemi, J. Kostamovaara, "A CMOS Time-to-Digital Converter with Better Than 10 ps Single-Shot Precision," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1286-1296, June 2006.
- [18] P. Chen, S.-I. Liu, J. Wu, "A CMOS Pulse-Shrinking Delay Element for Time Interval Measurement," *IEEE Trans. on Circuits and Systems II*, vol. 47, no. 9, pp. 954-958, Sept. 2000.
- [19] E. Räisänen-Ruotsalainen, T. Rahkonen, J. Kostamovaara, "A BiCMOS Time-to-Digital Converter with Time Stretching Interpolators," in *Proc. European Solid State Circuits Conference (ESSCIRC'96)*, Neuchatel, Switzerland, September 1996, pp.428-431.
- [20] V. Dhanasekaran, M. Gambhir, M. M. Elsayed, E. Sánchez-Sinencio, J. Silva-Martinez, C. Mishra, L. Chen, E. Pankratz "A 20MHz BW 68dB DR CT ΔΣ ADC Based on a Multi-Bit Time-Domain Quantizer and Feedback Element," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2009, pp. 174-175.

- [21] M. M. Elsayed and E. Sánchez-Sinencio, "A Low THD, Low Power, High Output-Swing Time-Mode-Based Tunable Oscillator via Digital Harmonic-Cancellation Technique," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 1061-1071, May 2010.
- [22] X. Wu, J. Trogolo, F. Inoue, Z. Chen, P. Jones-Williams, I. Khan, P. Madhani, "Impact of Sinter Process and Metal Coverage on Transistor Mismatching and Parameter Variations in Analog CMOS Technology," in *Proc. IEEE Int. Microelectronic Test Structure Conf. (ICMTS '07)*, pp. 69-73.
- [23] H. P. Tuinhout, M. Vertregt, "Characterization of Systematic MOSFET Current Factor Mismatch Caused by Metal CMP Dummy Structures," *IEEE Trans. on Semiconductor Manufacturing*, vol. 14, no. 4, pp. 302-310, Nov. 2001.
- [24] M. Lee, A. Abidi, "A 9 b, 1.25 ps Resolution Coarse–Fine Time-to-Digital Converter in 90 nm CMOS That Amplifies a Time Residue," *IEEE J. of Solid-State Circuits*, vol. 43, no. 4, pp. 769-777, Apr. 2008.
- [25] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, D. Schmitt-Landsiedel, "A Local Passive Time Interpolation Concept for Variation-Tolerant High-Resolution Time-to-Digital Conversion," *IEEE J. of Solid-State Circuits*, vol. 43, no. 7, pp. 1666-1676, Jul. 2008.
- [26] S. Henzler, S. Koeppe, W. Kamp, H. Mulatz, D. Schmitt-Landsiedel, "90nm 4.7ps-Resolution 0.7-LSB Single-Shot Precision and 19pJ-per-Shot Local Passive Interpolation Time-to-Digital Converter with On-Chip Characterization," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2008, pp. 548-549.

- [27] K. Karadamoglou, N. Paschalidis, E. Sarris, N. Stamatopoulos, G. Kottaras, V. Paschalidis, "An 11-bit High-Resolution and Adjustable-Range CMOS Time-to-Digital Converter for Space Science Instruments," *IEEE J. of Solid-State Circuits*, vol. 39, no. 1, pp. 214-222, Jan. 2004.
- [28] A. Mantyniemi, T. Rahkonen, J. Kostamovaara, "A CMOS Time-to-Digital Converter (TDC) Based On a Cyclic Time Domain Successive Approximation Interpolation Method," *IEEE J. of Solid-State Circuits*, vol. 44, no. 11, pp. 3067-3078, Nov. 2009.
- [29] J. Wibbenmeyer, C. H. Chen, "Built-In Self-Test for Low-Voltage High-Speed Analog-to-Digital Converters," *IEEE Trans. on Instrumentation and Measurement*, vol. 56, no. 6, pp. 2748 – 2756, Dec. 2007.
- [30] D. Dallet, D. Slepicka, Y. Berthoumieu, D. Haddadi, P. Marchegay, "[ADC Characterization in Time Domain] Frequency Estimation to Linearize Time-Domain Analysis of A/D Converters," *IEEE Trans. on Instrumentation and Measurement*, vol. 55, no.5, pp. 1536 – 1545, Oct. 2006.
- [31] H. Ting, C. Lin, B. Liu, S. Chang, "Reconstructive Oscillator Based Sinusoidal Signal Generator for ADC BIST," in *Proc. IEEE Asian Solid-State Circuits Conference*, pp. 65-68, Nov. 2005.
- [32] J. Blair, "Histogram Measurement of ADC Nonlinearities Using Sine Waves," *IEEE Trans. on Instrumentation and Measurement*, vol. 43, no. 3, pp. 373–383, June 1994.

- [33] M. Méndez-Rivera, A. Valdes-Garcia, J. Silva-Martinez and E. Sánchez-Sinencio, "An On-Chip Spectrum Analyzer for Analog Built-In Testing," *J. Electronic Testing: Theory and Applications*, vol. 21, no. 3, pp. 205-219, June 2005.
- [34] B. Dufort, G.W. Roberts, "On-Chip Analog Signal Generation for Mixed-Signal Built-In Self-Test," *IEEE J. Solid-State Circuits*, vol. 34, no. 3, pp. 318-330, Mar. 1999.
- [35] B. Dufort, G.W. Roberts, "Increasing the Performance of Arbitrary Waveform Generators Using Sigma-Delta Coding Techniques," in Proc. IEEE International Test Conference, pp. 241-248, Oct. 1998.
- [36] J. Wibbenmeyer, C. H. Chen, "Built-In Self-Test for Analog-to-Digital Converters in SoC Applications," in *Proc. IEEE Industrial Electronics Conf. (IECON05)*, pp. 2231-2236, Nov. 2005.
- [37] IEEE-Std-1241—Standard for Terminology and Test Methods for Analog to Digital Converters, June 2001.
- [38] S. Pavan and Y. Tsividis, "An Analytical Solution for a Class of Oscillators, and Its Application to Filter Tuning," *IEEE Trans. Circuits & System I*, vol. 45, no. 5, pp. 547–556, May 1998.
- [39] A. Sedra, K. C. Smith, *Microelectronic Circuits*, 5<sup>th</sup> ed. New York: Oxford University Press, 2004.
- [40] J. L. Ausin, J. F. Duque-Carillo, G. Torelli, M. A. Dominguez "A Design Strategy for Area Efficient High-Order High-Q SC Filters," in *Proc of International Symposium on Circuits and Systems (ISCAS'04)* vol. 1, pp. I-85 – I-88, May 2004.

- [41] F. Bahmani, E. Sánchez-Sinencio, "Low THD Bandpass-Based Oscillator Using Multilevel Hard Limiter," *IET Circuits, Devices and Systems*, vol. 1, no. 2, pp. 151-160, April 2007.
- [42] S. W. Park, J. L. Ausin, F. Bahmani, E. Sánchez-Sinencio, "Nonlinear Shaping SC Oscillator with Enhanced Linearity," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2421-2431, Nov. 2007.
- [43] E. Seok, C. Cao, D. Shim, D. J. Arenas, D. B. Tanner, C. Hung, K. K. O, "A 410GHz CMOS Push-Push Oscillator with an On-Chip Patch Antenna," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2008, pp. 472-473.
- [44] Y. Tang, H. Wang, "Triple-Push Oscillator Approach: Theory and Experiments," *IEEE J. Solid-State Circuits*, vol. 36, no. 10, pp. 1472–1479, Oct. 2001.
- [45] S. C. Tang, G. T. Clement, "A Harmonic Cancellation Technique for an Ultrasound Transducer Excited by a Switched-Mode Power Converter," *in Proc. IEEE Ultrasonics Symposium*, pp. 2076-2079, Nov. 2008.
- [46] M. A. El-Moursy, E. G. Friedman, "Exponentially Tapered H-Tree Clock Distribution Networks," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 8, pp. 971-975, Aug. 2005.
- [47] Y. Byung-Do, J. H. Choi, H. Seon-Ho, K. Lee-Sup, and Y. Hyun-Kyu, "An 800-MHz Low-Power Direct Digital Frequency Synthesizer with an On-Chip D/A Converter," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 761–774, May 2004.

- [48] J. Galan, R. G. Carvajal, A. Torralba, F. Muñoz, J. Ramirez-Angulo, "A Low-Power Low-Voltage OTA-C Sinusoidal Oscillator with a Large Tuning Range," *IEEE Trans. on Circuits and Systems I*, vol. 52, no. 2, pp. 283-291, Feb. 2005.
- [49] A. Mohieldin, A. Emira, and E. Sánchez-Sinencio, "A 100 MHz 8mW ROM-Less Quadrature Direct Digital Frequency Synthesizer," *IEEE J. Solid-State Circuits*, vol. 37, no. 10, pp. 1235–1243, Oct. 2002.
- [50] X. Gao, E. Klumperink, G. Socci, M. Bohsali, B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Exploiting A Sub-Sampling Phase Detector," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1809-1821, Sept. 2010.
- [51] Z. Cao, Y. Li, S. Yan, "A 0.4 ps-RMS-Jitter 1–3 GHz Ring-Oscillator PLL Using Phase-Noise Preamplification," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2079 - 2089, Sept. 2008.
- [52] K. J. Wang, A. Swaminathan and I. Galton, "Spurious Tone Suppression Techniques Applied to a Wide-Bandwidth 2.4 GHz Fractional-N PLL," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2008, pp. 342-343.
- [53] C.-F. Liang, S.-H. Chen and S.-I. Liu, "A Digital Calibration Technique for Charge Pumps in Phase-Locked Systems," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 390 - 398, Feb. 2008.
- [54] S. Pellerano, S. Levantino, C. Samori, A. Lacaita, "A 13.5-mW 5-GHz Frequency Synthesizer With Dynamic Logic Frequency Divider," *IEEE J. of Solid-State Circuits*, vol. 39, no. 2, pp. 378-383, Feb. 2004.

- [55] E. Tang, M. Ismail, S. Bibyk "A New Fast-Settling Gearshift Adaptive PLL to Extend Loop Bandwidth Enhancement in Frequency Synthesizers," in *Proc of International Symposium on Circuits and Systems (ISCAS) Dig. Tech. Papers*, 2002, pp. IV-787 - IV-790.
- [56] C. Charles, D. Allstot, "A Calibrated Phase/Frequency Detector for Reference Spur Reduction in Charge-Pump PLLs," *IEEE Trans. Circuits and Systems II*, vol. 53, no. 9, pp. 822 - 826, Nov. 2006.
- [57] C. Kuo, J. Chang, S. Liu, "A Spur-Reduction Technique for a 5-GHz Frequency Synthesizer," *IEEE Trans. Circuits and Systems I*, vol. 53, no. 3, pp. 526 - 533, Mar. 2006.
- [58] T. Lee, W. Lee, "A Spur Suppression Technique for Phase-Locked Frequency Synthesizers," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006, pp. 2432-2441.
- [59] K. J. Wang, A. Swaminathan and I. Galton, "Spurious Tone Suppression Techniques Applied to a Wide-Bandwidth 2.4 GHz Fractional-N PLL," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2787 - 2797, Dec. 2008.

### VITA

Mohamed Mostafa Elsayed received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Cairo, Egypt, in 2002 and 2005, respectively. In 2005, he joined the Analog and Mixed Signal Center (AMSC), Texas A&M University, to pursue the Ph.D. degree, which he received in December 2011.

During summer 2001, he interned at Saarland University, Saarbrücken, Germany, where he worked on multi-threshold and dynamic-threshold digital circuits. He was a teaching and research assistant at Cairo University from 2002 to 2005. In summer and fall 2009, he was with Texas Instruments Inc., Dallas, TX, as a design intern where he designed a low-power DAC. His research interests include time-to-digital converters, phase-locked-loops, and digitally enhanced analog circuits.

Mr. Elsayed can be reached through the Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843.