# DESIGN OF CMOS INTEGRATED PHASE-LOCKED LOOPS FOR MULTI-GIGABITS SERIAL DATA LINKS

A Dissertation

by

SHANFENG CHENG

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

December 2006

Major Subject: Electrical Engineering

Approved by:

| Chair of Committee, | Jose Silva-Martinez    |
|---------------------|------------------------|
| Committee Members,  | Aydin Ilker Karsilayan |
|                     | Peng Li                |
|                     | Manuel P. Soriaga      |
| Head of Department, | Costas Georghiades     |


### ABSTRACT

Design of CMOS Integrated Phase-locked Loops for Multi-Gigabits Serial Data Links. (December 2006) Shanfeng Cheng, B.Sc., Fudan University; M.Sc., Fudan University

Chair of Advisory Committee: Dr. Jose Silva-Martinez

High-speed serial data links have been quickly gaining popularity in recent years and are replacing conventional parallel data links as communication data rates exceed one gigabit per second. Compared with parallel data links, serial data links can achieve higher data rates and longer transfer distances. This dissertation focuses on the design of CMOS integrated phase-locked loops (PLLs) and relevant building blocks used in multi-gigabits serial data link transceivers.

Firstly, binary phase-locked loops (BPLLs, i.e., PLLs based on binary phase detectors) are modeled and analyzed. The steady-state behavior of BPLLs is derived with combined discrete-time and continuous-time analysis. The jitter performance characteristics of BPLLs are analyzed. Secondly, a 10 Gbps clock and data recovery (CDR) chip for SONET OC-192, the mainstream standard for optical serial data links, is presented. The CDR is based on a novel referenceless dual-loop half-rate architecture. It includes a binary phase-locked loop based on a quad-level phase detector and a linear frequency-locked loop based on a linear frequency detector. The proposed architecture enables the CDR to achieve large locking range and small jitter generation at the same time. The prototype is implemented in 0.18 $\mu$m CMOS technology and consumes 250 mW under 1.8 V supply. The jitter generation is 0.5 ps-rms and 4.8 ps-pp. The jitter peaking and jitter tolerance performance exceeds the specifications defined by SONET OC-192 standard. Thirdly, a fully-differential divide-by-eight injection-locked frequency divider with low power dissipation is presented. The frequency divider consists of a four-stage ring of CML (current mode logic) latches. It has a maximum operating frequency of 18 GHz. The ratio of locking range over center frequency is up to 50%. The prototype chip is implemented in 0.18 $\mu$m CMOS technology and consumes 3.6 mW under 1.8 V supply. Lastly, the design and optimization techniques of fully differential charge pumps are discussed. Techniques are proposed to minimize the nonidealities associated with a fully differential charge pump, including differential mismatch, output current variation, low-speed glitches and high-speed glitches. The performance improvement brought by the techniques is verified with simulations of schematics designed in 0.35 $\mu$m CMOS technology.

# DEDICATION

To the memory of my grandfather

#### ACKNOWLEDGMENTS

Firstly, I'd like to thank my advisor, Dr. Jose Silva-Martinez, for his enlightening guidance and advice on my research during my Ph.D. study at the Analog & Mixed-signal Center. He showed me the art of analog integrated circuit design through his in-depth knowledge and pioneering expertise. He was always accessible and willing to answer my questions. I was deeply affected by his serious attitude and full devotion to scientific research. I also would like to thank Dr. Edgar Sanchez-Sinencio, the director of the Analog & Mixed-signal Center, for what I learned in his courses. His courses were the most difficult yet the most interesting ones I've ever had. What he taught will be useful throughout my career as an analog IC design engineer. I'd like to thank Dr. Aydin Ilker Karsilayan for his valuable input to the design of the SONET OC-192 CDR and his valuable comments on my publications.

I'd like to thank my friend, Haitao Tong, for the countless discussions which inspired my enthusiasm, my interest and my ideas. He is the most helpful and unselfish person I've ever met. His earnestness and meticulousness with academic research often stimulated me to work hard towards the goal instead of slacking off. I would not have gone this far in my Ph.D. research without the contributions and help from Haitao. I also would like to thank my mentor, Mr. Hui Pan, at Broadcom for guidance and discussions during my internship at Broadcom. I'd like to thank my friend, Jianhong Xiao, for interesting discussions and valuable input to my research. I'd like to thank all the labmates at the Analog & Mixed-signal Center for useful discussions and sharing ideas.

I would like to thank my wife, Yingying Chen, for her constant love, care and support. She gave me the encouragement and motivation to make it through all the hard work in all the years during my Ph.D. study. I'd like to thank my father and mother for being such wonderful parents. What they taught me in my early childhood has been, and will continue to be, invaluable lessons throughout my life. My gratitude for them is beyond words and will last forever.

I would like to thank Dr. Peng Li, Dr. Henry F. Taylor, and Dr. Soriaga for serving as my committee members and for their valuable input and suggestions.

I would like to thank MOSIS for manufacturing the prototype chips. I would like to thank Broadcom Corp. for providing testing equipment and facilities.

## TABLE OF CONTENTS

| Section | Title |
|---------|-------|
|         | ABSTRACT |
|         | DEDICATION |
|         | ACKNOWLEDGMENTS |
|         | TABLE OF CONTENTS |
|         | LIST OF FIGURES |
|         | LIST OF TABLES |
| I       | INTRODUCTION |
| I.1.    | Application Background |
| I.2.    | Architectures of Serial Data Link Transceivers |
| I.3.    | Research Focus |
| II      | MODELING AND ANALYSIS OF PHASE-LOCKED LOOPS BASED ON BINARY PHASE DETECTORS |
| II.1.   | Introduction |
| II.2.   | Steady-State Analysis of BPLL |
| II.3.   | Jitter Analysis |
| II.4.   | Summary |
| III     | A 10GBPS CDR FOR SONET OC-192 STANDARD |
| III.1.  | Introduction to Optical Transceivers |
| III.2.  | Existing CDR Architectures |
| III.3.  | Proposed Solution |
| III.4.  | Conclusion |
| IV      | A FULLY-DIFFERENTIAL LOW-POWER DIVIDE-BY-8 INJECTION-LOCKED FREQUENCY DIVIDER UP TO 18GHZ |
| IV.1.   | Introduction |
| IV.2.   | Conventional Frequency Dividers |
| IV.3.   | Proposed Divide-by-8 ILFD |
| IV.4.   | Measurement Results |
| IV.5.   | Conclusion |
| V       | DESIGN AND ANALYSIS OF HIGH-SPEED GLITCH-FREE FULLY DIFFERENTIAL CHARGE PUMP WITH MINIMUM CURRENT MISMATCH AND VARIATION |
| V.1.    | Introduction |
| V.2.    | Fully Differential Charge Pump with Accurate Matching and Minimum Current Variation |
| V.3.    | Glitch Suppression |
| V.4.    | Complete Implementation of the Charge Pump |
| V.5.    | Conclusion |
| VI      | SUMMARY AND CONCLUSIONS |
|         | REFERENCES |
|         | VITA |

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| Fig. 1.1. | Comparison of a parallel ATA cable and a serial ATA cable |
| Fig. 1.2. | Typical architecture of a single-channel transceiver |
| Fig. 1.3. | Typical architecture of a multiple-channel transceiver |
| Fig. 2.1. | Transfer characteristic of an ideal BPD |
| Fig. 2.2. | Phase-domain model of PLL based on binary PD |
| Fig. 2.3. | Schematic of a first-order loop filter |
| Fig. 2.4. | Steady-state waveforms of BPLL with 1st-order filter |
| Fig. 2.5. | POUT0 and POUT1 across all the modes for BPLL with 1st order filter |
| Fig. 2.6. | Schematic of the second order loop filter |
| Fig. 2.7. | Steady-state waveforms of BCDR with C1=10C2 and TP=2TS |
| Fig. 2.8. | POUT0 and POUT1 vs. oscillation periods for BPLL with 2nd order filter |
| Fig. 2.9. | Mapping relationship between bit patterns and jitter values |
| Fig. 2.10. | JISI with 10 Gb/s PRBS applied to a LPF of 4 GHz BW |
| Fig. 2.11. | JISI amplitude with different number of buffer stages and bandwidth |
| Fig. 2.12. | Transfer characteristic of a gradual-switching BPD |
| Fig. 2.13. | Phase-sweeping characteristic of Alexander's BPD |
| Fig. 2.14. | Input / output jitter waveforms in the slewing region |
| Fig. 2.15. | Illustrative plot for BPLL jitter peaking |
| Fig. 2.16. | Simulated waveforms of BPLL with jitter peaking |
| Fig. 2.17. | BPLL input-to-output jitter transfer characteristic |
| Fig. 2.18. | Simulated BPLL waveforms with 20 kHz and 144 UI sinusoidal input jitter |
| Fig. 2.19. | Jitter tolerance mask of a BPLL |
| Fig. 2.20. | Simulated waveforms of BCDR using an ABPD with 0.02 UI IRJISI |
| Fig. 2.21. | Simulated waveforms of BCDR using GBPD (KT=80) with 0.02 UI IRJISI |
| Fig. 2.22. | Simulated waveforms of BPLL using ABPD and GBPD |
| Fig. 2.23. | Structure modification to minimize jitter generation with GBPD |
| Fig. 2.24. | Simulated and predicted jitter generation caused by 0.01 UI JVCO |
| Fig. 3.1. | Block diagram of a typical optical transceiver |
| Fig. 3.2. | Block diagram of a single-loop CDR |
| Fig. 3.3. | Block diagram of a dual-loop CDR with external reference |
| Fig. 3.4. | Structure of the referenceless CDR proposed in [25] |
| Fig. 3.5. | Block diagram of the proposed CDR |
| Fig. 3.6. | Block diagram of the QPD |
| Fig. 3.7. | Timing diagram of CK90 and DIN under different phase errors |
| Fig. 3.8. | Output waveforms of the QPD with fixed input frequency difference |
| Fig. 3.9. | Schematic of half-rate binary PD |
| Fig. 3.10. | Schematic of double-edge D-flipflop |
| Fig. 3.11. | Modified double-edge D-flipflop with inversion on rising edge sampling |
| Fig. 3.12. | Frequency detector based on unbalanced quadri-correlator |
| Fig. 3.13. | Frequency detector based on balanced quadri-correlator |
| Fig. 3.14. | Block diagram of the linear FD |
| Fig. 3.15. | Timing diagram of the LFD |
| Fig. 3.16. | Timing diagram of LFD when $t_d > T_b/4$ and $\Delta f < 0$ |
| Fig. 3.17. | Block diagram of the revised LFD |
| Fig. 3.18. | Timing diagram of the modified LFD when $\Delta f < 1/(2t_d)$ |
| Fig. 3.19. | Timing diagram of the modified LFD when $1/t_d > \Delta f > 1/(2t_d)$ |
| Fig. 3.20. | Transfer curve of the LFD before and after modification |
| Fig. 3.21. | The block diagram of VCO and its drivers |
| Fig. 3.22. | Internal schematic of each VCO stage |
| Fig. 3.23. | The phase noise of the VCO with post-layout parasitics |
| Fig. 3.24. | Tuning curve of the VCO when temp=$50^{\circ}$C, VCX=0.6 V |
| Fig. 3.25. | Schematic of the quad-level PCP |
| Fig. 3.26. | Schematic of the opamp used for mismatch control in the QCP |
| Fig. 3.27. | Output current of the QCP without mismatch and variation control |
| Fig. 3.28. | Output current of the QCP with mismatch control only |
| Fig. 3.29. | Output current of the QCP with both mismatch control and variation control |
| Fig. 3.30. | Transient waveforms of the QCP with LPE at different frequency offset |
| Fig. 3.31. | Schematic of the tri-state FCP |
| Fig. 3.32. | DC output current of the FCP |
| Fig. 3.33. | Transient output current of the FCP when $\Delta f=\pm 1$ GHz |
| Fig. 3.34. | Transient waveforms of the CDR macro model during the locking process |
| Fig. 3.35. | Transient waveforms of the transistor-level CDR during the locking process |
| Fig. 3.36. | Jitter generation of the CDR in locked state |
| Fig. 3.37. | Jitter tolerance of the CDR versus SONET jitter tolerance mask |
| Fig. 3.38. | Micro-photo of the CDR prototype chip |
| Fig. 4.1. | Conventional divide-by-8 CML static frequency divider |
| Fig. 4.2. | Ring-oscillator based ILFD proposed in [33] |
| Fig. 4.3. | Single-ended divide-by-2 ILFD proposed in [34] |
| Fig. 4.4. | Divide-by-2 ILFD based on LC oscillator proposed in [35] |
| Fig. 4.5. | Schematic of the proposed divide-by-8 LILFD |
| Fig. 4.6. | D-Latch cell used in each stage of the LILFD |
| Fig. 4.7. | Timing diagram of the input and output signals of the LILFD |
| Fig. 4.8. | Timing diagrams at the boundaries of the locking range |
| Fig. 4.9. | Simulated locking range vs. differential input amplitude when VBP=0.3 V |
| Fig. 4.10. | Locking range of the LILFD under different bias conditions |
| Fig. 4.11. | The input sensitivity of the LILFD versus input frequency when VBP=0.3 V |
| Fig. 4.12. | Normalized phase error vs. input frequency for the LILFD |
| Fig. 4.13. | Test setup of ILFD chip |
| Fig. 4.14. | Locking range of the LILFD with 3 dBm input power vs. VBP |
| Fig. 4.15. | Output signal spectrum of the LILFD when locked at 17.6 GHz |
| Fig. 4.16. | Measured output phase noise of the LILFD when locked at 17.6 GHz |
| Fig. 4.17. | Measured locking range of the LILFD vs. input power when VBP=0.3 V |
| Fig. 4.18. | Measured input sensitivity versus input frequency when VBP=0.3 V |
| Fig. 4.19. | Die photo of the LILFD prototype chip |
| Fig. 5.1. | Conceptual diagram of a differential charge pump |
| Fig. 5.2. | Proposed fully differential charge pump with mismatch suppression |
| Fig. 5.3. | CMFB circuit for the differential charge pump |
| Fig. 5.4. | Output currents with and without mismatch suppression |
| Fig. 5.5. | Variation suppression circuit |
| Fig. 5.6. | Charge pump output current with and without variation suppression |
| Fig. 5.7. | Transient waveforms of the NMOS diff. pair with fast input signal |
| Fig. 5.8. | Proposed low-speed glitch suppression circuit |
| Fig. 5.9. | Common source node voltage and output current of the charge pump with and without low-speed glitch suppression circuit |
| Fig. 5.10. | Proposed high-speed glitch suppression circuit |
| Fig. 5.11. | Output current with and without suppression of high-speed glitch |
| Fig. 5.12. | Complete schematic of the proposed fully differential charge pump |

# LIST OF TABLES

| Table | Caption |
|-------|---------|
| Table 3.1. | Data rates and frame formats supported by SONET |
| Table 3.2. | Mapping relationship between phase error, QPD output and QCP output |
| Table 3.3. | The range of frequency and amplitude of the VCO under different corners |
| Table 3.4. | The range of frequency and amplitude of the VCO at different temperature |
| Table 3.5. | PSRR of the VCO at different operating frequencies |
| Table 3.6. | Simulated full-chip performance summary of the CDR |
| Table 3.7. | Performance comparison between this work and existing solutions |
| Table 4.1. | Performance comparison between the LILFD and existing solutions |

### CHAPTER I

### INTRODUCTION

### I.1. Application Background

High-speed serial data links have been quickly gaining popularity in recent years. They are taking the place of traditional parallel data links because serial data links enable data to be transmitted at higher data rates and over longer distances. Examples of traditional parallel data links include LPT (Line Printer Terminal, usually used to connect printers), Parallel ATA (Advanced Technology Attachment, commonly known as the IDE interface, usually used to connect hard drives), SCSI (Small Computer System Interface, usually used to connect hard disks and scanners), etc. Parallel data links were widely used a few years ago. Compared with serial data links, parallel data links can directly transmit data in terms of bytes or words without the overhead of assembling and disassembling. However, it is hard for parallel data links to achieve data rates higher than one gigabit per second. Above that rate, it is difficult to match the channel delay and synchronize the signals on the different channels of the parallel link. In comparison, serial data links can easily achieve data rates of several gigabits per second since the entire data stream is sent over a single channel and does not suffer from this synchronization problem. Although serial data links can also have multiple channels, the different channels carry independent data streams and do not have to be synchronized. As an additional merit, the cables for serial data links are more compact in form factor than cables for parallel data links and take less space in systems with a limited space budget. Popular multi-gigabits serial data link applications or standards based on serial cables include Serial ATA [1], Serial SCSI [2], SONET/STM [3], 10 Gigabit Ethernet, 10 Gb/s Fibre Channel, etc. A comparison of a Parallel ATA cable and a Serial ATA cable is shown in Fig. 1.1 to illustrate the difference in form factor. The Parallel ATA cable has 80 pins while the Serial ATA cable has only 7 pins. Another type of serial data link is the backplane transceiver, in which data is transmitted via metal traces on a printed-circuit board (PCB) between different components on the same board. In this case, no additional cable is needed. Serial data links based on on-board traces include PCI Express [4], HyperTransport [5], RapidIO [6], etc.

This dissertation follows the style of IEEE Journal of Solid-State Circuits.



Fig. 1.1. Comparison of a parallel ATA cable and a serial ATA cable

### I.2. Architectures of Serial Data Link Transceivers

Based on the number of channels, serial data link transceivers can be categorized into single-channel and multiple-channel transceivers. Single-channel transceivers are usually implemented with the architecture shown in Fig. 1.2. In the transmitter, a clock generator PLL is used to generate the transmitter clock. A multiplexer (MUX) unit driven by the transmitter clock is employed to assemble low-speed parallel data streams into a high-speed serial data stream. The serial data stream is sent into the link channel via the transmitter driver (TX Driver). In the receiver, the signal picked up from the channel is first amplified to full scale by the front-end amplifier. After that, the analog CDR (clock and data recovery) module recovers the clock from the incoming data stream and retimes the data with the recovered clock. The implementation of the CDR here is based on PLLs with analog loop filters. The retimed data stream is then split into parallel low-speed data streams via the demultiplexer block (Demux). This architecture is not suitable for multiple-channel transceivers, mainly because the transmitter and the receiver have separate PLLs and VCOs. When multiple transmitters and receivers are integrated on the same chip, the VCOs will pull each other because they are not synchronized in phase or frequency. As a result, the PLLs may not achieve lock or stay locked properly.
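The MUX and Demux roles described above can be illustrated with a minimal sketch. The function names `serialize` and `deserialize`, the word width, and the MSB-first bit order are assumptions made for illustration only; the dissertation does not specify a bit ordering.

```python
from typing import List


def serialize(words: List[int], width: int) -> List[int]:
    """Assemble parallel words into one serial bit stream (MSB first, assumed)."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits: List[int], width: int) -> List[int]:
    """Split a serial bit stream back into parallel words of the given width."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


if __name__ == "__main__":
    stream = serialize([0xA5, 0x3C], width=8)
    print(stream)
    print([hex(w) for w in deserialize(stream, width=8)])
```

In hardware the serial stream runs at `width` times the parallel word rate, which is why the MUX must be driven by the high-speed transmitter clock from the PLL.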

The typical architecture for multiple-channel serial data link transceivers is shown in Fig. 1.3. Four channels of transmitters and receivers are shown as an example, and they are usually integrated on the same chip in practical implementations. A single PLL is employed to generate the clock signal that drives the transmitters and receivers in all the channels. The transmitter structure here is similar to the transmitter structure shown in the left part of Fig. 1.2. Meanwhile, the receiver generates the recovered clock and the retimed data from the incoming data, making use of the same clock provided by the PLL. The recovered clock is generated from the clock provided by the PLL via dynamic phase interpolation, automatically performed by a delay-locked loop (DLL) inside the receiver. The PLL used in multiple-channel transceivers must be able to generate at least four clock phases, which is the fundamental requirement of the phase-interpolation operation. Since there is only one VCO within the entire multiple-channel transceiver, no frequency pulling issues occur.
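The role of the four clock phases can be sketched with a simple phasor model, assuming the phases sit at 0°, 90°, 180° and 270° and that the interpolator forms a linearly weighted sum of two adjacent phases. The function name `interpolated_phase_deg` and the linear weighting are illustrative assumptions, not the circuit used in this work.

```python
import math


def interpolated_phase_deg(quadrant: int, alpha: float) -> float:
    """Phase (degrees) of (1 - alpha) * phase_a + alpha * phase_b.

    phase_a is the clock at quadrant * 90 degrees and phase_b is the next
    quadrature phase; alpha in [0, 1] is the interpolation weight.
    """
    p0 = math.radians(90.0 * (quadrant % 4))
    p1 = p0 + math.pi / 2.0
    # Add the two weighted clocks as phasors and read off the resulting angle.
    x = (1.0 - alpha) * math.cos(p0) + alpha * math.cos(p1)
    y = (1.0 - alpha) * math.sin(p0) + alpha * math.sin(p1)
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    for q in range(4):
        for a in (0.0, 0.25, 0.5, 0.75):
            print(q, a, round(interpolated_phase_deg(q, a), 2))
```

With four quadrature phases, picking the quadrant and sweeping `alpha` covers the full 360° range, so a DLL can rotate the recovered clock phase without a second VCO. In this simple model the output phase is not exactly linear in `alpha`, since it follows `atan2(alpha, 1 - alpha)` within each quadrant.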



Fig. 1.2. Typical architecture of a single-channel transceiver



Fig. 1.3. Typical architecture of a multiple-channel transceiver

### I.3. Research Focus

In serial data link transceivers, the PLL is the most critical building block due to stringent performance requirements. In the transmitter, the PLL is implemented as a frequency synthesizer which generates the transmitter clock from a crystal reference signal. In the receiver, the PLL takes the form of a CDR which extracts the recovered clock and the retimed data from the received signal. In serial data link transceivers, the most important performance index for the PLL is jitter in the time domain rather than phase noise in the frequency domain. The time-domain jitter must be kept within a certain limit to maximize the eye opening of the transmitted data and minimize the bit error rate (BER) of the received data. The jitter performance indexes include jitter generation, jitter tolerance to random jitter or deterministic jitter, jitter transfer bandwidth, jitter peaking, etc. In addition, the power dissipation of the PLLs used in serial data links must be minimized while still meeting the data rate requirements.
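As a rough illustration of how rms and peak-to-peak jitter figures are obtained from time-domain data, the sketch below computes both from a list of clock edge timestamps. It assumes the ideal clock period is known and removes only the mean offset; the function name `jitter_metrics` and the sample values are hypothetical, and a real measurement would usually also estimate the period from the data.

```python
from statistics import pstdev
from typing import List, Tuple


def jitter_metrics(edge_times: List[float], period: float) -> Tuple[float, float]:
    """Return (rms jitter, peak-to-peak jitter) of the time interval error.

    The time interval error of edge n is its deviation from the ideal
    instant n * period, with the average offset removed.
    """
    tie = [t - n * period for n, t in enumerate(edge_times)]
    mean = sum(tie) / len(tie)
    tie = [x - mean for x in tie]
    return pstdev(tie), max(tie) - min(tie)


if __name__ == "__main__":
    # Hypothetical edge times in units of the clock period.
    rms, pp = jitter_metrics([0.0, 1.01, 1.99, 3.0], period=1.0)
    print(rms, pp)
```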

This dissertation is focused on the design of CMOS integrated phase-locked loops for applications in serial data link transceivers. The research targets both the optimization of system architectures and the innovative design of various building blocks inside the PLL, e.g., phase detectors, frequency detectors, charge pumps, voltage-controlled oscillators and frequency dividers. The chip prototypes are implemented in standard CMOS processing technology since it is compatible with digital circuits and reduces the manufacturing cost. Chapter II covers the modeling and analysis of PLLs using binary phase detectors. It serves as the theoretical foundation for Chapter III, which presents the design of a 10 Gbps CDR for SONET OC-192. Chapter IV presents the design of a fully differential injection-locked frequency divider with high operating frequency (up to 18 GHz) and low power dissipation. Chapter V discusses design and optimization techniques for high-speed fully differential charge pumps. Chapter VI summarizes the dissertation and draws conclusions.

### CHAPTER II

### MODELING AND ANALYSIS OF PHASE-LOCKED LOOPS BASED ON BINARY PHASE DETECTORS

### II.1. Introduction

Phase-locked loops (PLLs) using binary phase detectors (BPDs) are receiving more attention with the ever-increasing demand for higher operating frequencies and data rates. A BPD, also called a bang-bang phase detector (PD), outputs a high or low level depending on the sign of the input phase difference, as shown in Fig. 2.1. The advantage of a BPD over a linear phase detector (LPD) is that it can operate at a much higher speed without suffering from dead-zone problems or component mismatches [7]. PLLs based on BPDs, i.e., binary PLLs (BPLLs), have found many applications in systems that require an ultra-high-speed reference input signal with a frequency comparable to the VCO frequency. Examples include multi-gigahertz clock multipliers [8], optical receivers (STM, SONET) [7] [9], and high-speed serial data links (SATA, PCI Express) [10].

A BPLL is a nonlinear system because the BPD has a nonlinear phase-to-voltage transfer characteristic. It is a hybrid between a continuous-time system and a discrete-time system because the loop filter and VCO behave as continuous-time modules while the BPD works by discrete-time sampling. Many efforts have been made to investigate the nonlinear loop dynamics of BPLLs [11]-[13]. However, the existing models and analyses are incomplete, not very accurate, and do not provide enough insight and guidelines for IC designers. [11] and [12] mainly focus on the characterization of the transfer and tolerance properties of BPLLs in response to large sinusoidal input jitter, without detailed analysis of the steady-state behavior of the loop itself. [13] focuses on a fully digital BPLL implementation with a pure discrete-time iterative method; it does not, however, discuss BPLLs using analog filters, which are more prevalent in practical applications. The condition for zero jitter peaking is not discussed in [11]-[13]. Also, the existing literature contains no detailed discussion of jitter caused by intersymbol interference (ISI), which is the dominant source of jitter generation in multi-gigabits binary CDRs. Moreover, the condition to limit jitter peaking within a certain level has not been investigated, which makes it hard for designers to choose the minimum filter capacitance to achieve minimum silicon area and maximum level of integration.

The nonlinear loop dynamics of the BPLL are modeled and analyzed in full detail in this work by combining discrete-time and continuous-time analysis. The steady-state waveforms of BPLLs using 1<sup>st</sup>- and 2<sup>nd</sup>-order loop filters in a jitter-free environment are derived in Section II.2. The existence of multiple oscillation modes is revealed, and the most stable oscillation mode is determined by evaluating the tolerance against random jitter disturbance. Section II.3 focuses on the jitter performance of BPLLs. First, the BPD and jitter due to ISI (JISI) are modeled. After that, the jitter transfer bandwidth, jitter peaking and jitter tolerance mask of BPLLs are characterized. Lastly, jitter generation due to JISI and VCO phase noise is analyzed. Section II.4 draws the conclusions of this analysis.



Fig. 2.1. Transfer characteristic of an ideal BPD



Fig. 2.2. Phase-domain model of PLL based on binary PD

### II.2. Steady-State Analysis of BPLL

A BPLL is a phase-locked loop which detects the phase difference between the reference signal and the feedback signal using a BPD. The reference signal can be a periodic signal or a random bit sequence. In particular, when the incoming signal is random data, the BPLL becomes a binary CDR (BCDR). The transfer characteristic of an ideal BPD is shown in Fig. 2.1. The output of a BPD switches between low and high level depending on the sign of the input phase difference. The phase-domain block diagram of a BPLL is shown in Fig. 2.2. $H_{LP}(s)$ models the transfer function of the loop filter. $I_{CP}$ is the charge pump (CP) output current. $K_{VCO}$ is the VCO gain. $I_{CP}$ and $K_{VCO}$ will be abbreviated as $I$ and $K$ in this work to shorten long equations. The loop delay cell models the lumped delay ($t_d$) caused by all the building blocks within the loop. The frequency divider module is optional depending on the actual implementation. For simplicity, it will be ignored in the following analysis since it is just a gain factor in the phase domain. The phase-domain model in Fig. 2.2 is implemented as a behavioral prototype with Simulink modules in Matlab to verify the correctness of the derived expressions. Steady-state analyses of a BPLL with 1<sup>st</sup>-order and 2<sup>nd</sup>-order filters are presented in turn.
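A minimal Python sketch of such a behavioral phase-domain model is given below. It is not the Simulink prototype used in this work: it assumes a series R-C loop filter, $H_{LP}(s) = R + 1/(sC)$, zero input phase (no input jitter), a continuous sign comparison in place of a sampled BPD, and forward-Euler time stepping. All component values, the function names `simulate_bpll` and `measure_period_and_peak`, and the units of `K` (rad/s per volt) are hypothetical.

```python
from collections import deque


def simulate_bpll(I=100e-6, R=1e3, C=10e-12, K=6.0e9, t_d=100e-12,
                  steps_per_delay=100, n_delays=400):
    """Time-step a bang-bang loop; returns (times, output phases)."""
    dt = t_d / steps_per_delay
    # FIFO holding the output phase over the last t_d (the loop delay cell).
    history = deque([0.0] * steps_per_delay)
    p_out, v_cap = 0.0, 0.0
    times, phases = [], []
    for n in range(steps_per_delay * n_delays):
        p_delayed = history.popleft()
        history.append(p_out)
        # Ideal BPD: only the sign of the phase error (input phase is zero).
        s = 1.0 if (0.0 - p_delayed) >= 0.0 else -1.0
        v_c = s * I * R + v_cap          # resistor drop plus capacitor voltage
        v_cap += s * I * dt / C          # charge pump current integrates on C
        p_out += K * v_c * dt            # VCO integrates the control voltage
        times.append((n + 1) * dt)
        phases.append(p_out)
    return times, phases


def measure_period_and_peak(times, phases):
    """Estimate the limit-cycle period and peak phase from the second half."""
    half = len(phases) // 2
    t, p = times[half:], phases[half:]
    ups = [t[i] for i in range(1, len(p)) if p[i - 1] < 0.0 <= p[i]]
    period = (ups[-1] - ups[0]) / (len(ups) - 1)
    return period, max(abs(x) for x in p)


if __name__ == "__main__":
    period, peak = measure_period_and_peak(*simulate_bpll())
    print(period, peak)
```

In this simplified model the loop settles into a limit cycle whose period is close to four loop delays when $RC \gg t_d$. The sampled BPD of the actual model allows additional oscillation modes that this sketch does not capture.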

### II.2.1 BPLL with First-order Loop Filter

In this section, the steady-state behavior of BPLLs with first-order loop filters is analyzed. A first-order loop filter is the series combination of a resistor R and a capacitor C, which converts the charge pump current into a voltage. The schematic of the first-order filter is shown in Fig. 2.3. It is widely used in practical implementations of BPLL and BCDR [7]-[9]. In some fully-digital implementations [8], the 1<sup>st</sup>-order loop filter is split into a proportional branch and an accumulative branch. The steady-state waveforms of the BPLL prototype with a 1<sup>st</sup>-order filter are sketched in Fig. 2.4 based on simulation results.

Fig. 2.3. Schematic of a first-order loop filter

$P_{OUT-D}$ is the delayed version of the output phase, as indicated in Fig. 2.2. It can be reasonably assumed that all the steady-state waveforms are symmetric around zero and have the same period $T_P$. Since the BPD output is just the sign of the phase error, it must be a square wave of 50% duty cycle with period $T_P$ in the absence of input jitter. Thus, the waveform of the VCO control voltage ($V_C$) is obtained as,

$$V_{C}(t) = \begin{cases} \frac{It}{C} + IR + \frac{IT_{P}}{4C}, & -\frac{T_{P}}{2} < t < 0\\ -\frac{It}{C} - IR + \frac{IT_{P}}{4C}, & 0 < t < \frac{T_{P}}{2} \end{cases}$$
(1)

The output phase  $P_{OUT}$  is derived as the integration of  $KV_C(t)$ ,

$$P_{OUT} = \begin{cases} K \left( \frac{It^2}{2C} + IRt + \frac{T_P It}{4C} + \frac{IRT_P}{4} \right) & -\frac{T_P}{2} < t < 0 \\ K \left( -\frac{It^2}{2C} - IRt + \frac{T_P It}{4C} + \frac{IRT_P}{4} \right) & 0 < t < \frac{T_P}{2} \end{cases}$$
(2)

Please note that the initial conditions for the output phase and control voltage in (1) and (2) are derived based on the assumption of symmetry around zero. The peak of the output phase occurs at t=0, which is obtained from (2) as,

$$A_{OUT} = P_{OUT}(0) = \frac{IKRT_P}{4}$$
(3)
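As a sanity check on (1)-(3), the short Python sketch below evaluates both waveforms and confirms numerically that integrating $KV_C$ from the start of the period reproduces (2) and the peak value in (3). The function names and the normalized parameter values are illustrative assumptions, not values taken from the prototype.

```python
def v_c(t, Tp, I, R, C):
    """Control voltage from (1) for -Tp/2 < t < Tp/2."""
    if t < 0:
        return I * t / C + I * R + I * Tp / (4 * C)
    return -I * t / C - I * R + I * Tp / (4 * C)


def p_out(t, Tp, I, K, R, C):
    """Output phase from (2) for -Tp/2 <= t <= Tp/2."""
    s = 1.0 if t < 0 else -1.0
    return K * (s * I * t * t / (2 * C) + s * I * R * t
                + Tp * I * t / (4 * C) + I * R * Tp / 4)


def p_out_by_integration(t_end, Tp, I, K, R, C, n=20000):
    """Midpoint-rule integral of K*V_C, starting from P_OUT(-Tp/2)."""
    t0 = -Tp / 2
    h = (t_end - t0) / n
    acc = p_out(t0, Tp, I, K, R, C)
    for i in range(n):
        acc += K * v_c(t0 + (i + 0.5) * h, Tp, I, R, C) * h
    return acc


# Assumed normalized values: Ts = 1, RC = 4*Ts, Tp = 8*Ts.
I, K, R, C, Tp = 1.0, 1.0, 1.0, 4.0, 8.0
print("peak from (2):", p_out(0.0, Tp, I, K, R, C))
print("peak from (3):", I * K * R * Tp / 4)
print("integrated vs closed form at t = Ts:",
      p_out_by_integration(1.0, Tp, I, K, R, C), p_out(1.0, Tp, I, K, R, C))
```

The check only covers the case $RC > T_P/4$, where the peak of the output phase sits at the switching instant.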



Fig. 2.4. Steady-state waveforms of BPLL with 1st-order filter

It can be seen from (3) that the output phase amplitude is proportional to R and $T_P$ and does not depend on the filter capacitor value. As shown in Fig. 2.4, the peaks of $P_{OUT}$ are aligned to the transition edges of $V_{PD}$. The zero-crossing time of $P_{OUT-D}$ (point C) must sit between time A (where the BPD switches) and time B (the sampling instant immediately preceding time A). Otherwise, the BPD would have switched at time B instead of time A because the sign of the phase difference would already have changed at time B. Denoting the values of $P_{OUT-D}$ at points A and B as $P_{OUT0}$ and $-P_{OUT1}$, this condition can be expressed as follows,

$$\begin{cases} P_{OUT0} = P_{OUT-D}(0) = P_{OUT}(-t_d) > 0 \\ -P_{OUT1} = P_{OUT-D}(-T_s) = P_{OUT}(-t_d - T_s) < 0 \end{cases}$$
(4)

The range for the oscillation period $T_P$ can be solved by substituting (2) into (4) and the result is obtained as,

$$T_{MIN} = \frac{2t_d (2RC - t_d)}{RC - t_d} < T_P < \frac{2(T_s + t_d)(2RC - T_s - t_d)}{RC - T_s - t_d} = T_{MAX}$$
(5)

The above range is derived based on the condition  $RC>T_S+t_d$  which is satisfied in most practical designs and ensures the stability of the BPLL. Also, since the BPD output is a square wave with 50% duty cycle and each half cycle must be a multiple of  $T_S$  due to the sampling nature of the BPD, the complete cycle  $T_P$  must be a multiple of  $2T_S$ , as expressed below,

$$T_P = 2nT_S, \quad n = 1, 2, 3...$$
 (6)

Combining (5) and (6), it can be concluded that  $T_P$  must be an even multiple of  $T_S$  staying within the upper limit  $T_{MAX}$  and the lower limit  $T_{MIN}$ . An interesting conclusion indicated by (5) is that the BPLL is able to oscillate at a range of oscillation periods (modes) in steady state. The actual oscillation mode depends on the initial voltage over the loop filter capacitor and the initial output phase of the VCO. The initial voltage on the capacitor to reach a particular oscillation period can be derived from (1) as follows,

$$V_{C0} = V_C(0) = IR + \frac{IT_P}{4C}$$
(7)

The initial output phase of the VCO is simply the value of  $P_{OUT}(0)$  which is given by (3). By applying the right initial voltage and output phase, all the oscillation modes can be produced by Matlab simulations. That verifies that all the oscillation modes are sustainable stable states of the BPLL in the absence of input jitter.
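The mode selection described above can be reproduced with a few lines of Python. The sketch below lists the periods allowed by (5) and (6) and then runs a sample-by-sample bang-bang loop started from the initial state given by (3) and (7). It is a simplified stand-in for the Simulink prototype, not a copy of it: the loop delay is assumed to equal exactly one sampling period, the parameter values are normalized, and all function names are hypothetical.

```python
def mode_range(R, C, Ts, td):
    """T_MIN and T_MAX from (5); requires R*C > Ts + td."""
    rc = R * C
    a = Ts + td
    return (2 * td * (2 * rc - td) / (rc - td),
            2 * a * (2 * rc - a) / (rc - a))


def allowed_periods(R, C, Ts, td):
    """Even multiples of Ts strictly inside (T_MIN, T_MAX), per (5) and (6)."""
    t_min, t_max = mode_range(R, C, Ts, td)
    periods, n = [], 1
    while 2 * n * Ts < t_max:
        if 2 * n * Ts > t_min:
            periods.append(2 * n * Ts)
        n += 1
    return periods


def p_out_rising(t, Tp, I, K, R, C):
    """P_OUT(t) from (2) on the branch -Tp/2 < t <= 0."""
    return K * (I * t * t / (2 * C) + I * R * t
                + Tp * I * t / (4 * C) + I * R * Tp / 4)


def simulate_mode(Tp, I=1.0, K=1.0, R=1.0, C=4.0, Ts=1.0, n_steps=80):
    """Bang-bang loop with td = Ts; returns the BPD decision at each sample."""
    v_cap = I * Tp / (4 * C)                    # capacitor voltage implied by (7)
    p_prev = p_out_rising(-Ts, Tp, I, K, R, C)  # delayed phase seen at t = 0
    p = p_out_rising(0.0, Tp, I, K, R, C)       # initial phase from (3)
    decisions = []
    for _ in range(n_steps):
        d = -1.0 if p_prev > 0 else 1.0         # BPD acts on the delayed phase
        decisions.append(d)
        p_prev = p
        # Exact update over one sample with constant charge pump polarity.
        p += K * (v_cap * Ts + d * I * R * Ts + d * I * Ts * Ts / (2 * C))
        v_cap += d * I * Ts / C
    return decisions


modes = allowed_periods(R=1.0, C=4.0, Ts=1.0, td=1.0)
print("allowed periods (in Ts):", modes)
for tp in modes:
    print(tp, simulate_mode(tp)[:int(tp)])
```

With these assumed values each allowed period sustains itself when the loop starts from the matching initial state, which is the behavior described in the text.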

When the loop delay $t_d$ is zero, $T_{MIN}$ is zero and the actual lower limit for the possible oscillation periods becomes $2T_s$. When the loop delay is much larger than one sampling period, i.e., $t_d \gg T_s$, $T_{MIN}$ is very close to $T_{MAX}$ and the BPLL oscillates within a small frequency band in steady state. In this case, the BPLL can be approximated as having a single mode with period $(T_{MAX}+T_{MIN})/2$. It can be seen from (5) that the BPLL tends to oscillate at longer periods as the loop delay increases, which is intuitively consistent since longer latency slows the response of the loop.

When RC >>T<sub>S</sub>, the voltage variations on the capacitor can be ignored in steady state. Under this condition, the behavior of the BPLL closely resembles the behavior of BPLL with zero-order loop filter, i.e., a simple resistor; the values for  $T_{MIN}$  and  $T_{MAX}$  can be approximated as,

$$4t_d \approx T_{MIN} < T_p < T_{MAX} \approx 4(T_s + t_d)$$
(8)

It is easy to see that the BPLL has only two oscillation periods within the given range for  $T_P$ . When  $t_d \gg T_S$ , these two oscillation periods are proportional to the loop delay. In the special case when  $t_d=0$  and  $C=\infty$ , it is easy to see that  $T_{MIN}=0$  and  $T_{MAX}=4T_S$ ; thus, the BPLL has only one oscillation mode with  $T_P=2T_S$ .

Although the BPLL is able to have a range of sustainable oscillation modes in the jitter-free case, the steady states associated with each oscillation mode may be broken when the input jitter is large enough to change the decision of the BPD. When the input jitter stays smaller than both  $P_{OUT0}$  and  $P_{OUT1}$  as defined in (4) (refer to Fig. 2.4), the BPD output is exactly the same as the jitter-free case; the BPLL sustains its original oscillation mode without being disturbed. On the other hand, if the input jitter is larger than either  $P_{OUT0}$  or  $P_{OUT1}$ , the BPD will make different decisions from the jitter-free case. The original steady state will be broken and the loop may settle to a different oscillation mode. Therefore, we can define the following index to measure the relative stability of each oscillation mode,

$$D_{stable} = \min(P_{OUT0}, P_{OUT1})$$
(9)

The stablest oscillation mode (SOM) can be determined by finding the maximum value of $D_{stable}$. When Gaussian input jitter (virtually unbounded) is applied, the BPLL is expected to settle to the SOM with the highest probability out of all the possible modes. The values of $P_{OUT0}$ and $P_{OUT1}$ across all the oscillation modes under a test case ($RC/T_S=4$ and $t_d=T_S$) are shown in Fig. 2.5. $P_{OUT0}$ increases monotonically and $P_{OUT1}$ decreases monotonically as $T_P$ increases. It is easy to see that the maximum value of $D_{stable}$ occurs when $P_{OUT0}=P_{OUT1}$. Thus, based on the definition of $P_{OUT0}$ and $P_{OUT1}$ in (4), the following equation must be satisfied at the SOM,

$$P_{OUT}(-t_{d} - T_{s}) + P_{OUT}(-t_{d}) = 0$$
(10)

The period at the SOM is derived by solving (10) as,

$$T_{P-SOM} = \frac{(T_s + 2t_d)(4RC - T_s - 2t_d) - T_s^2}{2RC - T_s - 2t_d}$$
(11)

This result is verified by simulations: when Gaussian input jitter is applied, the oscillation period of the BPLL does settle to values close to  $T_{P-SOM}$  as predicted. Hence, in the presence of jitter disturbance, the expected value of the output jitter amplitude is equal to the output jitter amplitude at the SOM and it is obtained from (3) as,

$$\overline{A_{OUT}} = P_{OUT}(0)|_{T_p = T_{P-SOM}} = \frac{IKRT_{P-SOM}}{4}$$
(12)

In actual operation, the BPLL moves back and forth around the SOM due to the jitter disturbance. The variance from the SOM depends on the strength of the input jitter disturbance. It can be seen from (11) that  $T_{P-SOM}$  increases with the decrease of RC and the increase of  $t_d$ ; so does the expected output jitter amplitude. Therefore, the capacitor value should be maximized to minimize the output jitter within acceptable limit of the loop locking time. On the other hand, the loop delay  $t_d$  should also be minimized to reduce the output jitter.
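A small numerical illustration of (9)-(11) may help here. The Python sketch below evaluates $D_{stable}$ for a few candidate periods and compares the result with the closed-form $T_{P-SOM}$ of (11). The normalized values ($RC/T_S=4$, $t_d=T_S$) follow the test case quoted above; the function names and the candidate list are assumptions made for the example.

```python
def p_out_rising(t, Tp, I, K, R, C):
    """P_OUT(t) from (2) on the branch -Tp/2 < t <= 0."""
    return K * (I * t * t / (2 * C) + I * R * t
                + Tp * I * t / (4 * C) + I * R * Tp / 4)


def d_stable(Tp, Ts, td, I, K, R, C):
    """Stability index (9), with P_OUT0 and P_OUT1 defined as in (4)."""
    p_out0 = p_out_rising(-td, Tp, I, K, R, C)
    p_out1 = -p_out_rising(-td - Ts, Tp, I, K, R, C)
    return min(p_out0, p_out1)


def tp_som(R, C, Ts, td):
    """Period of the stablest oscillation mode, equation (11)."""
    s = Ts + 2 * td
    rc = R * C
    return (s * (4 * rc - s) - Ts ** 2) / (2 * rc - s)


I, K, R, C, Ts, td = 1.0, 1.0, 1.0, 4.0, 1.0, 1.0
candidates = [6.0, 8.0, 10.0]   # even multiples of Ts allowed by (5) for these values
for tp in candidates:
    print(tp, d_stable(tp, Ts, td, I, K, R, C))
print("T_P-SOM from (11):", tp_som(R, C, Ts, td))
print("expected amplitude from (12):", I * K * R * tp_som(R, C, Ts, td) / 4)
```

Because $T_P$ is restricted to even multiples of $T_S$, the loop can only sit at the allowed period with the largest $D_{stable}$, which is the one nearest to the continuous solution of (11) in this example.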



Fig. 2.5. POUT0 and POUT1 across all the modes for BPLL with 1st order filter

### II.2.2 BPLL with Second-Order Filter

In this section, the steady-state behavior of BPLLs with second-order loop filters is analyzed. The schematic of a second-order filter is shown in Fig. 2.6. It adds $C_2$, a capacitor usually much smaller than $C_1$, to the first-order filter.

Fig. 2.6. Schematic of the second order loop filter

Two reasonable assumptions are made on the steady state of the BPLL with 2<sup>nd</sup>-order filter: all the loop nodes have the same oscillation period $T_P$; the control voltage $V_C$ and the output phase are symmetric around a stable DC value (assumed to be zero in the following analysis). Based on these two assumptions and following the approach used for the first-order filter, the expression for the VCO control voltage is derived and shown below,

$$V_{C}(t) = \begin{cases} V_{0} \left( 1 - \frac{2e^{-t/\tau}}{1 + e^{-T_{p}/2\tau}} \right) + \frac{I(t - T_{p}/4)}{C_{1} + C_{2}}, \ 0 < t < T_{p}/2 \\ V_{0} \left( -1 + \frac{2e^{-(t - T_{p}/2)/\tau}}{1 + e^{-T_{p}/2\tau}} \right) - \frac{I(t - 3T_{p}/4)}{C_{1} + C_{2}}, \ T_{p}/2 < t < T_{p} \end{cases}$$
(13)  
where  $V_{0} = \frac{IRC_{1}^{2}}{(C_{1} + C_{2})^{2}}; \quad \tau = R(C_{1} \parallel C_{2})$ 

The initial voltages on  $C_1$  and  $C_2$  when t=0 are also obtained as shown below,

$$V_{10} = \frac{-I\left[e^{-\frac{T_p}{2\tau}}(T_p + 4R(C_1 || C_2)) + T_p - 4R(C_1 || C_2)\right]}{4(C_1 + C_2)(1 + e^{-\frac{T_p}{2\tau}})}$$

$$V_{C0} = \frac{-I\left[e^{-\frac{T_p}{2\tau}}(T_p - 4R(C_1 || C_2)\frac{C_1}{C_2}) + T_p + 4R(C_1 || C_2)\frac{C_1}{C_2}\right]}{4(C_1 + C_2)(1 + e^{-\frac{T_p}{2\tau}})}$$
(14)

The output phase can be derived as the integration of  $KV_C$  from (13), resulting in,

$$P_{OUT}(t) = \begin{cases} K \left[ V_0 \left( t - \frac{T_P}{4} - \tau \right) + \frac{2V_0 \tau e^{-t/\tau}}{1 + e^{-T_P/2\tau}} + \frac{It(2t - T_P)}{4(C_1 + C_2)} \right], & 0 < t < \frac{T_P}{2} \\ -K \left[ V_0 \left( t - \frac{3T_P}{4} - \tau \right) + \frac{2V_0 \tau e^{(T_P - t)/\tau}}{1 + e^{T_P/2\tau}} + \frac{I(2t - T_P)(t - T_P)}{4(C_1 + C_2)} \right], & \frac{T_P}{2} < t < T_P \end{cases}$$
(15)

The initial output phase of the VCO is then derived as,

$$P_{OUT}(0) = KV_0 \left(\tau \tanh\left(\frac{T_P}{4\tau}\right) - \frac{T_P}{4}\right)$$
(16)

The time when $P_{OUT}$ reaches the minimum value can be derived by finding the zero-derivative point of (15). The result is obtained as,

$$t_{\min} = LambertW \left( \frac{C_1 e^{\frac{C_1}{C_2}}}{C_2 \cosh\left(\frac{T_p}{4\tau}\right)} \right) \tau - \frac{V_0(C_1 + C_2)}{I} + \frac{T_p}{4}$$
(17)

LambertW(z) is a special function whose value is the solution of the following equation [16],

$$z = xe^x \tag{18}$$

To get more insight on the characteristic of  $t_{min}$ , we assume  $C_1 >> C_2$  and  $\tau > T_P$ ; these are typical conditions for 2nd-order filters used in practical implementations of BPLL. Under these assumptions, (17) can be further simplified as,

$$t_{\min} \approx LambertW\left(\frac{C_1}{C_2}e^{\frac{C_1}{C_2}}\right)\tau - \frac{V_0(C_1 + C_2)}{I} + \frac{T_P}{4} = \frac{T_P}{4}$$
 (19)
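The closed form (17) and its approximation (19) can be checked numerically. The Python sketch below evaluates (17) with a small Lambert W routine and compares it with the zero of $V_C(t)$ from (13) found by bisection. The Newton iteration on $w + \ln w = \ln z$ is an implementation choice of this sketch (it avoids overflow of $e^{C_1/C_2}$), and the parameter values are assumed for illustration only.

```python
import math


def lambert_w_from_log(log_z, tol=1e-15, max_iter=100):
    """Principal-branch W(z) for z > 0, given ln(z); solves w + ln(w) = ln(z)."""
    w = 1.0 if log_z >= 1.0 else math.exp(log_z - 1.0)   # start left of the root
    for _ in range(max_iter):
        step = (w + math.log(w) - log_z) / (1.0 + 1.0 / w)
        w -= step
        if abs(step) <= tol * w:
            break
    return w


def filter_constants(I, R, C1, C2):
    """V_0 and tau as defined below (13)."""
    return I * R * C1 ** 2 / (C1 + C2) ** 2, R * C1 * C2 / (C1 + C2)


def v_c_first_half(t, Tp, I, R, C1, C2):
    """Control voltage from (13) on 0 < t < Tp/2."""
    v0, tau = filter_constants(I, R, C1, C2)
    return (v0 * (1 - 2 * math.exp(-t / tau) / (1 + math.exp(-Tp / (2 * tau))))
            + I * (t - Tp / 4) / (C1 + C2))


def t_min_closed_form(Tp, I, R, C1, C2):
    """Equation (17), with the Lambert W argument handled in the log domain."""
    v0, tau = filter_constants(I, R, C1, C2)
    log_arg = math.log(C1 / C2) + C1 / C2 - math.log(math.cosh(Tp / (4 * tau)))
    return lambert_w_from_log(log_arg) * tau - v0 * (C1 + C2) / I + Tp / 4


def t_min_bisect(Tp, I, R, C1, C2, n_iter=200):
    """Zero of V_C on (0, Tp/2); V_C is monotonically increasing there."""
    lo, hi = 0.0, Tp / 2
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if v_c_first_half(mid, Tp, I, R, C1, C2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed normalized values: C1/C2 = 100 and tau roughly 5*Tp.
I, R, C1, C2, Tp = 1.0, 1.0, 100.0, 1.0, 0.2
print("t_min from (17):  ", t_min_closed_form(Tp, I, R, C1, C2))
print("t_min by bisection:", t_min_bisect(Tp, I, R, C1, C2))
print("Tp/4 from (19):   ", Tp / 4)
```

For these values the two exact results agree, and (19) is off by a few percent; the gap shrinks as $\tau/T_P$ grows.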

Substituting (19) into (15), the output phase amplitude is obtained as,

$$A_{OUT} = |P_{OUT}(t_{\min})| \approx \frac{KIT_{p}^{2}}{32(C_{1} + C_{2})} + KV_{0}\tau \left(1 - \frac{1}{\cosh\left(\frac{T_{p}}{4\tau}\right)}\right)$$
(20)

Since we assumed that  $\tau > T_P$  and  $C_1 >> C_2$ ,  $A_{OUT}$  can be further simplified as,

$$A_{OUT} \approx \frac{KIT_P^2}{32C_2} \tag{21}$$

(21) indicates that the output phase amplitude is approximately proportional to the square of  $T_p$  and inversely proportional to the smaller capacitor in the filter. It is proportional to VCO gain and CP current as in the case of 1<sup>st</sup>-order filters. However, it does not depend on the filter resistor. The prototype BPLL was simulated with 2<sup>nd</sup> order filter and the steady-state waveforms are shown in Fig. 2.7. It verifies that the peaks and valleys of the output phase are actually located around  $T_P/4$  from the zero-crossing points as indicated by (19).
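To see how close the simplified amplitude (21) is to (20), the following Python sketch evaluates both for a sweep of $T_P/\tau$. The parameter values are assumptions chosen only to satisfy $C_1 \gg C_2$; the function names are hypothetical.

```python
import math


def a_out_eq20(Tp, I, K, R, C1, C2):
    """Output phase amplitude from (20)."""
    v0 = I * R * C1 ** 2 / (C1 + C2) ** 2
    tau = R * C1 * C2 / (C1 + C2)
    return (K * I * Tp ** 2 / (32 * (C1 + C2))
            + K * v0 * tau * (1 - 1 / math.cosh(Tp / (4 * tau))))


def a_out_eq21(Tp, I, K, C2):
    """Simplified amplitude from (21)."""
    return K * I * Tp ** 2 / (32 * C2)


I, K, R, C1, C2 = 1.0, 1.0, 1.0, 100.0, 1.0
tau = R * C1 * C2 / (C1 + C2)
for ratio in (0.1, 0.2, 0.5, 1.0, 2.0):
    Tp = ratio * tau
    exact = a_out_eq20(Tp, I, K, R, C1, C2)
    approx = a_out_eq21(Tp, I, K, C2)
    print(f"Tp/tau = {ratio:4.1f}   (21)/(20) = {approx / exact:.4f}")
```

The ratio stays close to one while $T_P$ is well below $\tau$ and drifts away as $T_P$ approaches $\tau$, which is the regime where the simplification no longer holds.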



Fig. 2.7. Steady-state waveforms of BCDR with C1=10C2 and TP=2TS

Multiple oscillation modes also exist in the steady state of BPLL with 2<sup>nd</sup> order filter. Similar to the analysis of BPLL with 1<sup>st</sup> order filter, the zero-crossing point of the output phase must sit between the switching instant of the BPD and the immediately preceding sampling instant. This constraint can be expressed mathematically as,

$$\begin{cases} P_{OUT}(-t_d) > 0\\ P_{OUT}(-t_d - T_s) < 0 \end{cases}$$
(22)

Substituting (15) into (22) and using third-order Taylor series approximation under the typical condition  $\tau >> T_S + t_d$  and  $T_p >> T_S$ , the range of  $T_P$  is obtained as,

$$\sqrt{48RC_2 t_d} \approx T_{MIN} < T_P < T_{MAX} \approx \sqrt{48RC_2 (T_S + t_d)}$$
(23)

Also,  $T_P$  must be an even multiple of the sampling period  $T_S$ . Thus, the BPLL can oscillate at any even multiples of  $T_S$  limited between  $T_{MAX}$  and  $T_{MIN}$ . All the oscillation modes can be produced in simulation by applying the initial voltages and phase given by (14) and (16). When the loop delay is zero,  $T_{MIN}$  is zero and the lower limit of  $T_P$  becomes  $2T_S$ . When  $t_d >> T_S$ ,  $T_{MIN}$  is close to  $T_{MAX}$  and the BPLL can only oscillate within a narrow frequency band; it can be approximated that the BPLL has a fixed oscillation period equal to  $(T_{MIN}+T_{MAX})/2$ .

When the BPLL has Gaussian input jitter, the SOM can be determined using the same index $D_{stable}$ defined in (9) for the analysis of the BPLL with 1<sup>st</sup>-order loop filter. The values of $P_{OUT0}$ and $P_{OUT1}$ (defined in (4)) across all the possible modes under a test case ($C_1/C_2=100$, $T_S=t_d$, $\tau/T_S\approx 200$) are plotted in Fig. 2.8. $P_{OUT0}$ increases monotonically while $P_{OUT1}$ decreases monotonically as $T_P$ increases. Notice that the maximum value of $D_{stable}$ occurs when $P_{OUT0}=P_{OUT1}$. Thus, the oscillation period at the SOM ($T_{P-SOM}$) can be determined by solving the following equation,

$$P_{OUT}\left(-t_{d}-T_{s}\right) = -P_{OUT}\left(-t_{d}\right)$$

$$\tag{24}$$

If we assume that the output jitter changes linearly with time in one sampling period, equation (24) can be approximated by the following equation,

$$P_{OUT}\left(-t_d - \frac{T_s}{2}\right) = 0 \tag{25}$$

Solving (25) yields the solution for  $T_{P-SOM}$ ,

$$T_{P-SOM} \approx \sqrt{48RC_2 \left(\frac{T_s}{2} + t_d\right)}$$
 (26)

The expected value of the output jitter amplitude is equal to the output jitter amplitude at the SOM. It's derived by substituting (26) into (21),

$$\overline{A_{OUT}} = \frac{KIT_{P-SOM}^{2}}{32C_{2}} \approx \frac{3}{4} KIR(T_{s} + 2t_{d})$$
(27)

Equation (27) shows that under the conditions $\tau \gg T_S + t_d$ and $T_p \gg T_S$, the expected output jitter amplitude is proportional to $T_S + 2t_d$ but independent of both $C_1$ and $C_2$.
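The approximate results (23), (26) and (27) are simple enough to tabulate directly. The Python sketch below does so for normalized values with $RC_2 = 200T_S$ and $t_d = T_S$; this choice is an assumption that roughly mirrors the quoted test case, since $\tau \approx RC_2$ when $C_1 \gg C_2$. Function names are hypothetical.

```python
import math


def tp_range_2nd(R, C2, Ts, td):
    """Approximate T_MIN and T_MAX from (23)."""
    return math.sqrt(48 * R * C2 * td), math.sqrt(48 * R * C2 * (Ts + td))


def tp_som_2nd(R, C2, Ts, td):
    """Approximate SOM period from (26)."""
    return math.sqrt(48 * R * C2 * (Ts / 2 + td))


def a_out_som_2nd(I, K, R, Ts, td):
    """Expected output jitter amplitude from (27)."""
    return 0.75 * K * I * R * (Ts + 2 * td)


I, K, R, C2, Ts, td = 1.0, 1.0, 200.0, 1.0, 1.0, 1.0
t_min, t_max = tp_range_2nd(R, C2, Ts, td)
# Even multiples of Ts strictly between T_MIN and T_MAX.
modes = [m for m in range(2, int(t_max) + 2, 2) if t_min < m * Ts < t_max]
print("T_MIN, T_MAX:", t_min, t_max)
print("number of allowed modes:", len(modes))
print("T_P-SOM:", tp_som_2nd(R, C2, Ts, td))
print("amplitude via (21):", K * I * tp_som_2nd(R, C2, Ts, td) ** 2 / (32 * C2))
print("amplitude via (27):", a_out_som_2nd(I, K, R, Ts, td))
```

The last two lines agree by construction, which confirms the algebra leading from (21) and (26) to (27).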



Fig. 2.8. POUT0 and POUT1 vs. oscillation periods for BPLL with 2nd order filter

# II.3. Jitter Analysis

In this section, the behavior and response of the BPLL under the influence of various jitter sources is investigated. Jitter due to ISI (JISI) and bandwidth-limited BPD is modeled. After that, jitter transfer, jitter tolerance and jitter generation specifications of BPLL are characterized. The analysis of JISI and jitter tolerance only applies to BCDR which takes random data as the reference. The analysis in this section assumes that the BPLL uses the typical first-order filter. It is also assumed that voltage variations on the loop filter capacitor are negligible within a single sampling period since very large capacitors are usually used to avoid jitter peaking.

# II.3.1 Jitter Due to ISI

When an ideal pseudo-random bit sequence (PRBS) is passed through a bandwidth-limited module, the output data will have inter-symbol interference between the adjacent data bits. That makes the transition edges of the data bits move back and forth around their original positions along the time axis. The variation of the transition edges along the time axis can be modeled as equivalent additive jitter in the phase domain.



Fig. 2.9. Mapping relationship between bit patterns and jitter values



Fig. 2.10. JISI with 10 Gb/s PRBS applied to a LPF of 4 GHz BW

When an ideal PRBS is passed through a 1<sup>st</sup>-order low pass filter, it can be found from simple analysis that the JISI in the output data is mainly distributed around four levels. These four levels can be mapped to four bit patterns as shown in Fig. 2.9. It is assumed that the input data has stayed at low level for an infinite length of time before the start of these bit patterns. Also, the jitter shown here refers to the jitter of the last transition of the four bit patterns. The values of the four jitter levels can be represented with four numbers (-P1, -P2, P1 and P2) by choosing an intermediate reference phase. The last output transition of patterns A and B has a longer delay from the input transition than that of patterns C and D because the output data is settled more completely before the last transition, so the last transition takes longer to cross zero. There are other bit patterns than those shown in Fig. 2.9. However, they can be assigned to the same group as one of the four listed patterns because the bits leading the current transition result in similar jitter values. The plot of JISI when a 10 Gb/s PRBS is applied to a LPF of 4 GHz bandwidth is shown in Fig. 2.10. The simulated jitter is indeed distributed around four levels.

When the bandwidth of the LPF is smaller than but comparable to half the data rate, it can be proved that P2 is close to P1. Therefore,

$$\begin{cases} P_2 = P_1 - \Delta \\ \Delta << P_1, P_2 \end{cases}$$
(28)

For simplicity, the JISI can be modeled as a random binary noise switching between -P1 and P1. The value of P1 is determined by the 3-dB bandwidth of the filter and the incoming data rate. Let us assume that the LPF has a single pole at  $\omega_c$  and the incoming data has a bit period of T. The value of P1, a good approximate for the amplitude of JISI, is obtained below by calculating the timing distance between the zero-crossing points of the transient waveforms,

$$P_{1} = \frac{-\ln(1 - e^{-\omega_{c}T})}{2\omega_{c}}$$
(29)
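Equation (29) can be cross-checked against a direct zero-crossing calculation. The Python sketch below assumes that the two extreme cases are a fully settled input and an isolated single bit, both driving a single-pole LPF with a $\pm 1$ NRZ signal; that pattern choice, the bisection search and the function names are assumptions of this example. The 10 Gb/s data rate and 4 GHz bandwidth are the values quoted for Fig. 2.10.

```python
import math


def lpf_output_after_last_edge(t, run, wc):
    """Single-pole LPF output t seconds after a high-to-low input edge.

    The +/-1 input is assumed low for a long time, then high for `run`
    seconds before the edge (run = math.inf means fully settled).
    """
    v_edge = 1.0 - 2.0 * math.exp(-wc * run)   # output level at the edge
    return -1.0 + (v_edge + 1.0) * math.exp(-wc * t)


def zero_crossing_delay(run, wc, n_iter=200):
    """Bisection for the time at which the output crosses zero."""
    lo, hi = 0.0, 50.0 / wc
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if lpf_output_after_last_edge(mid, run, wc) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p1_eq29(wc, T):
    """JISI amplitude from (29)."""
    return -math.log(1.0 - math.exp(-wc * T)) / (2.0 * wc)


wc = 2 * math.pi * 4e9      # 4 GHz single-pole LPF
T = 100e-12                 # bit period at 10 Gb/s
p1_direct = 0.5 * (zero_crossing_delay(math.inf, wc) - zero_crossing_delay(T, wc))
print("P1 from zero crossings [s]:", p1_direct)
print("P1 from (29) [s]:          ", p1_eq29(wc, T))
```

Half of the spread between the two crossing delays matches (29), consistent with taking the reference phase midway between the extreme patterns.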

Cascading of 1<sup>st</sup> order LPFs with equal bandwidth can be approximated as a single LPF with an equivalent 3-dB frequency given by the following expression,

$$\omega_{eq,n} = \omega_c \sqrt{\sqrt[n]{2} - 1} \tag{30}$$

Thus, the amplitude of JISI induced by n-stage cascaded buffers is obtained based on (29),

$$P_1 = \frac{-\ln\left(1 - e^{-\omega_{eq,n}T}\right)}{2\omega_{eq,n}} \tag{31}$$

The change of JISI amplitude with the number of cascaded buffer stages is shown in Fig. 2.11. UI is the abbreviation for unit interval, which means the time length of one bit period of the input data. The bandwidth BW is normalized to the PRBS data rate  $(BW=\omega_c/DataRate)$ . Fig. 2.11 shows that JISI increases almost linearly with the number of stages when there are three or more stages. Therefore, proper caution should be exercised in physical implementations of BCDR to ensure not too much JISI is introduced if extra buffers are placed before the BPD for amplification purposes.
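The trend with the number of stages can be reproduced from (29) and (30) alone. The Python sketch below expresses the JISI amplitude in UI as a function of the product $\omega_c T$ and the stage count n. Using $\omega_c T$ as the normalized bandwidth parameter is an assumption of this sketch, as are the function names and the example value.

```python
import math


def w_eq_ratio(n):
    """Equivalent bandwidth of n cascaded stages relative to one stage, per (30)."""
    return math.sqrt(2.0 ** (1.0 / n) - 1.0)


def p1_ui(wc_T, n=1):
    """JISI amplitude in UI for n cascaded single-pole stages.

    wc_T is the per-stage pole frequency (rad/s) times the bit period,
    so the returned value is P1 divided by T.
    """
    x = wc_T * w_eq_ratio(n)        # omega_eq,n * T
    return -math.log(1.0 - math.exp(-x)) / (2.0 * x)


wc_T = 2 * math.pi * 0.4            # e.g. 4 GHz stages at 10 Gb/s
for n in range(1, 6):
    print(n, round(w_eq_ratio(n), 4), round(p1_ui(wc_T, n), 4))
```

The single-pole model behind (29) requires the isolated bit to cross zero, i.e. $\omega_{eq,n}T > \ln 2$, so the sketch should not be extrapolated to very long buffer chains.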



Fig. 2.11. JISI amplitude with different number of buffer stages and bandwidth

## II.3.2 Modeling of Bandwidth-Limited BPD

In physical implementations, the BPD has limited bandwidth and speed. The BPD is not able to switch abruptly when the input phase difference changes sign. When one of the two inputs to the BPD is PRBS, the bandwidth limitation of the internal building blocks of the BPD introduces JISI which may make the BPD produce a wrong output level.



Fig. 2.12. Transfer characteristic of a gradual-switching BPD

An illustrative plot for the transfer curve of a gradual-switching BPD (GBPD) is shown in Fig. 2.12. The transfer characteristic of a GBPD can be modeled by an inverse tangent function as,

$$V_{PD} = \frac{2}{\pi} \arctan\left(\phi_{in} K_T\right)$$
(32)

where $\phi_{in}$ is the input phase difference and $K_T$ is a coefficient modeling the slope of the curve. Please note that the inverse tangent function is chosen somewhat arbitrarily to roughly emulate the transfer curve of a real GBPD. The actual value of $K_T$ can be derived by fitting to the simulated transfer curve.



Fig. 2.13. Phase-sweeping characteristic of Alexander's BPD

The JISI introduced by the BPD has a more complex distribution than the one caused by a simple LPF due to the nonlinear operation of the sampling latches (positive feedback in holding mode) within the BPD. A simple and approximate approach is to refer the JISI caused by the internal building blocks of the BPD to the input terminals of the BPD. An ideal BPD always outputs high level if the incoming data leads the sampling clock. However, the BPD may output low level under the effect of JISI even if the data leads the clock. If the JISI is larger than the input phase difference but has an opposite sign, the BPD will make a wrong decision. If we simulate the BPD by fixing the data phase and sweeping the clock phase, the output of the BPD will go through three regions, i.e., all-low region, transitional region and all-high region. In the transitional region, the BPD may output high or low depending on the sum of JISI and the input phase difference. The length of the transitional region in terms of phase is equal to the peak-to-peak swing of JISI. Thus, the BPD with internal JISI can be modeled as a jitter-free BPD with additive input-referred JISI (IRJISI) of given amplitude at the input terminal.

As an example, the classic Alexander's BPD [14] was implemented with 0.18 $\mu$m CMOS transistors and modeled using the proposed approach. The simulated phase-sweeping characteristic is shown in Fig. 2.13. It can be observed that the transition region covers a range of about 5 ps. The derivative of the average PD output with respect to phase is proportional to the probability distribution of IRJISI. Thus, according to the derivative plot, the IRJISI of Alexander's PD is mainly distributed around two levels, i.e., $\pm 0.9$ ps offset from the center phase.

When the BPD is used in a BCDR, the gradual-switching curve and the effects of IRJISI should be modeled independently. The gradual switching coefficient $K_T$ should be derived by applying two periodic signals to the BPD, which will not produce any JISI. Meanwhile, the IRJISI should be derived by applying a PRBS to one terminal and a periodic signal to the other. The JISI produced by any amplifiers (modeled as LPF) preceding the BPD can also be referred to the BPD input. Thus, the overall BPD plus its preceding amplifiers can be modeled as a gradual-switching BPD with IRJISI. In [12], the JISI and gradual switching of the BPD are mixed together to yield a gradual-switching BPD with even lower slope. That approach fails to identify the JISI as an active jitter source which contributes to the overall jitter generation of the system. In comparison, the model extraction method proposed here produces an output jitter distribution very close to transistor-level simulation results.

#### II.3.3 Jitter Transfer and Jitter Peaking

Jitter transfer defines the transfer characteristic from the input jitter to the output jitter. In BPLL, the jitter transfer characteristic depends on both the frequency and amplitude of the input jitter due to the nonlinearity of the BPD. The capability for the output jitter to track the input jitter is limited by the phase slew rate of the loop (PSRL) [7], [12]. The PSRL is reached when the PD output is continuously low or high. When the voltage variations on the filter capacitor are ignored, the PSRL is given by,

$$\rho = \frac{\partial P_{OUT}}{\partial t} = IRK \tag{33}$$

When the maximum slope of the input jitter is smaller than the PSRL, the output jitter tracks the input jitter closely. The output jitter amplitude is nearly equal to the input jitter amplitude in the tracking region. At the upper bound of the tracking region, the maximum slope of the input jitter is equal to the PSRL. Thus, if the input jitter is represented as $A_{IN}\sin\omega t$, the maximum tracking frequency is given below [12],

$$\omega_{TU} = \frac{\rho}{A_{IN}} = \frac{IRK}{A_{IN}}$$
(34)

where  $A_{IN}$  is the amplitude of the input jitter. In the tracking region, the only error between the input and output jitter is due to the binary switching of the BPD. The maximum error is the sum of the maximum phase shift which can be produced by the loop and the input jitter within a single sampling period. It is given by the following expression,

$$P_{E,\max} = \Delta P_{loop,\max} + \Delta P_{IN,\max} = IRKT_S + A_{IN}\omega_{IN}T_S$$
(35)

where  $\omega_{IN}$  is the radian frequency of the input jitter.



Fig. 2.14. Input / output jitter waveforms in the slewing region

When the maximum slope of the input signal goes beyond the PSRL, the BPLL leaves the full-tracking region and enters a transition region where the BPLL either tracks or slews depending on the instantaneous slope of the input jitter. When the input jitter goes near the extreme points with relatively smaller slope, the output jitter tracks the input jitter closely. When the input jitter goes near the zero-crossing points with maximum slope, the PD output stays high or low for a continuous length of time and the loop slews. If the output jitter intersects the input jitter before the extreme points, the output jitter has the same amplitude as the input jitter. Otherwise, the output jitter amplitude is smaller than the input jitter amplitude.

When the input jitter frequency goes even higher, the BPLL enters the slewing region where the loop slews for the entire period. An illustrative plot for the input and output jitter waveforms in the slewing region is shown in Fig. 2.14. Similar to the jitter-free steady state, all the loop nodes have the same oscillation period  $T_{IN}$  in steady state when an input jitter with fixed period  $T_{IN}$  is applied to the loop. Following the same procedure as the one used in the jitter-free steady-state analysis, the expressions for the input/output phase in the slewing region are obtained as,

$$P_{IN}(t) = A_{IN} \cos(\omega_{IN} t + \phi); \quad \phi = \cos^{-1} \frac{T_{IN} K I R}{4 A_{IN}}$$
(36)

$$P_{OUT}(t) = \begin{cases} K \left( \frac{It^2}{2C} + IRt + \frac{IT_{IN}t}{4C} + \frac{IRT_{IN}}{4} \right) & -\frac{T_{IN}}{2} < t < 0\\ K \left( -\frac{It^2}{2C} - IRt + \frac{IT_{IN}t}{4C} + \frac{IRT_{IN}}{4} \right) & 0 < t < \frac{T_{IN}}{2} \end{cases}$$
(37)

The output jitter amplitude  $A_{OUT}$  is derived from (37) as  $P_{OUT}(0)$ , yielding,

$$A_{OUT} = \frac{IRKT_{IN}}{4} \tag{38}$$

It is interesting to notice that output jitter amplitude in the slewing region does not depend on the input jitter amplitude or the capacitor value. The output jitter amplitude decreases at 20dB/dec with the increase of the input jitter frequency. The condition for the loop to stay in the full-slewing region is that the output jitter has smaller slope than the input jitter immediately after the input / output jitter intersect and the BPD switches. Otherwise, the output jitter would track the input jitter for at least a certain length of time after the intersection point, which means the loop is actually in the transition region. Thus, the lower frequency limit of the slewing region is defined by the following critical condition (the time of the intersection point is defined as zero in Fig. 2.14),

$$\frac{\partial P_{IN}(t)}{\partial t}\Big|_{t=0+} = \frac{\partial P_{OUT}(t)}{\partial t}\Big|_{t=0+}$$
(39)

By substituting (36)-(37) into (39) and assuming the voltage variations on the capacitor are negligible within one input jitter period, an approximate expression for the lower frequency limit of the slewing region is derived and shown below,

$$\omega_{LS} = \frac{IRK\sqrt{\pi^2 + 4}}{2A_{IN}} \tag{40}$$

The input-to-output jitter attenuation at this critical frequency is derived from (38) and (40) and given below,

$$\frac{A_{OUT}}{A_{IN}}\Big|_{\omega=\omega_{LS}} = \frac{\pi}{\sqrt{\pi^2 + 4}} = -1.48dB$$
(41)

The result above indicates that the 3 dB frequency of the jitter transfer curve falls in the slewing region. Thus, the 3 dB jitter transfer bandwidth can be derived from (38) by setting $A_{OUT}/A_{IN}=1/\sqrt{2}$, and the result is shown below,

$$\omega_{3dB} = \frac{\pi IRK}{\sqrt{2}A_{IN}} \tag{42}$$

The 3 dB frequency of the prototype BPLL predicted by (42) is 10.13 MHz for an example set of parameters (I=40 $\mu$A, K=1 GHz/V, $A_{IN}$=0.15 UI, R=56 $\Omega$, C=35 nF). That agrees well with the simulation result (10.1 MHz). In comparison, the estimated value based on the equation given in [12] deviates from the simulated value by 40% because it derives the 3 dB frequency by approximate extrapolation instead of exact calculation.
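The relations among the three characteristic frequencies in (34), (40) and (42) are easy to verify in normalized form. In the Python sketch below the phase slew rate $\rho = IRK$ and the input amplitude are both set to one, which is purely an assumption for illustration; with physical values, K would have to be expressed in rad/s per volt and the amplitude in radians. Function names are hypothetical.

```python
import math


def omega_track_max(rho, a_in):
    """Upper bound of the tracking region, equation (34)."""
    return rho / a_in


def omega_slew_min(rho, a_in):
    """Lower bound of the slewing region, equation (40)."""
    return rho * math.sqrt(math.pi ** 2 + 4) / (2 * a_in)


def a_out_slewing(rho, omega_in):
    """Output amplitude in the slewing region, (38) with T_IN = 2*pi/omega_in."""
    return rho * (2 * math.pi / omega_in) / 4


def omega_3db(rho, a_in):
    """3 dB jitter transfer bandwidth, equation (42)."""
    return math.pi * rho / (math.sqrt(2) * a_in)


rho, a_in = 1.0, 1.0
w_ls = omega_slew_min(rho, a_in)
w_3db = omega_3db(rho, a_in)
print("omega_TU, omega_LS, omega_3dB:", omega_track_max(rho, a_in), w_ls, w_3db)
print("attenuation at omega_LS [dB]:",
      20 * math.log10(a_out_slewing(rho, w_ls) / a_in))
print("attenuation at omega_3dB [dB]:",
      20 * math.log10(a_out_slewing(rho, w_3db) / a_in))
```

The ordering of the three frequencies shows directly that the 3 dB point lies above the lower edge of the slewing region, which is the premise used to derive (42) from (38).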



Fig. 2.15. Illustrative plot for BPLL jitter peaking

Many CDR applications require the peaking in the jitter transfer characteristic to be limited to a small level so that the input jitter will not be amplified too much after passing through multiple data links [15]. As in a linear CDR, the capacitor in the first-order loop filter must be reasonably large to avoid introducing jitter peaking. However, the definition of damping factor is no longer valid in analyzing the peaking effects in a BPLL. In the tracking or transition region, the output jitter amplitude is equal to or smaller than the input jitter amplitude because the output jitter tracks the input jitter near the peaks or valleys of the input sine wave. Therefore, the loop must stay in the slewing region to have peaking. An illustrative plot of BPLL waveforms with peaking is shown in Fig. 2.15. The output jitter consists of two pieces of symmetric parabolic curves within a single period; the maximum value of the output jitter occurs at the extreme point of the parabolic curve (point 'c' in Fig. 2.15) instead of the intersection point (point 'b' in Fig. 2.15) of the input / output jitter. The output jitter continues to rise after the intersection point 'b' until the extreme point 'c' is reached, which results in peaking. The time when the output jitter reaches the peak can be derived from (37) based on the condition of zero derivative, i.e., $dP_{OUT}(t)/dt=0$. The result is obtained as,

$$t_{\max} = \frac{T_{IN}}{4} - RC \ge 0 \tag{43}$$

 $t_{max}$  must be greater than the time of the intersection point 'b', which is defined as zero in Fig. 2.15. Otherwise, the extreme point does not actually exist in the output jitter and the output jitter peaks at the intersection point instead. Since the output jitter at the intersection point has no way to be larger than the input jitter amplitude, no peaking will happen. Substituting (43) into (37) yields the output jitter amplitude,

$$A_{OUT} = P_{OUT}(t_{\max}) = \frac{IK(T_{IN}^{2} + 16R^{2}C^{2})}{32C}$$
(44)

It can be seen from (44) that the output jitter amplitude increases with the input jitter period $T_{IN}$. Since the loop is in the slewing region, the output jitter value at the intersection point is given by (38). Therefore, if $T_{IN}$ increases, the output jitter at the intersection point increases and the intersection point 'b' moves closer to the peak point 'a' of the input sine wave. When points 'b' and 'a' coincide, the loop reaches the lower limit of the slewing region and the output jitter reaches the maximum amplitude. In this case, the output jitter value at the intersection point (given by (38)) is equal to the input jitter amplitude, i.e.,

$$A_{OUT,Intersect} = \frac{KIRT_{IN}}{4} = A_{IN}$$
(45)

The corresponding input jitter period is obtained from (45) as,

$$T_{IN} = \frac{4A_{IN}}{KIR} \tag{46}$$

By substituting (46) into (44), the maximum output jitter amplitude that can be achieved for a given capacitor is obtained as,

$$A_{OUT} = \frac{1}{2CKIR^2} \left[ A_{IN}^2 + (CKIR^2)^2 \right]$$
(47)

To limit the peaking to G dB, we have,

$$A_{OUT} = \frac{1}{2CKIR^2} \left[ A_{IN}^2 + (CKIR^2)^2 \right] \le 10^{G/20} A_{IN} \tag{48}$$

The minimum capacitor value that limits the peaking to less than G dB can be derived by solving (48) subject to the limit given by (43). The result is obtained as,

$$C \ge C_{MIN} = \frac{A_{IN}}{R^2 IK} \left( 10^{G/20} - \sqrt{10^{G/10} - 1} \right) \tag{49}$$

(49) indicates that $C_{MIN}$ is inversely proportional to the square of the filter resistance and, more importantly, that $C_{MIN}$ is proportional to the input jitter amplitude. That means, however large the capacitor is, a given level of peaking can always be produced by applying input jitter with large enough amplitude. Fortunately, in actual CDR applications, the maximum input jitter amplitude is limited by the jitter tolerance mask [15]. As long as there is no peaking within the jitter tolerance mask, the prescribed specifications can still be met with a BCDR. The BPLL model was simulated in the presence of peaking to verify the expressions derived above. The simulated waveforms are shown in Fig. 2.16. In the simulated case, the output jitter amplitude is 0.193 UI (Unit Interval), which corresponds to peaking of 2.2 dB relative to the input jitter amplitude of 0.15 UI. The simulated level of jitter peaking exactly matches the result predicted by the above expressions.
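As a numerical illustration of the capacitor bound, the sketch below solves the quadratic implied by (48) for its smaller root and checks the resulting peaking with (47). The loop parameters `K`, `I` and `R` are hypothetical values chosen only for illustration; they are not the parameters of the simulated BPLL.

```python
import math

def peak_output_amplitude(c, a_in, k, i, r):
    """Maximum output jitter amplitude for a given capacitor, per (47)."""
    x = c * k * i * r ** 2
    return (a_in ** 2 + x ** 2) / (2.0 * x)

def min_capacitor(a_in, k, i, r, g_db):
    """Smallest C that keeps peaking at or below g_db, from the quadratic in (48)."""
    g = 10.0 ** (g_db / 20.0)  # allowed amplitude ratio A_OUT / A_IN
    # (48) rearranges to x^2 - 2*g*A_IN*x + A_IN^2 <= 0 with x = C*K*I*R^2.
    # The smaller root is the lower bound on x; larger x is covered by (43).
    x_min = a_in * (g - math.sqrt(g * g - 1.0))
    return x_min / (k * i * r ** 2)

# Hypothetical loop parameters (assumptions, not taken from the text)
A_IN, K, I, R, G_DB = 0.15, 1.0e9, 50e-6, 400.0, 2.2

c_min = min_capacitor(A_IN, K, I, R, G_DB)
peaking_db = 20.0 * math.log10(peak_output_amplitude(c_min, A_IN, K, I, R) / A_IN)
print(f"C_MIN = {c_min:.3e} F, peaking at C_MIN = {peaking_db:.2f} dB")
```

Evaluating (47) at the computed capacitor returns the target peaking, and any larger capacitor gives less peaking, which is the behavior the bound is meant to guarantee.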

When a large enough capacitor is used to avoid any peaking, the jitter transfer curve of a BPLL over the entire frequency band is shown in Fig. 2.17. It resembles the transfer curve of a 1<sup>st</sup>-order LPF in that there is no attenuation in the tracking region while the output falls at 20dB/dec in the slewing region. There is no amplitude attenuation in the beginning part  $(f_1 \sim f_2)$  of the transition region while there is some amplitude attenuation in the ending part  $(f_2 \sim f_3)$ , according to the previous analysis.



Fig. 2.16. Simulated waveforms of BPLL with jitter peaking (0.15 UI input jitter @ 7.168 MHz, C = 300 pF)



Fig. 2.17. BPLL input-to-output jitter transfer characteristic

### II.3.4 Jitter Tolerance

Jitter tolerance is the maximum amplitude of the input jitter that can be applied to a CDR without causing bit errors. Jitter tolerance mask is the plot of the maximum input jitter amplitude versus the corresponding frequency. Usually the bit error rate increases drastically when the phase error between the clock and data exceeds a certain limit denoted as  $P_{EL}$ . The limit is 0.5UI in the ideal case but should be smaller in practical designs due to various jitter sources.

When the input jitter frequency is very high, the output jitter amplitude is heavily attenuated compared with the input jitter amplitude. In this case, the phase error is almost equal to the input jitter amplitude [12], hence,

$$P_{E} = \left| P_{IN} - P_{OUT} \right| \approx \left| P_{IN} \right| \le A_{IN} \le P_{EL} \tag{50}$$

Thus, jitter tolerance is limited by  $P_{EL}$  at very high frequencies.

At intermediate frequencies, the loop slews for most of the entire period when the input jitter amplitude reaches the jitter tolerance mask (refer to Fig. 2.14). In this case, the slewing-region equations (36) and (37) can be used to represent the input and output jitter waveforms. Assuming the voltage variations on the capacitor are negligible in one period of the input jitter in the slewing region, the phase error is given by,

$$P_{E} = \left| P_{IN}(t) - P_{OUT}(t) \right| = \left| A_{IN} \cos\left(2\pi f_{IN}t + \phi\right) - K\left(-IRt + \frac{IR}{4f_{IN}}\right) \right|$$
(51)

The maximum phase error for a given input jitter frequency and amplitude is derived from (51) and given below,

$$P_{E,MAX} = A_{IN} \sqrt{1 - S_I^2} + A_{IN} S_I \left( \sin^{-1} S_I + \cos^{-1} \frac{\pi S_I}{2} - \frac{\pi}{2} \right)$$
(52)

where,

$$S_I = \frac{KIR}{A_{IN}\omega_{IN}}$$
(53)

 $S_I$  is the ratio of the PSRL over the maximum slope of the input jitter.  $S_I$  must be less than  $2/\pi$  for equation (52) to hold. Unfortunately, it is not possible to derive simple analytical expressions for the maximum input jitter amplitude in terms of  $\omega_{IN}$  and  $P_{E,MAX}$  from (52). An approximation method is proposed in [12], which assumes that the maximum phase error occurs close to the zero-crossing point of the output jitter (refer to the illustration in Fig. 2.14). Under this assumption, we have,

$$P_{E,MAX} \approx A_{IN} \cos\left(\phi + \frac{\pi}{2}\right) = \sqrt{A_{IN}^{2} - \left(\frac{KIR}{4f_{IN}}\right)^{2}} \le P_{EL}$$
(54)

Thus, the input tolerance at intermediate frequencies becomes,

$$A_{IN} \le \sqrt{P_{EL}^{2} + \left(\frac{KIR}{4f_{IN}}\right)^{2}}$$
(55)
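The closed form (52) and the approximation (54) can be compared numerically. The sketch below evaluates both and also finds the maximum of (51) by brute force over the slewing half period. It assumes a normalized angular frequency of 1 rad/s and takes the intersection of input and output jitter at $t=0$, so that $\cos\phi = \pi S_I/2$; the function names are illustrative only.

```python
import math

def max_phase_error_exact(a_in, s_i):
    """Closed-form maximum phase error of (52); requires 0 < s_i <= 2/pi."""
    arg = min(1.0, math.pi * s_i / 2.0)  # guard against rounding just above 1
    return a_in * math.sqrt(1.0 - s_i ** 2) + a_in * s_i * (
        math.asin(s_i) + math.acos(arg) - math.pi / 2.0)

def max_phase_error_approx(a_in, s_i):
    """Approximation (54), using KIR/(4 f_IN) = (pi/2) * S_I * A_IN."""
    return a_in * math.sqrt(max(0.0, 1.0 - (math.pi * s_i / 2.0) ** 2))

def max_phase_error_numeric(a_in, s_i, n=20_000):
    """Brute-force maximum of |P_IN - P_OUT| from (51) over half a period."""
    w = 1.0                                   # normalized angular frequency
    period = 2.0 * math.pi / w
    slope = s_i * a_in * w                    # PSRL = K*I*R
    phi = math.acos(min(1.0, math.pi * s_i / 2.0))  # curves intersect at t = 0
    worst = 0.0
    for k in range(n + 1):
        t = 0.5 * period * k / n
        err = a_in * math.cos(w * t + phi) - slope * (period / 4.0 - t)
        worst = max(worst, abs(err))
    return worst

for s in (0.1, 0.3, 0.5, 0.6):
    print(s, max_phase_error_exact(1.0, s),
          max_phase_error_approx(1.0, s),
          max_phase_error_numeric(1.0, s))
```

The brute-force value agrees with (52), while (54) underestimates the maximum phase error by a margin that grows as $S_I$ approaches $2/\pi$.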

When the input jitter changes very slowly, slope on the input jitter can be considered as quasi-static frequency deviation since there is enough time for the capacitor to be charged (or discharged) to the right voltage to drive the VCO to produce the proper amount of frequency deviation. The changing rate of the input frequency is equal to the 2<sup>nd</sup>-order derivative of the input jitter. Thus, the frequency slew rate of the input signal (FSRI) is given by,

$$FSRI = \frac{\partial^{2} P_{IN}}{\partial t^{2}}\Big|_{MAX} = A_{IN} \omega_{IN}^{2} \tag{56}$$

On the other hand, the slew rate of the output frequency is proportional to the slew rate of the voltage on the capacitor. Thus the frequency slew rate of the loop (FSRL) is given by,

$$FSRL = \frac{KI}{C}$$
(57)

It is reasonably assumed that the jitter tolerance at very low frequencies is much larger than 1 UI. Therefore, the ratio of the input and output jitter amplitude is very close to 1 even when the phase error goes up to  $P_{EL}$  (up to 0.5UI). That means the frequency slew rate of the loop should be very close to the maximum changing rate of the input frequency (but a little smaller), hence,

$$FSRI \le FSRL \tag{58}$$

Combining (56)-(58), the jitter tolerance in the low frequency region is obtained as,

$$A_{IN} \le \frac{KI}{C\omega_{IN}^{2}}$$
(59)
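The step behind (56) can be written out explicitly. As a sketch, assume a sinusoidal input jitter with an arbitrary phase reference (the choice of a sine here is an assumption made only for illustration):

$$P_{IN}(t) = A_{IN}\sin(\omega_{IN}t) \;\Rightarrow\; \frac{\partial P_{IN}}{\partial t} = A_{IN}\omega_{IN}\cos(\omega_{IN}t) \;\Rightarrow\; \frac{\partial^{2} P_{IN}}{\partial t^{2}} = -A_{IN}\omega_{IN}^{2}\sin(\omega_{IN}t)$$

The first derivative is the instantaneous frequency deviation of the input and the second derivative is its rate of change, whose magnitude peaks at $A_{IN}\omega_{IN}^{2}$. Requiring this peak not to exceed $KI/C$ gives (59), so the tolerance falls by a factor of 100 for every tenfold increase in frequency.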



Fig. 2.18. Simulated BPLL waveforms with 20 kHz and 144 UI sinusoidal input jitter

The BPLL model was simulated in Matlab with 20 kHz and 144 UI sinusoidal input jitter to verify the level of jitter tolerance in the low frequency region. The simulated waveforms are shown in Fig. 2.18. Although the maximum phase error is as large as 0.5 UI, the input jitter and output jitter curves almost overlap each other in the plot, which indicates little attenuation in terms of percentage. The input jitter tolerance predicted by (59) is 137 UI, only 5% away from the actual simulation result. In comparison, the result predicted by the equation proposed in [12] is 343 UI, which is far from the simulation result. The reason for the large deviation is that the equation in [12] is based on the incorrect assumption that the loop slews for most of the period at very low input frequencies. In fact, Fig. 2.18 shows that the loop tracks the input jitter closely for the greater part of the period.

The jitter tolerance of the BPLL over the entire frequency band is shown in Fig. 2.19. It is divided into three regions with different slopes. In the low frequency region, the jitter tolerance drops at 40 dB/dec with increasing frequency, as indicated by (59). In the intermediate frequency region, the jitter tolerance drops at 20 dB/dec, as indicated by (55). In the high frequency region, the jitter tolerance is flat and equal to $P_{EL}$, as indicated by (50). To find the critical frequencies between the different regions, the jitter tolerance in the intermediate frequency region can be approximated as a straight line in the Bode plot. The approximated expression is shown below,

$$A_{IN} \le \sqrt{P_{EL}^{2} + \left(\frac{KIR}{4f_{IN}}\right)^{2}} \approx \frac{KIR}{4f_{IN}}$$
(60)

The upper limit of the low frequency region  $(f_L)$  can be found as the extrapolated intersection point of (60) and (59). The lower limit of the high frequency region  $(f_H)$  can be found as the extrapolated intersection point of (60) and (50). The derived results for  $f_H$  and  $f_L$  are given below,

$$\begin{cases} f_L = \frac{1}{\pi^2 RC} \\ f_H = \frac{KIR}{4P_{EL}} \end{cases} \tag{61}$$
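The three regions can be assembled into a single tolerance function. The sketch below uses (59) below $f_L$ and (55) above it, with the corner frequencies taken from (61). The values of `K`, `I`, `R` and `P_EL` are hypothetical and chosen only for illustration; the 300 pF capacitor matches the value quoted for Fig. 2.16.

```python
import math

def corner_frequencies(k, i, r, c, p_el):
    """Corner frequencies f_L and f_H of the tolerance mask, per (61)."""
    f_l = 1.0 / (math.pi ** 2 * r * c)
    f_h = k * i * r / (4.0 * p_el)
    return f_l, f_h

def jitter_tolerance(f_in, k, i, r, c, p_el):
    """Piecewise jitter tolerance in UI: (59) below f_L, (55) above it."""
    f_l, _ = corner_frequencies(k, i, r, c, p_el)
    if f_in < f_l:
        w = 2.0 * math.pi * f_in
        return k * i / (c * w ** 2)                     # 40 dB/dec region
    return math.hypot(p_el, k * i * r / (4.0 * f_in))   # 20 dB/dec, then flat

# Hypothetical loop parameters (assumptions, not taken from the text)
K, I, R, C, P_EL = 1.0e9, 50e-6, 400.0, 300e-12, 0.4

f_l, f_h = corner_frequencies(K, I, R, C, P_EL)
print(f"f_L = {f_l:.3e} Hz, f_H = {f_h:.3e} Hz")
for f in (1e4, 1e5, 1e6, 1e7, 1e8):
    print(f"{f:.0e} Hz -> {jitter_tolerance(f, K, I, R, C, P_EL):.3f} UI")
```

Because $f_L$ is defined by the straight-line approximation (60) rather than by (55) itself, the piecewise function has a small step at $f_L$; it is negligible whenever the tolerance at $f_L$ is much larger than $P_{EL}$.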




Fig. 2.19. Jitter tolerance mask of a BPLL





Fig. 2.20. Simulated waveforms of BCDR using an ABPD with 0.02 UI IRJISI

### II.3.5 Jitter Generation

Jitter generation is the output jitter produced by the loop when no input jitter is applied. The jitter generation mainly consists of two parts. The first part is due to JISI caused by the bandwidth limitation of the BPD together with any preceding amplifiers if used. The second part is due to the phase noise of the VCO.

#### II.3.5.1 Jitter Generation Due to JISI

In ultra-high-speed BCDRs, jitter generation due to JISI is the dominant contribution to the overall jitter generation since most of the VCO phase noise is heavily suppressed by the loop dynamics [7]. JISI is mainly caused by the BPD and its preceding amplifiers. According to the modeling methods previously discussed, the BPD and its preceding amplifiers can be modeled as a jitter-free BPD with additive IRJISI at the input terminal. For simplicity, the IRJISI is modeled as a random binary pulse sequence switching between two levels of equal amplitude and opposite sign. The width of each pulse is equal to 1 UI.

Let us assume that the amplitude of the IRJISI is $A_I$. The BCDR was simulated in Matlab using an abrupt-switching BPD (ABPD) with additive IRJISI ($A_I$ = 0.02 UI) applied at the input of the loop. The simulated waveforms are shown in Fig. 2.20. It shows that the output jitter has the same bound as the input jitter, which is [-0.02 UI, 0.02 UI]. That is because the ABPD is not able to discriminate any output phase within $[-A_I, A_I]$ when there is no input jitter other than the IRJISI. Hence, the loop has the same response for any output phase within $[-A_I, A_I]$. To get more insight, let us assume the current output phase is $P_0$. If there is a positive pulse in the IRJISI, the overall input phase difference of the BPD is positive as long as $P_0 < A_I$. That makes the BPD output a positive pulse. If there is a negative pulse in the IRJISI, the phase difference is negative as long as $P_0 > -A_I$. That makes the BPD output a negative pulse. In other words, a positive pulse and a negative pulse in the IRJISI add the same amount of shift (in opposite directions) to the output phase (ignoring the voltage variations on the capacitor). Thus, the output jitter is simply proportional to the integral of the IRJISI as long as the output jitter stays within the interval $[-A_I, A_I]$. Since the IRJISI is modeled as a random binary sequence, its integral can easily become large enough to go beyond the interval. Within the interval, the output jitter simply drifts with the random drift of the integral of the IRJISI. When the output jitter goes below the lower boundary, the BPD produces only positive pulses regardless of the value of the IRJISI, bringing the output phase back within the boundary. The same thing happens when the output jitter goes above the upper boundary. In summary, the output jitter can drift to the boundaries but it is limited within the interval by the loop dynamics.
The amplitude of the phase error is simply the sum of the amplitudes of the output jitter and the IRJISI because the current value of the IRJISI is independent of the current value of the output jitter. Since both the output jitter and the IRJISI are limited within $[-A_I, A_I]$, the phase error is limited within $[-2A_I, 2A_I]$. That is verified in Fig. 2.20 where the simulated phase error is limited within [-0.04 UI, 0.04 UI].
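The bounding mechanism described above can be reproduced with a very small discrete-time model. The following sketch is not the Matlab model used for Fig. 2.20; it is a minimal illustration that updates the output phase once per UI by a fixed step (standing in for $KIRT_S$, with the capacitor voltage ignored) in the direction chosen by an ideal ABPD. The step size and the number of bits are hypothetical.

```python
import random

def simulate_abpd_loop(a_i, step, n_bits, seed=0):
    """Discrete-time bang-bang loop driven only by binary IRJISI.

    a_i  : IRJISI amplitude in UI
    step : phase shift per bit in UI (capacitor voltage ignored)
    Returns the largest |output jitter| and |phase error| observed.
    """
    rng = random.Random(seed)
    p_out = 0.0
    max_out = 0.0
    max_err = 0.0
    for _ in range(n_bits):
        jisi = a_i if rng.random() < 0.5 else -a_i   # one pulse per UI
        err = jisi - p_out                            # phase seen by the ABPD
        max_err = max(max_err, abs(err))
        p_out += step if err > 0 else -step           # abrupt binary decision
        max_out = max(max_out, abs(p_out))
    return max_out, max_err

A_I, STEP = 0.02, 0.0005   # STEP is a hypothetical value
max_out, max_err = simulate_abpd_loop(A_I, STEP, n_bits=200_000)
print(f"max |output jitter| = {max_out:.4f} UI, max |phase error| = {max_err:.4f} UI")
```

In this model the output phase performs a random walk inside $[-A_I, A_I]$ and is pushed back as soon as it leaves the interval, so it overshoots the boundary by at most one step and the phase error stays within about $2A_I$.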

For comparison with the ABPD case, the BCDR is also simulated with a GBPD modeled by (32). IRJISI of the same amplitude is applied at the loop input. The simulated waveforms are shown in Fig. 2.21. The output jitter amplitude decreases to 0.008 UI, which is only 40% of the value obtained with an ABPD. The amplitude of the phase error decreases to 0.028 UI, i.e., 0.008 UI + $A_I$, which is only 70% of the value with an ABPD. A qualitative explanation for the decrease of the output jitter follows. While an ABPD cannot discriminate the magnitude of output jitter within $[-A_I, A_I]$, a GBPD can, owing to the finite slope in its transition region. When the output jitter stays within $[0, A_I]$, the negative voltage produced by the GBPD in the presence of a negative IRJISI pulse is larger than the positive voltage produced by the GBPD in the presence of a positive IRJISI pulse. Similar effects occur when the output jitter stays within $[-A_I, 0]$. Thus, when the positive and negative pulses in the IRJISI have about the same density, the output jitter is pulled towards zero due to asymmetric pulling forces upon a non-zero output phase.



Fig. 2.21. Simulated waveforms of BCDR using GBPD (K<sub>T</sub>=80) with 0.02 UI IRJISI

It is evident that the output jitter has smaller amplitude when the GBPD has smaller $K_T$. On the other hand, smaller $K_T$ decreases the PSRL, which leads to a loss of jitter transfer bandwidth and jitter tolerance. However, if the width of the transition region is properly designed, the jitter transfer bandwidth and jitter tolerance of a BPLL with a GBPD stay almost the same as if an ABPD were used. Let us look at a SONET OC-192 receiver as an example. The standard prescribes that the jitter tolerance must be larger than 0.15 UI at 4 MHz input jitter frequency [15]. Usually the BCDR in the receiver is designed to have a jitter transfer bandwidth of 5-10 MHz with 0.15 UI input jitter amplitude to satisfy the jitter tolerance mask [7]. When doing jitter transfer characterization, a 0.15 UI sinusoidal input jitter at 8 MHz is applied to the loop. Fig. 2.22 shows the simulated waveforms under this condition. When an ABPD is used, the PD output switches between -1 and 1 abruptly at the same frequency as the input jitter. The output jitter amplitude is 0.12 UI, which is attenuated by 2 dB compared with the input jitter amplitude. When a GBPD is used, the output of the PD stays near -1 or 1 for most of a period since the phase error is larger than the width of the transition zone for most of the period. In other words, a GBPD acts similarly to an ABPD when characterizing jitter transfer and jitter tolerance. The output jitter amplitude with a GBPD is 0.11 UI, which is close to the value when an ABPD is used. The simulation result indicates that the jitter transfer bandwidths for both cases are very close to each other (9 MHz for the ABPD and 8 MHz for the GBPD with $K_T$ = 80). Since a GBPD has about the same output level as an ABPD when the input phase error is beyond the transition region, the BPLL has about the same PSRL in both cases. Therefore, the jitter tolerance is similar for both cases. However, the jitter generation due to IRJISI with a GBPD is 60% smaller than with an ABPD. That makes it a worthwhile trade-off between jitter generation and jitter transfer bandwidth / jitter tolerance.



Fig. 2.22. Simulated waveforms of BPLL using ABPD and GBPD (0.15 UI input jitter @ 8 MHz)



Fig. 2.23. Structure modification to minimize jitter generation with GBPD

We have come to the conclusion that a GBPD with a proper transition zone width can reduce the jitter generation with little loss of jitter transfer bandwidth and jitter tolerance. In fact, all physically implemented BPDs have a certain transition zone. However, most BPLLs are based on CPs switched by full-scale digital control signals. The CP outputs either full-scale charging current or full-scale discharging current depending on the control signals. Although the BPD itself may not output full-swing voltage signals due to speed limitation, cascaded buffers or logic gates are often used to amplify the PD output to full scale before it is applied to the CP. If the amplifying cells before the CP are merged into the BPD, we get an equivalent GBPD with a very large $K_T$, which closely resembles an ABPD. Thus, the jitter generation is very close to the case when an ABPD is used. To address this problem, the digitally-switched CP is replaced by a linear transconductor. The buffers and logic gates between the BPD and the CP are removed. The transconductor directly converts the PD output voltage into a proportional current which is then injected into the loop filter. The structure modification is illustrated in Fig. 2.23. In this way, a BPD with a lower $K_T$ is properly implemented and the trade-off previously discussed can be exercised to achieve better performance.

#### II.3.5.2 VCO Phase Noise

Similar to the input-output jitter transfer, the transfer of VCO phase noise to the output phase also depends on the PSRL [12]. The loop attempts to produce a compensating voltage on the control terminal of the VCO to cancel the jitter produced by the VCO (JVCO). The output jitter of the loop is similar to the phase error in the input-output jitter transfer analysis. When the JVCO has a lower slope than the PSRL, the loop tracks the JVCO closely. In this case, the output jitter stays within a small limit which is equal to the sum of the maximum phase shift produced by the loop and the JVCO within a single sampling period (similar to the phase error limit given by (35)),

$$A_{OUT} = \Delta P_{loop,\max} + \Delta P_{JVCO,\max} = IRKT_S + A_{VCO}\omega_{VCO}T_S$$
(62)

When the slope of JVCO is larger than the PSRL, the loop is not able to track the JVCO. As a result, part of the JVCO leaks to the loop output. When the slope of the JVCO is much larger than the PSRL, the loop is not able to provide any significant compensation. In this case, the VCO phase jitter is reproduced in the output phase with little attenuation.

Let us assume that the JVCO has amplitude of  $A_{VCO}$  at a particular frequency. The cut-off frequency  $\omega_C$  below which the JVCO is fully suppressed by the loop is obtained by setting the maximum slope of JVCO equal to the PSRL, leading to,

$$\omega_C = \frac{IRK}{A_{VCO}} \tag{63}$$

When the PSRL is significantly lower than the slope of the JVCO, the loop slews all the time. In this case, the output jitter is similar in terms of mechanism to the phase error given by (52) since the output jitter is simply the difference between the JVCO and the part of jitter compensated by the loop. Thus, the output jitter amplitude is obtained by replacing  $A_{IN}$  and  $\omega_{IN}$  by  $A_{VCO}$  and  $\omega_{VCO}$  in (52), yielding,

$$A_{OUT} = A_{VCO} \sqrt{1 - S_{VCO}^{2}} + A_{VCO} S_{VCO} \left( \sin^{-1} S_{VCO} + \cos^{-1} \frac{\pi S_{VCO}}{2} - \frac{\pi}{2} \right)$$
(64)

where,

$$S_{VCO} = \frac{KIR}{A_{VCO}\omega_{VCO}}$$
(65)

Similarly, the condition  $S_{VCO} \le 2/\pi$  must be satisfied for (64) to hold, hence,

$$\omega_{VCO} \ge \frac{\pi K I R}{2 A_{VCO}} \tag{66}$$

The BPLL was simulated with JVCO of 0.01 UI. The jitter frequency was swept from 50 MHz to 300 MHz. The simulated and predicted values for the output jitter amplitude are shown in Fig. 2.24. The simulated and predicted curves are very close to each other. There is a gap between the two frequencies given by (63) and (66) where the BPLL slews for only part of a period. Linear interpolation between the end point values given by (62) and (64) is used to predict the output jitter amplitude in this region. Fig. 2.24 shows that the JVCO is greatly attenuated below the cut-off frequency $\omega_C$. The simulated cut-off frequency $f_C=\omega_C/2\pi$ is 68 MHz in the prototype BPLL, which is large enough to suppress most of the VCO phase noise in many applications.
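The prediction procedure described above (tracking value from (62), slewing value from (64), linear interpolation in between) can be sketched as a single function. The PSRL below is back-calculated from the quoted 68 MHz cut-off and 0.01 UI amplitude using (63); the sampling period `T_S` of 100 ps is an assumption made only for illustration.

```python
import math

def jvco_output_amplitude(f_vco, a_vco, psrl, t_s):
    """Predicted output jitter amplitude (UI) caused by sinusoidal JVCO.

    psrl is the phase slew rate of the loop, K*I*R, in UI/s.
    """
    w = 2.0 * math.pi * f_vco
    w_c = psrl / a_vco                        # (63): loop tracks below this
    w_slew = math.pi * psrl / (2.0 * a_vco)   # (66): loop always slews above this

    def tracking(w_):
        return psrl * t_s + a_vco * w_ * t_s              # (62)

    def slewing(w_):
        s = psrl / (a_vco * w_)                           # (65)
        arg = min(1.0, math.pi * s / 2.0)                 # rounding guard
        return a_vco * math.sqrt(1.0 - s * s) + a_vco * s * (
            math.asin(s) + math.acos(arg) - math.pi / 2.0)    # (64)

    if w <= w_c:
        return tracking(w)
    if w >= w_slew:
        return slewing(w)
    # Gap between (63) and (66): linear interpolation between the end points
    frac = (w - w_c) / (w_slew - w_c)
    return tracking(w_c) + frac * (slewing(w_slew) - tracking(w_c))

A_VCO = 0.01                           # UI, as in the simulation above
PSRL = 2.0 * math.pi * 68e6 * A_VCO    # from (63) with f_C = 68 MHz
T_S = 100e-12                          # assumed sampling period (hypothetical)

for f_mhz in (50, 68, 100, 150, 200, 300):
    a_out = jvco_output_amplitude(f_mhz * 1e6, A_VCO, PSRL, T_S)
    print(f"{f_mhz} MHz -> {a_out:.5f} UI")
```

The predicted amplitude rises monotonically with frequency and approaches $A_{VCO}$ well above the cut-off, which is the qualitative shape described for Fig. 2.24.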

The BPLL suppresses the JVCO over a large bandwidth because $\omega_{C}$ is inversely proportional to the JVCO amplitude, as seen from (63). In high-speed BCDR applications, the amplitude of the JVCO is much smaller than the amplitude of the JISI even if an RC-type ring oscillator is used. That is why RC ring oscillators are preferred over LC oscillators in BCDR implementations to save chip area and reduce layout complexity [7].



Fig. 2.24. Simulated and predicted jitter generation caused by 0.01 UI JVCO

## II.4. Summary

BPLLs are modeled and analyzed in this chapter. The steady-state behavior of BPLLs with 1<sup>st</sup>- and 2<sup>nd</sup>-order filters is investigated by combining discrete-time and continuous-time analyses. It is shown that the BPLL can have a continuous range of oscillation modes in steady state. Under the disturbance of Gaussian random jitter, the BPLL is expected to settle to the most stable oscillation mode. The most stable mode is determined by evaluating the relative stability of all the modes. The expected value of the output jitter amplitude is derived and its dependence on the loop parameters is analyzed. The jitter performance properties of the BPLL are fully characterized. Expressions with excellent accuracy are derived for the jitter transfer bandwidth. The condition to limit jitter peaking within a certain level is obtained so that designers can minimize the capacitor area in a given design. Internal jitter generation contributed by both JISI and JVCO is analyzed. A bandwidth-limited BPD is modeled as a jitter-free BPD with additive IRJISI. Analysis indicates that a GBPD in combination with a linear V-I converter can significantly reduce jitter generation with little loss of jitter transfer bandwidth and jitter tolerance. The transfer curve from VCO jitter to output jitter is characterized. Analysis shows that most of the VCO phase noise is suppressed by the loop dynamics in BPLLs.

### CHAPTER III

# A 10GBPS CDR FOR SONET OC-192 STANDARD

## III.1. Introduction to Optical Transceivers

With the ever-increasing demand for higher communication bandwidth, optical fibers have come into widespread use in wide-area backbone networks. SONET/SDH is the mainstream standard for optical transceivers. SONET is mainly used in the US and Canada while SDH is mainly used in the rest of the world. The standards and data rates supported by SONET/SDH are listed in Table 3.1. In recent years, 10 Gb/s optical transceivers have dominated the market; they are defined by SONET OC-192 or SDH STM-64.

| Optical Carrier Level | Bit rate (Mbps) | Frame Format |
|-----------------------|-----------------|--------------|
| OC-1                  | 51.84           | STM-0        |
| OC-3                  | 155.52          | STM-1        |
| OC-12                 | 622.08          | STM-4        |
| OC-48                 | 2,488.32        | STM-16       |
| OC-192                | 9,953.28        | STM-64       |
| OC-768                | 39,813.12       | STM-256      |

Table 3.1. Data rates and frame formats supported by SONET
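The bit rates in Table 3.1 are integer multiples of the OC-1 rate, so the table can be regenerated from a single constant. The short sketch below does this; the function name is illustrative only.

```python
OC1_RATE_MBPS = 51.84  # base SONET rate (OC-1), as listed in Table 3.1

def oc_bit_rate_mbps(n):
    """Bit rate of OC-n in Mbps: n times the OC-1 rate."""
    return round(n * OC1_RATE_MBPS, 2)

for level in (1, 3, 12, 48, 192, 768):
    print(f"OC-{level}: {oc_bit_rate_mbps(level):,.2f} Mbps")
```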



Fig. 3.1. Block diagram of a typical optical transceiver

The block diagram of a typical optical transceiver is shown in Fig. 3.1. On the transmitter side, a frequency synthesizer is used to generate the transmitter clock. The clock signal is fed into the multiplexer, which assembles parallel low-speed data streams into a serial high-speed data stream. After that, the serial data is retimed by the transmitter clock and sent to the laser driver. The laser driver converts the electrical voltage signal into an optical signal through a laser diode. The optical signal is transmitted to the receiver over optical fibers. On the receiver side, the optical signal coming out of the optical fiber is converted into an electrical current by a photodiode. The output current of the photodiode is converted into a voltage by the trans-impedance amplifier (TIA). After that, the voltage signal is amplified to full scale by the limiting amplifier (LA). The output voltage of the LA is fed into the clock and data recovery (CDR) module as the input data. The CDR recovers the clock signal from the input data and retimes the input data with the recovered clock. The retimed data is split into parallel low-speed data streams by the demultiplexer with the aid of the recovered clock. As the final step, the parallel low-speed data streams are processed by fully digital circuits.

## III.2. Existing CDR Architectures

This project is focused on the design of the CDR module for 10 Gbps SONET OC-192 optical receivers. The CDR is the most critical module in the optical receiver due to stringent requirements on jitter performance. Existing architectures of 10 Gbps SONET CDRs include the single-loop CDR, the dual-loop CDR, the referenceless CDR, etc. The structure of a single-loop CDR is shown in Fig. 3.2. It is similar to a conventional charge pump PLL [17] in terms of operating principles. The main feature which makes a CDR different from a regular PLL is that the phase detector of a CDR must be able to compare the phase difference between a random bit sequence and a periodic clock signal. The phase detector can be either a linear phase detector (e.g., Hogge's phase detector [18]) or a binary phase detector (e.g., Alexander's phase detector [19]). The output voltage of the phase detector controls the charge pump to produce a charging or discharging current which is then injected into the loop filter. The loop filter can be either internal or external, depending on the size of the capacitor. The loop filter produces an output voltage ($V_{CTRL}$) which tunes the phase and frequency of the VCO. The VCO produces the recovered clock when the CDR is locked. The input data is retimed with the recovered clock via a D flip-flop (DFF) to produce the recovered data. In many practical implementations, the phase detector itself can provide a retimed copy of the input data. This desirable property, called built-in retiming capability, avoids the additional phase offset caused by delay mismatch between the signal transmission paths of the data and clock signals. The SONET OC-192 CDRs proposed in [20][21] are based on the single-loop architecture. The main drawback of this architecture is its narrow locking range, since the phase detector can provide correct locking force only within a small frequency offset. To overcome the process variation of the VCO frequency, single-loop CDRs usually need an external tuning terminal for the VCO; the center frequency of the VCO must be manually tuned into the small locking range that can be handled by the phase detector.



Fig. 3.2. Block diagram of a single-loop CDR

The structure of a dual-loop CDR based on an external reference is shown in Fig. 3.3. The dual-loop topology includes a phase-locked loop (PLL) and a frequency-locked loop (FLL). The phase-locked loop consists of the phase detector, the PLL charge pump, the loop filter and the VCO. The frequency-locked loop consists of the frequency detector (FD), the FLL charge pump, the loop filter and the VCO. The loop filter and the VCO are shared by both loops. The frequency detector detects the frequency difference between an external reference signal and the recovered clock. The output signal of the frequency detector controls the output current of the FLL charge pump. The output currents of the two charge pumps are combined and injected into the loop filter. Both the PLL and the FLL are active when the CDR is far from being locked. The FLL helps the CDR to achieve a wide locking range. When the frequency difference between the incoming data and the output clock settles into a small range which can be handled by the PLL alone, the FLL is disabled and only the PLL stays active. Since the current of the PLL charge pump is much smaller than that of the FLL charge pump, the bandwidth of the PLL is much smaller. Thus, the dual-loop CDR can achieve very small jitter in the locked state since only the PLL is active. CDR implementations based on the dual-loop architecture with external reference were reported in [22]-[24]. They have the advantage of large locking range and robust operation. However, since the frequency-locked loop depends on an external reference, discrete crystal oscillator components are needed, which decreases the level of integration and introduces cost overhead to the entire system.



Fig. 3.3. Block diagram of a dual-loop CDR with external reference

A referenceless dual-loop CDR was proposed in [25]. The structure of the CDR is shown in Fig. 3.4. It includes a PLL based on a binary phase detector and an FLL based on a binary frequency detector. This CDR structure is able to operate without any external reference signal. However, the major drawback of this solution is that the bandwidth ratio of the PLL and the FLL can hardly be adjusted due to the special type of phase and frequency detector. When the ratio departs from the desired value, the PLL is not able to take over the locking process from the FLL and the CDR is not able to reach the final locked state properly. This drawback is verified by macro-model simulation. Thus, the bandwidths of the two loops cannot be optimized separately to get large locking range and small jitter generation at the same time. Instead, locking range and jitter generation must be traded off against each other in this architecture.



Fig. 3.4. Structure of the referenceless CDR proposed in [25]

## III.3. Proposed Solution

### III.3.1 System Architecture

A 10 Gbps CDR architecture for the SONET OC-192 standard is proposed to address the issues associated with the existing solutions. The block diagram of the proposed CDR is shown in Fig. 3.5. It is based on a half-rate referenceless dual-loop architecture. The CDR consists of two loops, a frequency-locked loop (FLL) and a phase-locked loop (PLL). The frequency-locked loop is made up of the linear frequency detector (LFD), the FLL charge pump (FCP), the loop filter and the VCO. The LFD compares the frequencies of the incoming random data and the output clock, generating a pair of frequency difference signals (FD\_UP and FD\_DN). FD\_UP or FD\_DN becomes active when the frequency difference is positive or negative, respectively. When the frequency difference is zero, the LFD enters a high-impedance state in which both FD\_UP and FD\_DN stay inactive. The tri-state output signal of the LFD controls the FCP to produce a tri-state output current, i.e., $I_{fcp}$, 0 and $-I_{fcp}$. The phase-locked loop is made up of the quad-level phase detector (QPD), the PLL charge pump (PCP), the loop filter and the VCO. The QPD compares the phases of the incoming data and the clock, generating two phase difference signals in quadrature, i.e., PDI (MAG) and PDQ (SIGN). The PCP is a quad-level charge pump (QCP) with 4 levels of output current ($3I_{pcp}$, $I_{pcp}$, $-I_{pcp}$, $-3I_{pcp}$). PDI is used to select the magnitude of the PCP output current while PDQ is used to select the direction (charging or discharging) of the PCP output current. The combination of PDI and PDQ controls the PCP to produce all four possible values of output current depending on the phase difference. In the actual design, the LFD and QPD are implemented together as a single PFD (phase and frequency detector) module due to their strong interdependency. Details about the LFD and QPD will be discussed in a later section. The output currents of the PCP and FCP are combined and injected into the loop filter. The loop filter is a first-order filter which is simply the series combination of a resistor and a capacitor. It is placed off-chip since the required capacitor value is 35 nF, which would consume too much area to be implemented on chip. The loop filter suppresses the voltage ripple on the control terminal of the VCO. The VCO is a 4-stage ring oscillator producing 4-phase differential clocks which are spaced by 45 degrees from each other. The VCO is designed to work at 5 GHz in the locked state, i.e., half the incoming data rate, which makes the proposed topology a half-rate architecture. The 4-phase differential clocks are required for the proper operation of the half-rate PFD. A half-rate architecture is chosen for the CDR to ease the design of the VCO. At 5 GHz, the VCO can be implemented as an RC-type ring oscillator without the use of passive inductors, which consume a lot of area and increase the cost of the processing technology.

The proposed CDR is able to achieve a large capture range and small jitter generation at the same time because it has two loops whose bandwidths can be optimized separately. Although the incoming data rate for SONET OC-192 is fixed, a large locking range is still desired because the center frequency of the VCO can easily deviate from the designed value by 10% to 20% due to process variation. When the incoming data rate is far from the VCO frequency, both the FLL and the PLL are active. The FLL has a larger loop bandwidth than the PLL, which helps the CDR bring the VCO frequency close to the incoming data rate in a short time. When the frequency difference becomes zero (no more cycle slipping occurs), the LFD produces a high-impedance output signal, which effectively disables the FLL. Meanwhile, PDI (MAG) stays low while PDQ (SIGN) switches between high and low, which controls the PCP to operate at the two smaller current levels only ($I_{pcp}$, $-I_{pcp}$). In this case, the PLL stays in a low-bandwidth mode to ensure small jitter generation when the CDR is locked. Details about how the PLL takes over the locking process from the FLL will be explained in later sections. Since the LFD is able to extract the frequency difference between the random bit sequence and the clock signal, no external reference is required for this architecture, which increases the level of integration and eliminates the cost overhead of an external crystal oscillator.



Fig. 3.5. Block diagram of the proposed CDR

#### III.3.2 Quad-level Phase Detector



Fig. 3.6. Block diagram of the QPD

The block diagram of the QPD is shown in Fig. 3.6. CK<sub>0</sub>, CK<sub>45</sub>, CK<sub>90</sub> and CK<sub>135</sub> represent the 4 pairs of differential clocks produced by the VCO. CK<sub>45</sub>, CK<sub>90</sub> and CK<sub>135</sub> are delayed by $45^{\circ}$ from CK<sub>0</sub>, CK<sub>45</sub> and CK<sub>90</sub>, respectively. The QPD consists of two half-rate binary PDs (BPDs). The structure of the half-rate BPD (HBPD) was originally proposed in [25]. The function of the HBPD is to generate a binary output signal based on the sign of the phase difference between the half-rate clock and the full-rate data sequence. It outputs a high level if the clock is earlier than the data and a low level if the clock is later than the data. The HBPD has three input terminals, i.e., DATA, CKI and CKQ. CKI and CKQ are two clock signals in quadrature. Two quadrature clock phases are required due to the half-rate operation. The output signal (PD) of the HBPD represents the sign of the phase difference between DATA and CKQ. The phase difference is defined as zero when the transition edge of CKQ is aligned to the center of the data bits in DATA. Thus, HBPD I generates a binary phase difference signal from CK<sub>90</sub> and DIN while HBPD II generates a binary phase difference signal from CK<sub>135</sub> and DIN. When the input data and the clock signal have a fixed frequency difference, the phase difference between the clock and the data varies periodically with a beat period equal to the inverse of the frequency difference. Assuming the incoming data rate is $f_{data}$ and the frequency of the half-rate clock is $f_{ck}$, the effective frequency difference is defined as,

$$\Delta f = f_{data} - 2f_{ck} \tag{67}$$

where the coefficient 2 is introduced to account for the half-rate relationship between the clock and data. The beat period  $T_b$  is derived as the inverse of the frequency difference,

$$T_{b} = \left| \frac{1}{\Delta f} \right| = \left| \frac{1}{f_{data} - 2f_{ck}} \right| \tag{68}$$
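As a quick numerical illustration of Eqs. (67) and (68), the following minimal sketch evaluates the effective frequency difference and the beat period for a half-rate clock. The function name and the 4.9 GHz example clock frequency are assumptions chosen only for illustration; they are not taken from the design.

```python
import math


def beat_period(f_data, f_ck):
    """Return (delta_f, T_b) following Eqs. (67) and (68).

    f_data: incoming data rate in bit/s
    f_ck:   frequency of the half-rate clock in Hz
    """
    delta_f = f_data - 2.0 * f_ck        # Eq. (67): factor 2 for half-rate clock
    if delta_f == 0.0:
        return delta_f, math.inf         # no beating when the frequencies match
    return delta_f, abs(1.0 / delta_f)   # Eq. (68)


# Hypothetical example: 10 Gbps data with the VCO running 2% low at 4.9 GHz.
delta_f, t_b = beat_period(10e9, 4.9e9)
print(f"delta_f = {delta_f / 1e6:.0f} MHz, T_b = {t_b * 1e9:.1f} ns")
```

In this example a 100 MHz error in the half-rate clock appears as a 200 MHz effective frequency difference, because the clock frequency is doubled in Eq. (67).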

Thus, with a fixed input frequency difference, the output of the HBPD is a square wave oscillating between the high and low levels with a beat period of $T_b$. Because the input clocks to HBPD I and HBPD II have a $45^{\circ}$ phase difference and the clocks are half-rate compared with the input data, the square waves produced by HBPD I and HBPD II have a $90^{\circ}$ phase difference. Due to the CDR loop design, the transition edges of CK<sub>90</sub> are expected to be aligned to the center of the input data bits when the CDR is locked. Thus, the input phase error of the whole QPD is defined as the phase difference between CK<sub>90</sub> and DIN. The timing relationship between CK<sub>90</sub> and DIN under different phase errors is shown in Fig. 3.7. One bit period of the input data is mapped to the full range of the phase error (PE, $360^{\circ}$ in total). The output waveforms of the QPD when the input data and clock have a fixed frequency difference are illustrated in Fig. 3.8. The input phase error of the QPD is divided into four regions based on the value combinations of PDI and PDQ. If we use PDI to select the sign of the QCP current and PDQ to select the magnitude of the QCP current, a quad-level quantization relationship from the phase error to the QCP current is obtained. The mapping relationship between the phase error, the output values of the QPD and the output current of the QCP is listed in Table 3.2. The detailed implementation of the charge pump will be presented in a later subsection.

Based on the mapping relationship, the QPD acts as a two-bit phase error quantizer, which is more accurate than a conventional binary PD with only one-bit resolution. Compared with a conventional BPD, the QPD is closer to the transfer characteristic of a linear PD and thus has the advantage of smaller jitter generation when the CDR is locked. When the CDR is locked, the input phase error of the QPD stays within $(-90^{\circ}, 90^{\circ})$ and the QPD controls the QCP to produce only the two smaller levels of output current. Thus, less jitter is generated by the bang-bang switching of the QPD in the locked state. On the other hand, the operational speed of the QPD is similar to that of a conventional 1-bit BPD, which is still limited by the maximum toggling speed of the DFFs.



Fig. 3.7. Timing diagram of CK<sub>90</sub> and DIN under different phase errors



Fig. 3.8. Output waveforms of the QPD with fixed input frequency difference

| Phase Error | $-180^{\circ} \sim -90^{\circ}$ | $-90^{\circ} \sim 0^{\circ}$ | $0^{\circ} \sim 90^{\circ}$ | $90^{\circ} \sim 180^{\circ}$ |
|-------------|---------------------------------|------------------------------|-----------------------------|-------------------------------|
| (PDQ, PDI)  | 00                              | 10                           | 11                          | 01                            |
| CP Current  | $-3I_{pcp}$                     | $-I_{pcp}$                   | $I_{pcp}$                   | $3I_{pcp}$                    |

Table 3.2. Mapping relationship between phase error, QPD output and QCP output
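The mapping in Table 3.2 can be written as a small lookup function. The sketch below is only a behavioral illustration: the function name is hypothetical, and the treatment of phase errors that fall exactly on a region boundary is an assumption, since the table does not specify it.

```python
def qpd_quantize(phase_error_deg):
    """Map a phase error (degrees) to (PDQ, PDI, current in units of I_pcp).

    Follows Table 3.2. Boundary values are assigned to the upper region,
    which is an assumption made for this sketch.
    """
    # Wrap the phase error into [-180, 180): one bit period spans 360 degrees.
    pe = (phase_error_deg + 180.0) % 360.0 - 180.0
    if pe < -90.0:
        return 0, 0, -3
    if pe < 0.0:
        return 1, 0, -1
    if pe < 90.0:
        return 1, 1, 1
    return 0, 1, 3


for pe in (-135, -45, 45, 135):
    pdq, pdi, level = qpd_quantize(pe)
    print(f"PE = {pe:5d} deg -> (PDQ, PDI) = {pdq}{pdi}, I = {level:+d} * I_pcp")
```

In this encoding PDI carries the sign of the current and PDQ distinguishes the small-magnitude regions (PDQ = 1) from the large-magnitude regions (PDQ = 0).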

The schematic of the HBPD is shown in Fig. 3.9. It is composed of two double-edge D-flipflops (DEDFFs) and a modified double-edge D-flipflop. A DEDFF is similar in function to a regular D-flipflop (DFF). The difference is that, in a DEDFF, the clock signal samples the data on both rising edges and falling edges. The schematic of the DEDFF is shown in Fig. 3.10. It consists of two CML latches and a CML 2:1 multiplexer (MUX). One latch samples the input data (D) when CK is high while the other latch samples the input data (D) when CK is low. The MUX selects the output from the first latch when CK is low and selects the output from the second latch when CK is high. The overall effect is that the input data is sampled at both rising edges and falling edges of the input clock. The schematic of the modified DEDFF is shown in Fig. 3.11. What makes the modified DEDFF different from the original one is that the data sampled by the rising edge of the clock is inverted before being passed to the MUX. In Fig. 3.9, DEDFF I samples the input data with CKQ and yields QQ; DEDFF II samples the input data with CKI and yields QI. After that, QI samples QQ via the modified DEDFF and yields the phase difference signal (PD), which represents the sign of the phase error between DATA and CKQ. More details about the timing diagram and working principle of the HBPD can be found in [25].
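The latch-plus-MUX behavior of the DEDFF described above can be mimicked with a simple sampled-time model. This is a minimal behavioral sketch under idealized assumptions (zero delay, logic levels 0/1, latches initialized to 0); the function name and the `invert_rising` flag are hypothetical and only illustrate the difference between the regular and the modified DEDFF.

```python
def dedff(d, ck, invert_rising=False):
    """Behavioral model of a double-edge D-flipflop.

    d, ck: equal-length sequences of 0/1 samples.
    invert_rising: if True, the value captured at rising clock edges is
    inverted before the MUX (the "modified" DEDFF).
    """
    latch_hi = 0  # transparent while CK is high, holds while CK is low
    latch_lo = 0  # transparent while CK is low, holds while CK is high
    q = []
    for d_n, ck_n in zip(d, ck):
        if ck_n:
            latch_hi = d_n
        else:
            latch_lo = d_n
        if ck_n:
            # CK high: MUX picks the latch that froze at the rising edge.
            q.append(1 - latch_lo if invert_rising else latch_lo)
        else:
            # CK low: MUX picks the latch that froze at the falling edge.
            q.append(latch_hi)
    return q


ck = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 0, 0, 1, 1, 0, 1, 1]
print(dedff(d, ck))
print(dedff(d, ck, invert_rising=True))
```

The output only changes at clock transitions, and after each transition it equals the data value seen just before that edge, which is the double-edge sampling behavior described in the text.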



Fig. 3.9. Schematic of half-rate binary PD



Fig. 3.10. Schematic of double-edge D-flipflop



Fig. 3.11. Modified double-edge D-flipflop with inversion on rising edge sampling

#### III.3.3 Linear Frequency Detector

A linear frequency detector (LFD/FD) is proposed to detect the frequency difference between the input data and the output clock. The linear frequency detector is a digital implementation of a frequency detector based on the quadri-correlator. The quadri-correlator was originally proposed by Richman in the early days of color television [26]. It was first used in a clock recovery circuit by Bellisio [27] and was first successfully implemented in a 50 MHz circuit by Cordell [28]. The block diagram of a frequency detector based on the unbalanced quadri-correlator is shown in Fig. 3.12. The edge detector converts both rising edges and falling edges of the input data into pulses so that the output signal of the edge detector has a strong frequency component at the input data rate. The edge detector is usually implemented as the combination of a differentiator and a rectifier. The output signal of the edge detector is mixed with two quadrature clock signals and passed through a low-pass filter (LPF) to get two frequency difference components, i.e., $\sin(\omega_1-\omega_2)t$ and $\cos(\omega_1-\omega_2)t$. One of the frequency difference components is passed through a differentiator to make its amplitude proportional to the frequency difference $\omega_1-\omega_2$. After that, a multiplier and another LPF are used in the same way as in AM demodulation to obtain a DC component proportional to the frequency error. Fig. 3.13 shows a frequency detector based on the balanced quadri-correlator with a similar operating principle. Frequency detectors based on the quadri-correlator can have either an analog implementation [29] or a digital implementation [30].
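The differentiate-and-multiply step of the unbalanced quadri-correlator can be checked numerically. The sketch below starts from the two low-pass-filtered components, differentiates the sine component with a finite difference, multiplies it by the cosine component and averages the product. The function name, the choice of which component is differentiated, and the example frequencies are assumptions made for illustration; the result should be close to $(\omega_1-\omega_2)/2$.

```python
import math


def quadricorrelator_dc(delta_omega, t_end=1.0, n=20_000):
    """Average of d/dt[sin(dw*t)] * cos(dw*t) over [0, t_end).

    delta_omega: frequency difference w1 - w2 in rad/s.
    t_end should span an integer number of beat periods.
    """
    dt = t_end / n
    acc = 0.0
    for k in range(n):
        t = k * dt
        # Finite-difference differentiator acting on the sine component.
        di_dt = (math.sin(delta_omega * (t + dt)) - math.sin(delta_omega * t)) / dt
        # Multiplier: differentiated component times the quadrature component.
        acc += di_dt * math.cos(delta_omega * t)
    return acc / n  # final low-pass filter modeled as a plain average


dw = 2.0 * math.pi * 5.0  # hypothetical 5 Hz beat, 5 full periods in 1 s
print(quadricorrelator_dc(dw), dw / 2.0)
print(quadricorrelator_dc(-dw), -dw / 2.0)
```

The averaged output scales linearly with the frequency difference and changes sign with it, which is the property exploited by the frequency detector.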



Fig. 3.12. Frequency detector based on unbalanced quadri-correlator



Fig. 3.13. Frequency detector based on balanced quadri-correlator

The LFD implementation used in the CDR prototype is based on the unbalanced quadri-correlator; its block diagram is shown in Fig. 3.14. The LFD is built on top of the QPD. It takes PDI and PDQ (generated by the QPD as shown in Fig. 3.6) as its input signals. Since PDI and PDQ represent the phase difference information and are in quadrature, they are similar to the two signals at the output of the first two LPFs, i.e., $\cos(\omega_1-\omega_2)t$ and $\sin(\omega_1-\omega_2)t$ in Fig. 3.12. PDQ is passed through a delay cell to yield PDQ1. PDQ1 is subtracted from PDQ, and the difference is denoted $\Delta$PDQ. The delay cell plus the subtractor can be considered the equivalent digital implementation of the differentiator in Fig. 3.12. After that, $\Delta$PDQ is multiplied by PDI to generate a tri-state FD signal (positive, negative or zero). The FD signal controls the FCP to produce a tri-state output current, which is then injected into the loop filter. The functions of the edge detector and the first two multipliers in the quadri-correlator are performed by the building blocks inside the QPD (Fig. 3.6).



Fig. 3.14. Block diagram of the linear FD

The delay cell in the LFD is implemented with a chain of cascaded CMOS inverters. The subtractor and the multiplier are implemented with CMOS logic gates since they deal only with fully digital input and output signals. Since $\Delta$PDQ and FD are tri-state signals (1, 0, -1), each is represented with two digital bits in the actual implementation. In particular, the output signal FD is represented with two digital signals, i.e., FD\_UP and FD\_DN. When FD\_UP is high, the FCP provides a charging current. When FD\_DN is high, the FCP provides a discharging current. When both FD\_UP and FD\_DN are low, the FCP is in the high-impedance state and produces no output current.

To understand the operating principle of the LFD, let us assume that there is a fixed frequency difference $\Delta f$ between the input data and the clock. The analysis of the HBPD in [25] shows that PDI leads PDQ by 90 degrees when $\Delta f>0$ and lags PDQ by 90 degrees when $\Delta f<0$. The timing diagram of the LFD when $\Delta f>0$ is shown in Fig. 3.15 (a). As a result of the subtraction, $\Delta$PDQ is a series of positive pulses and negative pulses. The rising edges of PDQ produce positive pulses while the falling edges produce negative ones. The pulses have the same width $t_d$, which is equal to the delay introduced by the delay cell in the LFD shown in Fig. 3.14. Since PDI leads PDQ by 90 degrees, the pulses in $\Delta$PDQ have the same polarity as PDI. Thus the product of $\Delta$PDQ and PDI, i.e., the FD signal, becomes a series of all-positive pulses. The pulses have a width of $t_d$ and the same period as PDI and PDQ, i.e., the beat period $T_b$. The timing diagram of the LFD when $\Delta f<0$ is shown in Fig. 3.15 (b). A similar analysis shows that FD is a series of all-negative pulses when $\Delta f<0$. Since the period of the FD signal is equal to the beat period $T_b=1/|\Delta f|$, the density of the pulses in the FD signal (or its average value) is proportional to the effective frequency difference ($\Delta f$) between the data and the clock. In this sense, the proposed FD structure is a linear FD.
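The delay, subtract and multiply operations can be reproduced with idealized square waves to see how the average FD output scales with the frequency difference. The following is a minimal sketch, not the circuit itself: it assumes PDQ is a 0/1 signal with 50% duty cycle, PDI is treated as +1/-1 in the multiplier, time is counted in integer samples, and all names are hypothetical.

```python
def lfd_average(n_beat, n_delay, sign_df=+1, n_periods=4):
    """Time-averaged FD output of the basic LFD for ideal square waves.

    n_beat:  beat period T_b in samples (a multiple of 4)
    n_delay: delay-cell time t_d in samples
    sign_df: +1 if delta_f > 0 (PDI leads PDQ), -1 if delta_f < 0 (PDI lags)
    """
    half, quarter = n_beat // 2, n_beat // 4

    def pdq(n):
        # Logic-level PDQ: high during the first half of each beat period.
        return 1 if n % n_beat < half else 0

    def pdi(n):
        # PDI as +1/-1, shifted by a quarter beat period relative to PDQ.
        return 1 if pdq(n + sign_df * quarter) else -1

    total = 0
    for n in range(n_periods * n_beat):
        delta_pdq = pdq(n) - pdq(n - n_delay)  # delay cell plus subtractor
        total += delta_pdq * pdi(n)            # multiplier: FD in {-1, 0, +1}
    return total / (n_periods * n_beat)


# Halving the beat period (doubling |delta_f|) doubles the average output.
print(lfd_average(400, 40), lfd_average(200, 40), lfd_average(400, 40, sign_df=-1))
```

In this idealized model every PDQ edge contributes one pulse of width $t_d$, so the average output is $2t_d/T_b$ in magnitude as long as $t_d < T_b/4$, with the sign following $\Delta f$.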



Fig. 3.15. Timing diagram of the LFD: (a) $\Delta f > 0$, when PDI leads PDQ; (b) $\Delta f < 0$, when PDI lags PDQ

Recall the following three conclusions to help understand the auto-off feature of the LFD: (1) PDI measures the phase difference between DIN and  $CK_{90}$ ; (2) PDQ measures the phase difference between DIN and  $CK_{135}$ ; (3)  $CK_{90}$  is designed to be aligned to the center of the data bits in locked state. When the CDR is locked, PDI is toggling rapidly between high and low since the transition edges of  $CK_{90}$  move back and forth by a small amount around the center of the data bits. On the other hand, PDQ takes a constant high value (indicating data is earlier than clock in phase) since the transition edges of  $CK_{135}$  are around 45 degrees later than the center of the data bits. When PDQ stays constantly high,  $\Delta$ PDQ is always zero since the derivative of a constant is zero. Hence the FD signal is always zero since it is just the multiplication result of  $\Delta$ PDQ and PDI. Therefore, the FD signal is automatically turned off when there is no frequency difference, which allows the quiet and standalone operation of the PLL in the locked state of the CDR.

There is an upper limit to the delay introduced by the delay cell (t<sub>d</sub>) in the LFD. If t<sub>d</sub> is larger than  $T_b/4$ , the FD pulses are no longer all-positive or all-negative. The timing diagram when t<sub>d</sub>> $T_b/4$  and  $\Delta f<0$  is shown in Fig. 3.16. The FD output is now a series of pulse pairs. Each pair consists of a negative pulse with width equal to  $T_b/4$  immediately followed by a positive pulse with width equal to  $t_d-T_b/4$ . As long as  $t_d<T_b/2$ , the positive pulse (in wrong direction) is narrower than the negative pulse (in correct direction). Hence, the integrated value of the LFD output signal still has the right sign to push the loop towards the locked state. When  $t_d>T_b/2$ , the positive pulse is wider than the negative pulse and the loop will move away from the locked state since the average value of the FD signal has the wrong sign; in this case, the CDR is no longer able to achieve lock. Therefore, the locking range of the CDR is limited by the time  $t_d$  and the relationship is expressed below,

$$\Delta f_{\max} = \frac{1}{2t_d} \tag{69}$$

In practical application of the proposed PFD,  $t_d$  should be set to be the minimum value allowed by the speed limitation of the given technology to achieve maximum locking range. If  $t_d$  is too small, the pulses generated by the LFD will be too narrow and easily swallowed or heavily distorted; the CDR may not be able to achieve lock properly. Since this design is implemented in CMOS 0.18µm technology,  $t_d$  is set to be a reasonable value of 0.4ns. Thus, the maximum single-side frequency offset which can be handled by the LFD is 1.25GHz in the ideal case.
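The numbers quoted above follow directly from Eq. (69). The short sketch below evaluates the limit for $t_d$ = 0.4 ns and, using Eq. (67), converts the effective frequency offset into the corresponding offset of the half-rate clock. The function names are hypothetical, and the conversion to a VCO offset is an inference from Eq. (67) rather than a figure stated in the text.

```python
import math


def lfd_max_offset(t_d):
    """Eq. (69): largest effective |delta_f| (Hz) handled by the basic LFD."""
    return 1.0 / (2.0 * t_d)


def max_vco_offset(t_d):
    """Equivalent offset of the half-rate clock frequency (Hz).

    From Eq. (67), delta_f = f_data - 2 * f_ck, so a clock error of x Hz
    appears as an effective frequency difference of 2 * x Hz.
    """
    return lfd_max_offset(t_d) / 2.0


t_d = 0.4e-9  # delay-cell time chosen in the text
print(f"max effective offset: {lfd_max_offset(t_d) / 1e9:.3f} GHz")
print(f"max half-rate clock offset: {max_vco_offset(t_d) / 1e6:.0f} MHz")
```

Under these assumptions the ideal limit corresponds to a clock error of 625 MHz around the nominal 5 GHz, i.e., 12.5% of the VCO center frequency.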



Fig. 3.16. Timing diagram of LFD when  $t_d > T_b/4$  and  $\Delta f < 0$ 

As illustrated in Fig. 3.16, the FLL has reduced loop gain and loop bandwidth when $t_d>T_b/4$ due to the cancellation of the positive and negative pulses. This results in a longer locking time when the frequency offset approaches $\Delta f_{max}=1/(2t_d)$. To address this problem, the LFD can be revised by delaying the signal PDI by $t_d/2$ before it is multiplied by $\Delta$PDQ. The block diagram of the revised LFD is shown in Fig. 3.17. The timing diagram of the revised LFD in the case of $\Delta f<0$ is illustrated in Fig. 3.18. Due to the extra delay introduced in the path of PDI, the multiplication result contains only negative pulses when $|\Delta f|<1/(2t_d)$. When $|\Delta f|=1/(2t_d)$, all the pulses merge together and the FD signal stays negative all the time, which makes the FLL converge at the maximum speed. When $1/t_d>|\Delta f|>1/(2t_d)$, the pulse width becomes narrower as the frequency offset increases. The timing diagram of the modified LFD in this case is illustrated in Fig. 3.19. The pulse width becomes $T_b-t_d$, which drops to zero when $|\Delta f|=1/t_d$. Thus, the revised LFD is able to drive the loop in the right direction within the following frequency offset,

$$\Delta f_{\max} = \frac{1}{t_d} \tag{70}$$

The frequency to voltage gain curve of the LFD before and after the modification is shown in Fig. 3.20. In summary, the revised LFD achieves a larger locking range and faster settling time than the original one.
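The effect of the extra $t_d/2$ delay in the PDI path can be explored with an idealized square-wave model. The sketch below compares the basic and revised LFD for $\Delta f>0$; it assumes a 0/1 PDQ with 50% duty cycle, PDI treated as +1/-1, integer-sample timing with an even `n_delay`, and hypothetical names throughout.

```python
def lfd_average(n_beat, n_delay, sign_df=+1, revised=False, n_periods=4):
    """Time-averaged FD output for ideal square waves.

    n_beat:  beat period T_b in samples (a multiple of 4)
    n_delay: delay-cell time t_d in samples (even)
    sign_df: +1 if PDI leads PDQ (delta_f > 0), -1 if PDI lags
    revised: if True, PDI is delayed by t_d / 2 before the multiplier
    """
    half, quarter = n_beat // 2, n_beat // 4
    pdi_delay = n_delay // 2 if revised else 0

    def pdq(n):
        return 1 if n % n_beat < half else 0

    def pdi(n):
        # Quarter-period lead or lag, plus the optional extra delay.
        return 1 if pdq(n + sign_df * quarter - pdi_delay) else -1

    total = 0
    for n in range(n_periods * n_beat):
        delta_pdq = pdq(n) - pdq(n - n_delay)  # delay cell plus subtractor
        total += delta_pdq * pdi(n)            # multiplier
    return total / (n_periods * n_beat)


for n_delay in (40, 150, 200, 300):  # t_d as a fraction of T_b = 400 samples
    print(n_delay / 400,
          lfd_average(400, n_delay),
          lfd_average(400, n_delay, revised=True))
```

In this model the revised detector keeps the correct sign for every $t_d$ below $T_b$ and reaches its largest output at $t_d=T_b/2$, whereas the basic detector loses output beyond $t_d=T_b/4$ and reverses sign beyond $t_d=T_b/2$, in line with Eqs. (69) and (70).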



Fig. 3.17. Block diagram of the revised LFD



Fig. 3.18. Timing diagram of the modified LFD when  $\Delta f < 1/(2t_d)$ 



Fig. 3.19. Timing diagram of the modified LFD when  $1/t_d > \Delta f > 1/(2t_d)$ 



Fig. 3.20. Transfer curve of the LFD before and after modification

The timing diagrams of the LFD shown in the figures above are based on a fully toggling data sequence for simplicity. When the input data is a random bit sequence, PDI and PDQ are not exactly periodic signals with 50% duty cycle, so the LFD does not work as ideally as in the above analysis. The time-averaged behavior of the LFD still has a gain curve similar to the one shown in Fig. 3.20. However, the actual gain of the LFD decreases as the transition density of the input data decreases.

#### III.3.4 Voltage-Controlled Oscillator

Since the proposed CDR uses the QPD, which is based on binary phase detectors, the jitter generation of the loop in the locked state is not sensitive to the phase noise of the VCO due to the nonlinear loop dynamics [21]. When the CDR is locked, the phase error is very small and the QPD output is still full scale. That makes the QPD similar to a linear PD with very large gain. In this sense, the CDR has a very large loop bandwidth in the locked state when analyzed as if it were a linear CDR. For a linear CDR, the phase transfer function from the VCO output to the loop output has a high-pass characteristic [17]. Thus, most of the low-frequency phase noise of the VCO is filtered out by the loop dynamics due to the large loop bandwidth. It should be pointed out that this serves only as an intuitive way to understand the issue. To describe it mathematically in an accurate way, the nonlinear and discrete-time analysis of loop dynamics presented in the previous chapter must be used (refer to the analysis given in section 3.5.2 of chapter II).



Fig. 3.21. The block diagram of VCO and its drivers

Since the CDR is not sensitive to the phase noise of the VCO, a 4-stage RC-type ring oscillator is employed to generate 4 phases of differential clocks. The block diagram of the VCO and the attached drivers is shown in Fig. 3.21. The VCO core consists of four cascaded differential stages. The last stage is connected back to the first stage with inverted polarity to achieve an additional 180-degree phase shift. Assuming all four stages are completely symmetrical, the phase shift provided by each stage is 45 degrees after stable oscillation is established. Thus the 4 pairs of clock signals are spaced by 45 degrees. The clock signals generated by the VCO are used to drive the PFD, which presents a significant amount of input capacitance. Thus, a chain of buffers is placed between the VCO core and the PFD to reduce the capacitive load so that the VCO core is able to cover the frequency of 5 GHz in its tuning range across all process corners. The smallest buffers are directly connected to the VCO core for minimum degradation of the oscillation frequency. In the actual layout, it takes quite long metal wires (a few hundred microns) to route from the VCO to the PFD. The long metal wires are connected after the first stage of buffers so that the extra routing parasitics do not affect the frequency of the VCO core. Two more stages of buffers are placed after the long-distance routing so that the clocks arriving at the PFD have a large enough amplitude.
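The 45-degree spacing follows from the requirement that the total phase shift around the ring is 360 degrees, of which 180 degrees comes from the polarity inversion. The sketch below applies this to an N-stage differential ring; the function name and the choice of the first tap as the 0-degree reference are assumptions made for illustration.

```python
def ring_clock_phases(n_stages):
    """Phases (degrees) of the stage outputs in a differential ring oscillator
    whose feedback path contains one polarity inversion.

    The inversion supplies 180 degrees, so the stages share the other
    180 degrees equally when they are identical.
    """
    per_stage = 180.0 / n_stages
    return [k * per_stage for k in range(n_stages)]


phases = ring_clock_phases(4)
print(phases)
# With a half-rate clock, one clock period spans two data bits, so each clock
# phase step corresponds to twice that angle in terms of the data bit period.
print([2.0 * p for p in phases])
```

For four stages this reproduces the CK<sub>0</sub>, CK<sub>45</sub>, CK<sub>90</sub> and CK<sub>135</sub> taps, and the doubling in the last line is the reason a 45-degree clock spacing yields a 90-degree spacing between the two HBPD outputs.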

The internal schematic of the VCO stages is shown in Fig. 3.22. It's basically a differential pair with tunable bias current and load impedance. M1 and M2 are the differential pair transistors. M3 provides a fixed bias current while R1-R2 provide fixed load resistance. M5-M6 and M8-M9 serve as tunable active load impedance. M8-M9 are tuned by the internal control voltage VC, which is connected to the loop filter output and automatically controlled by the CDR loop. M5-M6 are tuned by the external control voltage VCX, which allows the frequency range of the VCO to be manually adjusted for maximum flexibility during testing of the prototype chip. In order to achieve a wide tuning range while minimizing the variation of the oscillation amplitude, the bias current must be decreased accordingly when the load impedance increases. If the bias current is kept constant while the load impedance increases, the oscillation amplitude will increase quickly together with the decrease of operating frequency. When the load impedance gets too large, the common mode output voltage is so low that differential pair transistors enter deep triode region and oscillation is not able to be established due to less-than-unity loop gain. Therefore, when the control voltage (VC or VCX) increases, the bias current provided by M4 is reduced at the same time via the current mirror consisting of M10 and M11. In this way, a large tuning range is achieved together with stabilized oscillation amplitude. The swing of the control voltages VC and VCX is around 0 to 1.2 V (i.e., Vdd- $V_{th,pmos}$ ). When the control voltage is higher than 1.2 V, the PMOS enters the cut-off region and the load impedance provided by the PMOS transistors is no longer tunable.

The range of operating frequency and oscillation amplitude of the VCO was simulated with post-layout parasitics over all process corners. The simulation results are given in Table 3.3. F<sub>low</sub> and F<sub>high</sub> denote the lowest operating frequency (VC=VCX=1.2 V) and the highest operating frequency (VC=VCX=0 V), respectively. A<sub>1</sub> and A<sub>2</sub> are the oscillation amplitudes of the VCO core at the lowest and highest operating frequency, respectively. Typical-typical means the corner of typical NMOS and typical PMOS; slow-fast means the corner of slow NMOS and fast PMOS; and so on. Since the design kit itself does not support process variation of poly resistors, the resistor values are manually increased by 10% in the slow-PMOS corners and decreased by 10% in the fast-PMOS corners to reach the two extreme cases in terms of speed. It can be seen that the desired operating frequency (5 GHz) is well covered in all process corners. The VCO was also simulated across different temperatures in the typical process corner to evaluate the sensitivity to temperature drift. The simulation results are shown in Table 3.4. They verify that the desired operating frequency of 5 GHz is covered by the tuning range from $-25^{\circ}$C to $125^{\circ}$C. The simulation results also show that the oscillation amplitude is controlled within 700 mV to 900 mV in the entire tuning range over PVT variations.

The VCO has about 1GHz/V tuning gain for the internal tuning terminal (VC) and 2GHz/V tuning gain for the external tuning terminal (VCX). Since the VCO control voltages (VC or VCX) have a swing of 0~1.2V, the VCO can be tuned by about 1GHz through internal automatic tuning and can be tuned by about 2 GHz through external manual tuning.



Fig. 3.22. Internal schematic of each VCO stage

Table 3.3. The range of frequency and amplitude of the VCO under different corners

| Process Corner  | F <sub>low</sub> (GHz) | F <sub>high</sub> (GHz) | A <sub>1</sub> (mV) | A <sub>2</sub> (mV) |
|-----------------|------------------------|-------------------------|---------------------|---------------------|
| Typical-typical | 3.56                   | 6.32                    | 815                 | 755                 |
| Slow-slow       | 3.21                   | 5.54                    | 831                 | 751                 |
| Fast-fast       | 4.05                   | 7.29                    | 803                 | 755                 |
| Slow-fast       | 3.83                   | 6.63                    | 618                 | 731                 |
| Fast-slow       | 3.38                   | 6.06                    | 799                 | 890                 |

| Temperature      | F <sub>low</sub> (GHz) | F <sub>high</sub> (GHz) | A <sub>1</sub> (mV) | A <sub>2</sub> (mV) |
|------------------|------------------------|-------------------------|---------------------|---------------------|
| $-25^{\circ}$C   | 3.76                   | 7.03                    | 748                 | 775                 |
| $50^{\circ}$C    | 3.56                   | 6.32                    | 815                 | 755                 |
| $125^{\circ}$C   | 3.43                   | 5.81                    | 788                 | 700                 |

Table 3.4. The range of frequency and amplitude of the VCO at different temperature
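The coverage claim can be checked mechanically against the tabulated values. The sketch below copies F<sub>low</sub> and F<sub>high</sub> from Tables 3.3 and 3.4 and reports the margin around 5 GHz for each case; the helper names are hypothetical.

```python
TARGET_GHZ = 5.0

# (F_low, F_high) in GHz, copied from Table 3.3 (process corners).
CORNERS = {
    "Typical-typical": (3.56, 6.32),
    "Slow-slow": (3.21, 5.54),
    "Fast-fast": (4.05, 7.29),
    "Slow-fast": (3.83, 6.63),
    "Fast-slow": (3.38, 6.06),
}

# (F_low, F_high) in GHz, copied from Table 3.4 (temperature in deg C).
TEMPERATURES = {-25: (3.76, 7.03), 50: (3.56, 6.32), 125: (3.43, 5.81)}


def covers(target, f_low, f_high):
    """True if the tuning range [f_low, f_high] contains the target."""
    return f_low <= target <= f_high


def tightest_case(table, target=TARGET_GHZ):
    """Name of the case with the smallest margin to either end of the range."""
    return min(table, key=lambda k: min(target - table[k][0], table[k][1] - target))


for name, (lo, hi) in {**CORNERS, **TEMPERATURES}.items():
    print(f"{name}: covered={covers(TARGET_GHZ, lo, hi)}, "
          f"margins=({TARGET_GHZ - lo:.2f}, {hi - TARGET_GHZ:.2f}) GHz")
print("tightest corner:", tightest_case(CORNERS))
```

With these values, the slow-slow corner leaves the smallest margin, about 0.54 GHz above the 5 GHz target.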

The phase noise of the VCO was simulated with post-layout parasitics in the typical process corner at $50^{\circ}$C around the center frequency of 5 GHz. The simulated phase noise is shown in Fig. 3.23. The simulated phase noise at 1 MHz offset is -94.8 dBc/Hz.



Fig. 3.23. The phase noise of the VCO with post-layout parasitics

The tuning curve of the VCO is simulated with the internal control voltage swept from 0 to 1.3 V (typical process corner at $50^{\circ}$C with post-layout parasitics). The simulated curve is shown in Fig. 3.24. The tuning gain K<sub>VCO</sub> varies from 570 MHz/V to 910 MHz/V when VC varies from 0 to 1.1 V. The tuning gain reaches its maximum value of 910 MHz/V when VC is around 0.7 V. It drops rapidly when VC is higher than 1.2 V because the PMOS transistors start to enter the cut-off region.



Fig. 3.24. Tuning curve of the VCO when temp= $50^{\circ}$  C, VCX=0.6 V

The PSRR of the VCO is evaluated by simulating the gain from the power supply voltage to the VCO frequency. The simulation results at different operating frequencies are shown in Table 3.5. The PSRR varies from about 100 MHz/V to 300 MHz/V depending on the operating frequency. This shows that, as an RC-type ring oscillator, the VCO is quite sensitive to supply bounce. When the power supply voltage decreases, both the PMOS transistors and the NMOS differential pair move closer to the triode region, which makes the gate-to-drain capacitance larger. Since the oscillating frequency is determined by the RC time constant at the output nodes of each stage, the oscillating frequency goes lower when the power supply voltage decreases. Due to this sensitivity to supply bounce, large decoupling capacitance must be added both on chip and off chip to minimize the voltage variations on the power supply.

Table 3.5. PSRR of the VCO at different operating frequencies

| Operating Frequency (GHz) | 3.55 | 5   | 6.3 |
|---------------------------|------|-----|-----|
| PSRR (MHz/V)              | 300  | 109 | 132 |
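To give a feel for the numbers in Table 3.5, the sketch below converts a supply disturbance into a first-order frequency shift using the tabulated sensitivities. The 10 mV ripple is a purely hypothetical example value, and the linear scaling is an assumption valid only for small disturbances.

```python
import math

# Supply sensitivity in MHz/V at each operating frequency (GHz), from Table 3.5.
PSRR_MHZ_PER_V = {3.55: 300.0, 5.0: 109.0, 6.3: 132.0}


def freq_shift_mhz(f_op_ghz, supply_ripple_v):
    """First-order VCO frequency shift (MHz) for a small supply change (V)."""
    return PSRR_MHZ_PER_V[f_op_ghz] * supply_ripple_v


ripple_v = 0.010  # hypothetical 10 mV supply disturbance
for f_op in PSRR_MHZ_PER_V:
    print(f"{f_op} GHz: {freq_shift_mhz(f_op, ripple_v):.2f} MHz shift")
```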

#### III.3.5 Charge Pump

A quad-level charge pump (QCP) is used in the phase-locked loop to interface with the QPD. The schematic of the QCP is shown in Fig. 3.25. The QCP consists of three parts: the charge pump core in the middle, the variation control circuit on the left side and the mismatch control circuit on the right side. The branch consisting of M13, M18 and M15 provides the bias current for the entire charge pump. The dimension of the branch M1~M4 is twice that of the branch M5~M8. M7 and M8 provide a discharging current of $I_{pcp}$ while M3 and M4 provide a discharging current of $2I_{pcp}$. The branch M9~M12 serves as a current mirror to copy the discharging current provided by the NMOS transistors to the charging current provided by the PMOS transistors. Therefore, M1-M2 and M5-M6 provide charging currents of $2I_{pcp}$ and $I_{pcp}$, respectively. The control signals UP, DN and DN1 are generated by logic combinations of PDI and PDQ so that the charge pump can output 4 levels of current ($-3I_{pcp}$, $-I_{pcp}$, $I_{pcp}$ and $3I_{pcp}$) in different cases. M5 and M8 are controlled by the same signal DN1 because their conduction states are always the opposite of each other in all the possible operating modes of the QCP. If the QPD and the QCP are considered as a combined block, the transfer curve from phase error to CP current is like that of a 2-bit quantizer. The current levels are set in this way so that the transition points of the quantizer lie on a straight line without any DNL or INL [31].
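One way to read the linearity statement is that the four current levels, placed at the centers of the four phase-error regions of Table 3.2, fall on a single straight line. The sketch below checks this for the chosen levels and, for contrast, for a hypothetical alternative set of levels; the helper names and the alternative levels are assumptions for illustration only.

```python
REGION_CENTERS_DEG = [-135.0, -45.0, 45.0, 135.0]  # centers of Table 3.2 regions


def code_centers(levels):
    """Pair each region center (deg) with its current level (units of I_pcp)."""
    return list(zip(REGION_CENTERS_DEG, levels))


def is_collinear(points, tol=1e-12):
    """True if all (x, y) points lie on the line through the first two points."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs((y - y0) - slope * (x - x0)) <= tol for x, y in points)


print(is_collinear(code_centers([-3, -1, 1, 3])))  # levels used by the QCP
print(is_collinear(code_centers([-2, -1, 1, 2])))  # hypothetical alternative
```

With the levels $\pm I_{pcp}$ and $\pm 3I_{pcp}$ the steps between adjacent levels are all $2I_{pcp}$, so the quantizer approximates a linear phase detector with a gain of $I_{pcp}$ per 45 degrees of phase error.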



Fig. 3.25. Schematic of the quad-level PCP

The mismatch between the charging current and the discharging current is controlled by the opamp on the right side of Fig. 3.25. This mismatch control technique was proposed in [32]. The opamp ensures that the drain voltage of M11 and M10 is equal to the output voltage (VOUT) so that both the NMOS current mirrors and the PMOS current mirrors are well matched on all three terminals (gate, drain and source). Thus, the charging and discharging currents will be ideally matched if the opamp is ideal and the current mirror transistors are perfectly symmetrical. The actual degree of matching is limited by the gain and offset of the opamp and the matching accuracy of the current mirror transistors. The schematic of the opamp employed for mismatch control is shown in Fig. 3.26. It is a single-stage amplifier with an extra current mirror to perform differential-to-single-ended conversion. The current mirror also helps to isolate the input and output DC levels so that the opamp operates properly over a large swing. Since the opamp is single-ended, there is a systematic input-referred offset when it is placed in the closed loop. The transistor M12 (a very small one) is used to cancel the input-referred offset by deliberately introducing an additional offset in the opposite direction. M13 is a transistor with large W and L used as compensation capacitance. Two-stage opamps are not preferred in the QCP because the entire feedback loop (starting from VREF, passing through the opamp and M10, and ending at VREF in Fig. 3.25) would have three stages and would be very hard to compensate properly under the given speed requirement.

It is worth pointing out that there is another feedback loop running through VOUT, and this loop is a positive feedback loop. However, the PMOS switch M1 or M5 must be turned on for the positive feedback loop to be closed. If either M1 or M5 is always on, VOUT may be pushed up until it reaches VDD due to the positive feedback when the QCP operates by itself with a simple load capacitor. However, that is unlikely to happen when the QCP is placed in the CDR loop, where VOUT is automatically controlled by the negative feedback of the global CDR loop. When VOUT is higher than the desired voltage, the CDR automatically turns off the charging current (breaking the positive feedback loop at the same time) and turns on the discharging current to pull down VOUT. Thus, the positive feedback loop in the QCP will not cause stability issues here. This conclusion is also verified by careful whole-loop simulations with different initial voltages on the loop filter.



Fig. 3.26. Schematic of the opamp used for mismatch control in the QCP

The mismatch control circuit only forces the charging current to follow the discharging current regardless of the output voltage. However, the discharging current decreases as the output voltage decreases because of the channel length modulation effect on M3 and M7. With the mismatch control circuit in place, both the charging current and the discharging current therefore decrease as the output voltage decreases. This is undesirable because a variation of the charge pump current causes a variation of the loop bandwidth of the CDR, which affects the stability and phase margin of the CDR. Thus, the variation control circuit shown on the left side of Fig. 3.25 is used to stabilize the operating current of the QCP against changes of the output voltage. A transconductance cell (Gm) is used to sink current from or inject current into the bias branch (M13, M15 and M18 in Fig. 3.25) depending on the value of VOUT. When VOUT is lower than VOM, the Gm cell injects current into the bias branch. When VOUT is higher than VOM, the Gm cell sinks current from the bias branch. The optimum value of the transconductance can be determined from simulation so that the charge pump output current has minimum variation within the output swing of interest. The transconductance cell is implemented as a differential pair with current mirror load and source degeneration (for enhanced input linearity).
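The way the optimum transconductance emerges from a sweep can be illustrated with a first-order sketch. The model below is an assumption made for illustration only and is not the transistor-level circuit: the output current is taken as the bias-branch current multiplied by a channel-length-modulation factor, and the Gm cell adds `gm * (VOM - vout)` to the bias branch. The values of `LAMBDA` and `VOM` and the swept range of `gm` are hypothetical.

```python
I_BIAS = 120e-6   # nominal output current, A (the +/-3*I_pcp level quoted in the text)
LAMBDA = 0.3      # hypothetical channel-length-modulation coefficient, 1/V
VOM = 0.65        # hypothetical mid-swing reference voltage, V

def output_current(vout, gm):
    """Bias-branch current multiplied by a first-order channel-length-modulation factor."""
    i_branch = I_BIAS + gm * (VOM - vout)   # Gm cell injects below VOM, sinks above VOM
    return i_branch * (1.0 + LAMBDA * (vout - VOM))

def variation(gm, v_lo=0.1, v_hi=1.2, steps=111):
    """Peak-to-peak output current variation over the swing, relative to the mid-swing value."""
    vs = [v_lo + (v_hi - v_lo) * k / (steps - 1) for k in range(steps)]
    currents = [output_current(v, gm) for v in vs]
    return (max(currents) - min(currents)) / output_current(VOM, gm)

# Sweep the transconductance (0 to 80 uA/V) and keep the value with the least variation.
candidates = [k * 1e-6 for k in range(81)]
best_gm = min(candidates, key=variation)
print(f"gm = 0: {variation(0.0):.1%}; gm = {best_gm * 1e6:.0f} uA/V: {variation(best_gm):.1%}")
```

In this toy model the minimum falls where `gm` equals `I_BIAS * LAMBDA`, because the term that is linear in the output voltage cancels and only a small quadratic residue remains.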

The output currents of the QCP are simulated before and after adding the mismatch control and variation control circuits. For convenience, only the output currents when the QCP operates at the level of  $\pm 3I_{pcp}$  (around 120 µA) are shown. Fig. 3.27 shows the simulated output current of the QCP without mismatch control and variation control. In this case, the charging and discharging currents have large mismatches when the output voltage deviates from the center of the output swing. Fig. 3.28 shows the QCP output current when only the mismatch control circuit is used. In this case, the charging current follows the discharging current closely from 0.1V to 1.2V. When the output voltage is higher than 1.2V, the feedback opamp used for mismatch control fails to operate properly because the input transistors (M1 and M2 in Fig. 3.26) go into the cut-off region. This explains the mismatch at the high end of the output swing. When the output voltage is lower than 0.1V, VCP is so high (forcing the charging current to be as small as the discharging current) that the transistors M2 and M6 (refer to Fig. 3.25) are near cut-off and the negative feedback loop for mismatch control has only a small gain. That explains the mismatch at the low end of the output swing. This doesn't bring much loss to the system budget since the effective swing of the VCO control voltage is from 0 to 1.2V. On the other hand, both the charging and discharging currents drop rapidly as VOUT decreases, which points out the need for variation control. Fig. 3.29 shows the simulated output current of the QCP with both mismatch control and variation control. With the addition of the variation control, the variation of the output currents of the QCP is kept within 10% over an output swing from 0.1V to 1.1V.



Fig. 3.27. Output current of the QCP without mismatch and variation control



Fig. 3.28. Output current of the QCP with mismatch control only



Fig. 3.29. Output current of the QCP with both mismatch control and variation control

To evaluate the transient response of the QCP, the control signals of the charge pump (PDI, PDQ) are manually applied to emulate the case when there is a fixed effective frequency difference between the clock and data. Since the switching speed of PDI and PDQ is proportional to the effective frequency difference  $\Delta f$ , the control signals corresponding to the designed maximum frequency offset should be applied as the stress test for the QCP. The CDR is designed to have a locking range of 2 GHz excluding the external manual tuning capability. Thus, a frequency offset of 1 GHz is used as the stress test condition. The simulated waveforms under this testing condition are shown in Fig. 3.30 (a)-(b). Refer to Table 3.2 for the mapping relationship between PDI, PDQ and the output current of the QCP. The designed four levels of the QCP output current are 120  $\mu$ A, 30  $\mu$ A, -30  $\mu$ A and -120  $\mu$ A. It can be observed from the simulation result that the QCP is only marginally able to handle a frequency offset of 1 GHz. The transient waveforms of the QCP at a frequency offset of 0.5 GHz are shown in Fig. 3.30 (c)-(d) for comparison. The QCP output currents exhibit much better switching properties at 0.5 GHz frequency offset.

The charge pump used in the FLL is similar to the QCP for the PLL. It uses the same mismatch control and variation control circuits. However, it has only one output branch, which produces a tri-state output current ( $I_{fcp}$ , 0,  $-I_{fcp}$ ).  $I_{fcp}$  is around 1 mA to achieve fast settling of the FLL. The schematic of the FCP is shown in Fig. 3.31. The static charging and discharging currents derived from DC simulation are shown in Fig. 3.32. They verify the effect of the mismatch and variation control circuits.



Fig. 3.30. Transient waveforms of the QCP with LPE at different frequency offset

The control terminals of the FCP are connected to the output signals of the LFD, i.e., FD\_UP and FD\_DN. Recall that FD\_UP and FD\_DN are series of pulses whose density is proportional to the effective frequency difference. Thus, the output signals of the LFD under an input frequency difference of 1 GHz are applied to the FCP as the stress test condition to evaluate its transient response. The simulated transient output current of the FCP when  $\Delta f=\pm 1$  GHz is shown in Fig. 3.33. It verifies that the FCP has enough speed margin at the maximum frequency offset of interest.



Fig. 3.31. Schematic of the tri-state FCP



Fig. 3.32. DC output current of the FCP



Fig. 3.33. Transient output current of the FCP when  $\Delta f=\pm 1$  GHz

## III.3.6 Full-System Performance

#### III.3.6.1 Lock-in Dynamics

The CDR is modeled with Simulink modules in Matlab. The macro model is implemented mainly for two reasons. First, it gives more insight into the system-level performance of the CDR. Second, the performance of the macro-model implementation and the transistor-level implementation can be compared to identify the critical blocks that limit the performance of the entire system. The transient waveforms of the CDR macro model during the locking process are shown in Fig. 3.34. The entire locking process can be divided into three regions, i.e., the unlocked region, the transition region and the locked region.

When the CDR is far from being locked (i.e., in the unlocked region), both the PLL and the FLL are actively operating. In the unlocked region, the phase error changes cyclically while the frequency offset gradually decreases. The change of phase error can be seen from the waveforms of the PCP current since the PCP current is just the quantized result of the phase error. The period of the PCP current gradually increases, which indicates that the frequency offset decreases with time. In this region, the main locking force is provided by the FD signal and the FCP current. The FD signal is a series of negative pulses which keep pulling the VCO control voltage down toward zero (the designated target control voltage in the macro model). The density of the negative pulses decreases as the frequency offset decreases. The FCP current is simply proportional to the tri-state FD signal. The PCP current provides little contribution since the integral of the PCP current over each cycle is close to zero. Careful observation of Fig. 3.34 shows that the negative half cycle of the PCP current actually lasts longer than the positive half cycle. That is because the negative half cycle reduces the frequency offset and slows down the change of phase. In contrast, the positive half cycle increases the frequency offset and speeds up the change of phase. Thus, the integral of the PCP current also aids the locking, although its contribution is minor compared with that of the FCP current.

When the CDR goes into the transition region, the FD signal starts to have both negative and positive pulses due to the switching of the PDQ (MAG) signal when the phase error moves back and forth around 90°. The mechanism by which the FD produces both positive and negative pulses in this region can be understood by studying the timing diagram and state machine of the PFD. The details are not shown here to avoid too much complexity. In this region, the PLL (PCP) gradually takes over the locking process from the FLL (FCP). The takeover is complete when the frequency offset is smaller than what can be handled by the larger level of the PCP, i.e.,  $\Delta f < 3I_{pcp}RK_{vco}$ . In this region, the main locking force is provided by the PCP (operating at the larger level, i.e.,  $\pm 3I_{pcp}$ ). The FCP makes little contribution since the positive and negative pulses occur in pairs in most cases and cancel each other. At the end of the transition region, the phase error is reduced from above 90° to below 90° and stays below 90° (i.e., no more cycle slipping occurs).

In the locked region, the FLL is completely shut off with both the LFD and the FCP in the high-impedance state. Meanwhile, the PLL alone brings the phase error to the target of 0°, and the phase error stays locked around 0° with a certain amount of jitter. In this region, the PCP only works at the smaller level ( $\pm I_{pcp}$ ) since the phase error is limited within (-90°, 90°). Thus, the CDR is able to have small jitter generation in the locked state since the loop gain is significantly smaller.



Fig. 3.34. Transient waveforms of the CDR macro model during the locking process

The transistor-level implementation of the CDR is also simulated and the derived transient waveforms are shown in Fig. 3.35. The control voltage starts from an initial value of 0.75V and settles towards the target value of 0.775V. The transient waveforms derived from the transistor-level implementation are similar to those derived from the macro-model implementation. There are a few differences to be explained. For the transistor-level implementation, the FD signal is split into two single-bit signals, i.e., FUP and FDN. The FUP signal is active-low and controls the charging current of the FCP via a PMOS switch, while the FDN signal is active-high and controls the discharging current of the FCP via an NMOS switch. The output jitter of the VCO shown in Fig. 3.35 is normalized to a virtual reference bit sequence at 10Gbps. It wraps around to 0 when it exceeds 100ps (1UI) and wraps around to 100ps when it gets smaller than 0, as shown in Fig. 3.35. The absolute value of the output jitter is not important since the phase of the input data relative to the virtual reference sequence is somewhat arbitrary during actual simulation. What is important is that when the CDR is locked, the output signal should have almost constant phase or jitter (with small variation) relative to the virtual reference sequence. The amount of variation on the output jitter is the simulated jitter generation of the CDR (without considering noise, however). As in the macro-model implementation, pulses of only one polarity (FUP pulses here) appear in the unlocked region. The cycle slipping can be observed from the waveform of the output jitter, which takes a sawtooth shape in the unlocked region. In the transition region, there are both FUP and FDN pulses. The CDR enters the locked region at the last cycle slip. In the locked region, the LFD stays off, i.e., there are no more FUP or FDN pulses. The PDQ (MAG) signal stays low, which indicates that the phase error is limited within the region of  $(-90^{\circ}, 90^{\circ})$  and the PCP operates only at the smaller level. The ripple on the control voltage is very small in the locked state (less than 5 mV).
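The wrap-around of the plotted output jitter can be reproduced with a modulo operation. The sketch below only illustrates that normalization; the helper names `wrap_to_ui` and `peak_to_peak` and the synthetic edge times are hypothetical, and this is not the post-processing used to produce Fig. 3.35.

```python
UI_PS = 100.0  # one unit interval at 10 Gb/s, in picoseconds

def wrap_to_ui(edge_times_ps, ui_ps=UI_PS):
    """Phase of each edge relative to an ideal reference grid, wrapped into [0, UI)."""
    return [t % ui_ps for t in edge_times_ps]

def peak_to_peak(values):
    """Spread between the largest and smallest value."""
    return max(values) - min(values)

# Unlocked: the edge spacing is 1% too long, so the phase ramps and wraps (sawtooth).
unlocked = wrap_to_ui([k * 101.0 for k in range(300)])
# Locked: edges sit near a constant, arbitrary phase (37 ps) with a small variation.
locked = wrap_to_ui([k * 100.0 + 37.0 + 0.5 * (-1) ** k for k in range(300)])

print(peak_to_peak(unlocked), peak_to_peak(locked))
```

The first trace sweeps almost the full unit interval, which corresponds to cycle slipping, while the spread of the second trace is what would be read as jitter generation.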



Fig. 3.35. Transient waveforms of the transistor-level CDR during the locking process

#### III.3.6.2 Locking Range

The locking range of the CDR is limited by several potential factors, i.e., the LFD, the swing of the VCO control voltage and the tuning range of the VCO. As previously discussed, the proposed LFD structure can tolerate an effective frequency difference of  $\pm 2$  GHz. Meanwhile, the VCO can be tuned by about 1GHz (i.e., 2 GHz effective frequency difference due to the half-rate architecture) with internal tuning only, considering the swing of the VCO control voltage and the tuning gain of the VCO. Thus, ideally, the CDR should have a locking range of  $\pm 2$  Gbps. However, the actual locking range derived from simulation is  $\pm 1.4$  Gbps. When the frequency offset approaches 2GHz, the frequency of the output signals of the QPD (i.e., PDI and PDQ) and the LFD (FUP and FDN) also approaches 2GHz. The logic gates in the LFD and the charge pump are designed in static CMOS logic to save power. Such a high frequency cannot be properly handled by these CMOS logic gates, which makes the CDR unable to achieve lock as designed. That explains the reduction of the simulated locking range. Fortunately, the locking range of  $\pm 1.4$  Gbps is more than enough for the SONET application.

#### III.3.6.3 Jitter Generation

The jitter generation refers to the output jitter of the CDR in locked state when the input data is an ideal PRBS with no jitter. According to the specifications defined by SONET OC-192 standard [15], the peak-to-peak jitter generation of the CDR should be smaller than 0.1 UI, i.e., 10 ps, within the bandwidth from 50 kHz to 80 MHz. The simulated jitter generation of the CDR in locked state at typical corner and 50°C is shown in Fig. 3.36. The simulated peak-to-peak jitter generation is 4.8 ps. The simulated RMS jitter generation is 0.55 ps.

It should be pointed out that the simulated jitter generation does not take any noise (thermal noise or flicker noise) into consideration because small-signal noise sources cannot be incorporated into the transient simulation. The simulated jitter generation is mainly caused by the bandwidth limitation of the QPD. This conclusion is verified by applying a toggling sequence (010101...) to the CDR. When a toggling sequence is applied, the peak-to-peak jitter generation drops to only 0.5 ps, which is much smaller than the peak-to-peak jitter generation when a PRBS is applied. When an ideal PRBS is passed through a bandwidth-limited block, inter-symbol interference (ISI) is produced, which causes the transition edges of the data bits to move back and forth. Thus, a certain amount of jitter is produced. This type of jitter depends on the data pattern and is sometimes also called deterministic jitter (DJ) due to its predictability. For the QPD, both the sampling latches and the internal buffers introduce DJ. The QPD introduces a significant amount of DJ since the data rate of 10 Gb/s approaches the speed limit of 0.18 µm CMOS technology.
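The difference between a toggling pattern and a PRBS can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the bandwidth limitation behaves like a first-order low-pass filter with a hypothetical time constant of 0.4 UI; it is not a model of the actual QPD. The zero-crossing delay after each transition depends on how far the filtered waveform settled before the transition, and therefore on the preceding run length.

```python
import math

def prbs7(n_bits, state=0x7F):
    """PRBS7 (x^7 + x^6 + 1) returned as NRZ levels of +1.0 / -1.0."""
    levels = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        levels.append(1.0 if new_bit else -1.0)
    return levels

def crossing_delays(levels, tau_ui=0.4):
    """Delay (in UI) from each data transition to the zero crossing of the
    low-pass-filtered waveform. tau_ui is the filter time constant in UI."""
    decay = math.exp(-1.0 / tau_ui)   # residual fraction left after one bit period
    v = levels[0]                     # assume the node has fully settled at the first level
    prev = levels[0]
    delays = []
    for b in levels[1:]:
        if b != prev:
            # v(t) = b + (v - b) * exp(-t / tau) reaches zero at t = tau * ln(1 - v / b)
            delays.append(tau_ui * math.log(1.0 - v / b))
        v = b + (v - b) * decay       # filtered value at the end of this bit
        prev = b
    return delays

def peak_to_peak_dj(levels, skip=20):
    """Peak-to-peak spread of the crossing delays, ignoring start-up transients."""
    d = crossing_delays(levels)[skip:]
    return max(d) - min(d)

dj_toggle = peak_to_peak_dj([1.0, -1.0] * 254)
dj_prbs = peak_to_peak_dj(prbs7(508))
print(f"toggling: {dj_toggle:.4f} UI, PRBS7: {dj_prbs:.4f} UI")
```

For the toggling pattern every transition sees the same history, so the spread vanishes once the start-up transient has died out, whereas the PRBS mixes long and short runs and leaves a finite pattern-dependent spread.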



Fig. 3.36. Jitter generation of the CDR in locked state

#### III.3.6.4 Jitter Transfer

The jitter transfer characteristic of the CDR refers to the relationship between the input jitter and output jitter when sinusoidal input jitter with a certain amplitude and frequency is applied. Since the proposed CDR is based on a binary PD, whether the output jitter is able to track the input jitter depends on the phase slew rate of the loop (PSRL). The phase slew rate of the CDR is the maximum slope of phase change that the loop can provide [20]. For a detailed analysis of the jitter transfer characteristic of binary CDRs, please refer to section 3.3 of chapter II. If the voltage variation on the capacitor in the loop filter is ignored, the PSRL is given by the following equation,

$$PSRL = I_{CP}RK_{VCO} \tag{71}$$

When the slope of the input jitter is smaller than the PSRL, the output jitter tracks the input jitter closely. When the slope of the input jitter is larger than the PSRL, the output jitter is not able to fully track the input jitter and the input jitter is attenuated. For a sinusoidal jitter  $A\sin(\omega t)$ , the maximum slope is equal to  $A\omega$ . If the frequency of the sinusoidal input jitter is increased while its amplitude is fixed, the maximum slope of the input jitter will exceed the PSRL at a certain frequency. The jitter transfer bandwidth is defined as the frequency at which the input jitter is attenuated by 3 dB. The target jitter transfer bandwidth is around 5-10 MHz when the input jitter is sinusoidal and has a fixed amplitude of 0.15 UIpp [20]. When the capacitor in the loop filter is sufficiently large, the 3 dB jitter transfer bandwidth can be approximated by the following expression (refer to section 3.3 of chapter II),

$$f_{3dB} = \frac{I_{CP}RK_{VCO}}{2\sqrt{2}A_{JIN}} \tag{72}$$

By substituting the variables with the actual designed parameters, it can be calculated that the CDR prototype has a jitter transfer bandwidth of 10 MHz.
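The 10 MHz figure can be cross-checked numerically. The individual values of the charge pump current, filter resistor and VCO gain are not listed in this section, so the sketch below makes two assumptions: the product  $I_{CP}RK_{VCO}$  is inferred from the maximum run length of 2232 bits quoted with Eq. (81) in section III.3.6.9, taking a bit period of 100 ps, and  $A_{JIN}$  is taken as the peak amplitude (0.075 UI) of a 0.15 UIpp sinusoid. The names `PSRL` and `jitter_transfer_bandwidth` are illustrative.

```python
import math

T_S = 100e-12                      # bit period at 10 Gb/s, seconds
N_MAX = 2232                       # run length quoted with Eq. (81)
PSRL = 1.0 / (2 * N_MAX * T_S)     # assumed I_CP*R*K_VCO in UI/s, inferred from Eq. (81)

def jitter_transfer_bandwidth(psrl_ui_per_s, a_jin_ui_peak):
    """Eq. (72): 3 dB jitter transfer bandwidth of a slew-rate-limited binary loop, in Hz."""
    return psrl_ui_per_s / (2 * math.sqrt(2) * a_jin_ui_peak)

f_3db = jitter_transfer_bandwidth(PSRL, a_jin_ui_peak=0.15 / 2)
print(f"PSRL = {PSRL / 1e6:.2f} MUI/s, f_3dB = {f_3db / 1e6:.1f} MHz")
```

Under these assumptions the bandwidth evaluates to about 10.6 MHz, in line with the value stated above.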

#### III.3.6.5 Jitter Tolerance

The jitter tolerance of the CDR is defined as the maximum input jitter amplitude that can be applied to the CDR without causing bit errors. The jitter tolerance has different values at different input jitter frequencies due to the frequency dependence of the jitter transfer characteristic. The plot of the jitter tolerance versus the input jitter frequency is called the jitter tolerance mask, which means that any input jitter below the mask in a frequency-amplitude plot can be tolerated by the CDR without causing bit errors. Please refer to section 3.4 of chapter II for theoretical derivations of the jitter tolerance in different frequency bands. SONET OC-192 requires the jitter tolerance to be at least 0.15 UI<sub>pp</sub> from 4 MHz to 40 MHz, 1.5 UI<sub>pp</sub> from 24 kHz to 400 kHz and 15 UI<sub>pp</sub> from 10 Hz to 2.4 kHz [15]. The jitter tolerance of the prototype is plotted against the jitter tolerance mask in Fig. 3.37. The jitter tolerance values above 1 MHz are derived by circuit simulations. The jitter tolerance values below 1 MHz are directly calculated from the modeling equations given in section 3.4 of chapter II since the simulation time needed to extract the jitter tolerance at very low frequencies is prohibitively long. The plot indicates that the jitter tolerance of the designed CDR exceeds the jitter tolerance mask defined by the SONET standard with enough margin.
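The mask comparison can be written as a small helper. The flat levels below are the three bands quoted above. The behaviour between the bands is an assumption of this sketch: a slope of -20 dB/decade (amplitude times frequency held constant) is used, which joins the quoted corner values continuously. The function names and the sample tolerance points are hypothetical and are not the simulated results of Fig. 3.37.

```python
# (start Hz, stop Hz, tolerance in UIpp) for the flat bands quoted in the text
FLAT_BANDS = [(10.0, 2.4e3, 15.0), (24e3, 400e3, 1.5), (4e6, 40e6, 0.15)]

def sonet_mask_uipp(f_hz):
    """Jitter tolerance mask in UIpp; returns None outside 10 Hz to 40 MHz."""
    for lo, hi, level in FLAT_BANDS:
        if lo <= f_hz <= hi:
            return level
    # Assumed -20 dB/decade slope between bands: amplitude * frequency stays constant.
    for (_, hi, level), (next_lo, _, _) in zip(FLAT_BANDS, FLAT_BANDS[1:]):
        if hi < f_hz < next_lo:
            return level * hi / f_hz
    return None

def meets_mask(points):
    """points: iterable of (frequency in Hz, tolerated jitter in UIpp) inside the mask range."""
    return all(tol >= sonet_mask_uipp(f) for f, tol in points)

# Hypothetical tolerance points, for illustration only.
print(meets_mask([(1e3, 20.0), (1e5, 2.0), (1e6, 0.8), (1e7, 0.3)]))
```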



Fig. 3.37. Jitter tolerance of the CDR versus SONET jitter tolerance mask

#### III.3.6.6 Tolerance to Mismatch of Charge Pump Current

Although mismatch suppression circuits are used in the PCP and FCP, the output currents of the charge pumps are still able to have a certain amount of mismatch due to random mismatch between current mirror transistors. The mismatch is usually caused by mismatch of threshold voltage and transistor sizes. The effect of the mismatch of the charge pump currents on the proper operation of the CDR should be investigated.

The mismatch of the PCP current will result in a small amount of phase offset when the CDR is locked. Let us assume that the mismatch between the charging current and discharging current of the PCP can be expressed as,

$$\Delta I_{pcp} = I_{charge-pcp} - I_{discharge-pcp} \tag{73}$$

The amount of phase offset in terms of unit interval (UI) can be estimated as,

$$\Delta \phi = \Delta I_{pcp} R K_{vco} T_s \tag{74}$$

where  $T_s$  is the sampling period of the QPD, which is equal to the bit period of the incoming data. A simple numerical calculation shows that a 10% mismatch in the current of the PCP yields a very small phase offset of  $2.2 \times 10^{-5}$  UI. Therefore, the loop is not sensitive to the current mismatch of the PCP.
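The quoted phase offset can be reproduced with a few lines. The loop parameters are not listed in this section, so the sketch assumes that the nominal per-bit phase step  $I_{pcp}RK_{vco}T_s$  equals the value implied by the maximum run length of 2232 bits quoted with Eq. (81) in section III.3.6.9; `STEP_UI` and `phase_offset_ui` are illustrative names.

```python
N_MAX = 2232                         # run length quoted with Eq. (81)
STEP_UI = 1.0 / (2 * N_MAX)          # assumed I_pcp*R*K_vco*T_s in UI, inferred from Eq. (81)

def phase_offset_ui(mismatch_fraction, step_ui=STEP_UI):
    """Eq. (74) with the current mismatch written as a fraction of the nominal current."""
    return mismatch_fraction * step_ui

print(f"{phase_offset_ui(0.10):.2e} UI")
```

With a 10% mismatch this evaluates to roughly  $2.2 \times 10^{-5}$  UI, which agrees with the figure in the text under the stated assumption.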

When there is significant mismatch in the output current of the FCP, the loop may get stuck in the transition region and fail to proceed to the locked region. As previously shown in the transient waveforms of the loop (see Fig. 3.34 and Fig. 3.35), in the transition region the QPD provides the main locking force while the average value of the LFD output is zero if the FCP current is ideally matched. In the presence of mismatch, the average value of the LFD output is no longer zero. When the average value of the LFD output opposes the QPD output and has a larger magnitude, the loop is not able to proceed into the locked state. Macro-model simulations are performed with loop parameters extracted from the transistor-level design to evaluate the loop's tolerance to FCP current mismatch. Simulation results show that the CDR is able to tolerate 20% mismatch in the current of the FCP. Monte-Carlo simulations of the transistor-level FCP are performed to evaluate the current mismatch of the FCP. The simulated current mismatch has a standard deviation of 2.5%, which verifies that the CDR is able to achieve lock at a high confidence level of eight sigma (20%/2.5%=8).
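To put the eight-sigma margin into perspective, the corresponding failure probability can be estimated. This sketch assumes that the FCP current mismatch follows a zero-mean Gaussian distribution, which is an assumption of the example and is not stated in the text; `two_sided_tail` is an illustrative helper.

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a zero-mean Gaussian variable exceeds n_sigma in magnitude."""
    return math.erfc(n_sigma / math.sqrt(2.0))

tolerated = 0.20      # FCP mismatch tolerated by the loop (macro-model result)
sigma = 0.025         # standard deviation from the Monte-Carlo runs
n_sigma = tolerated / sigma
print(f"{n_sigma:.1f} sigma, failure probability about {two_sided_tail(n_sigma):.1e}")
```

Under the Gaussian assumption the probability of exceeding the tolerated mismatch is on the order of  $10^{-15}$ .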

#### III.3.6.7 Supply Bounce Tolerance

Similar to the jitter tolerance analysis given in section 3.4 of chapter II, the tolerance to supply bounce also depends on the frequency band of the supply bounce. The tolerance to the supply bounce is determined by whether the loop can track the supply bounce and produce a corresponding signal on the VCO control voltage to cancel the effect of the bounce on the output phase. When the supply ripple has high frequency (i.e., the voltage variations on the filter capacitor are negligible within a single cycle of the ripple signal), the tracking capability is determined by the resistor in the loop filter. At high frequencies, the tracking capability of the loop in terms of phase slew rate is just the PSRL, which is defined in (71). On the other hand, let us assume the supply bounce is defined by  $A\sin(\omega t)$  and the gain from the supply voltage to the VCO frequency is denoted as a simple constant  $K_{VDD}$  (this assumption is true when the ripple frequency is up to a few hundred MHz as verified by simulations). The maximum phase slope caused by the supply bounce is then given by,

$$PSR_{VDD} = AK_{VDD} \tag{75}$$

By combining (71) & (75) and setting  $PSR_{VDD}$  to be equal to PSRL, the maximum amplitude of high-frequency supply bounce that can be tolerated by the CDR is obtained as,

$$A_{\max-hf} = \frac{I_{CP}RK_{VCO}}{K_{VDD}} \tag{76}$$

If the worst case value of  $K_{VDD}$  (300 MHz/V) is used, the calculated high-frequency supply bounce tolerance is 7 mV for the designed CDR.

When the frequency of the supply bounce is low, the tracking capability of the loop is mainly provided by the capacitor in the loop filter (refer to section 3.4 of chapter II). The maximum frequency slew rate the loop is able to provide is given by,

$$FSRL = \frac{K_{VCO}I_{CP}}{C} \tag{77}$$

Following the same assumption that the supply bounce is defined by  $Asin(\omega t)$ , the frequency slew rate in the VCO output signal that the power supply bounce is able to induce is given by,

$$FSR_{VDD} = A\omega K_{VDD} \tag{78}$$

By combining (77) & (78) and setting  $FSR_{VDD}$  equal to FSRL, the maximum amplitude of the supply bounce at frequency  $\omega$  is obtained as,

$$A_{\max-lf} = \frac{K_{VCO}I_{CP}}{\omega CK_{VDD}} \tag{79}$$

It can be seen from (79) that the supply bounce tolerance at low frequency is inversely proportional to the frequency of the supply bounce. For example, the designed CDR can tolerate 120 mV<sub>pp</sub> supply bounce at 10 kHz or 1.2 V<sub>pp</sub> supply bounce at 1 kHz. Since the supply bounce tolerance is also inversely proportional to the capacitance in the loop filter, the choice of the capacitor value is a trade-off between supply bounce tolerance and loop locking time.

The intersection frequency of the low-frequency band and high-frequency band can be derived by setting  $A_{max-lf}$  and  $A_{max-hf}$  to be equal. The intersection frequency is obtained as 1/RC, whose numerical value is 510 kHz for this design. Since the supply bounce is mainly concentrated at frequencies below 10 kHz, with higher frequency components heavily attenuated by decoupling capacitors,  $A_{max-lf}$  serves as a more useful guideline for this particular application.
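The supply bounce figures can be tied together numerically. The sketch below relies on three assumptions that are not stated in the text: the product  $I_{CP}RK_{VCO}$  is inferred from the run length of 2232 bits quoted with Eq. (81) in section III.3.6.9 (with a 100 ps bit period), the value of 1/RC is interpreted as an angular frequency of  $510 \times 10^{3}$  rad/s, and the amplitudes in (76) and (79) are treated as peak values. These interpretations are chosen because they reproduce the quoted numbers; the constant and function names are illustrative.

```python
import math

T_S = 100e-12                    # bit period at 10 Gb/s, seconds
N_MAX = 2232                     # run length quoted with Eq. (81)
PSRL = 1.0 / (2 * N_MAX * T_S)   # assumed I_CP*R*K_VCO in Hz, inferred from Eq. (81)
K_VDD = 300e6                    # worst-case supply sensitivity of the VCO, Hz/V
OMEGA_X = 510e3                  # 1/(R*C), assumed here to be expressed in rad/s

def a_max_hf():
    """Eq. (76): tolerated peak amplitude of high-frequency supply bounce, in volts."""
    return PSRL / K_VDD

def a_max_lf(f_hz):
    """Eq. (79) rewritten with K_VCO*I_CP/C = PSRL/(R*C): tolerated peak amplitude, in volts."""
    return PSRL * OMEGA_X / (2 * math.pi * f_hz * K_VDD)

print(f"high-frequency limit: {a_max_hf() * 1e3:.1f} mV")
for f in (10e3, 1e3):
    print(f"{f / 1e3:.0f} kHz: {2 * a_max_lf(f) * 1e3:.0f} mVpp")
```

With these assumptions the high-frequency limit comes out near 7.5 mV and the low-frequency tolerance is about 121 mVpp at 10 kHz and 1.2 Vpp at 1 kHz, consistent with the values quoted in this section.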

#### III.3.6.8 Tolerance to Process Variation and Temperature Drift

Post-layout simulation of the CDR is performed across all the process corners. The CDR works well in the fast and typical process corners but fails to lock in the slow process corner. The typical temperature used to carry out the simulations is set to 50°C due to the large power dissipation of the chip. The module that forms the speed bottleneck of the CDR loop is found to be the QPD (the bottleneck is located by replacing different modules with their macro-model counterparts and observing the performance improvement). Also, in the slow process corner, the output buffer does not have enough bandwidth; even if an ideal PRBS is fed to the output buffer, the eye diagram of the output data is almost fully closed. In the typical process corner, the CDR can work up to a temperature of 97°C. This verifies the conclusion that circuits designed in 0.18 µm CMOS without inductive peaking can work at 10 Gb/s only with very small margin.

#### III.3.6.9 Data Pattern Dependence

Since the QPD doesn't have a high-impedance state, its output keeps the last state when the incoming data doesn't have any transition. When the incoming data has a very long run of ones or zeroes, the filter capacitor will keep being charged or discharged to one side only until the loop loses lock. Assuming the incoming data has a run of N bits, the phase deviation caused by this long run is given by,

$$\Delta P_{lr} = I_{CP} R K_{VCO} N T_S \tag{80}$$

where  $T_S$  is the bit period of the incoming data. To ensure no bit error is caused, the phase deviation should be smaller than 0.5 UI (a lower limit should be used for practical design to allow for other jitter sources). The maximum length of consecutive bits which can be tolerated by the loop is derived by substituting  $\Delta P_{lr}$ =0.5 UI into (80),

$$N \le \frac{1}{2I_{CP}RK_{VCO}T_S} \tag{81}$$

Using the parameters for this CDR design, the calculated maximum value of N is 2232 bits. SONET only requires the CDR to be able to handle a run of 72 consecutive identical bits [15]. Thus, the lack of a high-impedance state in the QPD doesn't introduce any real problem to the CDR.

#### III.3.6.10 Full-Chip Performance Summary

The chip was manufactured in TSMC 0.18µm CMOS technology. The micro-photograph of the chip is shown in Fig. 3.38. A summary of the simulated performance of the entire CDR chip is given in Table 3.6. The simulated jitter generation has an rms value of 0.55 ps and a peak-to-peak value of 4.8 ps. The jitter generation limit required by SONET OC-192 is 1 ps (rms) or 10 ps (peak to peak). The CDR exhibits no peaking in the jitter transfer characteristic. The maximum peaking allowed by SONET OC-192 is 0.1 dB. The jitter tolerance of the CDR exceeds the jitter tolerance mask specified by SONET as shown in Fig. 3.37. The core of the chip dissipates 160 mW. The overall power dissipation of the chip including all I/O buffers is 250 mW. The area of the chip including the pad frame is 1.1 mm×1.1 mm. A performance comparison between the prototype chip and the existing solutions is given in Table 3.7. Compared with the existing solutions, the prototype chip achieves smaller area (as a result of the inductorless design), smaller jitter generation and larger locking range with comparable power dissipation.



Fig. 3.38. Micro-photo of the CDR prototype chip

Table 3.6. Simulated full-chip performance summary of the CDR

| Parameter                 | Simulated value                     |
|---------------------------|-------------------------------------|
| Jitter Generation         | 0.55 ps (rms) / 4.8 ps (p2p)        |
| Jitter Transfer Bandwidth | 10 MHz                              |
| Jitter Peaking            | Less than 0.1 dB                    |
| Jitter Tolerance          | Exceeds SONET jitter tolerance mask |
| Power Dissipation         | 160 mW (core) / 250 mW (all)        |
| Area                      | 1.1 mm×1.1 mm                       |

|              | Jitter<br>Gen.<br>(rms, ps) | Jitter Gen.<br>(p2p, ps) | Locking<br>Range<br>(GHz) | Process<br>(µm) | Supply<br>Voltage | Power<br>(W) | Area<br>(mm <sup>2</sup> ) |
|--------------|-----------------------------|--------------------------|---------------------------|-----------------|-------------------|--------------|----------------------------|
| This<br>work | 0.55                        | 4.8                      | 2.8                       | CMOS<br>0.18    | 1.8               | 0.16         | 1.1×1.1                    |
| [20]         | 0.78                        | N/A                      | 0.2                       | SiGe            | -5                | 1.5          | 3×3                        |
| [21]         | 0.8                         | 5.4                      | N/A                       | SiGe            | -5                | 4.5          | 4.5×4.5                    |
| [22]         | N/A                         | 6.5                      | N/A                       | CMOS<br>0.18    | 1.8               | 1.32         | 2.5×2.1                    |
| [23]         | N/A                         | 3                        | N/A                       | CMOS<br>0.13    | 1.2               | 0.99         | 19×19                      |
| [24]         | N/A                         | 1.8                      | N/A                       | CMOS<br>0.09    | 1.2               | 1.65         | 5×5                        |
| [25]         | 0.8                         | 9.9                      | 1.43                      | CMOS<br>0.18    | 1.8               | 0.091        | 1.75×1.55                  |

Table 3.7. Performance comparison between this work and existing solutions

#### III.4. Conclusion

A 10 Gb/s clock and data recovery chip prototype is presented in this work. The CDR is based on a referenceless dual-loop half-rate architecture. It reduces the cost and increases the level of integration of the entire system by being able to compare the frequency difference between random data and a periodic clock without the aid of an external reference frequency. The half-rate architecture relaxes the design requirements of the VCO since the VCO only needs to operate at 5 GHz. The VCO is implemented as a 4-stage RC-type ring oscillator providing 4 pairs of differential clocks spaced by 45 degrees. The PFD consists of a quad-level PD and a linear FD. It enables the bandwidths of the phase-locked loop and the frequency-locked loop to be optimized independently. Thus, the CDR is able to achieve a large locking range, short locking time and small jitter generation at the same time. A quad-level charge pump is designed to work together with the quad-level PD. A mismatch suppression circuit is used in the charge pump to minimize the mismatch between the charging current and discharging current. A variation suppression circuit is used in the charge pump to minimize the variation of the output current of the charge pump and thus minimize the bandwidth variation of the entire loop. The chip is designed and simulated with post-layout parasitics (including all the pads) in TSMC 0.18  $\mu$ m CMOS technology. Simulation results indicate that the designed CDR exceeds the specifications prescribed by the SONET OC-192 standard.

#### CHAPTER IV

# A FULLY-DIFFERENTIAL LOW-POWER DIVIDE-BY-8 INJECTION-LOCKED FREQUENCY DIVIDER UP TO 18GHZ

#### IV.1. Introduction

Phase-locked loops are widely used in modern communication systems. With the ever-increasing demand for larger bandwidth, the required operation frequency of phase-locked loops (PLLs) keeps getting higher. On the other hand, more and more communication chipsets are used in mobile devices, which require PLLs with low power dissipation to achieve longer battery life. In PLLs, most of the power is consumed by the VCO and the frequency dividers, which operate at a much higher frequency than the other components within the loop. It remains a challenging task to design high-frequency dividers with low power dissipation.

Current-mode logic (CML) static frequency dividers are widely used in high-speed PLLs because of their simple design and robust operation. However, they consume a significant amount of power at high incoming frequencies. Injection-locked frequency dividers (ILFDs) have been gaining popularity in recent years because they can dissipate less power for the same operating frequency. Unlike static frequency dividers, which can operate with incoming frequencies approaching DC, an ILFD performs frequency division correctly only when the incoming frequency stays within a range, denoted as the locking range. Various ILFD structures have been reported in the literature [33]-[35]. However, the existing solutions suffer from various problems; e.g., the locking range is too narrow and it shifts with the input signal amplitude. Also, the existing structures use either single-ended input or pseudo-differential input, which limits their application in high-performance low-noise systems that require fully differential signals throughout the entire system.

To address these issues, a fully-differential ILFD structure with low power dissipation is proposed in this work. The ILFD structure implements a division ratio of eight to produce 8-phase output signals that are 45 degrees apart from each other. Furthermore, the proposed topology can be easily modified to implement other even division ratios. Section IV.2 gives a brief introduction to the existing ILFD structures and compares their advantages and drawbacks. Section IV.3 presents the proposed ILFD structure, explains the locking mechanism and characterizes the locking range, sensitivity and phase error. Section IV.4 analyzes the measurement results. Section IV.5 draws conclusions from this work.

#### IV.2. Conventional Frequency Dividers

#### IV.2.1 CML Static Frequency Divider

CML static frequency dividers are widely used in multi-gigahertz PLLs to divide the high-frequency signal generated by the VCO into a signal with a frequency low enough to be handled by the following programmable frequency dividers implemented in CMOS logic. The basic CML static frequency divider is a divide-by-2 cell which consists of a CML D-flipflop (DFF) with the output terminals (Q) connected back to the input terminals (D) in reversed polarity. The divide-by-2 cells can be cascaded to implement higher division ratios. A divide-by-8 CML static frequency divider is shown in Fig. 4.1. CML static frequency dividers have an upper limit of operating frequency set by the maximum toggling frequency of the first DFF in the divider chain. Assuming the maximum toggling frequency of a CML DFF in a given process is $f_{t-max}$, the maximum input frequency of a CML static frequency divider will be around $2f_{t-max}$. There is no lower limit to the input frequency for CML static frequency dividers. High-speed CML static frequency dividers are usually power hungry because the DFFs in the first stages have to handle very high frequencies.



Fig. 4.1. Conventional divide-by-8 CML static frequency divider
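As a rough behavioral illustration of the cascade in Fig. 4.1, the sketch below models each divide-by-2 cell as an ideal toggle element that changes state on the rising edge of its clock. The function name `ripple_divider` and the rising-edge convention are assumptions made for this example only; the model ignores all analog effects, including the maximum toggling frequency.

```python
def ripple_divider(n_cells, n_input_cycles):
    """Output of the last divide-by-2 cell after each input clock cycle.

    Each cell is an idealized toggle flip-flop (a DFF with its inverted
    output fed back to D) that toggles on the rising edge of its clock.
    """
    q = [0] * n_cells
    out = []
    for _ in range(n_input_cycles):
        for k in range(n_cells):
            q[k] ^= 1          # this cell sees a rising clock edge
            if q[k] == 0:      # its output fell, so the next cell is not clocked
                break
        out.append(q[-1])
    return out


# Three cascaded cells: the last output repeats every 8 input cycles.
waveform = ripple_divider(n_cells=3, n_input_cycles=16)
print(waveform)
```

In this idealized model only the first cell toggles at the full input rate, which is consistent with the observation that the first DFF sets the speed limit of the chain.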

#### IV.2.2 Injection-locked Frequency Divider

An injection-locked frequency divider (ILFD) usually consists of an oscillator with one or more terminals for signal injection. If no input signal is applied, the oscillator operates at its free-running frequency. When the input signal is injected, the phase of the output signal is locked to the phase of the input signal while the frequency of the output signal stays at a sub-multiple of the input frequency. An ILFD based on a 5-stage RC-type ring oscillator was proposed in [33] and its schematic is shown in Fig. 4.2. The input signal $V_{inj}$ is injected into the bias terminal of the differential pair of the first stage via AC coupling. The ILFD implements a division ratio of 8. It has a locking range of 25 MHz at 1 GHz input frequency when the injected power is equal to 0 dBm. This locking range is too small for most practical applications.



Fig. 4.2. Ring-oscillator based ILFD proposed in [33]

A single-ended divide-by-2 ILFD was proposed in [34]. The schematic of the ILFD is shown in Fig. 4.3. It is based on a 3-stage ring of NMOS inverters with PMOS active loads. The input signal is injected into the gate terminal of an NMOS switch (M7) connected across the output nodes of the second and third stages. When the switch is turned on, the output nodes of the second and third stages are shorted and $V_{out+}$ and $V_{out-}$ are forced to be equal. If $V_{out+}-V_{out-}$ is defined as the differential output voltage, the positive peaks of the input signal are locked to the zero-crossing points of the differential output voltage in the locked state. When the ILFD is locked, the output frequency is equal to half of the input frequency since there is one peak per input period and two zero-crossing points per output period. This ILFD consumes 43 $\mu$W and has a locking range from 2.1 GHz to 4.3 GHz with a 0.7 V supply. However, the two output signals provided by this ILFD are far from truly differential, as verified by simulation results, which makes it unsuitable for applications requiring fully differential outputs. Also, this ILFD is sensitive to power supply variations and common-mode noise since it is based on single-ended NMOS inverters.



Fig. 4.3. Single-ended divide-by-2 ILFD proposed in [34]

A divide-by-2 ILFD based on an LC oscillator was proposed in [35]. The schematic of the ILFD is shown in Fig. 4.4. The locking mechanism is similar to that of the ILFD based on an inverter chain proposed in [34]. The input signal is injected via the gate terminal of the NMOS switch M3. When the NMOS switch is turned on, the two output terminals are shorted. Therefore, the positive peaks of the input signal are locked to the zero-crossing points of the differential output signal when the ILFD achieves lock. Similar to the ILFD based on an inverter chain, this ILFD also implements a division ratio of two. Due to the high quality factor of the LC oscillator, the locking range of this ILFD is relatively narrow (3% around 50 GHz and 19% around 15 GHz).



Fig. 4.4. Divide-by-2 ILFD based on LC oscillator proposed in [35]

#### IV.3. Proposed Divide-by-8 ILFD

#### IV.3.1 Structure of the Proposed ILFD

To overcome the issues associated with the existing ILFD solutions, a fully differential ILFD based on latches (LILFD) with a division ratio of eight is proposed. The schematic is shown in Fig. 4.5. It consists of a 4-stage ring of latches. The output terminals of the last latch are connected to the input terminals of the first latch with inverted polarity to achieve an additional phase shift of 180 degrees. The clock terminals of the four latches are tied together and used to inject the differential input signal. The output frequency is equal to one eighth of the input frequency. The output signal can be taken from the $Q\pm$ terminals of any of the four latches. The schematic of the latch is shown in Fig. 4.6. It is a CML latch with a PMOS active load biased by the control voltage VBP. VBP is used to tune the operating frequency of the LILFD.



Fig. 4.5. Schematic of the proposed divide-by-8 LILFD



Fig. 4.6. D-Latch cell used in each stage of the LILFD



Fig. 4.7. Timing diagram of the input and output signals of the LILFD

The locking mechanism of the LILFD can be explained qualitatively as follows. It is assumed that the injected signal is large enough so that the latches stay hard-switched. When the injected signal ($V_{in}$) is low, the latches preserve the current logic state. When the injected signal is high, each latch works like a differential amplifier; the ILFD operates like a ring oscillator and the oscillation signal (Q) propagates from one stage to the next like a pipeline. If we assume that the oscillation signal propagates by only one stage during the half period when $V_{in}$ is high, it propagates by only one stage during one full input period since the logic states are preserved during the half period when $V_{in}$ is low. For a 4-stage ring oscillator, the phase shift provided by each stage is $45^\circ$ if all the stages are symmetrical. Thus, one full input period ($360^\circ$ of the input) corresponds in time to a $45^\circ$ phase shift of the oscillation signal. That means the input and the oscillation signal (output) have a frequency ratio of 8:1. Because the latches toggle only after $V_{in}$ becomes high, the transition edges of the output signal are locked to the rising edges of the input signal with a certain amount of delay. An illustrative timing diagram of the input and output waveforms is shown in Fig. 4.7. The transition edges of $Q_1$-$Q_4$ are delayed by $t_{ck-q}$ from the rising edge of $V_{in}$, where $t_{ck-q}$ is the time for the latch to toggle state after $V_{in}$ becomes high. $T_{in}$ is the period of the input signal.
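The qualitative argument above can be checked with a purely logical sketch. It assumes ideal hard-switched latches and exactly one stage of propagation per input period, which are the same assumptions made in the text; the function names are hypothetical and no analog behavior (delays, amplitude, locking range) is modeled.

```python
def step(state):
    """Advance the latch ring by one input period.

    Each latch takes the value of the previous stage; the first latch takes
    the inverted value of the last one (the crossed feedback connection).
    """
    n = len(state)
    return tuple((1 - state[-1]) if i == 0 else state[i - 1] for i in range(n))


def simulate(n_stages=4, n_periods=32):
    """Logic states of all latches after each input period."""
    state = (0,) * n_stages
    history = [state]
    for _ in range(n_periods):
        state = step(state)
        history.append(state)
    return history


def rising_edges(history, stage):
    """Input-period indices at which the given latch output goes 0 -> 1."""
    return [k for k in range(1, len(history))
            if history[k - 1][stage] == 0 and history[k][stage] == 1]


def division_ratio(history, stage=0):
    """Number of input periods per output period of one latch."""
    edges = rising_edges(history, stage)
    return edges[1] - edges[0]


print(division_ratio(simulate(n_stages=4)))  # 4-stage ring
print(division_ratio(simulate(n_stages=3)))  # 3-stage ring
```

Under these assumptions the 4-stage ring needs 8 input periods per output period, and neighboring outputs are offset by one input period, i.e., one eighth of the output period.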



Fig. 4.8. Timing diagrams at the boundaries of the locking range

When the input frequency becomes low enough so that the oscillation signal is able to propagate by more than one stage during the half period when $V_{in}$ is high, the LILFD is no longer able to lock at a frequency ratio of 8 to 1. The timing diagram illustrating the lower limit of the locking range is shown in Fig. 4.8 (a). $t_{q1-q2}$ is defined as the time it takes for the signal transition to propagate from one stage to the next when $V_{in}$ is high. At the lower limit, the output of the next stage ($Q_2$) just fails to cross the zero level before the falling edge of $V_{in}$. After $V_{in}$ becomes low, $Q_2$ reverts to the original level due to the positive feedback of the latch. Thus, the condition for the input frequency to stay above the lower limit of the locking range can be expressed as,

$$\frac{T_{in}}{2} < t_{ck-q} + t_{q1-q2} \tag{82}$$

On the other hand, when the input frequency becomes very high, the oscillation signal does not have enough time to propagate by one stage during the half period when $V_{in}$ is high. Thus, the LILFD is not able to achieve lock. The timing diagram illustrating the upper limit of the locking range is shown in Fig. 4.8 (b). At the upper limit, $Q_1$ crosses zero just before the falling edge of $V_{in}$. Thus, the condition for the input frequency to stay below the upper limit of the locking range can be expressed as,

$$\frac{T_{in}}{2} > t_{ck-q} \tag{83}$$

Combining (82) and (83), a rough approximation for the locking range is obtained as,

$$\frac{1}{2(t_{ck-q} + t_{q1-q2})} < f_{in} < \frac{1}{2t_{ck-q}} \tag{84}$$

where $f_{in}$ ($f_{in}=1/T_{in}$) is the frequency of the input signal. As a rough approximation, both $t_{ck-q}$ and $t_{q1-q2}$ are proportional to the RC time constant at the output node of the latch in tracking mode. More accurate values of these two delays have to be determined from simulations because of the strong nonlinearities involved. It is worth pointing out that the LILFD is able to lock at division ratios other than eight (e.g., 4, 7, 16). However, the locking ranges associated with the other ratios are much smaller, as verified by simulations, and are thus not discussed here in detail. The actual division ratio at which the LILFD operates is determined by the input frequency. It does not depend on the initial states of the latches, as verified by numerous simulations. The LILFD stays unlocked only when the input frequency lies outside the locking ranges of all the possible ratios. When the LILFD is not locked, the output signal contains two frequency components competing with each other, i.e., the input frequency component and the self-oscillating frequency component.
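The bounds implied by conditions (82) and (83) can be evaluated numerically with a few lines of code. The delay values below are hypothetical and chosen only to illustrate the arithmetic; they are not extracted from the designed latch, and, as noted above, accurate delays have to come from simulation.

```python
def locking_range(t_ck_q, t_q1_q2):
    """Return (f_low, f_high) in Hz from the two latch delays in seconds.

    Lock is kept while t_ck_q < T_in / 2 < t_ck_q + t_q1_q2, which follows
    from conditions (82) and (83).
    """
    f_low = 1.0 / (2.0 * (t_ck_q + t_q1_q2))
    f_high = 1.0 / (2.0 * t_ck_q)
    return f_low, f_high


# Hypothetical delays, for illustration only (not taken from the chip).
f_low, f_high = locking_range(t_ck_q=30e-12, t_q1_q2=20e-12)
print(f"{f_low / 1e9:.2f} GHz to {f_high / 1e9:.2f} GHz")
```

Because both delays scale with the same RC time constant in this approximation, changing the load shifts both bounds together, which matches the tuning behavior described for VBP.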

Since the LILFD is based on an RC ring oscillator, its locking range is expected to be much larger than that of the ILFD based on an LC oscillator proposed in [35] due to its lower quality factor (Q). When the quality factor is very high, it is hard for the ILFD to oscillate at frequencies far from the free-running frequency determined by the LC resonator. On the other hand, the input signal is injected into all four stages instead of only one stage as in the ILFD proposed in [33]. Therefore, the LILFD is expected to have a much larger locking range than the ILFD proposed in [33] because of its larger injection efficiency. Furthermore, the LILFD has fully differential input and output signals, which makes it especially suitable for low-noise high-performance applications. In addition, the LILFD is able to produce evenly-spaced 8-phase output signals, which can be readily used to drive building blocks expecting multiple-phase clocks such as phase interpolators or half-rate phase frequency detectors.

Another important advantage of the LILFD is low power dissipation. All the latches in the 4-stage ring only need to toggle at a frequency equal to one eighth of the input frequency. The transistor size and bias current used in these latches can be much smaller than the latches used in a CML static frequency divider with the same input frequency. Therefore, the LILFD consumes significantly less power compared with static frequency dividers when handling the same incoming frequency. On the other hand, when the same transistor dimension and bias current are used for the latches, the LILFD is able to handle significantly higher input frequency compared with static frequency dividers.

The LILFD can be easily modified to achieve other frequency division ratios. The 4-stage ring can be changed into an n-stage ring to achieve a division ratio of 2n under the same locking mechanism. Thus, the LILFD is flexible and suitable for use in high-speed frequency synthesizers and clock multipliers. In comparison, the ILFDs reported in [34]-[35] can only implement a division ratio of 2 and do not have the same flexibility.

#### IV.3.2 Locking Range

The locking range of an ILFD is defined as the input frequency range in which the ILFD is able to properly divide the frequency of the incoming signal by the desired ratio. To extract the locking range, the LILFD was simulated by injecting a sinusoidal signal with specified amplitude and frequency. Fig. 4.9 shows the simulated locking range of the LILFD versus the differential amplitude of the injected signal when VBP is set to 0.3 V (see Fig. 4.6). As seen from the figure, the locking range increases with the amplitude of the injected signal. The center frequency of the locking range is almost constant and very close to the free-running frequency of 12.3 GHz. This property makes it much easier to design the LILFD for a particular operating frequency regardless of the input signal's amplitude. In comparison, the center frequency of the ILFD reported in [34] shifts by a large amount as the input amplitude increases. For the LILFD, when the amplitude of the injected signal is small, the locking range is almost linearly related to the amplitude of the injected signal. When the amplitude of the injected signal becomes large, the locking range increases more slowly and approaches an upper limit. The underlying reason is that when the amplitude of the injected signal is larger than twice the saturation voltage ($V_{dsat}$) of the differential pair (M5-M6 in Fig. 4.6), the differential pair is fully switched to one side or the other. Therefore, a further increase of the amplitude of the injected signal has little effect on the circuit operation and the locking range reaches a maximum value.



Fig. 4.9. Simulated locking range vs. differential input amplitude when VBP=0.3 V



Fig. 4.10. Locking range of the LILFD under different bias conditions

The operating frequency of the LILFD can be tuned by changing the bias voltage VBP. Fig. 4.10 shows the simulated locking range of the LILFD while VBP is swept from 0 to 0.8 V. Both the lower and upper limits of the locking range decrease with the increase of VBP since the free-running frequency of the LILFD decreases with the increase of the load impedance provided by the PMOS transistors. When VBP is higher than 0.4 V, the ratio of the locking range over the center frequency (LROCF) is relatively constant and stays around 50%, as shown in Fig. 4.10. When VBP is lower than 0.3 V, the LROCF drops rapidly with the increase of the operating frequency.
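For reference, the LROCF figure of merit can be written as a one-line computation. The sketch below assumes that the center frequency is the midpoint of the two limits, which is one plausible reading of the definition; the function name is hypothetical. The example uses the 25 MHz locking range at 1 GHz quoted for the ILFD of [33], under the added assumption that this range is centered on 1 GHz.

```python
def lrocf(f_low, f_high):
    """Ratio of locking range over center frequency (midpoint assumed)."""
    center = 0.5 * (f_low + f_high)
    return (f_high - f_low) / center


# 25 MHz locking range assumed to be centered on 1 GHz, as quoted for [33].
ratio_33 = lrocf(1e9 - 12.5e6, 1e9 + 12.5e6)
print(f"LROCF = {100 * ratio_33:.1f}%")
```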

#### IV.3.3 Sensitivity

The sensitivity of an ILFD is defined as the minimum input amplitude that must be applied for the ILFD to lock to an input signal of a particular frequency. Fig. 4.11 shows the simulated input sensitivity of the LILFD versus the input frequency when VBP is set to 0.3 V. The required input amplitude is lowest when the input frequency is near the free-running frequency, which is equal to 12.3 GHz in this case. If the input frequency is close to the free-running frequency, the sensitivity is almost linearly related to the difference between the input frequency and the free-running frequency. The sensitivity is nearly symmetric about the free-running frequency; it increases as the input frequency deviates from the free-running frequency until the ILFD loses the ability to lock when the input frequency goes out of the locking range.



Fig. 4.11. The input sensitivity of the LILFD versus input frequency when VBP=0.3 V

#### IV.3.4 Phase Error



Fig. 4.12. Normalized phase error vs. input frequency for the LILFD

When the LILFD is locked, the input signal and output signal can be represented by the following expressions,

$$\begin{cases} V_{in} = A_{in} \sin(8\omega t + \varphi) \\ V_{out} = A_{out} \sin(\omega t) \end{cases} \tag{85}$$

where $\varphi$ is the phase error between the input signal and the output signal. The phase error in the locked state was extracted from simulations over the entire locking range of the LILFD with VBP=0.3 V. The simulated curve is shown in Fig. 4.12, where the phase error has been normalized to $2\pi$ for simplicity ($\varphi_{norm}=\varphi/2\pi$). At the lower limit of the locking range, the normalized phase error is about 0.3. That means the transition edge of the output signal is delayed by $0.3T_{in}$ from the rising edge of the input signal. The normalized phase error increases from 0.3 to 0.5 when the input frequency goes from the lower limit to the upper limit. Thus, the phase error is a monotonic function of the frequency of the injected signal. In this sense, the LILFD closely resembles a type-I PLL in which the steady-state phase error is a function of the input frequency [36].
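The normalized phase error can be converted to a time delay, or to an angle referred to the output period, with simple arithmetic. The helper names below are hypothetical, and the 10 GHz input frequency in the example is an arbitrary illustrative value rather than a simulated operating point.

```python
def edge_delay_seconds(phi_norm, f_in_hz):
    """Delay of the output transition after the input rising edge.

    phi_norm is the phase error normalized to 2*pi of the input signal,
    so the delay equals phi_norm input periods.
    """
    return phi_norm / f_in_hz


def edge_delay_output_degrees(phi_norm, ratio=8):
    """Same delay expressed in degrees of the (ratio times slower) output."""
    return phi_norm * 360.0 / ratio


# Arbitrary example: normalized phase error of 0.5 at a 10 GHz input.
print(edge_delay_seconds(0.5, 10e9))
print(edge_delay_output_degrees(0.3), edge_delay_output_degrees(0.5))
```

Over the reported span of 0.3 to 0.5, the delay therefore stays between 13.5 and 22.5 degrees of the output period for a division ratio of eight.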

#### IV.4. Measurement Results



Fig. 4.13. Test setup of ILFD chip

The LILFD was manufactured in TSMC 0.18 $\mu$m CMOS technology through the MOSIS educational program. The test setup used to characterize the chip is shown in Fig. 4.13. A single-ended synthesized sweeper is used to generate the input signal. Unfortunately, it is very difficult to generate fully differential input signals with baluns due to the broadband nature of the system under test. Therefore, a low-pass filter consisting of a large resistor and a large capacitor is placed on-chip between the two injection terminals ($V_{IN+}$ and $V_{IN-}$); the input signal is applied to $V_{IN+}$ while $V_{IN-}$ only receives the DC level of the input signal. The DC input level is provided by a DC power supply via an external high-frequency bias-T component. Fig. 4.14 shows the locking range of the ILFD with large input power (3 dBm) under different biasing conditions. The operating frequency of the ILFD spans 3 GHz to 18 GHz as the bias voltage VBP is swept between 0 and 0.7 V. Similar to the simulation results, the ILFD has a larger locking range when VBP is high and the free-running frequency is low. If VBP is higher than 0.4 V, the locking range is around 50% of the center frequency. If the operating frequency goes above 10 GHz, the LROCF drops rapidly with the decrease of VBP. For VBP=0 V, the LROCF is about 4%. The measured locking range is a little smaller than the simulated one at low frequencies and considerably smaller at high frequencies. Several factors contribute to the reduction of the locking range in the measurement. Firstly, the test setup uses single-ended injection, which is less efficient than fully differential injection. Secondly, the noise on the power supply can prevent the ILFD from locking to the input signal properly around critical conditions. This is especially significant when the oscillation amplitude is very small at high operating frequencies. Lastly, the high-frequency attenuation and impedance mismatch in the test setup and on the PCB decrease the power actually injected into the chip at high frequencies.



Fig. 4.14. Locking range of the LILFD with 3 dBm input power vs. VBP



Fig. 4.15. Output signal spectrum of the LILFD when locked at 17.6 GHz



Fig. 4.16. Measured output phase noise of the LILFD when locked at 17.6 GHz

Fig. 4.15 shows the measured spectrum of the output signal within a 2 MHz frequency offset when the LILFD is locked to a 3 dBm input signal at 17.6 GHz. The phase noise plot from 1 kHz to 100 MHz is given in Fig. 4.16. The measured phase noise at 1 MHz offset from the center frequency of 2.2 GHz is -112.9 dBc/Hz. The measured phase noise is very small because the output phase noise of an ILFD in the locked state is mainly determined by the input phase noise [37]. In contrast, when the input frequency is out of the locking range, the output spectrum looks like that of a free-running ring oscillator with a wide spread; the center frequency moves back and forth due to noise and temperature fluctuations.
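The output center frequency quoted above follows directly from the division ratio. The short sketch below also evaluates the textbook relation that an ideal divide-by-N stage lowers phase noise by $20\log_{10}N$ dB; this relation is a general assumption about ideal dividers and is not a result measured in this work. The function names are hypothetical.

```python
import math


def output_frequency(f_in_hz, ratio=8):
    """Output frequency of a divider locked at the given ratio."""
    return f_in_hz / ratio


def ideal_phase_noise_reduction_db(ratio=8):
    """Phase noise reduction of an ideal, noiseless divide-by-ratio stage."""
    return 20.0 * math.log10(ratio)


print(output_frequency(17.6e9) / 1e9, "GHz")
print(round(ideal_phase_noise_reduction_db(8), 2), "dB")
```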



Fig. 4.17. Measured locking range of the LILFD vs. input power when VBP=0.3 V



Fig. 4.18. Measured input sensitivity versus input frequency when VBP=0.3 V

The locking range of the LILFD was measured under different input power levels with VBP set to 0.3 V. The measurement result is shown in Fig. 4.17. The locking range is linearly related to the input signal amplitude and symmetric around the free-running frequency when the input power is small. However, when the input power gets large, the locking range loses symmetry and most of the locking range expansion happens on the lower side. This asymmetry is attributed to the single-ended injection used in the test setup. It is not a problem for the LILFD in practical applications because fully differential signals are readily available from most high-frequency VCOs in PLL systems. The sensitivity of the LILFD was measured over the entire locking range with the same bias voltage on VBP. The measurement result is shown in Fig. 4.18. Again, due to single-ended signal injection, the sensitivity is not symmetric around the free-running frequency of 11.7 GHz. The measured sensitivity is a little larger than the simulated one due to the noise and interference in the testing environment.

The chip dissipates 3.6 mW under a supply voltage of 1.8 V, excluding the power dissipation of the output buffers. The core of the LILFD circuit occupies an active area of 35 $\mu$m×35 $\mu$m. The micro-photograph of the chip is shown in Fig. 4.19. As a reference, a divide-by-8 CML static frequency divider was also designed and simulated with post-layout parasitics in the same technology. It achieves a maximum input frequency of 12 GHz with a power dissipation of 5.3 mW. A performance comparison between the LILFD and the existing solutions is given in Table 4.1. It shows that the LILFD has a significant overall advantage in terms of power dissipation, maximum operating frequency and locking range compared with the existing solutions. Compared with the CML static frequency divider, the LILFD has significantly lower power dissipation and a higher maximum operating frequency. Compared with the topologies reported in [33] and [35], the LILFD has a much larger LROCF. The LILFD has a significantly larger LROCF and lower power dissipation than the ILFDs reported in [38] and [39]. Although the divide-by-2 case of the ILFD reported in [39] has a large LROCF (60%), the operating frequency of 3.3 GHz is very low for the given technology (0.18 $\mu$m CMOS); a simple CMOS frequency divider might as well be used for much lower power dissipation. As a matter of fact, the ILFD reported in [34] has much lower power dissipation mainly because it is designed to work at relatively low operating frequencies for the given technology and uses a very low power supply voltage of 0.7 V. However, that topology is quite sensitive to common-mode noise and power supply variation due to the nature of a single-ended design. The LILFD is a design with fully differential input/output and achieves much better rejection of common-mode noise and interference. The ILFD reported in [40] has fully differential input/output with a high operating frequency due to the use of passive inductors. However, it can only divide the input frequency by a fixed ratio of two.



Fig. 4.19. Die photo of the LILFD prototype chip

|                              | Input       | Output      | Power<br>(mW) | Supply<br>(V) | Process<br>(µm) | Division<br>Ratio | Center<br>Freq.<br>(GHz) | LROCF |
|------------------------------|-------------|-------------|---------------|---------------|-----------------|-------------------|--------------------------|-------|
| This work                    | Fully diff  | Fully diff  | 3.6           | 1.8           | 0.18            | 8                 | 18                       | 4%    |
|                              |             |             |               |               |                 |                   | 14                       | 13%   |
|                              |             |             |               |               |                 |                   | 11                       | 30%   |
|                              |             |             |               |               |                 |                   | 6.5                      | 47%   |
| [33]                         | Single      | Single      | 0.35          | 1.5           | 0.24            | 8                 | 1                        | 2.5%  |
| [34]                         | Single      | Pseudo-diff | 0.044         | 0.7           | 0.2             | 2                 | 4.3                      | 53%   |
| [35]                         | Pseudo diff | Fully diff  | 3             | 1.5           | 0.13            | 2                 | 50                       | 3%    |
|                              |             |             |               |               |                 |                   | 40                       | 0.2%  |
|                              |             |             |               |               |                 |                   | 15                       | 19%   |
| [38]                         | Pseudo diff | Fully diff  | 10.4-12.5     | 2             | 0.13            | 8                 | 20                       | 0.75% |
|                              |             |             |               |               |                 | 6                 | 15                       | 5.1%  |
|                              |             |             |               |               |                 | 4                 | 10                       | 20%   |
|                              |             |             |               |               |                 | 2                 | 5                        | 40%   |
| [39]                         | Pseudo diff | Fully diff  | 6.8           | 1.8           | 0.18            | 8                 | 14.5                     | 1.4%  |
|                              |             |             |               |               |                 | 6                 | 10.7                     | 9.3%  |
|                              |             |             |               |               |                 | 4                 | 6.8                      | 23.5% |
|                              |             |             |               |               |                 | 2                 | 3.3                      | 60%   |
| [40]                         | Fully diff  | Fully diff  | 12            | 1.8           | 0.13            | 2                 | 36                       | 95%   |
| CML Static FD<br>(Simulated) | Fully diff  | Fully diff  | 5.3           | 1.8           | 0.18            | 8                 | 12                       | N/A   |

Table 4.1. Performance comparison between the LILFD and existing solutions
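One simple way to read the power and frequency columns of Table 4.1 together is to normalize the power by the maximum input frequency. The metric below (mW per GHz) is an illustrative assumption introduced here, not a figure of merit used in this work, and it ignores the differences in locking range and division ratio between the entries.

```python
def power_per_ghz(power_mw, f_max_ghz):
    """Power normalized to the maximum input frequency, in mW per GHz."""
    return power_mw / f_max_ghz


# Values taken from Table 4.1.
lilfd = power_per_ghz(power_mw=3.6, f_max_ghz=18.0)       # this work
cml_static = power_per_ghz(power_mw=5.3, f_max_ghz=12.0)  # simulated CML divider

print(f"LILFD: {lilfd:.2f} mW/GHz, CML static: {cml_static:.2f} mW/GHz")
```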

#### IV.5. Conclusion

A fully-differential divide-by-8 ILFD based on latches is described. It operates from 3 GHz to 18 GHz under different biasing conditions. The LROCF of the proposed topology is around 50% when operating at low and moderate frequencies. It has higher operating frequencies and lower power than CML static frequency dividers and a larger locking range than existing ILFD structures. Its locking range stays symmetric around the free-running frequency under different input power levels, which makes it easier to design for operation in a pre-defined frequency band. The LILFD is the first reported high-division-ratio ILFD with fully differential input and output. The structure can be easily modified to implement other even division ratios. It can be used as a standalone frequency divider or as a high-frequency prescaler in PLLs in practical applications.

#### CHAPTER V

## DESIGN AND ANALYSIS OF HIGH-SPEED GLITCH-FREE FULLY DIFFERENTIAL CHARGE PUMP WITH MINIMUM CURRENT MISMATCH AND VARIATION

#### V.1. Introduction

Phase-locked loops (PLLs) are widely used in modern communication systems. PLLs based on charge pumps are preferred over other types because they have a wide capture range and no systematic phase offset. In practice, non-idealities of the charge pump degrade the performance of the entire loop. The mismatch between the charging and discharging currents introduces a steady-state phase offset and increases reference spurs in a PLL. The variation of the output current amplitude of the charge pump with the output voltage results in variation of the loop bandwidth. Glitches in the output current increase the level of reference spurs in frequency synthesizers. They also increase the level of jitter generation in clock and data recovery (CDR) systems, which are widely used in multi-gigahertz serial data links.

Several single-ended charge pump structures have been proposed in the literature [41]-[46]. A single-ended charge pump with positive feedback was proposed in [42] to boost the operational frequency of the charge pump. An obvious disadvantage of that technique is that the positive feedback results in an undesirable hysteresis effect which swallows narrow input pulses. A technique was proposed in [43] to eliminate the high-frequency glitches, which is done at the price of decreasing the operational frequency of the charge pump. The charge pump proposed in [44] uses wide-swing current mirrors which still suffer from heavy mismatch when the output voltage comes close to the rails. The charge pump proposed in [45] uses source-switching but it is slow to turn off the output current. In high-performance applications with stringent noise suppression requirements, a fully differential charge pump is preferred over a single-ended charge pump because of its immunity to common-mode noise and power supply variation [41]. Some recent works [47]-[49] proposed differential charge pump structures which have only common-mode feedback but do not suppress the differential mismatch errors.

A novel fully differential charge pump for high-speed, high-performance PLLs is proposed in this chapter. Section V.2 covers the charge pump design with mismatch and variation suppression. Section V.3 discusses the techniques to suppress the transient glitches. Section V.4 shows the complete schematic of the charge pump with system-level performance verification. Section V.5 draws conclusions from this work.

#### V.2. Fully Differential Charge Pump with Accurate Matching and Minimum Current Variation

#### V.2.1 Differential Charge Pump with Mismatch Suppression

Fully differential charge pumps are preferred in high performance PLLs with stringent requirements on noise suppression [41]. The conceptual diagram of a fully differential charge pump is shown in Fig. 5.1. Several differential charge pump structures with proper common mode feedback (CMFB) have been reported [47]-[49]. The CMFB, however, cannot eliminate the differential error caused by the mismatch between charging and discharging current when the differential output voltage is not zero. To illustrate this, let us define the output voltages as,

$$V_{OUT+} = V_{CM} + \Delta V; \quad V_{OUT-} = V_{CM} - \Delta V \tag{86}$$

where  $V_{CM}$  is the desired common mode voltage and  $\Delta V>0$ . We also assume that those voltages are the exact values required by the VCO to operate at the desired frequency. However, due to the channel length modulation effect, the charging current will be smaller than the discharging current on the positive output terminal while the charging current will be larger than the discharging current on the other side. We can assume for simplicity that,

$$I_{C+} = I_{D-} = I_0 - \Delta I; \quad I_{C-} = I_{D+} = I_0 + \Delta I \tag{87}$$

where  $\Delta I$ >0 and  $I_0$  is the current when the output voltage is equal to  $V_{CM}$ . Let's consider the case of a classic phase frequency detector [50]. The UP and DN pulses have the same width when the input phase difference is zero. Thus, we can define the overall differential output current as,

$$I_{diff} = (I_{C+} - I_{D+}) - (I_{C-} - I_{D-}) = -4\Delta I \tag{88}$$

Instead of staying at the desired voltages, the positive output voltage will decrease while the negative output voltage will increase, due to the non-zero differential current. This error cannot be corrected by the common mode feedback circuit since the two output voltages are symmetric around the common mode level. Thus, the PLL has to settle to a non-zero phase error. Also, the UP and DN pulses will have different widths, which increases the level of reference spurs.
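As a quick numerical check of (87) and (88), the sketch below evaluates the four branch currents and the resulting differential error current. The function name and the example values (a 30 µA nominal current and a 5% deviation per branch) are illustrative assumptions, not figures taken from this design.

```python
def differential_error_current(i0, delta_i):
    """Return I_diff of Eq. (88) for the branch currents assumed in Eq. (87)."""
    i_c_pos = i0 - delta_i  # charging current on the positive output
    i_d_neg = i0 - delta_i  # discharging current on the negative output
    i_c_neg = i0 + delta_i  # charging current on the negative output
    i_d_pos = i0 + delta_i  # discharging current on the positive output
    return (i_c_pos - i_d_pos) - (i_c_neg - i_d_neg)


# Hypothetical example values (assumptions, not from the simulated design).
I0 = 30e-6           # nominal branch current in amperes
DELTA_I = 0.05 * I0  # assumed deviation due to channel length modulation

i_diff = differential_error_current(I0, DELTA_I)
print(f"I_diff = {i_diff * 1e6:.2f} uA, -4*dI = {-4 * DELTA_I * 1e6:.2f} uA")
```

The error current scales with four times the per-branch deviation, so even a small $\Delta I$ produces a differential error that the CMFB loop cannot see.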



Fig. 5.1. Conceptual diagram of a differential charge pump



Fig. 5.2. Proposed fully differential charge pump with mismatch suppression

To overcome this drawback, we propose a differential charge pump with excellent mismatch suppression, which is shown in Fig. 5.2. The charging and discharging currents are turned on when UP and DN are high, respectively. The mismatch suppression technique is derived from the one proposed in [51]. The terminals I<sub>CMFB+</sub> and I<sub>CMFB-</sub> are reserved for injection of CMFB current. Two opamps are used to ensure that  $V_{R^+} \cong V_{\text{out+}}$  and  $V_{R^-} \cong V_{\text{out-}}$ . VH and VL are the logic high level and logic low level of the differential input signal. When the charge pump is providing discharging current, the discharging current flowing through M1 will be equal to the current flowing through M10 because the transistor pairs (M1, M5) and (M10, M6) are matched. On the other hand, when the charge pump is providing charging current, the current flowing through M3 will be equal to the current flowing through M9 since the transistor pairs (M3, M7) and (M9, M8) are matched. Thus, the amplifiers force the charging current to closely follow the discharging current. A simplified version of the rail-to-rail opamp proposed in [52] with 54 dB DC gain is used to implement the amplifiers. A large capacitor must be added at the gate of M7/M8 to properly compensate the feedback loop.

The CMFB circuit is shown in Fig. 5.3. It amplifies the common mode error signal and converts it into two output currents. Source degeneration is used at the input stage to maximize the linear input swing so that the CMFB circuit can work properly over a large swing. The output currents are injected into the nodes  $I_{CMFB\pm}$  in the charge pump shown in Fig. 5.2 without interfering with the operation of the mismatch suppression circuit previously discussed.



Fig. 5.3. CMFB circuit for the differential charge pump



Fig. 5.4. Output currents with and without mismatch suppression

The differential charge pump is designed at transistor level in TSMC 0.35  $\mu$ m CMOS technology with 3.3 V power supply. It is simulated to verify the effectiveness of the proposed techniques. Fig. 5.4 shows the output currents versus the output voltage with and without mismatch suppression. Without mismatch suppression, the charging current and discharging current are close to each other only when the output voltage is near the common mode voltage (1.65 V). As the output voltage moves farther away from the common mode level, the difference between the charging and discharging current becomes larger. If the desired output swing is  $\pm 1$  V around 1.65 V, the current mismatch can be as high as 15%, which will cause unacceptable phase offset in many applications. After the introduction of the mismatch suppression circuit, the charging current and discharging current match very well over a large swing from 0.1 V to 3 V.

#### V.2.2 Suppression of Output Current Variation



Fig. 5.5. Variation suppression circuit

It is evident in Fig. 5.4 (b) that both output currents decrease when the output voltage goes towards zero. At 0.3 V output voltage, the current amplitude has decreased by 30% from its nominal value at the common mode level. Unfortunately, variation of the charge pump output current results in variation of the PLL loop bandwidth. Such a large variation may bring the PLL from a stable region to an unstable region.
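The dependence of the loop dynamics on the charge pump current can be illustrated with the textbook second-order charge-pump PLL model (series R-C loop filter). This model is not derived in this chapter, and every component value below is a hypothetical placeholder rather than a value from this design. The sketch only shows how a 30% current drop scales the natural frequency, damping factor and approximate loop bandwidth.

```python
import math


def loop_parameters(i_cp, r, c, k_vco, n):
    """Second-order charge-pump PLL approximations (k_vco in rad/s/V)."""
    omega_n = math.sqrt(i_cp * k_vco / (2 * math.pi * c * n))       # natural frequency
    zeta = 0.5 * r * math.sqrt(i_cp * c * k_vco / (2 * math.pi * n))  # damping factor
    omega_c = 2 * zeta * omega_n  # approximate loop bandwidth = i_cp*r*k_vco/(2*pi*n)
    return omega_n, zeta, omega_c


# Hypothetical loop values (assumptions for illustration only).
I_NOM = 30e-6                # nominal charge pump current, A
R = 10e3                     # loop filter resistor, ohm
C = 100e-12                  # loop filter capacitor, F
K_VCO = 2 * math.pi * 500e6  # VCO gain, rad/s/V
N = 32                       # feedback division ratio

nominal = loop_parameters(I_NOM, R, C, K_VCO, N)
reduced = loop_parameters(0.7 * I_NOM, R, C, K_VCO, N)  # 30% current drop

print(f"bandwidth ratio: {reduced[2] / nominal[2]:.3f}")
print(f"damping ratio:   {reduced[1] / nominal[1]:.3f}")
```

In this simplified model the bandwidth scales linearly with the current and the damping factor with its square root. The phase margin of a real loop with additional poles is not captured here.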



Fig. 5.6. Charge pump output current with and without variation suppression

To suppress the current variation dependent on the output voltage, we propose the variation suppression circuit shown in Fig. 5.5 to dynamically adjust the bias voltages V<sub>BN±</sub> (also marked in Fig. 5.2) and hence the charge pump bias current. When the output voltage is higher than the common mode level, M1-M2 from the compensation circuit stay off and have no effect on the tail current source bias voltages (V<sub>BN±</sub>). When the output voltage goes low enough to push the NMOS output transistor into the triode region, M1-M2 from the compensation circuit start to conduct and inject current into M3. That increases the bias current of the charge pump, which compensates for the drop. As a rule of thumb, M2 can be designed to conduct when the output transistor starts to enter the triode region, i.e.,  $V_{OUT}=2V_{dsat,NMOS}$ . A DC sweep simulation can be used to find the optimum compensation in an actual design.

Fig. 5.6 shows the discharging output current of the charge pump with and without the variation suppression circuit. It can be seen that the variation suppression technique extends significantly the range of the output voltage for a given variation tolerance. The output current variation is controlled within 3% when the output voltage is higher than 0.2 V.

#### V.3. Glitch Suppression

For an ideal charge pump, if a square wave control signal with a given rise time and fall time is applied, the output current should be a square wave without any glitches. However, in the actual implementation of a differential charge pump, the output current pulse has glitches whose magnitude increases with the speed of the input signal. The current glitches are generated mainly via two mechanisms discussed in the following subsections.

#### V.3.1 Low-Speed Glitch

The first type of glitch is caused by the speed limitation of the common source node of the differential pairs. Let's consider the NMOS differential pair in the charge pump in Fig. 5.2 with a very slow input pulse. When the input is balanced, the common node voltage  $V_S$  is equal to  $V_{S1}=V_{CM}-V_{TH}-V_{dsat,M1}$ . When the differential pair is fully switched to one side,  $V_S$  is equal to  $V_{S2}=V_{H}-V_{TH}-\sqrt{2} V_{dsat,M1}$ , which is larger than the balanced value provided that the input signal swing is much larger than  $V_{dsat,M1}$ . Thus, with a slow input pulse,  $V_S$  goes down to  $V_{S1}$  when the input is balanced and goes back to  $V_{S2}$  when the input is fully switched to the other side. However, when the input signal is very fast,  $V_S$  is not able to settle to  $V_{S2}$  as soon as the input finishes switching, due to heavy parasitics at the common source node. Thus, there is a temporary overshoot of  $V_{gs}$  for the transistor being turned on, which leads to an overshoot of the output current. Fig. 5.7 shows the transient waveforms of the NMOS differential pair in the charge pump during switching. This overshoot current is referred to as the low-speed glitch in this work.
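The two static levels of the common source node can be compared numerically. The sketch below evaluates the expressions for $V_{S1}$ and $V_{S2}$ given above; the bias values are hypothetical (only the 1.65 V common mode level appears elsewhere in this chapter), and the body effect on $V_{TH}$ is ignored.

```python
import math


def common_source_levels(v_cm, v_h, v_th, v_dsat):
    """Return (V_S1, V_S2): balanced and fully switched source voltages."""
    v_s1 = v_cm - v_th - v_dsat                # input balanced
    v_s2 = v_h - v_th - math.sqrt(2) * v_dsat  # pair fully switched to one side
    return v_s1, v_s2


# Hypothetical bias values (assumptions for illustration only).
V_CM = 1.65   # input common mode level, V
V_H = 1.95    # logic high level, V (0.6 V differential swing assumed)
V_TH = 0.6    # NMOS threshold voltage, V
V_DSAT = 0.2  # overdrive of M1 with balanced input, V

v_s1, v_s2 = common_source_levels(V_CM, V_H, V_TH, V_DSAT)
print(f"V_S1 = {v_s1:.3f} V, V_S2 = {v_s2:.3f} V, step = {v_s2 - v_s1:.3f} V")
```

With these assumed numbers the node has to move by roughly 0.2 V on every input transition, which is the excursion that the parasitic capacitance at the common source node slows down.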



Fig. 5.7. Transient waveforms of the NMOS diff. pair with fast input signal

The circuit shown in Fig. 5.8 is proposed to minimize the low-speed glitch. Two relatively large capacitors are added at the common source nodes of the differential pairs. They minimize the voltage variation on the common source nodes during the input transition by slowing those nodes down to a speed much lower than that of the input signal. Also, instead of using a fixed bias for  $V_D$  as shown in Fig. 5.2, an amplifier in unity-gain feedback configuration is added to ensure that  $V_{out}$  and  $V_D$  have very close voltages [53]. As a result, the common source node has the same voltage before and after the switching. The amplifier used here has the same structure as the one used for the mismatch suppression.



Fig. 5.8. Proposed low-speed glitch suppression circuit (enclosed in ellipses)

The charge pump was simulated with and without low-speed glitch suppression. The output currents are shown in Fig. 5.9. The input signal has a pulse width of 2.5 ns with a transition time of 0.1 ns. Without the glitch suppression circuit, the voltage at the common source node of the NMOS differential pair experiences a slow variation with a peak around 60 mV, which causes large and wide glitches in the output current. After the introduction of the low-speed glitch suppression circuit, the variation of the common source node voltage is much smaller (about 1 mV). As a result, the low-speed glitch on the output current is almost completely eliminated. On the other hand, fast and sharp glitches still remain in the output current even with the low-speed glitch suppression circuit. These are the high-speed glitches, which are discussed in the next subsection.



Fig. 5.9. Common source node voltage (NMOS diff. pair) and output current of the charge pump with and without low-speed glitch suppression circuit

#### V.3.2 High-Speed Glitch

The high-speed glitch is generated by charging or discharging the gate-to-drain capacitance  $(C_{gd})$  of the output transistors, which directly injects current into the output node. Let's assume the input voltage has a transition time of  $\Delta T$  to switch from  $V_L$  to  $V_H$ . The generated glitch current is expressed below,

$$I_{glitch} = C_{gd} (V_H - V_L) / \Delta T = C_{gd} K \tag{89}$$

where K represents the slew rate of the input voltage during transition. The glitch magnitude is proportional to the input voltage slew rate and the gate-to-drain capacitance. The amplitude of the high-speed glitch can be larger than the output current itself when the input signal is switching extremely fast. This kind of glitch is very narrow and has approximately the same width as the input transition time. If somehow the output transistor goes into deep triode region (e.g., the NMOS output transistor will go into triode region when the output voltage is very low),  $C_{gd}$  will be close to half the MOS gate capacitance, i.e.,

$$C_{gd} = C_{gs} = C_{gg}/2 = WLC_{ox}/2 \tag{90}$$

When this happens, the gate-to-drain capacitance will be several times larger and so is the induced glitch current. To minimize the glitch, it's always desirable to keep the output transistors in saturation region. In addition, it maximizes the switching speed of the charge pump if the output transistors work in saturation region instead of triode region.
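To give a feel for the magnitudes in (89) and (90), the sketch below computes the glitch current for an output transistor in saturation (overlap capacitance only) and in deep triode (half the gate capacitance). The device dimensions, overlap capacitance per unit width and oxide capacitance are hypothetical values of a plausible order for a 0.35 µm process; they are assumptions and not data from this design.

```python
def glitch_current(c_gd, v_swing, t_transition):
    """Eq. (89): I_glitch = C_gd * (V_H - V_L) / dT."""
    return c_gd * v_swing / t_transition


def cgd_saturation(width, c_overlap_per_width):
    """Gate-to-drain capacitance in saturation: overlap capacitance only."""
    return width * c_overlap_per_width


def cgd_deep_triode(width, length, c_ox):
    """Eq. (90): C_gd = W * L * C_ox / 2 in deep triode."""
    return width * length * c_ox / 2


# Hypothetical device and signal values (assumptions for illustration only).
W = 5e-6          # channel width, m
L = 0.35e-6       # channel length, m
C_OV = 0.25e-9    # overlap capacitance per unit width, F/m (0.25 fF/um)
C_OX = 4.6e-3     # gate oxide capacitance per unit area, F/m^2 (4.6 fF/um^2)
V_SWING = 0.6     # V_H - V_L, V
T_EDGE = 5e-12    # input transition time, s

i_sat = glitch_current(cgd_saturation(W, C_OV), V_SWING, T_EDGE)
i_tri = glitch_current(cgd_deep_triode(W, L, C_OX), V_SWING, T_EDGE)
print(f"saturation: {i_sat * 1e6:.0f} uA, deep triode: {i_tri * 1e6:.0f} uA")
print(f"triode / saturation ratio: {i_tri / i_sat:.2f}")
```

Under these assumptions the deep-triode glitch is about three times the saturation value, which is why keeping the output transistors saturated matters for glitch performance.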

The circuit shown in Fig. 5.10 is proposed to suppress the high-speed glitches. The source terminals of M1' and M3' are left floating to avoid extra DC current. The transistors M1' and M3' match the size of the transistors M1 and M3. When both M1 and M1' stay in the saturation region, they have the same gate-to-drain overlap capacitance. Thus, the glitches on the discharging current induced by the switching of DN+ and DN- cancel each other. The same thing happens for the glitches produced on the charging current provided by PMOS devices.



Fig. 5.10. Proposed high-speed glitch suppression circuit (enclosed in ellipses)



Fig. 5.11. Output current with and without suppression of high-speed glitch

The output current glitches are simulated for the charge pump with and without the high-speed glitch suppression circuit (the low-speed glitch suppression technique is applied in both cases). The output current waveform is shown in Fig. 5.11. The input signal has a 50 ps pulse width with a 5 ps transition time. Without suppression, the output current has extremely large glitches due to the fast switching of the input signal. With such a high-speed input signal, the desired output current level (about 30  $\mu$ A) is totally drowned by the glitches (about 150  $\mu$ A). After adding the proposed circuit, the high-speed glitches are almost completely eliminated from the output current since the output transistors and the dummy transistors are matched. It was verified by simulations that the charge pump can have glitch-free operation for a 10 ps input pulse width and 1 ps transition time with a purely capacitive load of large value. In a practical implementation, however, this performance will be limited by the resistance in the loop filter and by other parasitic resistance such as routing resistance and gate resistance.

It should be pointed out that the high-speed glitches generated by the output transistor and dummy transistor fully cancel each other only when both of them stay in the saturation region. The output voltage ranges for the NMOS and PMOS transistors to stay saturated are given below, respectively,

$$\begin{aligned} 0 &< V_{R-PMOS} < V_{CM} - V_A / 2 + |V_{TH-PMOS}| \\ V_{CM} + V_A / 2 - V_{TH-NMOS} &< V_{R-NMOS} < V_{dd} \end{aligned} \tag{91}$$

where  $V_A$  is the swing of the input signal. The range over which the glitches generated by both the NMOS and PMOS transistors are fully cancelled is the intersection of the two ranges, which is given below,

$$V_{CM} + V_A / 2 - V_{TH-NMOS} < V_R < V_{CM} - V_A / 2 + |V_{TH-PMOS}| \tag{92}$$

The length of this range is given by,

$$V_{R-L} = V_{\max} - V_{\min} = |V_{TH-PMOS}| + V_{TH-NMOS} - V_A \tag{93}$$

The simulated range of full cancellation in this design is from 1.2 V to 2 V with  $V_A=0.6$  V. That's very close to the result estimated by (92).
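Equations (92) and (93) are easy to evaluate directly. In the sketch below, the common mode level (1.65 V) and the input swing (0.6 V) are the values quoted in this chapter, while the two threshold voltages are hypothetical numbers picked so that the result lands on the simulated 1.2 V to 2 V range. The actual device thresholds are not stated in the text.

```python
def full_cancellation_range(v_cm, v_a, vth_nmos, vth_pmos_abs):
    """Return (V_min, V_max, length) of the range in Eq. (92) and Eq. (93)."""
    v_min = v_cm + v_a / 2 - vth_nmos      # NMOS output device leaves saturation below this
    v_max = v_cm - v_a / 2 + vth_pmos_abs  # PMOS output device leaves saturation above this
    return v_min, v_max, v_max - v_min


V_CM = 1.65      # common mode voltage from the text, V
V_A = 0.6        # input swing from the text, V
VTH_N = 0.75     # assumed NMOS threshold (hypothetical), V
VTH_P_ABS = 0.65  # assumed |PMOS threshold| (hypothetical), V

v_min, v_max, length = full_cancellation_range(V_CM, V_A, VTH_N, VTH_P_ABS)
print(f"full cancellation for {v_min:.2f} V < V_R < {v_max:.2f} V (length {length:.2f} V)")
```

The length of the range depends only on the sum of the thresholds minus the input swing, so a smaller input swing widens the window of full cancellation.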

#### V.4. Complete Implementation of the Charge Pump



Fig. 5.12. Complete schematic of the proposed fully differential charge pump

The complete schematic of the fully differential charge pump employing all the techniques discussed above is shown in Fig. 5.12. The overall power consumption is around 1mW with 3.3 V supply voltage. Simulation results indicate that these techniques can be combined to achieve optimum performance without interfering with each other.

A 10 GHz PLL with 312.5 MHz reference (modeled in Cadence VerilogAMS) using the proposed charge pump was simulated to verify system-level performance improvement. After introducing the proposed techniques, the reference spur is reduced from -65 dB to -74 dB for small differential output voltage of 0.4 V. Also, the spur decreases from -39 dB to -58 dB for large differential output voltage of 2 V.
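The spur figures above can be turned into improvement factors with a few lines of arithmetic. The sketch assumes the quoted levels are power ratios relative to the carrier; that interpretation, and the helper names, are assumptions made here for illustration.

```python
def spur_improvement_db(before_db, after_db):
    """Spur reduction in dB (positive means the spur became smaller)."""
    return before_db - after_db


def db_to_power_ratio(db):
    """Convert a dB difference to a linear power ratio."""
    return 10 ** (db / 10)


F_OUT = 10e9     # PLL output frequency from the text, Hz
F_REF = 312.5e6  # reference frequency from the text, Hz
print(f"feedback division ratio: {F_OUT / F_REF:.0f}")

for label, before, after in [("0.4 V differential output", -65, -74),
                             ("2 V differential output", -39, -58)]:
    gain = spur_improvement_db(before, after)
    print(f"{label}: {gain} dB lower spur, "
          f"about {db_to_power_ratio(gain):.1f}x less spur power")
```

The improvement is 9 dB at the small differential output voltage and 19 dB at the large one, consistent with the mismatch being most severe far from the common mode level.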

#### V.5. Conclusion

A glitch-free fully differential charge pump with excellent suppression of output current mismatch and variation is introduced in this work. Techniques are proposed to eliminate the low-speed glitches caused by the speed limitation of the common source nodes and the high-speed glitches induced by capacitive coupling. In particular, the technique to suppress the high-speed glitch enables the charge pump to have glitch-free operation with very narrow input pulses. A mismatch suppression circuit is incorporated into the proposed charge pump so that the output currents have very good matching over a large output swing. A variation suppression circuit is employed to minimize the variation of the output current amplitude with the output voltage, which results in a more stable loop bandwidth of the PLL.

Detailed analysis and simulation results indicate that the proposed fully differential charge pump is well suited for use in high-performance phase-locked loops and CDRs working at multi-gigahertz frequencies or higher.

#### CHAPTER VI

#### SUMMARY AND CONCLUSIONS

Different topics and projects related to serial data link transceivers are presented in this dissertation. The projects presented here are mainly for use on the receiver side. In a serial data link transceiver, the receiver is much more critical to design than the transmitter, which is why it is the major focus of this work. In chapter II, the steady-state behavior of BPLLs is accurately described; the jitter performance properties of BPLLs are characterized; the special effect of jitter due to inter-symbol interference is analyzed. The conclusions derived in chapter II serve as a solid theoretical foundation for the transistor-level design of the SONET CDR presented in chapter III. In chapter III, the design and analysis of a 10 Gb/s CDR for SONET OC-192 is presented; it uses a dual-loop referenceless half-rate architecture, including a binary phase-tracking loop and a linear frequency-tracking loop; the adoption of the quad-level PD and linear FD enables the CDR to achieve a large locking range and small jitter generation at the same time. Full-chip post-layout simulation results show that the prototype chip exceeds the performance required by SONET OC-192 with significant improvement over the existing solutions. In chapter IV, the design of a low-power fully-differential injection-locked frequency divider with a division ratio of eight is presented. The frequency divider can handle a maximum input frequency of 18 GHz with around 3.6 mW power under 1.8 V supply. It can be used in the clock-generator PLL in serial data link transceivers to reduce the power budget of the entire system. In chapter V, the design and optimization techniques of high-speed fully differential charge pumps are analyzed. The topics covered include mismatch suppression, variation suppression and glitch suppression. These techniques can be properly adopted and combined to optimize the charge pump and minimize the nonidealities of the entire PLL, such as phase offset, jitter and spurs.

A few issues and problems in the area of serial data link transceivers are pointed out here for potential future research.

- (1). In the theoretical analysis of BPLL, the peaking condition derived in chapter II still depends on the initial conditions. If the proper initial conditions are not applied, it's still possible for the BPLL to have no peaking even when the capacitor is smaller than the critical value for zero peaking. The dependence of the peaking condition on the initial conditions calls for further investigation.
- (2). In the analysis of input-output jitter transfer properties, only sinusoidal jitter and ISI jitter (simply approximated as binary jitter) have been analyzed. The relationship between the distribution of input jitter and output jitter for other types of jitter (Gaussian distribution, bounded uniform distribution and ISI jitter distribution induced by more complex channels) is still worthy of further research.
- (3). With the increase of data rates, a fixed or adaptive equalizer will have to be used to compensate the channel loss in the front end of the receiver. The topic of equalizers is not discussed in this work. Equalizers will become a critical block in the next generation of serial data link transceivers.
- (4). The CDR design presented in this work uses an analog loop filter with a very large capacitor (35 nF). It is so large that it has been placed externally off chip. It could potentially be replaced with a digital loop filter implemented on-chip to further increase the level of integration.
- (5). Injection-locked frequency divider is an area highly worthy of further research. The potential research directions include: how to minimize the drop of locking range when the operating frequency approaches the higher limit; how to maximize the ratio of the locking range over the center frequency; how to minimize the input sensitivity.

#### REFERENCES

- [1]. Wikipedia.com, "Serial ATA", Available: http://en.wikipedia.org/wiki/Serial\_ATA.
- [2]. SCSI Trade Association, "Serial Attached SCSI Definition", Available: http://www.scsita.org/aboutscsi/sas/definition.html.
- [3]. Wikipedia.com, "Synchronous Optical Networking from Wikipedia, the Free Encyclopedia", Available: http://en.wikipedia.org/wiki/Synchronous\_optical\_networking.

- [4]. PCI-SIG.com, "PCI Express Specifications", Available: http://www.pcisig.com/specifications.
- [5]. HyperTransport Consortium, "HyperTransport Technology Specifications", Available: http://www.hypertransport.org/tech/tech\_specs.cfm.
- [6]. RapidIO Trade Association, "RapidIO Specifications", Available: http://www.rapidio.org/specs/current.
- [7]. Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, et al., "A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate", *IEEE J. of Solid State Circuits*, vol. 35, pp. 1949-1957, Dec. 2000.
- [8]. N. Da Dalt, E. Thaller, P. Gregorius, L. Gazsi, "A Compact Triple-Band Low-Jitter Digital LC PLL With Programmable Coil in 130-nm CMOS", *IEEE J. of Solid State Circuits*, vol. 40, pp. 1482-1490, Jul. 2005.

- [9]. J. Savoj, B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Binary Phase/Frequency Detector", *IEEE J. of Solid State Circuits*, vol. 38, pp. 13-21, Jan. 2003.
- [10]. M. Aoyama, K. Ogasawara, M. Sugawara, T. Ishibashi, S. Shimoyama, et al., "3 Gbps, 5000 ppm Spread Spectrum SerDes PHY with Frequency Tracking Phase Interpolator for Serial ATA", in *VLSI Circuits Symposium*, pp. 107-110, June 2003.
- [11]. R. Walker, "Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems" in *Phase-Locking in High Performance Systems*, Piscataway, NJ: IEEE Press, 2003, Razavi Ed.
- [12]. J. Lee, K.S. Kundert, B. Razavi, "Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits", *IEEE J. of Solid State Circuits*, vol. 39, pp. 1571-1580, Sep. 2004.
- [13]. N. Da Dalt, "A Design-Oriented Study of the Nonlinear Dynamics of Digital Bang-Bang PLLs", *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 52, pp. 21-31, Jan. 2005.
- [14]. J. D. H. Alexander, "Clock Recovery from Random Binary Signals", *IEEE Electron. Lett.*, vol. 11, pp. 541, 1975.
- [15]. SONET OC-192, "Transport System Generic Criteria," *Bellcore, GR-1377-CORE*, no. 4, Mar. 1998.
- [16]. Wright, E. M. "Solution of the Equation z<sup>z</sup>=a", *Bull. Amer. Math. Soc.* vol. 65, pp. 89-93, 1959.

- [17]. B. Razavi, "Phase-locked Loops" in *Design of Analog CMOS Integrated Circuits*, New York: McGraw-Hill, pp. 532-576, 2001
- [18]. C.R. Hogge, "A Self-Correcting Clock Recovery Circuit", *IEEE J. Lightwave Tech.*, vol. 3, pp. 1312-1314, Dec. 1985.
- [19]. J. D. H. Alexander, "Clock Recovery from Random Binary Signals", *IEEE Electron. Lett.*, vol. 11, pp. 541, 1975.
- [20]. Y. M. Greshishchev, P. Schvan, "SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application", *IEEE J. of Solid State Circuits*, vol. 35, pp.1353-1359, Sep. 2000.
- [21]. Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, et al., "A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate", *IEEE J. of Solid State Circuits*, vol. 35, pp. 1949-1957, Dec. 2000.
- [22]. J. Cao, M. Green, A. Momtaz, K. Vakilian, D. Chung, et al., "OC-192 Transmitter and Receiver in Standard 0.18µm CMOS", *IEEE J. of Solid State Circuits*, vol. 37, pp. 1768-1780, Dec. 2002
- [23]. L. Henrickson, D. Shen, "Low-Power Fully Integrated 10-Gb/s SONET/SDH Transceiver in 0.13-μm CMOS", *IEEE J. of Solid State Circuits*, vol. 38, pp. 1595-1601, Oct. 2003.
- [24]. H. S. Muthali, T. P. Thomas, "A CMOS 10-Gb/s SONET Transceiver", *IEEE J. of Solid State Circuits*, vol. 39, pp. 1026-1033, Jul. 2004.

- [25]. J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Binary Phase/Frequency Detector", *IEEE J. of Solid State Circuits*, vol. 38, issue 1, pp. 13-21, Jan. 2003
- [26]. Richman, "Color-Carrier Reference Phase Synchronization Accuracy in NTSC Color Television" in *Proc. IRE*, vol. 42, Jan. 1954, pp. 106-133.
- [27]. J. A. Bellisio, "A New Phase-locked Loop Timing Recovery Method for Digital Regenerators", *IEEE Intl. Conf. Commun. Rec.*, vol. 1, pp. 10-17, Jun. 1976.
- [28]. R. Cordell, J. Forney, W. Dunn, W. Garrett, "A 50MHz Phase and Frequencylocked Loop", *IEEE J. of Solid State Circuits*, vol. SC-14, no. 6, pp. 1003-1010, Dec. 1979.
- [29]. B. Razavi, "A 2.5-Gb/s 15-mW Clock Recovery Circuit", *IEEE J. of Solid State Circuits*, vol. 31, issue 4, pp. 472-480, April 1996.
- [30]. A. Pottbacker, U. Langmann, H.-U. Schreiber, "A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s", *IEEE J. of Solid State Circuits*, vol. 27, issue 12, pp. 1747-1751, Dec 1992.
- [31]. D. Johns and K. Martin, "Data Converter Fundamentals" in *Analog Integrated Circuit Design*, New York: John Wiley & Sons, pp. 456, 1996.
- [32]. J. S. Lee, S. I. Lim, S. Kim, "Charge Pump with Perfect Current Matching Characteristics in Phase-locked Loops", *Electronic Letters*, vol. 36, issue 23, pp. 1907-1908, Nov 2000.

- [33]. R. J. Betancourt-Zamora, S. Verma, T. H. Lee, "1-GHz and 2.8-GHz CMOS Injection-locked Ring Oscillator Prescalers", in *VLSI Circuits Symposium*, pp. 47-50, Jun. 2001.
- [34]. K. Yamamoto, M. Fujishima, "A 44-μW 4.3-GHz Injection-locked Frequency Divider with 2.3-GHz Locking Range", *IEEE J. of Solid-State Circuits*, vol. 40, pp. 671-677, Mar. 2005.
- [35]. M. Tiebout, "A CMOS Direct Injection-locked Oscillator Topology as Highfrequency Low-power Frequency Divider", *IEEE J. of Solid-State Circuits*, vol. 39, pp.1170–1174, Jul. 2004.
- [36]. B. Razavi, "Phase-locked Loops" in *Design of Analog CMOS Integrated Circuits*, New York: McGraw-Hill, pp. 540-541, 2001
- [37]. S. Verma, H. Rategh, and T. Lee, "A Unified Model for Injection-locked Frequency Dividers", *IEEE J. of Solid-State Circuits*, vol. 38, pp. 1015–1027, Jun. 2003.
- [38]. F.H. Huang, "20 Ghz CMOS Injection-locked Frequency Divider with Variable Division Ratio", in *Radio Frequency integrated Circuits Symposium*, pp. 469-472, Jul. 2005.
- [39]. M. Acar, D. Leenaerts, B. Nauta, "A Wide-band CMOS Injection-locked Frequency Divider", in *Radio Frequency Integrated Circuits Symposium*, pp. 211-214, Jun. 2004.
- [40]. U. Singh, M. M. Green, "High-Frequency CML Clock Dividers in 0.13µm CMOS Operating Up to 38 GHz", *IEEE J. of Solid-State Circuits*, vol. 40, pp. 1658-1661, Aug. 2005.

- [41]. W. Rhee, "Design of High-performance CMOS Charge Pumps in Phase-locked Loops", in *International Symposium on Circuits and Systems*, vol. 2, pp. 545-548, May-Jun., 1999.
- [42]. E. Juarez-Hernandez, A. Diaz-Sanchez, "A Novel CMOS Charge-pump Circuit with Positive Feedback for PLL Applications", in *International Conference on Electronics, Circuits and Systems*, vol. 1, pp. 349-352, Sep. 2001.
- [43]. B. Bahreyni, I. M. Filanovsky, C. Shafai, "A Novel Design for Deadzone-less Fast Charge Pump with Low Harmonic Content at the Output", in *Midwest Symposium on Circuits and Systems*, vol. 3, Aug. 2002.
- [44]. R. C. H. Beek, C.S. Vaucher, D.M.W. Leenaerts, E.A.M. Klumperink, B. Nauta, "A 2.5-10-GHz Clock Multiplier Unit with 0.22-Ps RMS Jitter in Standard 0.18μm CMOS", *IEEE J. of Solid State Circuits*, vol. 39, pp. 1862-1872, Nov. 2004.
- [45]. J. F. Parker, D. Weinlader, J. L. Sonntag, "A 15mW 3.125GHz PLL for Serial Backplane Transceivers in 0.13μm CMOS", in *International Solid-State Circuits Conference*, pp. 412-413, 2005.
- [46]. D. M.W. Leenaerts, J. van der Tang, C. S. Vaucher, "Frequency Synthesizers" in *Circuit Design for RF Transceivers*, Boston: Kluwer Academic Publishers, pp. 300-304, Oct. 2001.
- [47]. T. S. Cheung, B. C. Lee, "A 1.8~3.2-GHz Fully Differential GaAs MESFET PLL", *IEEE J. of Solid State Circuits*, vol. 36, pp. 605-601, Apr. 2001.

- [48]. N. D. Dalt, C. Sandner, "A Subpicosecond Jitter PLL for Clock Generation in 0.12µm Digital CMOS", *IEEE J. of Solid State Circuits*, vol. 38, pp. 1275-1278, Jul. 2003.
- [49]. B. Terlemez, J. P. Uyemura, "The Design of a Differential CMOS Charge Pump for High Performance Phase-locked Loops", in *International Symposium on Circuits and Systems*, vol. 4, pp. IV-561-4, May 2004.
- [50]. B. Razavi, "Phase-locked Loops" in *Design of Analog CMOS Integrated Circuits*, New York: McGraw-Hill, pp. 550-556, 2001
- [51]. J.-S. Lee, M.-S. Keel, S.-I. Lim, S. Kim, "Charge Pump with Perfect Current Matching Characteristics in Phase-locked Loops", *Electronics Letters*, vol. 36, pp. 1907-1908, Nov. 2000.
- [52]. J.N. Babanezhad, "A Rail-to-Rail CMOS Op Amp", *IEEE J. of Solid State Circuits*, vol. 23, pp. 1414-1417, Dec. 1988.
- [53]. I.A.Young, J.K.Greason, K. L. Wong, "A PLL Clock Generator with 5 to 10 MHz of Lock Range for Microprocessors", *IEEE J. of Solid State Circuits*, vol. 27, pp. 1599-1607, Nov. 1992.

#### VITA

Shanfeng Cheng was born in Wuyuan County, Jiangxi Province, People's Republic of China in Dec 1977. He received his B.Sc. degree in electrical engineering from Fudan University, Shanghai, China, in July 1998. He joined the ASIC State Key Lab at Fudan University to study integrated circuit design as a master's student in September 1998. He received his M.Sc. degree from the Department of Electrical Engineering, Fudan University in July 2001. In August 2001, he joined the Analog & Mixed-signal Center, Department of Electrical Engineering, Texas A&M University, College Station, Texas to pursue his Ph.D. degree. He worked on the debugging and characterization of high-speed serial data link transceivers as an Engineering Intern with Broadcom Corporation, Irvine, California, USA from July 2005 to Dec 2005. He received his Ph.D. degree from the Department of Electrical Engineering, Texas A&M University in December 2006. His main research interests are the design of high-speed phase-locked loops, clock and data recovery circuits and serial data link transceivers.