ANALOG BASEBAND FILTERS AND MIXED SIGNAL CIRCUITS FOR
BROADBAND RECEIVER SYSTEMS

A Dissertation

by

RAGHAVENDRA LAXMAN KULKARNI

Submitted to the Office of Graduate Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

December 2011

Major Subject: Electrical Engineering

Approved by:

Chair of Committee, José Silva-Martinez
Committee Members, Edgar Sánchez-Sinencio
Shankar P. Bhattacharyya
Alexander G. Parlos
Head of Department, Costas N. Georgiades
ABSTRACT

Analog Baseband Filters and Mixed Signal Circuits for Broadband Receiver Systems. (December 2011)
Raghavendra Laxman Kulkarni, B.E., University of Mysore; M.Tech., IIT Delhi
Chair of Advisory Committee: Dr. José Silva-Martinez

Data transfer rates of communication systems continue to rise, fueled by aggressive demand for voice, video and Internet data. Device scaling enabled by modern lithography has paved the way for System-on-Chip solutions integrating compute-intensive digital signal processing. This trend, coupled with demand for low-power, battery-operated consumer devices, offers extensive research opportunities in analog and mixed-signal designs that enable modern communication systems.

The first part of the research deals with broadband wireless receivers. To gain insight, we quantify the impact of undesired out-band blockers on the analog baseband in a broadband radio. We present a systematic evaluation of the dynamic range requirements at the baseband and A/D conversion boundary. A prototype UHF receiver designed in RFCMOS 0.18$\mu$m technology to support this research integrates a hybrid continuous- and discrete-time analog baseband along with the RF front-end. The chip consumes 120mW from a 1.8V/2.5V dual supply and achieves a noise figure of 7.9dB and an IIP3 of -8dBm at maximum gain (+2dBm at 9dB RF attenuation).

High linearity active RC filters are indispensable in wireless radios. A novel feed-forward OTA applicable to active RC filters in analog baseband is presented. Simulation results from the chip prototype designed in RFCMOS 0.18$\mu$m technology show an improvement in the out-band linearity performance that translates to increased dynamic range in the presence of strong adjacent blockers.

The second part of the research presents an adaptive clock-recovery system suitable for high-speed wireline transceivers. The main objective is to improve the jitter-tracking and jitter-filtering trade-off in serial link clock-recovery applications. A digital state-machine that enables the proposed mixed-signal adaptation solution to achieve this objective is presented. The advantages of the proposed mixed-signal solution operating at 10Gb/s are supported by experimental results from the prototype in RFCMOS 0.18$\mu$m technology.
DEDICATION

To my dear parents, my wife, my brother, and rest of the universe.
ACKNOWLEDGMENTS

This dissertation would not have been possible without the support of my teachers, colleagues, friends and family members. First, I would like to thank my advisor Dr. José Silva-Martínez for technical input, encouragement, patience and support during my program at Texas A&M University. I would also like to thank Dr. Edgar Sánchez-Sinencio, Dr. Shankar Bhattacharyya and Dr. Alexander Parlos for serving on my committee. I am also grateful to Dr. Jay Porter, Dr. Ben Zoghi, and Prof. George Wright who supported me during the early years of my program.

I thank my teammates Jianhong, Jusung, Hyung-Joon, Lo, and Andreas for helping me to complete my work. They have been very patient and accommodating with me. I would like to thank Hyung-Joon for designing the PCB and performing jitter characterization of our CDR chip at UTD. Marvin, Didem, Jason, Alfredo, Ramy, Hesam, CJ and many other AMSC colleagues have also helped me on several occasions. I thank them all. I would also like to thank Manisha, Chinmaya and Vijay for always being there to offer advice and help. I’m indebted to Chethan, Mahadev, Manohar, Vijay, Srikanth and especially Kapil, for helping me and my wife Soumya at various points in this journey.

I have learnt many life lessons during my stay at Texas A&M. I am grateful to my friends Felix, Erik, and Joselyn for stress release caffeine breaks and numerous technical discussions. I wouldn’t be completing this program, had it not been for immeasurable help from Erik. There are no words to describe my gratitude for what I have learnt from him.

I am fortunate to have the unconditional support of a loving family throughout the years. My parents have sacrificed a lot and worked tirelessly through their entire lives. They have always encouraged me to pursue my interests. I also owe a lot to my uncle Dr. Alawandi, who took the trouble to gather information and introduce me to experts, which somehow culminated in my opting for a career in ECE. I am also thankful to my parents-in-law for their understanding and support. My wife deserves special credit. She has patiently tolerated me through the Ph.D. rollercoaster and has positively supported me through it. I express my gratitude to her. I’m very grateful to Sanjeev and Aparna for their unconditional support to me and Soumya. I’m also thankful to Raman and Vinay. Just knowing that there is someone who can help me if I ever need it brings immense relief.
TABLE OF CONTENTS

ABSTRACT .................................................. iii
DEDICATION ................................................. v
ACKNOWLEDGMENTS ........................................ vi
TABLE OF CONTENTS ......................................... viii
LIST OF TABLES ............................................... x
LIST OF FIGURES ............................................. xi

1 INTRODUCTION ............................................. 1

2 ANALOG BASEBAND SYSTEM DESIGN FOR DIGITAL COMMUNICATION SYSTEMS .................................................. 5
  2.1 Modern Communication Systems .................................. 5
      2.1.1 Analog Modulation .................................. 6
      2.1.2 Digital Modulation .................................. 8
      2.1.3 Shannon’s Channel Capacity ......................... 9
      2.1.4 Multicarrier Modulation and OFDM Technology ...... 10
      2.1.5 Wireless Technologies for Digital Video Broadcast in UHF Spectrum ........................................... 12
  2.2 Radio Design for UHF Receivers ................................ 14
      2.2.1 System Dynamic Range Requirements .................. 16
  2.3 Interdependence of Baseband Filter and ADC Requirements .... 18
      2.3.1 Residual Undesired Power from Digital Adjacent Channels ... 20
      2.3.2 Residual Undesired Power from Analog Adjacent Channels ... 24

3 DESIGN OF A UHF RECEIVER AND ITS ANALOG BASEBAND .......... 28
  3.1 Receiver Design Specifications ................................ 28
  3.2 Analog Baseband Design ...................................... 30
      3.2.1 Cascaded Hybrid Baseband Architecture ............... 31
      3.2.2 Active RC Implementation ............................... 32
      3.2.3 SC Implementation ...................................... 35
      3.2.4 All-Digital Non Overlap Delay Tuning ................... 37
  3.3 Design of RF Front-End ...................................... 40
      3.3.1 RF Variable Gain Amplifier (RFVGA) .................. 40
      3.3.2 Current-Mode Passive Mixer .......................... 42
  3.4 Experimental Results ...................................... 44
      3.4.1 Baseband Response and Residual DR Measurements ...... 44
      3.4.2 System Performance .................................. 49

4 A LINEAR FEED-FORWARD OTA FOR ACTIVE-RC FILTER DESIGN ......... 53
  4.1 Spurious Free Dynamic Range (SFDR) of Analog Filters in Wireless Receivers ... 54
  4.2 Lossy Integrator Design ................................... 57
      4.2.1 Loop Gain of a Lossy Integrator Using a Feed-Forward OTA ... 59
  4.3 Lossy Integrator Design Using the Proposed OTA ............ 64
      4.3.1 Dominant Sources of Non-Linearity ................... 70
      4.3.2 Common-Mode Feedback (CMFB) Design .................. 85
  4.4 Noise and Distortion Performance Results .................. 88
  4.5 Conclusion ................................................ 93

5 ADAPTIVE BANG-BANG CLOCK-DATA-RECOVERY ........................ 95
  5.1 High Speed Serial Link System ............................. 96
      5.1.1 Introduction ........................................ 96
      5.1.2 Clock and Data Recovery ............................. 97
      5.1.3 Analog PLL Based Clock Recovery System .............. 99
      5.1.4 Types of Timing Non-Idealities ...................... 101
      5.1.5 Jitter Handling in Clock Recovery Systems ........... 104
  5.2 Adaptive Clock Recovery Systems ........................... 108
  5.3 Adaptive Bang-bang CDR .................................... 109
      5.3.1 Predictor Design .................................... 111
      5.3.2 Input-Output (I/O) Design for the CDR System ........ 118
  5.4 Experimental Results ...................................... 128
  5.5 Conclusion ................................................ 131

6 CONCLUSION .................................................... 133

REFERENCES ...................................................... 135
VITA ............................................................ 143
LIST OF TABLES

TABLE                                                             Page

2.1 Modulation schemes used for television broadcast in UHF spectrum in North America [7]. ... 13

2.2 Parameters used for the analysis. ........................... 22

3.1 Desired block level specifications. ......................... 30

3.2 Pole-zero placement for baseband inverse Chebyshev approximation. ... 31

3.3 Gain bandwidth and feedback factor for SC biquad OTAs across configurations. ... 36

3.4 Measured Residual DR with $(N + 1)$ and $(N + 3)$ blockers with varying $AC_{dB}$. ... 49

3.5 Experimental results with comparison to previous work. ...... 52

4.1 Comparison of the filter performance with published results. ... 94
LIST OF FIGURES

FIGURE                                                            Page

1.1 A multi-featured mobile handheld device. .................... 2

1.2 Simple block-level description of a modern wireless device. ... 2

1.3 Examples of serial link applications. ....................... 3

2.1 Basic communication system. ................................. 6

2.2 Analog TV channel spectrum with picture and audio carriers [7]. ... 7

2.3 Constellation diagram of a 16-QAM modulation scheme. ........ 9

2.4 (a) A general multi-carrier modulation scheme (b) OFDM sub-carriers. ... 11

2.5 Ultra High Frequency (UHF) spectrum for mobile digital video. ... 12

2.6 A typical wireless radio using direct conversion architecture. ... 14

2.7 Analog signal processing using cascaded filters and variable gain stages. ... 15

2.8 Effect of filtering and AGC on dynamic range requirements from antenna to ADC. ... 17

2.9 Comparison of filter approximations for orders 3 to 8 for 4MHz channel bandwidth. For each case, $n^{th}$ order Butterworth response is superimposed with $(n-1)^{th}$ order Inverse Chebyshev response with an additional single pole. ... 19

2.10 Baseband input spectrum with digital adjacent channels. .... 21

2.11 Output power density and definitions of integrated power in the desired channel, residual N+1 adjacent channel, and residual power in all undesired channels. ... 21

2.12 Residual dynamic requirement for digital adjacent channels with Butterworth filter. ... 23
2.13 Residual dynamic requirement for digital adjacent channels with Inverse Chebyshev filter. .................................................. 23

2.14 Analog adjacent channels with single analog carrier per channel. .......... 24

2.15 Residual dynamic range for the analog adjacent channel. .................. 25

2.16 Required ADC DR based on blocker type and filter order/approximation. 26

2.17 Published ADC power consumption data for 4MHz signal bandwidth. . 27

3.1 Direct-conversion broad-band UHF receiver architecture. ................. 29

3.2 Analog baseband architecture. ........................................... 32

3.3 Active RC multi-feedback-filter and programmable gain amplifier (single-ended structure shown). ........................................ 33

3.4 Variation of integrated noise for a given total capacitance budget with varying resistor ratio. .............................................. 34

3.5 Switched-capacitor biquad implementation (single-ended structure shown). 36

3.6 Conventional two-phase non-overlapping clock generation. ............... 38

3.7 All digital non-overlap time tuning system. ................................ 39

3.8 RFVGA with gain-independent shunt feedback input matching [25]. ....... 41

3.9 Single-to-differential converting transconductor ($G_m$) driving the mixer switches. ...................................................... 43

3.10 Current mode passive mixer terminated at TIA input with DC offset cancellation. ...................................................... 45

3.11 Chip micrograph of the UHF receiver. .................................... 46

3.12 System characterization setup. ............................................. 47

3.13 Measured (a) continuous-time section (b) continuous- and discrete-time sections together.

3.14 Measured filtered PSD with combined input of two 64-QAM digital modulated channels (desired and N+1 adjacent) with $AC_{dB} = 30dB$.

3.15 Measured $S_{11}$ performance (for 75Ω reference).

3.16 Measured linearity performance (a) two-tone measurement results for the system, (b) system IIP3 performance.

4.1 Analog signal processing using cascaded filters and variable gain stages.

4.2 Typical desired, blocker and intermodulation signal power for different IIP3 values.

4.3 Two integrator loop for implementing second-order transfer functions.

4.4 Lossless integrator and the small signal model of the feed-forward OTA (single-ended structure shown) used for the integrator design.

4.5 Loop gain for the lossy integrator using the OTA.

4.6 Loop gain response for the lossy integrator using the feed-forward OTA.

4.7 Proposed OTA structure.

4.8 Schematic of the conventional OTA.

4.9 Schematic of the proposed OTA.

4.10 Simulated lossy integrator response using the FPP OTA.

4.11 Simulated differential loop gain response from the conventional OTA.

4.12 Two out-band tones producing an in-band $IM_3$ tone.

4.13 OTA in feedback using admittances $Y_1$ and $Y_2$ and driving a load $Y_o$.

4.14 Simulated transfer functions from filter input to $v_o$, $v_x$ and $v_{x2}$.

4.15 Power of $IM_3$ tone at $f_3$ due to two -10dBm out-band tones at $f_1$ and $f_2$ for the FFF design. ... 81

4.16 Power of $IM_3$ tone at $f_3$ due to two -10dBm out-band tones at $f_1$ and $f_2$ for the FPP design. ... 81

4.17 Ratio of the non-linearity contribution from $g_{m2}$ to $g_{m1}$ in both FFF ($g_{m1}$ FD, $g_{m2}$ FD) and FPP ($g_{m1}$ FD, $g_{m2}$ PD) designs. ... 83

4.18 Common-mode feedback loops in the integrator. ............. 86

4.19 Common-mode feedback schematic for the conventional OTA. ... 87

4.20 Common-mode feedback for the proposed OTA. ................ 88

4.21 Step response of the CMFB loop. ............................ 89

4.22 Layout of the two filters. ................................. 90

4.23 Simulated output noise from the two integrators. ........... 91

4.24 Simulated $IM_3$ tone power and IIP3 performance at the passband edge with $f_1 = 10.1MHz$ and $f_2 = 11.3MHz$. ... 91

5.1 A general serial link system using embedded clocking architecture. ... 96

5.2 Timing margins at a re-timing DFF using a CDR. .............. 98

5.3 A simple charge-pump based PLL based CDR architecture. ...... 99

5.4 Common jitter classification [67,68]. ....................... 102

5.5 Magnitude of jitter Fourier spectrum from [67]. ............. 103

5.6 An example of a jitter tolerance mask [69]. ................. 107

5.7 An adaptive clock recovery system. .......................... 110

5.8 Top level architecture of the adaptive clock recovery structure. ... 111

5.9 ............................................................. 112

5.10 ............................................................ 114

5.11 ............................................................ 117

5.12 ............................................................ 119

5.13 ............................................................ 120

5.14 ............................................................ 121

5.15 ............................................................ 122

5.16 ............................................................ 123

5.17 ............................................................ 124

5.18 ............................................................ 126

5.19 ............................................................ 127

5.20 ............................................................ 128

5.21 ............................................................ 129

5.22 ............................................................ 131

5.23 ............................................................ 131

5.24 ............................................................ 132

5.25 ............................................................ 132
1. INTRODUCTION

The global semiconductor market has experienced phenomenal growth in the last decade and continues to grow due to increasing demand for data communication solutions. Both wired and wireless data communication market segments continue to be key drivers for research, innovation and development of new technologies. Gordon Moore’s remarkable prediction that the number of transistors per integrated circuit would double approximately every 2 years continues to hold as minimum feature sizes are scaled aggressively. This has led to the design of multi-million transistor system-on-chip (SoC) solutions with remarkable digital signal processing and memory capabilities.

Low-power mobile handheld devices featuring instant access to voice, video and Internet are one example of a communication system enabled by such SoC solutions (cf. Fig. 1.1). Video delivery to consumer wireless devices such as cellular handsets, e-readers and media tablets has become increasingly commonplace as users demand access to video content. With the popularity of Digital TV (DTV), digital video broadcasting (DVB) standards have also been augmented to include handheld devices. The battery-powered environment of handheld devices demands small physical size and low power consumption.

In addition to digital demodulation and signal processing, wireless communication links also require high-performance RF (radio-frequency) receivers in the signal chain. A simple block-level partitioning is shown in Fig. 1.2. The RF front-end and the analog baseband are together termed the “radio”. The goal of the radio is to deliver the desired channel to the demodulator with tolerable impairments (due to RF front-end non-idealities) at low cost. As a result, the design of low-power, high-performance RF and analog circuits continues to be a key focus area for research in wireless communication systems.

This dissertation follows the style of IEEE Trans. VLSI Syst.

The first part of this work (Sections 2, 3 and 4) addresses some of the research challenges for low-power wireless communication devices. Section 2 provides an introduction to modern digital and analog communication techniques and analyzes the implications of the orthogonal frequency division multiplexing (OFDM) technology on the design of the wireless receiver and its analog baseband. In particular, we explore the interdependence between the analog baseband and the dynamic range requirements of a subsequent ADC in the signal chain. Section 3 presents a prototype UHF receiver fabricated using the IBM RFCMOS 0.18μm technology. We also compare the performance of the receiver with state-of-the-art receivers. In Section 4 we analyze the linearity performance and SNDR requirements of an analog baseband filter in a wireless receiver. Based on the design requirements, we propose a new feed-forward transconductor (OTA) structure suitable for building high-linearity active-RC filters. The Section also presents the resulting improvement in the out-band blocker performance from the new OTA structure.

Aggressive technology scaling has also spurred growth in the wired back-end infrastructure market supporting the global Internet and data communication traffic. The total volume of data transported over the telecommunications network has risen significantly, mainly due to increased Internet traffic. Technologies that expand the capacity of fiber-based transport links to 10Gbps and beyond have gained prominence. The Synchronous Optical Network (SONET) protocol is a standardized multiplexing protocol that transfers digital bit streams over optical fibers. The base unit for SONET speeds is 51.84Mb/s and is termed the OC-1 rate. Similarly, OC-192 is a network line operating at 192 times this rate, or 9.95328 Gb/s. In addition to optical links, a variety of electrical serial link standards have emerged, as indicated in Fig. 1.3. This has led to a demand for low-cost, fully integrated transmitter (TX) and receiver (RX) chips to be deployed in the Internet backbone router, which is the core element in the network infrastructure.
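
The relationship between the OC-1 base rate and the OC-192 line rate quoted above can be checked with a short calculation. This is only an illustrative sketch; the function name `oc_rate_mbps` is hypothetical and assumes that an OC-n line runs at n times the OC-1 rate.

```python
# Base SONET line rate (OC-1) in Mb/s, as quoted in the text.
OC1_RATE_MBPS = 51.84


def oc_rate_mbps(n: int) -> float:
    """Return the OC-n line rate in Mb/s, assuming n times the OC-1 rate."""
    return n * OC1_RATE_MBPS


# Print the OC-1 and OC-192 rates in Gb/s for comparison with the text.
for level in (1, 192):
    print(f"OC-{level}: {oc_rate_mbps(level) / 1000:.5f} Gb/s")
```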

Several high speed serial links (both optical and electrical) utilize an embedded clocking approach. This method relies on serial data coding schemes to ensure sufficient transition density in the data stream. These data transitions facilitate recovery of the embedded clock to optimally re-sample the data in the center of the incoming data bit. Modern clock-data-recovery (CDR) systems use a phase-locked-loop (PLL) to perform this operation. This has fueled research on low-cost high-performance phase locking architectures for clock-recovery.

In Section 5 we present a high-performance mixed-signal adaptive clock-recovery solution. The requirements of such a system are introduced, followed by the architecture and implementation of the mixed-signal solution. Section 6 summarizes the conclusions from this research work.
2. ANALOG BASEBAND SYSTEM DESIGN FOR DIGITAL COMMUNICATION SYSTEMS*

This Section provides a short introduction to analog and digital modulation schemes employed in modern communication systems. Basic principles of multi-carrier modulation and OFDM technology are presented, which set the foundation for the later Sections. For a more detailed discussion of communication systems and OFDM technology, the reader is referred to [1–6].

The effect of undesired adjacent blockers on dynamic range (DR) considerations in a modern radio is discussed in this Section. An analysis of the interdependence between the baseband filter and the analog-to-digital converter (ADC) DR requirement for a broadband receiver* is presented. High DR ADCs (for a given bandwidth) demand a steep $2\times$ to $4\times$ increase in power consumption for every additional bit of resolution. This significant increase in ADC power consumption justifies a systematic evaluation of the effect of filtering on ADC DR. This Section quantifies the impact of adjacent blockers (digital or analog modulation) on the filter-ADC DR interdependence for Butterworth and Inverse Chebyshev filters. The analysis reveals that (1) low-order Butterworth filters are quite efficient when the undesired power is dominated by far-out blockers, and (2) high-order Inverse Chebyshev filters can offer up to $12$dB additional reduction in ADC DR ($\sim 4\times$ in power consumption) compared to Butterworth filters in the presence of analog modulated narrowband adjacent blockers.
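
To relate a DR reduction in dB to ADC power, the following sketch converts dB to equivalent bits using the ideal-quantizer figure of about 6.02dB per bit, and then applies a per-bit power factor. The function name and the use of the ideal 6.02dB/bit rule are assumptions made for illustration. Under these assumptions, 12dB is roughly 2 bits, so the $\sim 4\times$ figure quoted above corresponds to the $2\times$-per-bit end of the range.

```python
import math

# Dynamic range gained per bit for an ideal quantizer: 20*log10(2), about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def adc_power_ratio(dr_change_db: float, power_factor_per_bit: float) -> float:
    """Power ratio for a DR change, assuming power scales by a fixed factor per bit."""
    bits = dr_change_db / DB_PER_BIT
    return power_factor_per_bit ** bits


# 12 dB of DR relief evaluated at both ends of the 2x-to-4x per-bit range.
low = adc_power_ratio(12.0, 2.0)
high = adc_power_ratio(12.0, 4.0)
print(f"12 dB is about {12.0 / DB_PER_BIT:.2f} bits")
print(f"power ratio between {low:.1f}x and {high:.1f}x")
```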

2.1 Modern Communication Systems

A simple communication system with its constituent functional blocks is shown in Fig. 2.1. The main goal of the system is to transfer information from the source to the destination. In general, the source and the destination are spatially located away from each other, with the channel providing the link between them. The channel is the physical medium used to send the signals, and many types of channels are used for communication links. These include electrical wires, wireless channels and optical fiber. The transmitter couples the source signal to the channel. In most cases, the non-idealities of the channel degrade the transmitted signal quality in multiple ways. The main function of the receiver is to extract the source signal from the received signal, which has been degraded by the impairments of the channel.

*Part of this section is reprinted with permission from "UHF Receiver Front-End Implementation and Analog Baseband Design Considerations", by R. Kulkarni, J. Kim, H.-J. Jeon, J. Xiao, and J. Silva-Martinez, accepted for publication in IEEE Trans. VLSI Syst., DOI 10.1109/TVLSI.2010.2096438.

The signal from the source (termed as baseband signal) is generally not suited for
direct transmission over the channel. Hence the source signals are modified to enable
signal transmission. The baseband signal is utilized to alter a particular characteristic
of a high-frequency carrier signal. Such a process is termed as modulation. In
addition to modulation, the transmitter may also perform signal amplification and
filtering. The receiver performs the de-modulation process to extract the desired
signal. Depending on the nature of the signal source $x(t)$, modulation schemes
are classified as either analog or digital modulation. In both types of modulation
schemes, a sinusoidal high frequency carrier is still employed for coupling the signal
to the physical channel.

2.1.1 Analog Modulation

An analog modulated carrier $x_a(t)$ can be expressed as,

$$x_a(t) = A(t) \cos[w_c t + \phi(t)] \quad (2.1)$$

where $A(t)$ is the instantaneous amplitude of the carrier, $w_c$ is the carrier frequency (in $\text{rad/s}$), and $\phi(t)$ is the instantaneous phase deviation of the carrier. We obtain either amplitude modulation (AM) or frequency/phase modulation (FM/PM) depending on whether the signal source $x(t)$ modulates the amplitude $A(t)$ or the phase $\phi(t)$ of the carrier. Depending on the spectral relationship between $x(t)$ and $x_a(t)$, linear modulation schemes have traditionally been classified as double-sideband modulation (DSB), single-sideband modulation (SSB), and vestigial-sideband modulation (VSB). Among these, VSB offers the best compromise between bandwidth conservation and power efficiency. Commercial TV based on analog modulation utilizes VSB plus a picture carrier for transmitting video information. Analog FM is used for commercial radio stations and also for transmitting the audio component of the commercial TV broadcast. An example of the analog modulation used for a TV channel is shown in Fig. 2.2.

Figure 2.2. Analog TV channel spectrum with picture and audio carriers [7].

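
As a minimal sketch of (2.1), the code below evaluates the modulated carrier for two cases: the message varying the amplitude $A(t)$ (AM) and the message varying the phase $\phi(t)$ (PM). The 1kHz test tone, the 100kHz carrier, the modulation index `m` and the phase sensitivity `k_p` are all hypothetical values chosen only for illustration.

```python
import math


def modulated_carrier(t, f_c, amplitude, phase):
    """Evaluate x_a(t) = A(t) * cos(w_c * t + phi(t)) at time t, following (2.1)."""
    w_c = 2 * math.pi * f_c  # carrier frequency in rad/s
    return amplitude(t) * math.cos(w_c * t + phase(t))


def message(t):
    """Hypothetical message signal x(t): a unit-amplitude 1 kHz tone."""
    return math.cos(2 * math.pi * 1e3 * t)


def am_sample(t, f_c=100e3, m=0.5):
    """AM: the message varies A(t); the phase deviation is held at zero."""
    return modulated_carrier(t, f_c, lambda u: 1.0 + m * message(u), lambda u: 0.0)


def pm_sample(t, f_c=100e3, k_p=math.pi / 4):
    """PM: the message varies phi(t); the amplitude is held constant."""
    return modulated_carrier(t, f_c, lambda u: 1.0, lambda u: k_p * message(u))


# Evaluate both waveforms at a few sample instants spaced 1 microsecond apart.
for n in range(3):
    t = n * 1e-6
    print(f"t={t:.1e}s  AM={am_sample(t):+.4f}  PM={pm_sample(t):+.4f}")
```
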
2.1.2 Digital Modulation

In digital modulation schemes, the digital bits from the input source are first grouped into symbols. Each symbol is then mapped to one of the possible combinations of amplitude $\{a_k\}$, phase $\{\phi_k(t)\}$ and/or frequency $\{w_k\}$. Such a digitally modulated carrier can be expressed as,

$$x_d(t) = \sum_{k=-\infty}^{\infty} a_k \cos[w_c t + w_k t + \phi_k(t)] \, u(t - kT_s) \quad (2.2)$$

where $u(t)$ is a unit amplitude pulse with symbol duration time $T_s$, and $a_k$, $w_k$, and $\phi_k(t)$ are respectively the amplitude, frequency and phase trajectory of the $k^{th}$ symbol during $kT_s < t < (k + 1)T_s$ [3]. In general, digital modulation that uses amplitude, phase or frequency is referred to as amplitude-shift keying (ASK), phase-shift keying (PSK), and frequency-shift keying (FSK) respectively. Modulations that use both amplitude and phase are termed quadrature-amplitude modulation (QAM) schemes. Using the identity $\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$, $x_d(t)$ can be written as,

$$x_d(t) = \sum_{k=-\infty}^{\infty} [I_k \cos(w_c t) - Q_k \sin(w_c t)] \, u(t - kT_s) \quad (2.3)$$

where $I_k = a_k \cos(w_k t + \phi_k(t))$ and $Q_k = a_k \sin(w_k t + \phi_k(t))$. $I_k$ and $Q_k$ are known as the in-phase and the quadrature components respectively. The constellation diagram of a 16-QAM modulation scheme is shown in Fig. 2.3 as an example of a digital modulation scheme. A few examples of digital modulation schemes used in recent technologies include 256-QAM used for HDTV in cable television in North America and various QPSK and QAM formats used for 802.11n Wireless LAN receivers.

Figure 2.3. Constellation diagram of a 16-QAM modulation scheme.
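
The grouping of bits into symbols and the mapping to in-phase and quadrature components can be sketched for 16-QAM as follows. The Gray-coded assignment of bit pairs to the levels $\{-3, -1, +1, +3\}$ is an assumption made for illustration; the source does not specify the bit-to-symbol mapping of Fig. 2.3, and the function name `map_16qam` is hypothetical.

```python
# Assumed Gray-coded mapping of a bit pair to one of four amplitude levels per axis.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def map_16qam(bits):
    """Group bits four at a time and map each group to an (I_k, Q_k) pair."""
    if len(bits) % 4 != 0:
        raise ValueError("number of bits must be a multiple of 4")
    symbols = []
    for i in range(0, len(bits), 4):
        b0, b1, b2, b3 = bits[i:i + 4]
        # First two bits select the in-phase level, last two the quadrature level.
        symbols.append((GRAY_LEVELS[(b0, b1)], GRAY_LEVELS[(b2, b3)]))
    return symbols


print(map_16qam([0, 0, 0, 0, 1, 0, 1, 1]))
```

Each 4-bit group selects one of 16 distinct constellation points, which is why 16-QAM carries 4 bits per symbol.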

2.1.3 Shannon’s Channel Capacity

For a given communication channel, the channel capacity $C$ is the upper bound on the rate of reliable data transmission. For band-limited channels with additive white Gaussian noise (AWGN), Shannon defines the channel capacity as,

$$C = B \log_2(1 + SNR) \quad (2.4)$$

where $B$ is the channel bandwidth (in Hz) and $SNR$ is the signal-to-noise ratio available over that bandwidth. Equation (2.4) shows that to increase $C$ for a given link, either $B$ or $SNR$ should be increased. In particular, for a fixed $B$, the data rate can be increased if the achievable $SNR$ over the communication link can be increased. For digital modulation schemes, a higher $SNR$ can be used to pack more bits per symbol, which effectively increases the data rate. The choice of the modulation scheme and the complexity of the system can vary depending on the channel characteristics, link impairments (noise and inter-symbol-interference), and cost budget. As we will see in the subsequent sections, depending on the modulation scheme and the channel characteristics, it is common to specify the minimum SNR required from the radio to guarantee reliable transmission (set by a threshold bit-error-rate or BER).
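
A short numerical sketch of (2.4) is given below. The 4MHz bandwidth and the SNR values are illustrative assumptions only, and the function name `shannon_capacity` is hypothetical. Note that the SNR in (2.4) is a linear power ratio, so a value given in dB must be converted first.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Channel capacity in bit/s from (2.4), with the SNR supplied in dB."""
    snr_linear = 10 ** (snr_db / 10)  # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


# Capacity of an assumed 4 MHz channel for a few SNR values.
for snr_db in (10, 20, 30):
    c = shannon_capacity(4e6, snr_db)
    print(f"SNR = {snr_db} dB -> C = {c / 1e6:.2f} Mb/s")
```

For a fixed bandwidth, the capacity grows roughly linearly with the SNR expressed in dB once the SNR is well above unity, which is the regime in which denser constellations become usable.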

2.1.4 Multicarrier Modulation and OFDM Technology

In its simplest form, a communication system can be designed with a digital modulation scheme using a single carrier to transport data over a given channel. In such a system, the frequency-dependent response of the channel leads to inter-symbol-interference (ISI), degrading the BER of the system. Generally, the frequency response of the channel is compensated in the receiver using a channel estimator and an equalizer. Communication links have continued to evolve and increasingly provide high data transmission rates to support multimedia services. This has led to the design of transceivers that handle wider signal bandwidths (several MHz) and high SNRs to enable denser digital modulation constellations. As the signal bandwidths and the data rates increase, the complexity of the digital estimator and equalizer required to compensate for the non-idealities of the channel response increases. A multi-carrier modulation scheme addresses this issue using frequency division multiplexing within the available channel bandwidth.

Multi-carrier systems divide the channel bandwidth into multiple sub-bands or sub-channels. First, the high rate data stream to be transmitted is sub-divided into multiple parallel low rate data streams. Next, the low rate data streams modulate multiple orthogonal carriers within the channel bandwidth. For a narrow enough sub-channel bandwidth, the link response of the sub-channel can be assumed to be approximately constant and hence can be easily compensated by a simple digital equalizer. Essentially, the total available transmit power is evenly distributed over all the sub-channels achieving high spectral efficiency and full channel equalization is avoided. In case of narrowband distortions in the channel response, one or a few sub-channels can be easily disabled without significantly affecting overall BER of the communication link.

Figure 2.4. (a) A general multi-carrier modulation scheme (b) OFDM sub-carriers.

The general multi-carrier scheme is essentially similar to conventional frequency division multiplexing, with guard-bands between adjacent sub-carriers so that a receiver can isolate them using digital bandpass filters. The frequency spectrum of such a scheme is shown in Fig. 2.4(a). However, by using sub-carriers separated by a frequency difference that is the reciprocal of the symbol duration, the multiplexed tones can be made orthogonal to each other. In such a scenario, the spectra of the sub-carriers overlap as indicated in Fig. 2.4(b). Note, however, that as long as orthogonality is preserved, each sub-carrier's peak occurs where all the other sub-carriers are at a null.
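
The orthogonality condition can be checked numerically. The following minimal sketch correlates two complex tones over one symbol of `n_samples` points, with tone `k` placed at frequency `k / T` so that neighbouring tones are spaced by the reciprocal of the symbol duration. The function name and sample count are assumptions made for illustration.

```python
import cmath


def subcarrier_correlation(k: int, m: int, n_samples: int = 64) -> complex:
    """Normalized inner product of tones k and m over one symbol.

    Tone k has frequency k / T (T = symbol duration), so adjacent tones
    are separated by exactly 1 / T.
    """
    total = 0j
    for i in range(n_samples):
        tone_k = cmath.exp(2j * cmath.pi * k * i / n_samples)
        tone_m = cmath.exp(2j * cmath.pi * m * i / n_samples)
        total += tone_k * tone_m.conjugate()
    return total / n_samples


if __name__ == "__main__":
    print(abs(subcarrier_correlation(3, 3)))  # same tone: magnitude close to 1
    print(abs(subcarrier_correlation(3, 4)))  # adjacent tone: magnitude close to 0
```

Although the spectra of tones 3 and 4 overlap, their correlation over a full symbol vanishes (up to rounding), which is why no guard band is needed between them.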

OFDM technology is widely used for communication links over wireless channels. This is due to its robustness against ISI and multipath distortion. Recent standards for Wireless LAN (WLAN), WiMax, Digital Video Broadcast (DVB) and Digital Audio Broadcast (DAB) utilize OFDM technology. In the next section, we will look at basics of the OFDM-based mobile and terrestrial digital video broadcast technologies.
2.1.5 Wireless Technologies for Digital Video Broadcast in UHF Spectrum

With this brief introduction to modulation schemes and OFDM technology, we now look at the wireless technology used for digital video broadcast and reception in the UHF spectrum. Digital video broadcast in the UHF spectrum is an intriguing example for radio design, especially considering that digital and analog modulation schemes must co-exist within the UHF band. Due to this requirement, the design of radio front-ends for mobile digital video reception in the UHF spectrum presents a unique set of challenges.

Consider the UHF spectrum ranging from 470-862MHz as shown in Fig. 2.5. The spectrum shows the relative location of digital video broadcast standards (DVB-H (handheld) and DVB-T (terrestrial)) and analog TV channels along with spectral content from other consumer cellular standards. As indicated, the desired DVB-H channels are embedded within the UHF spectrum along with DVB-T and analog TV channels, while the GSM and WCDMA cellular bands lie outside the UHF band. Table 2.1 lists the modulation schemes (OFDM with digital modulation on each sub-carrier versus analog) used by the different television broadcast standards.
Table 2.1
Modulation schemes used for television broadcast in UHF spectrum in North America [7].

<table>
<thead>
<tr>
<th>Standard</th>
<th>Modulation Scheme</th>
<th>Use</th>
<th>RF Frequency Range (MHz)</th>
<th>Channel BW (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NTSC</td>
<td>Analog</td>
<td>Terrestrial</td>
<td>47-88, 174-230, 470-854</td>
<td>6</td>
</tr>
<tr>
<td>PAL, SECAM</td>
<td>Analog</td>
<td>Terrestrial</td>
<td>47-88, 174-230, 470-854</td>
<td>6, 7, or 8</td>
</tr>
<tr>
<td>DVB-T</td>
<td>Digital, Coded OFDM (64QAM)</td>
<td>Terrestrial</td>
<td>174-230, 470-854</td>
<td>6, 7, or 8</td>
</tr>
<tr>
<td>ATSC</td>
<td>8-VSB</td>
<td>Terrestrial</td>
<td>47-88, 174-230, 470-854</td>
<td>6</td>
</tr>
<tr>
<td>DVB-H</td>
<td>Digital, Coded OFDM (QPSK, QAM)</td>
<td>Mobile</td>
<td>470-854</td>
<td>6, 7, or 8</td>
</tr>
<tr>
<td>MediaFlo</td>
<td>Digital, Coded OFDM (QPSK)</td>
<td>Mobile</td>
<td>716-722</td>
<td>6</td>
</tr>
</tbody>
</table>
2.2 Radio Design for UHF Receivers

A typical wireless receiver system using the direct down-conversion architecture is indicated in Fig. 2.6. From a communication systems standpoint, the overall receiver can be simply modeled as a demodulator. The spectral energy present in the undesired cellular bands could be pre-filtered by an external passive band pass filter.

The signal in the desired UHF band is amplified, filtered and, after analog-to-digital conversion, presented to the demodulator. The RF and the baseband analog signal processing chain preceding the demodulator can be treated as an impairment block affecting the performance of the communication system. Regardless of the implementation, the goal of the RF receiver is to deliver the desired signal to the demodulator. The entire spectral energy in the UHF band should be managed by the receiver front-end and the desired DVB-H channel should be down-converted with sufficient signal quality for demodulation. The overall requirement on the signal-to-undesired ratio (including phase noise, quantization noise, thermal noise and other non-idealities) depends on the modulation scheme being employed and the BER requirement of the communication link. For example, for a BER requirement of $2 \times 10^{-4}$, the minimum SNDR required from the receiver chain can be as high as 25dB for the 64-QAM modulation scheme in a mobile channel. Such a demanding requirement directly impacts the design of the RF receiver front-end.
The presence of strong adjacent channel power in the spectrum presents several challenges for the RF receiver design. The undesired analog and digital channels could be up to 45dB higher than the desired signal power, significantly raising the overall DR requirements of the receiver. This is especially important in a direct-conversion RF receiver architecture, which is the most popular architecture for consumer applications. In direct-conversion receivers, RF blocks in the signal chain (Low Noise Amplifier and I/Q demodulator) typically do not perform any channel-selection filtering. An I/Q demodulator implemented using a passive current-mode mixer can be terminated using a trans-impedance amplifier which can easily implement a single pole to perform first order filtering after the signal down-conversion. This first order filter provides attenuation for blocker power located far away from the desired channel, but leaves the spectral energy from the first adjacent channel unfiltered.

A typical analog baseband signal processing chain with cascaded filtering and variable gain blocks is shown in Fig. 2.7. The figure also illustrates how the input signal power is processed in the signal chain. The variable gain stages increase the signal amplitude, while the filters reduce the blocker power successively increasing the signal-dynamic range for the desired channel power. When processed, the signal is corrupted due to noise and distortion as indicated. It should be noted that the unfiltered residual blocker power reaches the output and must be accommodated into the DR requirements at the input of a subsequent ADC in the signal chain.

**Figure 2.7.** Analog signal processing using cascaded filters and variable gain stages.

Due to the presence of blocker power, analog baseband blocks must exhibit not only a desired in-filter-band DR but also good linearity for out-of-filter-band signals. The non-linearity with out-of-filter-band signals results in inter-modulation with the output product components falling within the signal band corrupting the signal quality. Analysis and circuit techniques to improve the out-band non linearity performance will be revisited and discussed in Section 4.

2.2.1 System Dynamic Range Requirements

Dynamic range (DR) of a signal is the ratio of the maximum to the minimum value that the signal can take. In a broadband wireless radio, the signal is first sensed at the antenna (or LNA input) and then processed to extract the desired spectral content. The DR requirements from LNA input to the ADC input in a broadband receiver are illustrated in Fig. 2.8. Typical numbers relevant to a UHF receiver are also included in the figure. Different components of the broadband signal power contributing to the DR are explored in this section.

For a given two-sided bandwidth $B$, the noise floor ($N_{receiver}$) of the receiver in dBm is defined as

$$
\text{Receiver noise floor} = N_{receiver} = (kTB)_{dBm} + NF_{dB} \quad (2.5)
$$

where $NF_{dB}$ is the noise figure of the receiver.

Then, the sensitivity of the receiver, defined as the minimum desired signal power that meets the $SNR$ requirement of the modulation scheme, is

$$
\text{Receiver sensitivity level} = P_{RF,min} = N_{receiver} + SNR_{desired} \quad (2.6)
$$
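
Equations (2.5) and (2.6) can be evaluated with a few lines of code. The sketch below is illustrative only: the 290 K reference temperature, the function names, and the example bandwidth, noise figure and SNR are assumptions rather than values from this work.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def thermal_noise_dbm(bandwidth_hz: float, temperature_k: float = 290.0) -> float:
    """kTB expressed in dBm (290 K reference temperature is an assumption)."""
    noise_w = BOLTZMANN_J_PER_K * temperature_k * bandwidth_hz
    return 10.0 * math.log10(noise_w / 1e-3)


def receiver_noise_floor_dbm(bandwidth_hz: float, nf_db: float) -> float:
    """Eq. (2.5): noise floor = (kTB) in dBm + NF in dB."""
    return thermal_noise_dbm(bandwidth_hz) + nf_db


def sensitivity_dbm(bandwidth_hz: float, nf_db: float, snr_desired_db: float) -> float:
    """Eq. (2.6): sensitivity = noise floor + required SNR."""
    return receiver_noise_floor_dbm(bandwidth_hz, nf_db) + snr_desired_db


if __name__ == "__main__":
    # Hypothetical example: 8 MHz bandwidth, 8 dB noise figure, 25 dB SNR.
    print(f"Noise floor: {receiver_noise_floor_dbm(8e6, 8.0):.1f} dBm")
    print(f"Sensitivity: {sensitivity_dbm(8e6, 8.0, 25.0):.1f} dBm")
```

Because every term is in dB, each additional dB of noise figure or required SNR raises the sensitivity level by the same amount.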
As indicated in the figure, the desired channel signal power can vary as much as 40dB and the undesired adjacent blockers can be up to 45dB higher than the signal power.

Peak-to-average ratio (PAR) is an attribute of OFDM based modulation schemes that results from utilizing multiple carriers. As a result, radios built for OFDM reception must budget for PAR in the DR calculations to avoid severe distortion and clipping in the signal chain. As indicated in the figure, automatic-gain-control (AGC) can help reduce the variation of the desired signal power, whereas analog baseband filtering can reduce the blocker power presented to the ADC. In the next section, we study the interdependence of analog baseband filter and ADC DR requirements in detail. We will analyze the implications of this DR requirement on the power consumption cost of a subsequent ADC. It should be noted from Fig. 2.8 that, for UHF receiver applications, the ADC $DR_{min} > 54dB$ ($\sim 9$ bits) even assuming an ideal AGC and complete blocker filtering prior to the ADC.

**Figure 2.8.** Effect of filtering and AGC on dynamic range requirements from antenna to ADC.
2.3 Interdependence of Baseband Filter and ADC Requirements

As previously indicated, analog baseband performs two tasks: (1) filter undesired adjacent blocker power; and (2) deliver constant power to the ADC with sufficient signal-to-noise-and-distortion ratio (SNDR). For a given input signal and blocker profile, choice of the baseband filter (order and approximation) determines the required ADC DR.

The ADC DR should satisfy the following requirements:

1. Be greater than the minimum DR set by the SNR requirement of the modulation scheme. The quantization noise of the ADC affects the performance of the demodulator. A design margin should be included in the DR of the ADC to minimize the degradation due to quantization noise. For this work a design margin of 20dB ($\approx 3$ bits) is used for estimating $DR_{min}$ [8–10].

2. Accommodate the peak-to-average ratio (PAR) of the received signal.

3. Include desired signal power variation that is not covered by automatic gain control (AGC). Since the input desired signal power can vary depending on the physical distance of the receiver from the transmitter, variable gain is required in the signal chain such that a constant input can be delivered to the ADC. The remaining signal power variation must be accommodated in the ADC DR.

4. Accommodate the residual undesired power at the ADC input after the channel select filter.

5. Meet the outband linearity performance requirement based on the residual blocker power.

In addition, the minimum sampling frequency of the ADC should be chosen to keep the undesired aliased signal power below the desired signal power by at least $DR_{min}$. 
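
The additive contributions in items 1 to 4 can be stacked in dB to obtain a first-order ADC DR budget. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the example budget values are placeholders, and the conversion to bits uses the ideal-quantizer relation $DR \approx 6.02N + 1.76$ dB for a full-scale sine wave, which is not taken from the text. Item 5 (out-band linearity) is not an additive term and is left out.

```python
import math


def adc_dynamic_range_db(snr_min_db: float, margin_db: float, par_db: float,
                         agc_residual_db: float, residual_blocker_db: float) -> float:
    """Stack the dB contributions of items 1 to 4 in the list above."""
    return snr_min_db + margin_db + par_db + agc_residual_db + residual_blocker_db


def bits_for_dynamic_range(dr_db: float) -> int:
    """Smallest N with 6.02*N + 1.76 >= dr_db (ideal quantizer assumption)."""
    return math.ceil((dr_db - 1.76) / 6.02)


if __name__ == "__main__":
    # Placeholder budget: only the 20 dB margin is quoted in the text.
    dr_db = adc_dynamic_range_db(25.0, 20.0, 10.0, 0.0, 0.0)
    print(dr_db, bits_for_dynamic_range(dr_db))
    # The 54 dB minimum quoted from Fig. 2.8 corresponds to about 9 bits.
    print(bits_for_dynamic_range(54.0))
```

Any residual blocker power or uncovered gain variation adds directly to the budget, which is why the choice of baseband filter feeds into the ADC resolution.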
Figure 2.9. Comparison of filter approximations for orders 3 to 8 for 4MHz channel bandwidth. For each case, $n^{th}$ order Butterworth response is superimposed with $(n-1)^{th}$ order Inverse Chebyshev response with an additional single pole.

The effect of Butterworth filter order on sampling frequency and resolution of the ADC has been analyzed in [11]. In this work, we focus on item (4) above and quantify the component of the ADC DR required to accommodate the residual undesired power for Butterworth and Inverse Chebyshev filters with orders ranging from 3 to 8. A 1$^{st}$ order pole is added to an $(n-1)^{th}$ Inverse Chebyshev filter for comparison to $n^{th}$-order Butterworth filter. This addition improves the high frequency attenuation for even order Inverse Chebyshev filters. A comparison of the filter transfer function for orders 3 to 6 for the two filters is provided in Fig. 2.9. These two filters are chosen so that we can compare the impact of an all-pole approximation (Butterworth) and an approximation with poles and stop-band zeros (Inverse Chebyshev).
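
To make the comparison concrete, the sketch below evaluates the textbook magnitude responses of the two approximations, with a single real pole cascaded with the Inverse Chebyshev response as described above. The stopband edge, stopband attenuation and extra pole frequency used in the example are assumed values chosen for illustration; they are not the design values of this work.

```python
import math


def butterworth_mag_db(f_hz: float, fc_hz: float, order: int) -> float:
    """n-th order Butterworth low-pass with -3 dB corner fc."""
    return -10.0 * math.log10(1.0 + (f_hz / fc_hz) ** (2 * order))


def chebyshev_poly(n: int, x: float) -> float:
    """Chebyshev polynomial T_n(x) for x >= 0."""
    if x <= 1.0:
        return math.cos(n * math.acos(x))
    return math.cosh(n * math.acosh(x))


def inverse_chebyshev_mag_db(f_hz: float, fs_hz: float, order: int,
                             stop_atten_db: float) -> float:
    """Inverse Chebyshev low-pass with stopband edge fs and minimum
    stopband attenuation stop_atten_db."""
    if f_hz == 0.0:
        return 0.0
    eps_sq = 1.0 / (10.0 ** (stop_atten_db / 10.0) - 1.0)
    t = chebyshev_poly(order, fs_hz / f_hz)
    if t == 0.0:
        return -math.inf  # exactly on a stopband zero
    return -10.0 * math.log10(1.0 + 1.0 / (eps_sq * t * t))


def single_pole_mag_db(f_hz: float, fp_hz: float) -> float:
    """First-order real pole at fp."""
    return -10.0 * math.log10(1.0 + (f_hz / fp_hz) ** 2)


if __name__ == "__main__":
    # Assumed parameters: 40 dB stopband from 5.7 MHz, extra pole at 8 MHz.
    for f_mhz in (2.0, 4.0, 6.0, 12.0, 24.0):
        f = f_mhz * 1e6
        bw = butterworth_mag_db(f, 4e6, 8)
        ic = inverse_chebyshev_mag_db(f, 5.7e6, 7, 40.0) + single_pole_mag_db(f, 8e6)
        print(f"{f_mhz:5.1f} MHz  Butterworth(8): {bw:8.2f} dB  InvCheb(7)+pole: {ic:8.2f} dB")
```

The all-pole response keeps rolling off monotonically, while the Inverse Chebyshev response trades far-out attenuation for a sharper transition band, with the added pole recovering some high-frequency rejection.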
As discussed previously, the residual power evaluation is performed for both digital and analog adjacent channels as undesired blockers in the UHF spectrum can employ either modulation scheme. In analog channels, the bulk of the signal energy is concentrated near the carrier, resulting in strong peaks as illustrated earlier in 2.2. In contrast, the energy in a multi-carrier modulated digital channel is spread smoothly across the channel [12]. Understanding the impact of this difference is important as analog modulation techniques continue to be used along with digital broadcast [7]. The following analysis will show that the higher order Inverse Chebyshev filters perform better at reducing the undesired power in the presence of analog adjacent channels.

2.3.1 Residual Undesired Power from Digital Adjacent Channels

To evaluate the residual undesired power for digital channels, the baseband input spectrum is modeled as shown in Fig. 2.10. From Fig. 2.10:

- The input power spectral density (PSD) $S_{in}(f)$ is defined for a broadband frequency range $(0, f_{max})$. Sub-carriers in the input spectrum are separated by $\Delta f$, resulting in $M = [f_{max}/\Delta f]$ total sub-carriers.

- The desired channel resides in a single-sided bandwidth of $(0, B)$ with $N_t$ sub-carriers in $(0, B - f_g)$ and a guard band $f_g$ with zero power carriers. Each undesired channel has a two-sided bandwidth of $2B$ with $2N_t$ sub-carriers in $2(B - f_g)$.

- The input sub-carrier power in the desired channel is set to $P_d$. The power of sub-carriers $P_{N+1}$ in the first adjacent channel (referred to as the $N + 1$ channel) is set $AC_{dB}$ higher than $P_d$. For the remaining undesired channels ($> N + 1$), the power of sub-carriers $P_u$ is $UD_{dB}$ higher than the desired channel ($UD$ denotes undesired-to-desired).
Figure 2.10. Baseband input spectrum with digital adjacent channels.

Figure 2.11. Output power density and definitions of integrated power in the desired channel, residual N+1 adjacent channel, and residual power in all undesired channels.

Total integrated input power $P_{in}$ is

$$P_{in} = \int_{0}^{f_{max}} S_{in}(f) \, df \quad (2.7)$$

This input spectrum is filtered using a transfer function $H(f)$. Hence, the integrated output power within a frequency range $(f_1, f_2)$ is

$$P = \int_{f_1}^{f_2} S_{in}(f) |H(f)|^2 \, df \quad (2.8)$$
Using (2.8), integrated power in the desired channel ($P_{desired}$), the residual integrated power due to the $N+1$ channel ($P_{res,N+1}$), and the residual integrated power due to all the undesired channels ($P_{res,Total}$) are evaluated as indicated in Fig. 2.11. To quantify the component of the ADC DR required to accommodate the $N+1$ channel residual power and total residual power, we evaluate

$$ResidualDR_{N+1} = 10 \log_{10}\left( \frac{P_{desired} + P_{res,N+1}}{P_{desired}} \right) = 10 \log_{10}\left(1 + \frac{P_{res,N+1}}{P_{desired}}\right) \quad (2.9)$$

$$ResidualDR_{Total} = 10 \log_{10}\left( \frac{P_{desired} + P_{res,Total}}{P_{desired}} \right) = 10 \log_{10}\left(1 + \frac{P_{res,Total}}{P_{desired}}\right) \quad (2.10)$$

We find $ResidualDR_{Total}$ and $ResidualDR_{N+1}$ for Butterworth and Inverse Chebyshev filters as the first adjacent channel power changes relative to the desired power (indicated by $AC_{dB}$). Corner frequency is set to 4MHz for both the filters. Sub-carrier powers $P_d$, $P_{N+1}$ and $P_u$ are suitably adjusted to maintain fixed input power ($P_{in} = +6dBm$) with $UD_{dB} = 45dB$. Values of the other parameters are indicated in Table 2.2. Fig. 2.12 and Fig. 2.13 show $ResidualDR_{Total}$ and $ResidualDR_{N+1}$ for filters of order 3 to 8 as $AC_{dB}$ is varied from $+10dB$ to $+40dB$ in $10dB$ step size. As expected, Fig. 2.12 indicates that the residual DR requirement for the ADC ($ResidualDR_{Total}$) reduces with increasing filter order.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>$B, f_g, f_{max}$</td>
<td>4 MHz, 200kHz, 396MHz</td>
</tr>
<tr>
<td>$AC_{dB}, UD_{dB}$</td>
<td>38dB, 45dB</td>
</tr>
<tr>
<td>$2N_t$</td>
<td>3800 sub-carriers</td>
</tr>
<tr>
<td>$P_{in}$</td>
<td>+6dBm</td>
</tr>
</tbody>
</table>
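
Equations (2.8) and (2.9) can be evaluated numerically. The sketch below is a simplified stand-in for the analysis described here, not a reproduction of it: it replaces the discrete sub-carriers with a flat continuous PSD, assumes the first adjacent channel occupies $(B + f_g, 3B - f_g)$, and uses only a Butterworth response. The function names are hypothetical; $B$ and $f_g$ follow Table 2.2.

```python
import math


def butterworth_power_gain(f_hz: float, fc_hz: float, order: int) -> float:
    """|H(f)|^2 of an n-th order Butterworth low-pass."""
    return 1.0 / (1.0 + (f_hz / fc_hz) ** (2 * order))


def integrate_power(psd: float, f1_hz: float, f2_hz: float,
                    fc_hz: float, order: int, steps: int = 2000) -> float:
    """Midpoint-rule evaluation of (2.8) for a flat PSD level in (f1, f2)."""
    df = (f2_hz - f1_hz) / steps
    total = 0.0
    for i in range(steps):
        f = f1_hz + (i + 0.5) * df
        total += psd * butterworth_power_gain(f, fc_hz, order) * df
    return total


def residual_dr_n1_db(ac_db: float, order: int, b_hz: float = 4e6,
                      fg_hz: float = 200e3, fc_hz: float = 4e6) -> float:
    """Eq. (2.9) with flat desired and first-adjacent channel spectra."""
    p_desired = integrate_power(1.0, 0.0, b_hz - fg_hz, fc_hz, order)
    psd_adjacent = 10.0 ** (ac_db / 10.0)  # AC_dB above the desired PSD
    p_res = integrate_power(psd_adjacent, b_hz + fg_hz, 3.0 * b_hz - fg_hz,
                            fc_hz, order)
    return 10.0 * math.log10(1.0 + p_res / p_desired)


if __name__ == "__main__":
    for order in range(3, 9):
        print(order, round(residual_dr_n1_db(40.0, order), 2))
```

Under these assumptions the residual DR falls as the filter order rises, which is the qualitative trend expected from increased adjacent-channel attenuation.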
Figure 2.12. Residual dynamic range requirement for digital adjacent channels with Butterworth filter.

![Figure 2.12](image)

Figure 2.13. Residual dynamic range requirement for digital adjacent channels with Inverse Chebyshev filter.

![Figure 2.13](image)

The two important observations from Fig. 2.12 are:

1. If the undesired input power is dominated by \( > (N + 1) \) channels \( (AC_{dB} = 10dB) \), then the first adjacent channel power is easily filtered by both Butterworth and Inverse Chebyshev filters. In this case, lower order Butterworth filters \((3 - 5)\) are quite effective compared to Inverse Chebyshev filters.
2. If the undesired input power is dominated by the \((N + 1)\) channel \((AC_{dB} = 40dB)\), then \(ResidualDR_{\text{total}} \approx ResidualDR_{N+1}\). In this case, both Butterworth and Inverse Chebyshev filters have comparable \(ResidualDR_{\text{total}}\) requirement from the ADC.

### 2.3.2 Residual Undesired Power from Analog Adjacent Channels

Compared to digital channels, analog channels use two narrow band carriers (one for video and one for audio) located within the channel bandwidth. The video carrier is 13dB higher than the audio carrier and is located closer to the pass band edge \([13,14]\). Hence, a single worst-case carrier at \(f_{\text{offset}}\) from the edge of the channel is modeled as shown in Fig. 2.14. For fair comparison, the input power of the single carrier is set equal to the integrated power from the undesired digital channel.

As indicated in Fig. 2.14, the power of the analog carrier in the \((N + 1)\) adjacent channel and remaining channels is

\[
P_{AN,N+1} = P_{N+1} (2N_t) \quad (2.11)
\]

\[
P_{AN,u} = P_u (2N_t) \quad (2.12)
\]
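
As a quick check of the scale implied by (2.11) and (2.12), the ratio can be written in dB using the \(2N_t = 3800\) sub-carriers listed in Table 2.2. This is only a restatement of the equal-integrated-power assumption above, not an additional result.

\[
10 \log_{10}\left(\frac{P_{AN,N+1}}{P_{N+1}}\right) = 10 \log_{10}(2N_t) = 10 \log_{10}(3800) \approx 35.8\,\text{dB}
\]

The single analog carrier therefore sits about 35.8dB above an individual sub-carrier of the equivalent digital channel, and the same ratio applies to \(P_{AN,u}\) relative to \(P_u\).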

Similar to the previous analysis, \(AC_{dB}\) measures the difference in the desired and undesired first adjacent channel input power. As the \(ResidualDR_{\text{total}}\) requirement for worst case high $AC_{dB}$ values is dominated by $ResidualDR_{N+1}$, the difference in the performance of the two filters is highlighted with $ResidualDR_{N+1}$. For $f_{offset} = 1.25MHz$, $ResidualDR_{N+1}$ for Butterworth and Inverse Chebyshev filters is evaluated and the results are shown in Fig. 2.15. The key observations from Fig. 2.15 are:

1. For lower filter orders (3 to 5), both Butterworth and Inverse Chebyshev filters provide similar attenuation for the adjacent analog channel leading to comparable $ResidualDR_{total}$ requirement from the ADC.

2. For higher order Inverse Chebyshev filters (7 and 8), the $ResidualDR_{N+1}$ is 12dB lower than that for Butterworth filters. This improvement results from the sharp transition band and nulls in the transfer function due to stop-band $jw$-axis zeros in the Inverse Chebyshev approximation.

In general: (1) Low order Butterworth filters are more efficient at reducing the undesired blocker power when it is dominated by far out blockers ($> N + 1$) for both digital and analog modulation; (2) Butterworth and Inverse Chebyshev filters provide comparable performance when the residual power is dominated by the $(N+1)$ channel with digital modulation; and (3) Higher order Inverse Chebyshev filters (orders 7 to 8) are more favorable than Butterworth filters when the residual power is dominated by the $(N + 1)$ channel with analog modulation, as illustrated in Fig. 2.16.

Figure 2.16. Required ADC DR based on blocker type and filter order/approximation.

The drop in the required ADC DR due to filtering translates to reduction in ADC power consumption. ADC power consumption has a strong structural dependency [15]. According to [16], the power efficiency of the ADC for a given SNDR can be predicted using power per conversion bandwidth ($P/f_{\text{sig}}$) metric. Published ADC data indicates that $P/f_{\text{sig}}$ increases approximately at $2X$ per additional bit but may approach $4X$ per additional bit for noise limited high DR ADCs [16] [17].
Based on the survey in [17], we estimate the power consumption of a 4 MHz signal bandwidth ADC as shown in Fig. 2.17. This estimation shows that drop in ADC power consumption due to filtering depends on the targeted SNDR and filtering can result in significant ADC power saving for high SNDR ranges. For example, a 10dB drop in ADC DR requirement from an original requirement of 90dB results in 800mW savings, whereas a similar 10dB drop from 70dB results in 50mW savings. Low order Butterworth filters fare better than low-order Inverse Chebyshev filters if the undesired power is dominated by far out blockers. But in the presence of strong $(N+1)$ blocker, high order Inverse Chebyshev filters have either similar (digital blocker) or better (analog blocker) performance than Butterworth approximation of the same order. As quantified by the previous analysis, Inverse Chebyshev filters offer up to $+12$dB additional ADC DR reduction in the presence of strong analog adjacent blockers. Hence to tolerate the presence of strong $(N+1)$ adjacent channels (both analog and digital), Inverse Chebyshev approximation offers the best performance.
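
The scaling quoted above can be turned into a rough rule of thumb. The sketch below converts a DR reduction in dB into bits using 6.02 dB per bit (an assumption) and applies a constant power factor per bit; the function name is hypothetical and the result is a relative factor, not a prediction of the absolute power numbers in Fig. 2.17.

```python
def power_scaling_factor(dr_drop_db: float, factor_per_bit: float) -> float:
    """Relative ADC power reduction for a given drop in required DR.

    Assumes 6.02 dB per bit and a constant power factor per bit
    (about 2X in general, approaching 4X for noise-limited ADCs).
    """
    bits_saved = dr_drop_db / 6.02
    return factor_per_bit ** bits_saved


if __name__ == "__main__":
    for factor in (2.0, 4.0):
        ratio = power_scaling_factor(10.0, factor)
        print(f"{factor:.0f}X per bit: 10 dB less DR -> about {ratio:.1f}x lower power")
```

The same 10dB relief is worth considerably more in the noise-limited regime, which is consistent with the larger absolute savings reported at high SNDR.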

**Figure 2.17.** Published ADC power consumption data for 4MHz signal bandwidth.
3. DESIGN OF A UHF RECEIVER AND ITS ANALOG BASEBAND*

In this Section* we present the design of a UHF receiver prototype with emphasis on the design of the baseband filter. First we present the top level architecture of the receiver followed by the block level specifications of the receiver. Next, based on the analysis presented in the previous Section, a cascaded, programmable, hybrid active-RC and switched capacitor (SC) inverse Chebyshev filter is described. The proposed hybrid baseband implementation achieves sharp roll-off with precise stopband zeros without requiring precision filter tuning schemes. An all-digital non-overlap clock tuning system to minimize the variation of the available settling time window in SC circuits is also included. The receiver presented in this Section integrates an RFVGA, an on-chip single-to-differential transconductor (balun) with current-mode passive mixer, and a hybrid analog baseband with an all-digital tuning scheme for non-overlap clock generation, and achieves performance commensurate with the state of the art. This receiver achieves a measured noise figure of 7.9dB, an IIP3 of -8dBm at maximum gain and +2dBm at 9dB RF attenuation. The chip consumes 120mW (RFVGA, mixer and I-channel baseband) from a 1.8V analog/2.5V digital dual supply and occupies $2.14mm^2$ in IBM RFCMOS 0.18µm technology.

3.1 Receiver Design Specifications

Some of the earliest UHF receiver solutions were implemented in BiCMOS processes [18–20], but the trend is to integrate these receivers in CMOS [21–24]. A direct conversion receiver architecture as shown in Fig. 3.1 is adopted for analysis and implementation. The key system specifications (sensitivity level and signal-to-noise ratio (SNR) for different modulation schemes) pertinent to UHF receivers are also included in Fig. 3.1 [18]. The receiver has a single-ended RF input and uses a broadband RFVGA to provide gain independent matching [25,26]. The single-to-differential signal conversion is accomplished on-chip by two (I&Q) linear transconductors which in turn drive passive current-mode mixers.

*Part of this section is reprinted with permission from "UHF Receiver Front-End Implementation and Analog Baseband Design Considerations", by R. Kulkarni, J. Kim, H.-J. Jeon, J. Xiao, and J. Silva-Martinez, accepted for publication in IEEE Trans. VLSI Syst., DOI 10.1109/TVLSI.2010.2096438.

![Diagram of Direct-conversion broad-band UHF receiver architecture.](image)

<table>
<thead>
<tr>
<th>Modulation</th>
<th>Sensitivity</th>
<th>SNR</th>
</tr>
</thead>
<tbody>
<tr>
<td>QPSK</td>
<td>-95 dBm</td>
<td>5.6  dB</td>
</tr>
<tr>
<td>16-QAM</td>
<td>-86 dBm</td>
<td>15.1 dB</td>
</tr>
<tr>
<td>64-QAM</td>
<td>-75 dBm</td>
<td>24.8 dB</td>
</tr>
</tbody>
</table>

**Figure 3.1.** Direct-conversion broad-band UHF receiver architecture.

System simulations are performed using cascaded NF and IIP3 equations to maximize the receiver dynamic range and arrive at block level specifications. Table 3.1 shows the targeted circuit block specifications. A gain range of 30dB in the RFVGA ensures that the mixer and baseband stages do not saturate. The RF take over point (i.e. when the RFVGA switches from gain to attenuation) is set to -20dBm at the input of the mixer. In the baseband, variable gain must be distributed between baseband VGA and filter to guarantee sufficient SNDR. Further details of the budgeting at system level specifications can be found in [26,27].
Table 3.1
Desired block level specifications.

<table>
<thead>
<tr>
<th>Performance</th>
<th>RFVGA</th>
<th>Mixer</th>
<th>Baseband</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gain (dB)</td>
<td>-14 to +16</td>
<td>18</td>
<td>-6 to +53</td>
</tr>
<tr>
<td>Noise Figure (dB)</td>
<td>3 at +16dB gain</td>
<td>12</td>
<td>35 at 53dB gain</td>
</tr>
<tr>
<td></td>
<td>30 at -14dB gain</td>
<td></td>
<td></td>
</tr>
<tr>
<td>IIP3 (dBm)</td>
<td>0 at +16dB gain</td>
<td>13</td>
<td>Outband &gt; 33</td>
</tr>
<tr>
<td></td>
<td>20 at -14dB gain</td>
<td></td>
<td>Inband &gt; 33</td>
</tr>
</tbody>
</table>
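
The cascaded NF and IIP3 equations mentioned above can be sketched as follows. This is a simplified illustration using the standard Friis and cascaded intercept formulas, not the system simulation used in this work: it treats every gain in Table 3.1 as a power gain in a matched chain, uses the maximum-gain column entries, and the function names are hypothetical.

```python
import math


def db_to_lin(x_db: float) -> float:
    return 10.0 ** (x_db / 10.0)


def lin_to_db(x: float) -> float:
    return 10.0 * math.log10(x)


def cascaded_nf_db(stages):
    """Friis formula; stages = [(gain_db, nf_db), ...] in signal order."""
    total_f = 0.0
    gain_before = 1.0
    for index, (gain_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        total_f += f if index == 0 else (f - 1.0) / gain_before
        gain_before *= db_to_lin(gain_db)
    return lin_to_db(total_f)


def cascaded_iip3_dbm(stages):
    """1/IIP3 = sum(G_before / IIP3_i) in mW; stages = [(gain_db, iip3_dbm), ...]."""
    inverse_sum = 0.0
    gain_before = 1.0
    for gain_db, iip3_dbm in stages:
        inverse_sum += gain_before / db_to_lin(iip3_dbm)
        gain_before *= db_to_lin(gain_db)
    return -lin_to_db(inverse_sum)


if __name__ == "__main__":
    # Maximum-gain entries of Table 3.1: RFVGA, mixer, baseband.
    print(f"NF   = {cascaded_nf_db([(16, 3), (18, 12), (53, 35)]):.1f} dB")
    print(f"IIP3 = {cascaded_iip3_dbm([(16, 0), (18, 13), (53, 33)]):.1f} dBm")
```

Sweeping the RFVGA entry across its gain range shows how attenuation at the front end trades noise figure for linearity, which is the balance the block specifications are meant to strike.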

3.2 Analog Baseband Design

Based on the analysis in the previous Section, an Inverse Chebyshev approximation is chosen for the 4MHz bandwidth filter to provide > 29dB attenuation at 5.25MHz. This results in an 8th-order approximation with pole-zero locations as indicated in Table 3.2.

Excellent linearity performance of active RC filters makes them suitable for broadband receivers [18–24]. However, the accuracy of pole-zero ratios limits roll-off sharpness, and process variations limit the absolute accuracy in active RC filters unless an automatic tuning scheme is employed. The complexity of the filter tuning scheme to mitigate this variation depends on the desired precision. In contrast, switched capacitor (SC) filters can implement precise transfer functions without tuning but require anti-alias filtering. A 700kHz SC filter for channel selection using an anti-alias filter with > 2X larger bandwidth (1.5MHz) has been reported in [28]. A solution that implements an SC ladder filter with embedded anti-aliasing has been reported previously [29]. The required frequency and gain programmability of the baseband filter in this work precludes the use of such hybrid ladder architecture.
Table 3.2

Pole-zero placement for baseband inverse Chebyshev approximation.

<table>
<thead>
<tr>
<th>Complex Pole Pairs (MHz)</th>
<th>Complex Zero Pairs (MHz)</th>
<th>Realization</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.05/5.40, Q=0.71</td>
<td>21.89/29.19</td>
<td>Active RC (Zero not realized)</td>
</tr>
<tr>
<td>4.66/6.21, Q=0.52</td>
<td>7.69/10.25</td>
<td>Switched Capacitor</td>
</tr>
<tr>
<td>3.49/4.66, Q=1.23</td>
<td>5.14/6.85</td>
<td>Switched Capacitor</td>
</tr>
<tr>
<td>3.21/4.29, Q=3.79</td>
<td>4.35/5.81</td>
<td>Switched Capacitor</td>
</tr>
</tbody>
</table>

3.2.1 Cascaded Hybrid Baseband Architecture

The hybrid active RC and SC filter with built-in anti-aliasing shown in Fig. 3.2 is suitable for realizing cascaded transfer functions. The desired Inverse Chebyshev approximation is realized as a cascaded function of 4 biquad stages ($\prod H_i(s)$).

Each biquad transfer function is given by

$$H_i(s) = K_i \frac{1 + \left(\frac{s}{w_{z,i}}\right)^2}{1 + \left(\frac{s}{w_{p,i}Q_{p,i}}\right) + \left(\frac{s}{w_{p,i}}\right)^2} \quad (3.1)$$

where $K_i$ is the DC gain, $w_{p,i}$ is the location of the complex pole-pair with quality factor $Q_{p,i}$, and $w_{z,i}$ is the location of the $jw$-axis zero pair. The required anti-aliasing transfer function is realized using a part of the Inverse Chebyshev approximation. Thus, the overall filter transfer function is

$$H(s) = H'_1(s) H_2(s) H_3(s) H_4(s) \quad (3.2)$$

where $H_2(s)$, $H_3(s)$, and $H_4(s)$ are realized with SC filters while $H'_1(s)$ is approximated from $H_1(s)$ by ignoring the highest zero pair from the Inverse Chebyshev approximation (29 MHz in Table 3.2) to provide anti-aliasing. The cascaded SC biquads implement precise stop-band high-Q zeros to provide accurate transition band positioning and the output can be easily coupled to a Nyquist-rate ADC due to the sampled nature of the output.
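
The cascade in (3.1) and (3.2) can be evaluated directly from the entries of Table 3.2. The sketch below assumes that the second number in each table cell belongs to the 4MHz bandwidth setting and that all $K_i = 1$; both are assumptions, as are the function names. Since the table values are rounded, the computed attenuation should only be expected to approximate the figures quoted in the text.

```python
import math

# (pole frequency in MHz, pole Q, zero frequency in MHz or None)
BIQUADS_4MHZ = [
    (5.40, 0.71, None),   # H1'(s): active RC, 29.19 MHz zero pair not realized
    (6.21, 0.52, 10.25),  # H2(s): switched capacitor
    (4.66, 1.23, 6.85),   # H3(s): switched capacitor
    (4.29, 3.79, 5.81),   # H4(s): switched capacitor
]


def biquad_response(f_mhz, fp_mhz, qp, fz_mhz=None, k=1.0):
    """Eq. (3.1) evaluated at s = j*2*pi*f (frequencies in MHz)."""
    s_over_wp = 1j * f_mhz / fp_mhz
    numerator = 1.0 if fz_mhz is None else 1.0 + (1j * f_mhz / fz_mhz) ** 2
    denominator = 1.0 + s_over_wp / qp + s_over_wp ** 2
    return k * numerator / denominator


def cascade_mag_db(f_mhz, biquads=BIQUADS_4MHZ):
    """Magnitude of the product in (3.2) in dB."""
    h = 1.0 + 0j
    for fp_mhz, qp, fz_mhz in biquads:
        h *= biquad_response(f_mhz, fp_mhz, qp, fz_mhz)
    magnitude = abs(h)
    return -math.inf if magnitude == 0.0 else 20.0 * math.log10(magnitude)


if __name__ == "__main__":
    for f_mhz in (1.0, 4.0, 5.25, 8.0, 20.0):
        print(f"{f_mhz:5.2f} MHz: {cascade_mag_db(f_mhz):7.2f} dB")
```

Dropping the highest zero pair from the first biquad leaves a pure two-pole section, so that stage keeps attenuating at high frequencies and can double as the anti-alias filter for the SC stages.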
3.2.2 Active RC Implementation

The single-opamp multi-feedback (MFB) filter structure shown in Fig. 3.3 is used to implement \( H'_1(s) \). This structure has the advantage of low sensitivity (to \( Q_p \) and \( w_p \) variations). The transfer function of the filter is given by

\[
H'_1(s) = \frac{K_1}{1 + \left(\frac{s}{w_p Q_p}\right) + \left(\frac{s}{w_p}\right)^2} \quad (3.3)
\]

where

\[
K_1 = \frac{R_4}{R_1}; \quad w_p = \frac{1}{RC\sqrt{mn}}; \quad Q_p = \frac{\sqrt{n}}{2\sqrt{m} + \sqrt{\frac{1}{m}}} \quad (3.4)
\]

with \( K_1 = 1 \), \( R_1 = R_4 = R \), resistor ratio \( m = R_3/R_1 \) and a capacitor ratio \( n = C_2/C_5 \). For a given \( Q_p \), \( m \) and \( n \) are interdependent and thus cannot be set independently.
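
The interdependence of \( m \) and \( n \) follows from solving the \( Q_p \) expression in (3.4) for \( n \). The sketch below does this and also extracts the \( RC \) product from the \( w_p \) expression. The function names are hypothetical, and the 5.40MHz pole frequency in the example assumes the 4MHz setting of Table 3.2.

```python
import math


def capacitor_ratio_n(m: float, qp: float) -> float:
    """Solve Q_p = sqrt(n) / (2*sqrt(m) + 1/sqrt(m)) from (3.4) for n."""
    return (qp * (2.0 * math.sqrt(m) + 1.0 / math.sqrt(m))) ** 2


def rc_product_s(fp_hz: float, m: float, n: float) -> float:
    """RC from w_p = 1 / (R*C*sqrt(m*n)) in (3.4)."""
    return 1.0 / (2.0 * math.pi * fp_hz * math.sqrt(m * n))


if __name__ == "__main__":
    m, qp = 0.25, 0.71
    n = capacitor_ratio_n(m, qp)
    print(f"n = {n:.2f}")
    print(f"RC = {rc_product_s(5.40e6, m, n) * 1e9:.2f} ns")
```

With \( m = 0.25 \) and \( Q_p = 0.71 \), this reproduces the capacitor ratio \( n \approx 4.54 \) quoted later in this section.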

The total thermal output spot noise spectral density for the filter is

\[
\begin{aligned}
v_{on}^2 = {} & 16kTR \left| \frac{1}{1 + \left(\frac{s}{w_p Q_p}\right) + \left(\frac{s}{w_p}\right)^2} \right|^2 + 8kTmR \left| \frac{2\left(1 + \frac{s}{2w_p \sqrt{n/m}}\right)}{1 + \left(\frac{s}{w_p Q_p}\right) + \left(\frac{s}{w_p}\right)^2} \right|^2 \\
& + v_{amp}^2 \left| \frac{2\left(1 + \left(\frac{s}{w_z Q_z}\right) + \left(\frac{s}{w_z}\right)^2\right)}{1 + \left(\frac{s}{w_p Q_p}\right) + \left(\frac{s}{w_p}\right)^2} \right|^2 \quad (3.5)
\end{aligned}
\]
Figure 3.3. Active RC multi-feedback-filter and programmable gain amplifier (single-ended structure shown).
Figure 3.4. Variation of integrated noise for a given total capacitance budget with varying resistor ratio.

where \( v_{\text{amp}}^2 \) is the input referred noise density of the amplifier. The first term represents the noise contribution from \( R_1 \) and \( R_4 \); the second term represents the noise contribution from \( R_3 = mR \); and the third term represents the noise contribution from the amplifier. Parameters \( w_z \) and \( Q_z \) are

\[
w_z = w_p \sqrt{2}; \quad Q_z = \frac{\sqrt{2mn}}{2m + n + 1} \quad (3.6)
\]

For a fully differential filter, the total capacitance is \( C_{\text{total}} = C_2/2 + 2C_5 = (n/2 + 2)C \). For a given \( C_{\text{total}} \), depending on the choice of \( m \) (and hence \( n \) as set by \( Q_p \)), the integrated noise is plotted in Fig. 3.4. The optimum range of resistor ratio \( m \) for reducing the noise for a given capacitance budget is 0.2-0.4. Hence, we choose \( m = 0.25 \) (resulting in \( n = 4.54 \)) and size the capacitors accordingly to meet the noise figure requirement. Programmable capacitors \( C_2 \) and \( C_5 \) (cf. Fig. 3.3) are adjusted using digital control bits to implement 3 and 4MHz bandwidth settings in the filter.

The filter linearity requirement \((\text{IIP}_3 > 33\text{dBm})\) sets the minimum loop-gain in the filter passband needed to suppress distortion adequately. The minimum loop-gain, and hence the gain-bandwidth (GBW) of the amplifier, is obtained from simulations; based on these, a two-stage Miller amplifier with 160MHz GBW is designed. The amplifier consumes 2.15mA from a 1.8V supply. The input resistor $R_1$ is split into two separate resistors with additional capacitance (not shown in Fig. 3.3), resulting in a 3$^{rd}$ order filter that further enhances the anti-aliasing and the rejection of far-out blockers.

A continuous time PGA with a gain range of (-6 to +18dB) follows the MFB filter. The PGA resistors are sized to minimize the input-referred noise of the PGA ($18 \, nV/\sqrt{Hz}$) in the maximum gain setting. The switches and resistor arrays are ratioed proportionately to keep the gain of the PGA independent of switch size to the first order. In addition, the PGA provides a low closed-loop output impedance to drive the subsequent SC filter.

### 3.2.3 SC Implementation

Three cascaded SC biquads emulate the transfer functions $H_2(s)$, $H_3(s)$, and $H_4(s)$. Each biquad is a two-integrator loop implemented using operational transconductance amplifiers (OTAs) as shown in Fig. 3.5. The preceding PGA relaxes the input-referred noise density requirement of the SC section ($285 \, nV/\sqrt{Hz}$).

For a sampling frequency of 80MHz ($T_s = 1/f_s = 12.5\, ns$), 1ns is budgeted for slew rate effects, 1ns for switch resistance delay, and 0.9ns for the non-overlapping time, yielding a $T_{linear-settling}$ on the order of 3.35ns [30]. The desired open-loop GBW ($f_{u,i}$ Hz) of each OTA for 0.5% settling ($6\tau$) is

$$f_{u,i} = \frac{6}{2\pi \beta_i T_{linear-settling}} \quad (3.7)$$

where $\beta_i$ is the feedback factor of the $i^{th}$ OTA for a given filter configuration. Programmable capacitor arrays ($C_1$, $C_3$ and $C_6$ in Fig. 3.5) are used in the biquads to achieve accurate gain (0 or 6dB) and frequency programming (3 or 4MHz). Computation of the required capacitor ratios and OTA GBW accounts for dynamic range node scaling and noise estimation. Feedback factor and GBW for each OTA across programmable settings are indicated in Table 3.3. In this prototype, the OTAs are designed for the worst case $f_u$ (4MHz, 6dB setting).

Figure 3.5. Switched-capacitor biquad implementation (single-ended structure shown).

Table 3.3
Gain bandwidth and feedback factor for SC biquad OTAs across configurations.

<table>
<thead>
<tr>
<th>Filter Setting</th>
<th>Feedback factor</th>
<th>GBW (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>$\beta_1$</td>
<td>$\beta_2$</td>
</tr>
<tr>
<td>3 MHz, 0 dB</td>
<td>0.46</td>
<td>0.44</td>
</tr>
<tr>
<td>3 MHz, 6 dB</td>
<td>0.40</td>
<td>0.40</td>
</tr>
<tr>
<td>4 MHz, 0 dB</td>
<td>0.43</td>
<td>0.38</td>
</tr>
<tr>
<td>4 MHz, 6 dB</td>
<td>0.37</td>
<td>0.33</td>
</tr>
</tbody>
</table>
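The settling budget and Eq. (3.7) can be evaluated directly. The sketch below is only an illustration: it takes the two numbers listed per setting in Table 3.3 as $\beta_1$ and $\beta_2$ (an assumption about how the table columns should be read) and computes the GBW that Eq. (3.7) implies for each. The results are not compared against the GBW entries of the table.

```python
import math

F_S = 80e6                      # sampling frequency from the text
T_HALF = 1.0 / (2.0 * F_S)      # duration of one clock phase (6.25 ns)
# Budgets quoted in the text: slewing, switch resistance delay, non-overlap time.
T_LINEAR = T_HALF - 1.0e-9 - 1.0e-9 - 0.9e-9


def required_gbw_hz(beta: float, t_linear: float = T_LINEAR, n_tau: float = 6.0) -> float:
    """Eq. (3.7): f_u = n_tau / (2*pi*beta*T_linear-settling)."""
    return n_tau / (2.0 * math.pi * beta * t_linear)


# Feedback factors read from Table 3.3, interpreted here as (beta_1, beta_2).
BETAS = {
    "3 MHz, 0 dB": (0.46, 0.44),
    "3 MHz, 6 dB": (0.40, 0.40),
    "4 MHz, 0 dB": (0.43, 0.38),
    "4 MHz, 6 dB": (0.37, 0.33),
}

gbw_mhz = {
    setting: tuple(required_gbw_hz(b) / 1e6 for b in pair)
    for setting, pair in BETAS.items()
}
# The setting with the smallest feedback factor needs the largest GBW.
worst_setting = max(gbw_mhz, key=lambda k: max(gbw_mhz[k]))

for setting, (f1, f2) in gbw_mhz.items():
    print(f"{setting}: f_u1 = {f1:.0f} MHz, f_u2 = {f2:.0f} MHz")
print("worst case:", worst_setting)
```

Because $f_u$ scales as $1/\beta$, the lowest feedback factor dictates the OTA design, which is consistent with sizing the OTAs for the 4MHz, 6dB setting.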

The noise constraint ($k_B T/C$) determines the unit capacitor size, which in turn sets the power consumption of each OTA for a desired $f_u$. We select $C_{\text{unit}} = 113 fF$ (the minimum value imposed by the process), which results in a cascaded input-referred noise density $< 100 nV/\sqrt{Hz}$ for the maximum gain setting. Fully differential, folded-cascode transconductors (PMOS input) with switched-capacitor common-mode feedback are used for the OTAs. Including the biasing circuits, the three biquads consume 6.3mA, 7.7mA, and 6.7mA from a 1.8V supply. The input-referred noise density requirement for the second and third biquads can be relaxed, which would permit a smaller $C_{\text{unit}}$ and a lower total capacitance in those biquads. Such an optimization would reduce the power consumption of the later filter stages. In this work, the process-imposed minimum of $C_{\text{unit}} = 113fF$ prevented such optimization. The switches are implemented with 2.5V NMOS devices.

The active RC section provides 24dB variable gain and a 3rd order filter response with an attenuation of 4dB at 5.25MHz. With the additional SC filter (6th order), this attenuation increases to 29dB (42dB) at 5.25MHz (5.75MHz) with an additional 18dB variable gain. This improved attenuation and variable gain costs 37.3mW (limited by the high $C_{\text{unit}}$ of the technology in this design). The variable gain settings in the RF and baseband sections were controlled manually using digital control bits. The passband group delay of the filter response adds to the total wireless channel delay, which is time-variant and has to be compensated by the adaptive channel equalizer in the digital demodulator. Frequency-dependent gain and phase mismatch between the I/Q branches causes sub-carrier-dependent errors within the bandwidth, but these can be reduced at the system level using digital compensation techniques [6].

3.2.4 All-Digital Non Overlap Delay Tuning

Fig. 3.6 illustrates a conventional two-phase non-overlapping clock generator. The digital delays used in two-phase non-overlapping clock generation schemes vary significantly with process, which may result in settling time window variations of up to 30%.

As shown in Fig. 3.6, the valid time available for linear settling is

$$T_{\text{VALID}} = \frac{T}{2} - (t_N + t_d) \quad (3.8)$$

where $T$ is the reference clock period, $t_N$ is the NOR gate delay, and $t_d$ is the delay used to generate the non-overlapping time. The clock phases ($\Phi_1$ and $\Phi_2$) must be non-overlapping to guarantee that charge is not inadvertently lost. The design value of $(t_N + t_d)$ cannot be arbitrarily small, lest uncontrolled clock routing skews cause the phases to overlap. In slow process corners the delay is maximum, so the available settling time is minimum. In addition, switch time constants also increase in the slow process corner, demanding an over-design of the OTA to accommodate the smallest available settling time.

Figure 3.6. Conventional two-phase non-overlapping clock generation.

Typical SC circuits requiring multiphase clocks employ complex PLLs (or DLLs) to generate precision clocks that minimize the variation in available settling time [31]. Timing skew and duty cycle adjustment circuits for SC circuits have also been proposed [32,33]. Alternatively, this work proposes a low-complexity all-digital delay tuning scheme to reduce the uncertainty of the available settling time. As shown in Fig. 3.7, a replica delay element configured as a ring oscillator drives a counter, which counts the number of ring oscillator transitions per reference clock. The total number of delay cells in the replica delay loop can be adjusted through the digital control bits of a multiplexer, which changes the counter output. A digital comparator compares this count to a desired value and tunes the ring delay to achieve the desired count. The multiplexer control bits thus obtained set the delay in Fig. 3.6. This scheme reuses the reference clock, which is already available to generate the non-overlapping clock phases. For an 80MHz clock ($T = 12.5\,\text{ns}$), the low-complexity tuning scheme reduces the variation of the available settling time from 17% to less than 4% (900ps to less than 200ps out of a nominal 5.25ns) across process corners.

**Figure 3.7.** All digital non-overlap time tuning system.
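The tuning loop described above can be sketched behaviorally. The model below is an illustration under stated assumptions, not the implemented circuit: the per-cell delays at each corner, the multiplexer tap range, the 900ps target delay, and the fixed 100ps NOR delay are all hypothetical values, and the counter is assumed to see one ring transition every `n_cells * cell_delay`.

```python
T_REF_PS = 12_500            # 80 MHz reference period in ps (from the text)
TARGET_DELAY_PS = 900        # hypothetical target for the tunable delay t_d
T_NOR_PS = 100               # hypothetical NOR gate delay t_N, held constant here
CELL_DELAY_PS = {"fast": 35, "typical": 50, "slow": 65}   # hypothetical corners
N_MIN, N_MAX = 8, 40         # hypothetical range of selectable delay cells


def transitions_per_ref(n_cells: int, cell_ps: int) -> int:
    """Counter value: the ring output toggles once every n_cells * cell_ps."""
    return T_REF_PS // (n_cells * cell_ps)


def tune_cells(cell_ps: int, target_count: int) -> int:
    """Comparator loop: add delay cells until the count drops to the target."""
    n_cells = N_MIN
    while n_cells < N_MAX and transitions_per_ref(n_cells, cell_ps) > target_count:
        n_cells += 1
    return n_cells


def t_valid_ps(t_nor_ps: int, t_delay_ps: int) -> int:
    """Eq. (3.8): T_VALID = T/2 - (t_N + t_d)."""
    return T_REF_PS // 2 - (t_nor_ps + t_delay_ps)


target_count = T_REF_PS // TARGET_DELAY_PS
# Untuned reference: the cell count that is right only at the typical corner.
n_fixed = tune_cells(CELL_DELAY_PS["typical"], target_count)

tuned = {c: tune_cells(d, target_count) * d for c, d in CELL_DELAY_PS.items()}
untuned = {c: n_fixed * d for c, d in CELL_DELAY_PS.items()}

for corner in CELL_DELAY_PS:
    print(
        f"{corner:8s} untuned t_d = {untuned[corner]:4d} ps, "
        f"tuned t_d = {tuned[corner]:4d} ps, "
        f"tuned T_VALID = {t_valid_ps(T_NOR_PS, tuned[corner])} ps"
    )
print("untuned spread:", max(untuned.values()) - min(untuned.values()), "ps")
print("tuned spread:", max(tuned.values()) - min(tuned.values()), "ps")
```

In this simplified model the residual spread after tuning is bounded by the delay of one cell, so the granularity of the selectable delay sets how tightly the settling window can be controlled.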

3.3 Design of RF Front-End

A short description of the RF Front-end design is provided in this section. Analysis and implementation details of the LNA and Mixer used in this receiver can be found in [25,26,34].

3.3.1 RF Variable Gain Amplifier (RFVGA)

The RF front-end consists of an RF variable gain amplifier followed by single-to-differential transconductor and current mode quadrature mixers. A single ended RF input is used to reduce the system cost by obviating the external balun (cf. Fig. 3.8). The variable gain helps to maximize the output SNDR. Adopted from [25], the RFVGA implements a modified shunt feedback scheme to achieve wideband input matching independent of gain without a shunt peaking inductor. The VGA consists of five identical \( G_m \) stages connected with a capacitive divider configuration. This cascaded arrangement facilitates a 6dB coarse gain setting in the RFVGA. Fine gain steps with smooth gain adjustment are implemented with a current-steering scheme (not shown in the figure) using a process independent control block [25]. Operating with a 1.8V supply, the RFVGA provides a gain range of -14dB to +16dB with a targeted NF of 3dB at maximum gain and IIP3 performance of +20dBm at 14dB RF attenuation.
Figure 3.8. RFVGA with gain-independent shunt feedback input matching [25].
3.3.2 Current-Mode Passive Mixer

A single-ended RF input in the receiver requires an on-chip single-to-differential balun in the signal processing chain to minimize common mode noise and even order distortion. One possible approach to on-chip single-to-differential conversion is to utilize a combination of common-gate and common-source stages [35–37]. An on-chip transformer load could be used to obtain a single-ended-to-differential LNA architecture [38,39]. In this work, single-ended to fully-differential conversion is achieved with a single $G_m$ as indicated in Fig. 3.9. Within the desired bandwidth (470-862 MHz), the single-to-differential converter produces a gain and phase mismatch of 0.4dB and $14^\circ$ respectively. The transconductor uses resistively source-degenerated complementary NMOS and PMOS differential pairs to achieve high linearity and power efficiency through current reuse [40]. The cross-modulation between VHF and UHF bands can generate in-band second-order inter-modulation distortion (IM2) products [37]. However, the receiver in this work targets the UHF band only; hence second-order inter-modulation in the RF front-end gives rise only to out-of-band inter-modulation tones. The transconductor and the mixer switches are AC coupled mainly to eliminate the out-of-band IM2 distortion, but also to suppress flicker noise, reduce DC offset, and provide biasing flexibility.

The passive mixer shown in Fig. 3.10 is terminated at the virtual ground of a transimpedance amplifier (TIA) stage [39,41,42] and exhibits higher linearity than a Gilbert-type mixer without the headroom constraints. The transconductor, passive switch, and TIA cascaded together have an IIP3 of 12.2dBm with a sinusoidal LO in simulation. The on-chip LO signal is provided by a frequency divider consisting of two current-mode logic (CML) latches, and such a signal is a non-ideal pulse with finite rise and fall times. For a typical $(\text{pulsewidth/period})$ ratio of 0.3-0.4, the IIP3 performance is better than with a sinusoidal LO signal (IIP3 is 13.4dBm for $(\text{pulsewidth/period}) = 0.35$). The mixer switches are replica biased at the onset of inversion to minimize clock feedthrough and even-order harmonic distortion. The TIA provides a broadband low-impedance (< 10Ω) current path for the down-converted signal using a wide gain-bandwidth (460 MHz), fully differential, two-stage Miller-compensated amplifier. An on-chip frequency divider generates the required quadrature LO signals. A DC-offset cancellation loop with a highpass corner frequency of 2.4kHz is included around the TIA stage (cf. Fig. 3.10). Operating with a 1.8V supply, the mixer and TIA provide a gain of +18dB with a targeted NF and IIP3 performance of 12dB and +13dBm, respectively.

Figure 3.9. Single-to-differential converting transconductor ($G_m$) driving the mixer switches.

3.4 Experimental Results

The receiver was fabricated in IBM 0.18µm RFCMOS technology. Fig. 3.11 shows the chip micrograph. Only one baseband channel was realized (out of I and Q) in the prototype due to area constraints; however, analog performance verification only requires testing of one channel. The system occupies 2.14mm² of active area and was characterized in a QFN80 package.

Fig. 3.12 illustrates the measurement setup. Baseband outputs tapped at intermediate points in the signal chain are buffered with on-chip open drain buffers, terminated on the board. To accommodate output swing in the baseband outputs, the output buffers are source degenerated. The differential signal outputs are buffered separately using highly linear commercial amplifiers.

3.4.1 Baseband Response and Residual DR Measurements

The measured baseband transfer functions are shown in Fig. 3.13. Fig. 3.13(a) shows the frequency (3 & 4MHz) and gain programmability (-6dB to +18dB with 6dB per step) of the continuous time section. Fig. 3.13(b) shows frequency programmability of the composite hybrid filter (3 & 4 MHz options) along with the additional gain programmability of the SC section (0 to +18dB range with 0 or 6dB per biquad). For the 4 MHz setting, the measured frequency response indicates a stopband attenuation of >29dB for frequencies >5.25 MHz, while the continuous time filter provides >2.8dB attenuation at the same frequency.

Figure 3.10. Current mode passive mixer terminated at TIA input with DC offset cancellation.

Design Values

<table>
<thead>
<tr>
<th>Component</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>$M_1$</td>
<td>18(8μm/0.36μm)</td>
</tr>
<tr>
<td>$M_2$</td>
<td>2(5μm/5μm)</td>
</tr>
<tr>
<td>$M_3$</td>
<td>6(20μm/5μm)</td>
</tr>
<tr>
<td>$R_1$, $C_1$</td>
<td>2 kΩ, 10pF</td>
</tr>
<tr>
<td>$R_2$, $C_2$</td>
<td>480 kΩ, 32pF</td>
</tr>
</tbody>
</table>

Figure 3.11. Chip micrograph of the UHF receiver.

To measure the $ResidualDR$, input power spread over two channels (generated using two signal generators and a power combiner) is injected into the filter. A digital channel is generated using 64-QAM modulation with a Root-Nyquist (RNYQ) pulse shape and an appropriately scaled symbol rate to produce a flat PSD throughout the channel, while an analog channel is generated using a single carrier. The output power levels of the desired and undesired channels are adjusted to vary $AC_{dB}$ and obtain a wide range of measurements. The filtered PSD is measured at both the continuous- and discrete-time outputs. Fig. 3.14 shows the measured input and filtered outputs with a digitally modulated desired channel and $N + 1$ adjacent channel with $AC_{dB} = 30dB$. The PSD shows attenuation below 1 MHz due to the frequency limitation of the power combiner. Table 3.4 indicates the $ResidualDR$ computed from measurements for a near digital and analog blocker ($N+1$ channel) and a far-out digital blocker ($N + 3$ channel) for varying $AC_{dB}$ values. The filtered PSD at the hybrid filter output for the $N + 3$ channel is below the output noise floor for $AC_{dB}$ values of 10 and 20dB. The proposed hybrid filter reduces the $ResidualDR_{N+1}$ by $+17.5$dB ($> 2.0$ bits) for a digital $N + 1$ channel and by $+24.9$ dB ($\approx 3.9$ bits) for an analog $N + 1$ channel. The improvement in $ResidualDR_{N+1}$ for the hybrid filter is larger in the presence of an analog adjacent channel, as predicted by the analysis in Section 2.
Figure 3.14. Measured filtered PSD with combined input of two 64-QAM digital modulated channels (desired and N+1 adjacent) with \( AC_{dB} = 30\text{dB} \)

Table 3.4

<table>
<thead>
<tr>
<th>Blocker profile with $AC_{dB}$</th>
<th>Continuous-time filter (dB)</th>
<th>Hybrid filter (dB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Digital $(N + 1)$ channel</td>
<td>7.5 16.5 26.4 <strong>34.5</strong></td>
<td>1.3 2.9 8.8 <strong>17.0</strong></td>
</tr>
<tr>
<td>Analog $(N + 1)$ channel</td>
<td>10.6 20.2 30.2 <strong>40.2</strong></td>
<td>0.15 1.3 6.3 <strong>15.3</strong></td>
</tr>
<tr>
<td>Digital $(N + 3)$ channel</td>
<td>0.53 0.84 3.12 <strong>9.83</strong></td>
<td>- - 0.45 <strong>0.51</strong></td>
</tr>
</tbody>
</table>

\( a \) By definition, equivalent integrated power of the \((N + 1)\) channel is \((AC + 3)_{dB}\) higher than the integrated power of the desired channel due to 2X integrating bandwidth of \((N + 1)\) channel.
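The $ResidualDR$ reductions quoted in the text follow from the highlighted (last) entries of each row in Table 3.4. The short sketch below recomputes them. The conversion to bits uses 6.02 dB per bit, which is an assumption made here for illustration; the bit counts quoted in the text may follow a different convention, so the printed bit values need not match them exactly.

```python
DB_PER_BIT = 6.02   # assumed conversion factor (not stated in the text)

# (continuous-time filter, hybrid filter) ResidualDR in dB,
# taken from the highlighted last entry of each row in Table 3.4.
RESIDUAL_DR_DB = {
    "digital N+1": (34.5, 17.0),
    "analog N+1": (40.2, 15.3),
    "digital N+3": (9.83, 0.51),
}


def reduction_db(ct_db: float, hybrid_db: float) -> float:
    """Reduction in ResidualDR obtained by adding the SC section."""
    return ct_db - hybrid_db


def db_to_bits(db: float) -> float:
    """Equivalent ADC resolution saving under the assumed 6.02 dB/bit."""
    return db / DB_PER_BIT


reductions = {name: reduction_db(*pair) for name, pair in RESIDUAL_DR_DB.items()}

for name, delta in reductions.items():
    print(f"{name}: {delta:.1f} dB (about {db_to_bits(delta):.1f} bits at 6.02 dB/bit)")
```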

3.4.2 System Performance

The desired input impedance of the RFVGA is 75Ω (video standard). The measured \( S_{11} \) response from the network analyzer (referenced to 50Ω) is post-processed to obtain matching performance with respect to 75Ω (cf. Fig. 3.15). The plot indicates the \( S_{11} \) performance for two cases of shunt feedback matching and resistive
matching in the frequency range from 400 to 900MHz. We measured a NF of 7.9dB at maximum gain using the Y-factor method with an NC346B noise source. We observed an additional NF penalty of 2.5dB with respect to the simulation result, which can be attributed to (1) insertion loss of interconnections between the noise source and LNA, (2) the noise contribution of the gain control block in the RFVGA, and (3) RC routing parasitics between the RFVGA and the mixer.
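The post-processing step that re-references the measured $S_{11}$ from 50Ω to 75Ω can be sketched as follows. This is a generic one-port renormalization for real reference impedances, not necessarily the exact procedure used for Fig. 3.15, and the sample reflection coefficients are hypothetical values rather than measured data.

```python
import math


def renormalize_gamma(gamma_meas: complex, z_meas: float = 50.0, z_new: float = 75.0) -> complex:
    """Re-reference a one-port reflection coefficient to a new real impedance.

    The load impedance is recovered from the measured reflection coefficient
    and then re-expressed against the new reference. Undefined for an ideal
    open circuit (gamma_meas == 1).
    """
    z_load = z_meas * (1 + gamma_meas) / (1 - gamma_meas)
    return (z_load - z_new) / (z_load + z_new)


def s11_db(gamma: complex) -> float:
    """Magnitude of the reflection coefficient in dB (requires |gamma| > 0)."""
    return 20.0 * math.log10(abs(gamma))


# Hypothetical points: frequency in MHz -> S11 referenced to 50 ohm.
measured_50 = {470: 0.25 + 0.05j, 650: 0.18 - 0.08j, 862: 0.22 + 0.10j}
s11_75_db = {f: s11_db(renormalize_gamma(g)) for f, g in measured_50.items()}

for freq, value in s11_75_db.items():
    print(f"{freq} MHz: S11 (75 ohm reference) = {value:.1f} dB")
```

A perfect 75Ω input appears as a reflection coefficient of 0.2 on a 50Ω analyzer, which the function maps back to zero reflection.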

**Figure 3.15.** Measured $S_{11}$ performance (For 75Ω reference).

**Figure 3.16.** Measured linearity performance (a) two-tone measurement results for the system, (b) system IIP3 performance.
To obtain the system linearity performance, two out-of-channel RF tones located at \( N + 2 \) (516MHz) and \( N + 4 \) (531MHz) were injected at the LNA input, and the in-band distortion tone located at 1 MHz after down conversion was measured. We varied the \( N + 2 \) and \( N + 4 \) tone input power in steps of 1dB to obtain the system IIP3. The plot in Fig. 3.16(a) shows an IIP3 of -8 and +2dBm for the highest gain and 9dB RF attenuation cases, respectively. Also, the 3rd order harmonic saturation is measured at -23dBm for the highest gain setting, but it is not detected for 9dB RF attenuation, indicating better linearity performance.
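The placement of the two test tones can be checked with a few lines of arithmetic. In the sketch below, the 500MHz LO frequency is an assumption inferred from the 1MHz baseband distortion tone, and the power levels passed to `iip3_dbm` are hypothetical numbers chosen only to exercise the usual two-tone extrapolation; they are not measured data.

```python
def im3_products_mhz(f1: float, f2: float) -> tuple:
    """Third-order intermodulation products of two tones: 2*f1 - f2 and 2*f2 - f1."""
    return 2.0 * f1 - f2, 2.0 * f2 - f1


def iip3_dbm(p_in_dbm: float, gain_db: float, p_im3_out_dbm: float) -> float:
    """Two-tone extrapolation with the IM3 tone measured at the output.

    The out-of-channel tones are filtered at baseband, so the fundamental
    output level is taken as p_in + gain (in-band gain) instead of being measured.
    """
    delta_db = (p_in_dbm + gain_db) - p_im3_out_dbm
    return p_in_dbm + delta_db / 2.0


F1_MHZ, F2_MHZ = 516.0, 531.0   # N+2 and N+4 tones from the text
F_LO_MHZ = 500.0                # assumed LO frequency (not stated in the text)

im3_low, im3_high = im3_products_mhz(F1_MHZ, F2_MHZ)
baseband_im3_mhz = abs(im3_low - F_LO_MHZ)

print(f"IM3 products at {im3_low:.0f} MHz and {im3_high:.0f} MHz")
print(f"In-band IM3 after down conversion: {baseband_im3_mhz:.0f} MHz")
# Hypothetical operating point: -40 dBm per tone, 60 dB gain, -44 dBm IM3 at the output.
print(f"Example IIP3: {iip3_dbm(-40.0, 60.0, -44.0):.1f} dBm")
```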

Table 3.5 summarizes the experimental results and compares this receiver to published UHF receivers. This work was fabricated using IBM 0.18\( \mu m \) RFCMOS technology without using any special RF components other than MIM capacitors. The power consumption and the area metrics indicated for this work do not include the frequency synthesizer and quadrature generator. The RF front end and baseband blocks consume 58mW and 52mW from a 1.8V supply, respectively. The digital clock tree that drives the SC filter switches consumes 10mW from a 2.5V digital supply. Compared to previously published UHF receiver solutions, this work, implemented in a CMOS process, offers competitive performance.
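The power figures quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (supply voltage, block currents, and block power); the share computed at the end is a derived illustration, not a figure reported in this work.

```python
V_ANALOG = 1.8                           # analog supply in volts
BIQUAD_CURRENTS_MA = (6.3, 7.7, 6.7)     # three SC biquads, including biasing
MILLER_AMP_CURRENT_MA = 2.15             # MFB filter amplifier
RF_MW, BASEBAND_MW, CLOCK_MW = 58.0, 52.0, 10.0

# Power of the SC section from its supply currents (mA * V = mW).
sc_power_mw = sum(BIQUAD_CURRENTS_MA) * V_ANALOG
mfb_amp_power_mw = MILLER_AMP_CURRENT_MA * V_ANALOG
total_mw = RF_MW + BASEBAND_MW + CLOCK_MW
sc_share_of_baseband = sc_power_mw / BASEBAND_MW

print(f"SC biquads: {sc_power_mw:.1f} mW")
print(f"MFB amplifier: {mfb_amp_power_mw:.2f} mW")
print(f"Total (RF + baseband + clock tree): {total_mw:.0f} mW")
print(f"SC share of baseband power: {100.0 * sc_share_of_baseband:.0f}%")
```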
Table 3.5
Experimental results with comparison to previous work.

<table>
<thead>
<tr>
<th></th>
<th>This Work</th>
<th>[18]</th>
<th>[19]</th>
<th>[20]</th>
<th>[21]</th>
<th>[22]</th>
<th>[24]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency range (MHz)</td>
<td>470-862</td>
<td>470-862</td>
<td>470-862</td>
<td>470-860</td>
<td>470-860</td>
<td>470-860</td>
<td>470-862</td>
</tr>
<tr>
<td>RFVGA gain range (dB)</td>
<td>29.2 (15.2 to -14)</td>
<td>40</td>
<td>35</td>
<td>&gt;50</td>
<td>20</td>
<td>Not Available (N.A.)</td>
<td>40</td>
</tr>
<tr>
<td>Channel bandwidth (MHz)</td>
<td>6/8</td>
<td>8</td>
<td>7/8</td>
<td>6/7/8</td>
<td>4 to 10</td>
<td>5/6/7/8</td>
<td>2-5</td>
</tr>
<tr>
<td>Maximum gain (dB)</td>
<td>&gt;80</td>
<td>75</td>
<td>85</td>
<td>94 to 100</td>
<td>86</td>
<td>95</td>
<td>95</td>
</tr>
<tr>
<td>Overall AGC range (dB)</td>
<td>&gt;75</td>
<td>75</td>
<td>65</td>
<td>&gt;98</td>
<td>80</td>
<td>95</td>
<td>103.5</td>
</tr>
<tr>
<td>NF at Max gain (dB)</td>
<td>7.9</td>
<td>8.5</td>
<td>3.6</td>
<td>3.1 to 4.6</td>
<td>3.5/4</td>
<td>4.5/5</td>
<td>3.7/4.3</td>
</tr>
<tr>
<td>IIP3 at Max gain (dBm)</td>
<td>-8</td>
<td>N.A.</td>
<td>N.A.</td>
<td>-13</td>
<td>N.A.</td>
<td>N.A.</td>
<td>-13</td>
</tr>
<tr>
<td>IIP3 (dBm)</td>
<td>+2 at</td>
<td>+12 at</td>
<td>+4 at</td>
<td>-6.8 at</td>
<td>-9/-3/0.5 at</td>
<td>-5/-6 at</td>
<td>+5 at</td>
</tr>
<tr>
<td>Power consumption</td>
<td>120mW (182mW) at 1.8V analog and 2.5V digital supply</td>
<td>240mW at 2.78V</td>
<td>340mW at 2.7V</td>
<td>184mW at 2.8V</td>
<td>295mW at 2.8V</td>
<td>184/207mW at 2.8V</td>
<td>114mW at 1.2V</td>
</tr>
<tr>
<td>Die Size (mm²)</td>
<td>2.14</td>
<td>11.5</td>
<td>12.25</td>
<td>16</td>
<td>9.7</td>
<td>7.8</td>
<td>7.2</td>
</tr>
<tr>
<td>Technology</td>
<td>CMOS 0.18µm</td>
<td>SiGe 0.35µm</td>
<td>SiGe 0.35µm</td>
<td>SiGe 0.5µm</td>
<td>CMOS 0.18µm</td>
<td>CMOS 0.18µm</td>
<td>CMOS 0.13µm</td>
</tr>
</tbody>
</table>

* Measured with an external LNA of 10dB gain.

b Includes estimated additional power for Q-baseband channel.

c Does not include the area for Frequency synthesizer and Q-baseband channel.
4. A LINEAR FEED-FORWARD OTA FOR ACTIVE-RC FILTER DESIGN

The previous two Sections presented some of the challenges in the design of analog filters for UHF receivers, along with a solution. In Section 2, we analyzed the impact of out-band blockers on analog baseband filters in wireless receivers. Towards this end, we present in this Section a new operational transconductance amplifier (OTA) structure suited to the design of active RC filters. As will be shown, the proposed OTA can be used to design active RC filters with improved out-band linearity performance and blocker tolerance. The OTA uses a feed-forward structure for frequency compensation. To demonstrate the improvement in linearity performance, a lossy integrator (1st order active RC) structure with a 10MHz corner frequency has been designed using the new OTA. A lossy integrator is chosen as the test vehicle in this prototype since it serves as a fundamental building block in active RC biquads and in loop filters for continuous-time ΔΣ modulators. A reference lossy integrator using a conventional feed-forward OTA has also been designed for a fair comparison of linearity performance with the proposed structure. The main thrust of this Section is to highlight the linearity improvement offered by the new OTA in active RC filter designs.

First, we explore the noise and linearity performance requirements of analog-baseband filters for wireless receivers. This will set the background to evaluate the impact of out-band blockers on the linearity requirements. Then, we analyze the frequency dependent loop gain response of the lossy integrator structure using an OTA. We present the proposed OTA structure and the design of the two integrators using the TSMC 0.18µm RFCMOS technology. The designs are currently under fabrication. A comparison to the state-of-the-art active RC filters is provided based on the simulation results. Brief conclusions are presented in the end.
4.1 Spurious Free Dynamic Range (SFDR) of Analog Filters in Wireless Receivers

In Section 2.2, we introduced a typical wireless receiver system using a direct conversion architecture. We noted that the RF building blocks (LNA and I/Q demodulator) offer little to no help for channel-selection and blocker filtering and as a consequence the entire down-converted broadband spectral energy is presented unfiltered to the analog baseband section. An I/Q demodulator implemented using a passive current-mode mixer can be terminated using a trans-impedance amplifier which can easily implement a single pole to perform first order filtering after the signal down-conversion. This first order filter provides attenuation for blocker power located far away from the desired channel, but leaves the spectral energy from immediate adjacent channels unfiltered. We also introduced the functionality of the baseband analog signal processing chain, shown again in Fig. 4.1 for convenience. During signal processing, signal gets corrupted due to noise and distortion as illustrated. The variable gain stages in the analog baseband increase the signal amplitude, while the filtering sections reduce the blocker power successively increasing the signal-dynamic range for the desired channel power.

As the signal is processed in the baseband chain, it gets corrupted by additive noise from active devices such as transistors. In addition, due to the weakly-nonlinear behavior of these active devices, nonlinear distortion terms are also added to the signal power. In most cases, the magnitude of the additive noise from the active devices is independent of the input signal power. On the other hand, the magnitude of the distortion generated depends on the input signal level: the higher the input power, the higher the generated distortion power. Hence, while the signal-to-noise ratio (SNR) can be increased by increasing the input signal power, the signal-to-distortion ratio actually reduces with increasing input power. This is critical for the design of the first filtering blocks in the signal chain, where the blocker power is still relatively unattenuated. Hence we should expect an optimal input signal power at which the overall signal-to-noise-and-distortion ratio (SNDR) is maximized. This value of SNDR denotes the available Spurious-Free-Dynamic-Range (SFDR) of the system.

Compared to general filtering applications, the analog baseband section of a wireless receiver signal chain exhibits an important difference. The input to the analog baseband could contain the entire broadband spectral energy. The undesired neighboring channels (termed out-band blockers) could be higher than the desired signal power by up to 45dB (as defined by $UD_{dB}$, the undesired-to-desired ratio, in Section 2). The non-linearity terms generated by out-of-filter-band signals result in intermodulation (IM) products that fall within the signal band and corrupt the desired signal. Due to the presence of blocker power, analog baseband blocks must exhibit not only the desired in-filter-band DR but also good linearity for out-of-filter-band signals. This applies especially to the first baseband filtering block in the signal chain after down-conversion from RF.

An example of the effect of distortion due to out-band blockers on the total signal dynamic range is shown in Fig. 4.2. The figure plots the desired signal power, blocker power, integrated noise power (input referred) and $3^{rd}$ order (IM3) distortion power (input referred) as the input signal increases on a log-log scale. The distortion power plotted in the figure is generated by the intermodulation of two un-filtered out-band blockers such that the intermodulation term is in-band. In this example, the integrated noise power is independent of the input power and is marked at -76dBm (or 40µVrms), which is a typical number for a low pass filter with the desired signal bandwidth in the MHz range. The $UD_{dB}$ is set to 30dB in Fig. 4.2. The linearity performance of the building block is specified at different third-order intercept points (IIP3) of +20 to +35dBm, which is a typical range for a baseband low pass filter. It is well known that the distortion power due to the 3rd order inter-modulation increases with a slope of 3 dB/dB with the input power, while the desired (or the fundamental) signal increases with a slope of 1 dB/dB [27]. As indicated in the figure, the distortion power rises out of the output noise floor at different input power levels depending on the linearity performance. The higher the IIP3, the higher the blocker power (and hence the signal power for a given $UD_{dB}$) at which the distortion power rises above the noise floor. Any increase in the blocker power beyond this point results in the SNDR of the desired channel being limited by the distortion power, while any reduction in the signal power leads to a drop in the SNR. Hence the overall SFDR can be maximized if out-band linearity can be improved in an analog baseband filter for wireless receivers.

Figure 4.2. Typical desired, blocker and intermodulation signal power for different IIP3 values.

In conclusion, we make the following observations relevant to the design of analog filters for wireless receivers:

- The minimum in band SNR of the system is set by the modulation scheme being
  employed by the communication link. For low desired input power levels, the
  system is limited by SNR.

- Distortion power generated due to large desired input power levels (in-band
  signals) is generally lower than the minimum tolerable SNDR of the system.

- Distortion power generated due to large out-band blockers residing in adjacent
  channels, which could be $UD_{dB}$ higher than the desired power, limits the SNDR
  performance of the system. Out-band linearity performance must be improved
  to tolerate large blocker powers and keep the distortion power below the output
  noise floor.
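The trade-off summarized in these observations can be put into numbers using the example values of Fig. 4.2 (noise floor of -76dBm, $UD_{dB}$ = 30dB). The sketch below assumes two equal-power blockers and the standard relation for input-referred IM3 power, $3P_{blocker} - 2\,\text{IIP3}$ in dB units. It reports the SNR of the desired channel at the point where the IM3 power just reaches the noise floor, which serves here as a simple proxy for the achievable SFDR.

```python
NOISE_FLOOR_DBM = -76.0   # input-referred integrated noise used in the example
UD_DB = 30.0              # undesired-to-desired ratio used in the example


def im3_input_referred_dbm(p_blocker_dbm: float, iip3_dbm: float) -> float:
    """Input-referred IM3 power of two equal blockers (3 dB/dB slope)."""
    return 3.0 * p_blocker_dbm - 2.0 * iip3_dbm


def blocker_at_noise_floor_dbm(iip3_dbm: float, noise_dbm: float = NOISE_FLOOR_DBM) -> float:
    """Blocker power at which the IM3 power equals the noise floor."""
    return (noise_dbm + 2.0 * iip3_dbm) / 3.0


def snr_at_crossing_db(iip3_dbm: float, ud_db: float = UD_DB,
                       noise_dbm: float = NOISE_FLOOR_DBM) -> float:
    """SNR of the desired channel when distortion just reaches the noise floor."""
    desired_dbm = blocker_at_noise_floor_dbm(iip3_dbm, noise_dbm) - ud_db
    return desired_dbm - noise_dbm


for iip3 in (20.0, 25.0, 30.0, 35.0):
    p_blk = blocker_at_noise_floor_dbm(iip3)
    print(f"IIP3 = {iip3:+.0f} dBm: blocker = {p_blk:+.1f} dBm, "
          f"SNR at crossing = {snr_at_crossing_db(iip3):.1f} dB")
```

Under these assumptions, every 3dB of additional out-band IIP3 raises the usable SNR by 2dB, which is why the linearity of the first filtering block matters so much.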

With the above introduction to out-band linearity constraints, we will present a first
order active-RC filter (lossy integrator) with improved out-band blocker tolerance in
the following sections. As we will observe, the improvement in performance arises
from re-designing the active transconductor ($G_m$) used for the integrator design. First
we present the design of the filter and analyze the loop gain response followed by the
schematic design of the transconductor. The simulated results from the design will
be presented along with the noise and distortion performance for out-band blockers.

4.2 Lossy Integrator Design

Desired high order analog baseband transfer functions can be realized using cascaded second-order responses. The Tow-Thomas biquad structure, which uses two integrators in a feedback loop, is a popular method to implement biquadratic transfer functions. A two-integrator biquad structure using active-RC integrators is shown in Fig. 4.3. The loop contains a lossless integrator, a lossy integrator and a signal inversion to guarantee a negative feedback loop. The signal inversion can easily be achieved in a fully-differential implementation by crossing the differential signals. The resistor $R_1$ and capacitor $C_2$ are set based on the desired pole frequency ($\omega_p \text{ rad/s}$), while the resistor ratio ($R_2/R_1$) is set based on the quality factor ($Q_p$) of the pole. While operational amplifiers (Op-Amps) are used in the filter depicted in Fig. 4.3, OTAs can also be used as active gain stages since they provide an effective low output impedance in closed loop operation.
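The sizing rule stated above can be illustrated with a small helper. The sketch assumes the common equal-time-constant form of the two-integrator loop, in which both integrators use the same $R_1$ and the same capacitance, so that $\omega_p = 1/(R_1 C_2)$ and $Q_p = R_2/R_1$. The details of Fig. 4.3 may differ, and the numerical targets (4MHz, $Q_p$ = 0.707, 2pF) are hypothetical.

```python
import math


def size_two_integrator_biquad(f_p_hz: float, q_p: float, c_farad: float) -> tuple:
    """Return (R1, R2) for a pole frequency f_p and quality factor Q_p.

    Assumes w_p = 1/(R1*C) and Q_p = R2/R1 (equal integrator time constants).
    """
    r1 = 1.0 / (2.0 * math.pi * f_p_hz * c_farad)
    r2 = q_p * r1
    return r1, r2


def lossy_integrator_corner_hz(r2: float, c_farad: float) -> float:
    """Corner frequency of the lossy integrator alone: 1/(2*pi*R2*C2)."""
    return 1.0 / (2.0 * math.pi * r2 * c_farad)


F_P, Q_P, C_INT = 4e6, 0.707, 2e-12     # hypothetical targets and capacitor value

r1, r2 = size_two_integrator_biquad(F_P, Q_P, C_INT)
f_corner = lossy_integrator_corner_hz(r2, C_INT)

print(f"R1 = {r1 / 1e3:.2f} kOhm, R2 = {r2 / 1e3:.2f} kOhm")
print(f"Lossy integrator corner = {f_corner / 1e6:.2f} MHz")
```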

The non-linearity in the lossy integrator structure arises due to the non-linear $V-I$ characteristics of the active transconductors used for building the OTA. Unlike Gm-C filters, which rely on open-loop transconductors, active RC filters employ high gain transconductors in negative feedback. Application of negative feedback using a linear feedback network is the most common technique to reduce non-linear distortion. As noted in [43–45], while the nonlinearities of the feedback network are not suppressed by feedback, the distortion produced by the forward amplifier can be suppressed by a large loop gain if the feedback circuit is linear. The minimum loop gain at the corner frequency of the filter is set by the desired improvement in the linearity performance due to feedback. Hence we analyze the frequency dependent characteristics of the loop gain of a lossy integrator in the following section. The analysis also helps us identify the poles and zeros present in the loop to ensure stability of the negative feedback system. It should also be noted that the effect of finite DC gain and finite gain-bandwidth (GBW) on the accuracy of $\omega_p$ and $Q_p$ in active RC filters has been studied extensively in the literature. Interested readers are referred to excellent texts [46].

Figure 4.3. Two integrator loop for implementing second-order transfer functions.

4.2.1 Loop Gain of a Lossy Integrator Using a Feed-Forward OTA

In this section we analyze the small signal model and the loop gain of a lossy integrator designed using a feed-forward OTA. The feed-forward OTA structure offers several advantages over the Miller-OTA and has been widely used for the design of active RC filters [47–52].

A lossy integrator structure using a feed-forward OTA is shown in Fig. 4.4. The load resistor ($R_{\text{load}}$) value is equal to $R_1$ when it is used in the two-integrator loop biquad as indicated earlier in Fig. 4.3. In addition to $R_1$, $R_2$, $C_2$ and output load capacitance $C_o$, the schematic also includes the parasitic capacitor $C_p$ due to the input stage of the OTA. The internal small signal model of the OTA is also shown in the figure. A feed-forward OTA structure is composed of three transconductors ($g_{m1}$, $g_{m2}$, $g_{m3}$). Two transconductors are arranged in a cascaded fashion ($g_{m1}$ and $g_{m2}$), while the third stage ($g_{m3}$) feed-forwards the signal from the input to the output stage. Such arrangement of transconductors creates a zero which can be used to compensate a non-dominant pole in a feedback loop. The input capacitance of the OTA $C_p$ could be significant due to Miller-effect in the first stage and the feed-forward stage especially in the absence of a cascode device.

The loop gain response of the integrator can be analyzed as shown in Fig. 4.5. To simplify the analysis, the loop can be decomposed as a cascaded combination of a frequency-dependent OTA ($G_m$) and a frequency-dependent feedback network as shown. The equivalent output impedance ($r_o$) includes the parallel combination of $r_{o2}$, $r_{o3}$ and $R_{\text{load}}$. The impedances $z_o$, $z_1$ and $z_2$ are given as,

$$z_o = r_o||\frac{1}{sC_o}, \quad z_1 = R_1||\frac{1}{sC_p}, \quad z_2 = R_2||\frac{1}{sC_2}$$

The transconductance $G_m$ is given by,

$$G_m = -\left(g_{m1}\left(r_{o1}||\frac{1}{sC_1}\right)g_{m2} + g_{m3}\right) \tag{4.4}$$

which can be simplified as,

$$G_m = -G_{m,dc}\frac{1 + \frac{s}{\omega_{z,gm}}}{1 + \frac{s}{\omega_{p,gm}}} \tag{4.5}$$

where $\omega_{p,gm}$ and $\omega_{z,gm}$ are the pole and zero of the transconductor response. $G_{m,dc}$ is the low-frequency transconductance given by,

$$G_{m,dc} = g_{m1}r_{o1}g_{m2} + g_{m3} \tag{4.6}$$

The output of the first stage contains a pole at $\omega_{p,gm}$ given by,

$$\omega_{p,gm} = \frac{1}{r_{o1}C_1} \tag{4.7}$$

The location of the feed-forward zero is given by,

$$\omega_{z,gm} = \omega_{p,gm}\left(1 + \frac{g_{m1}r_{o1}g_{m2}}{g_{m3}}\right) \tag{4.8}$$

Figure 4.5. Loop gain for the lossy integrator using the OTA.

The frequency-dependent loop-gain response can then be obtained as,

$$LG(s) = \frac{v_{fb}}{v_x} = \frac{G_m r_o R_1}{r_o + R_1 + R_2}\,\frac{1 + \frac{s}{\omega_{z,RC}}}{1 + a_1 s + a_2 s^2} \tag{4.9}$$

where

$$\omega_{z,RC} = \frac{1}{R_2 C_2} \tag{4.10}$$

and the coefficients $a_1$ and $a_2$ of the quadratic denominator are,

$$a_1 = \frac{r_o(R_1C_p + R_2C_2) + R_2(r_oC_o + R_1C_p) + R_1(r_oC_o + R_2C_2)}{r_o + R_1 + R_2} \tag{4.11}$$

$$a_2 = \frac{r_o(R_1C_pR_2C_2) + R_2(r_oC_oR_1C_p) + R_1(r_oC_oR_2C_2)}{r_o + R_1 + R_2} \tag{4.12}$$

As can be seen, the denominator is only second order despite the presence of three capacitors ($C_p$, $C_2$ and $C_o$), since there are only two independent nodes ($v_x$ and $v_o$). To gain more insight and estimate the approximate location of the two poles, we make the following observations:

- If the output of the gain block were instead modeled as a voltage amplifier (with a low output impedance), it can be easily shown that the output impedance and the load capacitance ($r_o$ and $C_o$) would not appear in the loop gain expression. Instead, the feedback factor would depend only on $z_2$ and $z_1$. The resulting loop gain would contain a zero at $(R_2C_2)^{-1}$ and a pole at $((R_1||R_2)(C_2 + C_p))^{-1}$.

- In the OTA model, the output impedance \(r_o\) is a parallel combination of \(r_{o2}, r_{o3}, R_{load}\) and any resistor used for common-mode feedback sensing. When \(R_{load} = R_1\), this leads to \(r_o\) being less than both \(R_1\) and \(R_2\).

- The capacitance \(C_2\) is used for defining the pole location of the lossy integrator. Hence \(C_2\) is typically larger than \(C_p\) which is the parasitic input capacitance of the OTA. Unless the lossy integrator is driving a large capacitive load, the output capacitance \(C_o\) is also less than \(C_2\).

- The quadratic coefficients can be used to isolate the two roots when one of the roots is much larger than the other: the smaller root is then approximately $1/a_1$ and the larger root approximately $a_1/a_2$.

With these assumptions, the quadratic expression leads to two simple roots given by,

$$\omega_{p,RC} \approx \frac{1}{(R_2||(r_o + R_1))\left(C_2 + \frac{C_pC_o}{C_p + C_o}\right)} \tag{4.13}$$

$$\omega_{p,C_p} \approx \frac{1}{(R_1||r_o)(C_p + C_o)} \tag{4.14}$$

with $\omega_{p,C_p} > \omega_{p,RC}$.

Figure 4.6. Loop gain response for the lossy integrator using the feed-forward OTA.
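The quality of the approximations (4.13) and (4.14) can be checked numerically against the exact roots of the quadratic built from (4.11) and (4.12). The sketch below is only illustrative: $R_1$, $R_2$ and $C_2$ are the values used later in this chapter, while `ro`, `Cp` and `Co` are hypothetical values that are not taken from the design.

```python
import math

def par(a, b):
    """Parallel combination of two resistances."""
    return a * b / (a + b)

# Element values quoted in the text
R1 = R2 = 5e3          # ohms
C2 = 3.2e-12           # farads
# Hypothetical values, chosen only to exercise the formulas
ro = 2e3               # ohms
Cp = 0.2e-12           # farads
Co = 0.3e-12           # farads

# Exact coefficients (4.11) and (4.12)
r_sum = ro + R1 + R2
a1 = (ro * (R1 * Cp + R2 * C2) + R2 * (ro * Co + R1 * Cp)
      + R1 * (ro * Co + R2 * C2)) / r_sum
a2 = (ro * R1 * Cp * R2 * C2 + R2 * ro * Co * R1 * Cp
      + R1 * ro * Co * R2 * C2) / r_sum

# Exact pole frequencies: roots of 1 + a1*s + a2*s^2 = 0 (both real here)
root = math.sqrt(a1 * a1 - 4.0 * a2)
w_low_exact = (a1 - root) / (2.0 * a2)
w_high_exact = (a1 + root) / (2.0 * a2)

# Approximations (4.13) and (4.14)
w_p_rc = 1.0 / (par(R2, ro + R1) * (C2 + Cp * Co / (Cp + Co)))
w_p_cp = 1.0 / (par(R1, ro) * (Cp + Co))

for name, approx, exact in (("w_p,RC", w_p_rc, w_low_exact),
                            ("w_p,Cp", w_p_cp, w_high_exact)):
    print(f"{name}: approx {approx / (2e6 * math.pi):.1f} MHz, "
          f"exact {exact / (2e6 * math.pi):.1f} MHz")
```

For these assumed values the two poles are well separated, which is the condition under which the approximate expressions are expected to hold.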

Using (4.5), (4.13) and (4.14), we can rewrite (4.9) as,

$$LG(s) = \frac{v_{fb}}{v_x} = -\left[\frac{G_{m,dc}r_oR_1}{r_o + R_1 + R_2}\right] \frac{\left(1 + \frac{s}{\omega_{z,gm}}\right)\left(1 + \frac{s}{\omega_{z,RC}}\right)}{\left(1 + \frac{s}{\omega_{p,gm}}\right)\left(1 + \frac{s}{\omega_{p,RC}}\right)\left(1 + \frac{s}{\omega_{p,C_p}}\right)} \tag{4.15}$$

In summary, the OTA contributes a low-frequency pole $\omega_{p,gm}$ and a high-frequency zero $\omega_{z,gm}$ to the overall loop gain. The feedback network, consisting of $R_1$ and $R_2||C_2$, creates a zero-pole pair ($\omega_{z,RC}$ and $\omega_{p,RC}$) that track each other as the location of the filter pole changes. The parasitic capacitor at the input of the OTA, along with the load capacitor $C_o$, creates a pole ($\omega_{p,C_p}$). The loop gain of the lossy integrator structure is shown in Fig. 4.6. In the figure, $LG_0$ is the magnitude of the DC loop gain,

$$LG_0 = \frac{G_{m,dc} r_o R_1}{r_o + R_1 + R_2} \tag{4.16}$$

Also, as indicated in the figure, the zero $\omega_{z,gm}$ can be placed to cancel the non-dominant pole $\omega_{p,C_p}$. This approximate pole-zero cancellation results in an approximately first-order loop-gain response with $\omega_{p,gm}$ as the dominant pole.

4.3 Lossy Integrator Design Using the Proposed OTA

This section presents the design of two lossy integrators and a comparison of the conventional OTA with the proposed OTA used in the two integrators. First we present the integrator design, followed by details of the proposed and the conventional OTA. Simulation results to demonstrate the linearity improvement offered by the proposed OTA are presented in the subsequent section.

In this test prototype, the bandwidth of the lossy integrators is set to 10MHz. The schematic of the integrator is identical to the structure shown in Fig. 4.4. Passband gain is set to 1 (0dB) resulting in $R_1 = R_2$. The capacitance of the integrator ($C_2$) is set to 3.2pF, such that two 5kΩ resistors can be used for $R_1$ and $R_2$ for a 10MHz bandwidth. This choice of $R_1$, $R_2$ and $C_2$ enables us to obtain relatively moderate levels of integrated output noise while not presenting a large resistive load to the previous stages in the signal chain. The values of $R_1$, $R_2$ and $C_2$ are identical in the two integrators to ensure fair comparison of the two OTAs. Poly resistors and MIM (metal-insulator-metal) capacitors have been used to implement the passive components and their layouts are identical.
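As a quick sanity check of the component choice, the corner frequency $(R_2C_2)^{-1}$ and the passband gain $R_2/R_1$ can be evaluated directly from the quoted values. The variable names below are introduced only for this sketch.

```python
import math

R1 = 5e3       # ohms, from the text
R2 = 5e3       # ohms, from the text
C2 = 3.2e-12   # farads, from the text

f_corner_hz = 1.0 / (2.0 * math.pi * R2 * C2)   # filter pole (R2*C2)^-1 in Hz
passband_gain = R2 / R1                          # magnitude of the DC gain

print(f"corner frequency = {f_corner_hz / 1e6:.2f} MHz")
print(f"passband gain    = {passband_gain:.1f}")
```

The result is slightly below 10MHz, consistent with the nominal 10MHz bandwidth stated above.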

Conceptually, the difference between the conventional OTA and the proposed OTA is shown in Fig. 4.7. The structure of the conventional OTA [47–50] uses three fully-differential structures for implementing the three $g_m$ stages. In the proposed OTA, the fully differential second stage ($g_{m2}$) and the feed-forward stage ($g_{m3}$) have been replaced by complementary pseudo-differential stages. We refer to the conventional OTA as FFF (indicating three fully-differential stages) and the proposed OTA as FPP (indicating one fully-differential and two pseudo-differential stages) in the subsequent discussion. While the second-stage ($g_{m2}$) can be easily implemented as a
pseudo-differential stage, the feed-forward stage cannot be isolated and implemented as a pseudo-differential stage as long as it is DC coupled to the OTA input. In this work, we propose to decouple the DC biasing of the feed-forward stage from the input nodes of the OTA and *AC couple* the signal. As analyzed in Section 4.2.1, the feed-forward stage is useful in creating $\omega_{z,gm}$ to cancel the non-dominant pole in the negative feedback loop. This means that an AC coupling arrangement is feasible as long as we can guarantee a stable loop gain response. As we will see, the main advantage of this structure is the increased headroom at the output, which delivers the desired improvement in linearity performance.

The schematics of the FFF OTA and the proposed FPP OTA are shown in Fig. 4.8 and Fig. 4.9 respectively. Both OTAs implement identical transconductance values of $g_{m1} = 4mA/V$, $g_{m2} = 1mA/V$ and $g_{m3} = 2mA/V$ with identical bias levels (equal $g_m/I_D$ levels) and channel lengths (for the same intrinsic $r_o$) for the active devices. These $g_m$ values result in a low-frequency transconductance $G_{m,dc} = 22mA/V$.

As can be seen, the first fully-differential stage ($g_{m1}$) is identical in both OTAs. The fully-differential NMOS input pair (M1n) is biased by the tail current source (M1cs) and acts as the first stage transconductor $g_{m1}$. The output small-signal current is converted into a differential voltage ($V_{o1p}-V_{o1m}$) at the output of the first stage by the parallel combination of the output impedances of the transistors (M1n and M1p), an additional parallel $RC$ network and the input parasitic capacitance of the second stage $g_{m2}$. The additional parallel $RC$ network ($10k\Omega||2.2pF$) at the output of the first stage helps to define a stable pole location $\omega_{p,gm}$ in the loop gain response and also sets the common-mode level at the output of the first stage. The $10k\Omega$ resistor, along with the finite output impedance of M1n and M1p, yields an effective output impedance $r_{o1} \approx 5k\Omega$. The $2.2pF$ capacitance adds to the input capacitance of the second stage and the parasitic capacitance at the output. Effectively, this results in a nominal $\omega_{p,gm}$ around $2\pi(10MHz)$ (due to $\sim 5k\Omega||3pF$) and hence a feed-forward zero ($\omega_{z,gm}$) from the transconductor around $2\pi(110MHz)$.

Figure 4.8. Schematic of the conventional OTA.

Figure 4.9. Schematic of the proposed OTA.
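The numbers quoted above follow from (4.6), (4.7) and (4.8). A minimal sketch, using the $g_m$ values of this design and the approximate first-stage values $r_{o1} \approx 5k\Omega$ and $C_1 \approx 3pF$ (the variable names are ours):

```python
import math

gm1, gm2, gm3 = 4e-3, 1e-3, 2e-3   # A/V, from the text
ro1 = 5e3                           # ohms, effective first-stage output resistance
C1 = 3e-12                          # farads, approximate first-stage capacitance

Gm_dc = gm1 * ro1 * gm2 + gm3                      # (4.6)
f_p_gm = 1.0 / (2.0 * math.pi * ro1 * C1)          # (4.7), in Hz
f_z_gm = f_p_gm * (1.0 + gm1 * ro1 * gm2 / gm3)    # (4.8), in Hz

print(f"Gm_dc  = {Gm_dc * 1e3:.1f} mA/V")
print(f"f_p,gm = {f_p_gm / 1e6:.1f} MHz")
print(f"f_z,gm = {f_z_gm / 1e6:.1f} MHz")
```

With these values the zero sits a factor of 11 above the pole, so a pole near 10MHz places the zero near 110MHz.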

In the FFF OTA design shown in Fig. 4.8, the output of the first stage is DC coupled to the input of the second stage implemented with the fully differential NMOS pair M2n. The feed-forward stage is implemented using the fully-differential NMOS pair (M3n). The gate of the feed-forward stage is DC coupled to the input nodes of the OTA (Vip, Vim) as shown. The load current source (Mc) is controlled using a common-mode feedback network. The details of the CMFB design will be addressed in the next section. The available single-ended output swing ($V_{se,swing,fff}$) for this OTA can be obtained based on highest ($V_{high,fff}$) and the lowest ($V_{low,fff}$) permissible voltages at the output node. Referring to Fig. 4.8,

$$V_{high,fff} < (V_{dd} - V_{dsat,Mc})$$ (4.17)

$$V_{low,fff} > Max((V_{dd} - V_{GS,M1p} - V_{GS,M2n} + V_{dsat,M2n}); (V_{incm} - V_{GS,M3n} + V_{dsat,M3n}))$$ (4.18)

which can be simplified assuming ($V_{GS} = V_T + V_{dsat}$) as,

$$V_{low,fff} > Max((V_{dd} - V_{GS,M1p} - V_{T,M2n}); (V_{incm} - V_{T,M3n}))$$ (4.19)
Hence we can obtain the available single-ended and differential output swings as,

$$V_{se,swing,fff} = V_{high,fff} - V_{low,fff} \tag{4.20}$$

$$V_{diff,swing,fff} = 2V_{se,swing,fff} \tag{4.21}$$

The conditions for the low side swing limits arise due to (1) DC coupling from the output of the first stage to the gates of the second stage and (2) DC coupling from input of the first stage to the feed-forward stage. As it can be seen, on the high side, the swing limit can be easily set, but for the low side the swing limit depends on the input common-mode \( V_{incm} \) and \( V_{GS,M1p} \) of the first stage. Under optimum biasing conditions, the gate bias voltages at the input of the second stage and the feed-forward stage can be \( \approx V_T + 2V_{dsat} \), reducing the low-side limitation to \( 2V_{dsat} \).

In summary, swing at the output node has to accommodate at least one \( V_{dsat} \) on the high side and \( 2V_{dsat} \) on the low side even with optimum biasing conditions.

In the FPP OTA design shown in Fig. 4.9, the output of the first stage is DC coupled to the input of the second stage implemented using a pseudo-differential PMOS pair (M2p). The feed-forward stage is implemented using a pseudo-differential NMOS pair (M3n). The gate of the feed-forward stage is AC coupled to the input nodes of the OTA (Vip, Vim) as shown. The DC biasing resistor and AC coupling capacitor are also shown in the schematic. Since part of the bias current for the feed-forward stage is already provided by M2p, the remaining bias current is provided with the transistor M2c which also helps to introduce common-mode feedback to control the output common-mode voltage. The details of the CMFB design in comparison to the FFF OTA CMFB design will be addressed in the next section. The available single-ended output swing for this OTA can be obtained based on highest \( V_{high,fpp} \) and the lowest \( V_{low,fpp} \) permissible voltages at the output node. Referring to Fig. 4.9,
$$V_{high,fpp} < (V_{dd} - V_{dsat,M2c}) \tag{4.22}$$

$$V_{low,fpp} > V_{dsat,M3n} \tag{4.23}$$

$$V_{se,swing,fpp} = V_{high,fpp} - V_{low,fpp} \tag{4.24}$$

$$V_{diff,swing,fpp} = 2V_{se,swing,fpp} \tag{4.25}$$

As it can be seen, the swing at the output node has to accommodate at least one \( V_{\text{dsat}} \) on the high side and one \( V_{\text{dsat}} \) on the low side.
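The headroom difference between the two OTAs can be illustrated with a short calculation. The sketch assumes the same overdrive $V_{dsat} = 0.15V$ for every device (an illustrative value, not a measured one) and optimum biasing for the FFF OTA, so that its low-side limit is $2V_{dsat}$.

```python
Vdd = 1.8      # V, nominal supply of this design
Vdsat = 0.15   # V, assumed identical overdrive for all devices (illustrative)

# FFF under optimum biasing: one Vdsat on the high side, two on the low side
swing_se_fff = (Vdd - Vdsat) - 2.0 * Vdsat
# FPP: one Vdsat on the high side and one on the low side
swing_se_fpp = (Vdd - Vdsat) - Vdsat

swing_diff_fff = 2.0 * swing_se_fff
swing_diff_fpp = 2.0 * swing_se_fpp

print(f"FFF: {swing_se_fff:.2f} V single-ended, {swing_diff_fff:.2f} V differential")
print(f"FPP: {swing_se_fpp:.2f} V single-ended, {swing_diff_fpp:.2f} V differential")
```

Under these assumptions the FPP OTA gains one $V_{dsat}$ of single-ended swing, or $2V_{dsat}$ differentially.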

The supply voltage for this design (in 0.18$\mu$m technology) is set to 1.8V (nominal). The output common-mode voltage is set to 1V using a CMFB loop with a reference voltage. The input common-mode level of the integrator is also set to 1V. This, along with the external loop of the integrator through $R_1$ and $R_2||C_2$, biases the input of the OTA at 1V. As we will see in Section 4.4, to ensure a fair comparison of the linearity performance of the two designs, both integrators can also be operated with the supply voltage at 1.6V and the common-mode voltage set to 0.9V.

The simulated frequency response of the integrators is shown in Fig. 4.10. The simulated differential loop gain of both the integrators is shown in Fig. 4.11. As shown both the integrators have identical frequency response with loop gain-bandwidth product (GBW) of around 250MHz which guarantees a gain of 25 at the 10MHz filter corner frequency. Using the results from [43], a loop gain of 25 offers an IIP3 improvement of \(30\log(1 + 25) = 42dB\) from the inherent open-loop IIP3 of the transconductor without feedback.
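The arithmetic behind these figures is sketched below. It assumes a first-order loop-gain roll-off, so that the loop gain at the corner frequency is simply the loop GBW divided by that frequency, and it applies the $30\log(1 + LG)$ relation quoted above from [43].

```python
import math

gbw_hz = 250e6        # loop gain-bandwidth product, from the text
f_corner_hz = 10e6    # filter corner frequency, from the text

# First-order roll-off assumed: |LG| ~ GBW / f well below the unity-gain frequency
loop_gain = gbw_hz / f_corner_hz
iip3_improvement_db = 30.0 * math.log10(1.0 + loop_gain)

print(f"loop gain at corner = {loop_gain:.0f}")
print(f"IIP3 improvement    = {iip3_improvement_db:.1f} dB")
```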
4.3.1 Dominant Sources of Non-Linearity

In this section we identify the dominant contributors to the output non-linearity in lossy integrator designs using the FFF and FPP OTAs. A description of the inherent nonlinearities of the pseudo- and fully-differential transconductors is presented. Then we evaluate the power of the $IM_3$ term when two tones at $f_1$ (Hz) and $f_2$ (Hz) are applied to the lossy integrator. The method is then used to find the out-band linearity performance of the lossy integrators built using both FFF and FPP OTAs. In the out-band linearity test, $f_1$ and $f_2$ are appropriately selected in the filter stop-band such that the low frequency $IM_3$ tone at $f_3 = 2f_1 - f_2$ is within the filter bandwidth as shown in Fig. 4.12. The goal of the analysis is to evaluate the power of the tone at $f_3$ and also identify the dominant sources of contribution as a function of frequency.

Figure 4.11. Simulated differential loop gain response from the conventional OTA.

4.3.1.1 Inherent Transconductor Non-linearities

Transconductors used for analog signal processing exhibit weakly non-linear behavior. This behavior can be mathematically expressed using a power series as,

$$y = a_1 u + a_2 u^2 + a_3 u^3$$  \hspace{1cm} (4.26)

where $y$ is the output for an input $u$. For a transconductor, $y$ is the output current for a given input voltage $u$. The coefficient $a_1$ is the transconductance, while $a_2$
and $a_3$ are the second- and third-order non-linearity coefficients which determine the second- and third-order nonlinear circuit behavior.

A MOSFET biased in the saturation region is commonly used as a transconductor. Using a square-law approximation, drain current ($i_{DS}$) of a MOSFET for an input voltage $v_{GS}$ is given by,

$$i_{DS} = I_{DS} + i_{ds} = \frac{\mu C_{ox}}{2} \left( \frac{W}{L} \right) (v_{GS} - V_T)^2 = \frac{\beta}{2} (v_{GS} - V_T)^2$$  \hspace{1cm} (4.27)

Using $v_{GS} = V_{GS} + v_{gs}$ and $V_{dsat} = V_{GS} - V_T$, we can evaluate $a_1$, $a_2$ and $a_3$ as,

$$a_1 = \beta V_{dsat}, \quad a_2 = \frac{\beta}{2}, \quad a_3 = 0$$  \hspace{1cm} (4.28)

Hence the single-ended MOSFET transconductor exhibits only second-order non-linearity. For a pseudo-differential transconductor, two such transistors are used (without a tail-current source) with a differential input voltage. The non-linearity coefficients for a pseudo-differential transconductor then are,

$$a_1 = \beta V_{dsat}, \quad a_2 = 0, \quad a_3 = 0$$  \hspace{1cm} (4.29)
Hence ideally the pseudo-differential transconductor does not exhibit either second- or third-order non-linearity with a simple square-law approximation. Taking mobility degradation due to vertical field into account, the $i_{DS}$ can be modeled as,

$$i_{DS} = I_{DS} + i_{ds} = \frac{\beta}{2} \frac{(v_{GS} - V_T)^2}{1 + \theta(v_{GS} - V_T)} \approx \frac{\beta}{2} (v_{GS} - V_T)^2 \left(1 - \theta(v_{GS} - V_T)\right)$$  \hspace{1cm} (4.30)

where $\theta$ is a fitting parameter (with $\theta_{nom} \approx 0.4 - 0.8V^{-1}$).

The non-linearity coefficients for a pseudo-differential transconductor then are,

$$a_1 = \beta V_{dsat} \left(1 - \frac{3\theta V_{dsat}}{2}\right), \quad a_2 = 0, \quad a_3 = -\frac{\theta \beta}{8}$$  \hspace{1cm} (4.31)

which can be rewritten as,

$$a_1 \approx \beta V_{dsat}, \quad a_2 = 0, \quad a_3 \approx -\frac{\theta}{8V_{dsat}}a_1$$  \hspace{1cm} (4.32)

where $a_3$ is expressed in terms of $a_1$.

For a fully-differential transconductor (with a tail current-source $I_{tail}$), the output current can be obtained as,

$$i_o = g_m v_i \sqrt{1 - \left(\frac{v_i}{2V_{dsat}}\right)^2}$$  \hspace{1cm} (4.33)

which leads to,

$$a_1 = \frac{I_{tail}}{V_{dsat}}, \quad a_2 = 0, \quad a_3 = -\frac{1}{8} \frac{g_m}{V_{dsat}^2} = -\frac{1}{8V_{dsat}^2}a_1$$  \hspace{1cm} (4.34)

Hence it can be seen from (4.32) and (4.34) that for the identical $a_1$ (equal $g_m$),

$$\frac{a_{3, pd}}{a_{3, fd}} = \theta V_{dsat}$$  \hspace{1cm} (4.35)
For a $\theta_{nom} = 0.8V^{-1}$ and $V_{dsat} = 0.15V$, $\frac{a_{3,\text{pd}}}{a_{3,\text{fd}}} = 0.12$, which means that $a_{3,\text{pd}}$ is $\approx 8X$ smaller than $a_{3,\text{fd}}$.
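The ratio in (4.35) can be cross-checked numerically by extracting the cubic coefficient directly from the large-signal expressions (4.30) and (4.33). In the sketch below the common transconductance `gm` and the test amplitude `v` are arbitrary assumptions; $\theta$ and $V_{dsat}$ are the values of the example above.

```python
theta = 0.8      # 1/V, upper end of the range quoted in the text
v_dsat = 0.15    # V, overdrive used in the example above
gm = 1e-3        # A/V, arbitrary common transconductance (assumption)

# Closed-form third-order coefficients, (4.32) and (4.34), for equal a1 = gm
a3_pd = -theta * gm / (8.0 * v_dsat)
a3_fd = -gm / (8.0 * v_dsat ** 2)
ratio = a3_pd / a3_fd            # (4.35): equals theta * v_dsat

# Numerical cross-check against the large-signal models
beta = gm / v_dsat

def i_single(v_ov):
    """Approximate drain current of (4.30) versus overdrive voltage."""
    return 0.5 * beta * v_ov ** 2 * (1.0 - theta * v_ov)

def i_pd(v):
    """Differential output current of the pseudo-differential pair."""
    return i_single(v_dsat + v / 2.0) - i_single(v_dsat - v / 2.0)

def i_fd(v):
    """Differential output current of the tail-biased pair, (4.33)."""
    return gm * v * (1.0 - (v / (2.0 * v_dsat)) ** 2) ** 0.5

v = 1e-2  # small differential test amplitude in volts (assumption)
a1_pd = beta * v_dsat * (1.0 - 1.5 * theta * v_dsat)   # a1 of (4.31)
a3_pd_num = (i_pd(v) - a1_pd * v) / v ** 3
a3_fd_num = (i_fd(v) - gm * v) / v ** 3

print(f"a3_pd / a3_fd = {ratio:.3f}")
```

The numerically extracted coefficients agree with the closed forms to within the truncation error of the series expansion of (4.33).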

Second order non-linearity coefficient $a_2 = 0$ in both pseudo- and fully-differential transconductors only in the absence of any mismatches. The presence of random offsets ($\Delta V_T$ and $\Delta \beta$) and systematic layout gradients gives rise to second-order non-linearity in both the transconductors. Second order non-linearity coefficient can produce a third-order distortion component when the non-linear device is placed in feedback [45]. This is applicable to feedback systems with a single-ended stage in the feedback loop (e.g. differential input, single-ended output stage Miller amplifier used in feedback). In our example, $a_3$ is dominant for both pseudo- and fully-differential transconductors but $a_2$ exists only due to mismatches. Although $a_2$ cannot be ignored under mismatch conditions, we do so in the following analysis to obtain insight into the frequency dependent non-linearity estimation originating due to $a_3$.

4.3.1.2 Frequency Dependent Non-Linearity Estimation

To find the relative non-linearity contributions from each stage, we derive the transfer function from the filter input ($v_i$) to the input of each stage (virtual ground $v_x$ and node $v_{x2}$ in Fig. 4.4). An OTA (with transconductance $G_m$) in a negative feedback inverting configuration using admittances $Y_1$, $Y_2$ with a load $Y_o$ is shown in Fig. 4.13. As noted in [53], the transfer function $H(s)$ from the input $v_i$ to the output $v_o$ is given by,

$$H(s) = \frac{v_o}{v_i}(s) = -\frac{Y_1}{Y_2}\left(\frac{1 - \frac{Y_2}{G_m}}{1 + \frac{Y_1 + Y_o + Y_1Y_o/Y_2}{G_m}}\right) \tag{4.36}$$

If $G_m$ is sufficiently large, then $H(s)$ can be approximated as,

$$H(s) \approx -\frac{Y_1}{Y_2}$$

(4.37)
We can also obtain the transfer function from $v_i$ to $v_x$ as,

$$
H_{vx}(s) = \frac{v_x}{v_i}(s) = \frac{Y_1(Y_2 + Y_o)}{Y_2 G_m} \left( \frac{1}{1 + \frac{Y_1 + Y_o + Y_1 Y_o / Y_2}{G_m}} \right) \tag{4.38}
$$

Then using the feed-forward OTA topology (cf. Fig. 4.4) for the $G_m$, the transfer function from $v_i$ to $v_{x2}$ is,

$$H_{vx2}(s) = \frac{v_{x2}}{v_i}(s) = H_{vx}(s)A_1(s) \tag{4.39}$$

where $A_1(s)$ is the gain of the first stage given by,

$$A_1(s) = \frac{A_{1,dc}}{1 + s/\omega_{p,gm}} = \frac{g_{m1}r_{o1}}{1 + s/\omega_{p,gm}} \tag{4.40}$$

with $A_{1,dc} = g_{m1}r_{o1}$ being the DC gain of the first stage of the OTA. For the lossy integrator shown in Fig. 4.4, $Y_1 = 1/R_1$, $Y_2 = (1 + sR_2C_2)/R_2$ and $Y_o = (1 + sr_oC_o)/r_o$. Hence the filter response can be approximated using (4.37) as,

$$
H(s) \approx -\frac{R_2}{R_1} \frac{1}{1 + s/\omega_{p,filter}} \tag{4.41}
$$

where $\omega_{p,filter} = (R_2C_2)^{-1}$.
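The closed-form expression (4.38) can be checked against a direct solution of the two node equations of Fig. 4.13. The sketch below assumes an ideal inverting transconductor (it draws $G_m v_x$ out of the output node), a frequency-independent $G_m = 22mA/V$, and hypothetical values for `ro` and `Co`; only $R_1$, $R_2$ and $C_2$ are taken from the design.

```python
import math

def admittances(f_hz, R1=5e3, R2=5e3, C2=3.2e-12, ro=2e3, Co=0.3e-12):
    """Return (Y1, Y2, Yo) of the lossy integrator at frequency f_hz."""
    s = 2j * math.pi * f_hz
    return 1 / R1, (1 + s * R2 * C2) / R2, (1 + s * ro * Co) / ro

def solve_nodes(f_hz, Gm):
    """Solve the two KCL equations for vi = 1 and return (vo, vx)."""
    Y1, Y2, Yo = admittances(f_hz)
    # Node vx: (Y1 + Y2) vx - Y2 vo = Y1
    # Node vo: (Gm - Y2) vx + (Yo + Y2) vo = 0
    det = (Y1 + Y2) * (Yo + Y2) + Y2 * (Gm - Y2)
    vx = Y1 * (Yo + Y2) / det
    vo = -Y1 * (Gm - Y2) / det
    return vo, vx

def hvx_closed_form(f_hz, Gm):
    """Transfer function from vi to vx as written in (4.38)."""
    Y1, Y2, Yo = admittances(f_hz)
    return Y1 * (Y2 + Yo) / (Y2 * Gm) / (1 + (Y1 + Yo + Y1 * Yo / Y2) / Gm)

f_test = 1e6          # Hz, inside the passband
Gm_test = 22e-3       # A/V, low-frequency transconductance of this design
vo, vx = solve_nodes(f_test, Gm_test)
Y1, Y2, _ = admittances(f_test)
h_ideal = -Y1 / Y2    # large-Gm limit, (4.37)

print(f"|vx/vi| nodal = {abs(vx):.4f}, closed form = {abs(hvx_closed_form(f_test, Gm_test)):.4f}")
print(f"|vo/vi| nodal = {abs(vo):.4f}, ideal = {abs(h_ideal):.4f}")
```

The residual difference between the nodal $|v_o/v_i|$ and the ideal value reflects the finite loop gain and shrinks as $G_m$ is increased.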

The simulated magnitude response of $H(s)$, $H_{vx}(s)$ and $H_{vx2}(s)$ are shown in Fig. 4.14. The responses can be qualitatively explained in three separate (low-, mid-, and high-frequency) ranges.

In the low-frequency range ($\omega \ll \omega_{p,filter}$ and $\omega \ll \omega_{p,gm}$) each capacitor can be approximated by an open circuit. Assuming $R_1 = R_2 = R$ (as in our design), (4.37), (4.38) and (4.39) can be simplified as,

$$|H_{dc}| \approx 1 \tag{4.42}$$

$$|H_{vx,dc}| \approx \frac{1}{G_m(r_o||R)}\,\frac{1}{1 + \frac{2}{|A_{dc}|}} = \frac{r_o + R}{r_o + 2R}\,\frac{2}{2 + |A_{dc}|} \tag{4.43}$$

where $A_{dc}$ is the open-loop DC gain of the OTA given by,

$$|A_{dc}| = G_m(r_o||2R) \tag{4.44}$$

and assuming $|A_{dc}| \gg 2$, we get

$$|H_{vx,dc}| \approx \frac{r_o + R}{r_o + 2R}\,\frac{2}{|A_{dc}|} \tag{4.45}$$

where $A_{dc} = A_{1dc}A_{2dc}$ is assumed, with $A_{2dc}$ being the DC gain of the second stage. This leads to the expected low-frequency magnitude response shown in Fig. 4.14. Essentially, at low frequency $|H_{vx}|$ is inversely proportional to $A_{dc}$ and $|H_{vx2}|$ is inversely proportional to $A_{2dc}$.

Figure 4.13. OTA in feedback using admittances $Y_1$ and $Y_2$ and driving a load $Y_o$.

Figure 4.14. Simulated transfer functions from filter input to $v_o$, $v_x$ and $v_{x2}$.

In the high-frequency range (when $|z_{C2}|$ is small), $v_x$ and $v_o$ are effectively *shorted*, leaving the feed-forward stage $g_{m3}$ diode-connected. Then we can obtain $H_{vx}(s)$ and $H(s)$ as,

$$H_{vx}(s) \approx H(s) \approx \frac{r_o||\frac{1}{g_{m3}}}{\left(r_o||\frac{1}{g_{m3}}\right) + R_1}\,\frac{1}{1 + s\left(r_o||R_1||\frac{1}{g_{m3}}\right)(C_p + C_o)} \tag{4.46}$$

and

$$H_{vx2}(s) = H_{vx}(s) \frac{A_{1,dc}}{(1 + s/\omega_{p,gm})}$$

(4.47)

Hence $|H_{vx2}|$ rolls-off faster with frequency compared to $|H_{vx}|$ and $|H|$ which exhibit a single-pole high-frequency response.

In the mid-frequency range ($\omega > \omega_{p,gm}$ and $\omega > \omega_{p,filter}$), $C_1$ (at the output of the first stage) and $C_2$ (filter capacitor) affect $H_{vx}(s)$ and $H_{vx2}(s)$. In this design, $\omega_{p,filter} \approx \omega_{p,gm} \approx 2\pi(10MHz)$. To simplify the algebraic expressions and get a better understanding of the frequency response, we set $R_1 = R_2 = r_o = R$, $C_p = C_o = 0$, and $G_m = \frac{g_{m1}}{sC_1}g_{m2} + g_{m3}$ (assuming that the first stage behaves like an integrator). Substituting into (4.38), we obtain,

$$H_{vx}(s) = \frac{2}{g_{m2}R}\,\frac{sC_1}{g_{m1}}\,\frac{1 + \frac{sRC_2}{2}}{1 + s\left(RC_2 + \frac{C_1(3 + g_{m3}R)}{g_{m1}g_{m2}R}\right) + s^2C_1C_2\left(\frac{2 + g_{m3}R}{g_{m1}g_{m2}}\right)} \tag{4.48}$$

This response has two zeros and two poles. The first zero, located at the origin, is due to our assumption of an ideal integrator response from the first stage. This zero would be shifted to $\omega_{p,gm}$ assuming a finite gain in the first stage. The second zero, located at $\omega_{z,vx} = (RC_2/2)^{-1} = 2\omega_{p,filter}$, is due to the feedback network. The quadratic denominator is due to the interaction of $C_1$ and $C_2$. The poles can be estimated numerically for the $R = 5k\Omega$, $C_1 = 3pF$, $C_2 = 3.2pF$ and $g_m$ values used in this design. The two poles are at $\omega_{p1vx} = 2\pi(9.8MHz)$ and $\omega_{p2vx} = 2\pi(89.3MHz)$. As can be seen, the first zero (at $\approx \omega_{p,gm}$) cancels with $\omega_{p1vx}$, but the gain $|H_{vx}|$ increases at $+20dB/decade$ after $\omega_{z,vx}$ until the next pole at $\omega_{p2vx}$, providing the high-pass behavior in the mid-frequency range.
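The pole locations quoted above can be reproduced with a few lines of arithmetic. The denominator coefficients used below were re-derived from (4.38) under the stated simplifications ($R_1 = R_2 = r_o = R$, $C_p = C_o = 0$, first stage treated as an ideal integrator); they should be read as an assumption of this sketch rather than a quotation of (4.48).

```python
import math

R = 5e3                              # ohms
C1, C2 = 3e-12, 3.2e-12              # farads
gm1, gm2, gm3 = 4e-3, 1e-3, 2e-3     # A/V

# Denominator 1 + b1*s + b2*s^2 obtained by substituting
# Gm = gm1*gm2/(s*C1) + gm3 into (4.38)
k = gm1 * gm2
b1 = R * C2 + C1 * (3.0 + gm3 * R) / (k * R)
b2 = C1 * C2 * (2.0 + gm3 * R) / k

root = math.sqrt(b1 * b1 - 4.0 * b2)
f_p1 = (b1 - root) / (2.0 * b2) / (2.0 * math.pi)   # lower pole, Hz
f_p2 = (b1 + root) / (2.0 * b2) / (2.0 * math.pi)   # upper pole, Hz
f_z = 2.0 / (R * C2) / (2.0 * math.pi)              # feedback-network zero, Hz

print(f"f_p1 = {f_p1 / 1e6:.1f} MHz, f_p2 = {f_p2 / 1e6:.1f} MHz, f_z = {f_z / 1e6:.1f} MHz")
```

The lower pole lands close to $\omega_{p,gm}$ and the zero at twice the filter corner, so $|H_{vx}|$ rises between roughly 20MHz and the upper pole.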

The qualitative understanding of $H_{vx}(s)$ and $H_{vx2}(s)$ responses and simulated results shown in Fig. 4.14 helps us to gain insight and identify the dominant sources of non-linearity in the mid-frequency range as we will see in the next section.

4.3.1.3 Output Non-Linearity Due to $g_{m1}$, $g_{m2}$ and $g_{m3}$ in Mid-Frequency Range

The two-tones located at $\omega_1 = 2\pi f_1$ and $\omega_2 = 2\pi f_2$ applied at the filter input $v_i$ reach $v_x$ with magnitudes $|v_x(j\omega_1)|$ and $|v_x(j\omega_2)|$. As a result, the transconductor in the first stage ($g_{m1}$), produces a non-linear output current component $i_{gm1\_im3}$ at $\omega_3 = (2\omega_1 - \omega_2)$ given by,

$$i_{gm1\_im3} = +\frac{3}{4}a_{3\_gm1} |v_x(j\omega_1)|^2 |v_x(j\omega_2)| \quad (4.49)$$

where $a_{3\_gm1}$ is the third-order non-linearity coefficient of the Taylor-series approximation for the transconductor and has a negative sign for compressive non-linearity. $i_{gm1\_im3}$ is converted to a voltage $v_{x2\_gm1\_im3}$ given by,

$$v_{x2\_gm1\_im3} = i_{gm1\_im3} |z_{o1}(j\omega_3)| \quad (4.50)$$

where $z_{o1}$ is the impedance at the output of the first stage given by,
$$z_{o1}(s) = r_{o1}||\frac{1}{sC_1} = \frac{r_{o1}}{1 + sr_{o1}C_1} = \frac{r_{o1}}{1 + s/\omega_{p,gm}} \tag{4.51}$$

$v_{x2\_gm1\_im3}$ produces an output current from $g_{m2}$ given by,

$$i_{o\_gm1\_im3} = -g_{m2}v_{x2\_gm1\_im3} \tag{4.52}$$

$i_{o\_gm1\_im3}$ is injected into the output node of the filter. Hence the filter output voltage is,

$$v_{o\_gm1\_im3} = i_{o\_gm1\_im3}\left|\frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)}\right| \tag{4.53}$$

where

$$z_{open}(s) = z_o(s)||(z_1(s) + z_2(s)) \tag{4.54}$$

Combining (4.49), (4.50), (4.52), and (4.53), we get

$$v_{o\_gm1\_im3} = -\frac{3}{4}a_{3\_gm1}|v_x(j\omega_1)|^2|v_x(j\omega_2)||z_{o1}(j\omega_3)|\,g_{m2}\left|\frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)}\right| \tag{4.55}$$

Similarly, we can obtain the non-linear output current component due to $g_{m3}$ as,

$$i_{o\_gm3\_im3} = -\frac{3}{4}a_{3\_gm3}|v_x(j\omega_1)|^2|v_x(j\omega_2)| \tag{4.56}$$

The negative sign is due to the inversion in the transconductor itself. $a_{3\_gm3}$ is also negative (for compressive non-linearity). $i_{o\_gm3\_im3}$ produces an output voltage given by,

$$v_{o\_gm3\_im3} = i_{o\_gm3\_im3}\left|\frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)}\right| \tag{4.57}$$

Combining (4.56) and (4.57), we get,

$$v_{o\_gm3\_im3} = -\frac{3}{4}a_{3\_gm3}|v_x(j\omega_1)|^2|v_x(j\omega_2)|\left|\frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)}\right| \tag{4.58}$$

Likewise, the non-linear output current term due to $g_{m2}$ is,

\[ i_{o\_gm2\_im3} = -\frac{3}{4}a_{3\_gm2} |v_{x2}(j\omega_1)|^2 |v_{x2}(j\omega_2)| \] (4.59)

The output voltage then is given by,

\[ v_{o\_gm2\_im3} = i_{o\_gm2\_im3} \left| \frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)} \right| \] (4.60)

Combining (4.59), and (4.60), we get,

\[ v_{o\_gm2\_im3} = -\frac{3}{4}a_{3\_gm2} |v_{x2}(j\omega_1)|^2 |v_{x2}(j\omega_2)| \left| \frac{z_{open}(j\omega_3)}{1 + LG(j\omega_3)} \right| \] (4.61)

The total output $IM_3$ is obtained by combining the individual $IM_3$ contributions from (4.55), (4.58) and (4.61).
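The $\frac{3}{4}a_3$ factor that appears in (4.49), (4.56) and (4.59) can be verified numerically by driving a memoryless cubic non-linearity with two tones and measuring the component at $2f_1 - f_2$. The sketch below is generic: the tone amplitudes, the coefficients and the frequencies (expressed in cycles per analysis window) are illustrative assumptions, not values from this design.

```python
import math

N = 256                 # samples in one analysis window
k1, k2 = 11, 12         # tone frequencies, in cycles per window
k3 = 2 * k1 - k2        # IM3 product lands at 10 cycles per window
A1, A2 = 0.05, 0.03     # tone amplitudes in volts (illustrative)
a1, a3 = 1e-3, -5e-3    # A/V and A/V^3 (illustrative, compressive)

def amplitude_at(samples, k):
    """Amplitude of the component at k cycles per window (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / N

# Two-tone input and the resulting transconductor output current
u = [A1 * math.cos(2 * math.pi * k1 * n / N)
     + A2 * math.cos(2 * math.pi * k2 * n / N) for n in range(N)]
i_out = [a1 * x + a3 * x ** 3 for x in u]

im3_measured = amplitude_at(i_out, k3)
im3_predicted = 0.75 * abs(a3) * A1 ** 2 * A2

print(f"measured  IM3 amplitude = {im3_measured:.4e} A")
print(f"predicted IM3 amplitude = {im3_predicted:.4e} A")
```

Because both tones and the $IM_3$ product fall on exact bins of the window, no windowing function is needed and the two values agree to numerical precision.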

4.3.1.4 Comparison of Results and Summary

The computed out-band $IM_3$ power is compared with Cadence Spectre simulation results. The frequency of the first tone $f_1$ is swept from 10MHz to 140MHz, and $f_2$ is adjusted such that $f_3 = 2f_1 - f_2$ is at the pass-band edge. The input power of each tone is set to -10dBm. Comparison of the estimated and the Spectre simulation results is shown in Fig. 4.15 for the FFF design and Fig. 4.16 for the FPP design respectively. $\theta = 0.8V^{-1}$ is used for numerical computation of the non-linearity from the pseudo-differential transconductors in the FPP design.

Figure 4.15. Power of $IM_3$ tone at $f_3$ due to two -10dBm out-band tones at $f_1$ and $f_2$ for the FFF design.

Figure 4.16. Power of $IM_3$ tone at $f_3$ due to two -10dBm out-band tones at $f_1$ and $f_2$ for the FPP design.

Based on Fig. 4.14, Fig. 4.15 and Fig. 4.16, we make the following observations:

• The computed non-linearity contribution from $g_{m1}$ is identical in both the FPP and FFF designs since the loop-gain response is identical in both lossy integrators. The inherent non-linearity from $g_{m2}$ and $g_{m3}$ is lower in the FPP design (due to the pseudo-differential implementations) than in the FFF design, as predicted by (4.35).

• In the low-frequency range and within the filter passband, non-linearity from $g_{m2}$ will dominate compared to non-linearity arising from the $g_{m1}$ and $g_{m3}$ in both FFF and FPP designs since the input signal to $g_{m2}$ is larger due to gain of the first stage. In the FFF design, $g_{m2}$ is fully-differential while it is pseudo-differential in the FPP design. Hence there is an improvement in the linearity performance within the filter passband and also at the edge of the passband.

• In the mid-frequency range, either $g_{m1}$ or $g_{m2}$ could dominate depending on their relative contributions. In the FPP design, the inherent contribution from $g_{m2}$ is reduced compared to FFF design. Hence the non-linearity contribution from $g_{m1}$ rises above the $g_{m2}$ contribution around 35MHz. In the FFF design, the inherent $g_{m2}$ contribution is much higher and hence the non-linearity contribution from $g_{m1}$ rises above the $g_{m2}$ contribution at a higher frequency around 70MHz.

• In the high-frequency range, $g_{m1}$ is the dominant source of non-linearity in both OTAs. The signal amplitudes at the inputs of $g_{m1}$ and $g_{m3}$ are identical, but the non-linear output from $g_{m1}$, located within the filter passband at $f_3$, is amplified by the second stage, while the contribution from $g_{m3}$ does not experience such gain. Since $g_{m1}$ has an identical implementation in both FFF and FPP designs, the computed and simulated non-linearity contributions are identical between 100MHz and 140MHz.
Figure 4.17. Ratio of the non-linearity contribution from $g_{m2}$ to $g_{m1}$ in both FFF ($g_{m1}$ FD, $g_{m2}$ FD) and FPP ($g_{m1}$ FD, $g_{m2}$ PD) designs.

To compare the relative contributions of the $g_{m1}$ and $g_{m2}$ stages to the output non-linearity in the mid-frequency range, we compute the ratio $(v_{o\_gm2\_im3}/v_{o\_gm1\_im3})$. Using (4.55) and (4.61) and algebraic simplification we get,

$$\frac{v_{o\_gm2\_im3}}{v_{o\_gm1\_im3}} = \frac{a_{3\_gm2}}{a_{3\_gm1}g_{m2}}\,\frac{|v_{x2}(j\omega_1)|^2|v_{x2}(j\omega_2)|}{|v_x(j\omega_1)|^2|v_x(j\omega_2)||z_{o1}(j\omega_3)|} \tag{4.62}$$

Recognizing that $|v_{x2}(j\omega)| = g_{m1}|z_{o1}(j\omega)||v_x(j\omega)|$, we can simplify (4.62) to,

$$\frac{v_{o\_gm2\_im3}}{v_{o\_gm1\_im3}} = \frac{a_{3\_gm2}g_{m1}^3}{a_{3\_gm1}g_{m2}}\,\frac{|z_{o1}(j\omega_1)|^2|z_{o1}(j\omega_2)|}{|z_{o1}(j\omega_3)|} \tag{4.63}$$

This can be further simplified using (4.51) as,

$$\left|\frac{v_{o\_gm2\_im3}}{v_{o\_gm1\_im3}}\right| = \frac{a_{3\_gm2}g_{m1}}{a_{3\_gm1}g_{m2}}A_{1,dc}^2\,\frac{\left|1 + \frac{j\omega_3}{\omega_{p,gm}}\right|}{\left|1 + \frac{j\omega_1}{\omega_{p,gm}}\right|^2\left|1 + \frac{j\omega_2}{\omega_{p,gm}}\right|} \tag{4.64}$$

Equation (4.64) predicts the relative amplitudes of the non-linearity contributions at $\omega_3$ from two out-band tones located at $\omega_1$ and $\omega_2$. When $\omega_1$ is swept in the stop band of the filter, $\omega_2$ is adjusted such that $\omega_3$ stays at the edge of the passband. Hence $\omega_2 = 2\omega_1 - \omega_3$. The ratio in (4.64) is plotted in Fig. 4.17. At frequencies below the $0dB$ crossing of the ratio, the $g_{m2}$ non-linearity dominates over the $g_{m1}$ non-linearity, and vice versa above it. $a_{3\_gm1}$ is identical in both FFF and FPP designs, but $|a_{3\_gm2}|_{FPP} < |a_{3\_gm2}|_{FFF}$ due to the pseudo-differential implementation, as predicted by (4.35). Hence the curve for FPP (with $g_{m1}$ FD, $g_{m2}$ PD) crosses $0dB$ at a lower frequency than the curve for the FFF design.
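The $0dB$ crossing of (4.64) can be estimated with a simple bisection. The sketch below makes several assumptions that are ours: both stages use the same $V_{dsat}$, so that the prefactor $\frac{a_{3\_gm2}g_{m1}}{a_{3\_gm1}g_{m2}}$ equals 1 for the FFF design and $\theta V_{dsat} = 0.12$ for the FPP design; $A_{1,dc} = g_{m1}r_{o1} = 20$; and $\omega_{p,gm}$ and $\omega_3$ are both placed at $2\pi(10MHz)$.

```python
import math

F_P = 10e6       # Hz, first-stage pole (nominal)
F3 = 10e6        # Hz, IM3 product held at the passband edge
A1_DC = 20.0     # gm1 * ro1 = 4 mA/V * 5 kOhm

def ratio(f1_hz, prefactor):
    """Magnitude of (4.64) for a tone at f1 with f2 = 2*f1 - f3."""
    f2_hz = 2.0 * f1_hz - F3

    def mag(f_hz):
        # |1 + j*f/f_p|
        return math.hypot(1.0, f_hz / F_P)

    return prefactor * A1_DC ** 2 * mag(F3) / (mag(f1_hz) ** 2 * mag(f2_hz))

def crossing_hz(prefactor, lo=10e6, hi=500e6):
    """Bisection for the f1 at which the ratio falls to 1 (0 dB)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ratio(mid, prefactor) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f_cross_fff = crossing_hz(1.0)     # gm2 fully differential
f_cross_fpp = crossing_hz(0.12)    # gm2 pseudo-differential, theta * Vdsat

print(f"FFF crossing near {f_cross_fff / 1e6:.0f} MHz")
print(f"FPP crossing near {f_cross_fpp / 1e6:.0f} MHz")
```

Under these simplifying assumptions the FPP crossing falls at roughly half the FFF crossing frequency, in line with the trend described above; the exact values depend on the actual pole location and bias conditions.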

In addition, an integrator designed with the FPP OTA offers a higher output swing than one designed with the FFF OTA. As noted in [44], the output impedance and its non-linearity contribution depend on the overdrive voltage ($V_{ds} - V_{dsat}$) of the transistor. Hence, for a given output swing, the contribution from the $r_o$ non-linearity in the FPP OTA will be lower than in the FFF OTA. This is particularly important for large out-band blockers located near the filter passband. In conclusion, an integrator designed with the FPP OTA reduces the out-band non-linearity contribution in the critical mid-frequency range. High-frequency blocker power can be attenuated using an additional passive $RC$ filter in the input path, as done in Section 3. For a given loop gain response, the inherent non-linearity contribution from $g_{m1}$ has to be optimized to improve the out-band linearity performance in the high frequency range.

### 4.3.1.5 Design Guidelines for Improving Out-Band Linearity Performance

Out-band non-linearity contributions by $g_{m1}$ and $g_{m2}$ depend on the input signal level to the transconductors at the intermodulating blocker frequencies. While the low-frequency (or DC) values of $|H_{vx}|$ and $|H_{vx2}|$ can be reduced by increasing the DC gain of either the first or the second stage, this does not help the high-frequency linearity performance. To improve the out-band linearity performance, the $GBW$ of the loop gain must be increased to reduce the signal swing at $v_x$ and $v_{x2}$ at the blocker frequencies. The loop $GBW$ can be increased by increasing either $g_{m1}$ or $g_{m2}$. Increasing $g_{m2}$ increases the gain of the second stage for given loading conditions, assuming that the passive element values ($R_1$, $R_2$ and $C_2$ for the lossy integrator) are already fixed by the filter requirements (frequency and noise performance). Increasing the gain of the second stage reduces the signal swing at the inputs of both the first and second stages and results in improved linearity performance. Linearity performance can also be optimized under the constraint of a fixed overall transconductance ($G_{mdc} = (g_{m1}r_{o1}g_{m2} + g_{m3}) \approx (g_{m1}r_{o1}g_{m2})$) by appropriately choosing $g_{m1}$ and $g_{m2}$. This optimization depends on the inherent transconductor non-linearity (set by $a_{3,gm1}$ and $a_{3,gm2}$) and on their relative frequency-dependent contributions as predicted by (4.64).

4.3.2 Common-Mode Feedback (CMFB) Design

To understand the common-mode loop behavior, we first recognize that there are two common-mode loops in the integrator design using a feed-forward OTA, as shown in Fig. 4.18. The first loop is through the passive network ($R_2||C_2$ and $R_1$) and the OTA. The second loop is through the CMFB network, which is added intentionally inside the OTA to set the output common-mode level. In the first loop, two paths exist through the OTA for common-mode signals in both the OTAs. Referring to Fig. 4.8 and Fig. 4.9, we note the following differences between the FFF and FPP OTA designs.

In the FFF OTA,

- The first path has a total positive gain (non-inverting) through the cascade of the first ($g_{m1}$) and the second ($g_{m2}$) stages. Both stages are degenerated by their respective tail current sources (M1cs, M2cs) for common-mode signals and offer a small positive low-frequency gain (non-inverting) when cascaded.
Figure 4.18. Common-mode feedback loops in the integrator.

- The second path has a total negative gain (inverting) through the feed-forward stage ($g_{m3}$). The feed-forward stage is also degenerated by the current source (M3cs) for common-mode signals and offers a small negative low-frequency gain (inverting).

- At frequencies higher than the corner frequency of the integrator (when $|z_{C2}| < R_2$), M3n appears like a diode-connected device. However, the tail current source M3cs still degenerates the device, maintaining a high impedance for common-mode signals and increasing the common-mode impedance at the output node.

In the proposed OTA,

- The first path has a total positive gain (non-inverting) through the cascade of the first ($g_{m1}$) and the second ($g_{m2}$) stages. The first stage is degenerated by the current source (M1cs) and offers a small low-frequency gain. Since the second-stage is pseudo-differential, it offers a higher negative common-mode gain (inverting) compared to the fully-differential stage in the FFF OTA.
Figure 4.19. Common-mode feedback schematic for the conventional OTA.

- The gain through the second path through the feed-forward stage ($g_{m3}$) is negative (inverting). Since the feed-forward is a pseudo-differential stage that directly connects input to output, it offers a higher negative common-mode gain compared to a fully differential stage.

- At frequencies higher than the corner frequency of the integrator (when $|z_{C2}| < R_2$) and when the AC coupling capacitor is effectively shorted-out, the feed-forward stage through M3n appears like a grounded diode-connected device reducing the common-mode impedance at the output node.

In summary, we establish that (1) the negative gain offered by the OTA for CM signals is higher in the FPP OTA compared to the FFF OTA, and (2) the common-mode impedance at the output node is lower for the FPP OTA compared to FFF OTA.

The schematics of the CMFB blocks in the two OTAs are shown in Fig. 4.19 and Fig. 4.20. The output common-mode level is sensed through a parallel $RC$ combination. The error amplifier compares the output common-mode level to the reference voltage and injects a correction signal into the output node using the PMOS current source Mc in the FFF OTA and M2c in the FPP OTA. The low common-mode impedance of the FPP design can be exploited to increase the low-frequency common-mode gain by using a high-gain stage in the error amplifier. Hence a mirror stage is used in the FFF OTA CMFB network to convey the error signal, while a high-gain stage is used in the FPP OTA CMFB network. Despite the difference in the common-mode behavior of the two OTAs, both have been designed to have a stable and equal common-mode loop bandwidth to stabilize the output common-mode level. The simulated step response of the CM loops in both the OTAs is shown in Fig. 4.21.

Figure 4.20. Common-mode feedback for the proposed OTA.

4.4 Noise and Distortion Performance Results

The design and layout of the prototype chip with the two integrators has been completed using TSMC 0.18µm RFCMOS technology and submitted for fabrication. The layout of the test chip, with both the integrators, is shown in Fig. 4.22. The integrators occupy approximately 0.105 mm$^2$ each. This excludes the area of the on-chip decoupling capacitor. The additional area due to the AC coupling capacitor and resistors in the FPP design is 0.0072 mm$^2$ (6.8%) and is negligible, as can be seen in the layout. The majority of the area in both the integrators is due to the interdigitated MIM capacitors used for setting the filter corner frequency. The FFF and FPP OTA cores shown in Fig. 4.8 and Fig. 4.9 consume 1.4mA and 1mA respectively. Both the integrators operate from a 1.8V supply and consume $\approx 2mA$ including CMFB and additional biasing circuits.

The simulated output spot noise spectrum for both the integrators is nearly identical and is shown in Fig. 4.23. The flicker noise component is due to the NMOS fully differential input stage ($g_{m1}$) and can be reduced by employing a PMOS input stage in the OTA design. The mid-band spot noise is predominantly due to the resistors ($R_1 = 5k\Omega$ and $R_2 = 5k\Omega$) used in the integrator. As expected, four $5k\Omega$ resistors (fully-differential design) contribute a thermal noise of $17.8nV/\sqrt{Hz}$. Both the integrators have an integrated noise of $71\mu V_{rms}$ (which is $-70dBm$ using a 50$\Omega$ reference) in a noise bandwidth of $(\pi/2) f_{-3dB} = 15.7MHz$ for $f_{-3dB} \approx 10MHz$.
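The resistor noise numbers quoted above can be cross-checked with a short calculation. This is a sketch under stated assumptions: a temperature of 290 K (not given in the text), four uncorrelated $5k\Omega$ noise sources that each reach the output with unity gain, and a flat spectrum over the single-pole noise bandwidth. The function names are illustrative only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def resistor_noise_density(r_ohm, temp_k):
    """Thermal noise voltage density sqrt(4kTR) in V/sqrt(Hz)."""
    return math.sqrt(4.0 * K_B * temp_k * r_ohm)

def vrms_to_dbm(v_rms, r_ref_ohm=50.0):
    """Power in dBm of an rms voltage across a reference resistance."""
    return 10.0 * math.log10(v_rms ** 2 / r_ref_ohm / 1e-3)

TEMP_K = 290.0  # assumed temperature; the text does not state one

# Four equal, uncorrelated sources add in power: factor sqrt(4) in voltage
density = math.sqrt(4.0) * resistor_noise_density(5e3, TEMP_K)
noise_bw_hz = (math.pi / 2.0) * 10e6          # single-pole noise bandwidth
v_rms = density * math.sqrt(noise_bw_hz)      # integrated output noise
noise_dbm = vrms_to_dbm(v_rms)

print(f"density = {density * 1e9:.1f} nV/sqrt(Hz)")
print(f"noise BW = {noise_bw_hz / 1e6:.1f} MHz")
print(f"integrated noise = {v_rms * 1e6:.1f} uVrms = {noise_dbm:.1f} dBm")
```

Under these assumptions the result lands close to the simulated values quoted in the text; the small difference in spot noise depends on the temperature assumed.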

Figure 4.21. Step response of the CMFB loop.
Figure 4.22. Layout of the two filters.

The simulated distortion performance for both the integrators is shown in Fig. 4.24. Two un-filtered blocker tones at the band-edge, at $f_1 = 10.1 MHz$ and $f_2 = 11.3 MHz$, are applied to the integrator and the in-band IM3 distortion tone at $f_{IM3low} = 2f_1 - f_2 = 8.9 MHz$ is observed. The figure shows the peak blocker power per tone used for the simulation and the power of the in-band distortion tone. The simulation test bench includes identical package and board parasitics for both the integrators. As can be seen in the figure, the proposed OTA improves the linearity performance (higher IIP3), which results in higher blocker tolerance. The integrated output noise floor is also indicated in the figure. The plot indicates that, when the third-order distortion power reaches the output noise floor, the FPP OTA design can tolerate 6dB higher blocker input power than the conventional OTA. For the FFF design, the output IM3 tone reaches $-70 dBm$ when each blocker tone is at $-4.5 dBm (190 mV pp)$, while for the FPP design the output IM3 tone reaches $-70 dBm$ when each blocker tone is at $+1.5 dBm (375 mV pp)$.
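The tone placement in this two-tone test can be checked with a few lines of arithmetic. The sketch below (helper name is illustrative) computes both third-order intermodulation products for the stated blocker frequencies and the difference between the two blocker levels at which the IM3 tone reaches the noise floor.

```python
def im3_products_mhz(f1_mhz, f2_mhz):
    """Return the lower and upper third-order intermodulation frequencies."""
    return 2.0 * f1_mhz - f2_mhz, 2.0 * f2_mhz - f1_mhz

im3_low, im3_high = im3_products_mhz(10.1, 11.3)
print(f"IM3 low = {im3_low:.1f} MHz, IM3 high = {im3_high:.1f} MHz")

# Blocker levels (dBm) at which the IM3 tone reaches the -70 dBm noise floor
blocker_fff_dbm = -4.5
blocker_fpp_dbm = 1.5
tolerance_gain_db = blocker_fpp_dbm - blocker_fff_dbm
print(f"Extra blocker tolerance of the FPP design = {tolerance_gain_db:.1f} dB")
```

The lower product falls inside the 10 MHz passband, while the upper product lies in the stop band along with the blockers.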
Figure 4.23. Simulated output noise from the two integrators.

Figure 4.24. Simulated $IM_3$ tone power and IIP3 performance at the passband edge with $f_1 = 10.1MHz$ and $f_2 = 11.3MHz$. 
A common figure-of-merit (FoM) used for comparing analog filters is given by,

\[
FoM = \frac{P_q}{N_{poles} \cdot BW \cdot SFDR_{lin}} \tag{4.65}
\]

where \(P_q\) is the quiescent power consumption, \(N_{poles}\) is the number of poles implemented in the filter, \(BW\) is the bandwidth and \(SFDR_{lin}\) is defined as,

\[
SFDR_{lin} = 10^{\frac{SFDR_{dB}}{10}} \tag{4.66}
\]

with,

\[
SFDR_{dB} = \frac{2}{3}(IIP3_{dBm} - P_{noise,dBm}) \tag{4.67}
\]

for a given IIP3 (third-order intercept) in dBm and input-referred integrated noise power in dBm. This differs from FoM definitions that use the dynamic range (\(DR\)) instead of \(SFDR_{lin}\). Several publications define \(DR\) as the ratio of the in-band power for 1% output THD to the noise floor. We use the FoM definition with \(SFDR_{dB}\) as defined above, using the out-band \(IIP3_{dBm}\), since this is the most relevant measure for filters in wireless radios. A detailed comparison with previously reported filters is given in Table 4.1. The two lossy integrators based on the FFF and FPP OTAs are compared with nine previously reported active RC filters that implement different filter approximations (Butterworth, Bessel, Elliptic and Chebyshev). The type of amplifier (or active device) used in the implementation of each filter is also indicated in the table. As can be seen from the table, the proposed lossy integrator structure using the FPP OTA has an excellent FoM and offers a power-efficient way to improve the out-band linearity performance, and hence the dynamic range, of active-RC filters.
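As a worked example of (4.65) to (4.67), the sketch below evaluates the SFDR and FoM for one published entry of Table 4.1 (25 MHz bandwidth, fifth order, $42\mu V_{rms}$, 28.8 dBm IIP3, 22 mW). Converting the integrated noise voltage to dBm with a 50$\Omega$ reference is an assumption for that entry, and the function names are illustrative.

```python
import math

def vrms_to_dbm(v_rms, r_ref_ohm=50.0):
    """Power in dBm of an rms voltage across a reference resistance."""
    return 10.0 * math.log10(v_rms ** 2 / r_ref_ohm / 1e-3)

def sfdr_db(iip3_dbm, noise_dbm):
    """SFDR in dB as defined in (4.67)."""
    return (2.0 / 3.0) * (iip3_dbm - noise_dbm)

def fom_fj(power_w, n_poles, bw_hz, sfdr_db_value):
    """FoM of (4.65) in femtojoules, with SFDR_lin from (4.66)."""
    sfdr_lin = 10.0 ** (sfdr_db_value / 10.0)
    return power_w / (n_poles * bw_hz * sfdr_lin) * 1e15

# Table 4.1 entry: 25 MHz, order 5, 42 uVrms, IIP3 = 28.8 dBm, 22 mW
noise_dbm = vrms_to_dbm(42e-6)           # assumes a 50-ohm reference
sfdr = sfdr_db(28.8, noise_dbm)
fom = fom_fj(22e-3, 5, 25e6, sfdr)
print(f"noise = {noise_dbm:.1f} dBm, SFDR = {sfdr:.1f} dB, FoM = {fom:.3f} fJ")
```

For this entry the computed SFDR and FoM agree with the tabulated 68.9 dB and 0.023 fJ to within rounding.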
4.5 Conclusion

This Section highlighted the impact of the out-band non-linearity performance of a baseband filter on the overall SFDR in an analog baseband wireless receiver chain. A new feed-forward OTA structure suitable for use in active-RC filter design has been presented. Two lossy integrator structures (the first using the conventional structure and the second using the proposed OTA), designed with the same loop gain and loading conditions, serve as test vehicles to demonstrate the linearity improvement. The prototype chip has been designed using the TSMC 0.18µm RFCMOS technology. A detailed comparison table indicates that the lossy integrator structure using the proposed FPP OTA has an excellent FoM and that the linearity improvement is achieved without significant cost in power or silicon area.
Table 4.1
Comparison of the filter performance with published results.

<table>
<thead>
<tr>
<th>Filter BW (MHz)</th>
<th>Order</th>
<th>Int. noise (µVrms)</th>
<th>IIP3 (dBm)</th>
<th>SFDR (dB)</th>
<th>Power (mW)</th>
<th>FoM (fJ)</th>
<th>CMOS Node (µm)</th>
<th>Additional Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>44-300</td>
<td>5</td>
<td>1300</td>
<td>21</td>
<td>43.8</td>
<td>54</td>
<td>10.2</td>
<td>0.18</td>
<td>Active-RC with Feed-forward OTA</td>
</tr>
<tr>
<td>25</td>
<td>5</td>
<td>42</td>
<td>28.8</td>
<td>68.9</td>
<td>22</td>
<td>0.023</td>
<td>0.18</td>
<td>Active RC with Feed-forward OTA</td>
</tr>
<tr>
<td>20</td>
<td>5</td>
<td>980</td>
<td>27</td>
<td>49.4</td>
<td>4.5</td>
<td>0.51</td>
<td>0.18</td>
<td>Active RC with helper OTA</td>
</tr>
<tr>
<td>2</td>
<td>5</td>
<td>80</td>
<td>33</td>
<td>68</td>
<td>4.9</td>
<td>0.078</td>
<td>0.18</td>
<td>Active RC with adaptive biasing</td>
</tr>
<tr>
<td>1-20</td>
<td>5</td>
<td>232</td>
<td>8</td>
<td>57.1</td>
<td>7.5</td>
<td>0.146</td>
<td>0.13</td>
<td>Active RC with Nested Miller</td>
</tr>
<tr>
<td>2.1</td>
<td>4</td>
<td>36</td>
<td>31</td>
<td>71.2</td>
<td>14.2</td>
<td>0.127</td>
<td>0.13</td>
<td>Active Gm-RC</td>
</tr>
<tr>
<td>10</td>
<td>4</td>
<td>24</td>
<td>17.5</td>
<td>64.6</td>
<td>4.1</td>
<td>0.036</td>
<td>0.18</td>
<td>Source-follower based structure</td>
</tr>
<tr>
<td>10</td>
<td>5</td>
<td>453</td>
<td>21.3</td>
<td>50.1</td>
<td>6.1</td>
<td>1.19</td>
<td>0.12</td>
<td>Active RC</td>
</tr>
<tr>
<td>19.7</td>
<td>5</td>
<td>133.2</td>
<td>18.3</td>
<td>55.2</td>
<td>11.3</td>
<td>0.345</td>
<td>0.13</td>
<td>Active RC with Feed-forward OTA</td>
</tr>
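<tr>
<td>10</td>
<td>1</td>
<td>71</td>
<td>27</td>
<td>65.2</td>
<td>2.6</td>
<td>0.088</td>
<td>0.18</td>
<td>FFF Design: Active RC with Feed-forward OTA</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>71</td>
<td>37</td>
<td>71.9</td>
<td>2.9</td>
<td>0.022</td>
<td>0.18</td>
<td>FPP Design: Active RC with Feed-forward OTA</td>
</tr>
</tbody>
</table>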

---

*Reported number for the lowest bandwidth setting*

*Out-band IIP3 or IIP3 reported/simulated at the band-edge*

*In-band IIP3, since IIP3 at bandedge not reported*
5. ADAPTIVE BANG-BANG CLOCK-DATA-RECOVERY

This Section begins with a brief introduction to clock and data recovery (CDR) in high speed serial link applications. The introduction leads to a review of the popular analog PLL based CDR system, followed by a discussion of the timing non-idealities and jitter performance indicators in a typical CDR system. A review of the jitter performance requirements paves the way for system optimization using adaptive loop-gain CDR solutions. This leads to the description of the adaptive bang-bang CDR solution discussed in this thesis.

The goal of the adaptive solution is to optimize the trade-off between jitter filtering and jitter transfer, especially when the input jitter profile is not known a priori. In this work, the loop gain adaptation based on the input jitter profile is performed using an efficient mixed-mode solution. The focus of this Section is the design of the digital logic blocks in the loop gain control mechanism. The architecture of the adaptive digital section is presented first, followed by implementation details. The design of the digital state-machine, along with the timing information required to enable the charge-pump control and loop adaptation, is also described. Output driver circuits for real-time monitoring of the loop bandwidth control code on a logic analyzer, and the input amplifier required to interface the CDR with external equipment, are also presented.

The content presented in this Section is based on the work done in collaboration with and supports the work presented in [61]. Hence the reader is referred to [61] for additional material regarding the analog section of the predictor that complements this Section. Two prototypes operating at 10Gb/s have been designed using TSMC RFCMOS 0.18µm technology for this research. The measurement results from the first prototype are included in this Section. The second design is currently under fabrication. Conclusions from this work are drawn at the end.
5.1 High Speed Serial Link System

5.1.1 Introduction

Digital data transfer rates continue to increase, fueled by consumer and enterprise demand for *cloud computing* with audio, video and other multimedia content residing in the cloud. The new social media and cloud computing trends have led to steady growth in the high speed data networking and storage infrastructure markets. On the technology side, research initiatives have been looking for efficient solutions for shuttling data between computing hubs such as microprocessors and memory, network routers and storage devices. High speed serial links are commonly used in such applications to transfer data over optical and band-limited electrical channels. For this reason, high performance serial links capable of data transfer at several Gbps continue to receive significant attention in the research community.

A general serial link system is shown in Fig. 5.1. It contains a transmitter, which couples a high speed data stream to the channel, and a receiver, which recovers the original data stream. On the transmitter side, the parallel data is multiplexed into a high speed serial data stream. This serial data stream is transmitted using a transmit driver, which couples the signal to the channel. Due to the non-idealities of the channel, the data stream at the input of the receiver is a poor replica of the transmitted signal. The receiver may perform channel equalization to compensate for the non-idealities of the channel and reconstruct the signal. This allows demultiplexing of the high speed serial data stream into its original constituent parallel bit streams.

Two commonly used options for transmitting the timing information along with the data are (1) forwarded clock architectures and (2) embedded clock architectures. In forwarded clock architectures, a separate channel is used in parallel to transmit a clock signal that can be used to re-time the data. In embedded clocking schemes, the timing information is extracted from the edges present in the data stream. The presence of sufficient edge information in the data stream can be guaranteed by coding the original data stream, although at the expense of data throughput. Embedded clock architectures are quite commonly used in serial link applications. Recovering the timing information from the data stream and re-timing the data is commonly called clock and data recovery (CDR). This Section outlines a CDR system suitable for serial link applications, especially when the characteristics of the non-idealities present in the incoming data stream are not known a priori.

5.1.2 Clock and Data Recovery

As introduced in the previous section, embedded clocking architectures need to perform retiming and demultiplexing of the incoming data. Since these operations are synchronous, the receiver must generate a clock to perform these operations. A CDR unit provides such a clock from the incoming data. The data is then re-timed using a D-flip-flop (DFF) and the recovered clock. The basic idea is that, although the incoming data stream is noisy, the re-timed data has much less amplitude and timing noise.

It is desirable that the edge used for retiming the data maintain an optimum phase relationship with the data to minimize bit errors. The rising edge of the recovered clock is used for retiming in Fig. 5.2 for illustration purposes. In this case, the rising edge of the clock must be located at the middle of each bit period. Such a phase relationship ensures adequate timing margins, with maximum allowable setup and hold times for the re-timing DFF. This requires the CDR to track certain components of the incoming data phase variations, such that the recovered clock tracks the available timing window in the data signal. In addition, the recovered clock should have exactly the same frequency as the incoming data rate. Any frequency offset between the data rate and the recovered clock will result in a retiming edge that drifts away from the optimum sampling point, resulting in errors.

Figure 5.2. Timing margins at a re-timing DFF using a CDR.
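The effect of a residual frequency offset can be quantified with a simple estimate. The sketch below assumes an uncorrected, constant fractional offset between the data rate and the recovered clock and a sampling point that starts at the center of the bit, so that half a unit interval (UI) of margin is available; the 100 ppm offset is a hypothetical value and the function name is illustrative.

```python
def bits_until_slip(offset_ppm, margin_ui=0.5):
    """Bits until the sampling edge drifts by margin_ui.

    Each bit period the sampling point moves by the fractional
    frequency offset, expressed in unit intervals.
    """
    drift_per_bit_ui = offset_ppm * 1e-6
    return margin_ui / drift_per_bit_ui

DATA_RATE_BPS = 10e9   # 10 Gb/s link
OFFSET_PPM = 100.0     # hypothetical uncorrected offset

bits = bits_until_slip(OFFSET_PPM)
time_s = bits / DATA_RATE_BPS
print(f"{bits:.0f} bits, i.e. {time_s * 1e6:.2f} us, to drift by 0.5 UI")
```

Even a small offset therefore consumes the entire timing margin within microseconds at this data rate, which is why the recovered clock must be frequency locked to the incoming data.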

Many schemes have been proposed in the literature to design the CDR unit for high speed serial link systems. Each scheme has its own advantages and disadvantages. Designers make their architectural choice based on the suitability to the application under consideration and on implementation constraints such as channel characteristics, performance of the available silicon process technology, and power consumption budget. Phase and delay locked loops (PLL, DLL) lend themselves naturally to CDR applications and are quite popular. While analog PLL solutions are still popular, digital PLLs offer the scalability and flexibility afforded by scaled technology nodes. Solutions using phase interpolators, injection locking schemes and oversampling architectures are also reported in the literature. The interested reader is referred to an excellent review and comparison of these architectures in [62]. In this Section, we discuss the design trade-offs and implementation of a CDR system using an analog loop. The next section provides an introduction to the analog PLL based CDR system.

5.1.3 Analog PLL Based Clock Recovery System

A CDR using a charge-pump based PLL is shown in Fig. 5.3. The loop consists of a phase-detector (PD), a charge-pump, an analog loop-filter and a voltage-controlled-oscillator (VCO). This is similar to the classic analog PLL used for clock synthesis (without the frequency divider and the prescaler), except that the input reference clock is replaced by the random incoming serial data stream. The term analog describes the nature of the filter in the control loop (continuous time, continuous amplitude). Essentially, the local clock signal generated by the VCO is continuously tuned to track the incoming data edges using an analog control loop. The phase detector provides a phase-domain comparison of the incoming data edges and the reference oscillator. The output error signal is converted to charge packets, which are integrated by the loop filter to generate the control voltage of the VCO. This control voltage continuously steers the VCO phase to track the incoming data. Such an arrangement is often used to adjust the phase to maximize the timing margin available for the re-sampling DFF.
Two types of phase-detector architectures commonly used in combination with an analog PLL loop are (1) the linear PD, and (2) the bang-bang or binary PD. In a linear PD, the output of the phase detector is linearly proportional to the phase error between the incoming data edges and the reference VCO edges. In the phase domain, such a PD can be modeled as a constant small-signal gain from the input phase error to the output signal of the phase detector. The information in the output is contained in the pulse width of the output voltage signal. The Hogge PD, a very commonly used linear PD, utilizes re-timing DFFs to produce two pulses, one called the proportional pulse and the other the reference pulse [63, 64]. The reference pulse width is half the cycle time, while the proportional pulse changes in width depending on the phase relationship between the incoming data and the recovered clock. The output error indicated by the PD is the difference between the two pulse widths. So, in theory, a linear PD should be able to produce and compare narrow pulse widths to indicate small phase errors under a locked condition. In practice, static phase offsets and Clock-to-Q delays of the DFFs used for pulse generation can be a significant fraction of the cycle time. This bottleneck limits the use of linear PDs at data rates that are high relative to the $f_T$ of the process. In contrast, a bang-bang or binary PD outputs a digital high or low signal depending on the phase relationship between the data edge and the reference clock edge. As CMOS technology continues to scale down in tandem with increasing data rates into the Gbps regime, bang-bang PD implementations are increasingly favored. This comes at the cost of introducing a hard non-linearity into the loop, which makes the steady-state analysis of the control loop difficult if not impossible. A design-oriented approach to evaluate the stability of bang-bang PLL loops is proposed in [65].
[65] uses a discrete-time model for the PLL to arrive at conditions for the existence of stable orbits (indicating stable loop dynamics) in the phase plane. From a rigorous control-theoretic perspective, however, this is the classical Lur’e or Popov problem, which considers the stability of fixed time-invariant systems perturbed by a non-linear feedback gain. This is also known as the *absolute stability problem*. For more information, the reader can refer to the more general mathematical treatment provided in [66].

To understand the trade-offs involved in the design of a CDR based on an analog PLL loop, we first introduce, in the next section, the types of timing non-idealities (termed jitter) that affect a CDR system. We then briefly discuss the jitter performance indicators used for evaluating the behavior of CDRs. This provides the necessary material to introduce the adaptive loop CDR, which is the main focus of this Section.

### 5.1.4 Types of Timing Non-Idealities

As introduced previously, a CDR unit extracts a clock edge that continuously tracks the incoming random data edges to re-time the data. Due to non-idealities in the system, the timing of the extracted clock deviates from the ideal phase, and this deviation is commonly referred to as jitter.

Several types of timing errors affect the bit error rate (*BER*) performance of a serial link. Depending on the source of the non-ideality, the timing error can be either bounded or un-bounded. A bounded error can be modeled using deterministic models and has a finite theoretical bound on its maximum value. An un-bounded timing error originates from a random source and can only be modeled statistically. Although the absolute value of the error contribution from such a source can in theory be arbitrarily large (with an extremely low probability), it is common to specify limits on an un-bounded error source (e.g. a ±3σ limit for a Gaussian distribution). A general classification of jitter components identifying both bounded and un-bounded error sources is shown in Fig. 5.4 [67,68].

Random jitter ($R_j$) arises from the inherent device noise (thermal, flicker and shot noise) added in the system during analog signal processing. $R_j$ is unbounded and is modeled using a Gaussian distribution with zero mean and a non-zero standard deviation. Hence the maximum peak-to-peak contribution from $R_j$ is generally quoted at a certain bit error rate (BER).
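Quoting the peak-to-peak $R_j$ at a given BER can be illustrated with a short calculation. The sketch below assumes one common convention, in which the one-sided Gaussian tail probability beyond $Q\sigma$ is set equal to the target BER and the peak-to-peak value is $2Q\sigma$; other conventions (for example, ones that account for data transition density) give slightly different multipliers. The 0.5 ps rms value is hypothetical.

```python
from statistics import NormalDist

def q_factor(ber):
    """Number of standard deviations whose one-sided Gaussian tail equals ber."""
    return -NormalDist().inv_cdf(ber)

def rj_peak_to_peak(sigma, ber):
    """Peak-to-peak random jitter quoted at a given BER (2*Q*sigma)."""
    return 2.0 * q_factor(ber) * sigma

SIGMA_S = 0.5e-12  # hypothetical rms random jitter of 0.5 ps
for ber in (1e-9, 1e-12, 1e-15):
    pp = rj_peak_to_peak(SIGMA_S, ber)
    print(f"BER = {ber:.0e}: Q = {q_factor(ber):.2f}, Rj_pp = {pp * 1e12:.2f} ps")
```

The multiplier grows only slowly as the BER target is tightened, which is why a peak-to-peak figure for an unbounded process is meaningful once the BER is specified.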

Deterministic jitter ($D_j$) is the bounded component of the jitter. The peak-to-peak value of $D_j$ is bounded for a given system. Several components of $D_j$ have been identified in the literature [67, 68]. Sinusoidal jitter, also known as periodic jitter ($P_j$), arises from the spread-spectrum clocking schemes used in serial links to mitigate electromagnetic interference, or from periodic supply noise coupled from various sources. $P_j$ is independent of the data pattern transmitted on the link. Other forms of $D_j$ are data dependent ($DD_j$). Timing uncertainty due to the finite bandwidth of the channel is called inter-symbol interference jitter ($ISI_j$). The circuits used for processing clock signals often introduce duty cycle distortion (DCD) in the signal, resulting in $DCD_j$. Crosstalk from adjacent channels also affects the timing of the data signals. Such error is often uncorrelated with the data pattern and is bounded. All such errors are lumped into a category termed bounded uncorrelated jitter ($BU_j$).
Figure 5.5. Magnitude of jitter Fourier spectrum from [67].

The timing errors in the data signal received at the input of the CDR system contain all the above components of jitter. It is also instructive to look at the frequency-domain representation of these jitter components. Fig. 5.5 shows a typical plot of the Fourier spectrum of jitter as reported in [67]. It should be noted that the Fourier spectrum is different from the power spectral density (PSD) of the jitter process. The Fourier spectrum magnitude $\Delta T_m$ at a given frequency $f_x$ can be used to estimate the PSD over a time period $T$ as,

$$S(f_x, T) = \frac{|\Delta T_m(f_x)|^2}{T} \tag{5.1}$$

As noted in [67], (5.1) does not approach the true PSD $S(f)$ and serves as an approximation at best for a random jitter process. First, we discuss the deterministic components of the jitter. As indicated, $DD_j$ and $P_j$ are narrow-band. $P_j$ appears at a known frequency ($f_{P_j}$) in the spectrum, depending on the source of the periodic jitter (e.g. spread spectrum clocking). On the other hand, the spectral content of $DD_j$ resides at frequencies ($f_{DD_j}$) that are multiples of the repetition rate of a repeating data pattern. The $BU_j$ components exhibit both narrow-band and broad-band spectral content. Unless the source of $BU_j$ is known a priori, it is not possible to distinguish between the narrow-band $BU_j$ and $DD_j / P_j$. The $R_j$ components are broad-band in nature. Again, broadband $BU_j$ components cannot be easily isolated from $R_j$.
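The estimate in (5.1) can be tried on a synthetic periodic jitter sequence. The sketch below is an illustration under stated assumptions, not a measurement technique from [67]: the jitter is a pure sinusoid with hypothetical amplitude and frequency, it is sampled uniformly, and the Fourier spectrum is approximated by a Riemann sum evaluated at a single frequency.

```python
import cmath
import math

def fourier_magnitude(samples, dt, f_x):
    """Approximate |integral of x(t) exp(-j 2 pi f_x t) dt| by a Riemann sum."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * f_x * n * dt)
    return abs(acc) * dt

def psd_estimate(samples, dt, f_x):
    """PSD estimate of (5.1): |Delta T_m(f_x)|^2 / T."""
    t_obs = len(samples) * dt
    return fourier_magnitude(samples, dt, f_x) ** 2 / t_obs

def sinusoidal_jitter(amp_s, f_hz, dt, n):
    """Timing error samples of a purely periodic jitter tone."""
    return [amp_s * math.sin(2.0 * math.pi * f_hz * k * dt) for k in range(n)]

DT = 10e-9     # sample spacing (assumed)
F_PJ = 1e6     # hypothetical periodic jitter frequency
AMP = 1e-12    # hypothetical 1 ps jitter amplitude

s_short = psd_estimate(sinusoidal_jitter(AMP, F_PJ, DT, 1000), DT, F_PJ)
s_long = psd_estimate(sinusoidal_jitter(AMP, F_PJ, DT, 2000), DT, F_PJ)
print(f"T = 10 us: {s_short:.3e} s^2/Hz, T = 20 us: {s_long:.3e} s^2/Hz")
```

For a pure tone observed over an integer number of periods, the estimate at the tone frequency equals $A^2 T/4$ and therefore grows with the observation time, so narrow-band components stand out increasingly against any broadband content as $T$ is lengthened.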

It is important to identify and distinguish different sources of timing errors during measurements. Several time- and frequency-domain measurement techniques have been reported to isolate and separately estimate different components of jitter. The reader is referred to [67] for an extended discussion of these measurement techniques for jitter separation and estimation. What is relevant here is that, as we will see in the subsequent sections, the loop attributes of an analog PLL based CDR system can be optimized based on certain time- and frequency-domain properties of the different jitter components. This applies especially to systems where the incoming jitter profile is not known a priori. As we will discuss in the next two sections, the CDR should ideally either track or filter the timing errors selectively, based on their attributes, to reduce the $BER$ and maximize the performance of the CDR. This Section provides an outline of an adaptive solution which relies on on-chip jitter separation to adapt the control loop behavior.

With the above discussion of the properties of different jitter components, we present a few jitter performance metrics used to describe the behavior of the CDR in the next section. This leads to a discussion of the ideal expected behavior of a CDR in this context and the possibility of on-chip loop adaptation.

5.1.5 Jitter Handling in Clock Recovery Systems

In this section we introduce three jitter performance indicators used for analyzing the performance of serial link CDRs. These are jitter transfer ($J_{TRAN}$), jitter generation ($J_{GEN}$) and jitter tolerance ($J_{TOL}$).
5.1.5.1 Jitter Transfer ($J_{TRAN}$)

Jitter transfer ($J_{TRAN}$) is the equivalent of the transfer function of the CDR system. $J_{TRAN}$ provides an indication of the output jitter for a given input jitter as a function of frequency. For low-frequency input jitter variations, the CDR should track the input such that the recovered clock can be positioned to maximize the timing margins at the retiming DFF. As the frequency of the input jitter increases, the input jitter must be filtered. Hence $J_{TRAN}$ has a low-pass filter response similar to the phase-transfer function of PLLs [69].

Analog PLL based clock recovery systems do exhibit a low-pass transfer functions. For CDRs with linear PDs, the exact transfer can be obtained using a linear small-signal (in phase) model. Depending on the particular serial-link application, the $J_{TRAN}$ specification may require the PLL to comply to a given bandwidth and constrain any peaking in the transfer function. For example, optical link CDR specifications like SONET require a $J_{TRAN}$ bandwidth of 120kHz with a very stringent jitter peaking requirement ($< 0.1dB$) [69]. For CDRs with binary PDs, the presence of a hard non-linearity in the loop does not easily lend to small-signal transfer function analysis. In addition the gain of the bang-bang PD and hence the overall loop gain depends on the amplitude of the input jitter itself [70]. A method based on large-signal piece-wise model has been proposed by [70] to obtain the $J_{TRAN}$ response and equivalent bandwidth of bang-bang loops. Such analysis obtains the equivalent low-pass transfer function of the bang-bang loop as a function of the amplitude of the input jitter. The limitations of linearizing the PD gain as a function of fixed input jitter have been recognized [71, 72]. [71] analyzes the PD gain using Markov models to incorporate the effect of dynamics of the bang-bang loop in the presence of input jitter. Non-linear stochastic analysis has been used recently to model bang-bang PLLs [73, 74]. While developing accurate mathematical models for digital and analog bang-bang PLLs continues to be an active area of theoretical research, challenge lies in obtaining design insights by careful approximations to rigorous analytical models.
In summary, although linearization of the bang-bang PD gain can be useful to designers, only rigorous analytical models hold up under mathematical analysis.

5.1.5.2 Jitter Generation ($J_{GEN}$)

Jitter generation ($J_{GEN}$) is the contribution of the CDR circuit itself, in the absence of input jitter, to the total jitter in the recovered clock. Naturally, $J_{GEN}$ is an important specification in clock generation circuits employed in transmitters. Several sources of jitter exist within a CDR. As indicated by [69], in linear CDRs the key sources of jitter include (1) phase noise from the VCO produced due to device noise (white, flicker, shot etc), and (2) coupling to various nodes of the PLL from input data transitions, supply noise and substrate noise.

In addition to the above sources, non-linear bang-bang PLLs also exhibit a different source of jitter called hunting jitter. This originates from the limit cycle behavior of non-linear loops [65, 75, 76]. The non-linear binary PD quantizes the input error and produces a binary output even when the difference between the input phases is close to zero. Since the PD output is quantized to ±1, a frequency step of $f_{bb}$ is generated at the VCO output. Based on the phase-domain model presented in [75], this frequency step leads to the PLL generating an excess hunting jitter with a peak-to-peak value of $J_{pp} = 4\pi (f_{bb}/f_{nom})$, where $f_{nom}$ is the frequency corresponding to the input data rate and $f_{bb}$ is the bang-bang frequency step. Another approach to evaluating hunting jitter based on a discrete-time model is presented in [65]. Based on this model, hunting jitter can be reduced by reducing the loop bandwidth of the CDR [61]. We should note that the proportional dependence of hunting jitter on loop bandwidth is opposite to the dependence of jitter due to VCO noise on loop bandwidth. Due to the high-pass nature of the transfer function from VCO phase noise to the clock output in a PLL loop, the contribution of the VCO noise can be reduced by increasing the loop bandwidth of the CDR. In summary, several sources of $J_{GEN}$ exist in a CDR and these sources may exhibit different dependencies on the properties of the CDR loop.
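
As a rough numerical illustration of the hunting-jitter expression $J_{pp} = 4\pi (f_{bb}/f_{nom})$, the short sketch below evaluates it for one operating point. The bang-bang frequency step, the function names, and the reading of the result as a phase in radians (converted to UI by dividing by $2\pi$) are assumptions made for illustration only; they are not values taken from this design.

```python
import math

def hunting_jitter_pp(f_bb_hz, f_nom_hz):
    """Peak-to-peak hunting jitter J_pp = 4*pi*(f_bb/f_nom).

    The result is assumed here to be a phase in radians.
    """
    return 4.0 * math.pi * (f_bb_hz / f_nom_hz)

def radians_to_ui(phase_rad):
    """Convert a phase in radians to unit intervals (one UI = 2*pi radians)."""
    return phase_rad / (2.0 * math.pi)

f_nom = 10e9  # frequency corresponding to a 10 Gb/s data rate
f_bb = 5e6    # hypothetical bang-bang frequency step

j_pp_rad = hunting_jitter_pp(f_bb, f_nom)
j_pp_ui = radians_to_ui(j_pp_rad)
print(f"J_pp = {j_pp_rad:.3e} rad = {j_pp_ui:.3e} UI")
```

Under these assumptions the hunting jitter scales linearly with $f_{bb}$, so halving the bang-bang step halves $J_{pp}$.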
5.1.5.3 Jitter Tolerance ($J_{TOL}$)

Jitter tolerance ($J_{TOL}$) performance is perhaps the most important specification for a given serial link system. Jitter tolerance specifies the frequency dependent input jitter profile that the CDR must tolerate without increasing the $BER$ above a certain threshold limit. The main idea behind the $J_{TOL}$ specification is that the clock must always be optimally positioned to facilitate accurate retiming and regeneration of the incoming data pattern. In an ideal scenario, it would make sense to completely track the input jitter regardless of the frequency content of the input jitter. As noted in [77], the presence of loop delay between the incoming data transitions and recovered clock actually hurts the available timing margin at high frequency inputs. Hence the $J_{TOL}$ performance requirement is often specified as a mask which the CDR should meet.

A typical example of such a mask is shown in Fig. 5.6. The mask is defined to keep the system $BER$ below a threshold value (e.g. $10^{-12}$) for a given sweep of frequency and magnitude of the input $P_j$. The magnitude value (specified in UI or unit-intervals) indicated in the figure is the lower boundary for the applied $P_j$. The CDR fails the $J_{TOL}$ test if the magnitude of $P_j$ that it tolerates is less than the mask value, and passes otherwise. One method to estimate the $J_{TOL}$ performance is to modulate the input data pattern to the CDR with intentional $P_j$ and observe the $BER$ performance of the system as the amplitude and the frequency of $P_j$ are varied. The $BER$ profile can be obtained for each pair of modulation amplitude and modulation frequency. At low frequency, we should expect the CDR to track the input jitter. At higher frequency, the system should filter the jitter. With high-frequency, low-amplitude modulation, the CDR fails to track the input jitter. Hence the phase location of the recovered clock remains largely invariant to the input jitter while the data eye moves around rapidly. The $BER$ performance for this input therefore provides an idea of the timing margin available in the system. But as the frequency of $P_j$ is reduced, the CDR loop starts tracking the input jitter, which improves the timing margin at the input of the retiming DFF. Hence the $J_{TOL}$ mask shown in Fig. 5.6 has a low-pass-filter-like, frequency-dependent shape.
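
The pass/fail rule described above can be sketched as a simple comparison of measured tolerance points against a mask. The mask below is purely hypothetical: the corner frequency, the floor and ceiling amplitudes, and the function names are placeholders chosen for illustration and do not reproduce the mask of Fig. 5.6.

```python
def jtol_mask_ui(freq_hz, corner_hz=4e6, floor_ui=0.15, ceiling_ui=15.0):
    """Hypothetical J_TOL mask amplitude (UI) at a given jitter frequency.

    Flat floor above the corner, rising at 20 dB/decade below it,
    and clipped at a ceiling for very low frequencies.
    """
    if freq_hz >= corner_hz:
        return floor_ui
    return min(ceiling_ui, floor_ui * corner_hz / freq_hz)

def passes_jtol(measured_points, mask=jtol_mask_ui):
    """measured_points: (frequency in Hz, largest tolerated P_j in UI) pairs.

    The CDR passes only if it tolerates at least the mask amplitude
    at every tested frequency.
    """
    return all(tolerated_ui >= mask(freq_hz)
               for freq_hz, tolerated_ui in measured_points)

good_cdr = [(1e5, 8.0), (1e6, 1.0), (1e7, 0.3)]
weak_cdr = [(1e5, 8.0), (1e6, 0.5), (1e7, 0.3)]
print(passes_jtol(good_cdr), passes_jtol(weak_cdr))
```

In this sketch the second data set fails because its tolerated amplitude at 1MHz falls below the assumed mask value at that frequency.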

In the next section we will look at how different input jitter components ($R_j$ and $D_j$) affect the design choices in a CDR system and the impact of CDR loop parameters on the jitter performance indicators of the system. We also introduce adaptive CDR loops leading to the solution discussed in this Section.

5.2 Adaptive Clock Recovery Systems

The performance of a CDR based on analog PLL implementation depends on the input jitter profile and the loop design parameters (loop gain, bandwidth, peaking in the transfer function etc). In this section we discuss the interdependence of input jitter and loop parameters.

A well designed CDR should maximize the timing margin at the retiming DFF and minimize the $BER$. A CDR with a large loop gain will be able to minimize the contributions from its own noise sources (VCO), but will result in excessive transfer of high frequency input jitter. In such a scenario, although broadband jitter components such as $R_j$, $ISI_j$ and $BU_j$ will be transferred to the recovered clock, timing margins between the data and recovered clock will be reduced due to PLL loop delay. As indicated by the $J_{TOL}$ requirements, low-frequency, large-amplitude jitter inputs (e.g. $P_j$ from spread spectrum clocking) must be tracked successfully to improve the timing margin. This requirement demands a large loop gain. In contrast, a CDR with a small loop gain will result in slope-overloading in a bang-bang PLL loop [75, 78], but will successfully filter the high frequency input jitter, improving the timing margin.

As can be seen, conflicting requirements for optimum loop gain can exist in a serial link system, especially if the input jitter profile is not known a priori. One approach to optimizing the loop bandwidth of a CDR has been discussed in [77, 79]. The work in [79] treats the CDR as an estimation problem and implements a solution based on an adaptive digital loop filter. Implementing an all-digital adaptation scheme requires the binary output of the PD to be decimated before a programmable digital filter can post-process it. As the input data rates continue to increase, this solution will demand a combination of high performance CML circuits and high-speed digital filter implementations. The adaptive solution discussed in the next section provides an efficient alternative to this problem, using a combination of analog and digital circuits to implement jitter profile detection and loop adaptation.

5.3 Adaptive Bang-bang CDR

Conceptually, the main idea of the adaptive bang-bang CDR is illustrated in Fig. 5.7. The PD present at the input of the CDR continuously provides information regarding the relative phase error between the input phase and the VCO clock phase. Hence it should be possible to monitor the output of the PD to extract information regarding the behavior of the loop or the nature of the input jitter profile. A “smart logic” block, as indicated in the figure, should then be able to dynamically adjust the bandwidth of the loop. The solution outlined below is based on this principle.
Figure 5.7. An adaptive clock recovery system.

The architecture of the adaptive bang-bang CDR solution is shown in Fig. 5.8. As the name suggests, the CDR employs a non-linear bang-bang PD embedded in an analog loop. Other building blocks in the CDR include an LC voltage controlled oscillator (VCO) with a high speed divider for quadrature clock generation, and a programmable charge pump that is adjusted during run-time by a controller. The loop filter has been implemented using off-chip passive components as indicated. The details of the CML implementation of the half rate PD and the variable charge pump (1X to 8X) can be found in [61]. In this section we focus on the design of the loop gain control block shown in Fig. 5.8.

The goal is to estimate the incoming jitter profile and suitably adjust the loop behavior. The system level solution for the loop gain control outlined in this Section is the solution proposed in [61]. The solution offers an efficient approach to separately detect wideband and narrowband jitter components in the incoming data stream. Depending on the nature of the input jitter profile, the loop gain is either increased or reduced dynamically. The jitter detection and loop adaptation block is termed the predictor and will be discussed in the following subsections. A mixed-mode predictor has been used for estimating the appropriate bandwidth setting of the CDR. The predictor contains two sections: an analog front-end (AFE) which interfaces to the output of the bang-bang PD, and a digital section which interprets the AFE output.
Figure 5.8. Top level architecture of the adaptive clock recovery structure.

An outline of the AFE of the predictor is presented first, followed by the design of the digital section of the controller. The state transition diagram of the controller is presented along with the timing constraints in the implementation. We also include a description of the I/O driver circuits which enable monitoring of the digital control signals from the predictor during measurements to experimentally verify the proposed CDR system.

5.3.1 Predictor Design

The top level block diagram of the predictor and the digital section is provided in Fig. 5.9. It contains an AFE that interfaces with the phase detector and a digital controller. Conceptually, the AFE monitors the output of the PD to detect if the PD is indicating either a continuous lead or a continuous lag in the recovered clock phase with respect to the input phase. That is, the goal of the predictor is to detect if the PD output has an excessive number of either UP or DOWN signals. In that case, the predictor increases the loop bandwidth (at a certain rate). If the numbers of UP and DOWN signals are comparable within a certain window, the loop bandwidth is reduced.

Figure 5.9. Top level architecture of the mixed signal predictor.

The AFE consists of a dual path switched capacitor integrator. The PD output drives a V-I converter (transconductor stage) which produces a current that is integrated onto a capacitor. The capacitor is periodically refreshed to restart the integration window. The frequency of the clock used for generating the non-overlap clocks determines the duration of this integration. The arrangement is essentially similar to an integrate-and-dump block. Two such integrate-and-dump blocks are employed in tandem such that one path is integrating while the other is dumping (or resetting). Such a ping-pong tandem arrangement allows for continuous monitoring of the PD output such that the bandwidth setting accurately reflects the instantaneous state of the loop. The accumulated charge on the capacitor is compared to two levels to perform an absolute analog-to-digital conversion (ADC). If the total accumulated charge from the integration time window is such that the voltage on the capacitor exceeds a certain threshold (either positive or negative), the ADC indicates a high (logic 1), signaling a bandwidth increase. A low (logic 0) at the ADC output results in the reduction of bandwidth. Essentially, the trick is to ensure that the single-bit ADC indicates a 1 (increase bandwidth) for large amplitude low frequency $P_j$ and a 0 (reduce bandwidth) for wideband $R_j$ and $ISI_j$. The logic output of the ADC is continuously monitored by a digital state machine, which provides the necessary controls for the charge pump. The design of the digital section is introduced in the next section.
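
The integrate-and-dump decision described above can be illustrated with a purely behavioral sketch. It treats the PD output as a sequence of ±1 decisions, sums them over a window, and applies an absolute threshold. The window length, the threshold, and the function name are hypothetical, and the two ping-pong paths are modeled simply as back-to-back windows; the actual circuit integrates a current onto a capacitor rather than counting.

```python
def integrate_and_dump(pd_decisions, window, threshold):
    """Behavioral model of the AFE decision.

    pd_decisions: sequence of +1 (UP) / -1 (DOWN) PD outputs.
    Returns one bit per window: 1 (increase bandwidth) when the
    magnitude of the integrated value exceeds the threshold,
    0 (reduce bandwidth) otherwise.
    """
    bits = []
    for start in range(0, len(pd_decisions) - window + 1, window):
        acc = sum(pd_decisions[start:start + window])  # integrate, then dump
        bits.append(1 if abs(acc) > threshold else 0)
    return bits

sustained_lead = [+1] * 64   # clock persistently early or late
balanced = [+1, -1] * 32     # UP and DOWN decisions roughly cancel
print(integrate_and_dump(sustained_lead + balanced, window=64, threshold=32))
```

In this model a run of same-sign decisions, as produced by large low-frequency $P_j$, drives the output to 1, while balanced decisions, as expected for wideband jitter, leave it at 0.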

The adaptation performance of the CDR depends on the clock frequency used for
the integration and the reference voltages used for the comparison. In this design the
tandem integrate-and-dump blocks run at 78MHz, producing a digital control signal
at 156MHz. Further discussions and justification for this choice of clock frequency
can be found in [61].

There is one key advantage of the analog integrate-and-dump approach. The approach performs an analog decimation function which otherwise would be hard to achieve using a digital approach. As the data rates approach 10 Gbps, a digital decimator interfacing with the PD would have to operate at 10Gbps, resulting in power hungry CML circuits for division and demultiplexing. Even after a power hungry CML circuit performs the decimation, the digital circuits used for implementing a digital loop filter would still run at a much higher speed compared to the 156MHz chosen in this work. Analog integration allows for efficient decimation and bandwidth adaptation at a lower speed.

5.3.1.1 Digital State-Machine and Controller

The digital state-machine and the controller implemented in this work are an extension of the outline presented in [61]. The charge pump is programmable between 1X and 8X. Hence the output of the controller is an 8 bit thermometric code ranging from 1 (0000 0001) to 8 (1111 1111). Thermometric code is used for charge-pump control to guarantee monotonicity in the loop adaptation. The variable charge pump hence includes 8 equal valued current source arms which either source or sink current into the loop filter depending on the UP/DOWN signal from the phase detector. One implementation of the thermometric counter based on [61] is shown in Fig. 5.10. Conceptually, the counter is implemented using a synchronously clocked shift register and multiplexers as indicated. The 8 bit digital code $\{B_0 - B_n\}$ is the thermometric output controlling the charge pump. An increase in the count (e.g. 0000 0001 to 0000 0011) is achieved by shifting a logic '1' from left to right, while a decrease in the count (e.g. 0000 0011 to 0000 0001) is achieved by shifting a logic '0' from right to left. The Shift Control signal is generated by a state-machine depending on the signal from the AFE.
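
The shift-register view of the thermometric counter can be sketched as follows. The list-based representation (index $i$ holds $B_i$), the function names, and the printing order (highest bit first, to match the 0000 0011 notation used above) are assumptions for illustration and are not taken from Fig. 5.10.

```python
def shift_up(bits):
    """Increase the count: shift a logic 1 in at B_0 (B_i takes B_{i-1})."""
    return [1] + bits[:-1]

def shift_down(bits):
    """Decrease the count: shift a logic 0 in at the top (B_i takes B_{i+1})."""
    return bits[1:] + [0]

def as_text(bits):
    """Render an 8-bit code with the highest bit first, e.g. '0000 0011'."""
    s = "".join(str(b) for b in reversed(bits))
    return s[:4] + " " + s[4:]

code = [1, 0, 0, 0, 0, 0, 0, 0]   # count of 1
code = shift_up(code)
print(as_text(code))              # count of 2
code = shift_down(code)
print(as_text(code))              # back to a count of 1
```

Because each shift adds or removes exactly one contiguous '1', the number of enabled charge-pump arms changes monotonically, which is the property the thermometric code is chosen for. Guarding against shifting below a count of 1 is left to the state machine.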

In addition to the thermometric code, a state-machine that remembers the logic decisions of the AFE is included in this design. This addition offers the flexibility to vary the rate of loop adaptation depending on the history of the control signal generated by the AFE. The state-machine generates the shift control signals for the counter in order to produce the control code for the charge pump.

To illustrate the interaction of the AFE and the different digital building blocks, one bit slice of the 8-bit shift register is shown in Fig. 5.11. The figure depicts the AFE, the digital state-machine, the clock dividers used for synchronization and the combinational logic for code conversion. As shown, an external clock signal at 156MHz clocks the digital state machine. A digital divide-by-2 circuit generates the 78MHz clock phases that clock the switched capacitor AFE. The combinational logic for translation between binary code and thermometer code is also shown.

To explain the functional operation of the digital logic architecture together with the AFE (cf. Fig. 5.11), we note that,

- The shift register output for the $i^{th}$ bit is $B_i$. The shift register can be shifted right (or left) to increase (or decrease) the thermometer code between 1 and 8.

- The shifting operation can be chosen between shift by 1 bit or shift by 2 bits. For shift by 1 bit, $B_{i-1}$ or $B_{i+1}$ is loaded into $B_i$. For shift by 2 bits, $B_{i-2}$ or $B_{i+2}$ is loaded into $B_i$.

- The charge pump can also be controlled directly by an external control. The binary-to-thermometer controller converts the external control $BW\_Value[2:0]$ to an 8 bit thermometer code $BW_i$.

- The $i^{th}$ arm of the charge pump is controlled using the bit $BC_i$. Depending on the value of the $BW$ load control bit, $BC_i$ is selected either from $B_i$ or $BW_i$. Such an arrangement allows for synchronous transfer of control from an external code to automatic control from the state-machine without introducing abrupt changes in the CDR loop.

- The bandwidth control signal $BC_i$ is converted to a binary value $BW\_Control[2:0]$ so that it can be monitored externally on a logic analyzer. The I/O driver circuits required to drive the load presented by an external logic analyzer are described in the next section.
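
The external-control path and the per-bit selection described in the list above can be sketched behaviorally. The mapping of the 3-bit external value to charge-pump strength (value 0 giving 1X, value 7 giving 8X), the argument name `bw_load`, and the function names are assumptions for illustration.

```python
def binary_to_thermometer(bw_value, width=8):
    """Convert a 3-bit external value to an 8-bit thermometer code BW_i.

    Assumed mapping: 0 -> one arm enabled (1X), 7 -> eight arms (8X).
    Index i of the returned list holds BW_i.
    """
    if not 0 <= bw_value < width:
        raise ValueError("bw_value must be in 0..7")
    ones = bw_value + 1
    return [1] * ones + [0] * (width - ones)

def select_control(b, bw, bw_load):
    """Per-bit multiplexer: BC_i = BW_i when the load bit is set, else B_i."""
    return [bw_i if bw_load else b_i for b_i, bw_i in zip(b, bw)]

b = [1, 1, 0, 0, 0, 0, 0, 0]        # code from the shift register
bw = binary_to_thermometer(4)       # external request for five arms
print(select_control(b, bw, bw_load=True))
print(select_control(b, bw, bw_load=False))
```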

The timing waveform between the AFE and the digital control output is indicated in Fig. 5.12. As indicated in the figure, $\Phi_1$ and $\Phi_2$ are two non-overlapping clocks at 78MHz that clock the data output of the two ADCs running in tandem. $\Phi_3$ is the 156MHz clock signal that controls the digital state machine and the shift-register. The increment/decrement_bar signal is generated from either of the two ADCs at the rising edge of either $\Phi_1$ or $\Phi_2$. This signal is processed by the digital state machine to produce an updated bandwidth control signal on the subsequent rising edge of $\Phi_3$ as indicated in the figure. The state-machine controls the shift register to provide an output code between 1 and 8. The logic implemented in the state-machine can be described as follows:

- Increase (or decrease) the bandwidth control if the increment signal is high (or low). Retain the bandwidth control at the maximum value (or minimum value) if an increment high (or low) is received when the code is at 8 (or 1).

- The step control signal controls the bandwidth control step size. If step control is set high, then the bandwidth increase (or decrease) step size is changed to 2 at the third consecutive increment (or decrement) signal.
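
The two rules above can be captured in a small behavioral sketch. It tracks the code as an integer between 1 and 8 and counts consecutive decisions in the same direction. It is not the three-bit state encoding of Fig. 5.13; the function names are hypothetical, and the sketch assumes that the step size stays at 2 for every same-direction decision after the third.

```python
def next_code(code, run, last_inc, inc, step_control):
    """One update of the bandwidth code on a 156MHz clock edge.

    inc: 1 for increment, 0 for decrement (increment/decrement_bar).
    run: number of consecutive decisions in the current direction.
    """
    run = run + 1 if inc == last_inc else 1
    step = 2 if (step_control and run >= 3) else 1
    if inc:
        code = min(8, code + step)   # saturate at the maximum code
    else:
        code = max(1, code - step)   # saturate at the minimum code
    return code, run

def run_predictor(inc_bits, step_control, code=1):
    """Apply a sequence of AFE decisions and return the code after each one."""
    run, last_inc, trace = 0, None, []
    for inc in inc_bits:
        code, run = next_code(code, run, last_inc, inc, step_control)
        last_inc = inc
        trace.append(code)
    return trace

print(run_predictor([1, 1, 1, 1, 0, 0, 0, 0, 0], step_control=True))
print(run_predictor([1] * 9, step_control=False))
```

With step control enabled, the first trace shows the code moving by one for the first two decisions in a direction and by two from the third onward, then holding at the minimum code.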

The above logic is encoded using a state-machine as depicted in Fig. 5.13. At every rising edge of $\Phi_3$, the state-machine senses the input increment/decrement_bar signal and transitions from state $(n)$ to state $(n + 1)$ as indicated in the table. Three bits ($s2, s1, s0$) indicate the state. At every transition, depending on the current state and the input, a corresponding two bit output $(y1,y0)$ is generated. The output $(y1,y0)$ controls the step-size and the shift-right or shift-left operation of the thermometric counter. The critical timing path for the setup condition in the flip-flop can be identified all the way from the increment/decrement_bar signal to the D input of the flip-flop as shown in Fig. 5.11. Depending on the state-machine output, the appropriate bandwidth control code $BC_i$ is generated.
Figure 5.11. Architecture of the digital logic and interface to analog front-end.
The mixed-mode predictor thus implemented provides the combined features of input jitter estimation and loop gain control. Such a mixed-mode implementation offers several advantages which will be discussed in the next section.

5.3.1.2 Advantages of Mixed-Mode Predictor Design

There are several advantages of using the mixed-mode approach to estimation compared to previously reported all-digital approaches:

1. AFE interfacing to the PD output acts as an efficient decimation and estimation block.

2. As data rates continue to increase, at higher data rate to $f_T$ ratios, all-digital approaches will not be quite efficient. Such approaches will require power hungry CML circuits to interface to the PD output and decimate the digital output from the PD before performing estimation and filtering using an all-digital filter. Although the cost of the digital filter itself may be very low, the interface and decimation circuits will still be power hungry.

3. This approach still maintains a digital section that can be programmed to alter the bandwidth adaptation rate as indicated in the current implementation.

5.3.2 Input-Output (I/O) Design for the CDR System

5.3.2.1 Digital Output Driver to Drive the Logic-Analyzer

The bandwidth control code from the predictor should be monitored in real-time to validate the effectiveness of the proposed jitter separation and bandwidth control algorithm. Hence the bandwidth control code is routed off-chip for monitoring as illustrated in Fig. 5.11. The bandwidth control signals are updated at the rising edge of the clock $\Phi_3$ running at 156MHz. The 8-bit thermometer code is converted to 4 bits that include a 3 bit equivalent binary code and a 1 bit error code which indicates if there is an error in the thermometer code. The 4 bits drive a digital output driver that drives an off-chip logic-analyzer load. This section provides details of the driver designed to interface with the logic analyzer load at 156MHz.

Figure 5.12. Timing diagram of analog and digital interface.

Figure annotations: at the minimum code, 1 → dec (by 1 or >1) → 1; at the maximum code, 8 → inc (by 1 or >1) → 8; with step control, inc (by 1) → inc (by 1) → inc (by >1) and dec (by 1) → dec (by 1) → dec (by >1); edges of $\Phi_1$, $\Phi_2$, $\Phi_3$ are synchronized at end-points (switches, comparators, multiplexers and flip-flops).
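
The conversion of the 8-bit thermometer code into a 3 bit binary value plus a 1 bit error flag can be sketched as below. The encoding (binary value equal to the number of enabled arms minus one) and the definition of an error (an all-zero code or a code whose ones are not contiguous from $B_0$) are assumptions for illustration; the on-chip encoding may differ.

```python
def thermometer_to_binary(bits):
    """Return (binary_value, error) for a thermometer code.

    bits[i] holds BC_i. A valid code has between 1 and len(bits) ones,
    all contiguous starting at index 0. Invalid codes return (0, 1).
    """
    ones = sum(bits)
    valid = ones >= 1 and bits == [1] * ones + [0] * (len(bits) - ones)
    if not valid:
        return 0, 1
    return ones - 1, 0

print(thermometer_to_binary([1, 1, 1, 0, 0, 0, 0, 0]))  # valid code, three arms
print(thermometer_to_binary([1, 0, 1, 0, 0, 0, 0, 0]))  # bubble in the code
```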

The load presented by the logic analyzers can be modeled using a parallel combination of a resistor and a capacitor as indicated in Fig. 5.14. While a probe load could be $R_{load} = 20k\Omega || C_{load} = 10pF$, the exact value of $R_{load}$ ($10k\Omega - 20k\Omega$) and $C_{load}$ ($5pF - 20pF$) can vary depending on the particular analyzer or the probe that is being used. It should be noted that the value of $R_{load}$ is orders of magnitude larger than the typical characteristic impedance ($Z_{line,typical} \sim 50 - 75\Omega$) of a transmission line. Hence such a load behaves similar to a capacitive load rather than a terminated transmission line. A capacitive load has the attributes of a transmission line terminated in both an open and a short. During steady state the load behaves like an open, while during signal transitions the load behaves like a short. If the load capacitor varies across different conditions, then the reflection coefficient at the end of the transmission line is a frequency dependent variable which changes during signal transitions.

Figure 5.13. Predictor state machine.
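
The open-like and short-like behavior of the probe load can be checked numerically with the standard reflection coefficient $\Gamma = (Z_L - Z_0)/(Z_L + Z_0)$, where $Z_L$ is the parallel combination of $R_{load}$ and $C_{load}$. The sketch below uses the example probe values quoted above; the $50\Omega$ line impedance, the evaluation frequencies and the function names are illustrative choices.

```python
import math

def load_impedance(freq_hz, r_load=20e3, c_load=10e-12):
    """Impedance of R_load in parallel with C_load at a given frequency."""
    if freq_hz == 0:
        return complex(r_load, 0.0)  # capacitor is an open circuit at DC
    z_c = 1.0 / (1j * 2.0 * math.pi * freq_hz * c_load)
    return (r_load * z_c) / (r_load + z_c)

def reflection_coefficient(freq_hz, z0=50.0, **load):
    """Gamma = (Z_L - Z_0) / (Z_L + Z_0) at the end of the line."""
    z_l = load_impedance(freq_hz, **load)
    return (z_l - z0) / (z_l + z0)

for f in (0.0, 156e6, 10e9):
    gamma = reflection_coefficient(f)
    print(f"f = {f:.3g} Hz: Gamma = {gamma.real:+.3f}{gamma.imag:+.3f}j")
```

At DC the coefficient is close to +1 (an open), while at high frequency it approaches -1 (a short), with a frequency dependent phase in between.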

For these reasons, a conventional inverter based digital output driver would generate a *staircase* waveform at the logic analyzer input, with the waveform *resting* at the mid-range between $0 - V_{dd}$ during the transitions. Such *pedestal* waveforms are not desired, especially when the output needs to be sampled by the internal clock of a logic-analyzer to decipher a logic high or a logic low level. Certain models of logic analyzers provide a method to adjust the delay of the sampling phase of the internal clock for the entire bus. But in the presence of skew between multiple data lines in the bus, *pedestal* waveforms may make it difficult to find an optimum sampling point for the logic-analyzer to decipher the digital code. Hence in this section, we outline a slew-rate controlled predriver based digital pad driver that avoids this issue for a wide range of loads and guarantees smooth and controlled *high-to-low* and *low-to-high* transitions. The proposed driver combines several output driver design techniques used for both push-pull and current source drivers [80]. The novelty of the proposed driver is the use of current-source based charging/discharging of gate signals driving the driver transistors instead of using current sources as drivers themselves. This approach is also different from digital output impedance control commonly employed in digital I/O drivers.
Figure 5.15. Digital driver based on a current source based pre-driver.
Figure 5.16. Logic and the timing generation logic for predriver control.

The schematic of the output driver is shown in Fig. 5.15. The final stage of the driver contains two transistors (P driver and N driver) for the push-pull operation. The gate of the P driver (PDRV) and the gate of the N driver (NDRV) are separately controlled as indicated. This is in contrast to a conventional voltage-mode push-pull driver, which uses digital hard-switching. This is also different from truly current-mode drivers, which employ constant current sources at the output stage. The digital control logic required for generating appropriate PDRV and NDRV signals is shown in Fig. 5.16. The main highlights of this design are:

- A break-before-make scheme is employed. This break-before-make scheme ensures that there is no crowbar current (short-circuit current from supply-to-ground during transitions) through the output driver transistors during the transitions.

- The break-before-make scheme is implemented using a non-overlap clock generator as indicated in Fig. 5.16. The timing waveforms and signal transitions are indicated in Fig. 5.17. P1 (and P1B) and P2 (and P2B) are two non-overlapping phases generated from the input data transitions. Based on this non-overlap control,
  - The PMOS driver transistor is shut off (using *break-P*) before the NMOS driver transistor is enabled (using *make-N*) for a high-to-low output (at the PAD) transition.
  - The NMOS driver transistor is shut off (using *break-N*) before the PMOS current source is enabled (using *make-P*) for a low-to-high output (at the PAD) transition.

- The OFF-to-ON transition at the gate of the driver transistors (both P and N) is facilitated using constant current sources. The ON-to-OFF transition is performed instantly using a digital signal.

- The rate of driver gate transitions at PDRV and NDRV is controlled using the strength of the current source. The value of the current source can be controlled using a programmable input current source to the output buffer. In this prototype, the value of the current is controlled by adjusting an external resistor.

- Controlling the gates of the driver transistors provides a way to control the output rise/fall transitions and hence eliminate the pedestal output waveforms. As suggested by [80], the rise/fall-times of the output transitions in I/O drivers have to be just right, like Goldilocks' porridge. Too fast a rise time couples excessive energy into parasitic LC elements, resulting in reflections from short discontinuities, while a very slow rise time reduces the timing margin in the system.

- Gradual and controlled enabling of the output driver transistors ensures that $L\,di/dt$ noise generated on the supply nodes due to multiple simultaneous switching outputs (SSOs) is kept under control.

- The control logic shown in Fig. 5.16 also includes an enable (En) control for the output driver to optionally tri-state the output driver to minimize substrate noise during normal operation of the CDR.

Figure 5.17. Timing waveforms for predriver control.

As will be shown in the subsequent sections, real-time bandwidth control information has been successfully captured using these current-controlled output drivers to drive a logic-analyzer load.

5.3.2.2 CML Limiting Amplifier to Drive the Phase Detector

In order to test the proposed CDR system, input data from the test equipment at 10Gb/s must reach the input of the PD. This puts demanding requirements on the input limiting amplifier which must interface with the input parasitics in a packaged system and drive the loading presented by the PD. Key requirement specifications of such a high speed input amplifier are:

- Provide a differential $100\Omega$ input termination to the signal source.

- Deliver an output differential swing of at least 600mV to the phase detector with a reliable input common-mode signal range in order to bias the input of the PD.

- Tolerate input $ISI_j$ and minimize the additional $ISI_j$ from the amplifier in the presence of input $ISI_j$ from the signal source. The input signal source may contribute up to 10ps of jitter. This is especially important in our application since, to simplify the design, no continuous-time equalizer is included in the input path.

Figure 5.18. Single stage input amplifier to interface the signal source to PD.

Two versions of the input buffer have been designed in this work. The design used in the first prototype included a two stage solution to tolerate small input signals from the signal source and provide signal amplification in addition to driving the PD. In the second design, a single buffer stage was employed to minimize the $ISI_j$ contribution from the amplifier. The schematic of the single stage amplifier is shown in Fig. 5.18. The load resistor $R_{load}$ can vary $\pm 20\%$ from the nominal value such that $R_{load,min} = 0.8 \times R_{load,nom}$ and $R_{load,max} = 1.2 \times R_{load,nom}$. The input capacitance of the PD (due to $C_{gs}$) and the routing parasitics contribute a load capacitance $C_{load} = 100 fF$ on each output node. The desired differential output swing at the output of the amplifier is given by $I_{bias} \times R_{load}$. The minimum bandwidth of the amplifier at the output node $\omega_{-3dB} = (R_{load,max} C_{load})^{-1}$ is constrained by $R_{load,max}$, while the minimum single-ended output swing $V_{se,min} = I_{bias} \times R_{load,min}$ is limited by $R_{load,min}$. To maximize the output bandwidth for the given $C_{load}$, $R_{load,nom} = 75\Omega$ is chosen. Hence to achieve a desired single-ended output swing of $> 700mV$ across process and temperature corners, $I_{bias}$ is set to 12mA. The design has been simulated in an elaborate test bench including the parasitics in the input path, supply parasitics, decoupling capacitor networks and external AC coupling in the input path. The resulting eye diagram from the input path across process corners is shown in Fig. 5.19.

Figure 5.19. Simulated eye diagram across corners including package parasitics.
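
The sizing of the single stage amplifier can be checked with a few lines of arithmetic using only the values quoted in the text ($R_{load,nom} = 75\Omega$, $\pm 20\%$ variation, $C_{load} = 100fF$, $I_{bias} = 12mA$). The variable names are ours, and the single-pole estimate ignores any other parasitics.

```python
import math

R_NOM = 75.0        # nominal load resistor, ohms
TOLERANCE = 0.20    # +/- 20 % resistor variation
C_LOAD = 100e-15    # load capacitance per output node, farads
I_BIAS = 12e-3      # bias current, amperes

r_min = (1.0 - TOLERANCE) * R_NOM
r_max = (1.0 + TOLERANCE) * R_NOM

# Worst-case single-ended swing occurs at the smallest resistor value.
v_se_min = I_BIAS * r_min

# Worst-case bandwidth occurs at the largest resistor value.
omega_3db_min = 1.0 / (r_max * C_LOAD)
f_3db_min_hz = omega_3db_min / (2.0 * math.pi)

print(f"R_load range: {r_min:.0f} to {r_max:.0f} ohm")
print(f"V_se,min = {v_se_min * 1e3:.0f} mV")
print(f"f_-3dB,min = {f_3db_min_hz / 1e9:.2f} GHz")
```

With these numbers the worst-case swing is about 720mV, which clears the 700mV target, and the worst-case single-pole bandwidth at the output node is roughly 17.7GHz.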

In addition to the input limiting amplifier, the other contributions to the CDR system include design and characterization of the power supply network including both on chip and off chip decoupling capacitors, design of digital input clock pad circuit and a digital register file with a serial to parallel logic to store the external control codes for the entire system.
5.4 Experimental Results

The proposed adaptive CDR has been fabricated using TSMC 0.18$\mu$m CMOS technology (with process $f_T = 44\,\text{GHz}$). The chip micrograph is shown in Fig. 5.20. The total die-size is $2.3\,\text{mm} \times 2.3\,\text{mm}$. The core area of the CDR occupies $1.5\,\text{mm} \times 1.5\,\text{mm}$. Pins used for the digital I/O occupy the additional area in the die.

In order to verify the effectiveness of the proposed jitter separation and loop adaptation techniques, digital loop gain control signals from the predictor are continuously monitored under test with different jitter profiles. The measurement setup is shown in Fig. 5.21. The input data pattern is applied from a high performance Bit Error Rate Tester (BERT). The BERT allows for controlled modulation ($P_j$) of the triggering clock used for clocking out the data to the DUT. The recovered clock and data are returned to the BERT to measure the $BER$ of the system.

The measured loop gain adaptation is shown in Fig. 5.22, Fig. 5.23 and Fig. 5.24.

- Without additional $P_j$, the proposed CDR successfully recovers the clock. The instantaneous bandwidth code from the CDR is plotted in Fig. 5.22. The average bandwidth code is maintained at a low value of 1.42.

• The instantaneous bandwidth code with a moderate $P_j$ (0.8 UIpp at 1MHz) is shown in Fig. 5.23. The average value of the code is 2.1. The CDR is expected to have a high bandwidth when the slope of the input $P_j$ is largest. For a $P_j$ frequency of 1MHz, the highest slope occurs at the zero crossings of the phase modulation sinusoid, every 0.5µs. The bandwidth adaptation shows the expected periodicity of 0.5µs.

• The instantaneous bandwidth code with a higher-frequency $P_j$ (0.8 UIpp at 4MHz) is shown in Fig. 5.24. The average value of the code is 4.6. The bandwidth adaptation shows the expected periodicity of 0.125µs.

• A comparison of the jitter histogram from the recovered clock for different input jitter profiles is shown in Fig. 5.25. The jitter profiles are extracted for the lowest (1X) and the highest (8X) fixed bandwidth settings in addition to adaptive bandwidth operation. As indicated in the figure:
  
  – The jitter histogram of the adaptive CDR is identical to that of the minimum fixed-bandwidth CDR (with 1X strength) in the absence of any intentional $P_j$ applied at the input.
  
  – The jitter histogram of the adaptive CDR is identical to that of the maximum fixed-bandwidth CDR (with 8X strength) in the presence of a large $P_j$ (0.7 UIpp at 16MHz) at the input.

• The recovered clock from the CDR has a measured jitter of 1.13ps rms at 9.4Gb/s using a $2^7 - 1$ PRBS pattern from the BERT.
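
To put the measured recovered-clock jitter in perspective, it can be expressed as a fraction of the unit interval (UI). The short Python sketch below does this conversion for the reported 1.13ps rms at 9.4Gb/s; the helper names `unit_interval_ps` and `jitter_in_ui` are illustrative and do not come from the measurement setup.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one unit interval (one bit period) in picoseconds."""
    return 1e3 / bit_rate_gbps


def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter value given in picoseconds as a fraction of the UI."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)


# Values reported in the text: 1.13 ps rms at 9.4 Gb/s.
ui_ps = unit_interval_ps(9.4)
rms_ui = jitter_in_ui(1.13, 9.4)
print(f"UI = {ui_ps:.1f} ps, recovered clock jitter = {rms_ui:.4f} UI rms")
```

Under these numbers the unit interval is about 106.4ps, so the measured jitter corresponds to roughly 0.011 UI rms.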

Measurement results from the first prototype demonstrate that the proposed adaptive CDR unit optimizes the loop gain based on dominant jitter frequency detection. The mixed-mode predictor, implemented as a combination of a switched-capacitor AFE and a digital controller, successfully estimates the required bandwidth of the CDR depending on the $P_j$ applied at the input. The periodicity of the loop gain adaptation logic, as indicated by the digital code captured with a logic analyzer, correlates with the frequency of the applied $P_j$.
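
The expected adaptation periodicity quoted for the measurements follows directly from the sinusoidal $P_j$ model: the phase slope peaks at every zero crossing, that is, twice per modulation period. The sketch below is a minimal check under that assumption; it models the jitter as $\phi(t) = (A_{pp}/2)\sin(2\pi f t)$, and the function names are hypothetical.

```python
import math


def adaptation_period_s(pj_freq_hz):
    """Spacing between zero crossings (slope maxima) of a sinusoidal Pj."""
    return 1.0 / (2.0 * pj_freq_hz)


def peak_phase_slope_ui_per_s(pj_amp_uipp, pj_freq_hz):
    """Peak |d(phi)/dt| of phi(t) = (A_pp / 2) * sin(2*pi*f*t), in UI/s."""
    return math.pi * pj_amp_uipp * pj_freq_hz


# Pj settings used in the measurements: 0.8 UIpp at 1 MHz and at 4 MHz.
for freq_mhz in (1.0, 4.0):
    freq_hz = freq_mhz * 1e6
    period_us = adaptation_period_s(freq_hz) * 1e6
    slope = peak_phase_slope_ui_per_s(0.8, freq_hz)
    print(f"{freq_mhz:.0f} MHz: slope maxima every {period_us:.3f} us, "
          f"peak slope {slope:.3e} UI/s")
```

For 1MHz and 4MHz this gives 0.5µs and 0.125µs respectively, matching the periodicity observed in the bandwidth code. The peak slope scales linearly with the jitter frequency, which is consistent with the higher average code measured at 4MHz.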

Experimental results also indicated that the first prototype chip was fabricated in the slow process corner, leading to excessive ISI in the output buffers operating at 10Gb/s. The output buffers and other CML circuits in the CDR have been redesigned with wider bandwidths to minimize ISI in the second prototype, which is currently under fabrication.
5.5 Conclusion

This Section outlined an adaptive loop gain bang-bang CDR solution using an analog PLL implementation. The proposed CDR adapts its loop gain to the input jitter profile, which is estimated using a mixed-mode jitter estimator. The experimental results from the first prototype, fabricated using TSMC RFCMOS 0.18μm technology and running at 10Gb/s, indicate that the loop gain control mechanism performs as expected.

Figure 5.24. Measured real-time bandwidth control code value for $P_j$ = 0.8 UIpp at 4MHz.

Figure 5.25. Comparison of measured jitter histogram for different input jitter.
6. CONCLUSION

The research presented in this thesis addresses some of the challenges associated with both wireless and wireline communication systems. Sections 2, 3 and 4 focused on analog baseband design for high dynamic range wireless radios, while Section 5 presented an adaptive clock-recovery system meant for a high-speed wireline transceiver.

Section 2 offers insights into the critical impact of undesired out-band blockers on analog baseband design in high dynamic range radios. Systematic analysis quantifies the dynamic range requirements at the A/D conversion interface depending on the type, strength and relative spectral position of the blocker. We showed that baseband filters implementing high-order Inverse Chebyshev approximations, which provide a sharp transition band due to stop-band zeros, are desirable in the presence of aggressive analog adjacent blockers, while relatively smoother Butterworth filters are suitable for the case of far-out digital blockers. In Section 3, a prototype broadband UHF wireless receiver was presented. The receiver integrated an RFVGA and a current-mode passive mixer along with a hybrid continuous- and discrete-time analog baseband. Experimental results from the prototype system include residual dynamic range measurements under different input blocker profiles.
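
The filter trade-off summarized for Section 2 can be illustrated with the ideal magnitude responses of the two approximations. The sketch below is only an illustration and not the design flow used in this work: the filter order (5), the Inverse Chebyshev stop-band attenuation (40dB) and the far-out blocker offset (10 times the 3dB frequency) are assumed values chosen for the example, and all frequencies are normalized to the 3dB frequency.

```python
import math


def cheb_poly(n, x):
    """Chebyshev polynomial T_n(x) for x >= 0."""
    if x <= 1.0:
        return math.cos(n * math.acos(x))
    return math.cosh(n * math.acosh(x))


def butterworth_atten_db(n, w):
    """Attenuation of an n-th order Butterworth low-pass at frequency w."""
    return 10.0 * math.log10(1.0 + w ** (2 * n))


def inv_cheb_stopband_edge(n, stop_db):
    """Stop-band edge of an Inverse Chebyshev low-pass with a 3 dB point at 1."""
    inv_eps = math.sqrt(10.0 ** (stop_db / 10.0) - 1.0)
    return math.cosh(math.acosh(inv_eps) / n)


def inv_cheb_atten_db(n, stop_db, w):
    """Attenuation of an n-th order Inverse Chebyshev low-pass at frequency w."""
    inv_eps_sq = 10.0 ** (stop_db / 10.0) - 1.0
    t = cheb_poly(n, inv_cheb_stopband_edge(n, stop_db) / w)
    if t == 0.0:
        return math.inf  # exactly on a stop-band zero
    return 10.0 * math.log10(1.0 + inv_eps_sq / (t * t))


ORDER, STOP_DB = 5, 40.0                      # assumed example values
adjacent = inv_cheb_stopband_edge(ORDER, STOP_DB)
far_out = 10.0                                # assumed far-out blocker offset
for label, w in (("adjacent", adjacent), ("far-out", far_out)):
    print(f"{label:8s} w = {w:5.2f}: "
          f"Butterworth {butterworth_atten_db(ORDER, w):6.1f} dB, "
          f"Inverse Chebyshev {inv_cheb_atten_db(ORDER, STOP_DB, w):6.1f} dB")
```

With these assumed values, the Inverse Chebyshev response reaches its full 40dB of rejection at about 1.6 times the 3dB frequency, where the Butterworth response provides only roughly 21dB. At the far-out offset the situation reverses: the Butterworth attenuation keeps growing to about 100dB, while the Inverse Chebyshev response stays near its stop-band floor.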

We delved further into the dynamic range requirements of the analog baseband system in a wireless radio from an out-band linearity perspective in Section 4. A novel feed-forward OTA structure suitable for active RC filter design for analog baseband was proposed. Simulation results from the chip prototype, which includes a reference design using the conventional feed-forward OTA for fair comparison, were presented. This research shows that circuit techniques that improve the out-band linearity performance enhance the blocker tolerance and the dynamic range of the receiver. As indicated in Section 4, the proposed OTA has an excellent FoM and improves the out-band linearity performance without significant cost in area or power.

Section 5 dealt with clock recovery systems for wireline transceivers and presented details of the research performed in collaboration with [61]. An efficient mixed-signal approach to optimize the loop gain of a clock recovery loop was presented. The Section provided complete details of the supporting digital state machine that achieves adaptive tracking in the mixed-signal solution, along with the auxiliary circuits used in the system. The mixed-signal solution presented achieves on-chip jitter separation and improves the trade-off between jitter tracking and jitter filtering. The main conclusion from this research is that smart mixed-signal solutions can enable adaptive clock-recovery systems that operate at high data rates relative to the $f_T$ of the process technology.
REFERENCES


[29] S. Azuma, S. Kawama, K. Iizuka, M. Miyamoto and D. Senderowicz, “Embedded Anti-Aliasing in Switched-Capacitor Ladder Filters With Variable Gain and


VITA

Raghavendra Kulkarni received the B.E. degree in Electronics and Communication from University of Mysore, India and the M.Tech. degree in VLSI Design, Tools and Technology (VDTT) from IIT, Delhi, India in 1997 and 2001, respectively. He received his Ph.D. degree from the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX in December 2011.

He can be reached at 315-F Wisenbaker Engineering Research Center, Bizzell Street, Texas A&M University, College Station, TX 77843-3126. His email ID is ragkul@neo.tamu.edu.