# DESIGN TECHNIQUES FOR HIGH-SPEED MULTI-CARRIER WIRELINE RECEIVERS

# A Dissertation

by

# YUANMING ZHU

# Submitted to the Graduate and Professional School of Texas A&M University in partial fulfillment of the requirements for the degree of

# DOCTOR OF PHILOSOPHY

| Chair of Committee, | Samuel Palermo          |
|---------------------|-------------------------|
| Committee Members,  | Sebastian Hoyos         |
|                     | Jun Zou                 |
|                     | Duncan M. (Hank) Walker |
| Head of Department, | Miroslav Begovic        |

May 2022

Major Subject: Electrical Engineering

Copyright 2022 Yuanming Zhu

#### ABSTRACT

The explosion in network traffic driven by cloud computing and wireless data usage requires serial I/O to operate at higher data rates. The per-channel I/O data rate is projected to exceed 100Gb/s because packaging technology allows only modest increases in I/O channel count. As symbol times shrink, inter-symbol interference (ISI) increases for transmission over both severely low-pass electrical channels and dispersive optical channels. This necessitates increased equalization complexity, consideration of more bandwidth-efficient modulation schemes such as baseband PAM4 and coherent QAM, and the use of forward error correction. Serial links that utilize an analog-to-digital converter (ADC) receiver front-end offer a potential solution, as they enable more powerful and flexible digital signal processing (DSP) for equalization and symbol detection and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to process, voltage, and temperature variations, and benefits from improved area and power with CMOS technology scaling. However, with data rates increasing to 100+ Gb/s, the front-end ADC for PAM4 modulation requires a 50+ GS/s sample rate, and high input bandwidth is essential to limit the ISI it induces. The power consumption of such a high-speed ADC is a major problem. This motivates the exploration of energy-efficient, high-speed, high-bandwidth time-interleaved ADC design. In addition, sampling clock jitter places fundamental performance limitations on common time-interleaved ADC architectures, necessitating clock generation and distribution circuitry that achieves rms jitter of a few hundred femtoseconds.

This dissertation presents three research projects. The first is a 1.5GS/s 8-bit unit pipeline-SAR ADC in 14nm CMOS with an output level shifting technique that significantly reduces the unit-ADC power while maintaining high speed. The unit-ADC operates with a 0.8V supply, consumes 2.4mW, and achieves a 16.7fJ/conv.-step FOM at Nyquist. The second is a high-speed time-interleaved ADC with a proposed speed-enhanced bootstrapped switch that enables a low-power, high-bandwidth interleaver. A 7-bit 38GS/s ADC prototype in 22nm achieves 41.9fJ/conv.-step at low frequency and 64.1fJ/conv.-step at Nyquist, and has a 20GHz 3dB bandwidth. The third is a novel frequency-domain multi-carrier ADC-based receiver front-end. The multi-carrier technique significantly improves jitter robustness and reduces the conversion speed of the front-end ADC as well as the DSP complexity. A 40Gb/s receiver front-end prototype in 22nm can operate with up to 1.6ps rms jitter and achieves 3.05pJ/bit power efficiency.

#### DEDICATION

For the memory of my father

#### ACKNOWLEDGEMENTS

It has been a while since I started my PhD, and I will never forget this journey. During my PhD, I experienced this endless pandemic and the pain of losing my father; it was not easy for me. Thanks to all the people who have helped, supported, and encouraged me, and who have brought me to where I am.

First and foremost, I would like to thank my advisor, Prof. Palermo, for educating me throughout my PhD research in many inspiring ways and for offering me tremendous opportunities to participate in high-quality research. This work would not have succeeded without his guidance. His passion, technical expertise, and determination set a standard for my future career. Thanks also to my co-advisor, Prof. Hoyos; I have benefitted greatly from both his technical intuition and his warm, supportive nature.

I wish to express my gratitude to Prof. Zou and Prof. Walker for serving on my committee and for their constructive feedback.

I would like to thank all the senior students. Thank you, Shiva and Shengchang, for introducing me to so much in my first few years, and for those unforgettable ping-pong sessions after the weekly meetings. Thank you, Kunzhi, for accommodating me during my internship at HP Lab in spring 2019 and for the great suggestions. Thank you, Chaerin, for sharing all the interesting Korean facts with me and for supporting me through hard times. Thanks, Tong and Ruida; those fishing and barbecue outings will be among the best memories of my life. Thanks, Ankur, David, Yanghang, Po-Hsuan, and Anil, for the fruitful and fun discussions. I would like to thank Jim Huang, Zhihong Huang, and Marco Fiorentino from HP Lab, and Manisha, Ahmed, and Bo from Marvell, for the wonderful learning experiences and for the great internships at HP Lab in 2019 and Marvell in 2020. Special thanks to Surej from Intel for his patient and strong support in the Intel University Shuttle tape-out program.

I want to extend my special thanks to my talented project partners Julian, Srujan, and Il-min. Our project would never have been as successful without all your outstanding work and your support during the worst times. Thank you for making all the testing days and nights memorable with fun Columbia and Indian facts, which I will cherish forever.

And special thanks to Dr. Su, who treated me like family, encouraging me, comforting me, and cheering me up in my most difficult times.

Finally, my deepest thanks to my mom and dad. Their selfless love has been with me since I was born and as I grew up. Thank you, mom, for carrying everything on your own shoulders when dad fell down; your indomitable spirit will always inspire me. Thank you, dad, for teaching me integrity and bravery when I was young; I will remember your smile and carry your optimistic nature through my life. It would not have been possible for me to finish this dissertation without their unconditional giving and sacrifice. For that, I dedicate this dissertation to them.

## CONTRIBUTORS AND FUNDING SOURCES

# Contributors

This work was supervised by a dissertation committee consisting of Professor Samuel Palermo [advisor], Professor Sebastian Hoyos [co-advisor], and Professor Jun Zou of the Department of Electrical and Computer Engineering and Professor Duncan M. (Hank) Walker of the Department of Computer Science and Engineering.

The multi-carrier techniques described in Chapter 2.2 were provided by Professor Sebastian Hoyos.

All other work conducted for the dissertation was completed by the student independently.

# Funding Sources

Graduate study was supported in part by SRC TxACE Grant 2810.013 and NSF Grant 1930828. Chip fabrication services were provided through the Intel University Shuttle Program.

# TABLE OF CONTENTS

| Section |
|---------|
| ABSTRACT |
| DEDICATION |
| ACKNOWLEDGEMENTS |
| CONTRIBUTORS AND FUNDING SOURCES |
| TABLE OF CONTENTS |
| LIST OF FIGURES |
| LIST OF TABLES |
| 1. INTRODUCTION |
| 2. BACKGROUND |
| 2.1. High-speed wireline link |
| 3. A 1.5GS/S 8B PIPELINED-SAR ADC WITH OUTPUT LEVEL SHIFTING SETTLING TECHNIQUE IN 14NM CMOS |
| 3.1. Introduction |
| 3.2. Output Level Shifting Settling |
| 3.3. ADC Architecture and Building Blocks |
| 3.3.1. Dynamic Amplifier and Comparator |
| 3.3.2. Second Stage Reference Switch with Embedded OLS |
| 3.4. Experimental Results |
| 3.5. Conclusion |
| 4. A 38GS/S 7-BIT TIME-INTERLEAVED PIPELINED-SAR ADC WITH SPEED-ENHANCED BOOTSTRAPPED SWITCH |
| 4.1. Introduction |
| 4.2. Time-Interleaved ADC Architecture |
| 4.2.1. ADC Timing Diagram |
| 4.2.2. Interleaver architecture and proposed speed-enhanced bootstrapped switch |
| 4.2.3. Multi-Phase Clock Generator |
| 4.3. Unit Pipeline-SAR ADC Architecture |
| 4.4. Experimental Results |
| 4.5. Conclusion |
| 5. A JITTER-ROBUST 40GB/S ADC-BASED MULTICARRIER RECEIVER FRONT END IN 22NM FINFET |
| 5.1. Receiver front-end architecture |
| 5.2. Schematic of the receiver front-end |
| 5.2.1. Schematic of CTLE |
| 5.2.2. Schematic of integrator and mixer |
| 5.2.3. 4-way time interleaved ADC design |
| 5.3. Multi-Phase Clock generator |
| 5.4. Experiment Results |
| 5.5. Conclusion |
| 6. CONCLUSION AND FUTURE WORK |
| 6.1. Conclusion |
| 6.2. Time-domain high-speed ADC design |
| 6.3. Frequency domain multi-carrier transmitter |
| REFERENCES |

# LIST OF FIGURES

| Figure |
|--------|
| Figure. 2.1: Electrical Backplane channel cross section |
| Figure. 2.2: Frequency response and pulse response of three channels |
| Figure. 2.3: Eye diagrams after channels without equalization |
| Figure. 2.4: ADC-based receiver architecture |
| Figure. 2.5: 128 Gb/s ADC-based receiver modeling. Channel, BER vs. Jitter |
| Figure. 2.6: A 64-way time-interleaved ADC example |
| Figure. 2.7: Impact of offset (a) and gain (b) error on ADC output spectrum |
| Figure. 2.8: Impact of skew error on ADC output spectrum |
| Figure. 2.9: Characteristics of A/D conversion of multicarrier signals in the frequency domain. (a) Effect of time segmentation in the bandwidth of the multicarrier signal and in the samples frequency spacing $\Delta F_c$ (b) Number of coefficients $N$ versus the symbol-period to segmentation-time ratio ($M = T/T_c$) |
| Figure. 2.10: 128 Gb/s ADC-based receiver modeling. Channel, BER vs. Jitter |
| Figure. 2.11: (a) 128Gb/s system pulse responses from a time-interleaved 64GS/s PAM4 system and the proposed frequency-domain receiver in multi-tone mode with the responses from the baseband and I-phase channels shown. (b) BER vs RJ for a time-interleaved receiver and the frequency-domain receiver operating in baseband PAM4 and multi-tone mode |
| Figure. 2.12: ICI cancellation with digital FIR filters |
| Figure 3.1: Pipeline SAR ADC with a dynamic amplifier |
| Figure 3.2: Output level shifting technique |
| Figure 3.3: Simplified schematic of the OLS dynamic amplifier |
| Figure 3.4: Clock jitter impact (a) transient waveforms (b) jitter induced noise power |
| Figure 3.5: Pipeline SAR ADC with output level shifting residue amplifier |
| Figure 3.6: Unit Pipeline-SAR ADC timing diagram |
| Figure 3.7: Inverter-based dynamic amplifier schematic |
| Figure 3.8: Inverter-based dynamic amplifier transient simulations |
| Figure 3.9: Second stage reference switch control with embedded OLS technique |
| Figure 3.10: Unit-ADC chip micrograph and layout details |
| Figure 3.11: ADC characterization set-up |
| Figure 3.12: ADC DFT results with (a) low frequency input, (b) Nyquist input |
| Figure 3.13: DNL and INL plots |
| Figure 3.14: SNDR and SFDR vs. input frequency |
| Figure 4.1: A common structure of the high-speed time-interleaved ADC |
| Fig. 4.2. An overview of the ADC, the ADC core consist of multi-phase clock gen, 32-channel unit-ADCs and 8-channel T/Hs |
| Fig. 4.3. Interleaver architecture |
| Fig. 4.4. Corresponding timing diagram of sampling clocks and unit-ADC clocks |
| Fig. 4.5. Circuit detail of an interleaving channel |
| Fig. 4.6. Small signal model when the second-rank switch is 'ON' |
| Fig. 4.7. Settling error vs. hold time with different 2<sup>nd</sup> rank buffer bandwidth |
| Fig. 4.8. Interleaver schematic, proposed speed-enhanced bootstrapped switch and T/H post-layout simulations |
| Fig. 4.9. Hold phase signal feedthrough impact |
| Fig. 4.10. Clock generation include high performance 25% duty cycle 8-phase T/H clock gen., skew calibration DAC, and 32-phase unit ADC clock gen |
| Fig. 4.11. Skew DAC simulation |
| Fig. 4.12. Unit-ADC and building blocks |
| Fig. 4.13. TI-ADC Chip micrograph and layout details |
| Fig. 4.14. Time-Interleaved ADC characterization and foreground auto calibration setup |
| Fig. 4.15. Aggregated DNL and INL of the Time-Interleaved ADC |
| Fig. 4.16. ADC DFT results with (a) low frequency input, (b) Nyquist input |
| Fig. 4.17. (a) ADC SNDR and SFDR vs. input frequency, (b) ADC input frequency response |
| Fig. 5.1. Multicarrier RXFE architecture |
| Fig. 5.2. Multicarrier RXFE modeling results |
| Fig. 5.3. RXFE schematic and timing diagram |
| Fig. 5.4. Inverter-based CTLE and programmable inverter schematic |
| Fig. 5.5. Double-balanced passive mixer and resettable PMOS integrator schematic |
| Fig. 5.6. Integrator output with PAM-4 signal input |
| Fig. 5.7. Architecture and timing diagram of the 4 GS/s time-interleaved ADC |
| Fig. 5.8. RXFE clock generation |
| Fig. 5.9. RXFE Chip micrograph |
| Fig. 5.10. CTLE response of each band, ADC spectrum with a low frequency input, and ENOB vs. input frequency |
| Fig. 5.11. Channel insertion loss, jitter measurement, and measured PAM4 and QAM16 constellations |
| Fig. 6.1. (a) Block and (b) timing diagrams of the 8-bit 2.5GS/s time-domain ADC |
| Fig. 6.2. (a) 4-layer 16x time interpolator (b) unit phase interpolator cell (c) Timing diagram of the 16x time interpolator |
| Fig. 6.3. A pipeline time-domain ADC with time amplifier, and timing diagram |
| Fig. 6.4. (a) circuit diagram and (b) timing diagram of the TA |
| Fig. 6.5. Proposed multi-carrier DAC-based transmitter |
| Fig. 6.6. Power spectral density of modulated signal in the proposed architecture and channel loss |
| Fig. 6.7. (a) Digital polar output DAC. 16-point constellations: (b) QAM-16 and (c) APSK-4+12 |
| Fig. 6.8. System-level model for multicarrier TX and RX |
| Fig. 6.9. Digital polar output DAC schematics: (a) phase mux and (b) segmented output driver. Simulated APSK-4+12 constellations with (c) 12.8GHz and (d) 25.6GHz carriers |

# LIST OF TABLES

| Table |
|-------|
| Table 3.1 Unit-ADC Performance Summary |
| Table 4.5 Time-Interleaved ADC Performance Summary |
| Table 5.1 RXFE Performance Summary |
| Table 6.1 Summary of System-Level Design Parameter for 64Gb/s DAC-based Tx |

#### 1. INTRODUCTION

The explosion in cloud computing applications, IoT, and wireless data usage has created dramatic growth in data center traffic. This requires the serial links that carry communication between ICs in these systems to operate at higher speeds. To alleviate the significant frequency-dependent attenuation encountered on legacy electrical channels, PAM-4 signaling is a common choice of modulation format because the PAM-4 Nyquist frequency is just half that of PAM-2 at the same data rate. Many designs [1-5] employ ADC-based receiver architectures for PAM-4 links to take advantage of the process scaling and the process, voltage, and temperature (PVT) robustness that CMOS offers for the DSP. The more powerful equalization techniques that lend themselves to digital implementation extend the amount of insertion loss that the receiver can handle in comparison to mixed-signal implementations. Despite these advantages, ADC-based serial links tend to consume considerable power due to the conventional SAR-based time-interleaved ADC and a DSP that employs a large number of FFE and DFE taps.
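The relationship between modulation order and Nyquist frequency mentioned above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the 100 Gb/s lane rate are assumptions chosen for the example, not values taken from a specific design in this dissertation.

```python
import math


def nyquist_frequency_hz(bit_rate_bps: float, levels: int) -> float:
    """Nyquist frequency (half the symbol rate) of an M-level PAM signal."""
    bits_per_symbol = math.log2(levels)          # PAM-2: 1 bit, PAM-4: 2 bits
    symbol_rate_baud = bit_rate_bps / bits_per_symbol
    return symbol_rate_baud / 2.0


# Assumed example lane rate (hypothetical, for illustration).
bit_rate = 100e9

pam2_nyquist = nyquist_frequency_hz(bit_rate, levels=2)
pam4_nyquist = nyquist_frequency_hz(bit_rate, levels=4)

print(f"PAM-2 Nyquist: {pam2_nyquist / 1e9:.1f} GHz")
print(f"PAM-4 Nyquist: {pam4_nyquist / 1e9:.1f} GHz")
```

At the same bit rate, PAM-4 carries two bits per symbol, so its symbol rate and Nyquist frequency are half those of PAM-2, which is why the channel attenuation it must tolerate is evaluated at a lower frequency.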

The converters utilized in the ADC-based front-end employ a large number of time-interleaved unit-ADC channels to achieve the required high effective sampling rates. Sampling in these ADCs is typically performed in several ranks with multiple clock phases per rank, and generating these clocks with the required accuracy and jitter performance is a major challenge. Another major issue is maintaining the required signal bandwidth as the interleaving factor is increased. This motivates the development of high-speed low-power unit ADCs that can reduce the interleaving factor for a given effective sampling rate, resulting in smaller area and an overall simpler design.
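To make the trade-off between unit-ADC speed and interleaving factor concrete, the following minimal sketch computes how many unit-ADC channels are needed for a target aggregate sampling rate. The function name and the unit-ADC rates in the loop are hypothetical; the 38 GS/s target is used only as a representative aggregate rate.

```python
import math


def interleaving_factor(total_rate_sps: float, unit_rate_sps: float) -> int:
    """Smallest number of unit ADCs whose combined rate meets the target."""
    return math.ceil(total_rate_sps / unit_rate_sps)


# Assumed aggregate sampling rate for the example.
target_rate = 38e9

# Hypothetical unit-ADC conversion rates to compare.
for unit_rate in (0.5e9, 1.0e9, 1.5e9):
    n = interleaving_factor(target_rate, unit_rate)
    print(f"unit ADC at {unit_rate / 1e9:.1f} GS/s -> {n} channels")
```

A faster unit ADC lowers the channel count, which in turn reduces the number of clock phases to generate and the loading on the sampling front-end.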

In ADC-based links, most front-end time-interleaved ADCs were not designed to achieve an analog 3dB bandwidth up to Nyquist [12]. In previous work, the analog bandwidth was considered less relevant than the SNDR at high frequencies, and an FIR filter was used to equalize the amplitude degradation. However, the extra FIR taps for ADC bandwidth equalization require more DSP power, which motivates innovations in low-power, high-bandwidth ADC interleaver design.

Moreover, as wireline communication data rates climb over 100Gb/s, the bit period shrinks to tens of picoseconds and the performance of the high-speed time-interleaved ADC is limited by the jitter barrier. This necessitates clock generation and distribution circuitry that achieves rms jitter of a few hundred femtoseconds. Although there is some research on baseband PAM-4 ADC-based receivers operating above 100Gb/s, their jitter tolerance is limited to around one hundred femtoseconds, so power-hungry low-jitter clock circuitry is necessary. This motivates innovations in ADC-based receiver architecture for jitter robustness.

In order to address the previously mentioned challenges in ADC-based receiver design, this dissertation is organized as follows:

Chapter 2 introduces the background material on high-speed wireline link techniques, including the ADC-based receiver architecture, the high-speed time-interleaved ADC architecture, and the challenges of 100+GS/s receiver design. It also presents the multi-carrier technique and a frequency-domain ADC-based receiver architecture, and briefly describes the digital equalization techniques for the frequency-domain ADC-based receiver.

Chapter 3 presents a single-channel 1.5GS/s 8-bit pipelined-SAR ADC that utilizes a novel output level shifting (OLS) settling technique to enable low-voltage operation of the dynamic residue amplifier with low hardware overhead. A detailed discussion is given of the proposed OLS settling technique, which allows for an inter-stage gain of ~4 with a settling time that is only 28% of that of a conventional CML amplifier. Implementation details of the asynchronous ADC architecture and key circuit blocks, as well as experimental results, are covered.

Chapter 4 presents a 7-bit, 38 GS/s, 32-way time-interleaved ADC, which utilizes a high-bandwidth 8-way interleaver architecture based on a proposed speed-enhanced bootstrapped switch that offers higher operating speed and better ENOB with high-frequency inputs. Design details of the 38GS/s ADC and the speed-enhanced bootstrapped switch are covered, and measurement results for both ADC SNDR and bandwidth are shown to verify the effectiveness of the proposed time-interleaved ADC architecture.

Chapter 5 presents a novel jitter-robust 40Gb/s wireline ADC-based receiver front-end (RXFE) architecture that supports multicarrier signaling to provide a ~3X relaxation in clock jitter requirements. The multicarrier signal is formed by three bands that together support a total 40Gb/s data rate: baseband (BB) PAM4 operating at 4GS/s, and mid-band (MB) and high-band (HB) signals that both carry 4GS/s QAM16 on 4GHz and 8GHz orthogonal carriers, respectively. System-level simulation, the receiver architecture, the analog front-end circuit, clock generation with jitter injection, and the time-interleaved ADC design are presented along with experimental results.

Chapter 6 summarizes the three works and concludes the dissertation. In addition, this chapter recommends a time-domain ADC architecture to further improve high-speed ADC design. It also presents a frequency-domain multi-carrier TX architecture that can work with the proposed multi-carrier RX front-end.

## 2. BACKGROUND<sup>\*</sup>

This chapter briefly introduces the background of high-speed wireline link techniques, including the ADC-based receiver architecture, the high-speed time-interleaved ADC architecture, and the challenges of 100+GS/s receiver design. Then the multi-carrier technique and a frequency-domain ADC-based receiver architecture are presented in detail, focusing mainly on the jitter tolerance advantage compared with the traditional baseband ADC-based receiver. Finally, the digital equalization techniques for the frequency-domain ADC-based receiver are briefly described.

## 2.1. High-speed wireline link

The bandwidth of a wireline link is limited by the high-frequency loss of electrical traces, reflections from impedance discontinuities, and crosstalk between adjacent channels. Fig. 2.1 shows a typical backplane example, which contains the IC package, connectors, vias, and the backplane traces. All of these components introduce dispersion and reflections, which cause symbols transmitted in different time intervals to interfere with each other. This is known as intersymbol interference (ISI), and it makes transmitting high data rates over a bandwidth-limited channel a significant challenge.

<sup>\*</sup> Part of this chapter is reprinted with permission from "S. Palermo, S. Hoyos, S. Cai, S. Kiran and Y. Zhu, "Analog-to-Digital Converter-Based Serial Links: An Overview," in IEEE Solid-State Circuits Magazine, vol. 10, no. 3, pp. 35-47, Summer 2018, doi: 10.1109/MSSC.2018.2844603."



Figure. 2.1: Electrical Backplane channel cross section.



Figure. 2.2: Frequency response and pulse response of three channels.

Fig. 2.2 shows how these frequency-dependent loss terms result in low-pass channels where the attenuation increases with distance, using three example channels with different profiles. The high-frequency content of pulses sent across these channels is filtered, resulting in an attenuated received pulse with energy that has been dispersed over several bit periods. When transmitting data across the channel, energy from individual bits will now interfere with adjacent bits and make them more difficult to detect. The ISI increases with channel loss and can completely close the received data eye



Figure. 2.3: Eye diagrams after channels without equalization.

diagram, as shown in Fig. 2.3. While the eye is fairly open for the short desktop channel, and a slicer (comparator) with threshold level at zero can detect the received '0' and '1' signals reliably, the eye is completely closed for longer backplane (BP) channels, which causes errors in the detected signal.

## 2.1.1. Baseband ADC-based receiver architecture

As previously introduced, ISI is the main obstacle to reliable symbol detection at the receiver in a wireline communication system. In order to extend the usable bandwidth of a given channel, signal equalization techniques are employed to cancel the ISI. The equalization can be implemented on both the transmitter side and the receiver side. Depending on system data rate requirements relative to channel bandwidth and the severity of potential noise sources, different combinations of transmitter and/or receiver equalization are employed.

Transmit equalization, implemented with a finite impulse response (FIR) filter, is the most common technique used in high-speed links. This TX "pre-emphasis" (or "de-emphasis") filter attempts to invert the channel distortion that a data bit experiences by pre-distorting or shaping the pulse over several bit times. The main advantage of implementing the equalization at the transmitter is that it is generally easier to build high-speed digital-to-analog converters than receive-side analog-to-digital converters. However, because the transmitter is limited in the amount of peak power that it can send across the channel due to driver voltage headroom constraints, the net result is that the low-frequency signal content is attenuated down to the high-frequency level.

Receiver-side equalization can be classified into mixed-signal architectures and ADC-based architectures. Both mixed-signal receivers and ADC-based receivers employ equalization that can be broadly divided into linear equalization and non-linear equalization. Examples of linear equalizers are continuous-time linear equalizers (CTLE) and discrete-time finite impulse response (FIR) equalizers. Examples of non-linear equalizers are decision feedback equalizers (DFE) and maximum-likelihood sequence estimation (MLSE). While mixed-signal receivers mainly employ CTLE and DFE (both FIR and IIR feedback), ADC-based receivers generally employ some analog equalization before the ADC in the form of a CTLE, followed by a powerful linear feedforward equalizer (FFE) and DFE in the digital domain.

As the ever-increasing demand for I/O bandwidth pushes data rates to 56 Gb/s, with scaling beyond 100 Gb/s expected in the future, the large amount of frequency-dependent loss present in conventional electrical channels makes the use of common two-level pulse amplitude modulation (PAM-2) challenging without significant infrastructure upgrades. This motivates the use of the more spectrally efficient four-level PAM (PAM-4). While PAM-4 has a Nyquist frequency half that of PAM-2, it is more sensitive to residual ISI. Thus, mixed-signal receiver front ends often employ large tap-count feedforward equalizers (FFEs) that are difficult to robustly implement in the analog domain due to process, voltage, and temperature variations. Fig. 2.4 shows a serial link receiver front end that employs an



Figure. 2.4: ADC-based receiver architecture.



Figure. 2.5: 128 Gb/s ADC-based receiver modeling. Channel, BER vs. Jitter

analog-to-digital converter (ADC) followed by digital signal processing (DSP) to perform ISI cancellation and symbol detection in the digital domain. This ADC-based receiver is well suited for more spectrally efficient modulation schemes, such as PAM-4, and benefits from the improved area and power offered from CMOS scaling.

Unfortunately, sampling clock jitter places fundamental performance limitations on common time-interleaved ADC architectures, necessitating clock generation and distribution circuitry that achieve rms jitter of a few hundred femtoseconds. As shown in Fig. 2.5, for 128Gb/s communication over the channel, the clock needs less than 300fs total rms jitter to achieve acceptable BER assuming forward error correction in the system.
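To give a feel for why the jitter budget is so tight, the following minimal sketch evaluates the textbook jitter-limited SNR bound for a sampled full-scale sine wave, SNR = -20 log10(2π f σ_t). This bound is a standard first-order estimate and is not the system model behind Fig. 2.5; the function name and the choice of a 32GHz input (the Nyquist frequency of 64GBaud PAM-4) are assumptions made for illustration.

```python
import math

def jitter_limited_snr_db(f_in_hz: float, sigma_t_s: float) -> float:
    """Jitter-limited SNR (dB) of a full-scale sine sampled with rms jitter sigma_t."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)

# Assumed input: Nyquist frequency of a 64 GBaud PAM-4 (128 Gb/s) signal.
f_nyquist = 32e9
for sigma_fs in (100, 300, 1000):
    snr = jitter_limited_snr_db(f_nyquist, sigma_fs * 1e-15)
    print(f"{sigma_fs:5d} fs rms -> {snr:5.1f} dB")
```

Under these assumptions, 300fs of rms jitter already caps the SNR at roughly 24dB for a Nyquist-frequency tone, before any other noise source is considered.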



Figure. 2.6: A 64-way time-interleaved ADC example

## 2.1.2. High-speed ADC architecture and calibration

For ADC designs above 20GS/s, a time-interleaved architecture is commonly employed. It combines N unit ADCs, each sampling at fs, that operate in a time-interleaved manner to achieve an N x fs conversion rate. Fig. 2.6 shows a 64-way time-interleaved ADC example [12]. Most time-interleaved ADC designs are intended to have good SNDR at Nyquist rather than a 3dB input bandwidth up to Nyquist, since an FIR filter can equalize the high-frequency amplitude degradation. The ADC input connects to 16 parallel sampling switches that sample the input onto sampling capacitors. The sampled voltages are then buffered and each forwarded to four sub-ADCs. SAR ADCs are chosen to convert the samples, as their mostly digital architecture is highly suitable for FinFET technologies at low supply voltages, and SAR ADCs have been proven to be power efficient and small in area while operating at more than 1 GS/s [16]. The digital output of the SAR ADCs is connected to a large shift-register-based memory that stores 16384 samples.



Figure. 2.7: Impact of offset (a) and gain (b) error on ADC output spectrum

While time interleaving enables extremely high sample rate converters, conversion errors occur due to mismatches between the parallel sub-ADCs. These errors appear as spurious peaks in the ADC output spectrum and can significantly degrade SNDR. The magnitude and position of these tones depend on the type of mismatch, which can be classified as offset, gain, or timing skew errors. Calibration techniques, both in the analog and digital domains, are employed to correct for these time-interleaving errors and achieve acceptable performance.
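The dependence of spur position on mismatch type can be illustrated for the simplest case, offset mismatch, where the per-channel offsets form a pattern that repeats every N samples and therefore produces tones at multiples of fs/N regardless of the input frequency. The sketch below is a toy model and not the simulation behind Fig. 2.7: the offset values, record length, and input bin are hypothetical, and the offsets are chosen to be zero-mean so that no DC term appears.

```python
import cmath
import math

def dft_mag(x):
    """One-sided DFT magnitude, normalized by the record length."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]

N_CH, N_SAMP, FIN_BIN = 8, 256, 19   # 8-way TI-ADC, coherent input tone in bin 19
# Hypothetical zero-mean per-channel offsets in volts.
offsets = [0.02, -0.03, 0.01, 0.04, -0.01, -0.02, 0.03, -0.04]

ideal = [0.5 * math.sin(2 * math.pi * FIN_BIN * t / N_SAMP) for t in range(N_SAMP)]
# Sample t is converted by channel t mod N_CH, which adds that channel's offset.
with_offset = [v + offsets[t % N_CH] for t, v in enumerate(ideal)]

mag = dft_mag(with_offset)
spur_bins = [k for k, m in enumerate(mag) if m > 1e-3 and k != FIN_BIN]
print("spur bins:", spur_bins)
print("multiples of fs/N:", [k * N_SAMP // N_CH for k in range(1, N_CH // 2 + 1)])
```

In this model the detected spur bins coincide with the multiples of fs/N. Gain and skew mismatches modulate the signal itself, so their tones instead appear around the input frequency shifted by multiples of fs/N.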

Channel offset errors occur due to device mismatches in the time-interleaved T/Hs, reference generation circuitry, and comparators. Fig. 2.7(a) shows the impact of offset error on the ADC output spectrum for an 8-way TI-ADC model with a $1V_{ppd}$ FSR and an offset standard deviation of 40mV. Analog-domain channel offset calibration can be costly, whereas the offset of each channel can easily be subtracted in the digital domain, which is the more common approach nowadays. The mismatches can also result in gain errors between channels. Fig. 2.7(b) shows the impact of gain errors on the ADC output spectrum for an 8-way TI-ADC model with a $1V_{ppd}$ FSR and a gain standard deviation of 5%. Like the offset error, the gain error is a signal-independent static error, so it can easily be calibrated in the digital domain by multiplying the output of each channel by a correction factor.

Figure. 2.8: Impact of skew error on ADC output spectrum

Skew errors result from device mismatches and layout asymmetries in the multi-phase clock generation and distribution to the input track-and-holds. Fig. 2.8 shows the impact of skew errors on the ADC output spectrum for an 8-way TI-ADC model with a $1V_{ppd}$ FSR and a skew standard deviation of 2ps. These errors are most often calibrated with per-phase digitally-adjustable delay cells in the clock distribution buffers [13, 14, 15, 19] or phase interpolators with independent phase offset codes. Skew errors will cause each sub-ADC to generate a pulse response with a slightly different ISI characteristic. Thus, an efficient approach to detect skew errors is to monitor the differences in the converged tap coefficients of a per-slice adaptive equalizer. The delay cell or phase interpolator control can be adjusted to minimize the coefficient differences and calibrate the skew to within the resolution of the correction circuitry. Independent equalizer tap control allows for further fine compensation of residual skew and bandwidth errors.

## 2.2. Multi-carrier techniques

Multicarrier communication systems [30] offer several key advantages over single carrier systems including reduced inter-symbol interference because of the narrowband nature of each carrier and much simpler frequency domain equalization consisting of just complex value scaling of the carrier's amplitude and phase estimates. One of the most successful realizations of multicarrier communications systems is orthogonal frequency division multiplexing (OFDM) where the transmitter uses an inverse DFT to transform the frequency block of information bits to the time domain for transmission, and the receiver uses a DFT to estimate the carrier's information. This digital implementation has taken advantage of technology scaling and the advances in low power DAC and ADC needed at the interfaces.
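The claim that frequency-domain equalization reduces to one complex scaling per carrier can be checked with a small noise-free OFDM sketch. Everything here is an illustrative assumption rather than a system from this work: the 8-carrier block, the QPSK symbols, the three-tap channel, and the use of a cyclic prefix to turn the channel's linear convolution into a circular one.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(N^2) DFT; the inverse transform carries the 1/N normalization."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

N = 8                          # number of carriers (hypothetical)
channel = [1.0, 0.5, 0.2]      # hypothetical ISI channel impulse response
CP = len(channel) - 1          # cyclic prefix long enough to absorb the ISI

# One QPSK symbol per carrier.
tx_symbols = [complex(a, b) for a, b in
              [(1, 1), (-1, 1), (1, -1), (-1, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]]

# Transmitter: inverse DFT, then prepend the cyclic prefix.
time_block = dft(tx_symbols, inverse=True)
tx = time_block[-CP:] + time_block

# Channel: linear convolution, noise-free for clarity.
rx = [sum(channel[i] * tx[n - i] for i in range(len(channel)) if n - i >= 0)
      for n in range(len(tx))]

# Receiver: drop the prefix, take the DFT, then apply one complex scaling per carrier.
rx_freq = dft(rx[CP:CP + N])
H = dft(channel + [0.0] * (N - len(channel)))
equalized = [y / h for y, h in zip(rx_freq, H)]

max_error = max(abs(e - s) for e, s in zip(equalized, tx_symbols))
print(f"max symbol error after one-tap equalization: {max_error:.2e}")
```

In this idealized setting the transmitted symbols are recovered to within floating-point precision, even though the time-domain signal experiences ISI over three samples.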

Systems ranging from legacy DSL operating at a few Mb/s, to WiFi operating at 56Mb/s, and most recently serial links operating at 56Gb/s [8] have relied on this technology. While the wireless standards require up-conversion of a complex baseband signal to the allocated frequency band with in-phase and quadrature components simultaneously transmitted, the wireline counterparts do not require up-conversion and therefore transmit a real-valued baseband signal onto which the constellation is mapped. This baseband realization is called discrete multi-tone (DMT). The sampling of multicarrier signals can be accomplished in the time domain at the Nyquist rate for a fully digital implementation, or at the symbol rate in a fully analog implementation if a correlator bank is used. More interestingly, it can be implemented in the frequency domain if filter banks are used. As illustrated in Fig. 2.9 and investigated in [30], sampling multicarrier signals of total duration T in the



Figure. 2.9: Characteristics of A/D conversion of multicarrier signals in the frequency domain. (a) Effect of time segmentation in the bandwidth of the multicarrier signal and in the samples frequency spacing  $\Delta Fc$  (b) Number of coefficients N versus the symbol-period to segmentation-time ratio (M = T/Tc).

frequency domain introduces a fundamental constraint on the total number of frequency samples (N) versus the number of processing windows (M) and the number of carriers (S), given by N=S/M. This constraint produces carrier expansion when the window processing time (Tc) shortens, requiring an adaptive digital baseband that adjusts for changing conditions in the design.

## 2.2.1. Frequency-domain ADC-based receiver

A frequency-domain ADC-based receiver [31], shown in Fig. 2.10, can break the jitter barrier that exists in the traditional ADC-based receiver. The input CTLE drives the front-end channels, each of which has a mixer for down-conversion, a Bessel low-pass filter, and an ADC for sampling and digitization. These digitized samples are then processed by the FIR filters in the DSP, and their outputs are combined to either perform symbol estimation in PAM-4 baseband mode or to perform both inter-channel interference (ICI) and ISI cancellation in multi-tone mode. This architecture provides several benefits. First,

Reconfigurable Frequency Domain ADC-Based Receiver



Figure. 2.10: 128 Gb/s ADC-based receiver modeling. Channel, BER vs. Jitter

the mixers perform self-equalization [32] and provide some channel loss compensation, allowing for a reduction in digital equalization complexity. Second, high-frequency noise introduced by the mixers and CTLE is attenuated by the channel filters. Finally, the inclusion of digital receive-side ICI cancellation filters in the 128Gb/s system allows for



Figure. 2.11: (a) 128Gb/s system pulse responses from a time-interleaved 64GS/s PAM4 system and the proposed frequency-domain receiver in multi-tone mode with the responses from the baseband and I-phase channels shown. (b) BER vs RJ for a time-interleaved receiver and the frequency-domain receiver operating in baseband PAM4 and multi-tone mode.

a 50% improvement in relative channel spacing when compared against a previous 10Gb/s mixed-signal implementation [32].

The receiver channels are configured differently in order to support conventional baseband (PAM2-8) or multi-tone signaling. Assuming a 128Gb/s data rate achieved with baseband 64GS/s PAM4, the receiver is configured as having 3 effective channels, with each pair of channels utilizing the same mixer LO frequency. System simulations indicate that a near-optimum partitioning has the first effective channel (1/2) processing the low-frequency portion of the signal with a dummy mixer, the second channel (3/4) utilizing a single-phase 1/3-baudrate LO (21.33GHz), and the third channel (5/6) utilizing a single-phase 1/2-baudrate LO (32GHz). The pair of ADCs in each effective channel operates as a 21.33GS/s time-interleaved converter.

An intuitive visualization of the relative jitter robustness of the proposed architecture is given by the Fig. 2.11(a) pulse responses from a conventional time-interleaved system and the proposed frequency-domain architecture operating in multi-tone mode over a channel with over 30dB loss at 32GHz. Even with an optimized CTLE, the narrow 64GS/s PAM4 pulse response has significant ISI and is more sensitive to timing jitter than the wider 12.8GS/s pulse responses that also experience mixer self-equalization. This is reflected in the BER versus random jitter (RJ) plot of Fig. 2.11(b). Assuming an equal amount of RJ on both the ADC samplers and the mixer LO clocks, the multi-tone architecture is able to achieve a BER of $10^{-12}$ with an RJ near 2ps rms. This is over 5X the jitter tolerated by the time-interleaved system. In a more conventional PAM4 mode, the frequency-domain receiver still provides a 60% improvement in jitter tolerance.

## **2.3.** Digital equalization techniques for the frequency-domain ADC-based receiver

Operating in a frequency-channelized manner offers the key advantage of processing a longer symbol time, which translates into reduced clock rates for the digital filters. For example, a time-interleaved receiver implementation requires 16 FFE taps and 2 DFE taps effectively operating at the full baud rate in order to support 128Gb/s operation over the channel in Fig. 2.10 with baseband PAM4. Reception of the same baseband PAM4 signal with the frequency-domain receiver offers the advantages of mixer self-equalization and a longer effective symbol time out of the ADC front-end. This allows removal of the DFE taps and the utilization of a digital equalizer with only 16 FFE taps that effectively operate at one-third of the baud rate.





Further digital complexity savings are achieved with operation in the 5-channel multi-tone mode, where ISI cancellation requires only 3 FIR taps that effectively operate at one-fifth of a comparable PAM4 baud rate. However, in the absence of sharp roll-off filters on the transmitter side, signal energy from one channel will spill over into the adjacent channel and cause inter-channel interference in the multi-tone implementation. While ICI can be reduced to an insignificant level by increasing the spacing between the data channels [32], this will cause the highest-frequency QAM channel to experience significant attenuation when it passes through the communication channel. As this interference is well modelled as linear time-invariant, FIR filters are utilized to remove ICI in a manner similar to the ISI cancellation filters.

Fig. 2.12 shows a 2-channel example with the desired channel pulse responses, PR11 and PR22, and the ICI pulse responses, PR12 and PR21. The received signal from the aggressor channel is passed through an ICI FIR filter to produce an interference estimate that is subtracted from the victim channel. In the proposed multi-tone implementation, a total of eight 4-tap ICI FIR filters are necessary to remove adjacent channel interference. Compared against a time-interleaved PAM4 implementation at the same effective data rate, and considering both the 3-tap ISI and 4-tap ICI FIR filters, the multitone digital complexity is only 59% of the TI system.
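One way to arrive at a figure close to the quoted 59% is to count complexity as tap count multiplied by relative clock rate. The sketch below does this under assumptions that are not stated explicitly in the text: each of the 5 multi-tone channels uses a 3-tap ISI filter, all multi-tone filters run at one-fifth of the PAM4 baud rate, and only the 16 FFE taps of the time-interleaved system are counted (its 2 DFE taps are left out). The helper name is hypothetical.

```python
# Tap counts quoted in this section; the accounting rule itself is an assumption.
ti_ffe_taps = 16        # time-interleaved PAM4 FFE taps at the full baud rate
mt_channels = 5         # multi-tone channels
mt_isi_taps = 3         # ISI FIR taps, assumed per channel
mt_ici_filters = 8      # adjacent-channel ICI FIR filters
mt_ici_taps = 4         # taps per ICI filter
mt_rate = 1 / 5         # multi-tone filter rate relative to the PAM4 baud rate

def tap_rate_product(taps: int, relative_rate: float) -> float:
    """Complexity proxy: number of taps times relative clock rate."""
    return taps * relative_rate

ti_cost = tap_rate_product(ti_ffe_taps, 1.0)
mt_taps = mt_channels * mt_isi_taps + mt_ici_filters * mt_ici_taps
mt_cost = tap_rate_product(mt_taps, mt_rate)
ratio = mt_cost / ti_cost
print(f"multi-tone / TI complexity: {ratio:.0%}")
```

With this accounting the 47 multi-tone taps at one-fifth rate are equivalent to 9.4 full-rate taps, or about 59% of the 16-tap reference. Including the DFE taps in the reference would give a lower ratio.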

# 3. A 1.5GS/S 8B PIPELINED-SAR ADC WITH OUTPUT LEVEL SHIFTING SETTLING TECHNIQUE IN 14NM CMOS<sup>\*</sup>

This chapter presents a single-channel 1.5GS/s 8-bit pipelined-SAR ADC that utilizes a novel output level shifting (OLS) settling technique to reduce the power and enable low-voltage operation of the dynamic residue amplifier. The ADC consists of a 4-bit first stage and a 5-bit second stage, with 1-bit redundancy to relax the offset, gain, and settling requirements of the first stage. Employing the OLS technique allows for an inter-stage gain of ~4 from the dynamic residue amplifier with a settling time that is only 28% of that of a conventional CML amplifier. The ADC's conversion speed is further improved with the use of parallel comparators in the two asynchronous stages. Fabricated in a 14nm FinFET technology, the ADC occupies 0.0013mm<sup>2</sup> core area and operates with a 0.8V supply. A 6.6-bit ENOB is achieved at Nyquist while consuming 2.4mW, resulting in an FOM of 16.7fJ/conv.-step.
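The reported figure of merit has the units of the Walden FOM, P / (2^ENOB · fs). Assuming that this is the definition used, the short sketch below evaluates it from the rounded numbers quoted above; the function name is hypothetical.

```python
def walden_fom_fj(power_w: float, enob_bits: float, fs_hz: float) -> float:
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2.0 ** enob_bits * fs_hz) * 1e15

fom = walden_fom_fj(power_w=2.4e-3, enob_bits=6.6, fs_hz=1.5e9)
print(f"FOM = {fom:.1f} fJ/conv.-step")
```

The rounded inputs give a value near 16.5fJ/conv.-step. The small gap to the quoted 16.7fJ/conv.-step is consistent with the measured ENOB being slightly below the rounded 6.6 bits.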

## **3.1. Introduction**

As wireline communication data rates climb above 100Gb/s, an increasing number of receiver front-ends utilize high-speed analog-to-digital converters (ADCs) that allow for subsequent powerful digital equalization and symbol detection techniques [6]. These converters employ a large number of time-interleaved unit-ADC channels to

<sup>\*</sup>Part of this chapter is reprinted with permission from "Y. Zhu et al., "A 1.5GS/s 8b Pipelined-SAR ADC with Output Level Shifting Settling Technique in 14nm CMOS," 2020 IEEE Custom Integrated Circuits Conference (CICC), 2020, pp. 1-4, doi: 10.1109/CICC48029.2020.9075942."

achieve the required high effective sampling rates. Sampling in these ADCs is typically performed in several ranks with multiple clock phases per rank and generating these clocks with the required accuracy and jitter performance is a major challenge. Another major issue is maintaining the required signal bandwidth as the interleaving factor is increased. This motivates the development of high-speed low-power unit ADCs that can reduce the interleaving factor for a given effective sampling rate, resulting in smaller area and an overall simpler design.

Successive-approximation-register (SAR) ADC architectures are popular due to their low comparator count and simple digital logic content, making them suitable for compact and power-efficient mid-resolution time-interleaved ADCs [33]. However, the conversion speed is limited in the most common implementation of the successive approximation algorithm that performs sequential single-bit conversion cycles. As shown in Fig. 3.1, introducing pipelining in the SAR ADC provides improved speed by decreasing the number of conversion cycles per input sampling event. A critical block in this architecture is the amplifier that transfers the residue signal between the two pipeline stages. In high-speed converters, conventional opamp-based amplifiers are not suitable due to the excessive static power required to meet settling time requirements. An alternative approach is to use a dynamic residue amplifier that is only activated once over the entire conversion process [18].



Figure 3.1: Pipeline SAR ADC with a dynamic amplifier.

While dynamic residue amplifiers have the potential to save power, these topologies require a small $\tau$ to achieve fast settling times. Satisfying this while maintaining a given gain can result in large dynamic tail current values and larger input transistors that load the first pipeline stage capacitive digital-to-analog converter (CDAC). Given that the smallest CDAC that satisfies the kT/C noise requirement is desired to reduce input buffer power, this loading can cause significant reference attenuation that must be compensated with an increased-range reference buffer that is difficult to implement with low supply voltages. Another issue is kickback noise due to coupling through the large dynamic amplifier input transistors.

This chapter presents a single-channel 1.5GS/s 8-bit pipelined-SAR ADC that utilizes a novel output level shifting (OLS) settling technique to enable low-voltage operation of the dynamic residue amplifier with low hardware overhead. A detailed discussion of the proposed OLS settling technique, which allows for an inter-stage gain of ~4 with a settling time that is only 28% of that of a conventional CML amplifier, is given in Section 3.2. Section 3.3 provides an overview of the asynchronous ADC architecture and key circuit details. Measurement results from a 14nm CMOS FinFET prototype are presented in Section 3.4. Finally, Section 3.5 provides the conclusion.

## **3.2. Output Level Shifting Settling**

While the pipeline-SAR ADC architecture reduces the required settling accuracy of the residue amplifier, it is still challenging to achieve this at high speeds. Upon activation, the Fig. 3.1 conventional dynamic amplifier output will settle as

$$V_{Amp} = A_{CML} V_{in} \left( 1 - e^{-\frac{t}{\tau}} \right)$$

where $A_{CML}$ is the gain and $\tau$ is the settling time constant. The output settles to 50% of the steady-state value in a rapid 0.69 $\tau$ , but requires an additional 3.47 $\tau$ to settle to the 5-bit accuracy required in the second pipeline stage. The brute-force method of reducing this settling time is to reduce the load resistor to decrease $\tau$ , but this leads to the aforementioned issues of increased tail current values and large input transistor sizes.

Previously, an OLS technique was developed to reduce errors in feedback amplifiers that occur from finite opamp gain [22]. In that work, an initial estimate of the desired output voltage is sampled on a level shifting capacitor and then this capacitor is switched in series with the opamp output and the feedback amplifier output to improve settling accuracy. This work modifies this technique to dramatically improve the settling time of the open-loop dynamic residue amplifier by utilizing the second pipeline stage CDAC2 as



Figure 3.2: Output level shifting technique.

the level shifting capacitor. Fig. 3.2 gives an overview of the proposed OLS settling technique. When  $\Phi_{Amp}$  is high and the amplifier is activated, the differential output voltage is stored on both sides of CDAC2 by connecting the nominal amplifier output to the top plate and the opposite output to the bottom plate. This  $\Phi_{Amp}$  duration should nominally match the rapid 50% settling time. After this,  $\Phi_{OLS}$  is enabled to switch the CDAC2 bottom plate to the common mode. Charge conservation during this phase produces a rapid doubling of the amplifier output signal. Thus, the amplifier output voltage only needs to initially settle to 50% of the steady-state value and the long second half settling is avoided. The significant speed-up offered by the OLS technique is achieved with the low hardware overhead of only one extra bottom-plate CDAC2 switch.
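The doubling produced by charge conservation can be illustrated with an idealized single-capacitor model. The sketch below is a simplification and not the actual circuit: it assumes ideal switches, a top plate that is left floating during the level-shifting phase, and no parasitic capacitance on the top plate. The function name and the example voltages are hypothetical.

```python
def ols_output(v_cm: float, v_diff: float):
    """Return the top-plate signal (relative to common mode) before and after level shifting."""
    # Amplification phase: top plate follows one output, bottom plate the opposite output.
    v_top = v_cm + v_diff / 2
    v_bottom = v_cm - v_diff / 2
    v_cap = v_top - v_bottom          # full differential voltage stored on the capacitor

    # Level-shifting phase: bottom plate is switched to the common mode.
    # With a floating top plate the capacitor charge, and hence v_cap, is preserved.
    v_top_after = v_cm + v_cap
    return v_top - v_cm, v_top_after - v_cm

before, after = ols_output(v_cm=0.4, v_diff=0.05)
print(f"before: {before * 1e3:.1f} mV, after: {after * 1e3:.1f} mV")
```

In this ideal model the single-ended signal on the top plate doubles when the bottom plate is switched. Any parasitic capacitance on the top plate would reduce the factor below two.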

A simplified schematic of the OLS dynamic amplifier, which offers several improvements relative to a conventional CML dynamic amplifier, is shown in Fig. 3.3. Instead of utilizing a simple resistive-loaded differential pair, this inverter-based amplifier structure provides both PMOS and NMOS transconductance to provide a higher gain of

$$A = (g_{mp} + g_{mn})(r_{on} \parallel r_{op}) \approx 2A_{CML}$$



Figure 3.3: Simplified schematic of the OLS dynamic amplifier.

at lower supply voltages. While the amplifier has high impedance outputs, a stable output common mode is achieved by resetting the CDAC2 top plate to the common mode prior to activation. One downside of this OLS amplifier is that the equivalent capacitive loading is 4X larger than the conventional CML amplifier due to both sides of CDAC2 being connected to each amplifier output and each capacitor experiencing Miller multiplication. Considering this, the time for the OLS amplifier to achieve 50% settling relative to the original CML amplifier is

$$A V_{in}\left(1-e^{-\frac{t}{4\tau}}\right)=0.5A_{CML}V_{in} \quad\Rightarrow\quad t = 1.15\tau .$$

Due to the increased amplifier gain, this is only 28% of the  $4.17\tau$  required by the conventional CML amplifier at 5-bit resolution. This also results in lower average power

due to the dynamic amplifier's reduced activation time. Moreover, the required amplifier's linear output swing range is decreased by a factor of two.
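The settling-time comparison can be reproduced numerically from the single-pole expressions above. In the sketch below, the CML target is taken to be an error of 2^-6 of the final value (half an LSB at 5 bits), which is an assumption that lands close to the quoted total; the OLS target follows from A ≈ 2A_CML and the 4X larger time constant. The function name is hypothetical.

```python
import math

def settle_time(fraction: float, tau_multiple: float = 1.0) -> float:
    """Time, in units of tau, for 1 - exp(-t / (k * tau)) to reach `fraction`."""
    return -tau_multiple * math.log(1.0 - fraction)

# Conventional CML amplifier: settle to within 2**-6 of the final value (assumed criterion).
t_cml = settle_time(1.0 - 2.0 ** -6)

# OLS amplifier: gain 2*A_CML and 4x time constant, so 0.5*A_CML is 25% of its final value.
t_ols = settle_time(0.5 / 2.0, tau_multiple=4.0)

print(f"CML: {t_cml:.2f} tau, OLS: {t_ols:.2f} tau, ratio: {t_ols / t_cml:.0%}")
```

Under these assumptions the CML amplifier needs about 4.16τ, the OLS amplifier about 1.15τ, and the ratio rounds to 28%.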

One potential issue with the proposed amplifier is matching the duration of  $\Phi_{Amp}$  with the 50% settling point. However, high precision is not necessary, as any inaccuracy simply results in a modified gain value that is easily compensated with adjustment of the second stage reference voltages.

The jitter of the $\Phi_{Amp}$ pulse is another issue that needs to be considered. As shown in Fig. 3.4(a), the $\Phi_{Amp}$ jitter causes timing variance of the level shifting point, and this time-domain error in turn generates voltage noise at the output. Intuitively, a small settling time constant $\tau$ gives a very fast transition, which makes the output voltage more sensitive to the timing variance, and the rms value of the jitter determines the magnitude of the timing error. Thus, the noise power is a function of the settling time constant $\tau$ and the rms jitter. Quantitatively, the jitter-induced voltage noise power can be calculated through the following equations

$$\Delta V \cong \frac{dV}{dt} \cdot \Delta t$$

$$E\{\Delta V^2\} = E\left\{\left(\frac{dV}{dt}\right)^2 \cdot \Delta t^2\right\} = E\left\{\left(\frac{dV}{dt}\right)^2\right\} \cdot E\{\Delta t^2\}$$

$$E\{\Delta V^2\} \cong \left(\frac{0.75}{2\tau}\right)^2 \cdot \sigma_t^2$$



Figure 3.4: Clock jitter impact (a) transient waveforms (b) jitter induced noise power.

Based on the previous equation, Fig. 3.4(b) plots the noise power versus jitter with



Figure 3.5: Pipeline SAR ADC with output level shifting residue amplifier.

different settling time constants. The jitter specifications on the $\Phi_{Amp}$ signal are also not prohibitive, as a relatively large 3.8ps rms jitter can be tolerated to achieve 4-bit accuracy with an 80ps settling time constant.
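The quoted operating point can be checked against the noise expression above. The sketch below evaluates the rms jitter-induced error for an 80ps time constant and 3.8ps rms jitter, normalized to the steady-state CML output. Comparing it with the quantization noise of an ideal 4-bit converter, LSB/√12 with unit full scale, is an assumed interpretation of "4-bit accuracy" and may not be the exact criterion used in this work.

```python
import math

def jitter_noise_rms(tau_s: float, sigma_t_s: float) -> float:
    """rms output error from the slope 0.75 / (2 * tau), normalized to A_CML * V_in."""
    return 0.75 / (2.0 * tau_s) * sigma_t_s

sigma_v = jitter_noise_rms(tau_s=80e-12, sigma_t_s=3.8e-12)

# Assumed criterion: quantization noise of an ideal 4-bit converter with unit full scale.
q_noise_4bit = (1.0 / 2 ** 4) / math.sqrt(12.0)

print(f"jitter noise: {sigma_v:.4f}, 4-bit quantization noise: {q_noise_4bit:.4f}")
```

Under this reading, the jitter-induced error of about 0.0178 sits just below the 4-bit quantization noise of about 0.0180, which matches the stated tolerance.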

## **3.3. ADC Architecture and Building Blocks**

Fig. 3.5 shows the 8-bit pipelined-SAR ADC with the first pipeline stage converting 4-bits and the second stage converting 5-bits. This 1-bit redundancy between the two stages relaxes the gain, offset, and reference settling requirements of the first stage. The input signal is sampled with a boot-strapped switch that reduces the input sampling time constant and improves high-frequency linearity. kT/C noise requirements are satisfied with CDAC1 and CDAC2 set at 32fF and 16fF, respectively. Both stages employ parallel comparators that are asynchronously activated sequentially for each conversion step, eliminating the comparator reset delay and offering significant speed-up.

The ADC timing diagram is shown in Fig. 3.6. After input sampling, the first stage converts 4 bits and holds the residue voltage for partial amplification when  $\Phi_{Amp}$  goes high.



Figure 3.6: Unit Pipeline-SAR ADC timing diagram.

 $\Phi_{Amp}$  then transitions low and the level-shifted gain is achieved when the  $\Phi_{OLS}$  pulse is activated with minor modifications in the second stage reference switch logic. The second stage then converts the final 5 bits. Both stage CDACs are reset after their conversions are complete to avoid memory effect, with these reset signals internally generated by the comparator ready signal and input clocks. Independent flipped-voltage followers serve as buffers for the two sets of CDAC reference voltages that are locally decoupled with MOS capacitors. As previously mentioned in Section II, the second-stage reference voltage values are tuned to accommodate any static deviations in the dynamic amplifier gain due to the exact  $\Phi_{Amp}$  pulse width.

#### **3.3.1. Dynamic Amplifier and Comparator**



Figure 3.7: Inverter-based dynamic amplifier schematic

A clocked inverter-based buffer that achieves a gain of ~4 serves as the residue amplifier stage, shown in detail in Fig. 3.7. In addition to the main transconductance transistors M1/2 and M7/8, M3 acts as a current source that is switched on by M4 when $\Phi_{Amp}$ is activated. The gray transistors improve the dynamic performance, with M10-13 compensating kickback noise and M9 boosting the amplifier startup. When the amplifier is disabled, transistors M5 and M6 reset V<sub>Source</sub> to VDD and short the differential output, respectively. As shown in the transient simulation results of Fig. 3.8, resetting CDAC2 allows the amplifier output to start separating from the common mode and then experience an effective doubling after the level shifting.



Figure 3.8: Inverter-based dynamic amplifier transient simulations

Dynamic two-stage comparators are used to allow for low-voltage operation in the two pipeline stages. These comparators are foreground offset-calibrated with current-mode DACs.

#### 3.3.2. Second Stage Reference Switch with Embedded OLS

Fig. 3.9 illustrates how the OLS technique is included with minor logic changes in the reference switch control to allow both the CDAC2 bottom and top plate to connect to the amplifier output during the amplification phase. As shown in detail for the negative DAC MSB switches, there is an extra right-most switch that connects the bottom plate to the positive input signal when  $\Phi_{Amp}$  is high. An AND gate then produces the  $\Phi_{OLS}$  signal



Figure 3.9: Second stage reference switch control with embedded OLS technique

to switch the bottom plate to the common mode when $\Phi_{Amp}$ goes low and the comparators' ready signals are enabled. Since the extra switch is added to the CDAC bottom plate, there is no speed penalty or reference attenuation issue.

## **3.4. Experimental Results**



Figure 3.10: Unit-ADC chip micrograph and layout details



Figure 3.11: ADC characterization set-up.

Fig. 3.10 shows the chip micrograph of the pipelined-SAR ADC, which was fabricated in a 14nm CMOS FinFET process and occupies an active area of 0.0013mm<sup>2</sup>. The ADC is powered from a 0.8V supply and has a 460mV<sub>pp,diff</sub> full-scale input range with a common mode of 500mV. Testing is performed with an initial foreground offset calibration for all comparators. Fig. 3.12 shows DFTs of the ADC output when operating at 1.5GS/s. The achieved SNDR is 43.5dB and 41.4dB for low frequency and close to Nyquist inputs, respectively, translating to 6.93 and 6.58 bits ENOB. Fig. 3.13 shows that



Figure 3.12: ADC DFT results with (a) low frequency input, (b) Nyquist input

the ADC maintains over 40dB SNDR and 45dB SFDR over frequency, and Fig. 3.14 shows that the maximum DNL and INL are +0.93/-0.84 and +0.75/-1.02 LSB, respectively. Table 3.1 summarizes the ADC performance and compares this work against previous medium-resolution ADCs.
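The ENOB values quoted with the measured SNDR follow from the standard conversion ENOB = (SNDR - 1.76 dB) / 6.02 dB. A minimal sketch (the function name is illustrative):

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB, using the ideal-quantizer relation."""
    return (sndr_db - 1.76) / 6.02


for label, sndr_db in (("low-frequency input", 43.5), ("near-Nyquist input", 41.4)):
    print(f"{label}: SNDR = {sndr_db} dB -> ENOB = {enob_from_sndr(sndr_db):.2f} bits")
```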

#### **3.5.** Conclusion

This chapter presented a single channel 8-bit pipelined-SAR ADC that utilizes a novel low-overhead OLS settling technique in the dynamic residue amplifier. A low power



Figure 3.13: DNL and INL plots



Figure 3.14: SNDR and SFDR Vs. input frequency

design is realized by combining this technique with the use of parallel comparators in the

| References                 | VLSI'18<br>[34] | CICC'19<br>[35] | ISSCC'17<br>[36] | ISSCC'17<br>[18] | This Work |
|----------------------------|-----------------|-----------------|------------------|------------------|-----------|
| Technology<br>(nm)         | 40              | 40              | 28               | 14               | 14        |
| Supply<br>(V)              | 1.2             | 1.1             | 0.9              | 0.95             | 0.8       |
| Architecture               | Two step<br>SAR | 2-3b SAR        | 1-2b SAR         | Pipe-SAR         | Pipe-SAR  |
| Channels                   | 1               | 1               | 2                | 1                | 1         |
| Resolution<br>(bits)       | 8               | 7               | 7                | 10               | 8         |
| Sampling rate<br>(GS/s)    | 1.1             | 0.9             | 2.4              | 1.5              | 1.5       |
| ENOB<br>@Nyquist           | 7.18            | 6.3             | 6.36             | 8                | 6.58      |
| Area<br>(mm <sup>2</sup> ) | 0.00165         | 0.014           | 0.0043           | 0.00158          | 0.0013    |
| Power<br>(mW)              | 4               | 2.6             | 5                | 6.9              | 2.4       |
| FoM<br>(fJ/conv.-step)     | 25              | 36.6            | 25.3             | 17.7             | 16.7      |

 Table 3.1 Unit-ADC Performance Summary.

two asynchronous pipeline stages to allow for 1.5GS/s operation with a low 0.8V supply voltage.

# 4. A 38GS/S 7-BIT TIME-INTERLEAVED PIPELINED-SAR ADC WITH SPEED-ENHANCED BOOTSTRAPPED SWITCH\*

This chapter presents a 7-bit, 38GS/s, 32-way time-interleaved ADC that utilizes a high-bandwidth 8-way interleaver architecture based on a proposed speed-enhanced bootstrapped switch, which offers a higher operating speed and better ENOB with high-frequency inputs. For the high-speed unit ADC, a pipelined structure with a 4-bit first stage and a 4-bit second stage is used for low power and high speed, and an output level shifting (OLS) settling technique allows for an inter-stage gain of ~4 at low power while requiring only 33% of the settling time of a conventional CML amplifier with exponential settling. The unit ADC's conversion speed is further improved with the use of parallel comparators in the two asynchronous stages. The ADC was fabricated in Intel 22nm FinFET technology. At 38GS/s, the 7b ADC achieves 41.9fJ/conv.-step at low frequency and 64.1fJ/conv.-step at Nyquist, and has a 20GHz 3dB bandwidth.

## 4.1. Introduction

High-speed time-interleaved ADCs are becoming more common in wireline receiver front-ends because they enable subsequent digital processing for equalization and easier support of higher-order modulation schemes [6]-[9]. Fig. 4.1 shows a common implementation of a high-speed time-interleaved ADC [10][11], which utilizes a two-rank

<sup>\*</sup> Part of this chapter is reprinted with permission from "Y. Zhu et al., " A 38GS/s 7b Time-Interleaved Pipelined-SAR ADC with Speed-Enhanced Bootstrapped Switch in 22nm FinFET" 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022"



Figure 4.1: A common structure of the high-speed time-interleaved ADC

architecture. The first rank is a high-speed track-and-hold (T/H) circuit with a lower interleaving factor, and each T/H channel is followed by multiple unit-ADCs that perform analog-to-digital conversion of the held voltage from the first rank. Given that high-speed sampling is only performed in the first rank, this architecture efficiently reduces the number of critical sampling clock phases for massively interleaved unit-ADC channels and alleviates the bandwidth mismatch between T/Hs.

However, the sampling of wideband analog signals associated with higher data rates is a big challenge for conventional bootstrapped switch (BS) T/H circuits. One reason

for this is that the low-duty-cycle sampling clocks used to avoid sampling crosstalk between time-interleaved sub-ADCs shorten the tracking time and require improvements in T/H circuit start-up time. The other reason is that, as technology nodes scale, ADCs based on the digital-intensive SAR architecture become faster, so keeping a low interleaving factor for the T/Hs requires the T/H circuits to scale similarly in speed. This motivates the use of simple NMOS switches in high-speed ADCs [12]-[15]. However, even when the input buffer is an NMOS source follower, the input common mode of the sampling circuit is above V<sub>DSAT</sub>, which results in only moderate conductance of the NMOS sampling switch. Therefore, with the nominal supply voltage, the NMOS switch requires quite large transistors to satisfy the settling requirements. The large NMOS switch adds non-linear capacitance to the first rank and increases the loading of the broadband input buffer. The NMOS switch also negatively impacts the high-speed linearity and ADC front-end bandwidth.

The other challenge comes from the unit-ADCs [16][17]. In order to achieve high data rate wireline communication, the converters have to employ a large number of time-interleaved unit-ADCs. A high-speed, low-power unit ADC can reduce the interleaving factor for a given effective sampling rate, resulting in smaller area and an overall simpler design.

To overcome these issues, this chapter presents a time-interleaved ADC that utilizes both a high-bandwidth interleaver architecture based on a proposed speed-enhanced bootstrapped switch [19] and a pipelined-SAR unit ADC that utilizes a novel output level shifting (OLS) [20] settling technique to enable low-voltage operation and low power consumption of the dynamic residue amplifier with low hardware overhead.

This chapter is organized as follows. Section 4.2 describes the ADC architecture, including the interleaver structure with the proposed speed-enhanced bootstrapped switch. A detailed discussion of the unit pipelined-SAR ADC with the OLS settling technique and some key circuit details are given in Section 4.3. Measurement results from a 22nm CMOS FinFET prototype are presented in Section 4.4. Finally, Section 4.5 concludes the chapter.

#### **4.2. Time-Interleaved ADC Architecture**

A top-level overview of the 38GS/s time-interleaved ADC is shown in Fig. 4.2. The ADC core consists of an 8-way first-rank interleaver that samples and buffers the input signal, 32 unit-ADCs that perform the analog-to-digital conversion, and a multi-phase clock generator.

The ADC input is fully differential and terminated by a 100-ohm resistor. Both sides of the input are protected by reduced electrostatic discharge (ESD) diodes to lessen the impact on input bandwidth. A T-coil termination architecture is utilized to reduce the effective capacitance seen at the ADC input by tuning out the ESD capacitance [21]. The ADC input first connects to two parallel buffers that separately drive the even and odd channels of the 8 parallel sampling switches, which sample the input onto the sampling capacitors. Each sampled voltage is then buffered and forwarded to 4 sub-ADCs. Pipelined-SAR ADCs are chosen to convert the samples, as the mostly digital SAR architecture



Fig. 4.2. An overview of the ADC. The ADC core consists of the multi-phase clock generator, 32-channel unit-ADCs, and 8-channel T/Hs

is highly suitable for advanced technologies at low supply voltages, and pipelining techniques have proven to be both power efficient and high speed.

The 7-bit digital output data and clocks of the 32 unit-ADC channels are captured by a synchronization block and then passed to a decimator that down-samples the output data rate to the MHz range for measurement. A differential external clock is connected to an on-chip CML buffer and then passes through the CML divider. The four phases of the CML divider output are converted to CMOS levels and then fed into the ADC core. The 8 clock phases of



Fig. 4.3. Interleaver architecture

the track-and-hold (T/H) and the 32 phases of the unit-ADC clock are derived from the clock pulse generation block, which is part of the ADC core.

# 4.2.1. ADC Timing Diagram





Fig. 4.3 shows the interleaver architecture. There are two stages of time-interleaving. The first rank consists of 8-way T/Hs, where the input is sampled and held using 8 clock phases and two input buffers. Parallel even and odd input buffers each drive half of the first-rank 8-way T/Hs. Fig. 4.4 shows the corresponding timing diagram of the sampling clocks and unit-ADC clocks. Each T/H is clocked by an fs/8, 25% duty-cycle pulse, which avoids the



Fig. 4.5. Circuit detail of an interleaving channel



Fig. 4.6. Small signal model when the second-rank switch is 'ON'

sampling crosstalk within each group of 4 even or odd T/H channels, and since the even and odd channels are isolated by the parallel input buffers, there is no sampling crosstalk between any of the T/H channels. The second interleaving stage, realized through the ADC buffer, drives 4 unit-ADC slices, but only one unit-ADC sampling switch is on at a time. The 32-phase unit-ADC sampling clock runs at fs/32, and each unit-ADC sampling phase overlaps with the first-stage hold phase.
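The timing relationships described here can be checked with a short Python sketch. It works in integer units of the sample period 1/fs and assumes, for illustration only, that T/H channel k starts tracking k sample periods into each fs/8 frame; the helper names are hypothetical.

```python
FS = 38e9        # aggregate sampling rate from the text
N_TH = 8         # first-rank T/H channels
N_ADC = 32       # unit-ADCs
TRACK_UI = 2     # 25% of the 8-sample fs/8 period, in sample periods


def tracks_overlap(a: int, b: int) -> bool:
    """True if the track windows of T/H channels a and b overlap in time.

    Assumes channel k starts tracking k sample periods into the fs/8 frame.
    """
    gap = (b - a) % N_TH                      # start-to-start distance, wrapped
    return gap < TRACK_UI or N_TH - gap < TRACK_UI


even, odd = [0, 2, 4, 6], [1, 3, 5, 7]
print("T/H clock  :", FS / N_TH / 1e9, "GHz")
print("unit-ADC   :", FS / N_ADC / 1e9, "GHz")
print("track/hold :", round(TRACK_UI / FS * 1e12, 1), "ps /",
      round((N_TH - TRACK_UI) / FS * 1e12, 1), "ps")
print("overlap within even group:",
      any(tracks_overlap(a, b) for a in even for b in even if a != b))
print("overlap between channels 0 and 1:", tracks_overlap(0, 1))
```

Under this assumed phase assignment, the windows within the even (or odd) group never overlap, while adjacent even/odd channels do, which is why the two groups are driven by separate input buffers.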

## 4.2.2. Interleaver architecture and proposed speed-enhanced bootstrapped switch

To derive an efficient design technique for the high-speed interleaver, this section first analyzes the step response of the sample-and-hold process. Fig. 4.5 shows the

schematic of a sampling channel. The buffered input signal is first sampled and held on the first-rank sampling capacitor C<sub>S</sub>, which connects to the gate $V_G$ of a source follower that serves as the second-rank buffer driving the following unit-ADCs, each of which contains a second-rank sample-and-hold circuit. The second-rank switch is enabled while the first-rank switch is in the hold phase.

The significant layout size difference between the small, less-interleaved T/Hs and the many unit-ADC slices results in long routing from the buffers to the unit-ADCs. Since each unit-ADC needs to see the full input bandwidth, the second-rank buffers are the most power-hungry blocks of the interleaver. It is very challenging to design a buffer that can drive the unit-ADCs through long routing at low power.

As  $V_s$ , output of the source follower, has an initial value when the 2<sup>nd</sup> rank switch is 'on', also VS is changing during the settling process due to the finite bandwidth of the 2nd rank buffer. It is not a simple RC settling process and the math results for the transient step response can be tricky.

Fig. 4.6 shows the small-signal model when the $2^{nd}$ rank switch is on. $V_{OUT}$ is the voltage on the unit-ADC DAC. C<sub>P</sub> consists of the self-loading capacitance of the buffer and the parasitic capacitance from routing. Due to the long routing from the buffer to each unit-ADC, C<sub>P</sub> is usually comparable to C<sub>DAC</sub>, the capacitive DAC of the unit-ADC. From layout extraction, C<sub>P</sub>/C<sub>DAC</sub> is roughly 2 for this design. First, Kirchhoff's current law gives the current equation at the V<sub>S</sub> node, where C<sub>GS</sub> is neglected because the gate voltage is already held:

$$C_P \frac{dV_S}{dt} - g_m \cdot V_{GS} + \frac{V_S - V_{OUT}}{R_{ON}} = 0 \tag{4.1}$$

The voltage  $V_S$  is expressed by

$$V_S = R_{ON} \cdot C_{DAC} \cdot \frac{dV_{OUT}}{dt} + V_{OUT} \tag{4.2}$$

Eq. (4.1) assumes that the output impedance of the source follower, ro1//ro2, is infinite, which leads to an ideal unity-gain source follower. The gain of the source follower only impacts the steady-state value and causes a static error, which can be calibrated as a channel gain error, so it is ignored in the dynamic settling error analysis.

The worst case is assumed for the initial condition. It occurs when $V_{OUT}$ settles from '0' to the maximum amplitude with a Nyquist input, which gives the maximum difference between $V_G$ and $V_S$:

$$V_{OUT}(0) = 0, \ V_S(0) = 1 - \max(|V_G(t) - V_S(t)|) \tag{4.3}$$

With $\tau = R_{ON} \cdot C_{DAC}$ and $GBW \cdot 2\pi = \frac{g_m}{C_P}$, the transient step response of V<sub>OUT</sub> is derived by solving Eq. (4.1) and Eq. (4.2) with the initial condition of Eq. (4.3). The final result is a function of the 2<sup>nd</sup> rank switch time constant $\tau$, the 2<sup>nd</sup> rank buffer GBW, and the hold time.



Fig. 4.7. Settling error vs. hold time with different 2<sup>nd</sup> rank buffer bandwidth

Since combining Eq. (4.1) and Eq. (4.2) gives a non-homogeneous differential equation, it is solved in MATLAB. Based on the resulting transient response, Fig. 4.7 shows the modeled settling error vs. hold time for different 2<sup>nd</sup> rank buffer bandwidths. The required 2<sup>nd</sup> rank buffer bandwidth is just 9GHz with a 25% duty cycle, compared to 15GHz with a 50% duty cycle. The 25% low-duty-cycle sampling clock significantly relaxes the buffer bandwidth requirement by increasing the hold time, and therefore saves power.
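The coupled system of Eq. (4.1) and Eq. (4.2) can also be integrated numerically. The sketch below is an independent Python illustration, not the MATLAB model behind Fig. 4.7. It normalizes $V_G$ to 1 and uses a fixed-step fourth-order Runge-Kutta integrator. The switch time constant of 5ps, the initial value $V_S(0) = 1$, and the assumption that the whole hold phase is available for settling are illustrative choices that do not come from the text, so the printed errors show only the trend.

```python
import math


def vout_after_hold(gbw_hz, t_hold_s, tau_s=5e-12, cp_over_cdac=2.0, vs0=1.0, steps=4000):
    """Integrate Eq. (4.1)-(4.2) with RK4 and return V_OUT at the end of the hold time.

    V_G is normalized to 1 and V_OUT(0) = 0. tau_s (R_ON * C_DAC) and vs0 are
    illustrative assumptions; cp_over_cdac = 2 is the ratio quoted in the text.
    """
    wu = 2.0 * math.pi * gbw_hz                  # g_m / C_P

    def deriv(vs, vo):
        i_sw = (vs - vo) / tau_s                 # (V_S - V_OUT) / (R_ON * C_DAC)
        return wu * (1.0 - vs) - i_sw / cp_over_cdac, i_sw

    vs, vo = vs0, 0.0
    h = t_hold_s / steps
    for _ in range(steps):
        k1 = deriv(vs, vo)
        k2 = deriv(vs + 0.5 * h * k1[0], vo + 0.5 * h * k1[1])
        k3 = deriv(vs + 0.5 * h * k2[0], vo + 0.5 * h * k2[1])
        k4 = deriv(vs + h * k3[0], vo + h * k3[1])
        vs += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        vo += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return vo


def settling_error(gbw_hz, duty, fs=38e9, n_th=8, **kwargs):
    """Relative error |1 - V_OUT| at the end of the hold phase of an fs/n_th T/H clock."""
    t_hold = (1.0 - duty) * n_th / fs
    return abs(1.0 - vout_after_hold(gbw_hz, t_hold, **kwargs))


for gbw in (9e9, 15e9):
    for duty in (0.25, 0.50):
        err = settling_error(gbw, duty)
        print(f"GBW = {gbw / 1e9:4.0f} GHz, duty = {duty:.0%}: error = {err:.2e}")
```

With these assumed values, the longer hold time of the 25% duty-cycle clock gives a smaller settling error for the same buffer bandwidth, which is the qualitative behavior described above.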



Fig. 4.8. Interleaver schematic, proposed speed-enhanced bootstrapped switch and T/H post-layout simulations

While the 25% duty cycle first-rank T/H clock reduces the buffer bandwidth requirements and sampling crosstalk, it does necessitate that the T/H have a fast start-up time. This is difficult because the T/H is loaded by the second-rank buffer, which has to be sized sufficiently to drive long routing parasitics to the second-rank switches. The proposed BS topology, shown in Fig. 4.8, modifies the $M_{N1}$ gate connection to come directly from $\Phi$. As soon as the clock is enabled, $M_{N1}$ turns on to transfer the boosted voltage to the $M_{NSW}$ gate to reduce start-up time and offer better tracking of the high-speed input. $M_{N5}$ is also added to rapidly pull up the $M_{NSW}$ gate signal upon entering track mode to further improve the start-up time. Post-layout transient simulation waveforms show that the proposed



Fig. 4.9. Hold phase signal feedthrough impact

topology has a wider switch-on pulse, faster start-up, and better tracking relative to a conventional bootstrapped switch. At the effective $f_s/8$ T/H frequency of 4.75GHz for 38GS/s operation, this results in 0.75b and 1.1b improvements in ENOB with 20GHz and 30GHz input signals, respectively. Projecting this bootstrapped switch operation to ADCs with higher-speed 16GHz clocks shows a further improvement of 1.8b with both 20GHz and 30GHz input signals.

A dummy grounded network is added to the gate of the dummy switch transistor, which serves as a signal feedthrough neutralization path. The dummy grounded network creates a



Fig. 4.10. Clock generation include high performance 25% duty cycle 8-phase T/H clock gen., skew calibration DAC, and 32-phase unit ADC clock gen.

copy of the gate loading of the main sampling switch $M_{NSW}$ in the hold phase, and therefore generates an accurate negative signal feedthrough. Fig. 4.9 shows the T/H simulation results with a Nyquist input. With the dummy grounded network, the peak-to-peak signal swing in the hold phase is only $36\mu$V, compared to 1.7mV without it. The FFT results also show a 1-bit ENOB improvement at Nyquist.

A PMOS source-follower-based 2<sup>nd</sup>-rank buffer is utilized to generate a suitable input common mode of around VDD/2 for the unit-ADC comparators for high-speed operation, and the 1<sup>st</sup> rank input buffer is an NMOS source follower, which gives a lower output common mode for linear operation of the 2<sup>nd</sup> rank buffer.

#### 4.2.3. Multi-Phase Clock Generator

Fig. 4.10 shows the schematic of the clock generator for the 8-phase 4.75GHz T/H clocks and the 32-phase 1.1875GHz unit-ADC clocks. The fs/2 external clock first passes



Fig. 4.11. Skew DAC simulation

through a CML divide-by-2 circuit, which generates a 4-phase fs/4 clock with 90° phase spacing. The 4-phase CML-level clock signals are fed to a CML-to-CMOS circuit and then to a CMOS divide-by-2 circuit that outputs an 8-phase CMOS-level fs/8 clock with 50% duty cycle and 45° phase spacing.

Meanwhile, the CMOS-level fs/4 clocks connect to pass gates that are enabled by the fs/8 clocks. By properly aligning the fs/8 and fs/4 clocks, the 25% duty-cycle fs/8 sampling pulses are selected from the fs/4 clock. All 8 phases of the 25% sampling pulses are derived by choosing different phases of the fs/8 and fs/4 clocks.
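The pulse selection can be illustrated with an idealized logic model in Python. The pass gate is modeled as a simple AND of a 50% duty-cycle fs/4 clock and a 50% duty-cycle fs/8 clock, time is counted in sample periods 1/fs, and the pairing of fs/4 phase (k mod 4) with fs/8 phase k is an assumption made for this sketch.

```python
def clk(t: int, period: int, phase: int) -> bool:
    """Ideal 50% duty-cycle square wave, high during the first half of each period."""
    return ((t - phase) % period) < period // 2


def th_pulse(t: int, k: int) -> bool:
    """Sampling pulse k: fs/4 phase (k mod 4) passed while fs/8 phase k is high.

    The pass gate is modeled as an AND gate; the phase pairing is an assumption.
    """
    return clk(t, 4, k % 4) and clk(t, 8, k)


for k in range(8):
    wave = "".join("1" if th_pulse(t, k) else "0" for t in range(8))
    print(f"phase {k}: {wave}  (duty = {wave.count('1') / 8:.0%})")
```

Each of the 8 outputs is high for 2 of every 8 sample periods, giving the 25% duty cycle at fs/8, with consecutive phases shifted by one sample period.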

The 8-phase 25% sampling pulses then pass through skew calibration capacitors and buffers to the sampling switches. The skew calibration has 7-bit resolution with a ~95fs step; simulation results across corners are shown in Fig. 4.11. Clock jitter is optimized by distributing the programmable capacitive loading across several buffer stages to maintain steep rising and falling edges.

The unit-ADC clocks are derived from the T/H clocks. Each T/H clock connects to a divide-by-4 circuit followed by 4 shift registers that generate 4-phase fs/32 clocks. In total, 32 unit-ADC clock phases are divided down from the 8-phase T/H clocks. NAND and NOR logic gates shape the unit-ADC clock pulse so that the ADC sampling pulse width equals the hold width of the T/H clock pulse.

Also, a phase rotator is inserted between the T/H clocks and the unit-ADC clock generator. Its 360° phase rotation with 3-bit control is manually adjusted to align the sampling phase of the unit-ADC clock with the hold phase of the T/H clocks.

## 4.3. Unit Pipeline-SAR ADC Architecture

Fig. 4.12 shows the 7b unit pipelined-SAR ADC. Both pipeline stages convert 4 bits, with 1-bit redundancy between the stages to relax the first stage gain, offset, and reference settling requirements. The second-rank switch is the same proposed bootstrapped topology to reduce the input sampling time constant and improve linearity. kT/C noise requirements are satisfied with CDAC1 and CDAC2 set at 32fF and 16fF, respectively. Both stages employ parallel comparators that are asynchronously activated sequentially for each conversion step, eliminating the comparator reset delay and offering significant speed-up.



Fig. 4.12. Unit-ADC and building blocks

A clocked inverter-based buffer that achieves a gain of ~4 serves as the residue amplifier stage. An OLS technique [20] allows the residue amplifier output to only settle to 50% of the steady state value, which results in a  $1.15\tau$  settling time that is roughly 3X faster than a conventional CML amplifier's settling for 4-bit resolution. This allows for lower average power due to the dynamic amplifier's reduced activation time. Both the first and second pipeline stages have independent reference DACs and buffers, which avoids crosstalk and allows for inter-channel gain mismatch and inter-stage gain error calibration.
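The "roughly 3X" figure can be reproduced with a short calculation, assuming that conventional exponential settling must reach half-LSB accuracy at 4-bit resolution (this accuracy criterion is an assumption of the sketch, not a statement from the text):

```python
import math


def exp_settling_time_in_tau(n_bits: int) -> float:
    """Time, in units of tau, for 1 - exp(-t/tau) to come within half an LSB at n_bits."""
    return math.log(2 ** (n_bits + 1))


t_conv = exp_settling_time_in_tau(4)   # conventional exponential settling
t_ols = 1.15                           # OLS settling time quoted in the text, in tau
print(f"conventional: {t_conv:.2f} tau, OLS: {t_ols:.2f} tau, ratio = {t_ols / t_conv:.2f}")
```

Under this assumption the conventional amplifier needs about 3.47$\tau$, so the 1.15$\tau$ OLS settling time corresponds to about 33% of it.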



Fig. 4.13. TI-ADC Chip micrograph and layout details

Dynamic two-stage comparators [24] are used to allow for low-voltage operation in the two pipeline stages. These comparators are foreground offset-calibrated with current-mode DACs.

Fig. 4.14. Time-Interleaved ADC characterization and foreground auto calibration setup

#### 4.4. Experimental Results

A chip microphotograph and layout floor plan of the prototype 7-bit 38GS/s ADC, which was fabricated in an Intel 22nm CMOS process, are shown in Fig. 4.13. The core time-interleaved ADC, consisting of two input buffers, 8-way T/Hs, a multi-phase clock generator, and 32-way unit pipelined-SAR ADCs, occupies 0.107 mm<sup>2</sup>. The even and odd channels are split between the left and right sides for symmetric routing, and placing the front-end T/Hs close to the unit-ADCs minimizes the high-speed signal routing from the 2<sup>nd</sup> rank buffers to each unit-ADC. The maximum routing distance is about 160 um, which adds ~81fF of capacitive loading. The critical 8-phase sampling clock routing is also shortened by placing the clock generator close to the T/H circuits, and the number of stages in the clock buffer chain is optimized for the best jitter performance. Each unit pipelined-SAR ADC core occupies only 54um x 12um. In the unit-ADC layout, the bootstrapped switch, 1<sup>st</sup> SAR ADC, residue amplifier, and 2<sup>nd</sup> SAR ADC are placed in



Fig. 4.15. Aggregated DNL and INL of the Time-Interleaved ADC

sequence, and each unit-ADC has an independent reference buffer to avoid reference coupling noise.

The measurement is performed with a wire-bonded chip-on-board test setup. Fig. 4.14 shows the time-interleaved ADC characterization and auto-calibration setup. The comparator offset, channel mismatch/offset, and phase skew calibrations are done in the foreground. An automatic measurement script running on a PC captures the ADC output data from the logic analyzer, calculates the error, and updates the on-chip programmable scan cell codes. After several iterations, the calibration settings converge to the optimum.



Fig. 4.16. ADC DFT results with (a) low frequency input, (b) Nyquist input

For the comparator calibration, the ADC input is set to the dc common mode, and each comparator is selected by MUXs and monitored with a real-time scope. The comparator's output is averaged, and the calibration DAC code is adjusted automatically until this average is close to 0.5, which implies that the comparator is metastable. The



Fig. 4.17. (a) ADC SNDR and SFDR vs. input frequency, (b) ADC input frequency response

channel gain and skew calibration is performed through a sine-fitting algorithm [], which calculates the amplitude, offset, and phase of each unit-ADC output when the input is a
| References                                     | ISSCC'18<br>[12] | JSSC'18<br>[13] | CICC'17<br>[27] | ISSCC'19<br>[28] | JSSC'16<br>[29] | This Work           |
|------------------------------------------------|------------------|-----------------|-----------------|------------------|-----------------|---------------------|
| Technology<br>(nm)                             | 14               | 28              | 28              | 7                | 16              | 22                  |
| ADCs/Interleaver<br>Supply (V)                 | 0.8/0.9          | 0.95/±0.9       | 0.95/0.95       | 0.9/0.9          | 0.9/0.9         | 0.85/0.9            |
| Sampling Rate<br>(GS/s)                        | 72               | 56              | 28              | 30               | 28              | 38                  |
| Channels                                       | 64               | 64              | 64              | 32               | 64              | 32                  |
| Architecture                                   | 8b -TI-<br>SAR   | 8b -TI-<br>SAR  | 8b -TI-<br>SAR  | 7b-TI-SAR        | 8b-TI-SAR       | 7b -TI-Pipe-<br>SAR |
| 3 dB Bandwidth<br>(GHz)                        | 21***            | 31.5            | NA              | NA               | NA              | 20                  |
| SNDR@ f<sub>in,low</sub><br>(dB)               | 39.3             | 40.5            | 37              | 33*              | 40.9*           | 39.26               |
| SNDR@ f<sub>in,Nyquist</sub><br>(dB)           | 32.7<br>@27G     | 33              | 34              | 32*<br>@14G      | 31.5*           | 35.6                |
| Total/Interleaver<br>Power (mW)                | 235/77           | 702/291         | 165/NA          | 79.65/NA         | 280/NA          | 119.7/37.05         |
| FoM@ f<sub>in,low</sub><br>(fJ/conv-step)      | 43               | 145             | 102             | 81.6**           | 110.4           | 41.9                |
| FoM@ f<sub>in,Nyquist</sub><br>(fJ/conv-step)  | 121              | 344             | 140             | 83.6**           | 325.7           | 64.05               |
| Area<br>(mm <sup>2</sup> )                     | 0.15             | 0.878           | 0.24            | 0.078            | NA              | 0.107               |

Table 4.1 Time-Interleaved ADC Performance Summary.

\* Measurements with CTLE front-end

\*\* Added the clock generation power

\*\*\* Probe Testing

sinewave. Then the skew calibration DAC and on-chip reference DAC are adjusted automatically until each unit-ADC output has roughly the same amplitude and equally spaced phase. The offset of each unit-ADC channel is subtracted off-chip.
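A minimal sketch of one way to extract amplitude, offset, and phase from a unit-ADC output is a three-parameter least-squares sine fit with a known input frequency. The text does not specify which sine-fitting variant was used, so this is an illustration only; the function names and the synthetic test signal are hypothetical.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(3):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][3] / a[i][i] for i in range(3)]


def sine_fit(samples, times, freq_hz):
    """Least-squares fit of y = A*cos(wt) + B*sin(wt) + C with known frequency.

    Returns (amplitude, phase_rad, offset) for y = amplitude*cos(wt + phase) + offset.
    """
    w = 2.0 * math.pi * freq_hz
    basis = [(math.cos(w * t), math.sin(w * t), 1.0) for t in times]
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    aty = [sum(b[i] * y for b, y in zip(basis, samples)) for i in range(3)]
    a, b, c = solve3(ata, aty)
    return math.hypot(a, b), math.atan2(-b, a), c


# Synthetic example: one unit-ADC sampling at fs/32 = 1.1875 GHz (values are made up).
f_in, f_sub = 1.0e9, 1.1875e9
times = [n / f_sub for n in range(64)]
samples = [0.9 * math.cos(2.0 * math.pi * f_in * t + 0.3) + 0.05 for t in times]
amp, phase, offset = sine_fit(samples, times, f_in)
print(f"amplitude = {amp:.3f}, phase = {phase:.3f} rad, offset = {offset:.3f}")
```

Comparing the fitted amplitudes across channels indicates gain mismatch, and comparing each fitted phase with the ideal phase for that channel's sampling instant indicates timing skew.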

The unit-ADCs and the multi-phase clock generator are powered by a 0.85V supply and consume 82.65 mW. The interleaver, which includes the 1<sup>st</sup> rank input buffers, 1<sup>st</sup> rank T/Hs, and 2<sup>nd</sup> rank buffers, is powered by a 0.9V supply and consumes 37.05 mW.

A sinewave histogram technique [25] is utilized for ADC static characterization. Fig. 4.15 shows that the aggregated maximum DNL and INL after calibration are +0.2/-0.39 LSB and +0.71/-0.55 LSB, respectively. Fig. 4.16 shows the 32768-point DFT of the decimated (1089X) ADC output after calibration when sampling low-frequency and Nyquist-frequency sinusoidal inputs at 38GS/s. At low frequency, the time-interleaved ADC achieves an SNDR and SFDR of 38.52 dB and 49.8dB, with the performance mainly limited by thermal noise, residual gain mismatch spurs, and unit-ADC non-linearity due to the incomplete settling of the pipeline gain stage. The spurs at $f_{s}/2$-$f_{in}$ and $f_{s}/2$-$3f_{in}$ are caused by aliasing and a relatively higher third harmonic distortion in one of the channels [26]. At Nyquist, the SNDR and SFDR are 35.6dB and 43.8dB, with the performance limited by spurs from residual skew, bandwidth mismatch, and non-linearity, as well as by jitter. The harmonics at high input frequency are mainly caused by the interleaver sampling switch, since the input signal tracking of the bootstrapped switch is less accurate at high frequency.

Fig. 4.17(a) shows the measured SNDR and SFDR at various input frequencies at 38GS/s. The SNDR and SFDR drop by 3.7dB and 6.3dB at Nyquist, respectively. The measured 3dB bandwidth of the time-interleaved ADC is 20 GHz, as shown in Fig. 4.17(b). This includes the insertion loss from wire-bonding, and de-embedding the wirebond parasitics shows that a 28GHz bandwidth is possible.

Table 4.1 summarizes the time-interleaved ADC performance and compares this work against previous 7-8b ADCs operating at $\geq$28GS/s. The total ADC power consumption is 119.7mW, with 82.65mW dissipated in the pipelined-SAR unit ADCs and clock generation circuitry operating on a 0.85V supply and 37.05mW in the proposed interleaver operating on a 0.9V supply. The ADC achieves a FoM of 64.05 fJ/conv-step with a Nyquist input. The interleaver consumes the least power and achieves a 20GHz 3-dB bandwidth with wirebonded chip-on-board testing. The total area is only 0.107 mm<sup>2</sup> in a 22nm CMOS process.
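The reported figures of merit are consistent with the Walden definition, FoM = P / (2<sup>ENOB</sup> · f<sub>s</sub>). The sketch below recomputes them from the total power and the SNDR values listed in the performance summary table; the function name is illustrative, and the low-frequency SNDR used is the 39.26dB table entry.

```python
def walden_fom_fj(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden FoM in fJ/conv-step: P / (2**ENOB * fs), with ENOB = (SNDR - 1.76) / 6.02."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2.0 ** enob * fs_hz) * 1e15


p_total = 82.65e-3 + 37.05e-3     # unit-ADCs and clocking, plus interleaver (W)
fs = 38e9
print(f"total power     : {p_total * 1e3:.2f} mW")
print(f"FoM, low freq   : {walden_fom_fj(p_total, 39.26, fs):.1f} fJ/conv-step")
print(f"FoM, Nyquist    : {walden_fom_fj(p_total, 35.6, fs):.1f} fJ/conv-step")
```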

# 4.5. Conclusion

This chapter has presented a 32-way 38 GS/s 7-bit time-interleaved ADC. A low-power, high-speed unit-ADC design is realized by combining several techniques, including a novel low-overhead OLS settling technique in the dynamic residue amplifier, a pipelined-SAR architecture, parallel comparators, and asynchronous SAR operation. The proposed speed-enhanced bootstrapped switch with the low-duty-cycle sampling clock significantly reduces the interleaver power and achieves a high 3-dB bandwidth without limiting the input common mode. Overall, the high input bandwidth and SNDR enabled by the proposed interleaver and pipelined-SAR unit ADC with OLS settling allow for significant improvement in the Nyquist-rate FoM.

# 5. A JITTER-ROBUST 40GB/S ADC-BASED MULTICARRIER RECEIVER FRONT END IN 22NM FINFET<sup>\*</sup>

Demand for increased data-rates in serial link transceivers calls for innovative architectures capable of overcoming communications impairments such as limited channel bandwidth and stringent jitter specifications. While mixed-signal and ADC-based receiver architectures that utilize simple pulse amplitude modulation (PAM) can take advantage of technology scaling, it is becoming increasingly difficult to deal with the extremely short baseband pulse widths. This chapter presents a wireline receiver front-end (RXFE) architecture that supports multicarrier signaling to provide a ~3X relaxation in clock jitter requirements.

# 5.1. Receiver front-end architecture

Fig. 5.1 shows the implemented RXFE architecture and the multicarrier signal power spectral density, where orthogonality is leveraged to allow for band overlapping and improved spectral efficiency. Assuming the channel bandwidth $W$ is occupied by $N$ subchannels with channel separation $\Delta_f = \frac{W}{N}$, and the symbol rate $\frac{1}{T}$ in each subchannel is chosen as a multiple of the channel separation $\Delta_f$, the subcarriers will be orthogonal over a symbol interval $T$, independent of the relative phase relationship between subcarriers ($\phi$).

<sup>\*</sup> Part of this chapter is reprinted with permission from Y. Zhu et al., "A Jitter-Robust 40Gb/s ADC-Based Multicarrier Receiver Front End in 22nm FinFET," 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022.



Fig. 5.1. Multicarrier RXFE architecture



Fig. 5.2. Multicarrier RXFE modeling results

$$\int_{0}^{T} \cos\left(2\pi f_k t + \phi_k\right) \cos\left(2\pi f_j t + \phi_j\right) dt = 0, \quad k \neq j$$
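The orthogonality condition can be checked numerically. The following is a minimal sketch, assuming the 4GS/s symbol interval and the 0, 4GHz, and 8GHz carriers used in this chapter; the function name, the phase values, and the midpoint-rule integration are illustrative choices and not part of the implemented design.

```python
import math

def normalized_inner_product(f_k, f_j, phi_k, phi_j, T, n=4096):
    """Midpoint-rule estimate of (1/T) * integral over [0, T] of the carrier product."""
    dt = T / n
    acc = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        acc += (math.cos(2 * math.pi * f_k * t + phi_k)
                * math.cos(2 * math.pi * f_j * t + phi_j))
    return acc / n

T = 1 / 4e9                  # 4 GS/s symbol interval (250 ps)
carriers = [0.0, 4e9, 8e9]   # BB, MB, and HB carrier frequencies
phases = [0.3, 1.1, 2.5]     # arbitrary phases in radians (assumption)

for k in range(3):
    for j in range(k + 1, 3):
        ip = normalized_inner_product(carriers[k], carriers[j],
                                      phases[k], phases[j], T)
        print(f"{carriers[k] / 1e9:.0f} GHz vs {carriers[j] / 1e9:.0f} GHz: {ip:.2e}")
```

Each pair of distinct carriers integrates to a value at the level of floating-point rounding regardless of the chosen phases, while a carrier multiplied by itself with equal phase integrates to 0.5 in this normalization.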

The multicarrier signal is formed by three bands to support a total 40Gb/s data rate: baseband (BB) PAM4 operating at 4GS/s, and mid-band (MB) and high-band (HB) signals that both carry 4GS/s QAM16 on 4GHz and 8GHz orthogonal carriers, respectively. This 4GS/s rate has a symbol duration that is 5X longer than that of a conventional 20GS/s baseband PAM4 signal. The jitter robustness offered by the proposed architecture has been evaluated through system-level simulations with different clock jitter amounts on both the receiver LO down-conversion and sampling clocks when operating over a channel with 20dB loss at 10GHz. The modeling results in Fig. 5.2 show that, at a BER of $10^{-4}$, the proposed multicarrier system can tolerate up to 1.6ps<sub>rms</sub> jitter, whereas a BB PAM4 system cannot tolerate more than 600fs<sub>rms</sub>.
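The data-rate budget of the three bands can be tallied directly. The short sketch below only restates the arithmetic from the text; the dictionary layout and function name are illustrative.

```python
import math

def band_rate_gbps(symbol_rate_gsps, constellation_points):
    """Bit rate of one band: symbol rate times bits per symbol."""
    return symbol_rate_gsps * math.log2(constellation_points)

# (symbol rate in GS/s, number of constellation points) for each band
bands = {"BB PAM4": (4.0, 4), "MB QAM16": (4.0, 16), "HB QAM16": (4.0, 16)}

total = sum(band_rate_gbps(fs, m) for fs, m in bands.values())
symbol_time_ratio = 20.0 / 4.0  # 20 GS/s baseband PAM4 vs. 4 GS/s per band

print(f"Total data rate: {total:.0f} Gb/s")
print(f"Symbol duration ratio: {symbol_time_ratio:.0f}x")
```

The BB band contributes 8Gb/s and each QAM16 band contributes 16Gb/s, which sums to the 40Gb/s target.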

### 5.2. Schematic of the receiver front-end

As shown in Fig. 5.3, after the input T-coil termination the multiband signal passes through three parallel continuous-time linear equalizers (CTLEs) designed to optimally equalize each of the bands and relax RXFE linearity requirements. The BB CTLE drives a single segment consisting of a dummy mixer, integrating sinc filters, and a 4-way time-interleaved ADC, while the MB and HB CTLEs each drive two I/Q segments with the mixers switched by the 4GHz and 8GHz LO signals, respectively. Inverter-based CTLE structures with two signal paths are utilized [37], with the top-branch low-pass response subtracted from the bottom-path all-pass response to form a high-pass characteristic. The advantage of this architecture lies in the ability to set equal transconductances for the top and bottom branches to significantly attenuate the low-frequency component of the multiband signal. This results in a more optimal AC response, improved linearity, and lower output-referred noise for the MB and HB segments. The RXFE sinc filters are implemented with resettable integrators built with dynamic amplifiers with common-restoration and programmable source degeneration resistances for gain control.





The Fig. 5.3 timing diagram shows the integrator operation and interface with the 4-way time-interleaved ADC. The integrator in each segment operates in a 2-way time-interleaved manner to allow for an integrator reset cycle. In this case,  $\phi_{ini1}$  going high marks the start of the integration period. The clock  $\phi_{A1}$  will go high right after this to start



Fig. 5.4. Inverter-based CTLE and programmable inverter schematic

tracking the output and then perform sampling after $\phi_{ini1}$ goes low. The 25% duty-cycle $\phi_{rst1}$ creates an integrator hold phase that relaxes the timing requirement for this ADC sample clock. The integrator capacitor is then reset when $\phi_{rst1}$ goes high. Each 7-bit 1GS/s unit ADC is formed by a pipelined-SAR structure with an output level shifting (OLS) settling technique [20] for low-power and high-speed operation. Four bits are converted in each of the two pipeline stages, with 1-bit redundancy between the two stages to relax the gain, offset, and reference settling requirements. Both stages employ parallel comparators that are asynchronously activated in sequence for each conversion step, eliminating the comparator reset delay and offering a significant speed-up.

## 5.2.1. Schematic of CTLE

**Fig. 5.5.** Double-balanced passive mixer and resettable PMOS integrator schematic

As can be seen in Fig. 5.4, an inverter-based structure is chosen for the CTLEs [37], where the NMOS and PMOS transistors operate as transconductors driving active inductors for bandwidth extension. This structure has two signal paths: the top path consists of a Gm-C low-pass filter followed by a programmable transconductance, and its output is subtracted from the all-pass bottom path. The resulting high-pass transfer function of the CTLE is defined by the following equation:

$$-\frac{V_{out}}{V_{in}} = \frac{g_{m1}}{g_{ml}} - \frac{g_{m2}}{g_{ml}}\frac{g_{mp2}}{g_{mp1}}\frac{1}{1 + \frac{sC}{g_{mp1}}}$$

The three CTLEs are designed with a 1 V power supply and consume a total of 64 mW. For this front-end design, the LF, MF, and HF CTLEs, which serve the BB, MB, and HB segments, peak at 2, 6, and 10 GHz, respectively. The LF CTLE uses one of these stages followed by a buffer, while the MF and HF CTLEs each use two cascaded stages followed by a buffer.
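To illustrate how the two-path subtraction shapes the response, the CTLE transfer function given above can be evaluated numerically. This is a minimal sketch of the first-order model only: all component values are hypothetical, and the model omits the active-inductor load and parasitic poles that set the actual peaking frequencies.

```python
import math

def ctle_gain(f_hz, gm1, gm2, gml, gmp1, gmp2, c):
    """|Vout/Vin| of the two-path CTLE model at frequency f_hz."""
    s = 1j * 2 * math.pi * f_hz
    all_pass = gm1 / gml                                        # bottom path
    low_pass = (gm2 / gml) * (gmp2 / gmp1) / (1 + s * c / gmp1)  # top path
    return abs(all_pass - low_pass)

# Hypothetical values (assumptions): transconductances in siemens, C in farads
params = dict(gm1=10e-3, gm2=6e-3, gml=10e-3, gmp1=5e-3, gmp2=5e-3, c=100e-15)

for f in (0.0, 1e9, 4e9, 8e9, 100e9):
    gain_db = 20 * math.log10(ctle_gain(f, **params))
    print(f"{f / 1e9:6.1f} GHz: {gain_db:6.2f} dB")

# Equal transconductances in both branches null the DC gain
equal = dict(params, gm2=params["gm1"])
print("DC gain with equal branches:", ctle_gain(0.0, **equal))
```

With these values the DC gain is 0.4 and the gain approaches unity well above the Gm-C pole, and setting the two branch transconductances equal drives the DC gain to zero, which matches the low-frequency attenuation described for the MB and HB segments.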

### 5.2.2. Schematic of integrator and mixer

Given their low noise and high linearity, passive double-balanced mixers are used. They are followed by resettable PMOS integrators with degeneration resistors for gain programmability and enhanced linearity. The integration of the symbols on each channel is performed in a 2-way interleaved fashion: while one integrator is resetting, the



Fig. 5.6. Integrator output with PAM-4 signal input

other is integrating. Since the sub-channel ADC is also 2-way interleaved, each integrator is connected to a sampler. The schematics of the mixer and integrator are shown in Fig. 5.5. The linearity of the designed analog front-end is evaluated in the following figure. The left-hand side of the figure shows the 1dB compression point of the CTLE, which occurs when the output signal is about 800 mVppd. On the right-hand side, the PAM-4 eye diagram at the output of the second-rank sampler of one of the channel ADCs is shown (Fig. 5.6). This eye diagram has a ratio of level mismatch (RLM) higher than 90%.

### 5.2.3. 4-way time interleaved ADC design

For a 40 Gb/s 5-channel system, the ADC sampling rate is 4 GS/s. The 7-bit 4GS/s ADC is based on 4-way time-interleaving of a pipelined-SAR ADC with the output-level-shifting (OLS) technique [20] for enhanced residue amplifier settling. Fig. 5.7 shows the block diagram of the interleaving architecture, which has two time-interleaving stages. The first stage is a 2-way time-interleaved T/H, where the input signal is tracked and held by two phases of a 2 GHz clock. The second stage consists of the 4-way time-interleaved 1 GS/s unit ADCs, where the held signal from the first stage is further sampled on the CDAC and converted into a digital code. A unity-gain buffer is placed at the output of each T/H to sequentially drive the parallel unit ADCs.

**Fig. 5.7.** Architecture and timing diagram of the 4 GS/s time-interleaved ADC

# **5.3. Multi-Phase Clock generator**

Fig. 5.8 shows the RXFE clock generation circuitry that generates the HB and MB LOs and the 10 input clocks for the integrators and ADC local clock generation blocks in each segment. A 16GHz differential clock input passes through a CML divider and CML-to-CMOS converters to generate 4-phase 8GHz clock signals for the HB LO signals. A subsequent CMOS divider generates 8-phase 4GHz clock signals, with four of these phases used for the MB LO signals. Additionally, these eight 4GHz clock signals are provided to ten 6b phase interpolators to generate the clocks for the integrators and ADCs with independent sampling phase control. Phase mismatches between the I and Q LO signals are compensated with skew calibration blocks consisting of distribution buffers that have digitally controlled capacitive loading. To verify the proposed RXFE jitter robustness, the distribution buffers of the LO and integrator/ADC clocks operate on a power domain with



Fig. 5.8. RXFE clock generation

programmable noise injection. Large shunting NMOS transistors are driven by either an internal PRBS generator or an external signal to provide jitter with varying frequency content.
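The phase spacing produced by the divider chain follows from the clock periods. The sketch below restates that arithmetic; the phase interpolator step is computed under the assumption that each 6b interpolator spans one full 4GHz period in 64 uniform steps, which the text does not state.

```python
def phase_spacing_ps(freq_ghz, n_phases):
    """Time spacing between adjacent phases of an n-phase clock."""
    period_ps = 1e3 / freq_ghz
    return period_ps / n_phases

hb_spacing = phase_spacing_ps(8.0, 4)   # 4-phase 8 GHz clocks for the HB LO
mb_spacing = phase_spacing_ps(4.0, 8)   # 8-phase 4 GHz clocks

# Assumption: a 6b interpolator covers the full 250 ps period in 2**6 steps
pi_step_ps = (1e3 / 4.0) / 2 ** 6

print(f"HB phase spacing: {hb_spacing} ps")
print(f"MB phase spacing: {mb_spacing} ps")
print(f"PI step (assumed full-period range): {pi_step_ps} ps")
```

Both the 4-phase 8GHz and 8-phase 4GHz clocks have the same 31.25ps spacing between adjacent phases, since each division by two doubles both the period and the number of phases.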

### **5.4. Experiment Results**

Fig. 5.9. RXFE chip micrograph

The 22nm FinFET die micrograph is shown in Fig. 5.9. The entire RXFE occupies 0.84mm<sup>2</sup> when the input T-coil and clock buffer inductors are included. The CTLEs, mixers, and integrators occupy 102µm × 225µm of this area, while the ADCs occupy 150µm × 441µm. Fig. 5.10 shows the DFT from one of the decimated (621x) 4-way time-interleaved ADC outputs when sampling a low-frequency input at 4GS/s, with an achieved SNDR and SFDR of 33.7dB and 42dB, respectively. The ADC Nyquist-rate ENOB is 4.2b when measured through the entire RXFE and 4.53b when the RXFE is de-embedded. The frequency responses of the CTLEs are measured from the ADC output using single-tone inputs, with the BB, MB, and HB CTLEs achieving a highest normalized peaking of 8dB, 7dB, and 3dB, respectively. Fig. 5.11 shows RXFE measurement results with a 40Gb/s multiband input signal, with independent PRBS15 patterns in the bands, generated by a Keysight M8195A AWG with an 800mVppd total swing. This signal is passed through a channel with 20dB insertion loss at 10GHz (the conventional baseband PAM4 Nyquist frequency) and data is



Fig. 5.10. CTLE response of each band, ADC spectrum with a low frequency input, and ENOB vs. input frequency

collected with and without jitter added. For the maximum 1.6psrms jitter injection, the BB channel has an average voltage margin of 24 ADC codes at a BER= $10^{-5}$  and the MB and HB constellations have a respective EVM of 20.4dB and 18.8dB that translates to BERs of  $10^{-6}$  and  $3.5 \times 10^{-5}$ . These constellations suffered a respective EVM degradation of 0.5dB and 0.8dB with respect to the no jitter added case.
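The reported EVM-to-BER translation is consistent with a standard Gaussian-noise approximation for QAM16. The sketch below is an assumption about how such a translation can be made, not necessarily the method used for the measurements: it treats the EVM magnitude in dB as the average symbol SNR and uses the nearest-neighbor approximation for Gray-coded square QAM16.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def qam16_ber_from_evm(evm_db):
    """Approximate Gray-coded QAM16 BER, taking the EVM magnitude (dB) as SNR (dB)."""
    snr = 10 ** (evm_db / 10)
    # (4 / log2(M)) * (1 - 1/sqrt(M)) * Q(sqrt(3 * SNR / (M - 1))) with M = 16
    return 0.75 * q_function(math.sqrt(snr / 5))

for name, evm_db in (("MB", 20.4), ("HB", 18.8)):
    print(f"{name}: EVM {evm_db} dB -> BER ~ {qam16_ber_from_evm(evm_db):.1e}")
```

Under these assumptions the 20.4dB and 18.8dB EVM values map to BERs close to the $10^{-6}$ and $3.5 \times 10^{-5}$ figures quoted above.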

# 5.5. Conclusion

This chapter has presented a jitter-robust multi-carrier 40Gb/s receiver front-end. Table 5.1 compares this RXFE with other ADC-based wireline RXFEs that utilize conventional PAM4 and multi-tone signaling. The proposed multicarrier RXFE can operate with the highest jitter, 1.6ps<sub>rms</sub>, and achieves 3.05pJ/bit power efficiency.

Fig. 5.11. Channel insertion loss, jitter measurement, and measured PAM4 and QAM16 constellations.

| Specification                           | LaCroix<br>ISSCC 2019 | Ali<br>ISSCC 2019    | Kim<br>ISSCC 2019    | Upadhyaya<br>ISSCC 2018 | Wang<br>ISSCC 2018   | This Work            |
|-----------------------------------------|-----------------------|----------------------|----------------------|-------------------------|----------------------|----------------------|
| Technology                              | 7 nm FinFET           | 7 nm FinFET          | 14 nm FinFET         | 16 nm FinFET            | 16 nm FinFET         | 22 nm FinFET         |
| Modulation                              | PAM-4                 | PAM-4                | 64-QAM DMT           | PAM-4                   | PAM-4                | Multicarrier         |
| Power Supply (V)                        | 0.75, 0.8, 1.2        | 0.8, 1               | 0.7, 0.75, 0.8       | 0.85, 0.9, 1.2, 1.8     | 0.9, 1.2             | 0.85, 0.95, 1.2      |
| Data Rate (Gb/s)                        | 60                    | 56                   | 56                   | 56                      | 64.375               | 40                   |
| Fs (GS/s)                               | 30                    | 28                   | 22.4                 | 28                      | 32                   | 5 x 4                |
| ADC Structure                           | TI-SAR                | TI-SAR               | TI-Pipe-SAR          | TI-SAR                  | TI-Folding Flash     | TI-Pipe-SAR          |
| Pre-Equalization                        | CTLE                  | CTLE                 | No                   | CTLE                    | CTLE                 | CTLE                 |
| ENOB @ Nyquist                          | N/A                   | 4.74 bits            | N/A                  | 4.43 bits               | 4.33 bits            | 4.5 bits             |
| Area                                    | 0.84 mm<sup>2</sup>   | 0.32 mm<sup>2</sup>  | 0.26 mm<sup>2</sup>  | N/A                     | 0.16 mm<sup>2</sup>  | 0.84 mm<sup>2</sup>  |
| Max. Compensated Channel Loss           | 32 dB @ 14 GHz        | 40 dB @ 14 GHz       | 28 dB @ 14 GHz       | 32 dB @ 14 GHz          | 30 dB @ 16 GHz       | 20 dB @ 10 GHz       |
| RMS Jitter                              | 160 fs                | 225 fs               | N/A                  | 180 fs                  | 162 fs               | 1.6 ps               |
| BER                                     | < 1e-6                | < 1e-5               | < 2e-4               | < 1e-12                 | < 1e-4               | < 1e-4               |
| AFE + ADC Power                         | 303 mW                | 180 mW               | 93.2 mW              | N/A                     | 283.9 mW             | 122 mW               |
| AFE + ADC Power Efficiency (pJ/bit)     | 5.05                  | 3.2                  | 1.6                  | 5.8                     | 4.44                 | 3.05                 |

Table 5.1 RXFE Performance Summary.
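The power-efficiency row of Table 5.1 is the AFE + ADC power divided by the data rate. A short sketch of that calculation, using only values from the table (the function name is illustrative):

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/bit; mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / data_rate_gbps

this_work = energy_per_bit_pj(122.0, 40.0)
lacroix = energy_per_bit_pj(303.0, 60.0)

print(f"This work: {this_work:.2f} pJ/bit")
print(f"LaCroix ISSCC 2019: {lacroix:.2f} pJ/bit")
```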

#### 6. CONCLUSION AND FUTURE WORK

#### **6.1.** Conclusion

ADC-based receivers are becoming more common in wireline links because they enable powerful and flexible digital processing for equalization and ease the support of higher-order modulation schemes. However, ADC-based receivers are usually more power hungry than mixed-signal receivers due to the conventional SAR-based high-speed time-interleaved ADC front-end. Moreover, as ever-increasing wireline data rates climb beyond 100Gb/s, sampling clock jitter places fundamental performance limitations on common baseband PAM4 ADC-based receivers, necessitating clock generation and distribution circuitry that achieves rms jitter of a few hundred femtoseconds. This jitter issue will become more severe for future 200+Gb/s designs. This dissertation presents three works that address these challenges.

The first work focuses on a low-power, high-speed unit-ADC design that can reduce the interleaving factor for a given effective sampling rate, resulting in smaller area and an overall simpler time-interleaved ADC design. It presents a single-channel 8-bit pipelined-SAR ADC that utilizes a novel low-overhead OLS settling technique in the dynamic residue amplifier. A low-power design is realized by combining this technique with the use of parallel comparators in the two asynchronous pipeline stages, allowing for 1.5GS/s operation with a low 0.8V supply voltage and achieving an FoM of 16.7fJ/conv.-step.

The second work focuses on the implementation of a low-power, high-bandwidth time-interleaved ADC, which avoids extra amplitude equalization to compensate for limited ADC input bandwidth and therefore saves power in the following DSP. The 38GS/s 7-bit time-interleaved ADC prototype utilizes the previously mentioned unit ADC and a high-bandwidth 8-way interleaver architecture based on a proposed speed-enhanced bootstrapped switch, which shows higher operation speed and better ENOB with high-frequency inputs. The time-interleaved ADC achieves 41.9fJ/conv.-step at low frequency and 64.1fJ/conv.-step at Nyquist, and has a 20GHz 3-dB bandwidth.

The last work focuses on jitter-robust ADC-based receiver design. It presents a novel frequency-domain wireline ADC-based receiver front-end (RXFE) that supports multicarrier signaling to provide a ~3X relaxation in clock jitter requirements. The previously mentioned pipelined-SAR unit ADC and speed-enhanced bootstrapped switch are also utilized. The 40Gb/s RXFE prototype can operate with the highest jitter, 1.6ps<sub>rms</sub>, and achieves 3.05pJ/bit power efficiency.

## 6.2. Time-domain high-speed ADC design

Recently, time-domain ADCs have shown promising speeds for mid-resolution applications, because the quantization is performed by an internal flash TDC. Fig. 6.1 shows an 8b time-domain ADC example [42], which consists of a S/H, a VTC, and a two-stage TDC. The S/H adopts bootstrapped switches with cross-coupled compensation for high-linearity sampling on an input capacitance of only 45fF (single-ended). The dynamic VTC converts the sampled voltage into a time difference ( $S_{P1}<0>$ ,  $S_{N1}<0>$ ) through a pair of current sources and crossing detectors. The pseudo-differential and discharging features guarantee the high linearity of the VTC. The 8b TDC comprises a 4b pseudo-differential flash TDC (with 1b sign) as the coarse stage and a 5b single-ended interpolation TDC



Fig. 6.1. (a) Block and (b) timing diagrams of the 8-bit 2.5GS/s time-domain ADC



Fig. 6.2. (a) 4-layer 16x time interpolator (b) unit phase interpolator cell (c) Timing diagram of the 16x time interpolator

(with 1b redundancy) as the fine stage. This design successfully implements a 2.5GS/s 8-bit ADC in a 65nm CMOS process, and calibration for the comparators is not needed, because the time comparator only needs to compare two signal levels, VDD and VSS. However, because there is no gain stage between the coarse TDC and the fine TDC, the time step for the fine TDC is only 1.375ps. This design utilizes a 16x time interpolator to generate the stringent fine time step. As shown in Fig. 6.2, 4 layers and 34 inverter-based phase interpolators (PIs) are utilized for one cascade stage, and there are two cascade stages in total. Moreover, dummy delay cells and interpolators are added both before and after the quantization cells to shield the terminal effect. This results in around 136 PI cells in total, which is power-hungry and could be susceptible to PVT variation in advanced technologies. Also, the fine time step could cause comparator metastability.





Fig. 6.3 shows a proposed 8-bit pipeline time-domain ADC with time amplifier (TA) that amplifies the time residue from the 3-bit first stage flash TDC. The residue TA increases the second stage LSB step by a factor of residue gain, and therefore enables a

coarse 5-bit 2<sup>nd</sup> stage flash TDC which could be implemented by simple digital buffers. According to the timing diagram of Fig. 6.3, the T/H is clocked by $\Phi_S$. Once the input voltage has been sampled by the bootstrapped switch, $\Phi_D$ enables a pair of current sources which discharge the voltage on the sampling capacitors. The following crossing detector then converts the voltage difference to a time difference, which is the gap between the fast and slow pulses. Also, a sign-bit comparator is placed right after the T/Hs and triggered by $\Phi_D$. The sign bit saves 1 bit of resolution in the first-stage TDC and controls the time rectifier that forces the fast pulse to always connect to the delay line. The residue time amplifier is enabled by the $\Phi_{AMP}$ clock that is located after the slow pulse, and there is 1-bit redundancy between the two stages to relax the gain and delay mismatch requirements.
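A behavioral sketch can illustrate how the sign bit, the 3-bit first stage, the residue TA, and the 5-bit second stage with 1-bit redundancy combine into an 8-bit code. The model below is an idealized assumption for illustration only: the 16ps first-stage LSB, the gain of 8, and the way first-stage threshold errors are modeled are hypothetical and do not describe the proposed circuit.

```python
def pipelined_tdc(t_diff_ps, lsb1_ps=16.0, ta_gain=8.0, stage1_offset_ps=0.0):
    """Signed output code from a sign bit, a 3b coarse TDC, a TA, and a 5b fine TDC."""
    sign = -1 if t_diff_ps < 0 else 1
    t = abs(t_diff_ps)  # time rectifier: the fast pulse always leads

    # Stage 1: 3-bit flash TDC whose thresholds may trip late by stage1_offset_ps
    code1 = sum(1 for k in range(1, 8) if t >= k * lsb1_ps + stage1_offset_ps)
    residue = t - code1 * lsb1_ps

    # Residue time amplifier enlarges the second-stage LSB by ta_gain
    amplified = ta_gain * residue
    lsb2_ps = ta_gain * lsb1_ps / 16  # 4 effective fine bits per coarse LSB

    # Stage 2: 5-bit flash TDC; codes 16..31 are the redundancy range
    code2 = min(31, max(0, int(amplified // lsb2_ps)))
    return sign * (16 * code1 + code2)

print(pipelined_tdc(34.5))                        # ideal first stage
print(pipelined_tdc(34.5, stage1_offset_ps=4.0))  # late first-stage decision
print(pipelined_tdc(-34.5))                       # negative input via sign bit
```

In this model a late first-stage decision leaves a residue larger than one coarse LSB, and the extra second-stage range absorbs it so that the combined code is unchanged.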

A low-power TA with enough gain and linearity is essential for implementing the system. Fig. 6.4 shows a possible TA design [43] that achieves ~8x gain and 6-bit resolution. Two discharging phases of two capacitors $C_X$ and $C_Y$ (in differential configuration) take place in turn during the amplification. The first one is an early discharging phase between $t_1$ and $t_2$ with a large slew rate after the first rising edge of the inputs arrives. For example, if the rising edge of $V_{INA}$ arrives first, it will discharge $C_X$ through $M_{1A}$ and $M_{2A}$ while keeping $V_Y$ at $V_{DD}$. Then, when the rising edge of $V_{INB}$ arrives, both capacitors will discharge together at an equal but small slew rate through $M_{3A}$, $M_{4A}$, $M_{3B}$ and $M_{4B}$. When the two voltages cross the threshold of the two TCDs at $t_3$ and $t_4$, respectively, an amplified time difference is generated. The input and output times are

$$t_{in} = t_2 - t_1 = \frac{V_{DD} - V_1}{SR_1} = \frac{V_{DD} - V_1}{\frac{I_{D1}}{C_X}}$$



Fig. 6.4. (a) circuit diagram and (b) timing diagram of the TA

$$t_{out} = t_3 - t_4 = \frac{\frac{1}{2}V_{DD}}{SR_2} - \frac{V_1 - \frac{1}{2}V_{DD}}{SR_2} = \frac{V_{DD} - V_1}{SR_2} = \frac{V_{DD} - V_1}{\frac{I_{D2}}{C_X}}$$

where  $I_{\mathrm{D1}}$  and  $I_{\mathrm{D2}}$  are the discharging currents in two phases. So, the gain of the TA is

$$A_t = \frac{t_{out}}{t_{in}} = \frac{SR_1}{SR_2} = \frac{I_{D1}}{I_{D2}}$$
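The gain expression can be exercised with numbers. The sketch below evaluates the input time, output time, and gain from the slew-rate relations above; the supply, voltage, current, and capacitance values are hypothetical and chosen only so that the current ratio is 8.

```python
def ta_times(v_dd, v_1, i_d1, i_d2, c_x):
    """Return (t_in, t_out, gain) of the TA from the two discharging slew rates."""
    sr1 = i_d1 / c_x            # fast slew rate of the early discharging phase
    sr2 = i_d2 / c_x            # slow slew rate of the common discharging phase
    t_in = (v_dd - v_1) / sr1
    t_out = (v_dd - v_1) / sr2
    return t_in, t_out, t_out / t_in

# Hypothetical values: 1 V supply, V_1 = 0.9 V, 80 uA and 10 uA, 10 fF
t_in, t_out, gain = ta_times(v_dd=1.0, v_1=0.9, i_d1=80e-6, i_d2=10e-6, c_x=10e-15)

print(f"t_in  = {t_in * 1e12:.1f} ps")
print(f"t_out = {t_out * 1e12:.1f} ps")
print(f"gain  = {gain:.1f}")
```

The gain depends only on the ratio of the two discharging currents, so the capacitance and the voltage drop cancel out of $A_t$.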

In summary, the time-domain ADC converts the voltage difference to a time difference that is quantized by the flash TDC. Compared with a conventional SAR ADC, it is faster and does not require a large-area capacitive DAC, a reference buffer, or comparator calibration, and it has the potential to realize a high-performance ADC with better energy efficiency.



Fig. 6.5. Proposed multi-carrier DAC-based transmitter

#### **6.3.** Frequency domain multi-carrier transmitter

Because the common PAM-4 transmitter does not support the proposed multi-carrier receiver, a DAC-based transmitter architecture is proposed for transmitting the multi-carrier signal. It also aims to significantly improve jitter robustness and reduce system equalization complexity. The proposed DAC-based transmitter supports advanced baseband and multi-carrier modulation schemes. The target performance is a low-power multi-carrier DAC-based serial I/O transmitter capable of operating at data rates >100Gb/s. With the inclusion of a multi-carrier receiver and the optimum modulation for a given channel, operation with channels that have more than 30dB loss at an equivalent Nyquist frequency is possible.

Fig. 6.5 shows the proposed multi-carrier DAC-based transmitter with baseband (BB) PAM-4, mid-frequency band (MB) QAM-16, and high-frequency band (HB) QAM-16 modulations. For 64Gb/s data transmission, 160 parallel bits at 400MHz drive the DSP and pass through FIR filters that are programmable from 1 to 8 taps. In order to achieve sufficient output stage linearity, the outputs of these filters are summed to generate a parallel multi-bit predistortion code. The outputs of the 400MHz DSP are 3 parallel



Fig. 6.6. Power spectral density of modulated signal in the proposed architecture and channel loss

streams of 16 symbols that are represented with a multi-bit code, which represents the level of the BB PAM-4 signal and the amplitude of the MB and HB QAM-16 signals, a multi-bit phase code for the MB and HB QAM-16 signals, and the predistortion code. 16:1 serialization is performed on these bits to generate 3 parallel 6.4GS/s codes that drive the polar transmitter output DACs, whose outputs are combined in the analog domain for transmission. This architecture provides several benefits. First, each channel operates at an effective 6.4GS/s sampling/serialization rate, which is significantly lower than the 64GS/s sampling rate required for conventional PAM4 modulation, allowing for close to a 6X improvement in simulated rms jitter tolerance. Second, the up-conversion in the MB and HB channels performs mixer-based self-equalization and provides some channel loss compensation, allowing for a reduction in digital equalization complexity. Finally, the polar transmitter output stages provide the ability to support multiple 16-point constellations, such as QAM-16 and APSK-4+12, that improve the back-off energy efficiency.



Fig. 6.7. (a) Digital polar output DAC. 16-point constellations: (b) QAM-16 and (c) APSK-4+12.

Fig. 6.6 shows the power spectral density of the modulated signal from a potential 64Gb/s multi-carrier implementation with the BB PAM-4 channel, the MB QAM-16 channel centered at 6.4GHz, and the HB QAM-16 channel centered at 12.8GHz. Orthogonality of the multi-carrier channels is achieved by operating at harmonics of a 6.4GHz carrier.

A block diagram of the proposed digital polar transmitter output stage used in both the MB and HB channels is shown in Fig. 6.7. A given constellation point is realized by first using the digital phase code to generate the desired phase from the high-speed phase modulator. This output phase passes to the output driver, where it is scaled by the multi-bit amplitude control to realize the desired constellation point magnitude. This decoupling of the output phase generation and amplitude control simplifies the control of the output magnitude relative to a conventional I/Q QAM modulator, where the output phase and amplitude are realized by summing modulated I and Q carriers. The proposed polar architecture provides the flexibility to generate multiple 16-point constellations other than the conventional square QAM-16 constellation shown in Fig. 6.7,



Fig. 6.8. System-level model for multicarrier TX and RX

which has three different output magnitudes. For example, the APSK-4+12 constellation allows all the outer 12 points to have the same maximum magnitude. This lower peak-to-average power ratio relaxes the output driver linearity requirements and allows operation at a smaller back-off with reduced spectral regrowth.
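The peak-to-average power ratio of the two 16-point constellations can be compared with a short calculation. This is a minimal sketch: the APSK-4+12 ring ratio of 2.7 and the angular offsets of the two rings are assumptions chosen for illustration, since the text does not specify them.

```python
import cmath
import math

def papr(points):
    """Peak-to-average power ratio of a constellation."""
    powers = [abs(p) ** 2 for p in points]
    return max(powers) / (sum(powers) / len(powers))

# Conventional square QAM-16 on a +/-1, +/-3 grid
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]

def apsk_4_12(ring_ratio=2.7):
    """4 inner points and 12 equal-magnitude outer points (ring ratio is assumed)."""
    inner = [cmath.rect(1.0, math.pi / 4 + k * math.pi / 2) for k in range(4)]
    outer = [cmath.rect(ring_ratio, math.pi / 12 + k * math.pi / 6) for k in range(12)]
    return inner + outer

qam_magnitudes = sorted({round(abs(p) ** 2) for p in qam16})
print("QAM-16 squared magnitudes:", qam_magnitudes)
print(f"QAM-16 PAPR:    {10 * math.log10(papr(qam16)):.2f} dB")
print(f"APSK-4+12 PAPR: {10 * math.log10(papr(apsk_4_12())):.2f} dB")
```

Because twelve of the sixteen APSK-4+12 points sit at the peak magnitude, its PAPR stays below 4/3 for any ring ratio, compared with 1.8 for square QAM-16.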

The high-speed phase modulator utilizes an injection-locked oscillator (ILO) locked to the desired carrier frequency to generate N multiple clock phases that are selected by a phase mux with the digital phase code. This produces the differential carrier signal at the output driver with the appropriate phase for the desired constellation point.

For transmitter equalization, a segmented lookup table (LUT) approach is planned for the FIR filter implementation in the DSP [44], as shown in Fig. 6.5. An 8-b LUT is used for each tap, with the BB PAM-4 channel having 4 entries per LUT and the MB and HB QAM-16 channels having 16. These tap outputs are summed and quantized for the output stage at each channel. This LUT-based approach will allow the efficient implementation of an FIR filter with a programmable range from 1 to 8 taps.
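The per-tap lookup approach can be sketched for the BB PAM-4 channel as follows. This is an illustrative model under stated assumptions: the tap coefficients, the scaling to a signed 8-bit range, and the zero initial history are hypothetical, and the planned DSP implementation in [44] may differ.

```python
PAM4_LEVELS = [-3, -1, 1, 3]

def build_tap_luts(coeffs, levels, bits=8):
    """One table per tap; each entry is a quantized coefficient-times-level product."""
    full_scale = 2 ** (bits - 1) - 1
    peak = sum(abs(c) for c in coeffs) * max(abs(v) for v in levels)
    scale = full_scale / peak  # keeps the summed output within the signed range
    return [[round(c * v * scale) for v in levels] for c in coeffs]

def lut_fir(symbol_indices, luts):
    """FIR output per symbol: sum of table entries for current and past symbols."""
    out = []
    for n in range(len(symbol_indices)):
        acc = 0
        for tap, lut in enumerate(luts):
            if n - tap >= 0:  # zero history is assumed before the first symbol
                acc += lut[symbol_indices[n - tap]]
        out.append(acc)
    return out

luts = build_tap_luts([1.0, -0.25], PAM4_LEVELS)  # hypothetical 2-tap filter
print(luts)
print(lut_fir([3, 0, 2, 1], luts))
```

Each tap needs only one table read and one addition per symbol, so no multipliers are required. A QAM-16 channel would use 16 entries per table in the same way.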

| Parameter                                           | Value                    |
|-----------------------------------------------------|--------------------------|
| Data rate                                           | 64 Gb/s                  |
| Channel loss                                        | > 30 dB at 16 GHz        |
| Jitter requirement                                  | < 1500 fs<sub>rms</sub>  |
| The number of taps (M)                              | 8 taps                   |
| Bit resolution of output DAC (N<sub>A</sub>)        | 7 bits                   |
| Bit resolution of phase modulator (N<sub>P</sub>)   | 7 bits (>96 phases)      |

Table 6.1 Summary of System-Level Design Parameters for 64Gb/s DAC-based Tx.

System-level models are used to verify the operation of the proposed transmitter (Tx) and to optimize its design parameters. The models, implemented in MATLAB, consist of a polar Tx model, a channel model, and a receiver (Rx) model based on an I/Q demodulator using integrators and ADCs (Fig. 6.8). For the channel model, two kinds of channels (30- and 40-dB loss at 16 GHz) are considered. For Rx equalization, a CTLE and a 5-tap adaptive MIMO equalizer in the DSP are used. In the Tx model, the number (M) and the coefficients of the FFE taps, the bit resolution (N<sub>A</sub>) of the output DAC, and the bit resolution (N<sub>P</sub>) of the phase modulator are adjusted to maximize the voltage margin at the Rx output.

From the verification using system-level models, the key design parameters to achieve the target performance in a silicon implementation are summarized in Table 6.1. To operate properly with a channel having near 40dB loss at an equivalent Nyquist frequency, at least 8 taps should be considered for the Tx equalization. The bit resolutions ($N_A$ and $N_P$ in Fig. 6.8) required for the output DACs and the phase modulator to achieve almost the same performance as the infinite-resolution cases are both 7 bits.



Fig. 6.9. Digital polar output DAC schematics: (a) phase mux and (b) segmented output driver. Simulated APSK-4+12 constellations with (c) 12.8GHz and (d) 25.6GHz carriers.

Fig. 6.9(a) shows an example where N=12 to generate either the QAM-16 or APSK-4+12 constellations. An initial pseudo-differential CML 6:1 phase mux, which utilizes shunt peaking for bandwidth extension, allows generation of phases from $0^{\circ}$ to 150°. This is followed by a 2:1 mux that can invert the differential output to span the required 360°.
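The two-level phase selection for N=12 can be written out explicitly. In this sketch the 30° step follows from six phases spanning 0° to 150°; the select-code encoding is a hypothetical choice for illustration.

```python
def selected_phase_deg(mux_sel, invert):
    """Output phase for a 6:1 mux select (0..5) followed by an optional inversion."""
    if not 0 <= mux_sel < 6:
        raise ValueError("mux_sel must be in 0..5")
    return (30 * mux_sel + (180 if invert else 0)) % 360

all_phases = sorted(selected_phase_deg(s, inv)
                    for inv in (False, True) for s in range(6))
print(all_phases)
```

The six mux outputs and their inverted versions together cover twelve phases uniformly spaced by 30° over the full 360°.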

Fig. 6.9(b) shows the DAC output driver that is segmented with 6b (4b binary and 2b thermometer) amplitude control and parallel 2b predistortion cells. The selected phase signal differentially drives the bottom transistor in each output driver segment, while common amplitude control of the top transistors realizes the desired constellation point magnitude, which also includes the FIR filter response computed by the DSP. The 22nm simulation results of Fig. 6.9(c) and (d) show that the APSK-4+12 constellation is generated with low EVM for both the 12.8 and 25.6GHz carriers.

In conclusion, a DAC-based transmitter architecture is proposed for 100+Gb/s serial links. To improve jitter robustness and reduce system equalization complexity, the proposed DAC-based transmitter supports advanced baseband and multi-carrier modulation schemes. Before implementing the transmitter in silicon, its operation is verified and its design parameters are optimized using MATLAB system-level models. From the results of system-level verification, the performance requirements of the Tx block circuits are specified. The phase modulator circuit is one of the key blocks in the proposed Tx because its phase change must be large, accurate, and fast. An initial design for the phase modulator and the output driver with amplitude modulation is also presented.

#### REFERENCES

- [1] J. Hudner et al., "A 112GB/S PAM4 Wireline Receiver Using a 64-Way Time-Interleaved SAR ADC in 16NM FinFET," 2018 IEEE Symposium on VLSI Circuits, 2018, pp. 47-48, doi: 10.1109/VLSIC.2018.8502436.
- [2] P. W. de Abreu Farias Neto et al., "A 112–134-Gb/s PAM4 Receiver Using a 36-Way Dual-Comparator TI-SAR ADC in 7-nm FinFET," in IEEE Solid-State Circuits Letters, vol. 3, pp. 138-141, 2020, doi: 10.1109/LSSC.2020.3007580.
- [3] P. Mishra et al., "8.7 A 112Gb/s ADC-DSP-Based PAM-4 Transceiver for Long-Reach Applications with >40dB Channel Loss in 7nm FinFET," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 138-140, doi: 10.1109/ISSCC42613.2021.9365929.
- [4] Y. Krupnik et al., "112-Gb/s PAM4 ADC-Based SERDES Receiver With Resonant AFE for Long-Reach Channels," in IEEE Journal of Solid-State Circuits, vol. 55, no. 4, pp. 1077-1085, April 2020, doi: 10.1109/JSSC.2019.2959511.
- [5] T. Ali, M. Abdullatif, H. Park, E. Chen, R. Awad and M. Gandara, "56/112Gbps Wireline Transceivers for Next Generation Data Centers on 7nm FINFET CMOS Technology," 2021 IEEE Custom Integrated Circuits Conference (CICC), 2021, pp. 1-6, doi: 10.1109/CICC51472.2021.9431430.
- [6] S. Palermo, S. Hoyos, S. Cai, S. Kiran and Y. Zhu, "Analog-to-Digital Converter-Based Serial Links: An Overview," in IEEE Solid-State Circuits Magazine, vol. 10, no. 3, pp. 35-47, Summer 2018, doi: 10.1109/MSSC.2018.2844603.

- [7] S. Kiran, S. Cai, Y. Zhu, S. Hoyos and S. Palermo, "Digital Equalization With ADC-Based Receivers: Two Important Roles Played by Digital Signal Processing in Designing Analog-to-Digital-Converter-Based Wireline Communication Receivers," in IEEE Microwave Magazine, vol. 20, no. 5, pp. 62-79, May 2019, doi: 10.1109/MMM.2019.2898025.
- [8] G. Kim et al., "A 161-mW 56-Gb/s ADC-Based Discrete Multitone Wireline Receiver Data-Path in 14-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 55, no. 1, pp. 38-48, Jan. 2020, doi: 10.1109/JSSC.2019.2938414.
- [9] Y. Zhu, J. Diaz, S. Kaile, I. Yi, T. Liu, S. Hoyos and S. Palermo, "A Jitter-Robust 40Gb/s ADC-Based Multicarrier Receiver Front End in 22nm FinFET," 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022.
- [10] Y. Duan and E. Alon, "A 6b 46GS/s ADC with >23GHz BW and sparkle-code error correction," 2015 Symposium on VLSI Circuits (VLSI Circuits), 2015, pp. C162-C163, doi: 10.1109/VLSIC.2015.7231250.
- [11] Y. M. Greshishchev et al., "A 40GS/s 6b ADC in 65nm CMOS," 2010 IEEE International Solid-State Circuits Conference - (ISSCC), 2010, pp. 390-391, doi: 10.1109/ISSCC.2010.5433972.
- [12] L. Kull et al., "A 24–72-GS/s 8-b Time-Interleaved SAR ADC With 2.0–3.3pJ/Conversion and >30 dB SNDR at Nyquist in 14-nm CMOS FinFET," in IEEE Journal of Solid-State Circuits, vol. 53, no. 12, pp. 3508-3516, Dec. 2018, doi: 10.1109/JSSC.2018.2859757.

- [13] L. Kull et al., "A 10-Bit 20–40 GS/s ADC with 37 dB SNDR at 40 GHz Input Using First Order Sampling Bandwidth Calibration," 2018 IEEE Symposium on VLSI Circuits, 2018, pp. 275-276, doi: 10.1109/VLSIC.2018.8502268.
- [14] L. Kull et al., "Implementation of Low-Power 6–8 b 30–90 GS/s Time-Interleaved ADCs With Optimized Input Bandwidth in 32 nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 51, no. 3, pp. 636-648, March 2016, doi: 10.1109/JSSC.2016.2519397.
- [15] K. Sun, G. Wang, Q. Zhang, S. Elahmadi and P. Gui, "A 56-GS/s 8-bit Time-Interleaved ADC With ENOB and BW Enhancement Techniques in 28-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 54, no. 3, pp. 821-833, March 2019, doi: 10.1109/JSSC.2018.2884352.
- [16] L. Kull et al., "A 3.1 mW 8b 1.2 GS/s Single-Channel Asynchronous SAR ADC With Alternate Comparators for Enhanced Speed in 32 nm Digital SOI CMOS," in IEEE Journal of Solid-State Circuits, vol. 48, no. 12, pp. 3049-3058, Dec. 2013, doi: 10.1109/JSSC.2013.2279571.
- [17] S. Cai, Y. Zhu, S. Kiran, S. Hoyos and S. Palermo, "Reference switching pre-emphasis-based successive approximation register ADC with enhanced DAC settling," in Electronics Letters, vol. 53, no. 20, pp. 1352-1354, 2017.
- [18] L. Kull et al., "28.5 A 10b 1.5GS/s pipelined-SAR ADC with background second-stage common-mode regulation and offset calibration in 14nm CMOS FinFET," 2017 IEEE International Solid-State Circuits Conference (ISSCC), 2017, pp. 474-475, doi: 10.1109/ISSCC.2017.7870467.

- [19] Y. Zhu, T. Liu, S. Kaile, S. Kiran, I. Yi, R. Liu, J. Diaz, S. Hoyos and S. Palermo, "A 38GS/s 7b Time-Interleaved Pipelined-SAR ADC with Speed-Enhanced Bootstrapped Switch in 22nm FinFET," 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022.
- [20] Y. Zhu et al., "A 1.5GS/s 8b Pipelined-SAR ADC with Output Level Shifting Settling Technique in 14nm CMOS," 2020 IEEE Custom Integrated Circuits Conference (CICC), 2020, pp. 1-4, doi: 10.1109/CICC48029.2020.9075942.
- [21] B. Razavi, "The Bridged T-Coil [A Circuit for All Seasons]," in IEEE Solid-State Circuits Magazine, vol. 7, no. 4, pp. 9-13, Fall 2015, doi: 10.1109/MSSC.2015.2474258.
- [22] B. R. Gregoire and U. Moon, "An Over-60 dB True Rail-to-Rail Performance Using Correlated Level Shifting and an Opamp With Only 30 dB Loop Gain," in IEEE Journal of Solid-State Circuits, vol. 43, no. 12, pp. 2620-2630, Dec. 2008, doi: 10.1109/JSSC.2008.2006312.
- [23] Tao Jiang, Wing Liu, F. Y. Zhong, Charlie Zhong and P. Y. Chiang, "Single-channel, 1.25-GS/s, 6-bit, loop-unrolled asynchronous SAR-ADC in 40nm-CMOS," IEEE Custom Integrated Circuits Conference 2010, 2010, pp. 1-4, doi: 10.1109/CICC.2010.5617411.
- [24] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl and B. Nauta, "A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time," 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, 2007, pp. 314-605, doi: 10.1109/ISSCC.2007.373420.

- [25] W. Kester and Analog Devices Inc. Engineering, *Data Conversion Handbook*. Newnes, 2005.
- [26] B. Razavi, "Design Considerations for Interleaved ADCs," in IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1806-1817, Aug. 2013, doi: 10.1109/JSSC.2013.2258814.
- [27] M. Q. Le et al., "A background calibrated 28GS/s 8b interleaved SAR ADC in 28nm CMOS," 2017 IEEE Custom Integrated Circuits Conference (CICC), 2017, pp. 1-4, doi: 10.1109/CICC.2017.7993699.
- [28] M. Pisati et al., "6.3 A Sub-250mW 1-to-56Gb/s Continuous-Range PAM-4 42.5dB IL ADC/DAC-Based Transceiver in 7nm FinFET," 2019 IEEE International Solid-State Circuits Conference - (ISSCC), 2019, pp. 116-118, doi: 10.1109/ISSCC.2019.8662428.
- [29] Y. Frans et al., "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 1101-1110, April 2017, doi: 10.1109/JSSC.2016.2632300.
- [30] S. Hoyos et al., "Clock-Jitter-Tolerant Wideband Receivers: An Optimized Multichannel Filter-Bank Approach," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 2, pp. 253-263, Feb. 2011, doi: 10.1109/TCSI.2010.2072090.
- [31] J. C. Gomez Diaz, S. Kiran, S. Palermo and S. Hoyos, "Jitter-Robust Multicarrier ADC-Based Serial Link Receiver Architecture: (Invited Special Session Paper)," 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), 2019, pp. 1151-1154, doi: 10.1109/MWSCAS.2019.8884927.

- [32] W. -H. Cho et al., "10.2 A 38mW 40Gb/s 4-lane tri-band PAM-4 / 16-QAM transceiver in 28nm CMOS for high-speed Memory interface," 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 184-185, doi: 10.1109/ISSCC.2016.7417968.
- [33] S. Kiran, S. Cai, Y. Luo, S. Hoyos and S. Palermo, "A 32 Gb/s ADC-based PAM-4 receiver with 2-bit/stage SAR ADC and partially-unrolled DFE," 2018 IEEE Custom Integrated Circuits Conference (CICC), 2018, pp. 1-4, doi: 10.1109/CICC.2018.8357008.
- [34] H. Chen, X. Zhou, Q. Yu, F. Zhang and Q. Li, "A >3GHz ERBW 1.1GS/s 8b Two-Step SAR ADC with Recursive-Weight DAC," 2018 IEEE Symposium on VLSI Circuits, 2018, pp. 97-98, doi: 10.1109/VLSIC.2018.8502370.
- [35] D. Li, J. Liu, H. Zhuang, Z. Zhu, Y. Yang and N. Sun, "A 7b 2.6mW 900MS/s Nonbinary 2-then-3b/cycle SAR ADC with Background Offset Calibration," 2019 IEEE Custom Integrated Circuits Conference (CICC), 2019, pp. 1-4, doi: 10.1109/CICC.2019.8780191.
- [36] C. -H. Chan, Y. Zhu, I. -M. Ho, W. -H. Zhang, S. -P. U and R. P. Martins, "16.4 A 5mW 7b 2.4GS/s 1-then-2b/cycle SAR ADC with background offset calibration," 2017 IEEE International Solid-State Circuits Conference (ISSCC), 2017, pp. 282-283, doi: 10.1109/ISSCC.2017.7870371.
- [37] K. Zheng et al., "An Inverter-Based Analog Front-End for a 56-Gb/s PAM-4 Wireline Transceiver in 16-nm CMOS," in IEEE Solid-State Circuits Letters, vol. 1, no. 12, pp. 249-252, Dec. 2018, doi: 10.1109/LSSC.2019.2894933.

- [38] M. -A. LaCroix et al., "6.2 A 60Gb/s PAM-4 ADC-DSP Transceiver in 7nm CMOS with SNR-Based Adaptive Power Scaling Achieving 6.9pJ/b at 32dB Loss," 2019 IEEE International Solid-State Circuits Conference - (ISSCC), 2019, pp. 114-116, doi: 10.1109/ISSCC.2019.8662322.
- [39] T. Ali et al., "6.4 A 180mW 56Gb/s DSP-Based Transceiver for High Density IOs in Data Center Switches in 7nm FinFET Technology," 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 118-120, doi: 10.1109/ISSCC.2019.8662523.
- [40] P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 108-110, doi: 10.1109/ISSCC.2018.8310207.
- [41] L. Wang, Y. Fu, M. LaCroix, E. Chong and A. C. Carusone, "A 64Gb/s PAM-4 transceiver utilizing an adaptive threshold ADC in 16nm FinFET," 2018 IEEE International Solid-State Circuits Conference - (ISSCC), 2018, pp. 110-112, doi: 10.1109/ISSCC.2018.8310208.
- [42] M. Zhang, Y. Zhu, C. -H. Chan and R. P. Martins, "An 8-Bit 10-GS/s 16× Interpolation-Based Time-Domain ADC With <1.5-ps Uncalibrated Quantization Steps," in IEEE Journal of Solid-State Circuits, vol. 55, no. 12, pp. 3225-3235, Dec. 2020, doi: 10.1109/JSSC.2020.3012776.
- [43] S. Zhu, B. Xu, B. Wu, K. Soppimath and Y. Chiu, "A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM," in IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1785-1796, Aug. 2016, doi: 10.1109/JSSC.2016.2558487.

- [44] A. Roshan-Zamir et al., "A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 52, no. 9, pp. 2430-2447, Sept. 2017.