# DESIGN TECHNIQUES FOR HIGH PIN EFFICIENCY WIRELINE TRANSCEIVERS

A Dissertation

by

### YANG-HANG FAN

Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

## DOCTOR OF PHILOSOPHY

| Chair of Committee, | Samuel Palermo      |
|---------------------|---------------------|
| Committee Members,  | Kamran Entesari     |
|                     | Jun Zou             |
|                     | Duncan M. Walker    |
| Head of Department, | Miroslav M. Begovic |

May 2020

Major Subject: Electrical Engineering

Copyright 2020 Yang-Hang Fan

## ABSTRACT

While the majority of wireline research investigates bandwidth improvement and ways to overcome high channel loss, pin efficiency is also critical in high-performance wireline applications. This research proposes two different implementations of high pin efficiency wireline transceivers.

The first prototype is a 32-Gb/s simultaneous bi-directional (SBD) transceiver that supports transmission and reception on the same channel at the same time. It achieves twice the pin efficiency of uni-directional signaling. It includes an efficient low-swing voltage-mode driver with an R-gm hybrid for signal separation. It also combines the continuous-time linear equalizer (CTLE) and echo cancellation (EC) in a single stage, and it employs a low-complexity 5/4X clock and data alignment (CDA) system. A wide range of channels is supported through foreground adaptation of the EC finite impulse response (FIR) filter taps with a sign-sign least-mean-square (SSLMS) algorithm. Fabricated in TSMC 28-nm CMOS, the SBD transceiver occupies 0.09-mm<sup>2</sup> of area and supports 16-Gb/s uni-directional and 32-Gb/s simultaneous bi-directional operation. In 32-Gb/s SBD operation, it consumes 1.83-mW/Gb/s over a channel with 10.8-dB loss at the Nyquist frequency.

The second prototype is an optical transmitter with a quantum-dot (QD) microring laser. It supports wavelength-division multiplexing (WDM), which enables high pin efficiency by packing multiple high-bandwidth signals onto one optical channel. The developed QD microring laser model accurately captures the intrinsic high-speed photonic dynamics and allows future co-design of the circuits and the photonic device. To extend the bandwidth beyond the laser's intrinsic bandwidth, the transmitter combines optical injection locking (OIL) with a 2-tap asymmetric feed-forward equalizer (FFE). This enables 22-Gb/s operation at 3.2-mW/Gb/s. This work demonstrates the first hybrid-integrated, directly modulated OIL QD microring laser system.

## DEDICATION

To my wife, my son, and my daughter.

## ACKNOWLEDGMENTS

First of all, I wish to express my sincere appreciation to my advisor, Prof. Samuel Palermo, who convincingly guided and encouraged me to be professional in my doctoral research. Without his persistent help and motivation, this dissertation would not have been realized.

I would like to thank Qualcomm and Hewlett Packard Enterprise (HPE) Labs for their sponsorship. Without their financial support, this dissertation could not have reached its goal. I also wish to express my gratitude to my colleague at Qualcomm, Bo Sun, for discussions about my research. I would like to thank my colleagues at HPE Labs, Di Liang, Sudharsanan Srinivasan, and Marco Fiorentino, for supporting my research and for an HPE internship opportunity in Fall 2018.

I wish to show my gratitude to Prof. Entesari, Prof. Walker, and Prof. Zou for serving on my committee and for their constructive feedback.

I have truly enjoyed and greatly benefited from the interaction with all research colleagues in Prof. Palermo's group. I would like to pay my special regards to Ashkan Roshan-Zamir, Takayuki Iwai, and Ankur Kumar, with whom I collaborated on electrical and optical high-speed link work. I would like to thank the following colleagues for technical discussions and friendship: Po-Hsuan Chang, Yuanming Zhu, Shengchang Cai, Kunzhi Yu, Shiva Kiran, Ruida Liu, Chaerin Hong, Tong Liu, Peng Yan, and Dedeepya Annabattuni. I learned a lot from every technical interaction with you.

Last but not least, I would like to express my deepest gratitude to my lovely wife, Wan-Ting. Without her endless love and enduring support, I would not have been able to complete my doctoral work, my dissertation, and this Ph.D. journey. I am deeply grateful for her patience and for always believing in me. My son, Ryan, and daughter, Elena, are my source of joy and happiness. They were the reason I could get through every difficult time. For that, I dedicate this dissertation to them.

## CONTRIBUTORS AND FUNDING SOURCES

### Contributors

This work was supported by a dissertation committee consisting of Prof. Palermo [advisor], Prof. Entesari, Prof. Zou of the Department of Electrical and Computer Engineering and Prof. Walker of the Department of Computer Science and Engineering.

All other work conducted for the dissertation was completed by the student independently.

### Funding Sources

Graduate study was supported in part by Qualcomm, San Diego, CA, and Hewlett Packard Enterprise, Palo Alto, CA.

## NOMENCLATURE

| Abbreviation | Definition |
|------|-----------------------------|
| BBPD | Bang-Bang Phase Detector    |
| BER  | Bit-Error Rate              |
| CDA  | Clock and Data Alignment    |
| EC   | Echo Cancellation           |
| FE   | Far-End                     |
| FFE  | Feed-Forward Equalization   |
| I/O  | Input/Output                |
| ILO  | Injection-Locked Oscillator |
| ISI  | Inter-Symbol Interference   |
| JTOL | Jitter Tolerance            |
| NE   | Near-End                    |
| OIL  | Optical Injection Locking   |
| PI   | Phase Interpolator          |
| PLL  | Phase-Locked Loop           |
| QLL  | Quadrature-Locked Loop      |
| RX   | Receiver                    |
| SBD  | Simultaneous Bi-Directional |
| TX   | Transmitter                 |
| UD   | Uni-Directional             |

## TABLE OF CONTENTS

| Section | Page |
|---------|------|
| ABSTRACT | ii |
| DEDICATION | iii |
| ACKNOWLEDGMENTS | iv |
| CONTRIBUTORS AND FUNDING SOURCES | vi |
| NOMENCLATURE | vii |
| TABLE OF CONTENTS | viii |
| LIST OF FIGURES | x |
| LIST OF TABLES | xv |
| 1. INTRODUCTION | 1 |
| 2. BACKGROUND ON HIGH PIN EFFICIENCY TRANSCEIVERS | 5 |
| 2.1 Introduction | 5 |
| 2.2 Communication System | 5 |
| 3. SIMULTANEOUS BIDIRECTIONAL TRANSCEIVERS | 28 |
| 3.1 Introduction | 28 |
| 4. OPTICAL TRANSMITTER | 63 |
| 4.1 A Directly Modulated Quantum Dot Microring Laser Transmitter with Integrated CMOS Driver | 63 |
| 4.1.1 Introduction | 63 |
| 4.1.2 Quantum Dot Microring Laser | 64 |
| 4.1.3 Driver Architecture | 68 |
| 4.1.4 Simulation and Measurement Results | 69 |
| 4.1.5 Conclusion | 70 |
| 4.2 A 22-Gb/s Directly Modulated Optical Injection-Locked QD Microring Laser Transmitter with Integrated CMOS Driver | 71 |
| 4.2.1 Introduction | 71 |
| 4.2.2 QD Microring Laser Characterization | 73 |
| 4.2.3 Driver Architecture | 74 |
| 4.2.4 Measurement Results | 79 |
| 4.2.5 Conclusion | 82 |
| 5. CONCLUSION | 84 |
| 5.1 SBD Transceiver Future Work | 85 |
| 5.2 QD Microring Laser Transmitter Future Work | 86 |
| REFERENCES | 87 |

## LIST OF FIGURES

| FIGURE |                                                                      | Page |
|--------|----------------------------------------------------------------------|------|
| 1.1    | Simultaneous bidirectional transceiver overview.                     | 2    |
| 1.2    | Optical transceiver overview                                         | 3    |
| 2.1    | Simplex                                                              | 5    |
| 2.2    | Half-duplex                                                          | 6    |
| 2.3    | Full-duplex                                                          | 7    |
| 2.4    | Simultaneous bi-directional full-duplex                              | 7    |
| 2.5    | Signal separation by adjusting the receiver reference                | 8    |
| 2.6    | Replica TX for signal separation.                                    | 9    |
| 2.7    | R-gm hybrid for signal separation                                    | 10   |
| 2.8    | $V_{ib}$ and $V_{ob}$ in SBD signaling                               | 11   |
| 2.9    | SBD receiver conceptual block diagram                                | 11   |
| 2.10   | Equalization in SBD transceiver                                      | 12   |
| 2.11   | $V_{ib}$ pulse responses of 10-dB loss with and without CTLE         | 13   |
| 2.12   | $V_{ib}$ bathtub curves of 10-dB loss with and without CTLE          | 13   |
| 2.13   | UD and SBD current-mode drivers with single-ended terminations       | 14   |
| 2.14   | UD and SBD current-mode drivers with differential-ended terminations | 15   |
| 2.15   | UD and SBD voltage-mode drivers with single-ended terminations       | 17   |
| 2.16   | Four conditions of SBD VM driver with single-ended termination       | 18   |
| 2.17   | UD and SBD voltage-mode drivers with differential-ended terminations | 19   |
| 2.18   | Four conditions of SBD VM driver with differential-ended termination | 20   |
| 2.19   | Optical link system.                                                 | 22   |
| 2.20 | VCSEL driver.                                                                                                                                                                                                                                          | 24 |
| 2.21 | MZM driver                                                                                                                                                                                                                                             | 25 |
| 2.22 | EAM driver.                                                                                                                                                                                                                                            | 26 |
| 2.23 | Ring-resonator driver                                                                                                                                                                                                                                  | 26 |
| 3.1  | Signal separation in a SBD transceiver that includes echo cancellation                                                                                                                                                                                 | 29 |
| 3.2  | 6" FR4 channel. (a) $S_{21}$ and $S_{11}$ responses with and without a receiver CTLE. (b) CTLE frequency response.                                                                                                                                     | 31 |
| 3.3  | 6" FR4 channel. Simulated 16-Gb/s (a) UD pulse responses and (b) echo responses.                                                                                                                                                                       | 31 |
| 3.4  | 6" FR4 channel. Simulated 16-Gb/s (a) timing and (b) voltage margin for 16-Gb/s UD and 32-Gb/s SBD operation modes, including SBD without and with echo cancellation (EC)                                                                              | 32 |
| 3.5  | 6" FR4 channel. Simulated 16-Gb/s (a) timing and (b) voltage margin with various equalizer configurations for 32-Gb/s SBD operation modes with the echo cancellation enabled.                                                                          | 33 |
| 3.6  | Multi-channel source-synchronous SBD transceiver with adaptive echo cancellation (simplified single-ended schematic)                                                                                                                                   | 34 |
| 3.7  | SBD transmitter with echo cancellation data generation.                                                                                                                                                                                                | 35 |
| 3.8  | UD and SBD voltage-mode driver termination comparison                                                                                                                                                                                                  | 37 |
| 3.9  | (a) VM driver with sensing resistor $R_S$ for the R-gm hybrid. All transistors employ minimum 30-nm length. (b) Digital on-chip resistor calibration loop. Analog control loops for the (c) pull-up and (d) pull-down total output impedance. | 38 |
| 3.10 | Equivalent SBD schematic                                                                                                                                                                                                                               | 40 |
| 3.11 | SBD receiver data path circuitry. All transistors have minimum 30-nm length except where mentioned | 41 |
| 3.12 | Simulated 32-Gb/s SBD operation over the 6" FR4 channel: (a) SBD eye diagram at CTLE output with echo cancellation activated. (b) Echo signal when the received signal is deactivated. | 42 |
| 3.13 | Simulated 32-Gb/s SBD operation over the 6" FR4 channel: (a) Echo rms value versus data rate without and with echo cancellation activated. (b) Eye width and (c) height for different relative TX and RX timing | 43 |
| 3.14 | Normalized eye height versus R-gm hybrid 1-dB compression input amplitude | 44 |
| 3.15 | 5/4X phase tracking CDA system.                                                                                                                                                                                 | 45 |
| 3.16 | Receiver of the clock channel.                                                                                                                                                                                  | 46 |
| 3.17 | Timing diagram of data and edge phases                                                                                                                                                                          | 47 |
| 3.18 | Edge rotating 5/4X CDA                                                                                                                                                                                          | 47 |
| 3.19 | UD mode                                                                                                                                                                                                         | 48 |
| 3.20 | SBD mode                                                                                                                                                                                                        | 49 |
| 3.21 | Foreground echo cancellation tap adaptation state sequence                                                                                                                                                      | 49 |
| 3.22 | State 1 and 3                                                                                                                                                                                                   | 50 |
| 3.23 | State 2                                                                                                                                                                                                         | 51 |
| 3.24 | Echo cancellation tap adaptation hardware.                                                                                                                                                                      | 51 |
| 3.25 | SSLMS hardware                                                                                                                                                                                                  | 52 |
| 3.26 | Chip micrograph of the 32-Gb/s SBD transceiver                                                                                                                                                                  | 53 |
| 3.27 | Voltage-mode driver test setup                                                                                                                                                                                  | 54 |
| 3.28 | 16-Gb/s eye diagram with no channel                                                                                                                                                                             | 54 |
| 3.29 | 16-Gb/s eye diagram with 2" channel                                                                                                                                                                             | 55 |
| 3.30 | 16-Gb/s eye diagram with 6" channel                                                                                                                                                                             | 55 |
| 3.31 | SBD transceiver test setup and channel response | 56 |
| 3.32 | Measured 16-Gb/s UD and 32-Gb/s SBD timing and voltage bathtub curves operating over (a) 2" and (b) 6" channels. | 57 |
| 3.33 | Measured 16-Gb/s UD and 32-Gb/s SBD timing and voltage bathtub curves operating over the (a) 2" and (b) 6" channels.                                                | 58 |
| 3.34 | SBD transceiver jitter tolerance testing. (a) Setup. (b) Measured results                                                                                           | 59 |
| 3.35 | 32-Gb/s SBD transceiver power breakdown.                                                                                                                            | 61 |
| 4.1  | Quantum dot microring laser device: Schematic 3-D and cross section views.                                                                                          | 64 |
| 4.2  | Quantum dot microring laser device: CW LIV characteristic                                                                                                           | 65 |
| 4.3  | Quantum dot microring laser device: laser spectrum at room temperature                                                                                              | 66 |
| 4.4  | The QD microring laser model and extracted parameters                                                                                                               | 67 |
| 4.5  | $S_{21}$ curve-fitting results at 15-mA and 22-mA. | 67 |
| 4.6  | QD microring laser transmitter block diagram.                                                                                                                       | 68 |
| 4.7  | Hybrid-integrated QD microring laser transmitter prototype                                                                                                          | 69 |
| 4.8  | Simulated/measured optical eye diagrams at 12-Gb/s before applying asymmetric FFE (top) and after applying asymmetric FFE (bottom)                                  | 70 |
| 4.9  | Optical injection-locked quantum-dot microring laser with CMOS driver                                                                                               | 71 |
| 4.10 | Direct modulation response without and with OIL at different DC bias current levels. The horizontal dotted line marks the response level 3dB below the value at DC. | 74 |
| 4.11 | Microring laser non-linear optical dynamic behavior.                                                                                                                | 75 |
| 4.12 | 5-channel driver prototype block diagram.                                                                                                                           | 76 |
| 4.13 | QD microring laser transmitter block diagram.                                                                                                                       | 76 |
| 4.14 | CML output driver with asymmetric FFE                                                                                                                               | 77 |
| 4.15 | Current profiles of (a) high pass and (b) low pass behaviors for the falling edge and (c) high pass and (d) low pass behavior for the rising edge                   | 78 |
| 4.16 | Hybrid-integrated QD microring laser transmitter prototype | 79 |
| 4.17 | Optical measurement setup. | 80 |
| 4.18 | Microring laser directly modulated with CMOS driver. Measured optical eye diagrams at 4-Gb/s without OIL (a) before and (b) after applying FFE. 10-Gb/s eye diagrams with OIL (c) before and (d) after applying FFE. 22-Gb/s eye diagrams with OIL (e) before and (f) after applying FFE | 81 |
| 5.1  | Round-trip delay detection                                                                                                                                                                                                                                                               | 85 |

## LIST OF TABLES

| TABLE |                                                 | Page |
|-------|-------------------------------------------------|------|
| 2.1   | Driver current consumption summary              | 22   |
| 3.1   | SBD transceiver performance comparisons         | 60   |
| 4.1   | Microring laser transmitter performance summary | 83   |

## 1. INTRODUCTION

In recent years, the significant expansion of computing capabilities has been accompanied by growth in high-speed serial input/output (I/O) bandwidth and density. The International Technology Roadmap for Semiconductors (ITRS) expects two trends by 2029 [1]:

- a 61.9-Gb/s bandwidth will be required between the application processor (AP) and main memory;
- the number of AP cores will increase to 25.

High-performance I/O circuitry can leverage technology improvements. Unfortunately, the bandwidth of the electrical channels used for inter-chip communication and the number of I/O pins have not scaled in the same manner. Expanding computing capabilities therefore demands not only higher I/O communication bandwidth but also improved pin efficiency. This makes it attractive to investigate transmitting or receiving multiple data streams on the same wireline channel.

This motivates the simultaneous bi-directional (SBD) signaling technique. Relative to uni-directional signaling, SBD offers higher spectral efficiency, lower loss at the Nyquist frequency, relaxed clock speeds, and reduced pin count. Because SBD transceivers transmit and receive data simultaneously on the same channel, they double the throughput per pin compared to uni-directional signaling.
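The per-pin argument can be made concrete with a small Python calculation. This is a sketch under stated assumptions:

- each direction uses NRZ signaling, so the Nyquist frequency is half the per-direction bit rate;
- the helper `link_summary` is a hypothetical name used only for illustration;
- the 16-Gb/s per-direction rate follows the prototype described in the abstract.

```python
def link_summary(per_direction_gbps: float, bidirectional: bool) -> dict:
    """Per-pin throughput and Nyquist frequency of an NRZ link (illustrative only)."""
    directions = 2 if bidirectional else 1
    return {
        # Aggregate data carried by one channel (pin).
        "throughput_per_pin_gbps": per_direction_gbps * directions,
        # NRZ sends one bit per symbol, so the Nyquist frequency is half the bit rate.
        "nyquist_ghz": per_direction_gbps / 2,
    }


ud = link_summary(16.0, bidirectional=False)     # 16-Gb/s UD mode
sbd = link_summary(16.0, bidirectional=True)     # 32-Gb/s aggregate SBD mode
ud_32 = link_summary(32.0, bidirectional=False)  # UD link pushed to 32 Gb/s per pin

print(ud)     # {'throughput_per_pin_gbps': 16.0, 'nyquist_ghz': 8.0}
print(sbd)    # {'throughput_per_pin_gbps': 32.0, 'nyquist_ghz': 8.0}
print(ud_32)  # {'throughput_per_pin_gbps': 32.0, 'nyquist_ghz': 16.0}
```

In this model, SBD matches the per-pin throughput of a 32-Gb/s UD link while keeping the Nyquist frequency at 8 GHz rather than 16 GHz. Channel loss generally grows with frequency, which is why the lower Nyquist frequency translates into lower loss and relaxed clock speeds.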

SBD transceivers need signal separation to extract the inbound signal from the composite signal at the receiver end. Earlier designs used three approaches:

- SBD transceivers with aggregate data rates from 600-Mb/s to 6.4-Gb/s changed the comparator's reference voltage according to the transceiver's own outbound data [2, 3, 4, 5, 6, 7].
- At aggregate data rates of 5-Gb/s and 8-Gb/s, SBD transceivers adopted a switched-capacitor hybrid (SCH) to subtract the outbound signal generated by a replica driver [8, 9].
- The resistor-transconductor (R-gm) hybrid was proposed to separate the inbound signal at an aggregate data rate of 20-Gb/s [10].

Relative to uni-directional signaling systems, SBD transceivers therefore place more stringent requirements on the circuits that separate the received signal.
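The reference-switching separation used by the early SBD transceivers cited above can be illustrated with an idealized Python model. This is a sketch rather than a circuit model of those designs. It assumes a lossless, perfectly matched channel in which each end sees half of each driver's swing, so the line carries three levels. The names `A`, `line_voltage`, and `receive_by_reference_switching` are hypothetical.

```python
import random

A = 1.0  # hypothetical driver swing, arbitrary units


def line_voltage(out_bit: int, in_bit: int) -> float:
    """Ideal lossless, matched SBD line: each driver contributes half its swing."""
    s_out = 1 if out_bit else -1
    s_in = 1 if in_bit else -1
    return A * (s_out + s_in) / 2  # one of three levels: -A, 0, +A


def receive_by_reference_switching(v_line: float, out_bit: int) -> int:
    """Choose the comparator reference from the locally transmitted bit."""
    # If we sent a 1, the line sits at +A or 0; if we sent a 0, at 0 or -A.
    v_ref = A / 2 if out_bit else -A / 2
    return 1 if v_line > v_ref else 0


random.seed(0)
outbound = [random.randint(0, 1) for _ in range(1000)]
inbound = [random.randint(0, 1) for _ in range(1000)]
recovered = [receive_by_reference_switching(line_voltage(o, i), o)
             for o, i in zip(outbound, inbound)]
errors = sum(r != i for r, i in zip(recovered, inbound))
print("bit errors:", errors)  # 0 in this ideal model
```

Because the receiver knows its own outbound bit, it only has to distinguish two of the three line levels at a time. In a real link, channel loss, reflections, and driver nonidealities erode this margin.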



Figure 1.1: Simultaneous bidirectional transceiver overview.

While SBD has better spectral efficiency than uni-directional signaling, this does not make it the superior signaling option for every system. In replica-based designs, the major source of interference or uncertainty in the separated signal was replica driver mismatch. In addition, long channels and the echoes introduced by channel discontinuities restrict the SBD aggregate rate [9]. The R-gm hybrid approach eliminates the need for a replica driver and therefore avoids the issue of replica driver mismatch [10].

Chip-to-memory I/O transceivers are often designed for short channels, low power, and high pin density. This motivates the development of an SBD transceiver with a higher aggregate data rate. Fig. 1.1 presents an overview of the SBD transceiver, in which two signals flow on the same channel at the same time. Conceptually, the SBD receiver has a subtractor that cancels the outbound signal ($V_1$) to separate the inbound signal ($V_2$).
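The conceptual subtractor, and why replica mismatch limits it, can be illustrated with a small idealized Python model. The model makes these assumptions:

- a lossless, matched three-level SBD line;
- no ISI and no echoes;
- replica driver mismatch represented as a hypothetical fractional gain error, `mismatch`.

All names are illustrative. In this model, the vertical eye opening of the separated inbound signal shrinks in proportion to the mismatch.

```python
A = 1.0  # hypothetical driver swing, arbitrary units


def separated_inbound(out_bit: int, in_bit: int, mismatch: float) -> float:
    """Subtract an imperfect replica of the outbound signal from the line voltage."""
    s_out = 1 if out_bit else -1
    s_in = 1 if in_bit else -1
    v_line = A * (s_out + s_in) / 2                # outbound + inbound on one wire
    v_replica = (1 + mismatch) * A * s_out / 2     # replica with a gain error
    return v_line - v_replica                      # ideally A * s_in / 2


def eye_height(mismatch: float) -> float:
    """Worst-case gap between separated '1' and '0' levels over both outbound bits."""
    ones = [separated_inbound(o, 1, mismatch) for o in (0, 1)]
    zeros = [separated_inbound(o, 0, mismatch) for o in (0, 1)]
    return min(ones) - max(zeros)


for m in (0.0, 0.1, 0.3):
    print(f"replica mismatch {m:.0%}: eye height = {eye_height(m):.2f}")
```

In this model, the eye height equals A(1 - |mismatch|). A 10% replica error therefore costs 10% of the vertical opening before any channel effects are considered. This is consistent with the text's point that removing the replica driver, as the R-gm hybrid does, avoids this error term.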

Photonic integrated circuits (PICs) are another compelling solution for increasing pin efficiency, owing to the low-loss optical channel and inherent wavelength-division multiplexing (WDM). Fig. 1.2 shows an example optical transceiver:

- **Transmitter.** The transmitter IC drives ring lasers that, with proper bias control, emit light at different wavelengths.
- **Channel.** The light at these wavelengths is sent together to the receiver through a single-mode fiber (SMF).
- **Receiver.** Drop filters select each wavelength, and photodetectors convert its optical power into an electrical current signal. The CMOS receiver IC then amplifies the data signal and deserializes it into low-speed data.
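The pin-efficiency benefit of WDM follows from simple addition: every wavelength lane shares the same fiber, so the lane rates add up on one optical channel. The Python sketch below assumes four lanes at the 22-Gb/s rate reported in the abstract. The wavelength values, the `WdmLane` record, and the `fiber_throughput_gbps` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WdmLane:
    wavelength_nm: float  # emission wavelength set by the ring laser's bias (illustrative)
    rate_gbps: float      # data rate carried on this wavelength


def fiber_throughput_gbps(lanes: list) -> float:
    """All lanes share one single-mode fiber, so their data rates add."""
    wavelengths = [lane.wavelength_nm for lane in lanes]
    if len(set(wavelengths)) != len(wavelengths):
        raise ValueError("each lane needs its own wavelength")
    return sum(lane.rate_gbps for lane in lanes)


# Hypothetical 4-lane link; wavelengths are evenly spaced placeholders.
lanes = [WdmLane(1300.0 + 2.0 * k, 22.0) for k in range(4)]
print(fiber_throughput_gbps(lanes))  # 88.0
```

The duplicate-wavelength check reflects the fact that lanes are separated at the receiver only by their wavelength. In hardware, this corresponds to each drop filter being tuned to a distinct laser.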



Figure 1.2: Optical transceiver overview.

Optical transceivers offer inherent WDM and the potential to meet the bandwidth-density requirements of high-performance computing in future data centers. However, integrating high-speed circuits with the light sources, filters, and detectors is still a challenge. The efficiency of an optical interconnect system relies on the ability to optimize the transceiver circuitry for low-power and high-bandwidth operation. This motivates the hybrid integration of III-V-on-silicon lasers with CMOS circuits, as well as a co-simulation environment for the CMOS circuits and the optical devices.

This dissertation presents design techniques for high pin efficiency in electrical and optical interconnects.

- Section 2 explains the background of SBD transceivers and optical transmitters.
- Section 3 investigates SBD transceiver requirements at the system level and details the prototype implementation.
- Section 4 presents the design of a directly modulated quantum dot microring laser transmitter with an integrated CMOS driver.
- Section 5 concludes this dissertation.

## 2. BACKGROUND ON HIGH PIN EFFICIENCY TRANSCEIVERS

### 2.1 Introduction

This section briefly explains the background and challenges in high pin efficiency wireline transceiver design.

- Section 2.2 discusses the different types of communication systems.
- Section 2.3 introduces simultaneous bi-directional (SBD) transceivers and compares implementations at different data rates.
- Section 2.4 presents UD and SBD drivers and compares their current consumption.
- Section 2.5 introduces the background and challenges of optical transmitters.

### 2.2 Communication System



Figure 2.1: Simplex.

Communication systems use three types of transmission: simplex, half-duplex, and full-duplex. The transmission type defines the direction of signal flow. The main goal of wireline communication is to transmit and receive data between two systems without bit errors.

Fig. 2.1 shows simplex, the simplest type of communication. The transmitter on the left delivers data to the receiver on the right in one fixed direction, and the right-side system cannot reply to the left-side system. Only one transmitter on the left and one receiver on the right are needed to realize this system. This one-direction communication is also called uni-directional signaling. The relationship between a keyboard and a computer is an example: the keyboard only sends data to the computer, and the computer only receives data without sending any commands back to control the keyboard. However, simplex is the lowest-performing mode of transmission, and some high-performance systems need communication in both directions.



Figure 2.2: Half-duplex.

The half-duplex type of transmission is shown in Fig. 2.2. Both sides of the channel have a transmitter and a receiver. Communication between transmitter 1/2 and receiver 1/2 occurs in both directions, but only one of data 1 and data 2 is allowed on the channel at any given time. This type of wireline communication still costs one pin instead of two. A walkie-talkie is a good example: the speakers at both ends can talk, but one can speak only after the other has stopped, and both ends cannot speak simultaneously. Hence, the communication is two-directional overall but one-directional at any instant. Half-duplex therefore performs better than simplex.

For the best performance, full-duplex communication is the choice because it transmits data in both directions simultaneously, as shown in Fig. 2.3. Data 1 and data 2 are transmitted all the time. If an electrical channel supports only one direction of data, full-duplex wireline communication requires two channels. For example, in a telephone conversation both ends are free to speak and listen all the time. The communication is bi-directional, but the pin/channel cost is double that of half-duplex or simplex.

Figure 2.3: Full-duplex.



Figure 2.4: Simultaneous bi-directional full-duplex.

Simultaneous bi-directional (SBD) transmission between two sides has been proposed to achieve this optimal performance while maintaining the same pin count [11]. As shown in Fig. 2.4, data 1 and data 2 travel in opposite directions on the same electrical channel all the time. With binary signaling in each direction, this technique essentially creates three signal levels on the communication path. Hence, signal reconstruction must be investigated, and the next section discusses the challenges of different SBD architectures in more detail.
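A minimal Python sketch can make the three-level observation concrete. It assumes an ideal lossless, echo-free channel on which each side adds a binary level of +A/2 or -A/2; the swing `A` and the helper names are hypothetical. The sum takes only the values -A, 0, and +A, and subtracting the locally known outbound level leaves the inbound bit, which is the job of the SBD subtractor.

```python
import random

A = 1.0  # hypothetical per-side differential swing (arbitrary units)

def nrz(bit: int) -> float:
    """Map a bit to an ideal NRZ level of +A/2 or -A/2."""
    return A / 2 if bit else -A / 2

def line_voltage(outbound_bit: int, inbound_bit: int) -> float:
    """Superposition of both directions on one lossless channel."""
    return nrz(outbound_bit) + nrz(inbound_bit)

def recover_inbound(v_line: float, outbound_bit: int) -> int:
    """Subtract the known outbound level, then slice at zero."""
    v_ib = v_line - nrz(outbound_bit)
    return 1 if v_ib > 0 else 0

levels = sorted({line_voltage(o, i) for o in (0, 1) for i in (0, 1)})
print("line levels:", levels)

rng = random.Random(0)
tx = [rng.randint(0, 1) for _ in range(1000)]  # local (outbound) bits
rx = [rng.randint(0, 1) for _ in range(1000)]  # far-end (inbound) bits
decoded = [recover_inbound(line_voltage(o, i), o) for o, i in zip(tx, rx)]
print("inbound recovered:", decoded == rx)
```

In a real channel the inbound term is attenuated and echoes are present, so the levels are no longer equally spaced; the sketch only shows the ideal case.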

### 2.3 Previous SBD Transceiver

The SBD architecture not only achieves the best performance by communicating in both directions but also maintains the same pin count as half-duplex and simplex. To achieve this, previous works have proposed different solutions for signal separation, which this section reviews. The aggregate data rate (or throughput) reported in these works combines both directions, so it is twice the per-direction I/O data rate.



Figure 2.5: Signal separation by adjusting the receiver reference

The first type of SBD architecture, shown in Fig. 2.5, was proposed for relatively low aggregate data rates from 600-Mb/s to 6.4-Gb/s [2, 3, 4, 5, 6, 7]. When two PAM2 or PAM4 signals are delivered on the same path, the combined signal has more than two levels. The receiver selects its reference according to its own transmitted data to compensate for the outbound signal. Hence, the first type of SBD architecture separates the signals by adjusting the receiver reference.
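For PAM2 signaling, the reference-switching idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (ideal lossless channel, no echoes, each side adding +A/2 or -A/2 so the line carries -A, 0, or +A), and the function names are hypothetical rather than taken from [2]-[7]. The receiver places its comparator reference midway between the two levels that remain possible given its own transmitted bit.

```python
A = 1.0  # hypothetical per-side swing (arbitrary units)

def line_level(out_bit: int, in_bit: int) -> float:
    """Ideal line voltage: each direction adds +A/2 or -A/2."""
    return (A / 2 if out_bit else -A / 2) + (A / 2 if in_bit else -A / 2)

def reference_for(out_bit: int) -> float:
    """Comparator reference chosen from the local transmit bit."""
    # out_bit = 1 leaves levels {0, +A}; out_bit = 0 leaves {-A, 0}.
    return A / 2 if out_bit else -A / 2

def slice_inbound(v_line: float, out_bit: int) -> int:
    """Decide the inbound bit against the switched reference."""
    return 1 if v_line > reference_for(out_bit) else 0

for out_bit in (0, 1):
    for in_bit in (0, 1):
        v = line_level(out_bit, in_bit)
        print(f"out={out_bit} in={in_bit} line={v:+.1f} "
              f"ref={reference_for(out_bit):+.1f} decided={slice_inbound(v, out_bit)}")
```

For PAM4 in each direction the same principle applies, but the receiver needs a separate set of references for every outbound symbol.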



Figure 2.6: Replica TX for signal separation.

The second type of SBD architecture, shown in Fig. 2.6, uses a transmitter replica to generate the voltage to be subtracted at the receiver. The transmitter replica is driven by the same outbound bit sequence, and the receiver with a switched-capacitor hybrid (SCH) extracts the inbound signal by subtracting the replica outbound signal. To minimize power consumption, the replica is a downsized copy of the transmitter. Hence, the challenge for the second type is matching the delay, swing, and time constant of the main driver [9].

Fig. 2.7 presents the third type of SBD architecture. The SBD transceiver uses a resistor-transconductor (R-gm) hybrid to subtract the outbound signal [10], so that only the inbound signal passes to the receiver. This R-gm hybrid eliminates the need for a replica driver to generate the outbound signal to be subtracted. Therefore, this type avoids the issues caused by mismatches between the primary and replica drivers. This design achieved an aggregate data rate of 20-Gb/s. Its main driver was implemented in current mode (CM), which consumes twice the power of a voltage-mode (VM) driver. More details on SBD driver selection are discussed in Section 2.4.

Figure 2.7: R-gm hybrid for signal separation.

Fig. 2.8 illustrates the inbound signal $(V_{ib})$ and the outbound signal $(V_{ob})$. The transceivers on both sides are symmetric in this figure, so we take the left one as an example. The transmitter sends out the outbound signal according to the bit sequence $(D_{in})$; it is the sharper, larger waveform. The inbound signal $(V_{ib})$ is attenuated by the channel, so its swing is smaller, depending on the channel loss. The receiver sees the superposition of $V_{ib}$ and $V_{ob}$.

The main difference between an SBD transceiver and a uni-directional transceiver is the subtractor and outbound signal generator ($V_{gen}$) shown in Fig. 2.9. Every previous SBD design contains similar blocks, because the SBD receiver needs to separate the inbound signal from the outbound signal. The subtraction is implemented by adjusting the receiver's reference, with a switched-capacitor hybrid (SCH), or with an R-gm hybrid. The sum $V_{ib} + V_{ob}$ enters the subtractor, and only $V_{ib}$ passes to the decision block.

Figure 2.8: $V_{ib}$ and $V_{ob}$ in SBD signaling

Figure 2.9: SBD receiver conceptual block diagram



Figure 2.10: Equalization in SBD transceiver

Fig. 2.10 shows equalization in an SBD transceiver. SBD is commonly applied to short-channel applications because of the signal separation challenge. An SBD transceiver is more sensitive to reflections from the near end and far end of the channel, so matching the termination impedance to the channel is more critical than in a UD transceiver. Because the transmitter driver also serves as the termination, a transmitter feed-forward equalizer is not commonly used to compensate for channel loss. SBD transceivers therefore prefer to implement the equalizer in the receiver to compensate for moderate loss. Fig. 2.11 and Fig. 2.12 show that a properly designed continuous-time linear equalizer (CTLE) can compensate for 10-dB loss at 8-GHz and improve the 16-Gb/s eye opening at $BER = 10^{-12}$.



Figure 2.11: $V_{ib}$ pulse responses of 10-dB loss with and without CTLE



Figure 2.12:  $V_{ib}$  bathtub curves of 10-dB loss with and without CTLE

### 2.4 Uni-Directional and SBD Signaling Drivers

This section discusses possible SBD driver structures and compares the current consumption of conventional uni-directional and SBD drivers. The output driver is the last stage of a transmitter; it must drive the channel load and provide a good termination. The driver can be current-mode or voltage-mode, and the termination can be single-ended or differential-ended. Since the driver is a major contributor to the power consumption of a low-power transmitter, the current consumption of each combination is compared in detail below.



Figure 2.13: UD and SBD current-mode drivers with single-ended terminations.

In Fig. 2.13, the top circuit is a UD current-mode driver with single-ended termination on the receiver side. Its current consumption follows from

$$V_{d,1} = I(\frac{Z_0}{2}) \tag{2.1}$$

$$V_{d,0} = -I(\frac{Z_0}{2}) \tag{2.2}$$

$$V_{d,pp} = IZ_0 \tag{2.3}$$

$$I = \frac{V_{d,pp}}{Z_0} \tag{2.4}$$

Where I is the CM driver's tail current,  $V_{d,1}$  is the differential voltage  $(V_d)$  at the right side when the left driver transmits a one,  $V_{d,0}$  is the same when transmitting a zero,  $V_{d,pp}$  is the peak-to-peak differential voltage  $(V_{d,1}-V_{d,0})$ . Hence, the current consumption is  $\frac{V_{d,pp}}{Z_0}$  for a given  $V_{d,pp}$ .

The SBD single-ended current-mode driver can be extended from the UD one, as shown at the bottom of Fig. 2.13. In this case, the right side adds a tail current and switches, and its driver termination can reuse the right-side receiver's termination. The terminations on both sides keep the same values as in the UD CM driver. Hence, the current consumption derivation is the same as for the UD single-ended current-mode driver, and the current consumption is still $\frac{V_{d,pp}}{Z_0}$ for a given $V_{d,pp}$.



Figure 2.14: UD and SBD current-mode drivers with differential-ended terminations.

In Fig. 2.14, the top circuit is a UD current-mode driver with differential-ended termination on the receiver side. The current consumption can be derived as below; the result is again $\frac{V_{d,pp}}{Z_0}$ for a given $V_{d,pp}$.

$$V_{d,1} = \frac{I}{4}(2Z_0) \tag{2.5}$$

$$V_{d,0} = -\frac{I}{4}(2Z_0) \tag{2.6}$$

$$V_{d,pp} = IZ_0 \tag{2.7}$$

$$I = \frac{V_{d,pp}}{Z_0} \tag{2.8}$$

The SBD CM driver with differential-ended termination is expanded from the UD one, as shown at the bottom of Fig. 2.14. The right side adds a green CM driver with its termination, and the left side adds a green differential-ended termination for its receiver. The drivers on both sides are symmetric. To keep these terminations matched to the channel, the single-ended and differential-ended resistances must be doubled. However, the current consumption is still $\frac{V_{d,pp}}{Z_0}$ for a given $V_{d,pp}$.

Fig. 2.15 presents a UD voltage-mode driver with single-ended termination on the receiver side. The current consumption is derived as follows.

$$V_{d,1} = \frac{V_s}{2} \tag{2.9}$$

$$V_{d,0} = -\frac{V_s}{2} \tag{2.10}$$

$$V_{d,pp} = V_s \tag{2.11}$$

$$I = \frac{V_s}{2Z_0} \tag{2.12}$$

$$I = \frac{V_{d,pp}}{2Z_0} \tag{2.13}$$



Figure 2.15: UD and SBD voltage-mode drivers with single-ended terminations.

Where  $V_{d,1}$  is the differential voltage  $(V_d)$  at the right side when the left driver transmits a one,  $V_{d,0}$  is the same when transmitting a zero,  $V_{d,pp}$  is the peak-to-peak differential voltage  $(V_{d,1}-V_{d,0})$ , and I is the current flowing out from  $V_s$ . Hence, the current consumption is  $\frac{V_{d,pp}}{2Z_0}$  for a given  $V_{d,pp}$  which is  $V_s$ .

Fig. 2.15 also shows the SBD VM/SE driver at the bottom. The right side replaces its existing termination with a VM driver and its terminations. The driver's termination also serves as the receiver's termination, so the termination load keeps the same value of $Z_0$. To calculate the peak and average current consumption, we need to consider the four possible conditions shown in Fig. 2.16.

$$I_{(a)} = \frac{V_s}{2Z_0} \tag{2.14}$$

$$V_{d,(a)} = 0 \tag{2.15}$$

Figure 2.16: Four conditions of SBD VM driver with single-ended termination.

$$I_{(b)} = 0 \tag{2.16}$$

$$V_{d,(b)} = V_s \tag{2.17}$$

$$I_{(c)} = 0 \tag{2.18}$$

$$V_{d,(c)} = -V_s \tag{2.19}$$

$$I_{(d)} = \frac{V_s}{2Z_0} \tag{2.20}$$

$$V_{d,(d)} = 0 \tag{2.21}$$

$$V_{d,pp} = V_s \tag{2.22}$$

$$I_{peak,sbd,se} = \frac{V_{d,pp}}{2Z_0} \tag{2.23}$$

$$I_{avg,sbd,se} = \frac{V_{d,pp}}{4Z_0} \tag{2.24}$$

Where $V_{d,(a)}$, $V_{d,(b)}$, $V_{d,(c)}$, and $V_{d,(d)}$ are the differential voltages in Fig. 2.16 (a) to (d), $I_{(a)}$, $I_{(b)}$, $I_{(c)}$, and $I_{(d)}$ are the currents flowing out of $V_s$, and $V_{d,pp}$ is the ideal inbound signal swing when the outbound signal is perfectly removed. $I_{peak,sbd,se}$ and $I_{avg,sbd,se}$ are the peak and average currents over those four conditions.





Figure 2.17: UD and SBD voltage-mode drivers with differential-ended terminations.

Fig. 2.17 shows a UD voltage-mode driver with differential-ended termination on the receiver side. The current consumption is derived as follows; the result is $\frac{V_{d,pp}}{4Z_0}$ for a given $V_{d,pp}$, which equals $V_s$.

$$V_{d,1} = \frac{V_s}{2} \tag{2.25}$$

$$V_{d,0} = -\frac{V_s}{2} \tag{2.26}$$

$$V_{d,pp} = V_s \tag{2.27}$$

$$I = \frac{V_s}{4Z_0} \tag{2.28}$$

$$I = \frac{V_{d,pp}}{4Z_0} \tag{2.29}$$

The UD VM/Diff driver can be expanded to an SBD VM/Diff driver by adding a green VM driver on the right side and a differential-ended termination on the left side (bottom of Fig. 2.17). The overall termination should match the channel $Z_0$, so the original driver and receiver termination values must be doubled. Fig. 2.18 shows the four possible conditions, and the peak and average currents are written as follows.





Figure 2.18: Four conditions of SBD VM driver with differential-ended termination.

$$I_{(a)} = \frac{V_s}{4Z_0} \tag{2.30}$$

$$V_{d,(a)} = 0 \tag{2.31}$$

$$I_{(b)} = \frac{V_s}{8Z_0} \tag{2.32}$$

$$V_{d,(b)} = \frac{V_s}{2} \tag{2.33}$$

$$I_{(c)} = \frac{V_s}{8Z_0} \tag{2.34}$$

$$V_{d,(c)} = -\frac{V_s}{2} \tag{2.35}$$

$$I_{(d)} = \frac{V_s}{4Z_0} \tag{2.36}$$

$$V_{d,(d)} = 0 \tag{2.37}$$

$$V_{d,pp} = \frac{V_s}{2} \tag{2.38}$$

$$I_{peak,sbd,diff} = \frac{V_{d,pp}}{2Z_0} \tag{2.39}$$

$$I_{avg,sbd,diff} = \frac{3V_{d,pp}}{8Z_0} \tag{2.40}$$

Where $V_{d,(a)}$, $V_{d,(b)}$, $V_{d,(c)}$, and $V_{d,(d)}$ are the differential voltages in Fig. 2.18 (a) to (d), $I_{(a)}$, $I_{(b)}$, $I_{(c)}$, and $I_{(d)}$ are the currents flowing out of $V_s$, and $V_{d,pp}$ is the ideal inbound signal swing when the outbound signal is perfectly removed. $I_{peak,sbd,diff}$ and $I_{avg,sbd,diff}$ are the peak and average currents over those four conditions.
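The four-condition results for both SBD VM structures can be cross-checked with a small DC nodal analysis in Python. This is a sketch under stated assumptions: the channel is lossless, so at DC each conductor collapses to a single node; each driver leg is an ideal 0/$V_s$ switch behind a series resistance $R_{drv}$; both sides send data at the same time; and the differential terminations of both ends are lumped into one resistor $R_{diff}$ between the conductors. The function and parameter names are hypothetical. With $R_{drv} = Z_0$ and no differential termination it should reproduce (2.14) to (2.24); with $R_{drv} = 2Z_0$ and two $4Z_0$ terminations in parallel ($R_{diff} = 2Z_0$) it should reproduce (2.30) to (2.40).

```python
import math

def solve_nodes(v_lp, v_ln, v_rp, v_rn, r_drv, r_diff=None):
    """DC voltages (Vp, Vn) of a differential channel driven from both ends.

    Each conductor connects to one left and one right source through r_drv;
    r_diff (None = absent) lumps both ends' differential terminations.
    """
    gs = 2.0 / r_drv                              # two driver legs per node
    gd = 0.0 if r_diff is None else 1.0 / r_diff
    bp = (v_lp + v_rp) / r_drv                    # source current into node p
    bn = (v_ln + v_rn) / r_drv                    # source current into node n
    det = gs * (gs + 2.0 * gd)                    # det([[gs+gd, -gd], [-gd, gs+gd]])
    vp = (bp * (gs + gd) + gd * bn) / det
    vn = (bn * (gs + gd) + gd * bp) / det
    return vp, vn

def sbd_vm_currents(vs, r_drv, r_diff=None):
    """Peak and average supply current of the left driver, and inbound swing."""
    currents, vd = [], {}
    for d_left in (0, 1):
        for d_right in (0, 1):
            v_lp, v_ln = (vs, 0.0) if d_left else (0.0, vs)
            v_rp, v_rn = (vs, 0.0) if d_right else (0.0, vs)
            vp, vn = solve_nodes(v_lp, v_ln, v_rp, v_rn, r_drv, r_diff)
            # Only the left leg switched to vs draws current from the supply.
            v_hi_node = vp if d_left else vn
            currents.append((vs - v_hi_node) / r_drv)
            vd[(d_left, d_right)] = vp - vn
    # Ideal inbound swing: change in Vd caused by the right (inbound) data
    # while the left (outbound) data is held fixed.
    vd_pp = vd[(1, 1)] - vd[(1, 0)]
    return max(currents), sum(currents) / len(currents), vd_pp

Z0, VS = 50.0, 0.2
pk, avg, vpp = sbd_vm_currents(VS, r_drv=Z0)                       # VM/SE
print("VM/SE  :", pk, avg, vpp, math.isclose(avg, vpp / (4 * Z0)))
pk, avg, vpp = sbd_vm_currents(VS, r_drv=2 * Z0, r_diff=2 * Z0)    # VM/Diff
print("VM/Diff:", pk, avg, vpp, math.isclose(avg, 3 * vpp / (8 * Z0)))
```

The same helper can be used to explore non-ideal cases, for example a driver impedance that deviates from $Z_0$, although the resulting reflections are not captured by this DC model.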

Table 2.1 summarizes all driver/termination structures. In UD mode, for the same peak-to-peak differential swing, the single-ended and differential-ended VM drivers consume one half and one quarter of the current of the single-ended and differential-ended CM drivers, respectively. In SBD mode, CM drivers keep the same current consumption as in UD mode, while SBD VM drivers have average currents lower than their peak currents.

| Signaling | Driver/Term | Peak Current      | Average Current    | $V_{d,pp}$ |
|-----------|-------------|-------------------|--------------------|------------|
| UD        | CM/SE       | $V_{d,pp}/Z_0$    | $V_{d,pp}/Z_0$     | $IZ_0$     |
|           | CM/Diff     | $V_{d,pp}/Z_0$    | $V_{d,pp}/Z_0$     | $IZ_0$     |
|           | VM/SE       | $V_{d,pp}/2Z_0$   | $V_{d,pp}/2Z_0$    | $V_s$      |
|           | VM/Diff     | $V_{d,pp}/4Z_0$   | $V_{d,pp}/4Z_0$    | $V_s$      |
| SBD       | CM/SE       | $V_{d,pp}/Z_0$    | $V_{d,pp}/Z_0$     | $IZ_0$     |
|           | CM/Diff     | $V_{d,pp}/Z_0$    | $V_{d,pp}/Z_0$     | $IZ_0$     |
|           | VM/SE       | $V_{d,pp}/2Z_0$   | $V_{d,pp}/4Z_0$    | $V_s$      |
|           | VM/Diff     | $V_{d,pp}/2Z_0$   | $3V_{d,pp}/8Z_0$   | $V_s/2$    |

Table 2.1: Driver current consumption summary.

Interestingly, for SBD signaling the most efficient structure is the single-ended VM driver rather than the differential-ended VM driver. For a given inbound swing it has the same peak current as the differentially terminated driver and a lower average current, and for a given regulated supply $V_s$ it provides twice the effective signal swing. More details of the SBD driver are discussed in Section 3.3.

### 2.5 Optical Transmitter



Figure 2.19: Optical link system.

This section introduces the background of the optical transmitter in an optical link system, shown in Fig. 2.19. The left side is an optical transmitter, which shares the high-speed electrical link architecture, including a serializer, a clock generator, and a driver, with the addition of an electrical/optical (E/O) converter. The right side is an optical receiver, which has an optical/electrical (O/E) converter, a transimpedance amplifier (TIA), a limiting amplifier (LA), a de-serializer, and a clock/data recovery (CDR) circuit. The main difference between electrical and optical links is the replacement of the electrical channel by an optical channel. To transmit and receive the light signal, the E/O and O/E photonic devices must be considered as part of the optical link system.

To overcome the electrical channel bandwidth limitation, traditional electrical interconnects have increased circuit complexity, for example by using four-level pulse amplitude modulation (PAM4) with a feed-forward equalizer (FFE), a decision feedback equalizer (DFE), a continuous-time linear equalizer (CTLE), and infinite impulse response (IIR) equalization [12, 13]. Although these equalization techniques can theoretically compensate for the channel loss, the power efficiency of the equalization circuits must be considered. High-speed I/O power efficiency has benefited from process scaling, but this trend reverses and efficiency begins to degrade with data rate as increasingly complex equalization becomes necessary [14]. The pin count of a chip is another quantity that does not scale with the process. SBD transceivers have been proposed to improve pin efficiency [11]. While the 28nm transceiver circuits can operate at nearly double the achieved 16-Gb/s unidirectional data rate with some degradation in power efficiency, what ultimately limits significant data-rate scaling in SBD mode is the channel characteristics [15, 16]. A more complex architecture is necessary to support channels with higher insertion loss and more complex echo characteristics due to impedance discontinuities along the channel.

Why is the optical link system so promising? An optical channel has much lower loss and crosstalk than an electrical one, and it can carry multiple data streams on a single fiber via wavelength-division multiplexing (WDM), which uses multiple wavelengths to transmit independent information. Hence, the optical link is a possible solution for high data rate and high pin efficiency.

The E/O converter in Fig. 2.19 is the optical source of the optical transmitter. There are two modulation techniques: directly modulating the laser output power, and externally modulating a continuous-wave (CW) laser with an absorptive or refractive modulator. Known E/O devices for chip-to-chip links include the vertical-cavity surface-emitting laser (VCSEL), the Mach-Zehnder modulator (MZM), the electro-absorption modulator (EAM), and the ring-resonator modulator. The VCSEL is directly modulated, while the others externally modulate a CW laser. Depending on the E/O characteristics, the driver circuit structures described below are designed to provide the corresponding current/voltage profiles.



Figure 2.20: VCSEL driver.

Fig. 2.20 shows a VCSEL transmitter with nonlinear equalization [17]. Current-mode drivers are often used to drive the laser due to its linear DC L-I relationship. However, the VCSEL has asymmetric rising- and falling-edge responses, so the performance of a linear equalizer is limited by the VCSEL nonlinearity. To optimize the rising and falling edges individually with high/low-pass behaviors, the driver uses two separate taps, $I_r$ and $I_f$. In addition to the high-speed modulation current $I_{data}$, the driver provides the laser with a minimum bias current $I_{bias}$ to ensure no turn-on delay.
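The idea of separate rising- and falling-edge taps can be illustrated with a behavioral Python sketch. It assumes a one-UI boost whose amplitude is $I_r$ on rising transitions and $I_f$ on falling transitions, added to $I_{bias} + I_{data}\cdot d[n]$; the current values and the function name are hypothetical, and the actual tap filtering in [17] may differ.

```python
def vcsel_drive_current(bits, i_bias, i_data, i_r, i_f):
    """Per-UI laser drive current with edge-dependent (asymmetric) boost.

    Hypothetical model: a rising edge adds +i_r for one UI and a falling
    edge subtracts i_f for one UI on top of the NRZ modulation current.
    """
    out, prev = [], bits[0]
    for b in bits:
        i = i_bias + i_data * b
        if b and not prev:      # 0 -> 1 transition: speed up the rising edge
            i += i_r
        elif prev and not b:    # 1 -> 0 transition: speed up the falling edge
            i -= i_f
        out.append(i)
        prev = b
    return out

# Illustrative currents in mA (assumed values, not from a real design).
print(vcsel_drive_current([0, 1, 1, 0, 0, 1, 0], i_bias=2.0, i_data=6.0, i_r=1.5, i_f=1.0))
```

Keeping `i_f` below `i_bias` in this model keeps the drive current positive after a falling edge, which is consistent with the role of the minimum bias current.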



Figure 2.21: MZM driver.

Fig. 2.21 shows an MZM driver. The CW input is split into two paths, the light in each path experiences a phase shift, and the two beams are then recombined. The phase shift is controlled by the modulation voltage [18]. When the phase difference between the two beams is $0^\circ$ or $180^\circ$, the output intensity is high or low, respectively. Hence, the driver has a differential output that produces the modulation voltage for the MZM.
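For an ideal, lossless MZM with a balanced 50/50 split, the output follows the standard interference relation $P_{out} = P_{in}\cos^2(\Delta\phi/2)$. The Python sketch below assumes a linear phase-voltage relation $\Delta\phi = \pi V/V_\pi$, where the half-wave voltage $V_\pi$ is a hypothetical device parameter; it shows why the driver swing must approach $V_\pi$ to move between the high ($0^\circ$) and low ($180^\circ$) states.

```python
import math

def mzm_output(p_in, v_mod, v_pi):
    """Ideal MZM: phase difference pi*v/v_pi, output p_in*cos^2(dphi/2)."""
    dphi = math.pi * v_mod / v_pi
    return p_in * math.cos(dphi / 2) ** 2

V_PI = 4.0   # assumed half-wave voltage in volts
P_IN = 1.0   # normalized CW input power
for v in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"{v:.1f} V -> {mzm_output(P_IN, v, V_PI):.3f}")
```

With a swing of only half of $V_\pi$, the output toggles between 1.0 and 0.5 in this model, which illustrates the extinction-ratio penalty of a low-swing driver.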

Fig. 2.22 shows an EAM driver. The CW input is absorbed in the EAM device, and the reverse bias of the PN junction controls the absorption. The optical output is high at low reverse bias; when a strong reverse bias is applied, the absorption increases and the optical output is low [19]. To avoid chirp in the EAM output, the driver is required to supply a minimum reverse bias. A high-voltage output stage provides enough dynamic swing to achieve a proper extinction ratio.



Figure 2.22: EAM driver.



Figure 2.23: Ring-resonator driver.

Fig. 2.23 presents an example of a ring-resonator modulator. The ring resonator is a refractive device in which part of the CW input couples into the ring and interferes with the light in the bus waveguide, producing a notch-filter response at the through port. At the resonance wavelength, the through-port output power is low. The resonance wavelength of the ring can be shifted by changing the effective refractive index of the waveguide through the free-carrier plasma dispersion effect [20]. To achieve a proper extinction ratio, the modulation voltage swing must be large enough, so high-voltage output stages and differential output drivers have been proposed [21, 22].

In summary, the optical transmitter poses the design challenge of co-designing and co-optimizing the electronics and photonics, which requires careful consideration of which design factors are best addressed in the electrical or the optical domain. The dimensions of the optical device limit its bandwidth, but a small optical device may need a higher electrical output swing to produce enough optical extinction ratio. Electrical circuits have difficulty providing a high voltage swing, especially in low-supply-voltage processes, and a high output swing costs relatively high power consumption. Still, equalizers that boost the transmitter bandwidth are easier to implement in the electrical domain. To achieve the best power efficiency, a co-simulation environment needs to be developed.
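The notch behavior of the ring-resonator modulator can be illustrated in Python with the standard through-port transmission of an all-pass ring, $T = (a^2 - 2ra\cos\phi + r^2)/(1 - 2ar\cos\phi + (ra)^2)$, with round-trip phase $\phi = 2\pi n_{eff}L/\lambda$, where $r$ is the self-coupling coefficient and $a$ the round-trip amplitude transmission. The sketch assumes a hypothetical critically coupled ring ($r = a = 0.99$, 5-µm radius, $n_{eff} = 2.4$) and a small effective-index change of $2\times10^{-4}$ standing in for the free-carrier plasma dispersion effect; none of these values come from a specific device in this work.

```python
import math

def ring_through(wavelength, n_eff, length, r, a):
    """Power transmission at the through port of an all-pass ring."""
    phi = 2 * math.pi * n_eff * length / wavelength  # round-trip phase
    num = a * a - 2 * r * a * math.cos(phi) + r * r
    den = 1 - 2 * r * a * math.cos(phi) + (r * a) ** 2
    return num / den

# Hypothetical device: 5-um radius, critically coupled (r == a).
L = 2 * math.pi * 5e-6
N_EFF, R, A = 2.4, 0.99, 0.99
m = 58                               # resonance order
lam0 = N_EFF * L / m                 # resonance wavelength (about 1.30 um)

t_on = ring_through(lam0, N_EFF, L, R, A)                 # light at resonance: notch
t_shift = ring_through(lam0, N_EFF - 2e-4, L, R, A)       # index shifted by carriers
print(f"lambda0 = {lam0 * 1e9:.2f} nm, T(resonant) = {t_on:.2e}, T(shifted) = {t_shift:.3f}")
```

Even this small index change moves the notch enough to raise the through-port transmission at the laser wavelength from nearly zero to roughly 0.7 in this model; a larger voltage swing produces a larger shift and a better extinction ratio.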

### 3. SIMULTANEOUS BIDIRECTIONAL TRANSCEIVERS<sup>1</sup>

### 3.1 Introduction

As shown in Fig. 3.1, one challenge in SBD transceivers is separating the transmitted outbound signal $V_{ob}$ from the received inbound signal, which includes both the desired received signal $V_{ib}$ and echoes $V_E$. Efficient techniques are necessary to generate a replica of $V_{ob}$ for subtraction from the total signal present at the transceiver interface so that only the inbound signal is extracted. One approach is to switch the references of the comparators used in the receiver according to the outbound transmit data [2, 3, 4, 5, 6, 7, 23]. Other previous SBD transceivers have utilized scaled replica drivers and switched-capacitor sample-and-subtract circuitry [8, 9]. However, both approaches require precise delay matching with the main output driver, which can limit the maximum data rate. This issue was resolved with a resistor-transconductance (R-gm) hybrid circuit consisting of a main current-mode driver with a sensing resistor in series with the channel and subsequent transconductance cells that perform a weighted subtraction of the signals on either side of the sensing resistor [10]. While this current-mode R-gm hybrid approach is effective, voltage-mode output stages generally allow further improvements in power efficiency.

Margin degradation due to near-end (NE) and far-end (FE) echoes is another major challenge. NE echoes occur instantly at the NE chip interface and can be relatively large in amplitude. Conversely, FE echoes occur when $V_{ob}$ travels to the FE interface and generates reflections that return to the NE side after a round-trip propagation delay. These FE echoes experience the channel filtering and are generally smaller in amplitude and more dispersed in time. Unfortunately, the typical high-pass responses of the linear equalizers used to compensate for channel loss, such as a transmitter feed-forward equalizer (FFE) and a receiver continuous-time linear equalizer (CTLE), cannot compensate for these echoes and can even boost them. Given that the impact of both NE and FE echoes can vary depending on packaging approaches and channel length, adaptive echo cancellation (EC) techniques that are compatible with common equalizers are necessary to support robust operation at high data rates over a wide range of channels.

<sup>1</sup>©2020 IEEE. Part of chapter 3 is reprinted, with permission, from Y.-H. Fan, A. Kumar, T. Iwai, A. Roshan-Zamir, S. Cai, B. Sun, and S. Palermo, "A 32-Gb/s Simultaneous Bidirectional Source-Synchronous Transceiver With Adaptive Echo Cancellation Techniques," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 2, pp. 439-451, Feb. 2020.

Figure 3.1: Signal separation in an SBD transceiver that includes echo cancellation. Reprinted with permission from [16].

Data sampling by the receiver circuitry should also be implemented efficiently in an SBD system. This is possible with a source-synchronous architecture that allows for high-frequency jitter tolerance with low-complexity deskew circuitry. An important consideration in an SBD system is the flexibility to forward the clock from either side in different operation modes. The phase deskew circuitry should also track low-frequency phase variations due to temperature and power supply variations and be compatible with the EC adaptation algorithm.

This chapter presents a 32-Gb/s source-synchronous SBD transceiver that includes a low-power voltage-mode driver with R-gm hybrid circuitry and performs EC with a finite impulse response (FIR) filter whose taps are adapted with a sign-sign least-mean-squares (SSLMS) algorithm to compensate for both NE and FE echoes [15]. The SBD transceiver architecture, which supports transmission over short-distance channels with bidirectional forwarded-clock operation modes, is detailed in Section 3.2. Section 3.3 discusses the transmitter, which outputs the main data signal with a voltage-mode driver that includes a sensing resistor and also generates the delayed data signals used for EC. The receiver, which includes a CTLE to efficiently compensate for channel loss and a 5/4X phase interpolator (PI)-based clock and data alignment (CDA) system for robust deskew [24], is outlined in Section 3.4. Section 3.5 describes the adaptive EC scheme that performs foreground tap calibration with an SSLMS algorithm using error information obtained from the receiver samplers. Experimental results from a 28-nm CMOS prototype are presented in Section 3.6. Finally, Section 3.7 concludes this chapter.

### 3.2 SBD Transceiver Architecture

Fig. 3.2 (a) shows the frequency response of a 6" FR4 channel that has 10.2-dB loss at the 8-GHz Nyquist frequency for 32-Gb/s simultaneous bidirectional signaling. The frequency-dependent channel loss causes dispersion in the unidirectional 16-Gb/s pulse response (Fig. 3.3 (a)), with significant inter-symbol interference (ISI) in the first pre-cursor and several post-cursor terms. Utilizing a custom MATLAB-based link modeling tool that builds upon previous statistical simulation methodologies [25], it is shown that efficient ISI compensation is possible with a receive-side single-stage CTLE that has 5-dB high-frequency peaking near 8-GHz (Fig. 3.2 (b)). This CTLE provides a larger main cursor and attenuated post-cursor ISI terms, allowing 16-Gb/s UD operation at a $BER < 10^{-12}$ with 200-$mV_{ppd}$ swing (Fig. 3.4 (a) and (b)). While further performance improvement is possible by moving the single-stage CTLE peaking to a higher frequency [26] to completely cancel the first post-cursor ISI, or by utilizing a multi-stage CTLE that better matches the channel loss profile, this single-stage design is chosen for its low complexity and power consumption.

Figure 3.2: 6" FR4 channel. (a) $S_{21}$ and $S_{11}$ responses with and without a receiver CTLE. (b) CTLE frequency response. Reprinted with permission from [16].

Figure 3.3: 6" FR4 channel. Simulated 16-Gb/s (a) UD pulse responses and (b) echo responses. Reprinted with permission from [16].
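A rough feel for a single-stage CTLE with about 5-dB peaking near 8-GHz can be obtained from a simple transfer-function model in Python. The model $H(s) = A_{dc}(1 + s/\omega_z)/(1 + s/\omega_p)^2$ (one zero, two coincident poles) and the zero location are assumptions, not the prototype's circuit; only the -2-dB low-frequency gain is taken from Section 3.2. For this form, $|H|$ peaks at $f^2 = f_p^2 - 2f_z^2$, which the sketch uses to place the poles.

```python
import math

def ctle_gain_db(f, a_dc_db, fz, fp):
    """|H(j*2*pi*f)| in dB for H(s) = A_dc * (1 + s/wz) / (1 + s/wp)**2."""
    a_dc = 10 ** (a_dc_db / 20)
    h = a_dc * (1 + 1j * f / fz) / (1 + 1j * f / fp) ** 2
    return 20 * math.log10(abs(h))

A_DC_DB = -2.0                  # low-frequency gain quoted in Section 3.2
F_PEAK = 8e9                    # desired peaking frequency (Nyquist at 16 Gb/s)
FZ = 2.6e9                      # assumed zero frequency
# For this one-zero/double-pole form, |H| peaks at f^2 = fp^2 - 2*fz^2.
FP = math.sqrt(F_PEAK ** 2 + 2 * FZ ** 2)

freqs = [k * 1e7 for k in range(1, 2001)]          # 10 MHz to 20 GHz
gains = [ctle_gain_db(f, A_DC_DB, FZ, FP) for f in freqs]
k_max = max(range(len(gains)), key=gains.__getitem__)
peaking_db = gains[k_max] - A_DC_DB
print(f"fp = {FP / 1e9:.2f} GHz, peak at {freqs[k_max] / 1e9:.2f} GHz, "
      f"peaking = {peaking_db:.2f} dB")
```

Moving `FZ` lower for a fixed peak frequency increases the peaking, which is one way to reason about the trade-off between boosting the channel's high-frequency loss and amplifying noise and echoes.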



Figure 3.4: 6" FR4 channel. Simulated 16-Gb/s (a) timing and (b) voltage margin for 16-Gb/s UD and 32-Gb/s SBD operation modes, including SBD without and with echo cancellation (EC). Reprinted with permission from [16].

However, SBD systems must also consider the echo signal, which can be obtained from the channel's $S_{11}$ response. Fig. 3.3(b) shows that the NE echoes are quite large and occur in the first 2 unit intervals (UIs), while the FE echoes experience the round-trip channel delay and are more attenuated and dispersed over several UIs. Since the echoes also pass through the CTLE before sampling, this must be considered as well (Fig. 3.2 (a)). Unfortunately, while the CTLE improves the main signal, it does not significantly reduce the echoes. In 32-Gb/s SBD mode, these echoes only allow operation at a $BER = 10^{-8}$. Given that the echoes are due to the known transmitted data, an effective way to cancel them is to pass this same data through an FIR filter. An EC FIR filter with 2 NE taps and 4 FE taps allows operation in 32-Gb/s SBD mode at a $BER < 10^{-12}$. Two taps are sufficient for the NE echo signal, while the FE tap span is chosen to adequately cover the dispersion of the echoes over a maximum 6" channel with reasonable system complexity. System simulations show an eye-height improvement of less than 1-mV when the FE tap span is increased up to 6 taps. To robustly support channels with varying lengths, the EC filter tap locations should be programmable and their values adaptively set. As shown in Fig. 3.5(a) and (b), adding a 2-tap TX FFE and a 1-tap RX decision feedback equalizer (DFE) provides only marginal performance improvement when operating over the 10.2-dB loss channel. Thus, the transceiver architecture uses only a receive-side CTLE, which has -2-dB low-frequency gain, 5-dB total peaking, and 2.5-$mV_{rms}$ output-referred noise, followed by 2 NE and 4 FE EC taps to enable SBD operation with low complexity and power consumption.

Figure 3.5: 6" FR4 channel. Simulated 16-Gb/s (a) timing and (b) voltage margin with various equalizer configurations for 32-Gb/s SBD operation modes with the echo cancellation enabled. Reprinted with permission from [16].
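The echo-cancellation principle can be sketched in Python: because the echo is a linear function of the known outbound data, an FIR filter driven by that data can reproduce and subtract it. The UI-spaced echo response below is hypothetical (two large NE terms and a dispersed FE tail after an assumed 20-UI round trip), and the taps are set directly to the echo samples, which is the value an ideal adaptation would converge to. With 2 NE and 4 FE taps, only the echo outside the tap span remains.

```python
import random

def echo_at(n, data, echo):
    """Echo at UI n: sum over k of echo[k] * data[n - k], data in {-1, +1}."""
    return sum(h * data[n - k] for k, h in echo.items())

def ec_fir(n, data, taps):
    """EC FIR replica using only the programmed tap positions."""
    return sum(w * data[n - k] for k, w in taps.items())

# Hypothetical UI-spaced echo response in mV.
echo = {0: 30.0, 1: 12.0, 20: 6.0, 21: 4.0, 22: 2.5, 23: 1.0, 24: 0.3}

fe_delay = 20                                   # programmable FE start (assumed)
taps = {0: echo[0], 1: echo[1]}                 # 2 NE taps
taps.update({fe_delay + i: echo[fe_delay + i] for i in range(4)})  # 4 FE taps

rng = random.Random(1)
data = [rng.choice((-1, 1)) for _ in range(2000)]
span = range(max(echo), len(data))              # skip start-up UIs
worst_before = max(abs(echo_at(n, data, echo)) for n in span)
worst_after = max(abs(echo_at(n, data, echo) - ec_fir(n, data, taps)) for n in span)
print(f"worst-case echo {worst_before:.1f} mV -> residual {worst_after:.2f} mV")
```

Changing `fe_delay` mimics the programmable FE tap location: if it does not match the actual round-trip delay, the FE echo terms are left uncancelled.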



Figure 3.6: Multi-channel source-synchronous SBD transceiver with adaptive echo cancellation (simplified single-ended schematic). Reprinted with permission from [16].

Fig. 3.6 shows the proposed 32-Gb/s SBD transceiver architecture in a conceptual multi-channel system configuration where 16-Gb/s flows in both directions simultaneously over each differential channel. Either side of the source-synchronous transceiver can serve as the master and forward a unidirectional quarter-rate clock signal to the slave chip over an additional clock channel. At the master side, a differential quarter-rate input clock is fed to an injection-locked oscillator (ILO) to generate the four quadrature clocks used for 16:1 transmit serialization and as the input clock phases for a 5/4X CDA system. The process is similar at the slave side, with the forwarded-clock signal buffered and applied to the ILO to generate the four quadrature clock phases.

After data serialization, a voltage-mode output driver includes a sensing resistor to provide the necessary signals to an R-gm hybrid that performs inbound and outbound signal separation. The main and post-cursor data from the main serializer drive two NE EC FIR filter taps, while a parallel serializer provides the data for the four FE taps with a tunable delay from 2 to 48-UIs to handle varying channel lengths. These EC signals are current-mode combined in the load of the CTLE that follows the R-gm hybrid stage. The resultant signal is then sampled by the quarter-rate data samplers. These samplers are also utilized to foreground adapt the six EC taps with an SSLMS algorithm.
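A behavioral Python sketch of foreground sign-sign LMS tap adaptation follows. It assumes the far-end transmitter is silent during calibration, so the sampled error is simply the echo minus the EC output plus Gaussian noise, and it uses the textbook SSLMS update $w_k \leftarrow w_k + \mu\,\mathrm{sgn}(e[n])\,\mathrm{sgn}(x[n-k])$. The echo values, step size, noise level, and function names are hypothetical; the error extraction and update timing of the prototype are described in Section 3.5 and may differ.

```python
import random

def sign(v):
    """Three-valued sign function."""
    return (v > 0) - (v < 0)

def sslms_adapt(data, echo, positions, mu=0.05, noise_mv=0.5, seed=2):
    """Foreground SSLMS adaptation of EC FIR taps (hypothetical model).

    With the far-end side silent, the sampler error is echo - EC output + noise.
    Each tap steps by +/-mu according to sign(error) * sign(data).
    """
    rng = random.Random(seed)
    w = {k: 0.0 for k in positions}
    start = max(max(echo), max(positions))
    for n in range(start, len(data)):
        echo_v = sum(h * data[n - k] for k, h in echo.items())
        ec_v = sum(w[k] * data[n - k] for k in positions)
        err = echo_v - ec_v + rng.gauss(0.0, noise_mv)
        s = sign(err)
        for k in positions:
            w[k] += mu * s * data[n - k]     # data is +/-1, so it is its own sign
    return w

echo = {0: 30.0, 1: 12.0, 20: 6.0, 21: 4.0, 22: 2.5, 23: 1.0, 24: 0.3}  # hypothetical, mV
positions = [0, 1, 20, 21, 22, 23]          # 2 NE + 4 FE tap positions
rng = random.Random(3)
data = [rng.choice((-1, 1)) for _ in range(40000)]
taps = sslms_adapt(data, echo, positions)
for k in positions:
    print(f"tap {k:2d}: adapted {taps[k]:6.2f} mV, target {echo[k]:5.2f} mV")
```

Only the signs of the error and data are needed, which is why SSLMS suits hardware where the error comes from a comparator decision; the price is a residual tap dither on the order of the step size `mu`.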

### 3.3 Transmitter



Figure 3.7: SBD transmitter with echo cancellation data generation. Reprinted with permission from [16].

A detailed block diagram of the SBD transmitter is shown in Fig. 3.7. The main data path serializes 16 bits of parallel input data in three stages: a two-stage 16:4 multiplexer and a final 4:1 serializer driven by 25% duty-cycle quarter-rate pulse-clock signals [27]. These pulse-clock signals are generated by passing adjacent quadrature-spaced clocks through AND and OR gates. The inverter buffers that precede this logic have programmable p-n strength and capacitive DACs to compensate for duty-cycle and quadrature-phase spacing errors, respectively. While this correction is performed manually in this prototype, automated duty-cycle and quadrature-phase correction is possible with a foreground calibration scheme [27]. A low-swing voltage-mode driver powered by a regulator then drives the full-rate signal onto the channel and also produces the necessary signals for inbound/outbound signal separation. Two additional parallel 4:1 serializers generate the full-rate data for the two $N_{0-1}$ NE EC taps, which occur at the main-cursor and first post-cursor positions. In order to support operation over different channel lengths, the FE EC data are generated by passing the 16 parallel input bits through a programmable delay generation block. Static 48:1 output muxes select the appropriate 16-bit group for serialization with the same topology used in the main path. Four final 4:1 serializers generate the full-rate data for the four $F_{0-3}$ FE EC taps, which can be programmed to occur at a delay of 2 to 48-UIs relative to the main cursor.
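The pulse-clock generation can be sketched behaviorally: ANDing each quarter-rate clock with its 90°-later neighbour yields four non-overlapping pulses, each high for 25% of the period, i.e. one UI of the final 4:1 serializer. This is a minimal sketch assuming ideal logic; the OR-gate path mentioned above is not modeled, and the `skews` parameter is a hypothetical way to show why quadrature-spacing errors must be calibrated.

```python
import numpy as np


def quadrature_clocks(skews=(0.0, 0.0, 0.0, 0.0), n_points=400):
    """Four quarter-rate clocks at 0/90/180/270 degrees with 50% duty cycle.

    Time is in units of one clock period; skews (fractions of a period) model
    quadrature-spacing errors. Returns one boolean waveform per clock phase.
    """
    t = np.arange(n_points) / n_points
    return [((t - k / 4 - skews[k]) % 1.0) < 0.5 for k in range(4)]


def pulse_clocks(clks):
    """AND each clock with its 90-degree-later neighbour -> four 25%-duty pulses."""
    return [clks[k] & clks[(k + 1) % 4] for k in range(4)]


ideal = pulse_clocks(quadrature_clocks())
print([float(p.mean()) for p in ideal])   # fraction of the period each pulse is high

skewed = pulse_clocks(quadrature_clocks(skews=(0.0, 0.02, 0.0, 0.0)))
print([float(p.mean()) for p in skewed])  # a quadrature error makes the pulse widths unequal
```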

An important voltage-mode driver consideration is the termination scheme that provides minimum power consumption. As shown in Fig. 3.8, a conventional UD voltage-mode driver has minimum current consumption with differential termination. In this case, the peak current is ideally equal to the average current at $V_{ppd}/(4Z_0)$, and the peak-to-peak differential swing $V_{ppd}$ is equal to the regulated supply $V_s$. However, implementing differential termination in an SBD transceiver with controlled-impedance voltage-mode drivers on both sides of the channel is more difficult. This requires increasing the driver output impedance to $2Z_0$ and adding $4Z_0$ differential termination, resulting in twice the peak current of the UD voltage-mode driver and half the effective signal swing once inbound/outbound signal separation is considered. Interestingly, single-ended termination is more efficient in an SBD system. While it has the same peak current as differential termination, it provides twice the effective signal swing. SBD single-ended termination also has the same average current as a conventional UD voltage-mode driver, although this current is not constant. Hence, the proposed SBD transceiver utilizes single-ended termination, with the output impedance of the voltage-mode drivers acting as the channel termination.

Figure 3.8: UD and SBD voltage-mode driver termination comparison. Reprinted with permission from [16].
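As a worked example of these current levels, assume $Z_0 = 50\Omega$ and the maximum 200-$mV_{ppd}$ swing. The UD driver with differential termination then draws $V_{ppd}/(4Z_0) = 1$-mA, and taking the statement above that both SBD termination options double the peak current gives 2-mA, which appears consistent with the 2-mA peak current the output-stage regulator is designed to supply. The function names are illustrative only.

```python
def ud_diff_term_current(v_ppd, z0=50.0):
    """UD VM driver with differential termination: peak = average current = Vppd / (4*Z0)."""
    return v_ppd / (4.0 * z0)


def sbd_peak_current(v_ppd, z0=50.0):
    """SBD peak current, taken as twice the UD value for either SBD termination option."""
    return 2.0 * ud_diff_term_current(v_ppd, z0)


v_ppd = 0.200  # maximum programmable swing (V, peak-to-peak differential)
print(ud_diff_term_current(v_ppd), sbd_peak_current(v_ppd))  # amperes
```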

Figure 3.9: (a) VM driver with sensing resistor  $R_S$  for the R-gm hybrid. All transistors employ minimum 30-nm length. (b) Digital on-chip resistor calibration loop. Analog control loops for the (c) pull-up and (d) pull-down total output impedance. Reprinted with permission from [16].

As shown in Fig. 3.9(a), a low-swing voltage-mode output driver topology is modified to include a digitally calibrated series sensing resistor ($R_S$), allowing for reduced current consumption relative to previous current-mode (CM) implementations [10]. This sensing resistor enables extraction of the local receiver's desired inbound signal ($V_{ib}$), which represents the signal transmitted from the other side of the channel, from the total signal present at the transmitter output ($V_i$). By observing the voltage across this resistor, the inbound voltage can be determined as

$$V_{ib} = \frac{(1 + Z/R_S)V_i - (Z/R_S)V_{is}}{2}$$
(3.1)

where $Z$ is the nominal 50$\Omega$ channel impedance and $R_S$ is set to half of this value (25$\Omega$), so that Eq. (3.1) reduces to $V_{ib} = (3V_i - 2V_{is})/2$.
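Eq. (3.1) can be read as the standard traveling-wave split $V_{ib} = (V_i - Z I)/2$, with the driver current $I = (V_{is} - V_i)/R_S$ measured across the sensing resistor; this reading is ours. The sketch below checks Eq. (3.1) at DC, assuming two ideal matched Thevenin drivers (open-circuit voltages $2V_L$ and $2V_R$, total impedance 50$\Omega$ each) joined by a lossless line, which at DC is simply a wire. The helper `dc_link` is hypothetical.

```python
def inbound_from_hybrid(v_i, v_is, z0=50.0, r_s=25.0):
    """Eq. (3.1): inbound signal from the voltages on the two sides of R_S."""
    k = z0 / r_s
    return ((1 + k) * v_i - k * v_is) / 2


def dc_link(v_l, v_r, z0=50.0):
    """Pad voltage V_i and internal node V_is of the left driver at DC (hypothetical model).

    Each driver is a Thevenin source (2*V_L or 2*V_R) with total output
    impedance z0, of which r_s = z0/2 is the sensing resistor.
    """
    r_s = z0 / 2
    v_i = (2 * v_l + 2 * v_r) / 2             # matched divider between the two sources
    i_out = (2 * v_l - 2 * v_r) / (2 * z0)    # current leaving the left driver
    v_is = v_i + i_out * r_s                  # node on the transistor side of R_S
    return v_i, v_is


for v_l, v_r in [(0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]:
    v_i, v_is = dc_link(v_l, v_r)
    print(v_l, v_r, inbound_from_hybrid(v_i, v_is))  # recovers V_R in every case
```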

Both the sensing resistor and the total driver output impedance need to be accurately controlled to avoid errors in signal separation by the R-gm hybrid and to maintain an impedance-matched channel throughout the link for minimal echoes. To accomplish this, a global passive-resistor digital calibration loop provides the proper codes at start-up to set $R_S$ to 25$\Omega$ and to set the passive resistors used in two analog control loops to either (N)25$\Omega$ or (N)50$\Omega$. This foreground calibration loop uses an offset-corrected dynamic comparator with the same design as the high-speed receiver comparators (see Fig. 3.11). While the foreground passive-resistor calibration loop sets $R_S$ to $25\Omega$, the remainder of the total $50\Omega$ pull-up and pull-down impedances, contributed by the driver transistors, is set by the two analog control loops of Fig. 3.9(c) and (d). These analog loops utilize scaled replica drivers to produce the gate voltages $V_{zcp}$ and $V_{zcn}$ for the top M1 and bottom M4 transistors, respectively, setting the total pull-up and pull-down impedances to 50$\Omega$ for matching to the channel. While not employed in this design in order to save area, further improvement in high-frequency matching is possible with passive networks such as T-coils [28]. An on-chip regulator (see Fig. 3.7) produces the supply ($V_{REG}$) of the low-swing voltage-mode driver output stage, allowing a programmable effective output swing between 100 and 200-$mV_{ppd}$ for low-power operation. It also filters uncorrelated power-supply noise between the data and clock channels. The regulator employs a PMOS-input error amplifier and is designed to achieve 10-MHz gain bandwidth and >20-dB PSRR while supporting the 2-mA peak current during transient operation.
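The search algorithm of the digital resistor calibration loop is not specified here. A minimal sketch, assuming a successive-approximation (SAR) search over a binary resistor code driven only by single-bit comparator decisions, is shown below; the 5-bit code width, the resistor model `make_resistor`, and its values are hypothetical.

```python
def sar_calibrate(too_high, n_bits=5):
    """SAR search: largest code whose resistance does not exceed the target.

    too_high(code) stands in for one comparator decision (True when the
    trimmed resistor is above the target); one decision per bit.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if not too_high(trial):
            code = trial
    return code


def make_resistor(process_factor, r_base=15.0, r_lsb=0.5):
    """Hypothetical trimmable resistor: (r_base + code*r_lsb) scaled by process spread."""
    return lambda code: process_factor * (r_base + code * r_lsb)


target = 25.0  # R_S target in ohms
for pf in (0.85, 1.0, 1.1):
    r_of = make_resistor(pf)
    code = sar_calibrate(lambda c: r_of(c) > target)
    print(pf, code, round(r_of(code), 3))
```

A SAR search needs only `n_bits` comparator decisions per calibration, which suits a one-time start-up loop; a linear search would work equally well at the cost of more cycles.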

While a voltage-mode driver has a power-efficiency advantage over a current-mode implementation, the output impedance of the proposed voltage-mode driver generally varies more with the output signal level, because the signal-path transistors set nominally 50% of the total 50-$\Omega$ value. The impact of this is modeled with the equivalent circuit of two voltage-mode SBD drivers, one on each side of a channel, as shown in Fig. 3.10. The left/right-side drivers are modeled with a regulated supply $2V_{L/R}$, a pull-up impedance consisting of the series combination of transistor component $Z_{UL/R}$ and $R_S$, and a pull-down impedance consisting of the series combination of transistor component $Z_{DL/R}$ and $R_S$. Relative to Eq. (3.1), a more detailed expression for the left-side inbound signal is given by Eq. (3.2).

$$V_{ib} = V_L - \frac{3(V_L - V_R)(2R_S + Z_{UL} + Z_{DL}) - 2(V_L - V_R)(Z_{UL} + Z_{DL})}{4R_S + Z_{UL} + Z_{DL} + Z_{UR} + Z_{DR}}$$
(3.2)



Figure 3.10: Equivalent SBD schematic. Reprinted with permission from [16].

For the ideal case when $Z_{UL/R} = Z_{DL/R} = R_S$, $V_{ib}$ is equal to the nominal right-side transmitted voltage $V_R$. However, an effective error voltage results when these impedances vary. Simulation results show that $Z_U$ and $Z_D$ vary by approximately $\pm 10\%$ over the driver's total output voltage range in SBD mode. However, these impedance variations are complementary: $Z_{UL}$ increases when a single-ended high voltage level is present on its output pad, while $Z_{DL}$ decreases when the complementary single-ended low voltage level is present on its pad. This results in a relatively constant sum of $Z_{UL}$ and $Z_{DL}$ over the SBD signal range. From Eq. (3.2), this relatively constant sum of pull-up and pull-down impedances yields a simulated residual inbound signal error of less than $\pm 1$-mV when the drivers are configured with the maximum $2V_{L/R} = 200$-mV.
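Rearranging Eq. (3.2) makes the role of the impedance sum explicit: $V_{ib} - V_R = (V_L - V_R)(Z_{UR} + Z_{DR} - 2R_S)/(4R_S + Z_{UL} + Z_{DL} + Z_{UR} + Z_{DR})$, so the error depends on the pull-up plus pull-down sums rather than on the individual impedances. The sketch below evaluates Eq. (3.2) directly for opposite-bit drive with $2V_{L/R} = 200$-mV; the $\pm 10\%$ impedance cases are illustrative and not taken from the simulations above.

```python
def v_ib_eq32(v_l, v_r, z_ul, z_dl, z_ur, z_dr, r_s=25.0):
    """Left-side inbound signal from Eq. (3.2)."""
    dv = v_l - v_r
    num = 3 * dv * (2 * r_s + z_ul + z_dl) - 2 * dv * (z_ul + z_dl)
    den = 4 * r_s + z_ul + z_dl + z_ur + z_dr
    return v_l - num / den


v_l, v_r, r_s = 0.1, -0.1, 25.0  # 2*V_L/R = 200 mV, opposite bits (illustrative)
cases = {
    "ideal": (r_s, r_s, r_s, r_s),
    "complementary +/-10%": (1.1 * r_s, 0.9 * r_s, 1.1 * r_s, 0.9 * r_s),
    "all +10% (not complementary)": (1.1 * r_s,) * 4,
}
for name, z in cases.items():
    err_mv = 1e3 * (v_ib_eq32(v_l, v_r, *z) - v_r)
    print(f"{name}: error = {err_mv:.3f} mV")
```

Complementary variation keeps each sum at $2R_S$ and leaves no error, whereas a uniform +10% shift produces an error of several millivolts.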

### 3.4 Receiver



Figure 3.11: SBD receiver data path circuitry. All transistors have minimum 30-nm length except where mentioned. Reprinted with permission from [16].

Fig. 3.11 shows the SBD receiver data path circuitry. The first main block includes the transconductance stages of the R-gm hybrid, which are dc-coupled to the voltage-mode transmitter and combine the signals at the two sides of $R_S$. Independent tuning is implemented in the tail currents of both the $V_i$ and $V_{is}$ transconductance cells in order to compensate for any variation in the sensing resistor setting. A CTLE stage with adjustable peaking follows to compensate for channel loss near 10-dB. This CTLE's zero position and peaking are set by manually tuning the degeneration impedance with 5-bit binary-weighted capacitor and 3-bit binary-weighted resistor DACs.



Figure 3.12: Simulated 32-Gb/s SBD operation over the 6" FR4 channel: (a) SBD eye diagram at CTLE output with echo cancellation activated. (b) Echo signal when received signal is deactivated. Reprinted with permission from [16].

The six EC FIR taps, whose tail-current weights are set by the SSLMS adaptation block with 10-$\mu A$ resolution, are summed at the CTLE output to produce an open 32-Gb/s SBD eye diagram [see Fig. 3.12(a)] in the presence of the uncorrected echoes observed in Fig. 3.12(b). This is quantified over data rate in Fig. 3.13(a), where the uncorrected echo rms value scales from 16.8 to 21.9-$mV_{rms}$ as the SBD data rate increases from 10 to 32-Gb/s. Applying the EC taps reduces this to between 1.2 and 2.0-$mV_{rms}$. While it is possible to perform EC at the CTLE input and avoid the CTLE amplification, this design performs cancellation at the CTLE output for better compatibility with the high-speed EC FIR filter resolution. An important consideration is the timing between the received data sampling and the transmitted data that sets the EC FIR filter timing. As discussed in more detail in Section 2.4, the EC taps are adapted based on the error signal observed at the receiver sampling position in order to provide the best values for a given timing relationship. This is illustrated in the circuit-level simulated timing and voltage margin plots of Fig. 3.13(b) and (c), where an open eye is achieved over a $\pm 0.5$-UI range when the EC taps are optimized at every relative sampling position. The system is also somewhat robust to timing shifts without re-optimization of the taps, with the timing and voltage margins degraded by 0.1-UI and 10-mV, respectively, over a 0.3-UI range.

Figure 3.13: Simulated 32-Gb/s SBD operation over the 6" FR4 channel: (a) Echo rms value versus data rate without and with echo cancellation activated. (b) Eye width and (c) height for different relative TX and RX timing. Reprinted with permission from [16].

Another important consideration is the R-gm hybrid input amplifier linearity, given that a superimposed signal with a 200-mV total amplitude can be present at the SBD transceiver input. Fig. 3.14 quantifies this with system simulations utilizing a third-order compressive amplifier model. In order to keep the eye-height degradation below 10%, the R-gm input stages require a 1-dB compression input amplitude of 190-mV. Finally, mismatches in the R-gm input amplifier and the driver output termination result in an effective signal offset; Monte Carlo simulations show a resulting 3.6-$mV_{rms}$ residual error at the CTLE output. This effect is partially corrected by the offset-correction DACs in the subsequent samplers.
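The exact form of the compressive model is not given. A common choice, assumed here, is the memoryless cubic $y = a_1 x - a_3 x^3$, whose fundamental gain for a sinusoid of amplitude $A$ is $a_1 - \tfrac{3}{4} a_3 A^2$. The sketch below picks $a_3$ so that the 1-dB compression point sits at the required 190-mV and then evaluates the gain at the 200-mV worst-case amplitude.

```python
import math


def a3_for_p1db(a1, a_1db):
    """Cubic coefficient placing the 1-dB compression point at a_1db
    for the memoryless model y = a1*x - a3*x**3 (assumed model)."""
    # Fundamental gain a1 - 0.75*a3*A**2 must equal a1 * 10**(-1/20) at A = a_1db.
    return (1 - 10 ** (-1 / 20)) * a1 / (0.75 * a_1db ** 2)


def fundamental_gain_db(a1, a3, amp):
    """Fundamental gain (dB, relative to small-signal a1) for a sinusoid of amplitude amp."""
    return 20 * math.log10((a1 - 0.75 * a3 * amp ** 2) / a1)


a1 = 1.0
a3 = a3_for_p1db(a1, 0.19)
for amp in (0.10, 0.19, 0.20):
    print(amp, round(fundamental_gain_db(a1, a3, amp), 3))
```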



Figure 3.14: Normalized eye height versus R-gm hybrid 1-dB compression input amplitude. Reprinted with permission from [16].

The CTLE output serves as the input to five quarter-rate samplers, followed by SR latches, that produce four parallel received data bits and one edge sample for the 5/4X CDA phase detector. All the dynamic samplers [12] have offset correction, implemented with a differential pair controlled by a 7-bit voltage DAC placed in parallel with the input stage. This offset-correction voltage DAC is realized by steering current into two matched resistor strings to generate a differential voltage with the same nominal common mode as the main sampler input pair. The offset correction in this work is performed manually at start-up by adjusting the correction code to produce an equal probability of 1s and 0s at a given sampler's output when the RX input is set to the common-mode value. While not implemented in this prototype, if the data have uniform statistics, the offset correction could also be performed automatically in the background by adding an accumulator that monitors the sampler's output and drives the correction code. The samplers' differential kickback is limited to a simulated 0.5-$mV_{pp}$ by utilizing small input transistors and symmetric layout techniques.
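The accumulator-driven correction described above (not implemented in this prototype) can be sketched as follows. The model is an assumption: the sampler sees only its input-referred offset plus Gaussian noise, so an equal count of 1s and 0s indicates a cancelled offset, and the offset DAC steps by 1 LSB whenever the 1s-minus-0s count crosses a threshold. The LSB size, threshold, and noise level are hypothetical.

```python
import numpy as np


def background_offset_cal(offset_mv, dac_lsb_mv=1.0, n_bits=7, acc_threshold=32,
                          n_samples=50_000, noise_mv=2.0, seed=1):
    """Hypothetical accumulator loop driving a signed offset-correction DAC code."""
    rng = np.random.default_rng(seed)
    code = 0                                  # signed DAC code, mid-range at 0
    lim = 2 ** (n_bits - 1) - 1
    acc = 0
    for _ in range(n_samples):
        # Input at common mode: the decision depends on residual offset plus noise.
        v = offset_mv - code * dac_lsb_mv + rng.normal(0.0, noise_mv)
        acc += 1 if v > 0 else -1             # count 1s minus 0s
        if abs(acc) >= acc_threshold:
            code = int(np.clip(code + np.sign(acc), -lim, lim))  # 1-LSB correction step
            acc = 0
    return code


print(background_offset_cal(10.3))  # settles near 10-11 LSB for a 10.3-mV offset
```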



Figure 3.15: 5/4X phase tracking CDA system. Reprinted with permission from [16].



Figure 3.16: Receiver of the clock channel. Reprinted with permission from [16].

The 5/4X phase-tracking CDA system shown in Fig. 3.15 sets the samplers' timing position in a manner that is robust to voltage and temperature variations. A quarter-rate clock, which is either the differential input clock at the master side or the clock signal forwarded to the slave side using the same voltage-mode driver as the data signals, is injected into an ILO [27] via ac-coupled buffers to create IQ phases that are calibrated by a low-overhead quadrature-locked loop (QLL) [29]. The receiver of the clock channel is shown in Fig. 3.16. These IQ phases feed four parallel PIs controlled by the 5/4X first-order phase-tracking CDA system, which saves power by reducing the number of samplers from eight to five [24]. An oversampling clock generator, consisting of matched-delay buffers and inverter pairs statically interpolating between the quadrature phases, produces the equally spaced data and edge clocks for the samplers. Phase errors are compensated with tunable buffers at the output of this block. As shown in the timing diagram of Fig. 3.17, the CDA logic rotationally selects two consecutive data samples via a 4:2 MUX and the corresponding edge clock to produce the edge sample via a 4:1 MUX. As shown in Fig. 3.18, these data and edge samples are passed through a bang-bang phase detector whose output is further deserialized and passed to a digital accumulator with programmable depth, realizing a first-order phase-tracking loop [30]. While the maximum target bandwidth is 1-MHz with the CDA system continuously running, the loop can be periodically disabled to save power and still track low-frequency temperature drifts.
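The edge-rotating loop can be sketched behaviorally. Each quarter-rate cycle yields four data samples but only one edge sample, whose boundary rotates through the four bit positions; a bang-bang decision on transitions feeds an accumulator that steps the PI code when its threshold is reached. The timing model (a data-transition instant `tau_ui` with Gaussian jitter) and all parameters are assumptions; the 1/32-UI step follows from 128 PI codes spanning one 4-UI quarter-rate period.

```python
import numpy as np


def cda_5_4x(tau_ui, n_cycles=20_000, acc_depth=4, pi_steps_per_ui=32,
             jitter_ui=0.02, seed=2):
    """Behavioral 5/4X CDA: returns the locked edge-clock phase in UI."""
    rng = np.random.default_rng(seed)
    code, acc = 0, 0                       # PI code (in 1/32-UI steps) and accumulator
    threshold = 2 ** acc_depth
    for cycle in range(n_cycles):
        d = rng.choice([-1, 1], size=5)    # 4 bits of this cycle plus the next bit
        i = cycle % 4                      # edge position rotates over the 4 boundaries
        d0, d1 = d[i], d[i + 1]            # 4:2 MUX: two consecutive data samples
        if d0 == d1:
            continue                       # no transition, no phase information
        phase = code / pi_steps_per_ui
        crossing = tau_ui + rng.normal(0.0, jitter_ui)
        edge = d0 if phase < crossing else d1   # edge sample taken before/after the crossing
        acc += 1 if edge == d0 else -1     # early -> +1 (delay clock), late -> -1
        if abs(acc) >= threshold:          # first-order loop: step PI by 1 code
            code += 1 if acc > 0 else -1
            acc = 0
    return code / pi_steps_per_ui


print(cda_5_4x(0.3))  # locks within about one PI step of 0.3 UI
```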



Figure 3.17: Timing diagram of data and edge phases. Reprinted with permission from [16].



Figure 3.18: Edge rotating 5/4X CDA. Reprinted with permission from [16].

The digital FSM controls the PIs with four independent 7-bit registers to provide appropriately deskewed clocks to the samplers. The two MSBs select the input clocks to the CMOS PIs, while the remaining five bits, which control the phase position, are converted to thermometer code for improved linearity over the 90° phase range. Hence, the complete 7-bit code covers a full 360° phase rotation.
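The code mapping can be illustrated with a short decoder. It assumes that the 5-bit fine field is expanded to 31 thermometer lines and that each of the 128 codes corresponds to an equal phase step of 360°/128; both are our assumptions about the implementation.

```python
def decode_pi_code(code):
    """Split a 7-bit PI code into quadrant select (2 MSBs) and thermometer-coded fine bits."""
    if not 0 <= code < 128:
        raise ValueError("PI code must be 7 bits")
    quadrant = code >> 5                           # selects the adjacent quadrature clock pair
    fine = code & 0x1F                             # position inside the 90-degree range
    thermometer = [1] * fine + [0] * (31 - fine)   # 5-bit binary -> 31 thermometer lines
    phase_deg = code * 360 / 128                   # nominal interpolated phase
    return quadrant, thermometer, phase_deg


for c in (0, 31, 32, 127):
    q, therm, ph = decode_pi_code(c)
    print(c, q, sum(therm), ph)
```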

### 3.5 Adaptive Echo Cancellation

As discussed earlier in this chapter, echoes from the transmitted signals cause significant BER margin reduction in SBD mode. Given that the NE and FE echoes vary with different packaging approaches and channel lengths, adaptive tuning of the EC FIR filter taps is necessary to support operation over a range of channels. To achieve this, Fig. 3.19, Fig. 3.20, and Fig. 3.21 show the foreground tap adaptation scheme, in which the transmitters on the two sides of the channel operate in UD mode and alternate between transmitting normal data and being configured as a static termination network.



Figure 3.19: UD mode. Reprinted with permission from [16].



Figure 3.20: SBD mode. Reprinted with permission from [16].

| State         | Master TX | Master CDA | Master EC Adaptation | Slave TX | Slave CDA | Slave EC Adaptation |
|---------------|-----------|------------|----------------------|----------|-----------|---------------------|
| State 1 (UD)  | on        | off        | off                  | term     | on        | off                 |
| State 2 (UD)  | term      | on         | off                  | on       | off       | on                  |
| State 3 (UD)  | on        | off        | on                   | term     | off       | off                 |
| State 4 (SBD) | on        | on         | off                  | on       | on        | off                 |

Figure 3.21: Foreground echo cancellation tap adaptation state sequence. Reprinted with permission from [16].

Since the EC tap values are a function of the receiver sampling position, the CDA systems on both sides must first lock. State 1 (shown in Fig. 3.22) of the foreground scheme sets the nominal sampling position for the right-side slave receiver: the left-side master driver delivers a random data signal onto the channel while the right-side driver is set in a termination configuration. Since the right-side driver is quiet, the received data are free of echoes and the CDA system can lock at the nominal sampling point. State 1 ends when the mean values of the CDA PI codes stabilize, which occurs in under 1-$\mu$s. State 2 (shown in Fig. 3.23) then repeats this operation with the opposite transmitter configuration to set the nominal sampling position for the left-side receiver. In addition, since the left-side transmitter is set in a termination configuration and is not transmitting data, the right-side receiver receives only its own echo signals. The right-side CDA system is frozen, and these echoes are sampled as error signals to drive the SSLMS adaptation of the right-side EC tap values, with the NE tap positions fixed at the main and first post-cursor positions and the FE tap positions manually set for a given channel's round-trip delay. These FE tap positions are estimated from the transmission-line trace length and board dielectric properties, which provides sufficient accuracy relative to the minimum 62.5-ps bit period.
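The FE tap-position estimate can be sketched from first principles: the round-trip delay is $2L\sqrt{\varepsilon_{eff}}/c_0$, expressed in 62.5-ps UIs. The effective permittivity of 4.0 used below is an assumed FR4 value, and the estimate covers only the trace itself, not package or connector delay.

```python
import math

C0 = 299_792_458.0   # speed of light in vacuum (m/s)
INCH = 0.0254        # metres per inch


def roundtrip_delay_ui(length_in, eps_eff, bit_period_s=62.5e-12):
    """Round-trip delay of a PCB trace in UIs, from trace length and effective permittivity."""
    one_way_s = length_in * INCH * math.sqrt(eps_eff) / C0
    return 2.0 * one_way_s / bit_period_s


for length in (2.0, 6.0):
    print(length, round(roundtrip_delay_ui(length, eps_eff=4.0), 1))
```

Because one UI corresponds to several millimetres of trace, a coarse estimate of this kind is enough to place the four-tap FE window.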



Figure 3.22: State 1 and 3. Reprinted with permission from [16].

Utilizing separate digital supplies for the EC adaptation and CDA digital logic allows the forwarded-clock system to maintain the nominally correct sampling position while the right-side CDA system is frozen. State 2 ends when the mean values of the EC taps stabilize, which occurs in under 20-$\mu$s. State 3 (shown in Fig. 3.22) applies the opposite transmitter configuration to repeat this EC tap adaptation for the left side. This state also ends when the mean values of the adapted taps stabilize, which occurs in under 20-$\mu$s. Finally, in State 4 (SBD mode, shown in Fig. 3.20), the EC taps are frozen and both sides are configured in SBD mode with simultaneous data transmission from both sides.

Figure 3.23: State 2. Reprinted with permission from [16].



Figure 3.24: Echo cancellation tap adaptation hardware. Reprinted with permission from [16].

More details of the SSLMS adaptation hardware are shown in Fig. 3.24 and Fig. 3.25. In UD mode, the echoes that are sensed by the R-gm network and propagate through the CTLE are sampled as error signals by the normal quarter-rate data samplers. These error samples are further deserialized for input to the synthesized digital SSLMS block operating at 1-GHz. This adaptation block employs a programmable 7-bit accumulator depth to update the two NE and four FE EC tap current values, which subtract the echo signals through current summation at the CTLE load. The update rate is chosen to trade off adaptation time against convergence stability. Each tap update is always 1 LSB of the EC tap DAC, but an update occurs only when the programmable accumulator threshold is exceeded. A nominal 6-bit accumulator depth is utilized to allow convergence over both the 2- and 6-in SBD channels, as detailed in Section 3.6.

Figure 3.25: SSLMS hardware. Reprinted with permission from [16].
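The adaptation loop can be sketched as follows. Because the far-end transmitter is terminated, each sampler decision is the sign of the residual echo, and each tap accumulates sign(error) × sign(data) votes before moving by 1 LSB. The echo sizes and tap delays are hypothetical, and interpreting the 6-bit "depth" as a ±2^6 vote threshold is our assumption.

```python
import numpy as np


def sslms_adapt(echo_codes, tap_delays, n_samples=30_000, acc_depth=6, seed=0):
    """Sign-sign LMS adaptation of EC tap codes with per-tap vote accumulators.

    echo_codes: true echo amplitude at each tap position, in EC-DAC LSBs (hypothetical).
    tap_delays: tap positions in UIs relative to the transmitted bit.
    """
    rng = np.random.default_rng(seed)
    h = np.asarray(echo_codes, dtype=int)
    delays = np.asarray(tap_delays)
    span = int(delays.max())
    bits = rng.choice([-1, 1], size=n_samples + span)
    w = np.zeros(len(h), dtype=int)        # tap codes start at mid-range
    acc = np.zeros(len(h), dtype=int)
    threshold = 2 ** acc_depth
    for n in range(span, n_samples + span):
        d = bits[n - delays]               # transmitted bit seen by each tap
        err = int(h @ d - w @ d)           # residual echo (far end terminated, no data)
        acc += int(np.sign(err)) * d       # sign(error) x sign(data) votes
        hit = np.abs(acc) >= threshold
        w[hit] += np.sign(acc[hit])        # 1-LSB update only when the threshold is reached
        acc[hit] = 0
    return w


echo = [20, 6, 4, -3, 2, 1]                # hypothetical NE (2) and FE (4) echo sizes, LSBs
taps = [0, 1, 28, 29, 30, 31]              # NE at main/post-cursor, FE after a 28-UI round trip
print(sslms_adapt(echo, taps))
```

A larger accumulator depth averages more decisions per update, which slows adaptation but reduces dithering around the converged codes.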

### 3.6 Experimental Results

Fig. 3.26 shows the micrograph of the SBD transceiver prototype chip, which was fabricated in a 28-nm CMOS process. The chip contains one clock lane and two data lanes, which are placed close to the pads. Each data lane occupies $0.09 \ mm^2$, while the common circuitry shared between the two data lanes has an area of $0.092 \ mm^2$. The master-side differential quarter-rate clock enters from the left of the chip through a clock buffer and is routed to the IQ generator placed between the data lanes. The clock phases from the IQ generator are then routed to the lanes, where they are used in the serializers and the CDAs located above the data lanes. System timing margin testing is achieved by manually overriding the CDA PI codes and multiplexing the recovered data out of the chip at a 1/16 rate for BER testing. An on-chip digital-to-analog converter (DAC) is also included as an analog monitor of the convergence behavior of the EC tap coefficients.

Figure 3.26: Chip micrograph of the 32-Gb/s SBD transceiver. Reprinted with permission from [16].

The performance of the voltage-mode driver with the R-gm sensing resistor is first verified in UD mode with the setup shown in Fig. 3.27. A chip-on-board scheme is utilized, with the die directly wirebonded to the FR4 board and the transmitter driving 1.35-in differential traces. In addition to this inherent test-board channel, testing is performed with an additional channel board that has 2- and 6-in traces (see Fig. 3.28, Fig. 3.29, and Fig. 3.30). The voltage-mode driver has healthy 16-Gb/s eye margins when it is configured for a 200-$mV_{ppd}$ output swing and drives the inherent test-board channel. Operating with the additional 6-in channel completely closes this eye and requires further equalization with the receiver CTLE for adequate BER performance.

Figure 3.27: Voltage-mode driver test setup. Reprinted with permission from [16].



Figure 3.28: 16-Gb/s eye diagram with no channel. Reprinted with permission from [16].



Figure 3.29: 16-Gb/s eye diagram with 2" channel. Reprinted with permission from [16].



Figure 3.30: 16-Gb/s eye diagram with 6" channel. Reprinted with permission from [16].



Figure 3.31: SBD transceiver test setup and channel response. Reprinted with permission from [16].

SBD transceiver BER measurements at 32-Gb/s were performed using two chips simultaneously communicating over either 2- or 6-in FR4 PCB traces with 4.4- and 10.2-dB loss at 8-GHz, respectively, as shown in Fig. 3.31. As discussed in Section 3.5, the EC taps are adapted in a foreground manner by activating the master and slave chips consecutively in UD transmission mode. Using the on-chip monitor DAC, Fig. 3.32 shows the convergence of the EC tap weights, starting from the middle of the signed binary 6-bit range, for the two channel cases. The $N_{0-1}$ weights cancel NE echoes at the main and first post-cursor positions, while the $F_{0-3}$ weights cancel FE echoes that experience a round-trip delay of 9 and 28-UIs for the 2- and 6-in channels, respectively. After convergence of the EC taps, which occurs in under 20-$\mu$s for both channels, the codes are frozen for SBD communication mode. The converged EC tap values indicate that the echoes are stronger for the shorter 2-in channel, where the reflections are less attenuated due to the lower loss.



Figure 3.32: Measured convergence of the EC tap weights for the 2" and 6" channels. Reprinted with permission from [16].

As shown in the measured BER bathtub curves in Fig. 3.33, adaptive EC is a necessity for SBD signaling. While healthy $BER < 10^{-12}$ margins are present for both channels in 16-Gb/s UD mode, BER floors near $10^{-7}$ and $10^{-8}$ are present in 32-Gb/s SBD mode without EC for the 2- and 6-in channels, respectively. The BER without EC is actually worse for the 2-in channel due to its less attenuated reflections. Enabling the adaptive EC allows for a $BER < 10^{-12}$ for both channels. By operating the CDA system in open loop and sweeping the PI codes, timing margins of 0.375 and 0.0625-UI are achieved for the 2- and 6-in channels, respectively. Voltage margins of 8 and 2-mV are achieved for the 2- and 6-in channels, respectively, by operating the CDA system in a locked position and sweeping the sampler offset control. With the echoes largely canceled, the 6-in channel has reduced margins due to more residual ISI.


Figure 3.33: Measured 16-Gb/s UD and 32-Gb/s SBD timing and voltage bathtub curves operating over the (a) 2" and (b) 6" channels. Reprinted with permission from [16].

Jitter tolerance (JTOL) measurements were performed using the 6-in channel with the CDA system enabled. Fig. 3.34 shows the JTOL test setup, where a quarter-rate stressed clock is input to the left master chip, which forwards the 4-GHz clock to the right slave chip. The source-synchronous system allows for very high-frequency jitter tolerance, with 0.2 and 0.1-$UI_{pp}$ achieved with 150-MHz sinusoidal jitter in the 16-Gb/s UD and 32-Gb/s SBD modes, respectively. Note that 150-MHz was the upper jitter-frequency bound of the test equipment, and the JTOL performance is expected to degrade further at higher sinusoidal jitter frequencies. Given the round-trip channel skew between the master-side receiver and the forwarded clock used for transmission at the slave side, the master side is expected to have degraded high-frequency jitter tolerance. At 32-Gb/s SBD operation, this clocking architecture should provide master-side jitter tracking benefits for jitter frequencies up to 280 and 90-MHz for the 2- and 6-in channels, respectively [31]. Further improvements are possible by modifying the architecture to forward clocks from both chips. This would also allow support of ppm frequency differences between the two chips due to reference crystal offsets and spread-spectrum clocking.



Figure 3.34: SBD transceiver jitter tolerance testing. (a) Setup. (b) Measured results. Reprinted with permission from [16].

| References                 | [9]                              | [23]                                               | [8]                                      | [6]                                      | [10]                                | [32]                                 | [33]                                      | This Work                              |
|----------------------------|----------------------------------|----------------------------------------------------|------------------------------------------|------------------------------------------|-------------------------------------|--------------------------------------|-------------------------------------------|----------------------------------------|
| Technology                 | 0.18-um CMOS                     | 0.35-um CMOS                                       | 0.18-um CMOS                             | 0.18-um CMOS                             | 0.11-um CMOS                        | 0.18-um CMOS                         | 0.13-um BiCMOS                            | 28-nm CMOS                             |
| Throughput                 | 6.4-Gb/s/pin                     | 8-Gb/s/pin                                         | 5-Gb/s/pair                              | 8-Gb/s/pair                              | 20-Gb/s/pair                        | 4-Gb/s/pair                          | 24.3-Gb/s/pin                             | 32-Gb/s/pair                           |
| Data Rate                  | 3.2-Gb/s                         | 4-Gb/s                                             | 2.5-Gb/s                                 | 4-Gb/s                                   | 10-Gb/s                             | 2-Gb/s                               | 24-Gb/s<br>0.3125-Gb/s                    | 16-Gb/s                                |
| Architecture               | TX: VM<br>Sub: Reference selection | TX: VM<br>Sub: Bidirectional reference & Replica driver | TX: VM<br>Sub: SC based & Replica driver | TX: CM<br>Sub: SC based & Replica driver | TX: CM<br>Sub: R-gm hybrid          | TX: CM<br>Sub: DIB uruuc hybrid      | TX: VM/CM<br>Sub: Filter/Replica driver   | TX: VM<br>Sub: R-gm hybrid             |
| Equalization               | No                               | No                                                 | No                                       | 2-tap FFE @ TX                           | 2-tap FFE @ TX<br>CTLE @ RX         | No                                   | CTLE @ RX                                 | CTLE @ RX                              |
| Echo Cancellation          | No                               | No                                                 | No                                       | No                                       | No                                  | No                                   | No                                        | 2 NE taps<br>4 FE taps                 |
| Total Loss @ Nyquist       | -                                | -                                                  | -                                        | 0.6dB                                    | 5dB                                 | -                                    | 18.3dB                                    | 4.4dB,<br>10.2dB                       |
| Area $(mm^2)$              | -                                | 0.11                                               | 0.9                                      | 0.13                                     | 1.02                                | 0.0013                               | 0.23                                      | 0.182                                  |
| Supply                     | 1.75V                            | 3.3V                                               | 1.8V                                     | -                                        | 1.2V                                | 1.8V                                 | 2.5V                                      | V0.0                                   |
| Power (Single Side)        | -                                | TX: 96-mW<br>RX: 16-mW                             | Transceiver: 120-mW                      | Transceiver: 158-mW                      | TX: 126-mW<br>RX: 140-mW            | TX driver and hybrid: 1.9-mW         | Forward TX/RX: 456-mW                     | Transceiver: 29.2-mW                   |
| Power Efficiency (mW/Gb/s) | -                                | 28                                                 | 48                                       | 39.5                                     | 26.6                                | 0.95                                 | 18.76                                     | 1.83                                   |

Table 3.1: SBD transceiver performance comparisons. Reprinted with permission from [16].

Fig. 3.35 shows the 32-Gb/s SBD transceiver power breakdown. The transceiver consumes 29.2-mW, with the EC circuitry accounting for 21% of the total power. Table 3.1 summarizes the SBD transceiver performance and compares it with other SBD transceivers operating above 4-Gb/s. Utilizing the adaptive EC in combination with the CTLE allows for support of the highest 10.2-dB channel loss with symmetric data rates, while employing the efficient low-swing voltage-mode driver allows for excellent power efficiency at the highest 32-Gb/s data rate.



Figure 3.35: 32-Gb/s SBD transceiver power breakdown. Reprinted with permission from [16].

While the presented 28-nm transceiver circuits can operate at nearly double the achieved 16-Gb/s unidirectional data rate with some degradation in power efficiency, what ultimately limits significant data-rate scaling in SBD mode is the channel. Adding improved passive matching structures, such as T-coils, will allow for operation at higher data rates due to the improved return loss. However, a more complex architecture is necessary in order to support channels with higher insertion loss and more complex echo characteristics due to impedance discontinuities along the channel.

One option is an ADC-based receiver that digitizes the total signal at the chip interface and performs signal separation, EC with multiple banks of FIR filters placed at roaming locations to account for echoes from various points along the channel, and equalization with FFE and DFE. This ADC-based architecture could potentially scale to above 100-Gb/s per channel in SBD mode, with each side operating at a moderate 50-Gb/s data rate.

### 3.7 Conclusion

This chapter has presented a multi-lane 32-Gb/s source-synchronous SBD transceiver implemented in 28-nm CMOS. A low-power design is realized by utilizing an efficient voltage-mode driver with an R-gm hybrid for signal separation, combining the CTLE and EC in a single stage, and employing a low-complexity 5/4X CDA system. Support of a wide range of channels is possible with foreground adaptation of the EC FIR filter taps with an SSLMS algorithm. This transceiver allows for an effective doubling of the per-pin bandwidth in SBD mode and also has the flexibility to support unidirectional mode transmission for varying bandwidth demands.

# 4. OPTICAL TRANSMITTER<sup>2 3</sup>

### 4.1 A Directly Modulated Quantum Dot Microring Laser Transmitter with Integrated CMOS Driver

#### 4.1.1 Introduction

Data centers and high-performance computing are major driving forces behind the increasing bandwidth and volume demand for high-speed optical interconnects. With volume CMOS foundries entering the silicon photonics market, the advantages in integration density and cost offered by a silicon integration platform will soon stand out. Among all the key photonic components on silicon, the laser source remains a critical factor that has a large impact on system power consumption, signal integrity, reliability, reach, and cost. Heterogeneously integrated III-V-on-silicon lasers are arguably the best overall solution for on-chip sources so far, due to their seamless integration with silicon photonic circuits and the potential to pass all qualification processes in volume production [34]. While traditional quantum-well (QW) structures have served as the primary active region design, we recently demonstrated several quantum-dot (QD)-based heterogeneous lasers [35, 36]. QD material offers superior optical gain stability at high temperature, large optical gain bandwidth, low threshold current density, and immunity to material defects and optical feedback. Directly modulated microring resonator lasers are our choice to build a low-$\lambda$-count wavelength-division multiplexing (WDM) optical interconnect because the device's inherent WDM functionality provides a compact transmitter solution [37, 38]. Employing QD material further reduces power consumption and enhances operational robustness.

<sup>2</sup>©2019 IEEE. Part of this chapter 4 is reprinted, with permission, from Y.-H. Fan, D. Liang, A. Roshan-Zamir, C. Zhang, B. Wang, M. Fiorentino, R. Beausoleil, and S. Palermo, "A Directly Modulated Quantum Dot Microring Laser Transmitter with Integrated CMOS Driver," *2019 Optical Fiber Communication Conference and Exhibition (OFC)*, San Diego, CA, USA, 2019, pp. 1-3.

<sup>3</sup>©2020 IEEE. Part of this chapter 4 is reprinted, with permission, from Y.-H. Fan, S. Srinivasan, Y. Hu, D. Liang, R. Liu, A. Kumar, E. Li, Z. Huang, R. Beausoleil, S. Palermo, "A 22 Gb/s Directly Modulated Optical Injection-Locked Quantum-Dot Microring Laser Transmitter with Integrated CMOS Driver," 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, May 2020.

This section presents, for the first time to the best of our knowledge, a heterogeneous QD photonic transmitter based on a microring laser that is directly modulated by a high-speed CMOS driver. The CMOS driver implements asymmetric 2-tap feedforward equalization (FFE) in the output driver to compensate for the nonlinear optical dynamics present with direct laser modulation. This significantly improves high-speed performance and yields an open 12-Gb/s eye diagram, which is presently the highest NRZ direct modulation speed of O-band QD lasers on silicon. A microring laser co-simulation model is also developed that matches the experimental data well and enables further optimization of the driver circuitry in future device design iterations.

#### 4.1.2 Quantum Dot Microring Laser



Figure 4.1: Quantum dot microring laser device: Schematic 3-D and cross section views. Reprinted with permission from [39]

As shown in the schematic views of Fig. 4.1, the microring laser consists of a GaAs-based epitaxial layer structure on a silicon-on-insulator (SOI) substrate, fabricated with a heterogeneous integration process. The 50-$\mu$m-diameter device has a curved Si bus waveguide that couples a fraction of the optical power out. Eight layers of InAs QDs in the active region provide enough optical gain to enable a 2-mA threshold at room temperature (Fig. 4.2), which is 5X lower than previously demonstrated QW counterparts [38]. Continuous-wave (CW) lasing in the O band is also observed up to 70 °C, with significant room for improvement. Although the 3-nm free spectral range (FSR) set by the cavity length is much smaller than the QD optical gain bandwidth, which is wide due to inhomogeneous broadening, a side mode suppression ratio (SMSR) of over 30-dB is achieved (Fig. 4.3). Smaller dimensions and further optimized designs in the future can increase the FSR and SMSR in favor of a WDM transmitter design.
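The FSR scaling mentioned above can be made concrete with the standard ring-resonator relation $\text{FSR} = \lambda^2/(n_g L)$, where $L = \pi D$ is the ring circumference. The sketch below is illustrative only: it assumes a 1310-nm operating wavelength (the text states only "O band"), infers the group index $n_g$ implied by the reported 3-nm FSR of the 50-$\mu$m device, and assumes $n_g$ stays constant for hypothetical smaller diameters.

```python
import math

# Assumed operating wavelength: 1310 nm (the text only says "O band").
WAVELENGTH_M = 1.31e-6


def ring_circumference(diameter_m: float) -> float:
    """Round-trip cavity length of a ring with the given diameter."""
    return math.pi * diameter_m


def group_index_from_fsr(fsr_m: float, diameter_m: float,
                         wavelength_m: float = WAVELENGTH_M) -> float:
    """Invert FSR = lambda^2 / (n_g * L) to obtain the group index n_g."""
    return wavelength_m ** 2 / (fsr_m * ring_circumference(diameter_m))


def fsr_from_diameter(diameter_m: float, n_g: float,
                      wavelength_m: float = WAVELENGTH_M) -> float:
    """Free spectral range (in meters) of a ring with group index n_g."""
    return wavelength_m ** 2 / (n_g * ring_circumference(diameter_m))


# Group index implied by the reported 3-nm FSR of the 50-um device.
n_g = group_index_from_fsr(3e-9, 50e-6)

# Hypothetical smaller rings, assuming n_g does not change with diameter.
for d_um in (50, 35, 25):
    fsr_nm = fsr_from_diameter(d_um * 1e-6, n_g) * 1e9
    print(f"D = {d_um} um -> FSR = {fsr_nm:.2f} nm")
```

Under this assumption the FSR scales inversely with diameter, so halving the diameter to 25 $\mu$m doubles the FSR to about 6 nm; in an actual redesign, $n_g$ and the bending loss would also change.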



Figure 4.2: Quantum dot microring laser device: CW LIV characteristic. Reprinted with permission from [39]

Figure 4.3: Quantum dot microring laser device: laser spectrum at room temperature. Reprinted with permission from [39]

Accurate CMOS driver design is possible with a QD microring laser model that includes both electrical equivalent-circuit and optical-dynamics components (Fig. 4.4). The model's electrical part describes the laser junction, represented by $C_j$ and $R_j$, an equivalent series resistance $R_s$, and the parasitic capacitance $C_p$ between the P and N pads. Optical dynamics are captured with a junction-current-dependent RLC equivalent circuit ($R_{VL}$, $L_{VL}$, and $C_{VL}$) that is driven by a current-controlled voltage source $\eta(I_{R_j} - I_{th})$. The elements $L_{VL}$ and $R_{VL}$ are given by [17]

$$L_{VL} = \frac{1}{4\pi^2 C_{VL} D^2 (I_{R_j} - I_{th})}, \qquad R_{VL} = \left(K f_r^2 + \gamma_0\right) L_{VL},$$

where $f_r = D\sqrt{I_{R_j} - I_{th}}$ is the relaxation oscillation frequency, $D$ is the D-factor, $K$ is the K-factor, and $\gamma_0$ is the damping offset. Measured $S_{11}$ curves are used to determine the electrical parameters by curve fitting the low-pass RC electrical model. The optical parameters are then extracted by curve fitting the measured $S_{21}$ curves (Fig. 4.5). This complete QD microring laser model is employed to design asymmetric FIR filters that compensate for the QD laser bandwidth limitation at a given bias current.
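A minimal numerical sketch of the optical-dynamics branch is shown below. It evaluates $L_{VL}$ and $R_{VL}$ from the expressions above, treats the voltage across $C_{VL}$ as proportional to the optical output of a series RLC driven by the controlled source, and sweeps $H(f) = 1/(1 - \omega^2 L_{VL} C_{VL} + j\omega R_{VL} C_{VL})$ to estimate the intrinsic 3-dB bandwidth. All parameter values ($D$, $K$, $\gamma_0$, $I_{th}$, $C_{VL}$) are hypothetical placeholders, not the extracted values of Fig. 4.4, and the electrical parasitics ($R_s$, $R_j$, $C_j$, $C_p$) are omitted.

```python
import math


def laser_rlc(i_ma, i_th_ma, d_hz_per_sqrt_ma, k_s, gamma0_per_s, c_vl_f):
    """Return (f_r, L_VL, R_VL) for the optical-dynamics RLC branch.

    f_r  = D * sqrt(I - I_th)
    L_VL = 1 / (4 pi^2 C_VL D^2 (I - I_th))
    R_VL = (K f_r^2 + gamma_0) * L_VL
    """
    overdrive = i_ma - i_th_ma
    if overdrive <= 0:
        raise ValueError("bias current must exceed threshold")
    f_r = d_hz_per_sqrt_ma * math.sqrt(overdrive)
    l_vl = 1.0 / (4 * math.pi ** 2 * c_vl_f * d_hz_per_sqrt_ma ** 2 * overdrive)
    r_vl = (k_s * f_r ** 2 + gamma0_per_s) * l_vl
    return f_r, l_vl, r_vl


def optical_response(f_hz, l_vl, r_vl, c_vl_f):
    """Series-RLC transfer function: voltage on C_VL over source voltage."""
    w = 2 * math.pi * f_hz
    return 1.0 / complex(1 - w ** 2 * l_vl * c_vl_f, w * r_vl * c_vl_f)


def bandwidth_3db(l_vl, r_vl, c_vl_f, f_max_hz=50e9, step_hz=1e6):
    """Highest swept frequency where |H|^2 is still >= 0.5 (H(0) = 1)."""
    f_3db = 0.0
    for n in range(1, int(f_max_hz / step_hz) + 1):
        f = n * step_hz
        if abs(optical_response(f, l_vl, r_vl, c_vl_f)) ** 2 >= 0.5:
            f_3db = f
    return f_3db


# Hypothetical parameters (NOT the extracted values of Fig. 4.4).
I_BIAS_MA, I_TH_MA = 15.0, 2.0
D_HZ = 0.7e9      # D-factor, Hz per sqrt(mA)
K_S = 1e-9        # K-factor, s
GAMMA0 = 5e9      # damping offset, 1/s
C_VL = 1e-12      # arbitrary: only the L*C and R*C products matter

f_r, L_VL, R_VL = laser_rlc(I_BIAS_MA, I_TH_MA, D_HZ, K_S, GAMMA0, C_VL)
print(f"f_r = {f_r / 1e9:.2f} GHz, "
      f"f_3dB = {bandwidth_3db(L_VL, R_VL, C_VL) / 1e9:.2f} GHz")
```

Because $L_{VL} C_{VL} = 1/(2\pi f_r)^2$, the resonance lands at $f_r$ regardless of the chosen $C_{VL}$, and the magnitude there is $2\pi f_r/(K f_r^2 + \gamma_0)$, so a larger damping term flattens the peaking.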



Figure 4.4: The QD microring laser model and extracted parameters. Reprinted with permission from [39]



Figure 4.5: $S_{21}$ curve-fitting results at 15-mA and 22-mA. Reprinted with permission from [39]



Figure 4.6: QD microring laser transmitter block diagram. Reprinted with permission from [39]

#### 4.1.3 Driver Architecture

The CMOS half-rate QD microring laser transmitter architecture is shown in Fig. 4.6 [36]. This transmitter directly modulates the QD microring laser with a high-speed output driver consisting of three parallel differential stages that steer current between the laser and a dummy diode-connected stacked-transistor load. In addition, a DC bias current source sets the minimum current that flows through the laser. All of the high-speed tail current sources and the DC bias current are programmable with 4-bit resolution. The laser's bias current, which has a maximum setting of 20-mA, is optimized for a given bandwidth based on the developed QD model. This bandwidth is extended with the 2-tap asymmetric FFE, which is realized with the three parallel differential current-steering stages. The main-cursor data bit drives the left-most differential pair, which has a 40-mA maximum current setting. To compensate for the laser's asymmetric rising- and falling-edge responses, the polarity of the first post-cursor bit determines whether the rising- or falling-edge differential pair steers current through the laser. While these two pairs both have a 10-mA maximum setting, they have independent weight controls and can be independently configured to yield either a high-pass or low-pass transient response. The transmitted data is generated on-die by a PRBS generator that provides 8 parallel bits, which are serialized to full rate with three stages of 2:1 muxes. At the 4-bit parallel stage, the first post-cursor bit is separated from the main path to allow adequate timing margin for the asymmetric FFE logic.
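The PRBS generation and 8:1 serialization described above can be modeled at the bit level as follows. The PRBS7 polynomial, the seed, and the lane-to-mux wiring are assumptions, since the text does not specify them. With the wiring chosen here, interleaving lanes (i, i+4), then (i, i+2), then (0, 1) restores the original bit order, and the first post-cursor stream used by the asymmetric FFE logic is simply the serial stream delayed by one UI.

```python
def prbs7(n_bits, seed=0x7F):
    """PRBS7 bits from a 7-bit Fibonacci LFSR with taps at stages 7 and 6."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def to_parallel(bits, width=8):
    """Group a serial stream into width-bit words; lane i holds bit width*m + i."""
    usable = len(bits) - len(bits) % width
    return [bits[m:m + width] for m in range(0, usable, width)]


def mux2(first, second):
    """2:1 mux: interleave two half-rate streams, 'first' in the earlier slot."""
    out = []
    for a, b in zip(first, second):
        out.extend((a, b))
    return out


def serialize_8_to_1(words):
    """Three cascaded 2:1 stages: 8 lanes -> 4 -> 2 -> 1 (assumed wiring)."""
    lanes = [[w[i] for w in words] for i in range(8)]          # 1/8-rate lanes
    lanes = [mux2(lanes[i], lanes[i + 4]) for i in range(4)]   # 8:4 stage
    lanes = [mux2(lanes[i], lanes[i + 2]) for i in range(2)]   # 4:2 stage
    return mux2(lanes[0], lanes[1])                            # 2:1 stage


def post_cursor(serial, prev_bit=0):
    """First post-cursor stream: the serial data delayed by one UI."""
    return [prev_bit] + serial[:-1]


serial = prbs7(127 * 8)
words = to_parallel(serial)
print(serialize_8_to_1(words) == serial)
```

Checking that the mux tree reproduces the generator's serial order is a convenient way to validate a chosen lane wiring before committing it to layout.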


#### 4.1.4 Simulation and Measurement Results

Figure 4.7: Hybrid-integrated QD microring laser transmitter prototype. Reprinted with permission from [39]

The driver is fabricated in a 65-nm CMOS process and uses a 1.2-V supply. As shown in Fig. 4.7, a hybrid chip-on-board integration approach places the QD microring laser and CMOS driver dies in close proximity to minimize the wire-bond lengths between the chips. Using the developed QD microring laser co-simulation model, Fig. 4.8 shows the simulated 12-Gb/s optical eye diagrams before and after enabling the FFE. Activating the asymmetric 2-tap FFE opens a previously closed eye and significantly improves performance. Similar behavior is observed in the measured 12-Gb/s eye diagrams shown in Fig. 4.8. Overall, the co-simulation model shows very good agreement with the experimental results.



Figure 4.8: Simulated/measured optical eye diagrams at 12-Gb/s before applying asymmetric FFE (top) and after applying asymmetric FFE (bottom). Reprinted with permission from [39]

#### 4.1.5 Conclusion

The first hybrid-integrated directly-modulated QD microring laser system is demonstrated at 12-Gb/s operation. Utilizing a QD microring laser model that accurately captures the photonic high-speed dynamics allows for the co-design of an advanced CMOS transmitter with 2-tap asymmetric FFE that improves modulation speed. This represents a milestone in demonstrating integration between advanced active silicon photonics and CMOS electronics for a variety of datacom applications.

### 4.2 A 22-Gb/s Directly Modulated Optical Injection-Locked QD Microring Laser Transmitter with Integrated CMOS Driver

#### 4.2.1 Introduction



Figure 4.9: Optical injection-locked quantum-dot microring laser with CMOS driver. Reprinted with permission from Fan et al, IEEE Copyright 2020.

Data centers and high-performance computing are major driving forces behind the increasing bandwidth and volume demand for high-speed optical interconnects. With volume CMOS foundries entering the silicon photonics market, the advantages in integration density and cost offered by a silicon integration platform will soon stand out. Among all the key photonic components on silicon, the laser source remains a critical factor that has a large impact on system power consumption, signal integrity, reliability, reach, and cost. Heterogeneously integrated III-V-on-silicon lasers are arguably the best overall solution for on-chip sources so far, due to their seamless integration with silicon photonic circuits and the potential to pass all qualification processes in volume production [34]. While traditional quantum-well (QW) structures have served as the primary active region design, several quantum-dot (QD)-based heterogeneous lasers have recently been demonstrated [37, 39]. QD material offers superior optical gain stability at high temperature, large optical gain bandwidth, low threshold current density, and immunity to material defects and optical feedback [40]. Directly modulated microring resonator lasers are our choice to build a low-$\lambda$-count wavelength-division multiplexing (WDM) optical interconnect because the device's inherent WDM functionality provides a compact transmitter solution [37, 39]. Employing QD material further reduces power consumption and enhances operational robustness. Optical injection locking (OIL) can significantly enhance the modulation bandwidth of QD microring lasers, as demonstrated by a previous modulation bandwidth extension from 3-GHz to 19-GHz and data transmission at up to 18-Gb/s with a BER below $10^{-10}$ [38].

Additionally, if the master laser wavelength is tunable, it can lock the microring laser to any of its longitudinal modes within the QD gain bandwidth without any complex laser control. The locked mode shows increased optical power and a side-mode suppression ratio (SMSR) greater than 40-dB. As shown in Fig. 4.9, the bus-microring structure allows for easy OIL in a transmission configuration. This eliminates the circulator or isolator required in a conventional OIL setup and therefore has little to no impact on the size and cost of the transmitter.

This section presents, for the first time, a QD photonic transmitter based on an optical injection-locked microring laser heterogeneously integrated on silicon that is directly modulated by a high-speed CMOS driver. The CMOS driver implements asymmetric 2-tap feed-forward equalization (FFE) in the output driver to compensate for the nonlinear optical dynamics present with direct laser modulation. Together, the OIL and asymmetric 2-tap FFE techniques significantly improve high-speed performance, resulting in 22-Gb/s operation, which is presently the highest non-return-to-zero (NRZ) direct modulation speed of an O-band QD laser on silicon.

#### 4.2.2 QD Microring Laser Characterization

The QD microring lasers were fabricated in a heterogeneous integration platform [41]. A 3D schematic diagram in Fig. 4.9 shows the finished device structure. A commercial O-band tunable laser with 13-dBm maximum power served as the master laser; however, this platform also allows such a laser to be integrated onto the photonic die. The master laser output passed through a polarization controller before being launched, via a cleaved standard single-mode fiber (SMF-28), into one of the grating couplers at the ends of the bus waveguide. OIL to different longitudinal modes of the slave laser was achieved by tuning the wavelength of the master laser. The grating coupler loss was measured to be 11-dB, which was attributed to fabrication imperfections. A power cross-coupling coefficient of about 5% between the Si bus waveguide and the heterogeneous microring laser is obtained from simulation. Thus, we estimate that about -10.5-dBm (90-$\mu$W) of master laser power couples into the slave laser cavity. The slave laser's output, along with the transmitted injected light, is collected by another cleaved SMF-28 fiber. A 20-GHz lightwave component analyzer (LCA) is used to measure the small-signal response of the laser.
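The coupled-power estimate above can be checked with simple dB arithmetic. This sketch assumes the master laser runs at its 13-dBm maximum and that the polarization controller, fiber, and connectors add no loss; only the 13-dBm, 11-dB, and 5% figures come from the text.

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


MASTER_DBM = 13.0        # maximum master-laser power (from the text)
GRATING_LOSS_DB = 11.0   # measured grating-coupler loss (from the text)
RING_COUPLING = 0.05     # simulated bus-to-ring power coupling (from the text)

# Assumption: no other losses (polarization controller, fiber, splices).
bus_dbm = MASTER_DBM - GRATING_LOSS_DB
injected_dbm = bus_dbm + 10 * math.log10(RING_COUPLING)   # 5% is about -13 dB
injected_uw = dbm_to_mw(injected_dbm) * 1e3

# Coupling fraction that would give exactly the quoted -10.5 dBm.
coupling_for_quote = dbm_to_mw(-10.5) / dbm_to_mw(bus_dbm)

print(f"bus: {bus_dbm:.1f} dBm, injected: {injected_dbm:.2f} dBm "
      f"({injected_uw:.1f} uW), coupling for -10.5 dBm: {coupling_for_quote:.3f}")
```

The result, about -11-dBm (79-$\mu$W), is within 0.5 dB of the quoted -10.5-dBm (90-$\mu$W); the quoted value corresponds to a coupling of about 5.6%, consistent with the "about 5%" simulation figure.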

Fig. 4.10 shows the measured modulation frequency response with and without OIL for different bias currents. Without OIL, the modulation bandwidth was between 2.6-GHz and 4-GHz for bias currents from 5-mA to 25-mA. Further increasing the bias current to 30-mA reduced the bandwidth, primarily due to device self-heating. The photon-photon interaction from OIL produces a resonance peak at around 5 to 7-GHz that extends the 3-dB modulation bandwidth by more than threefold. An 18-GHz bandwidth was observed when biasing at 25-mA.

Figure 4.10: Direct modulation response without and with OIL at different DC bias current levels. The horizontal dotted line marks the response level 3-dB below the value at DC. Reprinted with permission from Fan et al, IEEE Copyright 2020.

#### 4.2.3 Driver Architecture

The microring laser converts the high-speed electrical modulation signal into optical power, as shown in Fig. 4.11. The optical dynamics differ between the rising and falling edges: because the optical response is current dependent, the rising edge is fast and underdamped with ringing, while the falling edge is slow and overdamped [37]. While a conventional symmetric FFE can compensate the falling edge with a proper high-pass filter, the same high-pass filter makes the rising edge sharper and increases its ringing. Hence, the FFE needs to control the rising and falling currents independently, with high-pass or low-pass behavior, to achieve a better eye diagram.

Figure 4.11: Microring laser non-linear optical dynamic behavior. Reprinted with permission from Fan et al, IEEE Copyright 2020.

Fig. 4.12 shows the proposed half-rate directly modulated microring laser transmitter architecture implemented in a 28-nm CMOS technology. This multi-channel transmitter can drive up to five lasers tuned to a desired WDM grid. All the high-speed blocks except the output current-mode (CM) driver use a 0.9-V supply. The CM driver operates from a higher 2-V supply to allow direct drive of the microring laser. The on-chip pseudorandom binary sequence (PRBS) generator provides 8-bit parallel data, which is followed by the serializer, the delay generator, and the asymmetric feed-forward equalizer (FFE) logic. The full-rate data/rise/fall signals drive three separate current-mode logic (CML) drivers to realize the asymmetric FFE. A half-rate clock is input to the chip, amplified by a global CML buffer, and distributed to multiple channels. The local clock buffer in each channel receives the differential clocks and converts them to rail-to-rail swing for the serializer circuits.

Figure 4.12: 5-channel driver prototype block diagram. Reprinted with permission from Fan et al, IEEE Copyright 2020.



Figure 4.13: QD microring laser transmitter block diagram. Reprinted with permission from Fan et al, IEEE Copyright 2020.

Fig. 4.13 presents the detailed block diagram of the QD microring laser CMOS driver.

The differential clocks are amplified by a local CML buffer and then converted to CMOS-level swing by a CML-to-CMOS converter. CMOS buffers with adjustable pull-up/pull-down strength allow correction of the clock duty cycle. The CMOS buffers drive the final 2:1 MUX and the clock dividers, which provide lower-rate clocks for the 4:2 MUX, 8:4 MUX, and PRBS generator. At the 4-bit parallel stage, the first post-cursor bit is separated from the main path to allow adequate timing margin for the asymmetric FFE logic. To compensate for the laser's asymmetric rising- and falling-edge responses, the polarity of the first post-cursor bit determines whether the rising- or falling-edge differential pair steers current through the laser. The rise/fall sign settings select whether high-pass or low-pass behavior is applied to the corresponding transitions.



Figure 4.14: CML output driver with asymmetric FFE. Reprinted with permission from Fan et al, IEEE Copyright 2020.

This transmitter directly modulates the QD microring laser with a high-speed output driver consisting of three parallel differential stages that steer current between the laser and a dummy diode-connected stacked-transistor load, as shown in Fig. 4.14. In addition, a DC bias current source sets the minimum current that flows through the laser. All of the high-speed tail current sources and the DC bias current are programmable with 4-bit resolution. The laser's bias current has a maximum setting of 20-mA and is optimized for a given bandwidth. This bandwidth is extended with the 2-tap asymmetric FFE, which is realized with the three parallel differential current-steering stages. The main-cursor data bit drives the left-most differential pair, which has a 40-mA maximum current setting. The rise and fall data drive the middle and right-most differential pairs, each with a 10-mA maximum setting. Depending on the rise/fall sign settings, these pairs can be independently configured to yield either a high-pass or a low-pass transient response.



Figure 4.15: Current profiles of (a) high pass and (b) low pass behaviors for the falling edge and (c) high pass and (d) low pass behavior for the rising edge. Reprinted with permission from Fan et al, IEEE Copyright 2020.

Fig. 4.15 shows the four current profiles of this asymmetric FFE driver. For the falling edge, high-pass (low-pass) transient behavior is obtained by outputting $I_{bias}$ ($I_{bias}+I_{fall}$) on the transition bit and $I_{bias}+I_{fall}$ ($I_{bias}$) on the non-transition bits, as shown in Fig. 4.15(a) and (b). For the rising edge, the high-pass and low-pass behaviors are similarly realized by combining $I_{bias}$, $I_{data}$, and $I_{rise}$, as shown in Fig. 4.15(c) and (d).
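The four current profiles can be captured in a small behavioral model, sketched below under stated assumptions. The falling-edge levels follow the description above; the rising-edge mapping (high pass adds $I_{rise}$ on the transition bit, low pass adds it on the following non-transition bits) is assumed by analogy, since the text only states that $I_{bias}$, $I_{data}$, and $I_{rise}$ are combined. The model works on (previous bit, current bit) pairs and does not reproduce the gate-level rise/fall selection logic. It also computes the mean laser current for equiprobable bits, which illustrates why the modulating swing must be reduced when FFE is enabled at a fixed average current.

```python
def ffe_current(prev_bit, bit, i_bias, i_data, i_rise, i_fall,
                rise_high_pass=True, fall_high_pass=True):
    """Laser current (mA) in one UI from the previous and current data bits.

    Behavioral model only; the rising-edge mapping is an assumption.
    """
    if bit == 0:
        on_transition = prev_bit == 1           # 1 -> 0 falling edge
        # High pass: I_bias on the edge, I_bias + I_fall on later zeros.
        # Low pass:  I_bias + I_fall on the edge, I_bias on later zeros.
        add_fall = (not on_transition) if fall_high_pass else on_transition
        return i_bias + (i_fall if add_fall else 0)
    on_transition = prev_bit == 0               # 0 -> 1 rising edge
    # Assumed high pass: extra I_rise on the edge; low pass: on later ones.
    add_rise = on_transition if rise_high_pass else (not on_transition)
    return i_bias + i_data + (i_rise if add_rise else 0)


def current_waveform(bits, prev_bit=0, **settings):
    """Apply ffe_current bit by bit, carrying the previous bit forward."""
    currents = []
    for bit in bits:
        currents.append(ffe_current(prev_bit, bit, **settings))
        prev_bit = bit
    return currents


def average_current(**settings):
    """Mean laser current when all four (prev, bit) pairs are equally likely."""
    pairs = [(p, b) for p in (0, 1) for b in (0, 1)]
    return sum(ffe_current(p, b, **settings) for p, b in pairs) / len(pairs)


# Illustrative settings in mA (tap values are hypothetical).
no_ffe = dict(i_bias=5, i_data=30, i_rise=0, i_fall=0)
with_ffe = dict(i_bias=5, i_data=26, i_rise=4, i_fall=4)
print(average_current(**no_ffe), average_current(**with_ffe))
print(current_waveform([1, 1, 0, 0, 1, 0], **with_ffe))
```

In this model the mean current is $I_{bias} + I_{data}/2 + (I_{rise} + I_{fall})/4$ for any high-pass/low-pass setting, so with a 5-mA bias, enabling illustrative 4-mA rise and fall taps keeps a 20-mA average only if $I_{data}$ drops from 30-mA to 26-mA.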

#### 4.2.4 Measurement Results



Figure 4.16: Hybrid-integrated QD microring laser transmitter prototype. Reprinted with permission from Fan et al, IEEE Copyright 2020.

The prototype driver chip is fabricated in a 28-nm CMOS process. As shown in Fig. 4.16, a hybrid chip-on-board integration approach places the QD microring laser and CMOS driver dies in close proximity to minimize the wire-bond lengths between the chips. Fig. 4.17 shows a close-up photograph of the optical measurement setup, in which one fiber injects the master laser light for OIL and the other collects the light output from the DUT. The collected light was split into two branches by a 10:90 splitter, shown in Fig. 4.9. The 10% port was connected to an optical spectrum analyzer (OSA) for monitoring, and the 90% port went through a praseodymium-doped fiber amplifier (PDFA) before passing through a tunable filter. The optical signal was then observed in a digital communication analyzer (DCA) for eye diagram characterization. By turning the master laser on or off, we can select between the two operation modes, with or without OIL.

Figure 4.17: Optical measurement setup. Reprinted with permission from Fan et al, IEEE Copyright 2020.

The CMOS driver provides a 30-mA high-speed modulating current swing and a 5-mA bias current, which equates to an average current of 20-mA through the microring laser (the 5-mA bias plus half of the 30-mA swing for balanced data). To maintain this average current while applying FFE, the modulating current swing is slightly reduced. As shown previously in Fig. 4.10, the intrinsic bandwidth is below 4-GHz, and OIL can boost the bandwidth to above 10-GHz. Fig. 4.18(a) and (b) show the 4-Gb/s eye diagrams before and after applying the asymmetric FFE, with the master laser turned OFF. Before applying the asymmetric FFE (Fig. 4.18(a)), the falling edge is slower than the rising edge. The asymmetric FFE can selectively add high-pass behavior on the falling edge to make the eye more symmetric, as shown in Fig. 4.18(b). Further increasing the data rate without OIL leads to eye closure. Fig. 4.18(c) and (d) show eye diagrams at a higher data rate of 10-Gb/s, now with OIL, and the asymmetric FFE again optimizes the eye opening. Fig. 4.18(e) and (f) show the eye diagrams at the highest data rate of 22-Gb/s. Without the FFE the eye is closed, and activating the asymmetric 2-tap FFE opens it. Table 4.1 summarizes the features of our directly modulated OIL microring laser transmitter and compares it with previous works. Utilizing OIL and the asymmetric FFE achieves the best energy efficiency among the compared works, 3.2-pJ/bit.

Figure 4.18: Microring laser directly modulated with CMOS driver. Measured optical eye diagrams at 4-Gb/s without OIL (a) before and (b) after applying FFE. 10-Gb/s eye diagrams with OIL (c) before and (d) after applying FFE. 22-Gb/s eye diagrams with OIL (e) before and (f) after applying FFE. Reprinted with permission from Fan et al, IEEE Copyright 2020.

#### 4.2.5 Conclusion

The first hybrid-integrated directly-modulated OIL QD microring laser system is demonstrated at 22-Gb/s operation. The OIL technique extends the QD microring laser bandwidth by a factor of 4 to 5. An advanced CMOS transmitter with 2-tap asymmetric FFE improves the modulation speed both with and without OIL. This represents a milestone in the integration of advanced silicon lasers and CMOS integrated circuits for a variety of datacom applications.

Table 4.1: Microring laser transmitter performance summary. Reprinted with permission from Fan et al, IEEE Copyright 2020.

| References                   | [37]                         | [39]                        | This Work                   |
|------------------------------|------------------------------|-----------------------------|-----------------------------|
| Photonic Device              | Quantum Well Microring Laser | Quantum Dot Microring Laser | Quantum Dot Microring Laser |
| CMOS Technology              | 65-nm                        | 65-nm                       | 28-nm                       |
| Data-Rate Per Channel (Gb/s) | 14                           | 12                          | 22                          |
| Number of Channels           | 5                            | 4                           | 5                           |
| Energy Efficiency (pJ/bit)   | 10.3                         | 10.8                        | 3.2                         |
| Integrated Driver            | Yes                          | Yes                         | Yes                         |
| Optical Injection Locking    | No                           | No                          | Yes                         |

# 5. CONCLUSION

CMOS circuits achieve higher data rates and better power efficiency with process scaling. Expanding computing capabilities demands not only an increase in data bandwidth but also improved pin efficiency. However, channel bandwidth and pin count do not scale with advanced process nodes. This requires careful consideration of different solutions in electrical and optical links. This dissertation demonstrates design techniques for a high-speed low-power SBD transceiver and a QD microring laser transmitter. The SBD transceiver supports two data streams on the same channel simultaneously, doubling pin efficiency. The QD microring laser supports WDM, which allows multiple high-bandwidth signals to be packed onto one optical channel.

The 32-Gb/s SBD transceiver prototype includes an efficient VM driver with an R-gm hybrid for signal separation, combining the CTLE and EC in a single stage, and employing a low-complexity 5/4X CDA system. Support of a wide range of channels is possible with foreground adaptation of the EC FIR filter taps with an SSLMS algorithm. This transceiver allows for an effective doubling of the per-pin bandwidth in SBD mode and also has the flexibility to support unidirectional mode transmission for varying bandwidth demands.

The first hybrid-integrated directly-modulated OIL QD microring laser system is demonstrated at 22-Gb/s operation. The OIL technique extends the QD microring laser bandwidth by a factor of 4 to 5. An advanced CMOS transmitter with 2-tap asymmetric FFE improves the modulation speed both with and without OIL. This represents a milestone in the integration of QD silicon lasers and CMOS integrated circuits for a variety of datacom applications.

Both proposed architectures and design techniques enable high-pin-efficiency wireline transceivers.

### 5.1 SBD Transceiver Future Work

Section 3.5 has shown the details of adaptive echo cancellation, but the FE tap timing position is estimated from the transmission line length and board dielectric properties. In the experimental results of Section 3.6, round-trip delays of 9 and 28-UIs are manually assigned for the 2" and 6" channels, respectively. While this is performed manually in this prototype, automated detection of the channel round-trip time is possible with a foreground calibration scheme.

The foreground calibration scheme can reuse the same analog front-end and decision circuits, but it needs additional digital filters to detect the echoes within a range of received data. In state 1 of Fig. 3.22, the left-side transmitter transmits the PRBS data stream for training the right-side CDA. The same state can be reused with the PRBS replaced by a single one followed by a long run of zeros. If the channel is echo-free, the received data should be balanced between ones and zeros, so the left side can detect when the echoes return as the time at which unbalanced data is received. The run of zeros after each one must be long enough for the FE echo to return before the next pulse is sent.



Figure 5.1: Round-trip delay detection

In this prototype, the round-trip time of the 6" channel is 28-UIs at 16-Gb/s, which is 1.75-ns. A pattern of a single one followed by 31 zeros is therefore safe for the 2" to 6" channels. Once the round-trip time is detected, it can be automatically assigned to the programmable delayed data in the transmitter. The detection diagram for the 6" channel case is shown in Fig. 5.1. The TX pattern is 32-UI long, with a 1 in the first bit and 0 in the rest of the pattern. Since the other side is not transmitting, the RX data should be balanced between ones and zeros. Applying the digital filters to the RX data from the 9th to the 32nd UI can detect the possible echoes for the 2" to 6" channels. In this case, consecutive unbalanced data appears from 28 to 31-UIs, and the 28-UI onset is the round-trip delay.
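The detection procedure can be sketched as a behavioral model with hypothetical parameters. The echo amplitude, echo duration, noise level, number of pattern repetitions, and the 0.25 imbalance threshold are all assumptions, and per-offset ones counters stand in for the digital filters. Offsets where only noise is present stay near 50% ones, offsets hit by the returning echo are strongly biased, and the first biased offset in the search window (9-UI for the 2" channel up to the end of the 32-UI pattern) is reported as the round-trip delay.

```python
import random

PATTERN_UI = 32                   # a single 1 followed by 31 zeros (from the text)
SEARCH_UI = range(9, PATTERN_UI)  # candidate delays from 9 UI (2" channel) up


def simulate_rx(echo_start, echo_len=4, echo_amp=2.0, reps=512, seed=1):
    """Hypothetical RX: count ones per UI offset over `reps` pattern periods.

    Each decision is sign(noise + echo); noise is unit-variance Gaussian and
    the echo is a flat pulse of `echo_len` UI starting at `echo_start`.
    Own-TX leakage is assumed to be removed by the hybrid.
    """
    rng = random.Random(seed)
    ones = [0] * PATTERN_UI
    for _ in range(reps):
        for k in range(PATTERN_UI):
            echo = echo_amp if echo_start <= k < echo_start + echo_len else 0.0
            if rng.gauss(0.0, 1.0) + echo > 0:
                ones[k] += 1
    return ones, reps


def detect_round_trip(ones, reps, threshold=0.25):
    """Return the first offset whose ones-fraction departs from 0.5, else None."""
    for k in SEARCH_UI:
        if abs(ones[k] / reps - 0.5) > threshold:
            return k
    return None


def ui_to_ns(ui, rate_gbps=16.0):
    """Convert a delay in UI to ns at the given data rate (1 UI = 1/rate)."""
    return ui / rate_gbps


ones, reps = simulate_rx(echo_start=28)   # 6" channel case
delay = detect_round_trip(ones, reps)
print(delay, ui_to_ns(delay))
```

With these settings the 6" case is intended to report 28 UI (1.75 ns at 16 Gb/s), and calling `simulate_rx(echo_start=9)` models the 2" channel. In hardware, the threshold and number of repetitions would be chosen from the actual noise and echo levels.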

### 5.2 QD Microring Laser Transmitter Future Work

In Section 4.2, the hybrid-integrated directly-modulated OIL QD microring laser system has been demonstrated at 22-Gb/s operation. The QD microring laser intrinsic bandwidth is below 4-GHz (Fig. 4.10), but the OIL technique extends the bandwidth by a factor of 4 to 5. The QD microring laser nonlinearity can be observed in Fig. 4.18(c) and (e). To optimize the driver circuitry, an OIL QD microring laser model needs to be developed. Fig. 4.10 presents the direct modulation response with OIL. The peaking near 5-GHz increases as the bias current rises from 5-mA to 15-mA and then becomes unpredictable at 20-mA, 25-mA, and 30-mA because of device self-heating. The OIL QD microring laser model is required to capture this characteristic. The peaking amount is adjustable, but this OIL test uses a fixed average current of 20-mA. Jointly optimizing the peaking and the asymmetric FFE circuitry should extend the data rate beyond 22-Gb/s.

#### REFERENCES

- [1] "International technology roadmap for semiconductors 2015,"
- [2] T. Takahashi, M. Uchida, T. Takahashi, R. Yoshino, M. Yamamoto, and N. Kitamura, "A cmos gate array with 600 mb/s simultaneous bidirectional i/o circuits," *IEEE Journal of Solid-State Circuits*, vol. 30, pp. 1544–1546, Dec 1995.
- [3] R. Mooney, C. Dike, and S. Borkar, "A 900 mb/s bidirectional signaling scheme," *IEEE Journal of Solid-State Circuits*, vol. 30, pp. 1538–1543, Dec 1995.
- [4] Jae-Yoon Sim, Young-Soo Sohn, Seung-Chan Heo, Hong-June Park, and Soo-In Cho, "A 1-gb/s bidirectional i/o buffer using the current-mode scheme," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 529–535, April 1999.
- [5] T. Takahashi, T. Muto, Y. Shirai, F. Shirotori, Y. Takada, A. Yamagiwa, A. Nishida, A. Hotta, and T. Kiyuna, "110-gb/s simultaneous bidirectional transceiver logic synchronized with a system clock," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1526–1533, Nov 1999.
- [6] H. Wilson and M. Haycock, "A six-port 30-gb/s nonblocking router component using point-to-point simultaneous bidirectional signaling for high-bandwidth interconnects," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 1954–1963, Dec 2001.
- [7] Jin-Hyun Kim, Sua Kim, Woo-Seop Kim, Jung-Hwan Choi, Hong-Sun Hwang, Changhyun Kim, and Suki Kim, "A 4-gb/s/pin low-power memory i/o interface using 4-level simultaneous bi-directional signaling," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 89–101, Jan 2005.
- [8] H. Tamura, M. Kibune, Y. Takahashi, Y. Doi, T. Chiba, H. Higashi, H. Takauchi, H. Ishida, and K. Gotoh, "5 gb/s bidirectional balanced-line link compliant with plesiochronous clocking," in 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177), pp. 64–65, Feb 2001.

- [9] B. Casper, A. Martin, J. E. Jaussi, J. Kennedy, and R. Mooney, "An 8-Gb/s simultaneous bidirectional link with on-die waveform capture," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 2111–2120, Dec 2003.
- [10] Y. Tomita, H. Tamura, M. Kibune, J. Ogawa, K. Gotoh, and T. Kuroda, "A 20-Gb/s simultaneous bidirectional transceiver using a resistor-transconductor hybrid in 0.11-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 627–636, March 2007.
- [11] K. Lam, L. R. Dennison, and W. J. Dally, "Simultaneous bidirectional signalling for IC systems," in *Proceedings, 1990 IEEE International Conference on Computer Design: VLSI in Computers and Processors*, pp. 430–433, Sep. 1990.
- [12] A. Roshan-Zamir, O. Elhadidy, H. Yang, and S. Palermo, "A reconfigurable 16/32 Gb/s dual-mode NRZ/PAM4 SerDes in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 52, pp. 2430–2447, Sep. 2017.
- [13] A. Roshan-Zamir, T. Iwai, Y. Fan, A. Kumar, H. Yang, L. Sledjeski, J. Hamilton, S. Chandramouli, A. Aude, and S. Palermo, "A 56-Gb/s PAM4 receiver with low-overhead techniques for threshold and edge-based DFE FIR- and IIR-tap adaptation in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, pp. 672–684, March 2019.
- [14] I. A. Young, E. Mohammed, J. T. S. Liao, A. M. Kern, S. Palermo, B. A. Block, M. R. Reshotko, and P. L. D. Chang, "Optical I/O technology for tera-scale computing," *IEEE Journal of Solid-State Circuits*, vol. 45, pp. 235–248, Jan 2010.
- [15] Y. Fan, A. Kumar, T. Iwai, A. Roshan-Zamir, S. Cai, B. Sun, and S. Palermo, "A 32 Gb/s simultaneous bidirectional source-synchronous transceiver with adaptive echo cancellation in 28nm CMOS," in 2019 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–4, April 2019.
- [16] Y. Fan, A. Kumar, T. Iwai, A. Roshan-Zamir, S. Cai, B. Sun, and S. Palermo, "A 32 Gb/s simultaneous bidirectional source-synchronous transceiver with adaptive echo cancellation techniques," *IEEE Journal of Solid-State Circuits*, vol. 55, pp. 439–451, Feb 2020.
- [17] M. Raj, M. Monge, and A. Emami, "A modelling and nonlinear equalization technique for a 20 Gb/s 0.77 pJ/b VCSEL transmitter in 32 nm SOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, pp. 1734–1743, Aug 2016.
- [18] B. Analui, D. Guckenberger, D. Kucharski, and A. Narasimha, "A fully integrated 20-Gb/s optoelectronic transceiver implemented in a standard 0.13-μm CMOS SOI technology," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2945–2955, Dec 2006.
- [19] N. Dupuis, B. G. Lee, J. E. Proesel, A. Rylyakov, R. Rimolo-Donadio, C. W. Baks, A. Ardey, C. L. Schow, A. Ramaswamy, J. E. Roth, R. S. Guzzon, B. Koch, D. K. Sparacin, and G. A. Fish, "30-Gb/s optical link combining heterogeneously integrated III-V/Si photonics with 32-nm CMOS circuits," *Journal of Lightwave Technology*, vol. 33, pp. 657–662, Feb 2015.
- [20] G. T. Reed, G. Mashanovich, F. Y. Gardes, and D. J. Thomson, "Silicon optical modulators," *Nature Photonics*, vol. 4, pp. 518–526, Jul 2010.
- [21] C. Li, R. Bai, A. Shafik, E. Z. Tabasy, B. Wang, G. Tang, C. Ma, C. Chen, Z. Peng, M. Fiorentino, R. G. Beausoleil, P. Chiang, and S. Palermo, "Silicon photonic transceiver circuits with microring resonator bias-based wavelength stabilization in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, pp. 1419–1436, June 2014.
- [22] H. Li, Z. Xuan, A. Titriku, C. Li, K. Yu, B. Wang, A. Shafik, N. Qi, Y. Liu, R. Ding, T. Baehr-Jones, M. Fiorentino, M. Hochberg, S. Palermo, and P. Y. Chiang, "22.6 A 25Gb/s 4.4V-swing AC-coupled Si-photonic microring transmitter with 2-tap asymmetric FFE and dynamic thermal tuning in 65nm CMOS," in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 1–3, Feb 2015.
- [23] R. J. Drost and B. A. Wooley, "An 8-Gb/s/pin simultaneously bidirectional transceiver in 0.35-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1894–1908, Nov 2004.
- [24] Hao Li, Shuai Chen, Liqiong Yang, Rui Bai, Weiwu Hu, F. Y. Zhong, S. Palermo, and P. Y. Chiang, "A 0.8V, 560fJ/bit, 14Gb/s injection-locked receiver with input duty-cycle distortion tolerable edge-rotating 5/4X sub-rate CDR in 65nm CMOS," in 2014 Symposium on VLSI Circuits Digest of Technical Papers, pp. 1–2, June 2014.
- [25] A. Sanders, M. Resso, and J. D'Ambrosia, "Channel compliance testing utilizing novel statistical eye methodology," presented at *DesignCon*, Santa Clara, CA, 2004.
- [26] M. S. Jalali, M. H. Taghavi, A. McLaren, J. Pham, K. Farzan, D. Diclemente, M. van Ierssel, W. Song, S. Asgaran, C. Holdenried, and S. Sadr, "A 4-lane 1.25-to-28.05Gb/s multi-standard 6pJ/b 40dB transceiver in 14nm FinFET with independent TX/RX rate support," in 2018 IEEE International Solid-State Circuits Conference (ISSCC), pp. 106–108, Feb 2018.
- [27] Y. Song, H. Yang, H. Li, P. Y. Chiang, and S. Palermo, "An 8–16 Gb/s, 0.65–1.05 pJ/b, voltage-mode transmitter with analog impedance modulation equalization and sub-3 ns power-state transitioning," *IEEE Journal of Solid-State Circuits*, vol. 49, pp. 2631–2643, Nov 2014.
- [28] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. von Bueren, L. Rodoni, T. Morf, T. Toifl, and M. Schmatz, "A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with -16 dB return loss over 10 GHz bandwidth," *IEEE Journal of Solid-State Circuits*, vol. 43, pp. 2905–2920, Dec 2008.
- [29] M. Raj, S. Saeedi, and A. Emami, "A wideband injection locked quadrature clock generation and distribution technique for an energy-proportional 16–32 Gb/s optical receiver in 28 nm FDSOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, pp. 2446–2462, Oct 2016.
- [30] J. L. Sonntag and J. Stonick, "A digital clock and data recovery architecture for multigigabit/s binary links," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 1867–1875, Aug 2006.
- [31] A. Ragab, Y. Liu, K. Hu, P. Chiang, and S. Palermo, "Receiver jitter tracking characteristics in high-speed source synchronous links," *Journal of Electrical and Computer Engineering*, vol. 2011, Article ID 982314, 2011.
- [32] N. Wary and P. Mandal, "Current-mode full-duplex transceiver for lossy on-chip global interconnects," *IEEE Journal of Solid-State Circuits*, vol. 52, pp. 2026–2037, Aug 2017.
- [33] A. Manian, A. Rane, and Y. Koh, "A simultaneous bidirectional single-ended coaxial link with 24-Gb/s forward and 312.5-Mb/s back channels," in ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC), pp. 178–181, Sep. 2018.
- [34] "https://www.intel.com/content/www/us/en/architecture-and-technology/siliconphotonics/optical-transceiver-100g-cwdm4-qsfp28-brief.html.,"
- [35] G. Kurczveil, D. Liang, M. Fiorentino, and R. G. Beausoleil, "Robust hybrid quantum dot laser for integrated silicon photonics," *Opt. Express*, vol. 24, pp. 16167–16174, Jul 2016.
- [36] C. Zhang, D. Liang, G. Kurczveil, A. Descos, and R. G. Beausoleil, "Hybrid quantum-dot microring laser on silicon," *Optica*, vol. 6, pp. 1145–1151, Sep 2019.
- [37] A. Roshan-Zamir, K. Yu, D. Liang, C. Zhang, C. Li, G. Fan, B. Wang, M. Fiorentino, R. Beausoleil, and S. Palermo, "A 14 Gb/s directly modulated hybrid microring laser transmitter," in 2018 Optical Fiber Communications Conference and Exposition (OFC), pp. 1–3, March 2018.
- [38] D. Liang, C. Zhang, A. Roshan-Zamir, K. Yu, C. Li, G. Kurczveil, Y. Hu, W. Shen, M. Fiorentino, S. Kumar, S. Palermo, and R. Beausoleil, "A fully-integrated multi-λ hybrid DML transmitter," in 2018 Optical Fiber Communications Conference and Exposition (OFC), pp. 1–3, March 2018.
- [39] Y. Fan, D. Liang, A. Roshan-Zamir, C. Zhang, B. Wang, M. Fiorentino, R. Beausoleil, and S. Palermo, "A directly modulated quantum dot microring laser transmitter with integrated CMOS driver," in 2019 Optical Fiber Communications Conference and Exhibition (OFC), pp. 1–3, March 2019.
- [40] J. C. Norman, D. Jung, Y. Wan, and J. E. Bowers, "Perspective: The future of quantum dot photonic integrated circuits," *APL Photonics*, vol. 3, no. 3, p. 030901, 2018.
- [41] D. Liang, Z. Liu, A. Descos, S. Srinivasan, Z. Huang, G. Kurczveil, and R. Beausoleil, "Optical injection-locked high-speed heterogeneous quantum-dot microring lasers," in *European Conference on Optical Communication (ECOC) Dublin, UK*, 2019.