## DESIGN OF ENERGY-EFFICIENT OPTICAL TRANSCEIVERS

A Dissertation

by

## PENG YAN

Submitted to the Graduate and Professional School of Texas A&M University in partial fulfillment of the requirements for the degree of

## DOCTOR OF PHILOSOPHY

| Samuel Palermo         |
|------------------------|
| Paul Gratz             |
| Jose Silva-Martinez    |
| Duncan Henry M. Walker |
| Aniruddha Datta        |
|                        |

August 2023

Major Subject: Electrical Engineering

Copyright 2023 Peng Yan

#### ABSTRACT

The ultra-wideband optical channel is insensitive to frequency and communication distance, which makes it suitable for supporting ever-increasing data rates that electrical copper channels can no longer sustain. An optical transceiver based on micro-ring resonators is an effective approach to bridging the optical channel's THz bandwidth and the electrical circuit's GHz operating speed, but it increases design complexity.

This dissertation presents three designs focusing on power-efficient short-reach optical communication up to hundreds of meters: one wire-bonded optical receiver and two 3D-integrated optical transceivers. A TIA with a multi-stage amplifier is proposed to reduce the optical receiver's noise and improve its sensitivity without extra power or silicon area. Clocking, the transmitter, and micro-ring resonant wavelength stabilization are also discussed to form a complete power-efficient optical transceiver.

Combining all noise reduction techniques, the 12.5 Gb/s optical receiver fabricated in 28 nm CMOS technology achieves 0.11 pJ/bit power efficiency and -10.7 dBm OMA sensitivity at $10^{-12}$ BER with a 0.6 A/W wire-bonded PD. Compared to a conventional TIA using a single-stage amplifier, power efficiency improves by 3.6X and normalized OMA sensitivity improves by 3.2 dB. Its minimal silicon area, achieved without on-chip inductors, makes it suitable for high bandwidth-density applications.
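As a rough sanity check on the sensitivity figure above, the following sketch converts the quoted -10.7 dBm OMA into optical power and then into the peak-to-peak photocurrent delivered by a 0.6 A/W photodiode. The function names are illustrative and not from the dissertation, and OMA is assumed here to be the peak-to-peak optical swing.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def photocurrent_pp(oma_dbm: float, responsivity_a_per_w: float) -> float:
    """Peak-to-peak photocurrent in amperes for a given OMA and PD responsivity.

    Assumes OMA is the peak-to-peak optical swing (hypothetical helper).
    """
    return responsivity_a_per_w * dbm_to_watts(oma_dbm)


oma_w = dbm_to_watts(-10.7)          # optical swing in watts
i_pp = photocurrent_pp(-10.7, 0.6)   # photocurrent swing in amperes

print(f"OMA          = {oma_w * 1e6:.1f} uW")
print(f"Photocurrent = {i_pp * 1e6:.1f} uA peak-to-peak")
```

Under these assumptions the receiver front-end resolves a photocurrent swing of roughly 51 uA at the stated BER.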

Further improvement is achieved in the 32-channel optical transceiver fabricated in 12 nm CMOS technology, with co-designed optical devices and 3D integration. The optical transmitter achieves 157 fJ/bit power efficiency at 18 Gb/s. The measured optical receiver power efficiency of 84.8 fJ/bit and OMA sensitivity of -17.0 dBm at 25 Gb/s are, to the best of our knowledge, state-of-the-art results. The normalized OMA sensitivity is second only to a power-hungry design using a DFE, while the power efficiency is 18.75X better.

The 20-channel design has been taped out in 22 nm CMOS technology, with a simulated overall power efficiency of 179 fJ/bit at a 500 Gb/s aggregate data rate. The 3D-integrated optical transceiver incorporates a MOS-capacitor modulator transmitter with DVFS and a multi-phase clock generated by a DLL to reduce power. Electrostatic micro-ring resonant wavelength stabilization is included to eliminate high-power heater-based tuning.
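The energy-per-bit figures quoted in this abstract translate to average power by multiplying by the data rate. The sketch below does this arithmetic for the three designs; the helper name and dictionary labels are illustrative only, and the 22 nm entry uses the simulated figure quoted above.

```python
def average_power_w(energy_per_bit_j: float, data_rate_bps: float) -> float:
    """Average power in watts = energy per bit (J/bit) x data rate (bit/s)."""
    return energy_per_bit_j * data_rate_bps


# (energy per bit in joules, data rate in bit/s), taken from the abstract
designs = {
    "28 nm receiver at 12.5 Gb/s": (0.11e-12, 12.5e9),
    "12 nm transmitter at 18 Gb/s": (157e-15, 18e9),
    "12 nm receiver at 25 Gb/s": (84.8e-15, 25e9),
    "22 nm transceiver at 500 Gb/s aggregate (simulated)": (179e-15, 500e9),
}

for name, (energy, rate) in designs.items():
    print(f"{name}: {average_power_w(energy, rate) * 1e3:.3f} mW")
```

This works out to roughly 1.4 mW, 2.8 mW, 2.1 mW, and 89.5 mW respectively.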

# DEDICATION

To my family.

#### ACKNOWLEDGMENTS

First of all, I would like to express my deepest gratitude to my advisor, Prof. Samuel Palermo. He spent a great deal of time and effort guiding me through my Ph.D. research over the past five years. This dissertation would not have been possible without his persistent advice and motivation. I would also like to thank Prof. Paul Gratz, Prof. Jose Silva-Martinez, and Prof. Duncan Henry M. Walker for serving as my committee members and for their valuable feedback.

I wish to thank Intel for their financial support and tape-out opportunity. I am grateful to James E Jaussi, Ganesh Balamurugan, Cooper Levy, Saeid Daneshgar and Jahnavi Sharma. It has always been a pleasure to work with such a great research group, and I learned a great deal from them.

My sincere thanks also go to my fellow graduate students at Texas A&M University: Po-Hsuan Chang, Chia-Chi Liu, Chun Yi Cheng, Ankur Kumar, Hyungryul Kang, Ruida Liu, Yuanming Zhu, and Srujan Kumar Kaile. Our collaboration was a critical part of my Ph.D. study. I also thank Anirban Samanta, Mingye Fu, and Prof. S.J. Ben Yoo from UC Davis for the time and effort they spent on the optical part of our projects.

Last, but not least, I would like to thank my family for their unconditional support throughout my life. For that, I dedicate this dissertation to them.

#### CONTRIBUTORS AND FUNDING SOURCES

### Contributors

This work was supported by a dissertation committee consisting of Prof. Palermo [advisor], Prof. Gratz, Prof. Silva-Martinez and Prof. Walker of the Department of Computer Science and Engineering.

The optical transmitter for Chapter VI was provided by Po-Hsuan Chang. The optical transmitter for Chapter VII was provided by Chia-Chi Liu. The optical devices used in Chapters VI and VII were provided by UC Davis. All other work conducted for the dissertation was completed by the student independently.

### Funding Sources

Graduate study was supported in part by DARPA, DOE and Intel.

## NOMENCLATURE

| Abbreviation | Definition                              |
|--------------|-----------------------------------------|
| ADC          | Analog-to-Digital Converter             |
| AI           | Artificial Intelligence                 |
| BER          | Bit Error Rate                          |
| CMOS         | Complementary Metal-Oxide Semiconductor |
| CTLE         | Continuous Time Linear Equalizer        |
| DAC          | Digital-to-Analog Converter             |
| DCC          | Duty Cycle Correction                   |
| DFE          | Decision Feedback Equalizer             |
| DLL          | Delay-Locked Loop                       |
| DSP          | Digital Signal Processing               |
| DVFS         | Dynamic Voltage and Frequency Scaling   |
| EIC          | Electrical Integrated Circuit           |
| ILO          | Injection-Locked Oscillator             |
| IoT          | Internet of Things                      |
| ISF          | Impulse Sensitivity Function            |
| ISI          | Intersymbol Interference                |
| JTF          | Jitter Transfer Function                |
| LTV          | Linear Time-Varying                     |
| MZM          | Mach-Zehnder Modulator                  |
| NRZ          | Non-Return-to-Zero                      |
| OMA          | Optical Modulation Amplitude            |
| PAM          | Pulse Amplitude Modulation              |
| PD           | Photodiode                              |
| PIC          | Photonic Integrated Circuit             |
| PRBS         | Pseudorandom Binary Sequence            |
| PVT          | Process, Voltage and Temperature        |
| QEC          | Quadrature Error Correction             |
| SAR          | Successive-Approximation-Register       |
| SERDES       | Serializer/Deserializer                 |
| SNDR         | Signal-to-Noise-and-Distortion Ratio    |
| RX           | Receiver                                |
| TIA          | Transimpedance Amplifier                |
| TX           | Transmitter                             |
| UI           | Unit Interval                           |
| WDM          | Wavelength-Division Multiplexing        |

# TABLE OF CONTENTS

| Section | Title | Page |
|---------|-------|------|
|         | ABSTRACT | |
|         | DEDICATION | iv |
|         | ACKNOWLEDGMENTS | v |
|         | CONTRIBUTORS AND FUNDING SOURCES | vi |
|         | NOMENCLATURE | vii |
|         | TABLE OF CONTENTS | ix |
|         | LIST OF FIGURES | xii |
|         | LIST OF TABLES | xvii |
| 1.      | INTRODUCTION | 1 |
| 2.      | SIGNAL INTEGRITY IN OPTICAL COMMUNICATION | 4 |
| 2.1     | Modulation Scheme and Signal Power Spectrum Density | 4 |
| 2.2     | Linear Distortion (ISI) | 7 |
| 2.3     | Non-linear Distortion | 9 |
| 2.4     | Noise and BER | 10 |
| 2.5     | Jitter's Impact on BER | 12 |
| 2.6     | Conclusion | 13 |
| 3.      | LOW-NOISE OPTICAL RECEIVER DESIGN | 14 |
| 3.1     | Optical Receiver's Optimal Bandwidth | 15 |
| 3.2     |  | 16 |
| 3.3     |  | 18 |
| 3.4     |  | 22 |
| 3.5     |  | 25 |
| 3.6     |  | 27 |
| 4.      | CLOCKING, OPTICAL TRANSMITTER AND OPTICAL DEVICE TUNING | 28 |
| 4.1     | Clocking Circuitry | 28 |
| 4.1.1   | Jitter and Phase Noise | 28 |
| 4.1.2   | Forwarded Clock Architecture | 32 |
| 4.1.3   | Jitter Amplification Due to Limited Signal Bandwidth | 34 |
| 4.1.4   | Multi-phase Clock Generation | 36 |
| 4.2     | Optical Transmitter and Optical Device Tuning | 39 |
| 4.2.1   | Optical Transmitter | 39 |
| 4.2.2   | Micro-ring Resonator Wavelength Stabilization | 42 |
| 5.      | A 12.5 Gb/s WIRE-BONDED OPTICAL RECEIVER | 49 |
| 5.1     | Low-noise Front-end | 49 |
| 5.1.1   | Low-bandwidth TIA with Multi-stage Amplifier | 50 |
| 5.1.2   | Inverter-based CTLE | 50 |
| 5.2     | DC Cancellation | 54 |
| 5.3     | Low-voltage Quarter-rate Slicers | 55 |
| 5.4     | Experimental Results | 56 |
| 5.5     | Conclusion | 59 |
| 6.      | A 32-CHANNEL 3D-INTEGRATED OPTICAL TRANSCEIVER | 61 |
| 6.1     | Optical Transceiver Architecture | 61 |
| 6.2     | Power-efficient Optical Transmitter | 62 |
| 6.3     | Ultra sensitive Optical Receiver | 63 |
| 6.4     | Experimental Results | 69 |
| 6.5     | Conclusion | 71 |
| 7.      | A 20-CHANNEL 3D-INTEGRATED OPTICAL TRANSCEIVER | 73 |
| 7.1     | 3D-integration Scheme | 73 |
| 7.2     | Power-efficient Low-swing Optical Transmitter | 73 |
| 7.3     | Ultra Sensitive Optical Receiver | 76 |
| 7.3.1   | TIA with Multi-stage Amplifier | 77 |
| 7.3.2   | Inverter-based Current-mode Additive CTLE with Active Inductor | 78 |
| 7.3.3   | Noise Reduction in DC Cancellation Loop | 81 |
| 7.3.4   | Low-voltage Slicer with Hybrid Offset Cancellation | 84 |
| 7.3.5   | Wire-bonded Optical Receiver Front-end Test Structure and Experimental Results | 85 |
| 7.4     | Power-efficient Clock Generation and Distribution | 88 |
| 7.4.1   | Multi-phase Clock Generation | 88 |
| 7.4.2   | Clock Distribution | 93 |
| 7.4.3   | Phase Correction and Duty Cycle Correction | 93 |
| 7.5     | Performance Summary | 96 |
| 8.      | CONCLUSION AND FUTURE WORK | 101 |
| 8.1     | Conclusion | 101 |
| 8.2     | Future Work | 102 |
| 8.2.1   | Further Improvement with Co-designed Optical Device and Integration Scheme | 102 |
| 8.2.2   | Automatic Tuning Logic | 103 |
|         | REFERENCES | 105 |

# LIST OF FIGURES

| FIGURE |                                                                                                                               |    |
|--------|-------------------------------------------------------------------------------------------------------------------------------|----|
| 1.1    | T. Musah, "Wireline Link Standard," [Online]. Available: https://ece.osu.edu/mixed-signal-integrated-circuits-and-systems-lab | 1  |
| 1.2    | Electrical transceiver based on a terminated transmission line                                                                | 2  |
| 1.3    | (a) Typical electrical backplane link and (b) its frequency response w/ and w/o equalization [1].                             | 2  |
| 1.4    | WDM optical transceiver block diagram [2].                                                                                    | 3  |
| 2.1    | NRZ and PAM4 modulation scheme.                                                                                               | 4  |
| 2.2    | Normalized NRZ's auto-correlation.                                                                                            | 5  |
| 2.3    | Normalized NRZ's power spectrum density.                                                                                      | 6  |
| 2.4    | Input and output of 2nd-order Butterworth LPF.                                                                                | 7  |
| 2.5    | Eye-diagram of 2nd-order Butterworth LPF.                                                                                     | 8  |
| 2.6    | Eye-diagram of 1st-order HPF and 2nd-order Butterworth LPF with 16 GHz bandwidth, 32 Gbps NRZ. | 9  |
| 2.7    | Inverter-based voltage amplifier schematic.                                                                                   | 10 |
| 2.8    | Errors generated by additive noise                                                                                            | 11 |
| 2.9    | Simulated BER curve with different jitter levels                                                                              | 12 |
| 2.10   | BER simulation method                                                                                                         | 13 |
| 3.1    | Optical transceiver's simplified noise model                                                                                  | 14 |
| 3.2    | Simulated eye-diagrams of 2nd-order Butterworth response of various bandwidths                                                | 15 |
| 3.3    | Simulated eye-heights of 2nd-order Butterworth response of various bandwidths [3].                                            | 16 |
| 3.4    | Optical receiver's integrated noise with various bandwidths [3]                                                               | 17 |
| 3.5    | Optical receiver's OMA sensitivity with various bandwidths [3]. | 17 |
| 3.6    | Optical receiver front-end noise model | 18 |
| 3.7  | Open-loop frequency response of 2nd-order shunt-feedback TIA.                                                            | 20 |
| 3.8  | Simulated TIA frequency response with different amplifier poles                                                          | 21 |
| 3.9  | Simulated TIA frequency response with different amplifier pole                                                           | 22 |
| 3.10 | (a) Conventional TIA with post-amplifier and (b) proposed TIA with multi-stage feedback amplifier.                       | 23 |
| 3.11 | Transimpedance and input-referred noise power spectral density of conventional TIA with post-amplifier and proposed TIA. | 23 |
| 3.12 | Small signal models of (a) conventional TIA with post-amplifier and (b) proposed TIA.                                    | 24 |
| 4.1  | Current source charging a capacitor [4].                                                                                 | 28 |
| 4.2  | Noise injected at different time [5].                                                                                    | 29 |
| 4.3  | Phase noise/jitter generation in oscillator [5][6].                                                                      | 30 |
| 4.4  | (a) Embedded and (b) forwarded clock architecture [7]                                                                    | 32 |
| 4.5  | Optical transceiver's forwarded clock diagram.                                                                           | 33 |
| 4.6  | Simulated jitter transfer function with different $t_d$ , $\omega_{ILO,TX} = \omega_{ILO,RX} = 0.1\omega_c$              | 34 |
| 4.7  | Calculated and simulated jitter transfer function of first-order LPF with various bandwidths.                            | 35 |
| 4.8  | 8-phase clock generation based on ring oscillator DLL.                                                                   | 37 |
| 4.9  | 8-phase clock generation based on ring oscillator ILO.                                                                   | 38 |
| 4.10 | ILO's linearized phase domain model                                                                                      | 38 |
| 4.11 | MZM modulator structure.                                                                                                 | 40 |
| 4.12 | Micro-ring resonator structure.                                                                                          | 41 |
| 4.13 | Micro-ring resonator's spectra transfer function.                                                                        | 41 |
| 4.14 | Micro-ring resonator acts as external optical modulator                                                                  | 42 |
| 4.15 | Micro-ring resonator modulation with different incoming laser wavelength                                                 | 43 |
| 4.16 | Low-power electrostatic micro-ring resonator wavelength stabilization | 43 |
| 4.17 | Micro-ring's two possible lock points when used as optical modulator [8] | 44 |
| 4.18 | Current-mode 6-bit SAR ADC schematic                                                                                                                            | 45 |
| 4.19 | Tuning logic diagram                                                                                                                                            | 45 |
| 4.20 | (a) Sigma-Delta modulator and (b) its equivalent z-domain model                                                                                                 | 46 |
| 4.21 | Charge-pump amplifier schematic.                                                                                                                                | 47 |
| 4.22 | (a) Simulated normalized average drop port power and (b) 6-bit tuning code of 2 cascaded micro-rings and tuning loops                                           | 47 |
| 5.1  | Inverter-based optical receiver.                                                                                                                                | 49 |
| 5.2  | Inverter-based CTLE schematic, small signal model, and noise reduction via TIA input stage bandwidth reduction.                                                 | 51 |
| 5.3  | Simulated frequency response, input-referred noise power spectral density, and 12.5 Gb/s differential eye diagram at the slicers' inputs with $C_{in} = 150$ fF | 53 |
| 5.4  | DC cancellation schematic and simulated front-end frequency response over an extended low-frequency range. $Z_T$ is the front-end's HF frequency response       | 54 |
| 5.5  | Schematic of sampling slicer                                                                                                                                    | 55 |
| 5.6  | (a) Optical receiver layout and chip micrograph and (b) optical test setup                                                                                      | 56 |
| 5.7  | (a) Measured BER timing margin curves with OMA = -10.7 dBm and (b) sensitivity curves. | 57 |
| 6.1  | PIC-EIC 3D-integration approach [9].                                                                                                                            | 61 |
| 6.2  | Optical transmitter block diagram [10]                                                                                                                          | 62 |
| 6.3  | Optical receiver block diagram [10].                                                                                                                            | 63 |
| 6.4  | DOE optical link budget.                                                                                                                                        | 64 |
| 6.5  | Variable bandwidth TIA with multi-stage feedback amplifier and broadband buffer                                                                                 | 65 |
| 6.6  | Simulated front-end frequency response with $C_{in} = 14$ fF                                                                                                    | 66 |
| 6.7  | Simulated 25 Gb/s eye-diagram with gain setting = 2 and $C_{in}$ = 14 fF                                                                                        | 66 |
| 6.8  | Simulated front-end frequency response with varying $C_{in}$                                                                                                    | 67 |
| 6.9  | Simulated input-referred noise PSD with varying $C_{in}$ | 67 |
| 6.10 | Simulated frequency response of conventional TIA and our front-end | 68 |
| 6.11 | Simulated input-referred noise PSD of conventional TIA and our front-end                                                                                                     | 68 |
| 6.12 | Layout of (a) transmitter channel and (b) receiver channel                                                                                                                   | 69 |
| 6.13 | (a) Measured BER timing margin curves with OMA = -17.0 dBm and (b) sensitivity curves. | 70 |
| 7.1  | 3D-integration bonding scheme                                                                                                                                                | 73 |
| 7.2  | Power-efficient low-swing optical transmitter                                                                                                                                | 74 |
| 7.3  | (a) 4 to 1 serializer and (b) its timing diagram.                                                                                                                            | 75 |
| 7.4  | Simulated eye-diagrams at 4 to 1 serializer's output node, 650 mV DVFS supply | 76 |
| 7.5  | Pre-driver (a) charges/(b) discharges $C_C$ , and its simulated single-ended (c) input and (d) output eye-diagram. | 77 |
| 7.6  | Simulated eye-diagrams at MOS-cap modulator's input node with (a) 600 mV, (b) 650 mV, (c) 700 mV DVFS supply and (d) normalized optical eye-diagram with 650 mV DVFS supply. | 78 |
| 7.7  | Ultra sensitive optical receiver.                                                                                                                                            | 79 |
| 7.8  | Simulated OMA sensitivity with different total RX input capacitance                                                                                                          | 80 |
| 7.9  | Simulated front-end frequency response and eye-diagram.                                                                                                                      | 81 |
| 7.10 | (a) Voltage-mode additive CTLE and (b) current-mode additive CTLE with active inductor.                                                                                      | 82 |
| 7.11 | Inverter-based active inductor and its small signal model                                                                                                                    | 82 |
| 7.12 | Active inductor frequency response.                                                                                                                                          | 83 |
| 7.13 | DC cancellation loop diagram.                                                                                                                                                | 83 |
| 7.14 | Source degeneration resistor with shorting switch and its noise model                                                                                                        | 84 |
| 7.15 | Slicer with hybrid offset cancellation                                                                                                                                       | 85 |
| 7.16 | Wire-bonded optical receiver front-end test structure.                                                                                                                       | 86 |
| 7.17 | Optical receiver front-end test setup.                                                                                                                                       | 86 |
| 7.18 | Measured eye-diagram at (a) 8 Gb/s and (b) 10 Gb/s | 87 |
| 7.19 | Measured BER timing margin curves with OMA = -13.9 dBm | 87 |
| 7.20 | Clocking circuitry diagram of the 20-channel optical transceiver                                                                          | 88  |
| 7.21 | (a) Transmitter and (b) receiver multi-phase clock generation.                                                                            | 89  |
| 7.22 | Simulated phase error of transmitter multi-phase clock generation                                                                         | 89  |
| 7.23 | Simulated phase error of receiver multi-phase clock generation                                                                            | 90  |
| 7.24 | Measured phase error of transmitter and receiver multi-phase clock generation                                                             | 91  |
| 7.25 | Updated (a) transmitter and (b) receiver multi-phase clock generation                                                                     | 92  |
| 7.26 | Inverter-based clock buffer.                                                                                                              | 93  |
| 7.27 | TX clocking distribution network's simulated jitter transfer function, output-referred jitter, and power consumption with various supply. | 94  |
| 7.28 | RX clocking distribution network's simulated jitter transfer function, output-referred jitter, and power consumption with various supply. | 94  |
| 7.29 | Simulated TX clocking total duty cycle variation.                                                                                         | 95  |
| 7.30 | Simulated TX clocking total phase error.                                                                                                  | 95  |
| 7.31 | Simulated RX clocking total duty cycle variation.                                                                                         | 96  |
| 7.32 | Simulated RX clocking total phase error.                                                                                                  | 97  |
| 7.33 | (a) TX clocking duty cycle correction and (b) simulation result                                                                           | 98  |
| 7.34 | (a) RX clocking duty cycle correction and (b) simulation result                                                                           | 98  |
| 7.35 | (a) TX clocking phase error correction and (b) simulation result                                                                          | 99  |
| 7.36 | (a) RX clocking phase error correction and (b) simulation result                                                                          | 99  |
| 7.37 | Power consumption breakdown of optical transceiver with 26.32 Gb/s data-rate, including amortized clock channel power.                    | 100 |
| 8.1  | Resistive TIA with DC cancellation loop.                                                                                                  | 102 |
| 8.2  | (a) RX slicer with controlled offset and (b) CTLE frequency response                                                                      | 103 |

# LIST OF TABLES

| TABLE |                                                   | Page |
|-------|---------------------------------------------------|------|
| 2.1   | Numerical value of Q-function or BER              | 11   |
| 3.1   | Noise reduction techniques in recent publications |      |
| 4.1   | Decision algorithm table                          | 45   |
| 5.1   | Performance summary                               | 58   |
| 6.1   | Optical receiver power consumption summary        | 71   |
| 6.2   | Optical receiver performance summary              | 72   |
| 6.3   | Noise reduction techniques summary                | 72   |
| 7.1   | Jitter budget running at 26.32 Gb/s               |      |
| 7.2   | Link budget                                       |      |

#### 1. INTRODUCTION

Rapid data growth driven by 5G, AI, IoT, and cloud computing requires innovative solutions to sustain increasing data rates and data density, especially for communication up to 100 meters in data centers. The electrical transceiver shown in Fig. 1.2 is susceptible to high-frequency loss from the skin effect and dielectric loss, both of which are proportional to communication distance. This makes it harder to keep up with emerging protocols, even with higher-order modulation, power-hungry equalization, and ADC-DSP solutions.



Figure 1.1: T. Musah, "Wireline Link Standard," [Online]. Available: https://ece.osu.edu/mixed-signal-integrated-circuits-and-systems-lab

Despite its increased complexity, the WDM optical transceiver shown in Fig. 1.4 is a promising solution for Tb/s data rates over a single optical fiber. Power efficiency can also be improved by removing complex equalization, because the channel loss is almost independent of frequency and communication distance.



Figure 1.2: Electrical transceiver based on a terminated transmission line.



Figure 1.3: (a) Typical electrical backplane link and (b) its frequency response w/ and w/o equalization [1].

This dissertation investigates all aspects of optical transceivers to unlock the full potential of ultra-wide-band optical channels. The rest of this dissertation is organized as follows.

Chapter 2 reviews modulation scheme, signal integrity, and their impact on BER in optical communication. Chapter 3 lays down the theoretical foundation of a high-speed low-noise optical receiver, which is critical to improving the full optical transceiver's power efficiency. The noise-bandwidth trade-off and several published noise reduction techniques are investigated before we propose an inverter-based TIA with a multi-stage amplifier for noise reduction without any extra power or silicon area. Better CTLE and DC cancellation loop are also provided for further noise reduction. Chapter 4 covers clocking circuit, optical transmitter, and micro-ring resonant wavelength stabilization to make a complete optical transceiver.

Figure 1.4: WDM optical transceiver block diagram [2].

Chapter 5 presents an optical receiver fabricated in 28 nm CMOS technology to interface with a discrete photodiode, with experimental results. Chapter 6 describes a complete optical transceiver fabricated in 12 nm CMOS technology with a co-designed 3D-integrated PIC, providing superior performance and power efficiency compared to recently published transceivers running at similar data rates. Chapter 7 discusses further improvements in a 20-channel optical transceiver fabricated in 22 nm CMOS technology with simulation results.

Finally, Chapter 8 concludes this work with potential future work.

#### 2. SIGNAL INTEGRITY IN OPTICAL COMMUNICATION

Signal integrity is the key concept in data communication. The transmitted signal is altered by deterministic and random fluctuations in the channel, which degrade the SNDR, defined as

$$SNDR = \frac{Signal Power}{Noise Power + Distortion Power}.$$
(2.1)

Besides degraded SNDR, errors can also be introduced by clock jitter. The directly measurable BER is a better single metric in SERDES design but has a complex relationship with modulation scheme, linear/non-linear distortion, noise, and jitter. In this chapter, we investigate each of them and reveal their impact on BER before providing a BER simulation method.

#### 2.1 Modulation Scheme and Signal Power Spectrum Density



Figure 2.1: NRZ and PAM4 modulation scheme.

NRZ has been the most popular modulation scheme in medium data-rate optical communication, using two levels of optical power to represent information. Its simplicity provides significantly better BER thanks to insensitivity to non-linearity and noise, at the cost of worse bandwidth efficiency. This problem is usually not serious up to 32 Gb/s per channel but becomes more severe at higher data rates. In this dissertation, NRZ is adopted for better power efficiency, in terms of the energy needed to transmit one bit, of an optical transceiver running at a medium data rate.

NRZ modulation generates a random binary sequence X(t) whose power spectrum density is derived from its auto-correlation, which is the correlation of X(t) with a delayed version of itself as shown in Eq. (2.2).

$$R(\tau) = E[X(t)X(t-\tau)]$$
(2.2)

Assuming any one bit is independent of all the other bits, auto-correlation of a normalized NRZ bit stream has a triangle waveform as plotted in Fig. 2.2, corresponding to a sinc function power spectrum density as derived in Eq. (2.3) using Fourier transform.



Figure 2.2: Normalized NRZ's auto-correlation.

$$\begin{aligned}
PSD(f) &= F[R(\tau)] = \int_{-\infty}^{+\infty} R(\tau)e^{-j2\pi f\tau}d\tau \\
&= \int_{-UI}^{0} [\tau/UI + 1]e^{-j2\pi f\tau}d\tau + \int_{0}^{UI} [-\tau/UI + 1]e^{-j2\pi f\tau}d\tau \\
&= UI\left[\frac{\sin(\pi UIf)}{\pi UIf}\right]^{2}
\end{aligned}$$
(2.3)



Figure 2.3: Normalized NRZ's power spectrum density.

As shown in Fig. 2.3, the majority of the signal power is carried by the fundamental component. Excessive bandwidth requires more power and introduces more high-frequency noise. Optimal bandwidth is a trade-off between integrated signal power and noise power. Detailed analysis will be given in Chapter 3.
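As a rough numerical illustration of Eq. (2.3), the sketch below integrates the normalized sinc-squared spectrum to estimate how much of the NRZ signal power lies below a given bandwidth. The function names `nrz_psd` and `power_fraction`, the UI = 1 normalization, and the trapezoidal step count are illustrative assumptions, not part of the original analysis.

```python
import math

def nrz_psd(f, ui=1.0):
    """Normalized NRZ power spectrum density from Eq. (2.3): UI * sinc^2(UI * f)."""
    x = math.pi * ui * f
    if x == 0.0:
        return ui
    return ui * (math.sin(x) / x) ** 2

def power_fraction(bandwidth, ui=1.0, steps=20000):
    """Fraction of total signal power within |f| <= bandwidth (trapezoidal rule).

    The total power of the normalized bit stream equals R(0) = 1, so the
    two-sided integral over the band is directly the fraction.
    """
    df = bandwidth / steps
    total = 0.5 * (nrz_psd(0.0, ui) + nrz_psd(bandwidth, ui))
    for i in range(1, steps):
        total += nrz_psd(i * df, ui)
    return 2.0 * total * df  # factor 2 accounts for negative frequencies

if __name__ == "__main__":
    for bw in (0.5, 0.75, 1.0):
        print(f"|f| <= {bw:.2f}/UI: {power_fraction(bw):.3f} of total power")
```

Under these assumptions, most of the power sits inside the main lobe below 1/UI, which is consistent with the diminishing return of extra bandwidth noted above.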

#### 2.2 Linear Distortion (ISI)

An optical transceiver usually has a band-pass characteristic with an upper bandwidth limit and a low-frequency cut-off, introducing linear distortion (ISI). Fig. 2.4 shows the low-pass filtering effect of a finite-bandwidth linear system on a 1-UI pulse, with the main cursor carrying the transmitted information and the pre-cursor/post-cursor degrading other bits. A random NRZ bit stream passing through a linear system becomes a linear combination of the 1-UI pulse responses of its individual bits.



Figure 2.4: Input and output of 2nd-order Butterworth LPF.

Eye-diagram is a widely used graphical tool to judge signal quality, both in simulation and measurement. The simulated eye-diagram plotted in Fig. 2.5 is generated by cutting the 1-UI pulse response into 2-UI time-domain segments and overlapping all possible cases statistically. Another way to generate an eye-diagram is a sufficiently long transient simulation with a PRBS15 or longer input bit-stream that goes through almost all possible combinations, which also captures the non-linearity effect in the time domain.



Figure 2.5: Eye-diagram of 2nd-order Butterworth LPF.

An open eye-diagram indicates better signal quality, with a conservative signal swing captured by the eye-height and a conservative jitter estimate captured by the eye-width. Even though an eye-diagram can also capture the noise effect and works well with a sampling oscilloscope in measurement, simulation should only generate a noiseless eye-diagram, because low-probability events due to noise cannot be simulated efficiently in the time domain.
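The noiseless eye-height discussed above can be estimated directly from the 1-UI pulse response. The sketch below uses the standard peak-distortion bound (main cursor minus the sum of the magnitudes of all other cursors) as a stand-in for the statistical overlap described in the text, applied to the closed-form step response of a 2nd-order Butterworth low-pass filter. The function names, the cursor count, and the phase grid are assumptions made for illustration.

```python
import math

def butterworth2_step(t, w0):
    """Unit-step response of a 2nd-order Butterworth low-pass filter (cut-off w0 in rad/s)."""
    if t <= 0.0:
        return 0.0
    a = w0 / math.sqrt(2.0)
    return 1.0 - math.exp(-a * t) * (math.cos(a * t) + math.sin(a * t))

def pulse_response(t, w0, ui=1.0):
    """Response to a single 1-UI wide pulse: difference of two step responses."""
    return butterworth2_step(t, w0) - butterworth2_step(t - ui, w0)

def worst_case_eye_height(phase, bw_over_rate, ui=1.0, n_cursors=30):
    """Peak-distortion eye height, normalized to full swing, at one sampling phase.

    phase is the sampling offset within [0, UI) measured from the pulse start;
    samples are taken at phase + k*UI and the largest one is the main cursor.
    """
    w0 = 2.0 * math.pi * bw_over_rate / ui
    cursors = [pulse_response(phase + k * ui, w0, ui) for k in range(n_cursors)]
    main = max(cursors)
    isi = sum(abs(c) for c in cursors) - main
    return main - isi

def best_eye_height(bw_over_rate, points=200):
    """Largest worst-case eye height over all sampling phases in one UI."""
    return max(worst_case_eye_height(i / points, bw_over_rate) for i in range(points))

if __name__ == "__main__":
    for ratio in (0.3, 0.5, 0.7, 1.0):
        print(f"bandwidth = {ratio:.1f} x data-rate: eye height = {best_eye_height(ratio):.3f}")
```

This bound is conservative because it assumes the single worst bit pattern; a transient simulation with a finite PRBS pattern may report a slightly larger eye.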

Eye-diagram also captures DC wander caused by the low-frequency cut-off, which is usually a less severe linear distortion. The low-frequency component from consecutive ones or zeros is high-pass filtered out, leading to signal drift. A lower cut-off frequency is preferred for better signal integrity but requires a larger silicon area to construct a slower pole and needs more setup time. As shown in Fig. 2.6, a cut-off frequency around 1 MHz is usually acceptable, while some standards free of long runs of consecutive identical bits can further relax this requirement.



Figure 2.6: Eye-diagram of 1st-order HPF and 2nd-order Butterworth LPF with 16 GHz bandwidth, 32 Gbps NRZ.

#### 2.3 Non-linear Distortion

Non-linearity is another source of distortion, but not severe in NRZ signaling as opposed to its PAM4 counterpart, especially in properly sized inverter-based design with good linearity [11][12].

Figure 2.7: Inverter-based voltage amplifier schematic.

Considering the inverter-based voltage amplifier in Fig. 2.7, the input-stage inverter acts as the transconductance stage, while transimpedance loading converts current back to voltage swing. Its input and output nodes are biased near half of the supply for maximal gain. If PMOS and NMOS have the same threshold and are properly sized for $K_n = K_p = K$,

$$\begin{aligned}
i_{out} = i_n - i_p &= K_n (V_{DD}/2 + v_{in} - v_{thn})^2 - K_p [V_{DD} - (V_{DD}/2 + v_{in}) - |v_{thp}|]^2 \\
&= K (V_{DD}/2 + v_{in} - v_{th})^2 - K (V_{DD}/2 - v_{in} - v_{th})^2 \\
&= 2K (V_{DD} - 2v_{th}) v_{in}
\end{aligned}$$
(2.4)

$i_{out}$ becomes purely linear if the MOSFET's square-law characteristic holds. As the transimpedance loading determined by $R_F$ is also very linear, the output voltage swing only has non-linearity from the imperfect device characteristic, from signal clipping when the output swing approaches the supply, or from a DC input current large enough to destroy the biasing condition. In fact, linearity is sacrificed for better noise performance and power efficiency, and is kept just high enough for a reasonable dynamic range.
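The cancellation of the quadratic terms in Eq. (2.4) can be checked numerically. The sketch below evaluates the ideal square-law currents and compares them with the linear prediction $2K(V_{DD} - 2v_{th})v_{in}$. The values of `k`, `vdd`, and `vth` are hypothetical and chosen only for illustration.

```python
def inverter_iout(v_in, k=1e-3, vdd=0.9, vth=0.3):
    """Output current of a symmetric square-law inverter biased at VDD/2."""
    vov_n = vdd / 2 + v_in - vth  # NMOS overdrive voltage
    vov_p = vdd / 2 - v_in - vth  # PMOS overdrive voltage
    if vov_n < 0 or vov_p < 0:
        raise ValueError("one device is cut off; the square-law model no longer applies")
    return k * vov_n ** 2 - k * vov_p ** 2

def linear_prediction(v_in, k=1e-3, vdd=0.9, vth=0.3):
    """Right-hand side of Eq. (2.4)."""
    return 2 * k * (vdd - 2 * vth) * v_in

if __name__ == "__main__":
    for v in (-0.10, -0.05, 0.0, 0.05, 0.10):
        print(f"v_in = {v:+.2f} V: i_out = {inverter_iout(v):+.3e} A, "
              f"linear = {linear_prediction(v):+.3e} A")
```

The two columns agree as long as both devices stay on, that is, for $|v_{in}| < V_{DD}/2 - v_{th}$ in this idealized model.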

#### 2.4 Noise and BER

Figure 2.8: Errors generated by additive noise.

Noise performance is of great importance to avoid excessive BER in data communication, especially at stages with a small signal swing. Fig. 2.8 shows how errors occur at the decision circuit with an incoming NRZ bit-stream and additive amplitude noise. Because random noise is generated by a process involving a huge number of independent events, its probability density is a normal distribution. Since this distribution is unbounded, noise can push the signal across the decision threshold in the wrong direction with a probability

$$Q(\mu) = \int_{\mu}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-u^{2}/2} du, \qquad \mu = \frac{V_{1} - V_{0}}{2\delta_{n}}$$
(2.5)

which is the integrated area of the normal distribution's long tail and equals the jitter-free BER. A few numerical values of the Q-function are provided in Table 2.1, calculated by MATLAB's function qfunc($\mu$). BER = $10^{-12}$ requires $V_1 - V_0$ to be roughly equal to $14\delta_n$, and the BER changes by about 3 orders of magnitude if $V_1 - V_0$ changes by $2\delta_n$.

| $\mu$    | 1          | 2          | 3          | 4          |
|----------|------------|------------|------------|------------|
| Q or BER | 0.1587     | 0.0228     | 0.0013     | 3.1671e-05 |
| $\mu$    | 5          | 6          | 7          | 8          |
| Q or BER | 2.8665e-07 | 9.8659e-10 | 1.2798e-12 | 6.2210e-16 |

Table 2.1: Numerical value of Q-function or BER.
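The entries of Table 2.1 can be reproduced without MATLAB through the identity $Q(\mu) = \frac{1}{2}\mathrm{erfc}(\mu/\sqrt{2})$. The sketch below is a minimal Python version; the helper `required_swing`, which inverts the Q-function by bisection, is a hypothetical addition used to illustrate the $14\delta_n$ rule of thumb.

```python
import math

def q_function(mu):
    """Tail probability of the standard normal distribution beyond mu."""
    return 0.5 * math.erfc(mu / math.sqrt(2.0))

def required_swing(ber_target, sigma_n, lo=0.0, hi=40.0):
    """Smallest V1 - V0 (same unit as sigma_n) that reaches ber_target, by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > ber_target:
            lo = mid
        else:
            hi = mid
    return 2.0 * hi * sigma_n  # from mu = (V1 - V0) / (2 * sigma_n)

if __name__ == "__main__":
    for mu in range(1, 9):
        print(f"mu = {mu}: Q = {q_function(mu):.4e}")
    print(f"swing for BER 1e-12: {required_swing(1e-12, 1.0):.2f} x sigma_n")
```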

#### 2.5 Jitter's Impact on BER



Figure 2.9: Simulated BER curve with different jitter levels.

Clock timing uncertainty, often referred to as jitter, is another source of errors. Even with a noise-free ideal square waveform, errors occur when unbounded jitter is large enough that an adjacent bit is sampled instead, which is equivalent to a random guess. In a real-world eye-diagram, the eye-height is largest in the middle, giving the best BER, while the reduced eye-height at early or late sampling phases introduces more errors.

BER with jitter is therefore the weighted average of the jitter-free BER at each sampling phase, with the weighting coefficients determined by the total jitter profile. Jitter can introduce either a power penalty of a few dB, as shown in Fig. 2.9 (a), or an infinite penalty, as shown in Fig. 2.9 (b), when the jitter level is too high to ever reach the required $10^{-12}$ BER. The BER simulation method summarized in Fig. 2.10 follows from the discussion in this chapter.

Step (a), generate eye-diagram.

Step (b), generate worst-case or statistical signal swing vs. sampling phase over a one-UI range.

Step (c), calculate jitter-free BER based on signal swing and simulated rms noise.

Step (d), calculate BER with jitter using the jitter profile as the weighting coefficient.

Figure 2.10: BER simulation method.
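Steps (c) and (d) can be sketched in a few lines. In the example below, the eye-height profile from steps (a) and (b) is replaced by a hypothetical raised-cosine shape, and the jitter profile is assumed to be purely Gaussian; both are placeholders for simulated data and are not taken from this work. Sampling phases beyond half a UI are treated as a closed eye, which corresponds to the random-guess case described above.

```python
import math

def q_function(mu):
    return 0.5 * math.erfc(mu / math.sqrt(2.0))

def eye_height(phase_ui):
    """Hypothetical normalized eye height vs. sampling phase (0 = eye center)."""
    if abs(phase_ui) >= 0.5:
        return 0.0  # sampling the adjacent bit: no margin left
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * phase_ui))

def jitter_free_ber(phase_ui, swing, sigma_n):
    """Step (c): BER at a fixed sampling phase from signal swing and rms noise."""
    mu = eye_height(phase_ui) * swing / (2.0 * sigma_n)
    return q_function(mu)

def ber_with_jitter(swing, sigma_n, sigma_j_ui, steps=4001, span=8.0):
    """Step (d): average jitter-free BER weighted by a Gaussian jitter profile."""
    if sigma_j_ui == 0.0:
        return jitter_free_ber(0.0, swing, sigma_n)
    total = 0.0
    weight_sum = 0.0
    for i in range(steps):
        x = -span + 2.0 * span * i / (steps - 1)  # jitter in units of sigma_j
        w = math.exp(-0.5 * x * x)
        total += w * jitter_free_ber(x * sigma_j_ui, swing, sigma_n)
        weight_sum += w
    return total / weight_sum

if __name__ == "__main__":
    for sj in (0.0, 0.02, 0.05, 0.10):
        print(f"rms jitter = {sj:.2f} UI: BER = {ber_with_jitter(16.0, 1.0, sj):.3e}")
```

Replacing `eye_height` with a table extracted from a simulated eye-diagram, and the Gaussian weights with a measured jitter histogram, would follow the same structure.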

#### 2.6 Conclusion

In this chapter, we discussed basic concepts related to signal integrity and BER in optical communication. Based on that, a BER simulation method is provided to gain insight into their impact.

#### 3. LOW-NOISE OPTICAL RECEIVER DESIGN \*

Optical transceiver with better noise performance requires less laser source power to achieve a given BER, while its own power consumption must also be minimized to improve the full link power efficiency.

Unlike linear/non-linear distortion, which comes equally from all the stages along the communication channel, noise matters most during weak signal detection at the receiver's front-end. The effect of channel gain becomes clear if we input-refer all noise in Fig. 3.1. Even with more noise $V_{n,TX}$ added on the transmitter side, a channel gain A < 1 amplifies the noise contribution from $V_{n,RX}$, making the receiver the key circuit for noise optimization.



Figure 3.1: Optical transceiver's simplified noise model.

The same argument holds inside the receiver. Optical receiver's input-stage TIA converts input current from photodiode to output voltage swing, usually determining noise performance if high enough signal gain suppresses noise from subsequent stages. In this chapter, we will start with optimal bandwidth and then discuss previously published and our proposed techniques to reduce TIA's input-referred noise density. Noise from DC cancellation loop is also discussed. All these techniques are compatible to make an ultra-sensitive power-efficient optical receiver.

<sup>\*</sup>Part of this chapter is reprinted with permission from "P. Yan et al., "A 12.5 Gb/s 1.38 mW Inverter-Based Optical Receiver in 28 nm CMOS," 2022 IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS), Fukuoka, Japan, 2022, pp. 1-4, doi: 10.1109/MWSCAS54063.2022.9859536."

#### 3.1 Optical Receiver's Optimal Bandwidth

As bandwidth impacts both ISI and noise, there is an optimal bandwidth that avoids both excessive ISI and excessive noise [3]. Eye-height is a good metric for ISI, and can be generated using statistical simulation or a transient simulation long enough to cover all possible scenarios. Fig. 3.2 plots simulated eye-diagrams of a 2nd-order Butterworth response with various bandwidths normalized to the data rate. As shown in Fig. 3.3, the eye-height remains roughly the same for bandwidths higher than 0.5 data-rate, but the timing range over which this eye-height is achieved shrinks to a single point at 0.5 data-rate bandwidth.



Figure 3.2: Simulated eye-diagrams of 2nd-order Butterworth response of various bandwidths.



Figure 3.3: Simulated eye-heights of 2nd-order Butterworth response of various bandwidths [3].

Assuming a white noise PSD, we arrive at the conventional optimum of 0.6 or 0.7 data-rate bandwidth with jitter margin, which is widely used in electrical SERDES SNR optimization. In an optical receiver, however, the integrated noise power is proportional to $\omega_{TIA}^3$, as derived in Section 3.3. Combining Fig. 3.3 and Fig. 3.4, bandwidth's impact on optical receiver OMA sensitivity is plotted in Fig. 3.5, which suggests a 1 dB OMA sensitivity improvement if the bandwidth is reduced to 0.5 data-rate, again with jitter margin.

#### 3.2 Resistive TIA

A simple resistor is able to carry out current-to-voltage conversion but has several drawbacks. The conversion gain is equal to the resistor value, which also determines the input-referred noise density 4kT/R. Due to the capacitive source impedance from the photodiode and parasitics, the -3 dB bandwidth is set by the first-order RC low-pass response. It should be noted that the loading capacitance from the subsequent stage is also added to the input node and makes things even worse. This simple structure therefore has a direct trade-off between noise and bandwidth.
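The trade-off can be made explicit: substituting $R = 1/(2\pi f_{-3dB} C)$ into $4kT/R$ gives a noise density of $8\pi kTC f_{-3dB}$, so for a fixed input capacitance any gain in bandwidth costs noise in the same proportion. The sketch below tabulates this for a hypothetical 100 fF input capacitance at 300 K; the function name and all component values are assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def resistive_tia(r_ohm, c_farad, temp_k=300.0):
    """Return (-3 dB bandwidth in Hz, input-referred noise density in A^2/Hz)."""
    bandwidth = 1.0 / (2.0 * math.pi * r_ohm * c_farad)
    noise_density = 4.0 * K_BOLTZMANN * temp_k / r_ohm
    return bandwidth, noise_density

if __name__ == "__main__":
    c_in = 100e-15  # hypothetical photodiode + parasitic + load capacitance
    for r in (100.0, 300.0, 1000.0, 3000.0):
        bw, n2 = resistive_tia(r, c_in)
        print(f"R = {r:6.0f} ohm: BW = {bw / 1e9:6.2f} GHz, "
              f"noise = {math.sqrt(n2) * 1e12:5.2f} pA/sqrt(Hz)")
```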



Figure 3.4: Optical receiver's integrated noise with various bandwidths [3].



Figure 3.5: Optical receiver's OMA sensitivity with various bandwidths [3].

#### 3.3 Noise in Shunt-feedback TIA and Transimpedance Limit

Shunt-feedback TIA is one of the most popular ways to break the noise-bandwidth trade-off. Loop gain allows a higher resistor value for a given bandwidth and therefore lower input-referred noise density from the resistor. The amplifier does add extra noise, but this does not mean that the overall noise performance is degraded. Noise from later stages is present in both designs, and it can only be ignored in the shunt-feedback structure due to its higher transimpedance gain. The shunt-feedback structure usually performs significantly better due to the extra design freedom and the isolation between source and load.



Figure 3.6: Optical receiver front-end noise model.

As shown in Fig. 3.6, feedback resistor  $R_F$  and amplifier are the two major noise sources [13], resulting in the following input-referred noise power spectral density  $\overline{I_{n,in}^2(\omega)}$ .

$$\begin{aligned}
\overline{I_{n,in}^{2}(\omega)} &\approx \overline{I_{n,R_{F}}^{2}(\omega)} + \overline{I_{n,amp}^{2}(\omega)} \\
&= \frac{4kT}{R_{F}} + \frac{\overline{v_{n,amp}^{2}(\omega)}}{R_{F}^{2}} + \omega^{2}(C_{in} + C_{in,amp})^{2}\overline{v_{n,amp}^{2}(\omega)} \\
&\approx \frac{4kT}{R_{F}} + \omega^{2}(C_{in} + C_{in,amp})^{2}\overline{v_{n,amp}^{2}(\omega)}
\end{aligned}$$
(3.1)

where $C_{in}$ is the combined PD and packaging capacitance at the input node, $C_{in,amp}$ is the amplifier input capacitance, $\overline{v_{n,amp}^2(\omega)} = 4kT\gamma/g_m$ is the amplifier's input-referred noise power spectral density, and $\gamma$ is the channel-noise factor. The $\overline{v_{n,amp}^2(\omega)}/R_F^2$ term is omitted because $4kT/R_F \gg 4kT\gamma/(g_m R_F^2)$ holds in practice, as it only requires $g_m R_F \gg \gamma$. For a given amplifier structure, both $C_{in,amp}$ and $\overline{v_{n,amp}^2(\omega)}$ are determined by the size and power, especially of its input stage. $C_{in,amp}$ is proportional to size/power, while $\overline{v_{n,amp}^2(\omega)}$ is inversely proportional to them. The following optimum value is achieved when $C_{in} = C_{in,amp}$ [14][15].

$$min. \ \overline{I_{n,in}^2(\omega)} = \frac{4kT}{R_F} + \omega^2 \frac{16kT\gamma C_{in}}{A\omega_A}$$
(3.2)

where $A\omega_A$ is the gain-bandwidth product, which is roughly equal to the CMOS technology transit frequency $\omega_T$ for a single-stage amplifier but may reach a higher value in a multi-stage amplifier [16][17]. While optimum sensitivity is achieved with equal input and amplifier capacitance, a smaller input stage can be used to trade degraded sensitivity for lower power.
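The optimum at $C_{in} = C_{in,amp}$ follows from minimizing $(C_{in} + C_{in,amp})^2/g_m$ when $g_m$ scales with the input-stage size. The sketch below sweeps the amplifier input capacitance under the assumption $g_m = \omega_T C_{in,amp}$ and compares the minimum against the amplifier term of Eq. (3.2). The numerical values ($C_{in}$, $\omega_T$, the evaluation frequency, $\gamma = 1$) are hypothetical.

```python
import math

KT_300K = 1.380649e-23 * 300.0  # J

def amp_noise_term(c_in_amp, c_in, omega_t, omega, gamma=1.0):
    """Amplifier term of Eq. (3.1), assuming g_m = omega_t * c_in_amp."""
    g_m = omega_t * c_in_amp
    v_n2 = 4.0 * KT_300K * gamma / g_m
    return omega ** 2 * (c_in + c_in_amp) ** 2 * v_n2

def sweep_optimum(c_in, omega_t, omega, gamma=1.0):
    """Sweep C_in,amp / C_in from 0.1 to 10 and return (best ratio, minimum noise)."""
    ratios = [0.1 + 0.01 * i for i in range(991)]
    best = min(ratios, key=lambda r: amp_noise_term(r * c_in, c_in, omega_t, omega, gamma))
    return best, amp_noise_term(best * c_in, c_in, omega_t, omega, gamma)

if __name__ == "__main__":
    c_in = 50e-15                     # hypothetical PD + packaging capacitance
    omega_t = 2.0 * math.pi * 200e9   # hypothetical transit frequency
    omega = 2.0 * math.pi * 10e9      # evaluation frequency
    ratio, noise = sweep_optimum(c_in, omega_t, omega)
    closed_form = omega ** 2 * 16.0 * KT_300K * c_in / omega_t
    print(f"optimum C_in,amp / C_in = {ratio:.2f}")
    print(f"swept minimum = {noise:.3e} A^2/Hz, Eq. (3.2) term = {closed_form:.3e} A^2/Hz")
```

The minimum is shallow, which is why a smaller input stage costs relatively little sensitivity while saving power.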

Another critical design specification is the frequency response with proper stability. In a high-speed shunt-feedback TIA, it is usually assumed that there is at least one amplifier pole $\omega_A$, which together with the RC pole at the input node makes a second- or higher-order system. To minimize the input-referred noise density $4kT/R_F$, a larger $R_F$ is preferred, which places the dominant pole at the input node. The TIA's open-loop frequency response is plotted in Fig. 3.7.

$C_T$ is the total input node capacitance from PD, parasitics, and amplifier. $\omega_A$ must be placed beyond the loop unity-gain frequency $A/R_FC_T$ for stability. When $\omega_A = 3A/R_FC_T$, the TIA has a Bessel response with maximally flat group delay and 71.6° phase margin. Most applications prefer $\omega_A = 2A/R_FC_T$ for a Butterworth response, which has no peaking in the frequency domain and 63.4° phase margin. The closed-loop frequency response is expressed as follows [14][16]:

$$Z_T(s) = \frac{-R_F}{1 + \frac{(1+s/\omega_A)(1+sR_FC_T)}{A}}.$$
(3.3)

Figure 3.7: Open-loop frequency response of 2nd-order shunt-feedback TIA.

The closed-loop TIA bandwidth is $\omega_{TIA} = \sqrt{2}A/R_FC_T = \omega_A/\sqrt{2}$ only when $\omega_A = 2A/R_FC_T$, which gives the Butterworth response. The well-known transimpedance limit [18] is derived as follows.

$$R_F = \frac{A\omega_A}{C_T \omega_{TIA}^2} = \frac{\sqrt{2}A}{C_T \omega_{TIA}}$$
(3.4)

which reveals the maximum possible $R_F$ for a given gain-bandwidth product $A\omega_A$, total input capacitance $C_T$, and required TIA bandwidth $\omega_{TIA}$. It should be noted that this only holds with a proper $\omega_A$, which solely determines $\omega_{TIA}$ for a given phase margin. As shown in Fig. 3.8, either a faster or a slower $\omega_A$ degrades TIA performance. A slower $\omega_A$ degrades phase margin and introduces peaking, while a faster $\omega_A$ wastes the potential to reach the transimpedance limit on an unnecessarily high phase margin.
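The phase margins quoted above and the transimpedance limit can be cross-checked numerically. The sketch below assumes the dominant input pole contributes about 90° at the loop unity-gain frequency $A/R_FC_T$, so the phase margin is $90° - \arctan(1/m)$ for $\omega_A = mA/R_FC_T$. It then sizes $R_F$ from Eq. (3.4) and evaluates Eq. (3.3) at $\omega_{TIA}$. The gain, capacitance, and bandwidth values are hypothetical, and the result is only approximate for finite $A$.

```python
import math

def phase_margin_deg(pole_ratio):
    """Phase margin for omega_A = pole_ratio * A / (R_F * C_T), with A >> 1."""
    return 90.0 - math.degrees(math.atan(1.0 / pole_ratio))

def butterworth_design(a, c_t, omega_tia):
    """Return (R_F, omega_A) from Eq. (3.4) with omega_A = sqrt(2) * omega_TIA."""
    omega_a = math.sqrt(2.0) * omega_tia
    r_f = a * omega_a / (c_t * omega_tia ** 2)
    return r_f, omega_a

def closed_loop_mag(omega, r_f, c_t, a, omega_a):
    """|Z_T(j*omega)| from Eq. (3.3)."""
    s = 1j * omega
    return abs(-r_f / (1.0 + (1.0 + s / omega_a) * (1.0 + s * r_f * c_t) / a))

if __name__ == "__main__":
    print(f"Butterworth phase margin: {phase_margin_deg(2.0):.1f} deg")
    print(f"Bessel phase margin:      {phase_margin_deg(3.0):.1f} deg")
    a, c_t, omega_tia = 100.0, 100e-15, 2.0 * math.pi * 10e9  # hypothetical values
    r_f, omega_a = butterworth_design(a, c_t, omega_tia)
    ratio = closed_loop_mag(omega_tia, r_f, c_t, a, omega_a) / closed_loop_mag(0.0, r_f, c_t, a, omega_a)
    print(f"R_F = {r_f:.0f} ohm, |Z_T(omega_TIA)| / |Z_T(0)| = {ratio:.3f}")
```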

Combining (3.4) and (3.2), the optimal input-referred noise power spectral density with  $C_{in} = C_{in,amp}$  is

$$min. \ \overline{I_{n,in}^2(\omega)} = \frac{8kTC_{in}}{A\omega_A}(\omega_{TIA}^2 + 2\gamma\omega^2).$$
(3.5)

Integrating it with the second-order Butterworth frequency response, the input-referred noise power is proportional to  $C_{in}\omega_{TIA}^3/(A\omega_A)$  [14][3].
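A worked version of this integration is sketched below. It assumes a single-sided noise density in Hz, an ideal second-order Butterworth magnitude response $|H(j\omega)|^2 = 1/[1+(\omega/\omega_{TIA})^4]$, and the substitution $x = \omega/\omega_{TIA}$; the symbol $\overline{I_{n,in,rms}^2}$ for the integrated noise power is introduced here only for illustration.

$$\begin{aligned}
\overline{I_{n,in,rms}^2} &= \frac{1}{2\pi}\int_0^{\infty} \frac{8kTC_{in}}{A\omega_A}\cdot\frac{\omega_{TIA}^2 + 2\gamma\omega^2}{1+(\omega/\omega_{TIA})^4}\,d\omega \\
&= \frac{8kTC_{in}\,\omega_{TIA}^3}{2\pi A\omega_A}\left[\int_0^{\infty}\frac{dx}{1+x^4} + 2\gamma\int_0^{\infty}\frac{x^2\,dx}{1+x^4}\right] \\
&= \frac{8kTC_{in}\,\omega_{TIA}^3}{2\pi A\omega_A}\cdot\frac{\pi}{2\sqrt{2}}\,(1+2\gamma) = \frac{\sqrt{2}\,(1+2\gamma)\,kT\,C_{in}\,\omega_{TIA}^3}{A\omega_A}
\end{aligned}$$

Both integrals evaluate to $\pi/(2\sqrt{2})$, so the resistor and amplifier contributions scale identically with $\omega_{TIA}^3$ under these assumptions.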

Figure 3.8: Simulated TIA frequency response with different amplifier poles.

This relationship can be understood intuitively for an optical receiver properly designed for a given $C_{in}$. If $C_{in}$ increases by *n* times, the whole front-end can be sized up proportionally to maintain the same phase margin and bandwidth, which means increasing the amplifier's size and reducing $R_F$ by *n* times. Input-referred noise power increases by *n* times as a result, due to the 1/n times transimpedance gain and 1/n times output-referred noise power.

The closed-loop  $\omega_{TIA}$  has much more impact on the receiver's noise performance. Achieving  $n\omega_{TIA}$  requires A/n amplifier gain for a constant  $A\omega_A$ , resulting in  $(1/n^2)R_F$  that generates  $n^2$  times higher noise density, which means  $n^3$  times noise power integrating over  $n\omega_{TIA}$ . The same input-referred noise density from the amplifier also generates  $n^3$  noise power over  $n\omega_{TIA}$ . Thus, the whole input-referred noise power is  $\propto n^3$ , which motivates several noise reduction techniques based on low-bandwidth front-ends.

Increasing A by *n* times with the same $\omega_A$ implies that $g_m$ is proportionally higher for the same $C_{in,amp}$, resulting in 1/n times input-referred noise power from the amplifier. The $R_F$ value can also be increased proportionally to maintain the same bandwidth and phase margin, generating 1/n times input-referred noise power.
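The three scaling arguments above can be collected into one small helper based on the proportionality $C_{in}\omega_{TIA}^3/(A\omega_A)$. In the sketch below, the conversion to an optical sensitivity change uses $5\log_{10}$ of the noise power ratio, because the required OMA tracks the rms noise current while optical power is expressed with $10\log_{10}$. The function names are assumptions, and the ISI penalty of a lower bandwidth is deliberately ignored.

```python
import math

def noise_power_scale(bw_scale=1.0, cin_scale=1.0, gbw_scale=1.0):
    """Relative input-referred noise power when bandwidth, C_in, or A*omega_A is scaled."""
    return cin_scale * bw_scale ** 3 / gbw_scale

def oma_sensitivity_change_db(bw_scale=1.0, cin_scale=1.0, gbw_scale=1.0):
    """Change of required OMA in optical dB (positive means worse sensitivity)."""
    return 5.0 * math.log10(noise_power_scale(bw_scale, cin_scale, gbw_scale))

if __name__ == "__main__":
    print(f"2x bandwidth:      {oma_sensitivity_change_db(bw_scale=2.0):+.2f} dB")
    print(f"10x less C_in:     {oma_sensitivity_change_db(cin_scale=0.1):+.2f} dB")
    print(f"2x gain-bandwidth: {oma_sensitivity_change_db(gbw_scale=2.0):+.2f} dB")
```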

#### 3.4 Inverter-based TIA with Multi-stage Amplifier

Inverter-based shunt-feedback TIA [19] is popular in high-speed optical receiver design due to current reuse and low-supply operation, but its maximum $R_F$ and noise performance are significantly degraded by the shrinking intrinsic device gain in advanced CMOS technology. Cascode or other gain-boosting techniques are usually impossible to implement due to the low supply voltage. Fig. 3.9 plots the frequency response and input-referred noise with reduced intrinsic device gain. $R_F$ must be reduced at least proportionally to keep the same closed-loop bandwidth $\omega_{TIA}$, while the higher $\omega_A$ requires a slightly further reduced $R_F$ due to the higher phase margin. The input-referred noise density from $R_F$ increases as a result, with identical amplifier noise density and noise bandwidth.



Figure 3.9: Simulated TIA frequency response with different amplifier pole.

Fig. 3.10 shows the conventional TIA cascaded with a post-amplifier [19] and our proposed TIA with a multi-stage amplifier [20] to deal with the low intrinsic gain. A single-stage inverter acts as the amplifier in the conventional TIA, limiting the maximum $R_F$ value. Though the overall transimpedance gain can be easily boosted by the cascaded broadband post-amplifier, the reduced $R_F$ generates unnecessarily high input-referred noise and degrades sensitivity, as shown in Fig. 3.11.



Figure 3.10: (a) Conventional TIA with post-amplifier and (b) proposed TIA with multi-stage feedback amplifier.



Figure 3.11: Transimpedance and input-referred noise power spectral density of conventional TIA with post-amplifier and proposed TIA.

Our proposed design improves noise performance by placing the transconductance-transimpedance broadband voltage amplifier inside the TIA feedback loop and increasing $R_F$ proportionally. These two designs have similar transimpedance, bandwidth, and power consumption. The noise density from the amplifier is also identical, since in both cases it is determined by the size and power of the input stage $S_1$. In contrast, the noise power density from the feedback resistance is suppressed by the gain of the cascaded amplifier $A_2 = g_{m2}R_{F2}$.

Figure 3.12: Small signal models of (a) conventional TIA with post-amplifier and (b) proposed TIA.

Fig. 3.12 shows their small signal models. The basic idea behind the proposed technique is utilizing fast-but-low-gain technology to break the transimpedance limit with a higher gain-bandwidth product $A\omega_A$. The proposed TIA is also equivalent to a three-stage inverter-based amplifier for higher DC loop gain and negative feedback, with a local feedback resistor $R_{F2}$ to adjust DC loop gain and phase margin. Without $R_{F2}$, we do have the maximum possible DC loop gain in a three-stage structure but also a limited TIA bandwidth $\omega_{TIA}$ for a given phase margin due to the three roughly equal non-dominant poles. Adding $R_{F2}$ in parallel with the third stage $S_3$ reduces the DC loop gain and pushes the two poles at the output nodes of $S_2$ and $S_3$ to higher frequency, generating the following frequency response.

$$A_{2}(s) = \frac{g_{m2}R_{F2}}{1+1/L(s)}$$

$$L(s) = \frac{(g_{m3}R_{F2}-1)\frac{r_{ds2}}{R_{F2}+r_{ds2}}\frac{r_{ds3}}{R_{F2}+r_{ds3}}}{[1+s(r_{ds2}||R_{F2})C_{L2}][1+s(r_{ds3}||R_{F2})C_{L3}]}$$
(3.6)

The overall frequency response of the conventional and proposed front-end designs are expressed as

$$Z_{T,conv.}(s) = \frac{-g_{m2}R_{F2}R_F}{1 + \frac{1}{A_{conv.}(s)} + \frac{sR_FC_T}{A_{conv.}(s)}}$$

$$A_{conv.}(s) = \frac{(g_{m1}R_F - 1)r_{ds1}}{[1 + s(r_{ds1} || R_F)C_{L1}](R_F + r_{ds1})}$$

$$Z_{T,proposed}(s) = \frac{-g_{m2}R_{F2}R_F}{1 + \frac{1}{g_{m2}R_{F2}A_{proposed}(s)} + \frac{sR_FC_T}{A_{proposed}(s)}}$$

$$A_{proposed}(s) = \frac{g_{m1}r_{ds1}}{1 + sr_{ds1}C_{L1}} \approx A_{conv.}(s), \text{ when } R_F \gg r_{ds1},$$
(3.7)

both with the dominant pole $1/(R_F C_T)$ at the input node of $S_1$ and the first non-dominant pole $\omega_A$ at its output node. With a proper choice of $R_{F2}$, the phase shift due to 1/[1+1/L(s)] and $\omega_A$ can be set to 26.6° to achieve the same 63.4° phase margin as in a Butterworth response. Better noise performance is thus achieved without any extra power or silicon area. The proposed design unlocks the full potential of advanced CMOS technology and provides a better trade-off between bandwidth and noise.

#### 3.5 Noise Reduction Techniques Comparison

Several noise reduction techniques have been proposed in recent publications to cope with the inherent transimpedance limit. Table 3.1 summarizes their noise reduction performance and cost. Low-bandwidth TIA is one of the most popular noise reduction techniques, as the input-referred integrated noise power is proportional to $\omega_{TIA}^3$. However, as discussed in the previous chapter, ISI cancellation is required to maintain reasonable eye-height and eye-width.

|                     | CTLE [13]               | DFE [3]<br>or Duobinary[21] | 3D-Integration for less $C_{in}$ [22] | TIA with<br>Multi-stage Amp.<br>using Inductor [23] |
|---------------------|-------------------------|-----------------------------|---------------------------------------|-----------------------------------------------------|
| $R_F$ Noise Density | Yes                     | Yes                         | Yes                                   | Yes                                                 |
| Amp. Noise Density  | No                      | No                          | Yes                                   | No                                                  |
| Noise Bandwidth     | Compensate<br>Variation | Reduction                   | No                                    | No                                                  |
| Cost                | Extra Power             | Extra Power                 | Integration Effort                    | Extra Area                                          |

Table 3.1: Noise reduction techniques in recent publications.

CTLE [13] is the most straightforward bandwidth recovery technique in the analog domain. Though it consumes more power than a conventional broadband TIA design [19], CTLE allows for a higher $R_F$ value and thus lower white noise density. It also helps to achieve the optimal noise bandwidth regardless of PVT variation in $C_{in}$, $R_F$, or amplifier gain. On the other hand, DFE [3] is the most effective noise bandwidth reduction technique, as decision feedback adds back minimal high-frequency noise during bandwidth recovery in the digital domain, but it requires significantly more power to meet the critical timing. Duobinary sampling [21] is another way to leverage well-controlled ISI from a TIA with a 25% bandwidth to data-rate ratio instead of 50% or higher, but it requires two slicers and a logic gate to resolve each symbol. Its noise performance and power consumption sit between those of CTLE and DFE. Another drawback is the reduced eye-width, which requires more clock power for lower jitter.

As we saw in the previous chapter, co-designing the photonic IC with 3D integration [22] is another promising solution that improves the optical receiver's OMA sensitivity by 5 dB/dec with reduced $C_{in}$. Front-end power consumption is also reduced proportionally. The main challenge is the 3D-integration effort required to connect the EIC and PIC.

Since the integrated noise power is inversely proportional to the amplifier's gain-bandwidth product $A\omega_A$, which is roughly equal to the transit frequency $\omega_T$ in a single-stage amplifier, the multi-stage amplifier [16] was proposed for higher $A\omega_A$ and so less noise. However, a conventional broadband multi-stage amplifier has roughly equal poles at each stage's output node, determined by the fan-out factor, requiring an unreasonably fast technology even for a medium data-rate [17]. On-chip peaking inductors are added for bandwidth extension, at the cost of silicon area and data-rate density [23].

#### 3.6 Conclusion

In this chapter, we started from optical receiver's optimal bandwidth to balance ISI and noise. Shunt-feedback TIA's frequency response and input-referred noise model are discussed to reach the well-known transimpedance limit, which reveals the maximum achievable noise performance at a given bandwidth and the conditions to realize it. Then we proposed an inverter-based TIA with multi-stage amplifier to break the transimpedance limit with higher gain-bandwidth product, without extra power or silicon area. This technique releases full potential of a fast-but-low-gain CMOS technology and converts its speed advantage to better noise performance, especially suitable for medium data-rate application. Inverter-based current-mode additive CTLE is provided to further reduce TIA's input-referred noise, with less power consumption and better linearity. Noise from DC cancellation loop is also suppressed by a source degeneration resistor. Finally, several noise reduction approaches in recent publications were investigated to compare their effects and costs. All techniques discussed in this chapter are compatible with each other to make an ultra-sensitive power-efficient optical receiver.

## 4. CLOCKING, OPTICAL TRANSMITTER AND OPTICAL DEVICE TUNING

Clocking, optical transmitter and optical device tuning are essential parts of a complete optical transceiver. We will discuss them in this chapter.

### 4.1 Clocking Circuitry

As we see in Chapter 2, low-jitter multi-phase clock is critical for high-speed SERDES to achieve good BER. In this section, we will start from jitter basics and then visit several aspects necessary to generate low-jitter multi-phase clock.

## 4.1.1 Jitter and Phase Noise

Clock's timing uncertainty can be defined as jitter in the time domain, or as equivalent phase noise in the frequency domain. A general clock's jitter is solely determined by the phase noise  $\phi_n$ of its fundamental component  $Acos(\omega_c t + \phi_n)$  with  $\phi_{n,rms} = \omega_c J_{rms}$ , while all the higher order harmonics must have phase shift proportional to  $n\omega_c$  generating the same jitter [4].



Figure 4.1: Current source charging a capacitor [4].

Fig. 4.1 shows how noise alters the timing of a clock's rising edge in the time domain. Besides the deterministic charging signal current, noise currents from various sources are also integrated over capacitor C, randomly shifting the time to reach the crossing threshold $V_{th}$. Integrated noise voltage at time $t_d$ is

$$V_{noise} = \frac{1}{C} \int_0^{t_d} I_{noise}(\tau) d\tau, \qquad (4.1)$$

and affects the exact time to reach $V_{th}$. As a first-order approximation, we can conclude that jitter equals the integrated noise voltage $V_{noise}$ divided by the noise-free signal slope $I_{signal}/C$.

$$Jitter \approx V_{noise} / (I_{signal} / C) = \frac{1}{I_{signal}} \int_0^{t_d} I_{noise}(\tau) d\tau$$
(4.2)



Figure 4.2: Noise injected at different time [5].

The effect of capacitor C is canceled, implying jitter can only be reduced by a better SNR.

Even though a steeper signal slope is preferred near the crossing for less jitter, we should not conclude that more jitter is generated near the peak due to a lower derivative. On the contrary, Fig. 4.2 shows that the same noise introduces maximal phase shift near the crossing, and zero phase shift with changed amplitude near the peak. The amplitude change is usually not a problem due to amplitude stabilization everywhere inside clocking circuitry.

When this noisy circuit is placed inside a free-running oscillator, the above jitter gets accumulated in the loop forever and can reach arbitrarily large level if given long enough time.



Figure 4.3: Phase noise/jitter generation in oscillator [5][6].

A more accurate linear time-variant (LTV) jitter model was proposed in the frequency domain [5][6], as summarized in Fig. 4.3. The time-varying relationship between phase shift and injected noise can be modeled as a unitless periodic impulse sensitivity function (ISF) with the same frequency $\omega_c$ [5], which is determined by the clock waveform and can be expanded as a Fourier series

$$\Gamma(\omega_c \tau) = c_0 + \sum_{n=1}^{\infty} c_n \cos(n\omega_c \tau + \theta_n). \qquad (4.3)$$

Assuming small phase noise/jitter, the phase noise $\phi_n(t)$ responds linearly to the injected noise current, which completes the LTV model.

$$\phi_n(t) = \frac{1}{q_{max}} \int_{-\infty}^t \Gamma(\omega_c \tau) I_{noise}(\tau) d\tau$$

$$= \frac{1}{q_{max}} \left[ \int_{-\infty}^t c_0 I_{noise}(\tau) d\tau + \sum_{n=1}^\infty \int_{-\infty}^t \cos(n\omega_c \tau + \theta_n) c_n I_{noise}(\tau) d\tau \right]$$
(4.4)

Noise current with flicker and white noise components is scaled and moved to low frequency by the multiplication. It should be noted that $\omega_{corner}$ is always at a lower frequency than $\omega_{1/f}$, depending on coefficient $c_n$. If $c_0 = 0$ in a perfectly symmetric design, flicker noise's contribution is completely removed with $\omega_{corner} = 0$. As various parameters affect the ISF, the symmetry needed for $c_0 = 0$ should be verified through simulation. The integration provides another -20 dB/dec slope to generate the phase noise $\phi_n(t)$ plotted in Fig. 4.3 (c), with extra white noise from the output buffer.

Since phase is not a directly measurable parameter, power spectrum is widely used to quantify phase noise. The associated phase modulation is a non-linear process but can be linearized for reasonably small  $\phi_n$  as follows,

$$\begin{split} A\cos(\omega_c t + \phi_n) &= A\cos(\omega_c t)\cos(\phi_n) - A\sin(\omega_c t)\sin(\phi_n) \\ &\approx A\cos(\omega_c t) - A\sin(\omega_c t)\phi_n, \end{split}$$
(4.5)

which is equivalent to amplitude modulating orthogonal carrier  $Asin(\omega_c t)$  added to ideal carrier  $Acos(\omega_c t)$ , generating noise sidebands near carrier frequency  $\omega_c$  as shown in Fig. 4.3 (d).

## 4.1.2 Forwarded Clock Architecture

The embedded and forwarded clock architectures shown in Fig. 4.4 are frequently used in high-speed data links [7]. Extra channel resources and power spent on the clock channel are amortized among multiple data channels, making forwarded clocking more suitable for a power-efficient multi-channel optical transceiver with high aggregate data rate. It also simplifies clock recovery for power saving and provides phase noise cancellation. With matched delay, jitters from the same source going through different paths cancel each other.



Figure 4.4: (a) Embedded and (b) forwarded clock architecture [7].



Figure 4.5: Optical transceiver's forwarded clock diagram.

As shown in the optical transceiver's forwarded clocking diagram in Fig. 4.5, the TX ILO generates a multi-phase clock to drive multiple TX channels, based on a differential reference clock. All TX channels share the same design, $\phi_{ILO,TX}$ and filtered $\phi_{REF}$ to maximize their correlation. One of the TX channels sends out fixed pattern data as the forwarded clock with channel-dependent $\phi_{CLOCK}$, while all the other TX channels transmit real random data with their own $\phi_{DATA}$. The RX ILO converts the received differential clock to a multi-phase clock again, before sending it to the slicers of all RX channels through the clock distribution network. $\phi_{OUT}$ represents the total phase noise difference between the received clock and data, with the following expression.

$$\phi_{OUT} = \left[\frac{\phi_{REF}}{1 + j\omega/\omega_{ILO,TX}} + \phi_{ILO,TX}\right] \left(1 - \frac{e^{-j\omega t_d}}{1 + j\omega/\omega_{ILO,RX}}\right) - \frac{e^{-j\omega t_d}}{1 + j\omega/\omega_{ILO,RX}} \phi_{CLOCK} - \phi_{ILO,RX} - \phi_{DISTRIBUTION} + \phi_{DATA}$$
(4.6)

Figure 4.6: Simulated jitter transfer function with different  $t_d$ ,  $\omega_{ILO,TX} = \omega_{ILO,RX} = 0.1\omega_c$ .

The forwarded clocking scheme provides extra filtering for $\phi_{ILO,TX}$ and especially for $\phi_{REF}$, relaxing their phase noise requirement. Fig. 4.6 shows simulated jitter transfer functions with different $t_d$, which mainly comes from extra delay in the RX ILO and clock distribution network but is also affected by any other discrepancy between the clock and data channels. Jitter from $\phi_{REF}$ is minimized when $t_d = T$, compensating the phase shift from the RX ILO, slightly different from the broadband conclusion that $t_d$ should be zero for perfectly matched delay. A multi-phase clock with T = nUI cannot achieve this optimal $t_d$ simultaneously for all its phases but can have a delay between 0.5T and 1.5T to optimize overall jitter performance. Even if $t_d$ is further off this optimal range by a few UI due to mismatch, band-pass jitter filtering in the forwarded clocking scheme still works reasonably well, allowing for an inexpensive reference clock source.
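To make the band-pass behavior mentioned above concrete, the following minimal sketch evaluates only the $\phi_{REF}$ term of Eq. (4.6), exactly as written. The normalization $T = 1$, the choice $t_d = T$, and the function name `ref_to_out` are illustrative assumptions; the sketch does not attempt to reproduce the simulation behind Fig. 4.6.

```python
import cmath
import math

def ref_to_out(omega, omega_tx, omega_rx, t_d):
    """phi_REF -> phi_OUT term of Eq. (4.6), evaluated as written."""
    tx_filter = 1.0 / (1.0 + 1j * omega / omega_tx)
    rx_path = cmath.exp(-1j * omega * t_d) / (1.0 + 1j * omega / omega_rx)
    return tx_filter * (1.0 - rx_path)

# Assumed normalization: clock period T = 1, so omega_c = 2*pi.
T = 1.0
omega_c = 2.0 * math.pi / T
omega_ilo = 0.1 * omega_c  # ILO bandwidth quoted in the caption of Fig. 4.6

# Jitter frequencies from 1e-4*omega_c to omega_c, logarithmically spaced.
ratios = [10 ** (-4 + 4 * k / 200) for k in range(201)]
gains = [abs(ref_to_out(r * omega_c, omega_ilo, omega_ilo, T)) for r in ratios]
peak_index = max(range(len(gains)), key=gains.__getitem__)

print(f"gain at 1e-4*omega_c: {gains[0]:.4f}")
print(f"gain at omega_c:      {gains[-1]:.4f}")
print(f"peak gain {gains[peak_index]:.3f} at {ratios[peak_index]:.3f}*omega_c")
```

Under these assumptions the gain is small at both ends of the sweep and peaks in between, which is the band-pass shape that relaxes the reference clock's low-frequency phase noise requirement.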

## 4.1.3 Jitter Amplification Due to Limited Signal Bandwidth

Clock distribution is another challenge in low-jitter design. It is well known that limited bandwidth causes jitter amplification [7], which can be derived from the signal transfer function as follows. An input clock with single-tone jitter $\phi_{n,in}(t) = \phi_{n,amp} sin(\Delta \omega t)$ has the time domain expression in Eq. (4.7) and equivalent frequency components in Eq. (4.8), where $\omega_H = \omega_c + \Delta\omega$ and $\omega_L = \omega_c - \Delta\omega$.



Figure 4.7: Calculated and simulated jitter transfer function of first-order LPF with various bandwidths.

$$\begin{split} clock_{in}(t) &= Ae^{j[\omega_{c}t + \phi_{n}(t)]} = Ae^{j[\omega_{c}t + \phi_{n,amp}\sin(\Delta\omega t)]} \\ &= Ae^{j\omega_{c}t}[\cos(\phi_{n,amp}\sin(\Delta\omega t)) + j\sin(\phi_{n,amp}\sin(\Delta\omega t))] \\ &\approx Ae^{j\omega_{c}t}[1 + j\phi_{n,amp}\sin(\Delta\omega t)], \text{ for small } \phi_{n,amp} \\ &= Ae^{j\omega_{c}t}\left[1 + \phi_{n,amp}\frac{e^{j\Delta\omega t} - e^{-j\Delta\omega t}}{2}\right] \\ &= Ae^{j\omega_{c}t} + \frac{A\phi_{n,amp}}{2}e^{j(\omega_{c} + \Delta\omega)t} - \frac{A\phi_{n,amp}}{2}e^{j(\omega_{c} - \Delta\omega)t} \\ &= Ae^{j\omega_{c}t} + \frac{A\phi_{n,amp}}{2}e^{j\omega_{H}t} - \frac{A\phi_{n,amp}}{2}e^{j\omega_{L}t}, \end{split}$$
(4.7)

$$clock_{in}(\omega) = 2\pi A\delta(\omega - \omega_c) + \pi A\phi_{n,amp}\delta(\omega - \omega_H) - \pi A\phi_{n,amp}\delta(\omega - \omega_L)$$
(4.8)

When it's applied to a linear system, the output clock becomes

$$\begin{split} clock_{out}(\omega) &= 2\pi A\delta(\omega - \omega_c)H(j\omega_c) + \pi A\phi_{n,amp}\delta(\omega - \omega_H)H(j\omega_H) - \pi A\phi_{n,amp}\delta(\omega - \omega_L)H(j\omega_L) \\ &= 2\pi A\delta(\omega - \omega_c)H(j\omega_c) + \pi A\phi_{n,amp}\frac{H(j\omega_H) + H(j\omega_L)}{2}[\delta(\omega - \omega_H) - \delta(\omega - \omega_L)] \\ &\quad + \pi A\phi_{n,amp}\frac{H(j\omega_H) - H(j\omega_L)}{2}[\delta(\omega - \omega_H) + \delta(\omega - \omega_L)], \end{split}$$
(4.9)

which has both amplitude modulation and phase modulation components. Ignoring the amplitude modulation component due to ubiquitous amplitude stabilization, the phase modulation component generates the following jitter transfer function,

$$JTF(\Delta\omega) = \frac{H(j\omega_H) + H(j\omega_L)}{2H(j\omega_c)} = \frac{H(j(\omega_c + \Delta\omega)) + H(j(\omega_c - \Delta\omega))}{2H(j\omega_c)}.$$
 (4.10)

To validate the narrow-band linear approximation used in Eq. (4.7), we directly simulate the JTF of a single-pole LPF $H(j\omega) = 1/(1 + j\omega/\omega_p)$ and compare it against the calculation based on Eq. (4.10). Results with various bandwidths are plotted in Fig. 4.7, indicating little discrepancy between simulation and calculation. The clock distribution network should have reasonable signal bandwidth to avoid excessive jitter amplification.
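The calculation side of this comparison can be sketched in a few lines. The snippet below evaluates Eq. (4.10) for the single-pole LPF; the normalized clock frequency and the three bandwidth ratios are illustrative assumptions, not the values used for Fig. 4.7.

```python
import math

def lpf(omega, omega_p):
    """Single-pole low-pass filter H(j*omega) = 1 / (1 + j*omega/omega_p)."""
    return 1.0 / (1.0 + 1j * omega / omega_p)

def jtf(delta_omega, omega_c, omega_p):
    """Jitter transfer function of Eq. (4.10) for the single-pole LPF."""
    h_high = lpf(omega_c + delta_omega, omega_p)
    h_low = lpf(omega_c - delta_omega, omega_p)
    return (h_high + h_low) / (2.0 * lpf(omega_c, omega_p))

omega_c = 2.0 * math.pi          # normalized clock frequency (assumed)
delta_omega = 0.5 * omega_c      # jitter frequency offset (assumed)

for bw_ratio in (0.5, 1.0, 10.0):  # omega_p / omega_c, illustrative values
    gain = abs(jtf(delta_omega, omega_c, bw_ratio * omega_c))
    print(f"omega_p = {bw_ratio:4.1f}*omega_c -> |JTF| = {gain:.3f}")
```

With the pole well below the clock frequency the magnitude exceeds one, which is the jitter amplification discussed above, while a pole far above the clock frequency leaves the jitter essentially unchanged.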

## 4.1.4 Multi-phase Clock Generation

DLL and ILO are two popular ways to convert differential input clock to multi-phase clock. DLL locks phase of its output clock against the input clock by means of negative feedback in phase domain, with advantage of inherent stability [24][25]. DLL is also preferable due to significantly less phase noise. Phase noise from each stage deviates passing clock edge only once, as opposed to the accumulated phase noise in oscillator.



Figure 4.8: 8-phase clock generation based on ring oscillator DLL.

Fig. 4.8 shows an implementation of 8-phase clock generation based on ring oscillator DLL. Cascaded pseudo-differential delay stages introduce a total delay equal to one clock period, controlled by the supply from the feedback loop. Cross-coupled inverters are added only to guarantee oscillation with an even stage number, and should not be excessively large to avoid extra phase noise and power during the transition edge [26]. Assuming equal stage delay, which only holds with large enough transistor size for small variation, we have equally spaced clock phases.

ILO is another multi-phase clock generation method, with one implementation shown in Fig. 4.9. The same cascaded pseudo-differential delay stages are connected to form a ring oscillator, with input clock injected at two phases. Variable size inverter is used to control injection strength, which compensates the difference between input clock frequency and ring oscillator's free running frequency in locked condition.

Various approximations were proposed to model ILO's characteristic [27][28][29][30][31]. Here we present a linearized phase domain model in Fig. 4.10 for intuitive understanding. ILO's injected stage acts as a phase interpolator controlled by injection strength $K_i$, while all stages' delays are combined into a single delay cell in the feedback loop. Since phase shift due to frequency discrepancy is cancelled in locked condition, only excess phase noise is included here.

Figure 4.9: 8-phase clock generation based on ring oscillator ILO.

Figure 4.10: ILO's linearized phase domain model.

When  $K_i = 0$  in free running mode, phase noise  $\phi_n$  persists in the free running loop forever, generating accumulated phase noise  $\phi_{osc} = \phi_n/(z-1)$ . When  $0 < K_i < 1$  in locked mode, total transfer function becomes

$$\begin{split} \phi_{out} &= \frac{K_i}{z - (1 - K_i)} \phi_{inj} + \frac{1}{z - (1 - K_i)} \phi_n \\ &= \frac{K_i}{z - (1 - K_i)} \phi_{inj} + \frac{z - 1}{z - (1 - K_i)} \phi_{osc} \\ &\approx \frac{1}{1 + sT/(2K_i)} \phi_{inj} + \frac{sT/(2K_i)}{1 + sT/(2K_i)} \phi_{osc}, \text{ with } z = e^{sT/2} \approx 1 + sT/2, \end{split}$$
(4.11)

indicating input phase noise is low pass filtered and oscillator's phase noise is high pass filtered with the same pole  $\omega_p \approx 2K_i/T$ , the same result as derived in [32][33][28]. In low-power application,  $\phi_{osc}$  is usually larger than  $\phi_{inj}$ , requiring larger injection strength  $K_i$  for higher jitter tracking bandwidth and lower  $\phi_{out}$ . But extra jitter is still added during the multi-phase clock generation. On the other hand, lower injection strength  $K_i$  is preferred with  $\phi_{osc} < \phi_{inj}$  and  $\phi_{out} < \phi_{inj}$  used in high-power low-jitter application.
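The filtering described by Eq. (4.11) can be checked numerically with the short sketch below, which evaluates the exact z-domain expressions and compares them with the approximate pole $\omega_p \approx 2K_i/T$. The injection strength and the normalized period are assumed values chosen only for illustration.

```python
import cmath
import math

def ilo_transfer(omega, k_inj, period):
    """Exact z-domain transfer functions of Eq. (4.11), with z = exp(s*T/2)."""
    z = cmath.exp(1j * omega * period / 2.0)
    denom = z - (1.0 - k_inj)
    h_inj = k_inj / denom        # phi_inj -> phi_out (low pass)
    h_osc = (z - 1.0) / denom    # phi_osc -> phi_out (high pass)
    return h_inj, h_osc

k_inj = 0.1                      # assumed injection strength
period = 1.0                     # assumed normalized clock period T
omega_pole = 2.0 * k_inj / period

for ratio in (0.01, 0.1, 1.0, 10.0):
    h_inj, h_osc = ilo_transfer(ratio * omega_pole, k_inj, period)
    print(f"omega = {ratio:5.2f}*omega_p: |H_inj| = {abs(h_inj):.3f}, "
          f"|H_osc| = {abs(h_osc):.3f}")
```

The two transfer functions sum to one at every frequency, so a larger $K_i$ trades more tracking of the injected clock for more suppression of the oscillator's own phase noise, as stated above.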

One drawback using ILO is the systematic phase mismatch between injected stage and uninjected stages, in a general condition when input clock frequency and ILO's free running frequency are different. DLL was proposed to eliminate the phase mismatch by aligning the free running frequency with the input clock [34].

#### 4.2 Optical Transmitter and Optical Device Tuning

Optical modulation can be performed either by Direct Modulation or External Modulation [14]. External modulation is preferred in high data-rate application for better optical pulse, in spite of increased complexity. In this section, we will start with optical transmitter with MZM or micro-ring resonator, the most popular external modulation methods, and then focus on micro-ring resonant wavelength stabilization loop implemented inside CMOS chip.

### 4.2.1 Optical Transmitter

MZM shown in Fig. 4.11 is a widely used off-chip external optical modulator, which splits incoming laser into two parts, shifts their phase and then recombines them. Assuming equal optical power splitting, we have the following model equation:



Figure 4.11: MZM modulator structure.

$$E_{out} = \frac{E_{in}}{2} \left( e^{j \frac{\pi V_{mod}}{4V_{\pi}}} + e^{-j \frac{\pi V_{mod}}{4V_{\pi}}} \right) = E_{in} \cos\left(\frac{\pi V_{mod}}{4V_{\pi}}\right).$$
(4.12)

As the electro-optic phase shift is proportional to optical arm length, we need a long physical length or cascaded MZMs [35] to reduce $V_{\pi}$, resulting in a large loading capacitance. Required driving power is determined by this loading capacitance and the electrical swing $V_{mod}$. The transmitter's driving stage and all preceding stages must also be sized proportionally to maintain speed.
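As a small numerical illustration of Eq. (4.12), the sketch below evaluates the field and power transfer of the MZM model exactly as the equation is written. The value of $V_{\pi}$ and the list of drive voltages are hypothetical, and optical power is assumed to be proportional to the squared field magnitude.

```python
import math

def mzm_field_ratio(v_mod, v_pi):
    """E_out / E_in from Eq. (4.12), assuming equal power splitting."""
    return math.cos(math.pi * v_mod / (4.0 * v_pi))

def mzm_power_ratio(v_mod, v_pi):
    """Optical power ratio, assumed proportional to |E_out / E_in|^2."""
    return mzm_field_ratio(v_mod, v_pi) ** 2

v_pi = 4.0  # hypothetical V_pi in volts
for v_mod in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(f"V_mod = {v_mod:3.1f} V -> P_out/P_in = "
          f"{mzm_power_ratio(v_mod, v_pi):.3f}")
```

The output power changes more per volt of drive when $V_{\pi}$ is smaller, which is why a lower $V_{\pi}$ is pursued despite the larger loading capacitance.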

Compared to the MZM, the small-footprint micro-ring resonator shown in Fig. 4.12 is more suitable for on-chip multi-channel optical transceiver implementation. It is made up of a circular optical waveguide coupled to one or more closely placed linear waveguides. Incoming laser light from the In port can be coupled to the ring, and then has constructive or destructive interference in the circle depending on the optical round trip length.

A special case called critical coupling happens when circle's optical length is an integer multiple of laser's wavelength, defined as the micro-ring's resonant wavelength, with complete constructive interference in the circle. At this resonance condition, minimal laser power reaches Thru port, while most laser power is coupled into the circle and gets lost or coupled to Drop port afterward, generating spectra transfer function shown in Fig. 4.13. Optical modulator can be built between the In port and Thru port, using electrical voltage swing applied to the micro-ring to change circle's optical length and so resonant wavelength as shown in Fig. 4.14. Optical transmitter's power is reduced because of lower electrical swing from more efficient electro-optic modulation and lower loading capacitance from compact micro-ring layout.

Besides electro-optic modulation, cascaded micro-ring resonators also provide multiplexing and de-multiplexing operation, offering a complete and efficient WDM solution bridging the optical channel's THz bandwidth and the electrical circuit's GHz running speed. Each micro-ring is properly sized to center at a slightly different nominal resonant wavelength, equivalent to a band-stop comb filter at the Thru port of the transmitter micro-ring and a band-pass comb filter at the Drop port of the receiver micro-ring.

Figure 4.12: Micro-ring resonator structure.

Figure 4.13: Micro-ring resonator's spectra transfer function.

Figure 4.14: Micro-ring resonator acts as external optical modulator.

However, this resonant wavelength is sensitive to micro-ring's fabricated size, material and operating temperature. Fig. 4.15 shows significantly reduced optical swing with unaligned laser and resonant wavelength [8]. Similar argument holds on the receiver side, when incoming laser is filtered out by unaligned band-pass micro-ring or picked by a wrong channel micro-ring. Resonant wavelength stabilization is required both on the transmitter side and receiver side.

## 4.2.2 Micro-ring Resonator Wavelength Stabilization

Fig. 4.16 shows a resonant wavelength stabilization loop on the receiver side, searching for maximal Drop port average power. Even though average power sensing works perfectly for the optical receiver, as optical swing is proportional to average power for a fixed extinction ratio, there are two drawbacks to using the same circuit on the transmitter side.

The first problem comes from the micro-ring's two possible lock points using average power sensing, when the optical modulator transmits data, as shown in Fig. 4.17 [8]. Optical swing is reduced when locked to the lower Q point, with opposite signal polarity. For an algorithm searching for a local maximum, the initial uncompensated resonant wavelength must be on the higher Q side, which is the shorter wavelength side in Fig. 4.17, to avoid this problem.

Figure 4.15: Micro-ring resonator modulation with different incoming laser wavelength.

Figure 4.16: Low-power electrostatic micro-ring resonator wavelength stabilization.



Figure 4.17: Micro-ring's two possible lock points when used as optical modulator [8].

Another less severe problem is the discrepancy between maximal optical swing and average power with varying extinction ratio on the transmitter side. The theoretical maximal optical power swing is achieved when the derivatives of the red and blue curves in Fig. 4.17 are equal, slightly to the left of the maximal average power point. However, the difference is negligible with a Q value around 5000 or higher.

Fig. 4.18 shows the schematic of the 6-bit current SAR ADC, sensing the low-speed average input current. Going through an NMOS current mirror, the input current competes with the pull-up current from a 6-bit DAC controlled by binary search logic, charging or discharging the parasitic capacitor to make a decision. The 6-bit control registers are set to 100000 at the beginning of the 6-comparison cycle, and are updated according to each comparison result during the binary search procedure. After 6 comparisons, the registers' value is sent out as the ADC output code before another 6-comparison cycle starts.
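The binary search described above can be sketched behaviorally as follows. This is a minimal model rather than the circuit itself: the comparator is assumed ideal, the DAC current is assumed to be the trial code times a unit current `i_lsb`, and both names are hypothetical.

```python
def sar_convert(i_in, i_lsb, n_bits=6):
    """Behavioral SAR search: compare the input current with the DAC current
    for each trial code, starting from the MSB (register value 100000)."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)       # tentatively set the current bit
        if i_in >= trial * i_lsb:       # ideal comparator decision (assumed)
            code = trial                # keep the bit
    return code

i_lsb = 1.0  # hypothetical unit DAC current, arbitrary units
for i_in in (0.0, 10.5, 32.0, 100.0):
    code = sar_convert(i_in, i_lsb)
    print(f"I_in = {i_in:6.1f} -> code = {code:2d} ({code:06b})")
```

Each conversion takes exactly six comparisons regardless of the input, and inputs beyond full scale saturate at code 63.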

Figure 4.18: Current-mode 6-bit SAR ADC schematic.

Figure 4.19: Tuning logic diagram.

Table 4.1: Decision algorithm table.

| Current Input     | Action               |
|-------------------|----------------------|
| input < threshold | Flip Previous Output |
| otherwise         | Keep Previous Output |

The ADC's 6-bit output code is accumulated over 64 samples for noise filtering, before the down-sampled 12-bit sum is sent to the tuning logic shown in Fig. 4.19. Comparison between the current and previous results tells whether the input current is increasing or decreasing. The algorithm described in Table 4.1 outputs +/-1 or 0, adjusting the 12-bit output code to reach a local maximum, with a digital threshold parameter determining the logic's sensitivity to a single error. The bandwidth-limiting accumulator provides unity gain at $\omega_{unity\ gain} = 1/T$, for stability with the optical domain pole $\omega_{nd}$, which is usually around a few hundred kHz. Total DC loop gain $A_{DC}$ pushes the whole loop's unity gain frequency to $A_{DC}/T$, which must equal $\omega_{nd}/2$ for a 2nd-order Butterworth response, assuming negligible delay and phase shift from other poles/zeros. The 2nd-order Butterworth closed-loop tracking bandwidth becomes $\omega_{nd}/\sqrt{2}$ with $T = 2A_{DC}/\omega_{nd}$, which is easy to adjust in the lab by choosing the logic running speed. Better phase margin but lower tracking bandwidth is possible with slower logic speed.
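The loop-bandwidth relations above can be verified with a simplified continuous-time model, sketched below. The accumulator is approximated as an ideal integrator $A_{DC}/(sT)$ in series with the optical pole, and the numerical values of `omega_nd` and `a_dc` are hypothetical.

```python
import math

def closed_loop_gain(omega, a_dc, t_update, omega_nd):
    """|L/(1+L)| with L(s) = A_DC/(s*T) * 1/(1 + s/omega_nd).
    The accumulator is approximated as a continuous-time integrator."""
    s = 1j * omega
    loop = (a_dc / (s * t_update)) / (1.0 + s / omega_nd)
    return abs(loop / (1.0 + loop))

omega_nd = 2.0 * math.pi * 300e3   # hypothetical optical-domain pole (rad/s)
a_dc = 4.0                         # hypothetical total DC loop gain
t_update = 2.0 * a_dc / omega_nd   # T = 2*A_DC/omega_nd from the text

omega_unity = a_dc / t_update                    # should equal omega_nd / 2
zeta = 0.5 * math.sqrt(omega_nd / omega_unity)   # damping of s^2 + omega_nd*s + omega_unity*omega_nd
gain_at_bw = closed_loop_gain(omega_nd / math.sqrt(2.0), a_dc, t_update, omega_nd)

print(f"omega_unity / omega_nd = {omega_unity / omega_nd:.3f}")
print(f"damping factor zeta    = {zeta:.4f}")
print(f"|H| at omega_nd/sqrt2  = {gain_at_bw:.4f}")
```

In this model the damping factor is $1/\sqrt{2}$ and the closed-loop gain drops to $1/\sqrt{2}$ at $\omega_{nd}/\sqrt{2}$, consistent with the 2nd-order Butterworth response stated above.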



Figure 4.20: (a) Sigma-Delta modulator and (b) its equivalent z-domain model.

Figure 4.21: Charge-pump amplifier schematic.

Figure 4.22: (a) Simulated normalized average drop port power and (b) 6-bit tuning code of 2 cascaded micro-rings and tuning loops.

The Sigma-Delta modulator shown in Fig. 4.20 is added to convert the 12-bit output code to a 6-bit output code for better linearity and reduced DAC complexity. Even though the lower 6 bits are thrown away, equivalent to adding extra quantization error $Q_n$, negative feedback guarantees an undistorted low-frequency component with the following transfer function,

$$D_{out} = z^{-1}D_{in} + (1 - z^{-1})Q_n, \qquad (4.13)$$

indicating $D_{in}$ is delayed by one period with the low-frequency component of $Q_n$ almost cancelled, while the high-frequency component of $Q_n$ will be filtered out by the optical pole $\omega_{nd}$.
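A behavioral sketch of a first-order modulator that satisfies Eq. (4.13) is given below. It is one possible realization (a delaying integrator followed by truncation of the lower 6 bits), assumed here for illustration and not necessarily the exact logic of Fig. 4.20; inputs are assumed to stay at or below 4032 so the 6-bit code does not overflow.

```python
def sigma_delta(d_in, drop_bits=6):
    """First-order sigma-delta: integrate (input - output), then truncate.
    Returns the 6-bit codes and the quantization error q[n] = y[n] - u[n]."""
    step = 1 << drop_bits
    u = 0                         # integrator state, starts empty
    codes, q_err = [], []
    for x in d_in:
        y = (u // step) * step    # quantizer: throw away the lower bits
        codes.append(y // step)   # 6-bit code sent to the DAC
        q_err.append(y - u)       # quantization error added by truncation
        u = u + x - y             # delaying integrator with negative feedback
    return codes, q_err

d_in = [1000] * 6400              # constant 12-bit input (illustrative)
codes, q_err = sigma_delta(d_in)
average = 64.0 * sum(codes) / len(codes)
print(f"codes used: {sorted(set(codes))}")
print(f"average output in 12-bit units: {average:.2f}")
```

Although each individual code carries only 6 bits, the long-term average of the codes recovers the 12-bit input, which is the low-frequency behavior that Eq. (4.13) describes.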

Fig. 4.21 shows the low-power charge-pump amplifier driving capacitive loading with 1.8V swing. Capacitive feedback path sets DC gain = 5, with zero DC power consumption. DC leakages parallel to these capacitors and loading capacitor are compensated once in a while with refreshing charging current from charge-pump.

Fig. 4.22 shows simulation results of 2 cascaded micro-rings and tuning loops, which are plotted in Fig. 4.16. Starting at a random initial resonant wavelength, the tuning logic of channel 1 runs first, searching for its maximal average drop port power, while channel 2's logic stays idle. After channel 1's target is reached, a 1-bit flag signal is sent to channel 2 to start its tuning procedure. Tuning logic can always run in the background to resist any thermal aggressor, or can be turned on once in a while for power saving. After we turn off the tuning logic at 50 ms in this simulation, only the sigma-delta logic is still running. More micro-rings and tuning loops can be connected in series for higher aggregate data rate, but require well-controlled optical devices to pair the incoming laser with the correct transceiver channels.

## 5. A 12.5 GB/S WIRE-BONDED OPTICAL RECEIVER \*

In this chapter, we will present a low-noise low-power optical receiver fabricated in 28 nm CMOS technology, wire-bonded to a discrete InGaAs PD. Measurement shows improved noise performance with less power consumption, thanks to the inverter-based TIA with multi-stage amplifier and the cascaded CTLE.

## 5.1 Low-noise Front-end



Figure 5.1: Inverter-based optical receiver.

Fig. 5.1 shows the single-ended all-inverter-based optical receiver architecture. A low-bandwidth TIA converts the single-ended PD input current to a single-ended output voltage that is equalized by the CTLE. An RC low pass filter generates a pseudo-differential signal for the four slicers driven by 4-phase quarter-rate clocks. A DC cancellation loop eliminates the low-frequency component of the input current and stabilizes the output common-mode voltage.

<sup>\*</sup>Part of this chapter is reprinted with permission from "P. Yan et al., "A 12.5 Gb/s 1.38 mW Inverter-Based Optical Receiver in 28 nm CMOS," 2022 IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS), Fukuoka, Japan, 2022, pp. 1-4, doi: 10.1109/MWSCAS54063.2022.9859536."

### 5.1.1 Low-bandwidth TIA with Multi-stage Amplifier

Low input capacitance is critical in an optical receiver for optimal noise performance. The inverter-based gain stage is widely used because both PMOS and NMOS provide transconductance, giving less input capacitance and a higher gain-bandwidth product. The best achievable noise performance requires the input-stage inverter to be sized so that its input capacitance equals the total combined input capacitance from the PD and bonding pads [15]. Even though a larger inverter offers less input-referred noise voltage, its larger input capacitance generates higher TIA input-referred noise current. It should never be used, as it results in higher power consumption with more noise. On the other hand, a smaller than optimal inverter size may be preferred in low-power applications, trading more noise for less power consumption.

As discussed in Chapter 3, a multi-stage inverter-based amplifier offers a higher gain-bandwidth product and so reduces input-referred noise from $R_F$, with the same front-end transimpedance gain and bandwidth. A three-stage design is used to provide negative feedback while avoiding too many bandwidth-limiting poles at the stages' output nodes. Second and third stage inverters should be sized according to the speed requirement, as noise from them is suppressed by signal gain. Local feedback resistor $R_{F2}$ is added to reduce loop gain and push the two dominant poles to higher frequency, extending bandwidth for a given phase margin. Through proper choice of the $R_{F2}$ value, the same $63.4^{\circ}$ phase margin can be achieved as in a second-order Butterworth response.

## 5.1.2 Inverter-based CTLE

As we know, a low-bandwidth TIA improves the optical receiver's OMA sensitivity by 15 dB/dec with decreasing $\omega_{TIA}$. On the other hand, NRZ sampling requires an overall bandwidth of 50% of the data-rate or higher to keep eye-height, necessitating equalization to avoid excessive ISI [21]. A high-order DFE [3] adds minimal noise during bandwidth recovery and so suppresses noise from both $R_F$ and the amplifier. Though not as effective, the CTLE, which only suppresses noise from $R_F$, doesn't need a power-hungry fast decision feedback loop and is suitable for low-power applications.



Figure 5.2: Inverter-based CTLE schematic, small signal model, and noise reduction via TIA input stage bandwidth reduction.

A subsequent inverter-based CTLE [36][12] is utilized to further increase the TIA feedback resistance by a factor of n=2.5 in this design, as shown in Fig. 5.2. A low-power additive CTLE is more suitable here, thanks to the relaxed linearity requirement in NRZ signaling. At low frequency (LF), coupling capacitor $C_C$ blocks the bottom path, leaving top-path $g_{m1}$ alone to set the input transconductance, while at high frequency (HF) both $g_{m1}$ and $g_{m2}$ drive the combined loading in both paths. The CTLE utilizes 2-bit control through the EN transistors to achieve the desired frequency response, with a relatively stable peaking gain $n = (g_{m1} + g_{m2})/2g_{m1}$ that is determined by the input stage transistors' size ratio. A $g_{mL}$ loading is also added in the bottom path to limit its local voltage swing for reasonable linearity. The first-order high-pass response produced with $g_{m2} > g_{m1}$ cannot perfectly cancel the second-order slope from the preceding low-bandwidth TIA. Adding series resistors to the gate path of the output load stage solves this problem by realizing an active inductor [37], generating a flatter frequency response. The resulting transfer function is given by

$$\begin{split} H(s) &= -\frac{g_{m1}}{g_{mL}} \frac{1 + s/\omega_z}{1 + s/\omega_p} P(s), \\ \omega_z &= \frac{g_{m1}g_{mL}}{(g_{m1} + g_{m2})C_C}, \quad \omega_p = \frac{g_{mL}}{2C_C} = \frac{g_{m1} + g_{m2}}{2g_{m1}} \omega_z, \\ P(s) &= \frac{1 + s/\omega_{z,ind}}{1 + 2\zeta(s/\omega_n) + (s/\omega_n)^2}, \\ \omega_{z,ind} &= \frac{1}{RC}, \quad \omega_n = \sqrt{\frac{2g_{mL}}{RCC_{Load}}}, \quad \zeta = \frac{\sqrt{2}}{4} \left(\sqrt{\frac{RCg_{mL}}{C_{Load}}} + \sqrt{\frac{C_{Load}}{RCg_{mL}}}\right), \end{split}$$
(5.1)

where C consists of combined NMOS/PMOS gate capacitance and parasitic capacitance. It works with the added R to attenuate LF component, creating an active inductor  $L = RC/g_{mL}$  and a series resistor  $1/g_{mL}$ . The active inductor adds a pole-zero pair at the output node, represented by P(s). P(s) becomes a second-order Butterworth LPF with a  $\sqrt{2}g_{mL}/C_{Load}$  bandwidth and a zero at  $g_{mL}/C_{Load}$ , when  $R = C_{Load}/(Cg_{mL})$ . Lower R provides less peaking gain at higher frequency with roughly the same bandwidth. Proper choice of  $R = 0.7C_{Load}/(Cg_{mL})$  generating peaking gain in the mid-band compensates the discrepancy between the second-order TIA and first-order CTLE. Compared to conventional design that's equivalent to R = 0, active inductor extends the bandwidth by 1.5 times. Proportional reduction in  $g_{m1}$ ,  $g_{m2}$  and  $g_{mL}$  maintaining the same LF/HF gain and bandwidth, reduces the whole CTLE power consumption by 33% as a result. Power spent on TIA's output stage can also be reduced thanks to CTLE's reduced input capacitance.
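The Butterworth condition quoted above follows directly from the expressions in Eq. (5.1), and can be checked numerically with the sketch below. The component values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values of the fabricated design.

```python
import math

def active_inductor_params(r, c, g_ml, c_load):
    """Zero, natural frequency and damping factor of P(s) in Eq. (5.1)."""
    rc = r * c
    omega_z_ind = 1.0 / rc
    omega_n = math.sqrt(2.0 * g_ml / (rc * c_load))
    ratio = rc * g_ml / c_load
    zeta = math.sqrt(2.0) / 4.0 * (math.sqrt(ratio) + 1.0 / math.sqrt(ratio))
    return omega_z_ind, omega_n, zeta

# Hypothetical component values for illustration only.
g_ml = 1e-3        # load transconductance (S)
c_load = 10e-15    # load capacitance (F)
c_gate = 5e-15     # gate plus parasitic capacitance C (F)

r_butterworth = c_load / (c_gate * g_ml)   # R = C_Load / (C * g_mL)
for scale in (1.0, 0.7):
    omega_z_ind, omega_n, zeta = active_inductor_params(
        scale * r_butterworth, c_gate, g_ml, c_load)
    print(f"R = {scale:.1f}*C_Load/(C*g_mL): zeta = {zeta:.4f}, "
          f"omega_n*C_Load/g_mL = {omega_n * c_load / g_ml:.4f}, "
          f"omega_z*C_Load/g_mL = {omega_z_ind * c_load / g_ml:.4f}")
```

With $R = C_{Load}/(Cg_{mL})$ the damping factor equals $1/\sqrt{2}$, the natural frequency equals $\sqrt{2}g_{mL}/C_{Load}$, and the zero sits at $g_{mL}/C_{Load}$, matching the statement in the text.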

An adjustable CTLE power supply is utilized to set the absolute transconductance values to achieve the desired peaking frequency. As TIA and CTLE are both inverter-based and biased near half of the supply for optimal gain, no extra buffer is needed between them. Inverter-based CTLE also makes it possible for less parasitic capacitance and a compact layout. The combined TIA and CTLE layout occupies 65 $\mu m^2$ silicon area.

Fig. 5.3 shows the simulated frequency response, input-referred noise power spectral density with  $C_{in} = 150$  fF, and a 12.5 Gb/s eye-diagram with input OMA = -10.7 dBm. The proposed TIA is intentionally designed with a reduced 2.8 GHz bandwidth that is extended by the subsequent CTLE to 7.0 GHz to support the 12.5 Gb/s data-rate. This allows for an extremely high 82  $dB\Omega$  transimpedance gain without excessive ISI. The higher feedback resistance in the proposed TIA with CTLE yields a 2.0  $pA/\sqrt{Hz}$  reduction in input-referred noise density relative to a conventional broadband TIA with the same power consumption.

Figure 5.3: Simulated frequency response, input-referred noise power spectral density, and 12.5 Gb/s differential eye diagram at the slicers' inputs with  $C_{in} = 150$  fF.

## 5.2 DC Cancellation



Figure 5.4: DC cancellation schematic and simulated front-end frequency response over an extended low-frequency range.  $Z_T$  is the front-end's HF frequency response.

The optical receiver's front-end needs a feedback loop to suppress the DC input current from the PD and provide a proper common-mode voltage for the subsequent slicers. As shown in Fig. 5.4, a high-gain OTA, with a pole created by the 0.68 pF Miller compensation capacitor, is utilized to match the DC current flowing through transistor M0 with  $I_{DC}$  and generate the 2 MHz cut-off frequency. A preceding 100 MHz RC LPF is added to relax the OTA's input dynamic range, since the 12.5 Gb/s large-swing signal from the CTLE is attenuated by 36 dB. This additional pole doesn't impact stability because it lies far beyond the cut-off frequency. The reference voltage is locally generated by a diode-connected PMOS/NMOS pair, which helps to compensate PVT variation.
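
The 36 dB figure can be reproduced with a short calculation. This is a minimal sketch that assumes an ideal single-pole RC filter and evaluates the attenuation at the Nyquist frequency of the 12.5 Gb/s data (half the data rate); the choice of evaluation frequency is an assumption, since the text does not state it.

```python
import math

def rc_lpf_attenuation_db(f_hz, f_cutoff_hz):
    """Attenuation in dB of an ideal first-order RC low-pass filter at f_hz."""
    return 10.0 * math.log10(1.0 + (f_hz / f_cutoff_hz) ** 2)

data_rate_bps = 12.5e9
f_nyquist_hz = data_rate_bps / 2.0   # assumed evaluation frequency
f_lpf_hz = 100e6                     # RC LPF corner from the text
f_loop_hz = 2e6                      # DC cancellation cut-off from the text

att_db = rc_lpf_attenuation_db(f_nyquist_hz, f_lpf_hz)
pole_separation = f_lpf_hz / f_loop_hz

print(f"attenuation at Nyquist: {att_db:.1f} dB")
print(f"LPF pole is {pole_separation:.0f}x above the loop cut-off")
```

The same numbers show why stability is not affected: the added pole sits 50 times above the 2 MHz loop cut-off.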

As discussed in Chapter 3, M0 should use the minimum transistor length and width, just enough to stay in saturation region with the max  $I_{DC}$ . It also helps to minimize any parasitic capacitance added to the TIA's input node, which degrades front-end's noise performance.

## 5.3 Low-voltage Quarter-rate Slicers



Figure 5.5: Schematic of sampling slicer.

The front-end pseudo-differential output signal is sampled by four quarter-rate slicers that are activated by four 90°-spaced clock phases. This quarter-rate operation provides increased slicer regeneration time and allows for powering them with reduced supply voltages. Fig. 5.5 shows the
two-stage slicer circuit [38] that employs minimal device stacking for low-voltage operation. This is followed by a SR latch that holds the sampled data during the slicer reset phase. Optical receiver sensitivity is improved with slicer offset cancellation that is performed with two 5b current DACs that provide programmable discharge currents at the first-stage output nodes during the sampling phase.

## 5.4 Experimental Results



Figure 5.6: (a) Optical receiver layout and chip micrograph and (b) optical test setup.

Fig. 5.6 (a) shows the chip micrograph of the optical receiver, which was fabricated in a 28 nm CMOS technology. The optical receiver is placed directly underneath the pad to reduce parasitic capacitance and occupies 720  $\mu m^2$  total area.



Figure 5.7: (a) Measured BER timing margin curves with OMA = -10.7 dBm and (b) sensitivity curves.

The optical test setup is shown in Fig. 5.6 (b). A 40 Gb/s 0.6 A/W InGaAs PD is wire bonded to the optical receiver input. This results in 150 fF total combined input capacitance from the PD and bonding pads. A 1550 nm laser is connected to a Mach-Zehnder modulator (MZM) that is modulated with 12.5 Gb/s PRBS15 data to produce the optical input signal. A half-rate electrical clock is supplied to the chip and passes through an injection-locked oscillator to generate the four quarter-rate clock phases for the slicers. The quarter-rate data signals are then multiplexed and driven out of the chip with a CML buffer for BER testing.

Fig. 5.7 shows measured timing bathtub and sensitivity curves at 10 Gb/s, 12.5 Gb/s and 14 Gb/s. The 12.5 Gb/s OMA sensitivity at BER =  $10^{-12}$  is -10.7 dBm with a 0.04 UI timing margin. The optical receiver front-end consumes 1.08 mW from a 1 V power supply and the slicers consume 0.30 mW from a 0.7 V power supply, resulting in a 0.11 pJ/bit power efficiency at the 12.5 Gb/s data rate.

| References                                     | [23]                 | [21]          | [3]           | [19]               | [39]            | This work                 |
|------------------------------------------------|----------------------|---------------|---------------|--------------------|-----------------|---------------------------|
| CMOS technology                                | 180nm                | 65nm          | 65nm          | 40nm               | 28nm            | 28nm                      |
| Data rate (Gbps)                               | 10                   | 12            | 12            | 10                 | 20              | 12.5                      |
| Architecture                                   | MSA-TIA <sup>1</sup> | TIA+Duobinary | Diff. TIA+DFE | TIA                | ID <sup>2</sup> | MSA-TIA <sup>1</sup>+CTLE |
| Sampling rate                                  | No Sampling          | 1/4th         | Half          | Half               | 1/4th           | 1/4th                     |
| PD + parasitic cap (fF)                        | >200                 | 160           | 100           | 40-60              | 130             | 150                       |
| PD responsivity (A/W)                          | 1.0                  | 0.8           | 0.75          | 0.7                | 0.5             | 0.6                       |
| Power supply (V)                               | 1.8                  | -             | -             | 1.0                | 0.95            | 1.0/0.7                   |
| Transimpedance (dBΩ)                           | 70.5                 | 70            | 86            | 72                 |                 | 82                        |
| Sens. OMA (dBm), $BER = 10^{-12}$              | -18.7 <sup>3</sup>   | -14.1         | -16.8         | -12 <sup>4</sup>   | -8.6            | -10.7                     |
| Normalized Sens. OMA (dBm), $BER = 10^{-12}$   | -19.0 <sup>5</sup>   | -16.1         | -18.0         | -10.9 <sup>6</sup> | -15.5           | -14.1                     |
| Area ($\mu m^2$)                               | 780,000              | 88,000        | 120,000       | 7,000              | 5,000           | 720                       |
| Power (mW)                                     | 81                   | 9.5           | 23            | 3.95               | 10.6            | 1.38                      |
| Power efficiency (pJ/bit)                      | 8.1                  | 0.79          | 1.9           | 0.40               | 0.53            | 0.11                      |

Table 5.1: Performance summary.

1 Multi-stage amplifier TIA

2 Integrate-and-dump

3 Calculated from input-referred noise current =  $0.97 \ \mu A_{rms}$ 

4 Calculated from avg. sensitivity

5 Assume input cap = 200 fF

6 Assume input cap = 50 fF

Table 5.1 summarizes the receiver performance and compares it with other recent CMOS designs that operate between 10-20 Gb/s. Since input optical signal power is proportional to  $\sqrt{C_{in}\omega_{TIA}^3}/R_{PD}$  for a given SNR, the OMA sensitivity is normalized for a fair comparison between the different design techniques.

$$\text{Normalized OMA Sens.} = \text{OMA Sens.} - 5\log_{10}\left(\frac{C_{in}}{100 \ \text{fF}}\right) - 15\log_{10}\left(\frac{\text{Data rate}}{12 \ \text{Gb/s}}\right) + 10\log_{10}\left(\frac{R_{PD}}{1 \ \text{A/W}}\right)$$
(5.2)
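
The normalization in Eq. (5.2) can be sketched in a few lines. The function name and argument names below are illustrative choices, not part of the original work; the reference values (100 fF, 12 Gb/s, 1 A/W) are those of Eq. (5.2), and the example inputs are the measured values of this design.

```python
import math

def normalized_oma_dbm(oma_dbm, c_in_ff, rate_gbps, r_pd,
                       c_ref_ff=100.0, rate_ref_gbps=12.0, r_ref=1.0):
    """Normalize an OMA sensitivity (dBm) following Eq. (5.2)."""
    return (oma_dbm
            - 5.0 * math.log10(c_in_ff / c_ref_ff)
            - 15.0 * math.log10(rate_gbps / rate_ref_gbps)
            + 10.0 * math.log10(r_pd / r_ref))

# This work: -10.7 dBm, 150 fF, 12.5 Gb/s, 0.6 A/W.
this_work = normalized_oma_dbm(-10.7, 150.0, 12.5, 0.6)
print(f"normalized OMA sensitivity: {this_work:.1f} dBm")
```

The result rounds to the -14.1 dBm entry reported for this work in Table 5.1.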

The proposed design improves upon the normalized OMA sensitivity relative to the conventional inverter-based TIA with a single-stage amplifier [19]. While the integrate-and-dump [39], duobinary-signaling design [21] and pseudo-differential TIA with 4-tap DFE [3] achieve better normalized OMA sensitivity, these designs consume significantly more power on clocked integration stages, extra slicers and logic gates, and fast decision-feedback circuitry, respectively. The best sensitivity is achieved with the multi-stage amplifier TIA [23] due to bandwidth extension with large area on-chip peaking inductors, high power consumption to minimize amplifier noise, and the lack of on-die slicers, which can lead to an optimistic estimate of the receiver sensitivity that is set by the BER tester. Overall, the proposed all-inverter-based optical receiver with multi-stage feedback TIA and continuous-time linear equalizer achieves adequate sensitivity and provides both more than 3.6X improvement in power efficiency and 6.9X improvement in area.

## 5.5 Conclusion

In this chapter, we presented a 12.5 Gb/s all-inverter-based optical receiver with a multi-stage TIA feedback amplifier that is suitable for high-speed low-gain nanometer CMOS technologies. This multi-stage amplifier technique suppresses feedback resistor noise without extra power consumption and is compatible with other noise reduction techniques. Significant power efficiency improvement is achieved with a subsequent inverter-based active inductor CTLE that provides frequency peaking to compensate for ISI from the low-bandwidth TIA. Overall, the all-inverter-based optical receiver achieves ultra-low power and area consumption, making it suitable for the high bandwidth-density optical interconnects required in future systems.

## 6. A 32-CHANNEL 3D-INTEGRATED OPTICAL TRANSCEIVER \*

In this chapter, we present a 32-channel 3D-integrated optical transceiver consisting of a silicon photonic IC (PIC) and our flip-chip bonded EIC chip in 12 nm CMOS technology. Measurements from the fabricated prototype show significant improvement in both sensitivity and power efficiency, compared to recently published optical transceivers running at similar data-rates.

## 6.1 Optical Transceiver Architecture

3D-integrated silicon photonics is a promising solution to ever-increasing data-rates, offering better power efficiency and data-rate density when copper can no longer support them. As we discussed earlier, less parasitic capacitance reduces the optical transmitter's loading and improves the optical receiver's noise performance with lower power consumption, which is favorable for energy-efficient optical transceivers.



Figure 6.1: PIC-EIC 3D-integration approach [9].

<sup>\*</sup>Part of this chapter is reprinted with permission from "A. Samanta et al., "A Direct Bond Interconnect 3D Co-Integrated Silicon-Photonic Transceiver in 12nm FinFET with -20.3dBm OMA Sensitivity and 691fJ/bit," 2023 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 2023, pp. 1-3, doi: 10.1364/OFC.2023.M3I.4."

In this work, we co-design a CMOS optical transceiver to interface with silicon photonics using the hybrid bonding approach shown in Fig. 6.1. The 5.5 mm x 7.5 mm PIC provides a micro-ring modulator with a tuning heater, an add-drop filter and a SiGe photodetector, and also acts as an active interposer between the 1.5 mm x 1.5 mm EIC and the outside PCB. Independent performance optimization and design flexibility are possible for separate PIC and EIC dies.

## 6.2 Power-efficient Optical Transmitter



Figure 6.2: Optical transmitter block diagram [10].

The proposed 32-channel transmitter architecture is given in Fig. 6.2, including 31 data channels and 1 forwarded clock channel. All 32 transmitter channels share the same design and layout to maximize their correlation for jitter cancellation, while one of them is configured to output 1100 fixed pattern data as a quarter-rate forwarded clock.

A per-channel ILO generates a 4-phase clock, based on a differential reference clock passing through the clock distribution network. Clock quadrature error correction (QEC) and duty cycle correction (DCC) are performed before the clocks reach each transmitter channel. 8 parallel bit-streams from either an off-chip FPGA or the on-chip PRBS15 generator are first serialized into 4 quarter-rate bit-streams by the 8:4 serializer and are then combined into a single full-rate bit-stream by the 4:1 Mux. Properly sized inverter-based output stages are ac-coupled to drive 5.5 fF capacitance from the bonding pad and 35 fF capacitance from the micro-ring modulator with a 1.2  $V_{pp}$  differential electrical swing. Two 1.5 pF coupling capacitors are used in each channel to minimize the resulting capacitive division, while 21.3 k$\Omega$ resistors bias the micro-ring in the depletion region and set the cut-off frequency around 5 MHz to avoid excessive DC wander. An integrated heating resistor inside each micro-ring stabilizes its resonant wavelength, with tuning circuitry in the EIC, on both the transmitter and receiver sides.
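
The ac-coupling numbers above can be cross-checked with a short sketch. It assumes the high-pass corner is set by one 21.3 k$\Omega$ bias resistor and one 1.5 pF coupling capacitor, and that the swing loss is a simple capacitive divider between the coupling capacitor and the pad plus modulator capacitance; both are simplifying assumptions about the exact network, and the helper names are illustrative.

```python
import math

def rc_corner_hz(r_ohm, c_farad):
    """Corner frequency of a first-order RC network."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def divider_ratio(c_couple, c_load):
    """Fraction of the driver swing that reaches the load through c_couple."""
    return c_couple / (c_couple + c_load)

r_bias = 21.3e3              # bias resistor from the text
c_couple = 1.5e-12           # coupling capacitor from the text
c_load = 5.5e-15 + 35e-15    # bonding pad + micro-ring modulator

f_c = rc_corner_hz(r_bias, c_couple)
ratio = divider_ratio(c_couple, c_load)

print(f"high-pass corner: {f_c / 1e6:.2f} MHz")
print(f"swing retained after capacitive division: {100 * ratio:.1f} %")
```

Under these assumptions the corner lands close to the quoted 5 MHz, and the large coupling capacitor keeps the swing loss to a few percent.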

## 6.3 Ultra sensitive Optical Receiver



Figure 6.3: Optical receiver block diagram [10].

Fig. 6.3 shows the 32-channel receiver architecture. Multi-wavelength laser light travels along a shared optical fiber before each wavelength is picked off by one of the cascaded receiver micro-rings and then converted to an asymmetric current by the SiGe photodetector. The DC component of the input current is removed by the cancellation loop, generating a symmetric current swing that is converted to a proper voltage swing by the low-noise single-ended front-end. Lower front-end bandwidth is preferred in the clock channel for less jitter, compared to around 0.5 data-rate front-end bandwidth in the data channel for less ISI. Quarter-rate data bit-streams generated by 4 parallel slicers are further de-serialized into 1/8th-rate data streams to make it easier to communicate with the outside world, along the long on-chip transmission line in the interposing PIC.



Figure 6.4: DOE optical link budget.

Even though cascading a CTLE always reduces the white noise density from the TIA's feedback resistor  $R_F$ , it is not always worth the extra power, especially when the white noise density has already been reduced by our multi-stage amplifier TIA and the extremely small input capacitance of 3D-integrated silicon photonics. The CTLE's net benefit also depends on the optical channel loss, as indicated in Fig. 6.4. Assuming 30% wall-plug efficiency, the laser source power is 2.21 mW here. Even though a CTLE can provide another 0.2 dB improvement over the original -19.0 dBm OMA sensitivity, this translates to only 0.1 mW of laser source power saving while requiring 0.30 mW for the CTLE itself, so the optical transceiver's overall power efficiency becomes worse. Generally speaking, a CTLE only pays off when the laser source power is large, due to either poor OMA sensitivity or high optical channel loss, making it less attractive as optical devices and circuit design improve.
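
The trade-off above can be made explicit with a minimal sketch. It assumes the laser source power scales linearly with the required optical power (fixed wall-plug efficiency and link loss), so a sensitivity gain in dB maps directly to a fractional power saving; the function name is an illustrative choice.

```python
def laser_power_saving_mw(laser_power_mw, sensitivity_gain_db):
    """Laser source power saved when the required OMA drops by the given dB."""
    return laser_power_mw * (1.0 - 10.0 ** (-sensitivity_gain_db / 10.0))

laser_power_mw = 2.21      # laser source power from the text
ctle_gain_db = 0.2         # extra sensitivity offered by a CTLE
ctle_power_mw = 0.30       # power spent on the CTLE

saving_mw = laser_power_saving_mw(laser_power_mw, ctle_gain_db)
net_mw = saving_mw - ctle_power_mw   # negative means the CTLE costs more than it saves

print(f"laser saving: {saving_mw:.2f} mW, net change: {net_mw:+.2f} mW")
```

With the numbers from the text, the saving is about 0.1 mW against 0.30 mW spent, so the net result is a loss of roughly 0.2 mW per channel.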



Figure 6.5: Variable bandwidth TIA with multi-stage feedback amplifier and broadband buffer.

Fig. 6.5 shows our modified optical receiver front-end to accommodate a CTLE-less design. The cascaded CTLE is replaced by a broadband buffer for reduced power, driving the same loading capacitance. Diode-connected inverter-based loading is added at the input stage's output node to reduce its DC gain and push its pole to a higher frequency. As shown in Fig. 6.6, higher loop gain and a slower first non-dominant pole generate higher TIA closed-loop bandwidth with worse phase margin, and vice versa. The 3-bit controlled inverter-based loading helps to achieve the same optimal front-end bandwidth, compensating for varying PD capacitance, on-chip  $R_F$  value and TIA loop gain due to PVT variation. In the typical case, 16.7 GHz bandwidth is achieved with negligible ac overshoot when gain setting = 2, generating the eye-diagram shown in Fig. 6.7.

Fig. 6.8 shows how the gain setting compensates for varying input capacitance  $C_{in}$  to achieve a similar ac response and eye-diagram. If  $C_{in}$  is higher than expected, a higher gain setting is needed for bandwidth extension, while a lower  $C_{in}$  offered by a further improved PD or 3D integration can be accommodated by a lower gain setting. Simulated input-referred noise density is plotted in Fig. 6.9, indicating the same white noise from  $R_F$  and varying noise from  $C_T$ . Varying  $R_F$  value and TIA loop gain can be compensated in the same way.

Figure 6.6: Simulated front-end frequency response with  $C_{in}$  = 14 fF.

Figure 6.7: Simulated 25 Gb/s eye-diagram with gain setting = 2 and  $C_{in}$  = 14 fF.

Figure 6.8: Simulated front-end frequency response with varying  $C_{in}$ .

Figure 6.9: Simulated input-referred noise PSD with varying  $C_{in}$ .

Figure 6.10: Simulated frequency response of conventional TIA and our front-end.

Figure 6.11: Simulated input-referred noise PSD of conventional TIA and our front-end.

Fig. 6.10 and Fig. 6.11 show the frequency response and input-referred noise PSD of our front-end against a conventional TIA using a single inverter as its amplifier. As we discussed in the previous chapter, the multi-stage amplifier's higher gain-bandwidth product allows for a 2.50 dB OMA sensitivity improvement at the same bandwidth without excessive ac overshoot.

## 6.4 Experimental Results

This EIC chip has been fabricated in 12 nm CMOS technology. Each optical transceiver channel is 20 $\mu m$ high as shown in Fig. 6.12, compatible with the optical device spacing in the PIC.



Figure 6.12: Layout of (a) transmitter channel and (b) receiver channel.



Figure 6.13: (a) Measured BER timing margin curves with OMA = -17.0 dBm and (b) sensitivity curves.

Fig. 6.13 shows measured timing bathtub and sensitivity curves at 25 Gb/s and 26 Gb/s. The 25 Gb/s OMA sensitivity at BER =  $10^{-12}$  is -17.0 dBm with a 0.12 UI timing margin. Table 6.1 compares measured power consumption against simulation results. The optical receiver front-end consumes 1.18 mW from a 0.9 V power supply. The slicers and de-serializer consume 0.94 mW, while the per-channel ILO consumes 2.66 mW from a 0.83 V power supply, resulting in 84.8 fJ/bit power efficiency excluding the ILO and 191.2 fJ/bit power efficiency including the ILO. The front-end consumes less power than simulated despite its higher supply voltage, mainly due to the IR drop along the power grid in the PIC. A higher supply voltage is applied to the per-channel ILO to reduce its jitter, since more jitter amplification occurs along the on-chip transmission line in the PIC; this slightly degrades power efficiency.

Table 6.1: Optical receiver power consumption summary.

|                                  | Target/Simulation | Measurement      |  |
|----------------------------------|-------------------|------------------|--|
| Front-end                        | 1.24 mW @ 0.85 V  | 1.18 mW @ 0.9 V  |  |
| 4 slicers + 4-to-8 de-serializer | 0.83 mW @ 0.7 V   | 0.94 mW @ 0.83 V |  |
| per-channel ILO                  | 2.37 mW @ 0.7 V   | 2.66 mW @ 0.83 V |  |
| Power efficiency excluding ILO   | 82.8 fJ/bit       | 84.8 fJ/bit      |  |
| Power efficiency including ILO   | 177.6 fJ/bit      | 191.2 fJ/bit     |  |
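
The power-efficiency rows of Table 6.1 follow directly from the block powers and the 25 Gb/s data rate. The sketch below reproduces them; the function name is an illustrative choice.

```python
def energy_per_bit_fj(powers_mw, data_rate_gbps):
    """Energy per bit in fJ: total power (mW) / data rate (Gb/s) gives pJ/bit."""
    return 1000.0 * sum(powers_mw) / data_rate_gbps

RATE_GBPS = 25.0

# (front-end, slicers + de-serializer, per-channel ILO) in mW, from Table 6.1.
target = (1.24, 0.83, 2.37)
measured = (1.18, 0.94, 2.66)

for name, (fe, slicers, ilo) in (("target", target), ("measured", measured)):
    excl = energy_per_bit_fj([fe, slicers], RATE_GBPS)
    incl = energy_per_bit_fj([fe, slicers, ilo], RATE_GBPS)
    print(f"{name}: {excl:.1f} fJ/bit excluding ILO, {incl:.1f} fJ/bit including ILO")
```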

Table 6.2 compares our optical receiver with other recently published designs running at similar data-rates. Even with a non-ideal PIC, our design achieves the best OMA sensitivity while consuming the least power, due to lower  $C_{in}$  and better TIA design. Normalized OMA sensitivity is presented to show the TIA's effect alone, assuming all designs work with the same  $R_{PD}$ ,  $C_{in}$  and data-rate. Our design is second only to the one with DFE, with 18.75X less power consumption.

| References                                              | [40]       | [41]           | [2]       | [22]                     | [42]           | This work            |
|---------------------------------------------------------|------------|----------------|-----------|--------------------------|----------------|----------------------|
| Technology                                              | 14nm       | 55nm<br>BiCMOS | 65nm      | 28nm                     | 28nm           | 12nm                 |
| Data rate (Gbps)                                        | 25         | 26             | 25        | 25                       | 25             | 25                   |
| Architecture                                            | TIA+DFE    | FD-SF TIA      | TIA+EQ    | Integrating              | APD+<br>TIA+EQ | MSA-TIA <sup>1</sup> |
| PD + parasitic cap<br>Cin (fF)                          | 69         | 20             | 90        | < 33                     | 55             | 14                   |
| PD responsivity<br>$R_{PD}$ (A/W)                       | 0.52       | 1.0            | 0.45      | 0.8                      | 4.0            | 0.8                  |
| Sens. OMA (dBm),<br>BER = $10^{-12}$                    | -13.8      | -15.2          | -8.0      | -14.9                    | -16.0          | -17.0                |
| Normalized<br>Sens. OMA $(dBm)^2$ ,<br>BER = $10^{-12}$ | -15.8      | -12.0          | -11.2     | -13.5                    | -8.7           | -13.7                |
| Power (mW)                                              | 39.8       | 45             | 17        | 4.25                     | 34.2           | 2.12                 |
| Power efficiency                                        | 1.59pJ/bit | 1.73pJ/bit     | 680fJ/bit | 170fJ/bit<br>w/o slicers | 1.37pJ/bit     | 84.8fJ/bit           |

Table 6.2: Optical receiver performance summary.

1 Multi-stage amplifier TIA

2 Normalized to R<sub>PD</sub>=1A/W, Cin=100fF and Data-rate=25Gbps

## 6.5 Conclusion

This chapter presented a 32-channel optical transceiver fabricated in 12 nm CMOS technology. 3D-integrated and co-designed photonics and electronics allow for significantly reduced parasitic capacitance, improving the transmitter's power efficiency due to less required electrical swing and lower loading capacitance. The receiver's sensitivity and power consumption are both improved by the dramatic  $C_{in}$  reduction. Besides that, we propose a variable bandwidth TIA with a multi-stage amplifier to convert the faster technology's speed advantage into better noise performance. Table 6.3 summarizes the noise reduction techniques we used to improve the optical transceiver's overall power efficiency.

Table 6.3: Noise reduction techniques summary.

|                         | Our Solution                   |
|-------------------------|--------------------------------|
| $R_F$ Noise Density     | TIA with Multi-stage Amplifier |
|                         | Co-designed PD with less Cin   |
| Amplifier Noise Density | Co-designed PD with less Cin   |
| Noise Bandwidth         | Bandwidth Control in TIA       |
| Power and Area          | Improvement in Both            |

## 7. A 20-CHANNEL 3D-INTEGRATED OPTICAL TRANSCEIVER

In this chapter, we present a 20-channel 3D-integrated WDM optical transceiver, including 19 data channels and 1 forwarded clock channel. Several improvements are proposed to achieve sub-200 fJ/bit total optical transceiver power efficiency.

## 7.1 3D-integration Scheme

Even though 3D integration provided significant improvement in the previous project, the on-chip transmission lines and power grid through the PIC active interposer degraded performance. In this project, we reverse the bonding arrangement as shown in Fig. 7.1. The CMOS chip communicates with the PCB directly, with a better power grid, and acts as an interposer for a smaller PIC chip.



Figure 7.1: 3D-integration bonding scheme.

## 7.2 Power-efficient Low-swing Optical Transmitter

The updated transmitter architecture is plotted in Fig. 7.2. Mos-cap modulator with better electro-optic modulation requires 0.5  $V_{pp}$  electrical swing instead of the 1.2  $V_{pp}$  swing in the previous project. A low-swing driving stage powered by a 0.25 V supply is used here, with further power saving from the 4-to-1 serializer with TIA loading and the level-shifting pre-driver. The mos-cap modulator's differential input capacitance is also reduced from 20 fF to 15 fF, permitting proportionally reduced transmitter size and power.

Figure 7.2: Power-efficient low-swing optical transmitter.

Figure 7.3: (a) 4 to 1 serializer and (b) its timing diagram.

Fig. 7.3 shows the 4 to 1 serializer's schematic and its timing diagram. Time-interleaved 1UI pulses P<1:4>/N<1:4>, generated by AND/OR gates from adjacent quarter-rate clock phases, drive four tri-state logic cells connected in parallel to the same TIA loading. At any time, only one of the quarter-rate data D<1:4> is sampled by its 1UI pulse to pull the TIA loading up or down, while the other three cells remain in the high-impedance off state. Clock pulses CK<1:4> must have a proper phase and duty cycle to generate the correct full-rate OUT/OUTB. The simulated serializer output waveforms, with and without the bandwidth-extending TIA loading, are plotted in Fig. 7.4. The eye-diagram is significantly improved without spending unnecessarily high power on the serializer, at the cost of a slightly reduced serializer output swing, which is not a problem for the pre-driver powered by a 0.65 V supply.
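
The time interleaving described above can be illustrated with a purely behavioral sketch. It models only the bit ordering, not the analog circuit, and assumes lane D<1> is sampled first within each quarter-rate clock period; the function name and the lane ordering are assumptions made for illustration.

```python
def serialize_4_to_1(d1, d2, d3, d4):
    """Time-interleave four quarter-rate bit streams into one full-rate stream."""
    lanes = (d1, d2, d3, d4)
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("all quarter-rate lanes must have equal length")
    out = []
    for bits in zip(*lanes):      # one quarter-rate clock period spans 4 UI
        for lane_bit in bits:     # exactly one lane drives the output per UI
            out.append(lane_bit)
    return out

full_rate = serialize_4_to_1([1, 0], [1, 1], [0, 0], [0, 1])
print(full_rate)

# Constant lanes 1, 1, 0, 0 yield a repeating 1100 pattern (period of 4 UI).
clock_like = serialize_4_to_1([1, 1], [1, 1], [0, 0], [0, 0])
print(clock_like)
```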

Figure 7.4: Simulated eye-diagrams at 4 to 1 serializer's output node, 650mV DVFS supply.

Even though the 4 to 1 serializer can directly drive the all-NMOS low-swing driving stage with 4 fF single-ended input capacitance, a level-shifting pre-driver with 2 fF single-ended input capacitance is inserted between them for better power efficiency. Fig. 7.5 shows how the pre-driver improves the eye-diagram while driving 2 times higher loading capacitance  $C_L$ . When IN = VDD,  $C_C$  and  $C_L$  are both charged to  $V_L = 100$  mV first to properly turn off the NMOS in the subsequent stage with 180 mV threshold voltage. When IN falls to VSS, MP/MN are turned off while  $C_C$ 's bottom plate rises to VDD. Charge sharing between  $C_C$  and  $C_L$  pushes the pre-driver's output voltage to  $V_H = V_L + \frac{C_C}{C_C + C_L} VDD$ . Even though larger  $C_C$  is preferred for higher  $V_H$ , leakage through MP makes it less efficient when  $V_H > VDD$ . When IN rises to VDD again, another charge sharing between  $C_C$  and  $C_L$  pushes the output node back to  $V_L$ . As shown in Fig. 7.5 (c)/(d), the slightly reduced slew rate at the pre-driver's output node generates a better eye-diagram thanks to the reduced swing, which is not a problem when driving the NMOS in the subsequent stage. The 4 to 1 serializer's size and power are reduced by a factor of 2 as a result.
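
The charge-sharing expression for $V_H$ can be evaluated with a short sketch. The 100 mV low level, the 180 mV threshold, the 0.65 V supply and the 4 fF load come from the text; the coupling capacitor value is hypothetical and only illustrates the trend, and the function name is an illustrative choice.

```python
def predriver_high_level(v_l, vdd, c_c, c_l):
    """Output high level after charge sharing: V_H = V_L + C_C / (C_C + C_L) * VDD."""
    return v_l + vdd * c_c / (c_c + c_l)

V_L = 0.100      # low level from the text (V)
VDD = 0.65       # pre-driver supply from the text (V)
V_TH = 0.180     # NMOS threshold of the following stage, from the text (V)
C_L = 4e-15      # single-ended input capacitance of the driving stage (F)
C_C = 8e-15      # hypothetical coupling capacitor, for illustration only (F)

v_h = predriver_high_level(V_L, VDD, C_C, C_L)
print(f"V_H = {v_h * 1e3:.0f} mV (NMOS threshold {V_TH * 1e3:.0f} mV)")
```

Increasing `C_C` relative to `C_L` raises $V_H$ toward $V_L + VDD$, which is the trade-off against MP leakage noted in the text.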

DVFS supply is used to power preceding stages to balance ISI and power consumption with PVT variations, with simulated eye-diagrams in the typical case shown in Fig. 7.6. Even though there's not much difference between 650 mV and higher supply while optical transmitter's standalone eye-diagram seems acceptable, the full link's bandwidth is limited by the low-swing transmitter as a cascading stage. This problem is fixed by a CTLE on the receiver side, decoupling the required bandwidth at the mos-cap modulator's input node and its 20 fF loading capacitance.

## 7.3 Ultra Sensitive Optical Receiver

Updated receiver architecture is shown in Fig. 7.7 for better sensitivity and less power. Several techniques used in it will be discussed in this section.



Figure 7.5: Pre-driver (a) charges/(b) discharges  $C_C$ , and its simulated single-ended (c) input and (d) output eye-diagram.

### 7.3.1 TIA with Multi-stage Amplifier

Due to the higher intrinsic gain in 22 nm CMOS technology, diode-connected loading is added in the TIA's multi-stage amplifier to trade reduced gain for a faster pole. The amplifier's gain, now determined by the transistors' size ratio, becomes less sensitive to PVT variation. The inverter-based current-mode additive CTLE discussed in Chapter 3 is added here for bandwidth extension and bandwidth control.

Figure 7.6: Simulated eye-diagrams at mos-cap modulator's input node with (a) 600 mV, (b) 650 mV, (c) 700 mV DVFS supply and (d) normalized optical eye-diagram with 650 mV DVFS supply.

One straightforward improvement comes from a PD with further reduced input capacitance. As shown in Fig. 7.8, simulated OMA sensitivity is improved by 4.5 dB when  $C_{in}$  is 5 fF instead of 14 fF ( $C_T = 2C_{in}$ ). Fig. 7.9 shows the simulated front-end bandwidth of 13.4 GHz, close to the optimal 0.5 data-rate bandwidth when running at 26.32 Gb/s, and the eye-diagram. With the low noise density offered by the TIA with a multi-stage amplifier, the simulated OMA sensitivity is -21.5 dBm.

### 7.3.2 Inverter-based Current-mode Additive CTLE with Active Inductor

Figure 7.7: Ultra sensitive optical receiver.

Equalization is widely used for ISI cancellation and bandwidth recovery [21], but usually doesn't affect noise performance much in a general receiver. However, it proves to be an effective way to suppress noise from  $R_F$  in the preceding TIA using either CTLE [13] or DFE [3]. Both allow for an intentionally designed low-bandwidth TIA with less noise density from  $R_F$ , and neither affects amplifier noise density. The main difference is that DFE doesn't add high-frequency noise back during bandwidth recovery as CTLE does, providing significantly reduced noise bandwidth and better noise performance but also requiring huge power consumption for fast decision feedback. We will focus on the CTLE method in this dissertation due to power efficiency concerns.

The voltage-mode inverter-based CTLE [36] shown in Fig. 7.10 (a) was proposed for low-supply operation and current reuse. High-frequency peaking can be generated in either an additive or a subtractive way [12], with less power spent in the additive CTLE but worse linearity and more parasitic capacitance associated with the coupling capacitor  $C_C$ . Though worse linearity due to higher swing is not a serious problem in NRZ signaling,  $C_C$  and its parasitic capacitance directly at the output node reduce the high-frequency peaking gain, requiring more power to achieve the overall front-end bandwidth with  $C_L$ .

Figure 7.8: Simulated OMA sensitivity with different total RX input capacitance.

We propose a CTLE that carries out signal addition in the current domain, decoupling  $C_C$  from  $C_L$  and improving linearity as well. An inverter-based active inductor, implemented by a resistor R and the input capacitance C of an inverter as shown in Fig. 7.11, is added for bandwidth extension and power saving. Even though an active inductor can also be used in the voltage-mode additive CTLE, the complex frequency response related to  $C_C$ ,  $C_L$  and parasitic capacitance makes it less effective. The CTLE frequency response becomes

$$\begin{aligned}
H(s) &= \left[g_{m1} \frac{1 + sC_C/g_{m3}}{1 + sC_C(1/g_{m3} + 1/g_{ms})} + g_{m2} \frac{sC_C/g_{m3}}{1 + sC_C(1/g_{m3} + 1/g_{ms})}\right] Z(s) \\
&\approx \left[g_{m1} + g_{m2} \frac{sC_C/g_{m3}}{1 + sC_C/g_{m3}}\right] Z(s), \quad \text{when } g_{ms} \gg g_{m3}, \\
Z(s) &= \frac{1}{g_{mL}} \frac{1 + s/\omega_z}{1 + 2\zeta(s/\omega_n) + (s/\omega_n)^2}, \\
\omega_z &= \frac{1}{RC}, \quad \omega_n = \sqrt{\frac{g_{mL}}{RCC_L}}, \quad \zeta = \frac{1}{2}\sqrt{\frac{C_L}{Cg_{mL}R}}.
\end{aligned}$$
(7.1)

Fig. 7.12 shows that a proper choice of R can extend the bandwidth by 1.76 times without ac overshoot.



Figure 7.9: Simulated front-end frequency response and eye-diagram.

### 7.3.3 Noise Reduction in DC Cancellation Loop

DC cancellation loop shown in Fig. 7.13 suppresses the DC component of the input current  $I_{in,DC}$  by the loop gain, generating the following frequency response

$$Z(s) = \frac{V_{out}}{I_{in}} = \frac{1 + s/\omega_p}{1 + g_m Z_{HF}(s)A + s/\omega_p} Z_{HF}(s).$$
(7.2)

Figure 7.10: (a) Voltage-mode additive CTLE and (b) current-mode additive CTLE with active inductor.

Figure 7.11: Inverter-based active inductor and its small signal model.

To have reasonable dynamic range,  $M_0$ 's size must be large enough to support the maximum  $I_{in,DC}$  determined by

$$I_{in,DC} = \frac{(ER+1)R_{PD}P_{in,OMA}}{2(ER-1)},$$
(7.3)

where  $R_{PD}$  is the PD's responsivity, ER and  $P_{in,OMA}$  are the input laser's extinction ratio and OMA power, respectively.
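
Eq. (7.3) is straightforward to evaluate numerically. The sketch below is a minimal illustration in which the extinction ratio is taken as a linear power ratio; the example OMA, responsivity and extinction ratio are hypothetical inputs, not values reported for this design.

```python
def dc_input_current(p_oma_w, r_pd, er):
    """Average photocurrent from Eq. (7.3); er is a linear power ratio (> 1)."""
    if er <= 1.0:
        raise ValueError("extinction ratio must be greater than 1")
    return (er + 1.0) * r_pd * p_oma_w / (2.0 * (er - 1.0))

def dbm_to_w(p_dbm):
    """Convert a power in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

# Hypothetical operating point for illustration only.
p_oma_w = dbm_to_w(-10.0)   # 100 uW OMA
r_pd = 0.8                  # A/W
er = 4.0                    # about 6 dB

i_dc = dc_input_current(p_oma_w, r_pd, er)
print(f"I_in,DC = {i_dc * 1e6:.1f} uA")
```

As the extinction ratio grows, the result approaches $R_{PD}P_{in,OMA}/2$, while a poor extinction ratio inflates the DC current that $M_0$ must sink.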

Since noise from the DC feedback loop is filtered out by its low bandwidth,  $M_0$  becomes the main noise contributor here, with the following noise current directly added at the input node,

$$\overline{I_{n,M0}^2(\omega)} = 4kT\gamma g_m = 4kT\gamma \sqrt{2\mu C_{ox}(W/L)I_{in,DC}}.$$
(7.4)

 $M_0$  should use the minimum transistor length and transistor width just enough to stay in saturation region with maximum  $I_{in,DC}$ . It also helps to minimize any parasitic capacitance added to



Figure 7.12: Active inductor frequency response.



Figure 7.13: DC cancellation loop diagram.

the TIA's input node, which degrades the front-end's noise performance as discussed previously.

This design method worked well previously, but may no longer be sufficient once noise from other sources is reduced in an ultra-sensitive optical receiver. We added a source degeneration resistor $R_S$ with a shorting switch for further improvement, as shown in Fig. 7.14.

When the shorting switch is open, the total noise current in the DC cancellation loop becomes



Figure 7.14: Source degeneration resistor with shorting switch and its noise model.

$$\begin{aligned}
\overline{I_{n,total}^{2}(\omega)} &= \left(\frac{g_{m}R_{S}}{1+g_{m}R_{S}}\right)^{2}\frac{4kT}{R_{S}} + \left(\frac{1}{1+g_{m}R_{S}}\right)^{2}\overline{I_{n,M0}^{2}(\omega)} \\
&\approx \left(\frac{g_{m}}{1+g_{m}R_{S}}\right)^{2}4kTR_{S}, \quad \text{when } g_{m}R_{S} \gg 1,
\end{aligned}$$
(7.5)

and always drops with increasing $R_S$. However, $M_0$ may not stay in the saturation region due to the IR drop on $R_S$ with a large input current. This problem is easily fixed by closing the shorting switch, because noise performance no longer matters with a large input signal and a reasonable laser extinction ratio.
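The effect of $R_S$ in Eq. (7.4) and Eq. (7.5) can be checked numerically with the sketch below. The transconductance, the noise factor $\gamma = 1$ and the 300 K temperature are assumed values chosen only for illustration.

```python
K_BOLTZMANN = 1.380649e-23  # J/K

def noise_m0(g_m, gamma=1.0, temp_k=300.0):
    """Eq. (7.4): channel thermal noise PSD of M0 in A^2/Hz."""
    return 4 * K_BOLTZMANN * temp_k * gamma * g_m

def noise_degenerated(g_m, r_s, gamma=1.0, temp_k=300.0):
    """Eq. (7.5) before the approximation: R_S noise plus attenuated M0 noise."""
    if r_s == 0:
        return noise_m0(g_m, gamma, temp_k)   # shorting switch closed
    loop = 1 + g_m * r_s
    from_rs = (g_m * r_s / loop) ** 2 * 4 * K_BOLTZMANN * temp_k / r_s
    from_m0 = noise_m0(g_m, gamma, temp_k) / loop ** 2
    return from_rs + from_m0

G_M = 0.5e-3                        # S, hypothetical M0 transconductance
R_S_VALUES = (0, 1e3, 5e3, 20e3)    # ohms, hypothetical sweep
psd = [noise_degenerated(G_M, r) for r in R_S_VALUES]
```

With $\gamma = 1$ the total reduces to $4kTg_m/(1+g_mR_S)$, so each step up in $R_S$ lowers the noise current injected at the input node, approaching $4kT/R_S$ once $g_mR_S \gg 1$.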

#### 7.3.4 Low-voltage Slicer with Hybrid Offset Cancellation

The two-stage Schinkel slicer circuit [38] is widely used in high-speed SERDES because its minimal device stacking, compared to the StrongARM version [43][44], allows for low-voltage operation. Depending on the differential input, one of the input-stage sampler's two output nodes rises high during the sampling phase and flips the subsequent SR latch if needed. The decision stored in the SR latch remains unchanged during the reset phase, when both output nodes of the input-stage sampler fall to ground.

Offsets due to random mismatch are largely suppressed by conversion gain when referred to the



Figure 7.15: Slicer with hybrid offset cancellation.

input, except the one related to the input differential pair. To avoid an excessively large input pair, which impacts both its power consumption and the front-end's loading, current-mode offset cancellation is usually included for better sensitivity as shown in Fig. 7.15, with a local current mirror isolating unnecessary loading capacitance. However, the added DC current still requires a minimum differential pair size to cover the $\pm 3\sigma$ worst-case scenario in Monte Carlo simulation. We propose a hybrid offset cancellation to alleviate this problem, including both capacitive and current-mode mechanisms. Coarse offset cancellation is implemented using switch-controlled parasitic capacitors, while a DAC-controlled programmable DC current fine-tunes it to the required residual offset.
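The coarse/fine split can be illustrated with a toy code search. This is only a sketch: the step sizes, code ranges and function names below are hypothetical placeholders, and the real calibration would observe slicer decisions instead of knowing the offset directly.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))

def hybrid_offset_codes(offset_mv, coarse_step_mv=8.0, coarse_max=4,
                        fine_step_mv=1.0, fine_max=8):
    """Pick a coarse (capacitive) code first, then a fine (current-DAC) code.

    All step sizes and code ranges are hypothetical placeholders.
    Returns (coarse_code, fine_code, residual_offset_mv).
    """
    coarse = clamp(round(-offset_mv / coarse_step_mv), -coarse_max, coarse_max)
    after_coarse = offset_mv + coarse * coarse_step_mv
    fine = clamp(round(-after_coarse / fine_step_mv), -fine_max, fine_max)
    residual = after_coarse + fine * fine_step_mv
    return coarse, fine, residual

coarse_code, fine_code, residual_mv = hybrid_offset_codes(27.3)
```

In this toy model the fine current DAC only has to span about half a coarse step, which is the reason a hybrid scheme can relax the range required from the current-mode path.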

## 7.3.5 Wire-bonded Optical Receiver Front-end Test Structure and Experimental Results

Fig. 7.16 shows the optical receiver front-end test structure fabricated in 22 nm technology, wire-bonded to a discrete PD. Diode-connected loading is added at each output node of the TIA amplifier for bandwidth extension. One more buffer is inserted between the TIA and CTLE to reduce the TIA's loading. The DC cancellation loop includes a source degeneration resistor for noise reduction at the cost of degraded dynamic range. The CTLE's full-rate pseudo-differential output is driven out of the chip through the CML buffer.



Figure 7.16: Wire-bonded optical receiver front-end test structure.



Figure 7.17: Optical receiver front-end test setup.

The optical test setup is shown in Fig. 7.17. The optical input signal is modulated by an MZM with PRBS15 data and then coupled to the discrete PD, with 150 fF total combined input capacitance, 0.65 A/W responsivity and 4.42 dB extinction ratio. An off-chip ac-coupled PA and balun amplify the CML buffer's output for BER testing and eye-diagram plotting. The measured eye-diagrams and BER timing margin curves are plotted in Fig. 7.18 and Fig. 7.19.

The 10 Gb/s OMA sensitivity at BER = $10^{-12}$ is -13.9 dBm with 0.1 UI timing margin. The optical receiver front-end consumes 3.34 mW from a 0.85 V supply at 10 Gb/s, which translates to 334 fJ/bit power efficiency. Unfortunately, bandwidth is limited by the 300 um long wire between the CTLE and CML buffer in this test structure. For higher bandwidth, more buffer stages should be inserted to drive full-rate data along the on-chip wire. This problem will not affect the 20-channel design, because driving 1/8th-rate data requires significantly less bandwidth.



Figure 7.18: Measured eye-diagram at (a) 8 Gb/s and (b) 10 Gb/s.



Figure 7.19: Measured BER timing margin curves with OMA = -13.9 dBm.



Figure 7.20: Clocking circuitry diagram of the 20-channel optical transceiver.

## 7.4 Power-efficient Clock Generation and Distribution

Power-efficient clocking circuitry is critical for both the transmitter and receiver. Fig. 7.20 shows the clocking circuitry diagram of the 20-channel optical transceiver. On the transmitter side, a 4-phase quarter-rate clock is generated from a quarter-rate differential reference clock. H-tree clock distribution is used to maintain equal phases for the 20 transmitter channels. On the receiver side, a similar clock architecture is used to deliver an 8-phase 1/8th-rate clock to the 19 data channels. The 1/8th-rate differential reference clock comes from either clock channel RX5 in normal operation mode, or off-chip in test mode. This means the multi-phase clock generator must be placed very close to clock channel RX5.

## 7.4.1 Multi-phase Clock Generation

Test structure of multi-phase clock generation shown in Fig. 7.21 was fabricated in 22 nm technology. DLL converts differential reference clock to multi-phase clock and sends it to cascading





Figure 7.21: (a) Transmitter and (b) receiver multi-phase clock generation.



Figure 7.22: Simulated phase error of transmitter multi-phase clock generation.



Figure 7.23: Simulated phase error of receiver multi-phase clock generation.

ILO for further phase error suppression [34]. However, the DLL's delay cell and ILO block must have sufficiently large size and power consumption for a given matching requirement, making this approach unsuitable for low-power applications. Otherwise, there will be large phase error among the clock phases due to size mismatch, as shown by the Monte Carlo simulations in Fig. 7.22 and Fig. 7.23. Measured phase



Figure 7.24: Measured phase error of transmitter and receiver multi-phase clock generation.

error of 5 built samples is plotted in Fig. 7.24. A less severe problem is the systematic phase error due to the discrepancy between the required Vref voltage and the OTA's output when UP and DN are equal.

Both of these problems are fixed in the updated design shown in Fig. 7.25. The ILO cell is removed so that more power can be spent on the delay cell for better matching and less random phase error. In a perfectly locked condition, UP and DN should be equal, generating the optimal output DC voltage, which is about half of the supply. Meanwhile, we need a 550 mV supply for the TX delay cell and a 470 mV supply for the RX delay cell for reasonable bandwidth and jitter. A unity-gain LDO configuration will force the OTA's output away from its optimal 350 mV in this design, resulting in a systematic


Figure 7.25: Updated (a) transmitter and (b) receiver multi-phase clock generation.

offset between UP and DN and therefore a systematic phase error, which can be fixed by the added resistive divider in the LDO.

#### 7.4.2 Clock Distribution

On-chip clock distribution is another challenge: it needs reasonable bandwidth to avoid jitter amplification, low added jitter, and good power efficiency. Phase error and duty cycle variation must also be within the correction range.



Figure 7.26: Inverter-based clock buffer.

The inverter-based clock buffer shown in Fig. 7.26 is used to construct the on-chip H-tree clocking network, with each buffer driving 110 um of on-chip wire with a fan-out factor of 3. Fig. 7.27 and Fig. 7.28 show the TX and RX clock networks' simulated jitter transfer function, output-referred jitter, and power consumption at various supply voltages. The TX clocking network has reasonable jitter even with a 0.45 V supply, but needs a 0.55 V supply for less than 1 dB jitter amplification. The TX clocking network's output jitter is 108.9 fs with 5.53 mW power at a 0.55 V supply. The RX clocking network's jitter amplification is better due to the halved clock frequency, so it is limited more by its output-referred jitter. The RX clocking network's output jitter is 219.1 fs with 3.21 mW power at a 0.45 V supply.

#### 7.4.3 Phase Correction and Duty Cycle Correction

Combining the multi-phase clock generation and clock distribution network, simulated total duty cycle variations are plotted in Fig. 7.29 and Fig. 7.31, while simulated total phase errors are plotted in Fig. 7.30 and Fig. 7.32. In the $3\sigma$ worst-case scenario, the TX clock has 55.61% duty cycle and 0.170 UI phase error, while the RX clock has 45.55% duty cycle and 0.331 UI phase error.

Fig. 7.33 and Fig. 7.34 show the duty cycle correction cells, which can cover duty cycle variation between 40% and 60%. Phase error correction is shown in Fig. 7.35 and Fig. 7.36. Two cascading



Figure 7.27: TX clocking distribution network's simulated jitter transfer function, output-referred jitter, and power consumption with various supply.



Figure 7.28: RX clocking distribution network's simulated jitter transfer function, output-referred jitter, and power consumption with various supply.

correction stages are needed to correct TX phase error within +/-0.25 UI, while three cascading correction stages are needed to correct RX phase error within +/-0.35 UI.



Figure 7.29: Simulated TX clocking total duty cycle variation.



Figure 7.30: Simulated TX clocking total phase error.



Figure 7.31: Simulated RX clocking total duty cycle variation.

# 7.5 Performance Summary

The forwarded clock cancels the majority of the jitter from the external reference clock and the TX DLL/ILO. The remaining jitter sources are summarized in Table 7.1, indicating 6.80 ps or 0.18 UI margin for 26.32



Figure 7.32: Simulated RX clocking total phase error.

Gb/s data-rate with BER =  $10^{-12}$ .

Based on the data channel receiver's simulated -21.5 dBm OMA sensitivity and the link budget in Table 7.2, the laser source power is 710 uW assuming 30% wall-plug efficiency. The power consumption breakdown of the optical transceiver at 26.32 Gb/s data-rate is shown in Fig. 7.37. Each data channel



Figure 7.33: (a) TX clocking duty cycle correction and (b) simulation result.



Figure 7.34: (a) RX clocking duty cycle correction and (b) simulation result.

consumes 4.84 mW, including amortized clock channel power, which translates to 184 fJ/bit full-link power efficiency.
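The power-efficiency figures quoted in this chapter follow from dividing power by data rate. A small helper (hypothetical name) makes the unit conversion explicit:

```python
def energy_per_bit_fj(power_mw, data_rate_gbps):
    """Energy per bit in fJ: mW divided by Gb/s gives pJ/bit, times 1000 gives fJ/bit."""
    return power_mw / data_rate_gbps * 1e3

front_end_fj = energy_per_bit_fj(3.34, 10.0)    # wire-bonded test structure, Section 7.3.5
full_link_fj = energy_per_bit_fj(4.84, 26.32)   # per data channel, amortized clock included
```

These reproduce the 334 fJ/bit front-end figure and, after rounding, the 184 fJ/bit full-link figure.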



Figure 7.35: (a) TX clocking phase error correction and (b) simulation result.



Figure 7.36: (a) RX clocking phase error correction and (b) simulation result.

| Component (BER = $10^{-12}$ )                | $\sigma_{RJ}$ | DJ      | TJ                |
|----------------------------------------------|---------------|---------|-------------------|
| Data TX (VDD = $0.65$ V, VDD_DRV = $0.25$ V) | 0.21 ps       | 1.03 ps | 3.97 ps           |
| Data RX AFE (-21.5 dBm OMA)                  | 1.73 ps       | 1.74 ps | 25.96 ps          |
| Clock TX + Clock RX AFE (-20.0 dBm OMA)      | 0.84 ps       | 0       | 11.76 ps          |
| RX DLL + RX Clock Distribution               | 0.60 ps       | 0       | 8.40 ps           |
| Total Jitter                                 | 2.03 ps       | 2.77 ps | 31.19 ps          |
| Timing Margin at 26.32Gb/s                   | 0 ps          | 0 ps    | 6.80 ps (0.18 UI) |

Table 7.1: Jitter budget running at 26.32 Gb/s.
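The totals in Table 7.1 can be reproduced with the sketch below, assuming random jitter adds as a root-sum-square, deterministic jitter adds linearly, and total jitter is $14\sigma_{RJ} + DJ$. The factor 14 is inferred from the rows of the table (for example, 14 × 0.84 ps = 11.76 ps) and is not stated explicitly in the text.

```python
import math

Q_BER_1E12 = 14.0   # peak-to-peak multiplier on sigma_RJ implied by the TJ column

# (sigma_RJ in ps, DJ in ps) for each row of Table 7.1
SOURCES = {
    "Data TX": (0.21, 1.03),
    "Data RX AFE": (1.73, 1.74),
    "Clock TX + Clock RX AFE": (0.84, 0.0),
    "RX DLL + RX Clock Distribution": (0.60, 0.0),
}

def total_jitter_ps(sources, q=Q_BER_1E12):
    """Return (total sigma_RJ, total DJ, total TJ) in ps."""
    sigma = math.sqrt(sum(rj ** 2 for rj, _ in sources.values()))
    dj = sum(d for _, d in sources.values())
    return sigma, dj, q * sigma + dj

def timing_margin(sources, data_rate_gbps):
    """Return the remaining margin as (ps, fraction of a UI)."""
    ui_ps = 1e3 / data_rate_gbps
    _, _, tj = total_jitter_ps(sources)
    return ui_ps - tj, (ui_ps - tj) / ui_ps

sigma_rj, dj_total, tj_total = total_jitter_ps(SOURCES)
margin_ps, margin_ui = timing_margin(SOURCES, 26.32)
```

The computed values agree with the table to within a few hundredths of a picosecond; the small difference comes from the table rounding $\sigma_{RJ}$ to 2.03 ps before multiplying.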

|                                     | Loss (dB) | $P_1$ (uW) | $P_0$ (uW) | $P_{\Delta}$ (uW) | $P_{avg}$ (uW) |
|-------------------------------------|-----------|------------|------------|-------------------|----------------|
| RX OMA Sensitivity = $-21.5$ dBm    |           |            |            | 7.08              |                |
| Margin                              | 2         | 13.51      | 2.29       | 11.22             | 7.90           |
| PD Coupling                         | 0.1       | 13.82      | 2.34       | 11.48             | 8.08           |
| Drop Filter Insertion Loss          | 0.3       | 14.81      | 2.51       | 12.30             | 8.66           |
| Crosstalk                           | 1         | 18.65      | 3.16       | 15.49             | 10.91          |
| Drop Filter Through Loss (40)       | 4         | 46.84      | 7.94       | 38.90             | 27.39          |
| Waveguide                           | 0.42      | 51.60      | 8.75       | 42.85             | 30.17          |
| RX Fiber Coupling                   | 0.3       | 55.29      | 9.37       | 45.92             | 32.33          |
| Fiber Loss (1km)                    | 0.3       | 59.25      | 10.04      | 49.20             | 34.64          |
| TX Fiber Coupling                   | 0.3       | 63.48      | 10.76      | 52.72             | 37.12          |
| Modulator 7.7dB ER                  |           | 63.48      | 10.76      | 52.72             | 37.12          |
| Mod Through Loss (40)               | 4         | 159.45     |            |                   |                |
| Waveguide Loss                      | 0.26      | 169.29     |            |                   |                |
| AWG Loss                            | 1.0       | 213.12     |            |                   |                |
| Source CW Laser with 30% efficiency |           | 710.42     |            |                   |                |

Table 7.2: Link budget.
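Table 7.2 can be walked backwards from the receiver to the laser as sketched below. The loss values are copied from the table; the function and variable names are hypothetical, and the treatment of the modulator row (one-level power $P_1 = \mathrm{OMA}\cdot ER/(ER-1)$) is inferred from the table entries.

```python
def db_to_ratio(db):
    """Convert a loss or ratio in dB to a linear power ratio."""
    return 10 ** (db / 10)

def dbm_to_uw(dbm):
    """Convert optical power from dBm to microwatts."""
    return 1e3 * 10 ** (dbm / 10)

# Losses in dB, copied from Table 7.2.
RX_TO_MODULATOR_DB = [2, 0.1, 0.3, 1, 4, 0.42, 0.3, 0.3, 0.3]   # margin up to TX fiber coupling
MODULATOR_TO_LASER_DB = [4, 0.26, 1.0]                          # mod through, waveguide, AWG

def laser_wall_plug_uw(sens_dbm=-21.5, er_db=7.7, efficiency=0.30):
    """Walk the link backwards from RX sensitivity to electrical laser power (uW)."""
    oma = dbm_to_uw(sens_dbm) * db_to_ratio(sum(RX_TO_MODULATOR_DB))
    er = db_to_ratio(er_db)
    p1 = oma * er / (er - 1)            # one-level power at the modulator output
    optical = p1 * db_to_ratio(sum(MODULATOR_TO_LASER_DB))
    return oma, p1, optical, optical / efficiency

oma_uw, p1_uw, laser_optical_uw, laser_electrical_uw = laser_wall_plug_uw()
```

The result lands within about 1 uW of the 710.42 uW in the table; the residual difference is consistent with rounding of the extinction ratio.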

[Fig. 7.37 label: 4.72 mW (179 fJ/b)]



Figure 7.37: Power consumption breakdown of optical transceiver with 26.32Gb/s data-rate, including amortized clock channel power.

#### 8. CONCLUSION AND FUTURE WORK

#### 8.1 Conclusion

Huge amounts of data are generated by consumer electronics, automated vehicles, and data centers, requiring ever-increasing data-rate with superior power efficiency. We are now limited more by communication bandwidth than by computing power. A conventional copper channel with complex equalization is no longer a solution even with a relaxed power budget, as thermal management becomes more and more difficult. Another challenge comes from the economic cost and risk of designing huge chips in advanced CMOS technology. Chiplet technology is expected to experience significant growth in the near future, requiring ultra-wide chip-to-chip bandwidth. A mature optical link with improvements in circuit design, optical devices, and 3D integration is critical to sustaining data-rate expansion.

This work covers different aspects of a power-efficient optical transceiver, including a low-noise low-power optical receiver, low-jitter clocking circuitry, a power-efficient transmitter, and microring resonant wavelength stabilization. Several design techniques are proposed to improve full-link power efficiency by reducing noise and power consumption. Three CMOS IC designs from my Ph.D. research are discussed here: one optical receiver in 28 nm, one optical transceiver in 12 nm, and one optical transceiver in 22 nm.

The wire-bonded 12.5 Gb/s optical receiver improves power efficiency by 3.6X and provides better normalized OMA sensitivity compared to conventional design, validating our proposed noise reduction techniques.

Significant improvement is achieved in the second optical transceiver, thanks to design techniques, co-designed optical devices, and 3D integration. We have measured -17.0 dBm OMA sensitivity and 84.8 fJ/bit receiver power efficiency at 25 Gb/s, which is the state-of-the-art result to our best knowledge. Our design techniques are further validated by the normalized OMA sensitivity, which is second only to a design using DFE that consumes 18.75X more power.

The third optical transceiver, targeting an aggressive 184 fJ/bit power efficiency, has been taped out in 22 nm CMOS technology. In addition to the aforementioned improvements, we further improve the MOS-capacitor low-swing transmitter with better optical devices. Electrostatic micro-ring resonant wavelength stabilization is used to eliminate the thermal tuning power required in the previous project. The clocking design is also updated for minimal clocking power with acceptable jitter and phase error.

In conclusion, our proposed optical transceiver offers superior power efficiency, which can be further improved by more advanced CMOS technology, better optical device, and chiplet integration.

### 8.2 Future Work

# 8.2.1 Further Improvement with Co-designed Optical Device and Integration Scheme

One straightforward improvement in optical transceivers' data rate and power efficiency can be achieved with better co-designed optical devices and integration methods. An optical modulator that requires less electrical swing and has less input capacitance can proportionally reduce the optical transmitter's power.



Figure 8.1: Resistive TIA with DC cancellation loop.

On the receiver side, less PD and package capacitance means less high-frequency noise and higher bandwidth. In this dissertation, a shunt-feedback TIA with NRZ signaling was proposed to interface with 10 to 20 fF total input capacitance, and it has been the dominant receiver architecture in medium data-rate optical receivers due to its excellent power efficiency. However, with input capacitance reduced to 1 fF or even lower, a shunt-feedback amplifier is no longer necessary for bandwidth extension. The resistive TIA shown in Fig. 8.1 may regain popularity in extremely high data-rate applications, without any amplifier noise. This simple structure offers higher bandwidth that is no longer limited by CMOS technology, and inherently better linearity suitable for PAM4 or more complex signaling, enabling significantly higher data-rates.

# 8.2.2 Automatic Tuning Logic

A variable supply voltage generated by an on-chip DAC and LDO is used to compensate for PVT variation, but it is currently controlled manually. A ring oscillator was proposed as a speed indicator in an inverter-based design [36]. As we are using a DLL for multi-phase clock generation, the DLL's LDO output voltage is a better speed indicator, with no extra power spent on a free-running ring oscillator.



Figure 8.2: (a) RX slicer with controlled offset and (b) CTLE frequency response.

As shown in Fig. 7.7 and Fig. 8.2 (a), there is one extra 1/8th-rate slicer with a voltage DAC in the receiver for CTLE peaking search, but it is currently controlled manually. An equivalent offset is generated by different DAC outputs, and can be used as a signal swing indicator. The following three-step tuning logic can be added to set the CTLE's optimal high-frequency gain, with the frequency response shown in Fig. 8.2 (b).

Step 1, calibrate all 9 slicers to eliminate their offset voltage.

Step 2, measure low-frequency amplitude. Set input data as 16 consecutive ones and 16 consecutive zeros at 26.32 Gb/s, equivalent to 822.5 MHz clock. Increase offset till this extra slicer's output flips. This offset is the low-frequency signal amplitude.

Step 3, set high-frequency gain. Set input data as 1010 data stream at 26.32 Gb/s, equivalent to 13.16 GHz clock. Sweep CTLE high-frequency gain with the previous offset setting. When this extra slicer flips, high-frequency and low-frequency roughly have the same gain.
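The three steps above can be sketched with a toy behavioural model. Everything here is hypothetical: the `ToyReceiver` class, its amplitude numbers, the 1 mV offset step and the 16-code CTLE range are placeholders for illustration, and Step 1 (slicer offset calibration) is assumed to have been completed already.

```python
def pattern_frequency_ghz(data_rate_gbps, run_length_bits):
    """Fundamental of a pattern with run_length ones followed by run_length zeros."""
    return data_rate_gbps / (2 * run_length_bits)

class ToyReceiver:
    """Hypothetical behavioural model of the CTLE plus the extra slicer (not the real circuit)."""

    def __init__(self, lf_amp_mv, hf_base_mv, hf_step_mv):
        self.lf_amp_mv = lf_amp_mv      # amplitude for the 16-ones/16-zeros pattern
        self.hf_base_mv = hf_base_mv    # 1010-pattern amplitude at CTLE code 0
        self.hf_step_mv = hf_step_mv    # amplitude gained per CTLE peaking code

    def slicer_high(self, pattern, ctle_code, offset_mv):
        """Slicer decision on a '1' bit: high only while the amplitude exceeds the offset."""
        if pattern == "low_freq":
            amp = self.lf_amp_mv
        else:
            amp = self.hf_base_mv + self.hf_step_mv * ctle_code
        return amp > offset_mv

def measure_lf_amplitude(rx, max_offset_mv=200):
    """Step 2: raise the slicer offset until its output flips."""
    for offset in range(max_offset_mv + 1):
        if not rx.slicer_high("low_freq", 0, offset):
            return offset
    raise RuntimeError("offset DAC range exhausted")

def set_hf_gain(rx, offset_mv, max_code=15):
    """Step 3: raise CTLE peaking until the slicer flips back at the same offset."""
    for code in range(max_code + 1):
        if rx.slicer_high("high_freq", code, offset_mv):
            return code
    raise RuntimeError("CTLE range exhausted")

rx = ToyReceiver(lf_amp_mv=40.0, hf_base_mv=10.0, hf_step_mv=4.0)
lf_offset_mv = measure_lf_amplitude(rx)      # offset that matches the LF amplitude
ctle_code = set_hf_gain(rx, lf_offset_mv)    # first code where HF amplitude catches up
```

`pattern_frequency_ghz(26.32, 16)` and `pattern_frequency_ghz(26.32, 1)` give the 822.5 MHz and 13.16 GHz equivalent clock frequencies quoted in Steps 2 and 3. Reusing one slicer and one offset DAC for both measurements means the comparison does not depend on the absolute accuracy of the DAC, only on its monotonicity.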

#### REFERENCES

- [1] S. Kiran, S. Cai, Y. Zhu, S. Hoyos, and S. Palermo, "Digital equalization with adc-based receivers: Two important roles played by digital signal processing in designing analog-to-digital-converter-based wireline communication receivers," *IEEE Microwave Magazine*, vol. 20, no. 5, pp. 62–79, 2019.
- [2] K. Yu, C. Li, H. Li, A. Titriku, A. Shafik, B. Wang, Z. Wang, R. Bai, C.-H. Chen, M. Fiorentino, P. Y. Chiang, and S. Palermo, "A 25 gb/s hybrid-integrated silicon photonic source-synchronous receiver with microring wavelength stabilization," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 9, pp. 2129–2141, 2016.
- [3] M. G. Ahmed, M. Talegaonkar, A. Elkholy, G. Shu, A. Elmallah, A. Rylyakov, and P. K. Hanumolu, "A 12-Gb/s -16.8-dBm OMA Sensitivity 23-mW Optical Receiver in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 53, pp. 445–457, Feb 2018.
- [4] N. D. Dalt, Understanding Jitter and Phase Noise: A Circuits and Systems Perspective. Cambridge: Cambridge University Press, 2018.
- [5] A. Hajimiri and T. Lee, "A general theory of phase noise in electrical oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, 1998.
- [6] A. Hajimiri, S. Limotyrakis, and T. Lee, "Jitter and phase noise in ring oscillators," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 6, pp. 790–804, 1999.
- [7] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links—a tutorial," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 56, no. 1, pp. 17–39, 2009.
- [8] C. Li, R. Bai, A. Shafik, E. Z. Tabasy, B. Wang, G. Tang, C. Ma, C.-H. Chen, Z. Peng, M. Fiorentino, R. G. Beausoleil, P. Chiang, and S. Palermo, "Silicon photonic transceiver circuits with microring resonator bias-based wavelength stabilization in 65 nm cmos," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 6, pp. 1419–1436, 2014.

- [9] A. Samanta, P.-H. Chang, P. Yan, M. Fu, M. Berkay-On, A. Kumar, H. Kang, I.-M. Yi, D. Annabattuni, Y. Zhang, D. Scott, R. Patti, Y.-H. Fan, Y. Zhu, S. Palermo, and S. Ben Yoo, "A direct bond interconnect 3d co-integrated silicon-photonic transceiver in 12nm finfet with -20.3dbm oma sensitivity and 691fj/bit," in *2023 Optical Fiber Communications Conference and Exhibition (OFC)*, pp. 1–3, 2023.
- [10] P.-H. Chang, A. Samanta, P. Yan, M. Fu, Y. Zhang, M. B. On, A. Kumar, H. Kang, I.-M. Yi, D. Annabattuni, D. Scott, R. Patti, Y.-H. Fan, Y. Zhu, S. J. Ben Yoo, and S. Palermo, "A sub-500fj/bit 3d direct bond silicon photonic transceiver in 12nm finfet," in 2023 Symposium on VLSI Technology and Circuits, 2023.
- [11] W. Sansen, "Distortion in elementary transistor circuits," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 46, no. 3, pp. 315–325, 1999.
- [12] K. Zheng, Y. Frans, S. L. Ambatipudi, S. Asuncion, H. T. Reddy, K. Chang, and B. Murmann,
  "An Inverter-Based Analog Front-End for a 56-Gb/s PAM-4 Wireline Transceiver in 16-nm
  CMOS," *IEEE Solid-State Circuits Letters*, vol. 1, no. 12, pp. 249–252, 2018.
- [13] D. Li, G. Minoia, M. Repossi, D. Baldi, E. Temporiti, A. Mazzanti, and F. Svelto, "A Low-Noise Design Technique for High-Speed CMOS Optical Receivers," *IEEE Journal of Solid-State Circuits*, vol. 49, pp. 1437–1447, June 2014.
- [14] E. Sackinger, Broadband Circuits for Optical Fiber Communication. New York, NY, USA: Wiley, 2005.
- [15] A. A. Abidi, "Gigahertz transresistance amplifiers in fine line NMOS," *IEEE Journal of Solid-State Circuits*, vol. 19, pp. 986–994, Dec 1984.
- [16] E. Säckinger, "The Transimpedance Limit," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 57, no. 8, pp. 1848–1856, 2010.

- [17] D. Li, L. Geng, F. Maloberti, and F. Svelto, "Overcoming the Transimpedance Limit: A Tutorial on Design of Low-Noise TIA," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 69, no. 6, pp. 2648–2653, 2022.
- [18] S. Mohan, M. Hershenson, S. Boyd, and T. Lee, "Bandwidth extension in cmos with optimized on-chip inductors," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 3, pp. 346–355, 2000.
- [19] F. Y. Liu, D. Patil, J. Lexau, P. Amberg, M. Dayringer, J. Gainsley, H. F. Moghadam, X. Zheng, J. E. Cunningham, A. V. Krishnamoorthy, E. Alon, and R. Ho, "10-Gbps, 5.3-mW Optical Transmitter and Receiver Circuits in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 47, pp. 2049–2067, Sep. 2012.
- [20] P. Yan, C. Hong, P.-H. Chang, H. Kang, D. Annabattuni, A. Kumar, Y.-H. Fan, R. Liu, R. Rady, and S. Palermo, "A 12.5 Gb/s 1.38 mW Inverter-Based Optical Receiver in 28 nm CMOS," in 2022 IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1–4, 2022.
- [21] M. G. Ahmed, D. Kim, R. K. Nandwana, A. Elkholy, K. R. Lakshmikumar, and P. K. Hanumolu, "A 16-Gb/s -11.6-dBm OMA Sensitivity 0.7-pJ/bit Optical Receiver in 65-nm CMOS Enabled by Duobinary Sampling," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 9, pp. 2795–2803, 2021.
- [22] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 gb/s 3d-integrated cmos/silicon-photonic receiver for low-power high-sensitivity optical communication," *Journal of Lightwave Technology*, vol. 34, no. 12, pp. 2924–2933, 2016.
- [23] D. Li, M. Liu, S. Gao, Y. Shi, Y. Zhang, Z. Li, P. Y. Chiang, F. Maloberti, and L. Geng, "Low-Noise Broadband CMOS TIA Based on Multi-Stage Stagger-Tuned Amplifier for High-Speed High-Sensitivity Optical Communication," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 10, pp. 3676–3689, 2019.

- [24] B. Razavi, "The delay-locked loop [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 10, no. 3, pp. 9–15, 2018.
- [25] A. Cevrero, I. Ozkaya, P. A. Francese, M. Brandli, C. Menolfi, T. Morf, M. Kossel, L. Kull, D. Luu, M. Dazzi, and T. Toifl, "6.1 a 100gb/s 1.1pj/b pam-4 rx with dual-mode 1-tap pam-4 / 3-tap nrz speculative dfe in 14nm cmos finfet," in 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 112–114, 2019.
- [26] B. Razavi, "The ring oscillator [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 11, no. 4, pp. 10–81, 2019.
- [27] B. Razavi, "A study of injection locking and pulling in oscillators," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1415–1424, 2004.
- [28] M. Hossain and A. C. Carusone, "Cmos oscillators for clock distribution and injection-locked deskew," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 8, pp. 2138–2153, 2009.
- [29] K. Hu, T. Jiang, J. Wang, F. O'Mahony, and P. Y. Chiang, "A 0.6 mw/gb/s, 6.4–7.2 gb/s serial link receiver using local injection-locked ring oscillators in 90 nm cmos," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 4, pp. 899–908, 2010.
- [30] S. Chen, L. Zhou, I. Zhuang, J. Im, D. Melek, J. Namkoong, M. Raj, J. Shin, Y. Frans, and K. Chang, "A 4-to-16ghz inverter-based injection-locked quadrature clock generator with phase interpolators for multi-standard i/os in 7nm finfet," in 2018 IEEE International Solid-State Circuits Conference (ISSCC), pp. 390–392, 2018.
- [31] X. Zheng, F. Lv, L. Zhou, D. Wu, J. Wu, C. Zhang, W. Rhee, and X. Liu, "Frequency-domain modeling and analysis of injection-locked oscillators," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 6, pp. 1651–1664, 2020.
- [32] R. Adler, "A study of locking phenomena in oscillators," *Proceedings of the IRE*, vol. 34, no. 6, pp. 351–357, 1946.
- [33] L. Paciorek, "Injection locking of oscillators," *Proceedings of the IEEE*, vol. 53, no. 11, pp. 1723–1727, 1965.

- [34] Z. Wang, Y. Zhang, Y. Onizuka, and P. R. Kinget, "11.4 a high-accuracy multi-phase injection-locked 8-phase 7ghz clock generator in 65nm with 7b phase interpolators for high-speed data links," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 186–188, 2021.
- [35] C. Li, R. Bai, A. Shafik, E. Z. Tabasy, B. Wang, G. Tang, C. Ma, C.-H. Chen, Z. Peng, M. Fiorentino, R. G. Beausoleil, P. Chiang, and S. Palermo, "12.2 a 4-channel 200gb/s pam-4 bicmos transceiver with silicon photonics front-ends for gigabit ethernet applications," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 6, pp. 1419–1436, 2014.
- [36] K. Zheng, Y. Frans, K. Chang, and B. Murmann, "A 56 Gb/s 6 mW 300 um2 inverter-based CTLE for short-reach PAM2 applications in 16 nm CMOS," in 2018 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–4, 2018.
- [37] T. Musah, J. E. Jaussi, G. Balamurugan, S. Hyvonen, T.-C. Hsueh, G. Keskin, S. Shekhar, J. Kennedy, S. Sen, R. Inti, M. Mansuri, M. Leddige, B. Horine, C. Roberts, R. Mooney, and B. Casper, "A 4–32 Gb/s Bidirectional Link With 3-Tap FFE/6-Tap DFE and Collaborative CDR in 22 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 12, pp. 3079–3090, 2014.
- [38] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," in 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pp. 314–605, 2007.
- [39] A. Sharif-Bakhtiar, M. G. Lee, and A. C. Carusone, "Low-power CMOS receivers for short reach optical communication," in 2017 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–8, April 2017.
- [40] J. E. Proesel, Z. Toprak-Deniz, A. Cevrero, I. Ozkaya, S. Kim, D. M. Kuchta, S. Lee, S. V. Rylov, H. Ainspan, T. O. Dickson, J. F. Bulzacchelli, and M. Meghelli, "A 32 gb/s, 4.7 pj/bit optical link with 11.7 dbm sensitivity in 14-nm finfet cmos," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 4, pp. 1214–1226, 2018.

- [41] F. Bozorgi, M. Bruccoleri, M. Repossi, E. Temporiti, A. Mazzanti, and F. Svelto, "A 26-gb/s 3-d-integrated silicon photonic receiver in bicmos-55 nm and pic25g with 15.2-dbm oma sensitivity," *IEEE Solid-State Circuits Letters*, vol. 2, no. 9, pp. 187–190, 2019.
- [42] K. C. Chen and A. Emami, "A 25-gb/s avalanche photodetector-based burst-mode optical receiver with 2.24-ns reconfiguration time in 28-nm cmos," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 6, pp. 1682–1693, 2019.
- [43] T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, and O. Watanabe, "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," in 1992 Symposium on VLSI Circuits Digest of Technical Papers, pp. 28–29, 1992.
- [44] B. Razavi, "The strongarm latch [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12–17, 2015.