# DESIGN OF POWER-EFFICIENT ANALOG-TO-DIGITAL CONVERTERS AND A MIXED-MODE LOW DROP-OUT REGULATOR 

A Dissertation

by

DADIAN ZHOU

Submitted to the Office of Graduate and Professional Studies of Texas A\&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Chair of Committee, Jose Silva-Martinez
Co-Chair of Committee, Aydin Karsilayan
Committee Members, Peng Li
Rainer Fink
Head of Department, Miroslav M. Begovic

August 2019

Major Subject: Electrical Engineering

Copyright 2019 Dadian Zhou


## ABSTRACT

The demand for portable electronic devices, such as cell phones, biomedical products, and smart devices, has grown rapidly in recent years. Because these devices run on batteries, their available power is limited. Therefore, power-efficient research and design for the IC chips used in portable devices have become increasingly important. One active topic is the design of power-efficient, high-resolution, wide-bandwidth analog-to-digital converters (ADCs). The ADC is one of the key building blocks in a wireless communication system: it processes the baseband signal after the mixer and filters and converts it into digital format for microprocessors/controllers. Since the ADC is one of the most frequently used building blocks in such a system, its power consumption is important. Other widely used blocks are on-chip regulators, which regulate the supply voltages of different parts/cores on a microchip. Nowadays, many applications require building blocks that switch frequently between sleep and operation modes. In this case, power-efficient on-chip regulators with fast transient response are needed.

This research consists of three projects, all of which address power-efficient analog and mixed-signal circuit design. The first project is a 13-bit 260-MS/s pipeline ADC using current-mode (CM) multiplying digital-to-analog converters (MDACs) with a current-reuse technique and interstage gain calibration. In this pipeline ADC, the CM MDAC architecture replaces the conventional switch-capacitor (SC) architecture. The CM MDAC employs an operational transconductance amplifier (OTA) to convert the input voltage into an output current. At the same time, the sub-ADC in the CM MDAC resolves N bits, which drive an N-bit current-steering DAC. The current residue is generated at the output of the DAC and the OTA. A transimpedance amplifier (TIA) then converts the current residue back into a voltage for the next pipeline stages. Interstage gain errors caused by variations are calibrated in the digital domain. The ADC achieves a 68.1/66.3 dB signal-to-noise-and-distortion ratio (SNDR) and an $82.3 / 78.2 \mathrm{~dB}$ spurious-free dynamic range (SFDR) for sinusoidal inputs at $4.1736 / 123.129 \mathrm{MHz}$. The total power consumption of the ADC is around 15.38 mW, and the Walden figure-of-merit (FoM) is $28.3 \mathrm{fJ}$/conv-step with a low-frequency input. The chip was implemented in TSMC 40-nm technology, and the prototype occupies around $0.28 \mathrm{~mm}^{2}$.

The second project is a system-level design of a time-interleaved (TI) ADC with digital background calibration. A 4-channel time-interleaved ADC with one additional ADC for calibration is proposed. The calibration algorithm matches the outputs of the 4 channel ADCs to that of the additional ADC by adjusting their gains, offsets, and sampling clock phases. Gain mismatches, offset mismatches, and timing skews are considered the main error sources in a high-speed time-interleaved architecture. The algorithm is implemented and functionally verified using a field-programmable gate array (FPGA) and commercial ADCs (ADS4126).

In the last project, a 245-mA digitally-assisted dual-loop low-dropout (LDO) regulator is proposed and implemented in a TSMC 40-nm process. The proposed digitally-assisted loop speeds up the transient response to large load variations; in this way, the digital loop maintains the loop speed of the LDO using dynamic current instead of a large DC current. However, the digital loop has finite resolution, which leads to a current quantization error at the output: in steady state, one of the pass transistors is periodically turned on and off. To solve this issue, the analog loop is used in steady state to regulate small load changes, and the digital loop is activated only to track large load steps. The digitally-assisted dual-loop LDO achieves a maximum load current of 245 mA. For a 240-mA load, the power supply rejection (PSR) is -48 dB at low frequency and -43 dB at 1 MHz. With a low load current, the LDO still shows -34 dB rejection at 1 MHz. The quiescent current is approximately $300 \mu \mathrm{~A}$. The measured load transient tests show $71 \mathrm{mV} / 37 \mathrm{mV}$ voltage droops for the rising/falling edges of the maximum current step. The resulting FoM is 7.4 ps, which is highly competitive with recently published LDO designs.

## DEDICATION

To my parents Cheng Zhou and Jie Chen, my wife Yifan Cao.

## ACKNOWLEDGMENTS

As I approach the final step of my PhD studies, I would like to thank all the people who gave me support and encouragement during my graduate student life.

First of all, I would like to thank my supervisor, Dr. Jose Silva-Martinez, for his academic and financial support. He answered all my academic questions with endless patience and with his deep understanding of and experience in the area of circuits and systems. During our project discussions, he also showed me how to be an excellent designer with problem-solving skills, intuitive circuit-analysis abilities, and critical-thinking habits. I strongly believe this knowledge and experience will benefit my future career. He will always be a positive example for me to learn from and to catch up with.

I would like to thank Dr. Aydin Karsilayan, Dr. Peng Li and Dr. Rainer Fink for serving as my committee members. In my first year, I took courses with Dr. Karsilayan and Dr. Li, who introduced me to the areas of VLSI and analog circuit design. Those courses gave me the foundation for my later research.

I was very lucky to join Dr. Silva's research group, where I met many friends who were all very kind to me. Carlos Briseno-Vidrios, Qiyuan Liu, Alexander Edward, Negar Rashidi, Mohanmmad Naderi, Suraj Pankras, Eric Park, Juning Jiang, Jian Shao, Tanwei Yan, Sungjun Yoon: I want to thank you all for your cooperation. I benefited greatly from working and discussing with you.

During my academic life at TAMU, I also met many students from other groups, as well as master's students. Xin Zhan, Tao Chen, Aditya Bommireddipalli, Junzhi Yu, Yun Qi, Xin Chang, Shan He, thank you all.

Many thanks to Tammy Carda, Anni Brunker, Ella Gallagher and Katie Bryan at the Department of Electrical Engineering for your patience and help during my academic years.

To my parents, thank you for your encouragement and support during my stay at TAMU. You give me infinite confidence and power to overcome many difficulties. I am very proud to be your son.

To my beloved wife, Yifan (Chloe) Cao: we met in China and went to the UK for our master's studies together. Then we came to Texas. I enrolled at TAMU, and you joined UT Dallas for your second master's degree in accounting. We had to be separated for five years, but I could always feel your heart. Thank you for your companionship. I look forward to our future together. I love you.

Finally, I would like to thank everyone who supported me in achieving this degree.

## CONTRIBUTORS AND FUNDING SOURCES

### Contributors

This work was supervised by a dissertation committee consisting of Professors Jose Silva-Martinez, Aydin Karsilayan and Peng Li of the Department of Electrical and Computer Engineering and Professor Rainer Fink of the Department of Engineering Technology and Industrial Distribution.

The experimental results analyzed for Chapter 3 were provided in part by Professor Claudio Talarico of the Department of Electrical and Computer Engineering at Gonzaga University.

All other work conducted for the dissertation was completed by the student independently.

### Funding Sources

This work was made possible in part by the National Science Foundation under Grant Numbers 1404890 and 1509872.

## NOMENCLATURE

| Abbreviation | Definition |
| :--- | :--- |
| ADC | Analog-to-Digital Converter |
| CM | Current Mode |
| MDAC | Multiplying Digital-to-Analog Converter |
| SC | Switch-Capacitor |
| OTA | Operational Transconductance Amplifier |
| TIA | Trans-Impedance Amplifier |
| SNDR | Signal-to-Noise-and-Distortion Ratio |
| SQNR | Signal-to-Quantization-Noise Ratio |
| SFDR | Spurious Free Dynamic Range |
| FoM | Figure-of-Merit |
| TI | Time Interleaved |
| FPGA | Field Programmable Gate Array |
| LDO | Low Drop-Out |
| PSR | Power Supply Rejection |
| SAR | Successive-Approximation-Register |
| GBW | Gain Bandwidth Product |
| LSB | Least Significant Bit |
| MSB | Most Significant Bit |
| RA | Residue Amplifier |
| PVT | Process, Voltage, Temperature |
| LHP | Left Half Plane |
| RHP | Right Half Plane |
| CMFB | Common Mode Feedback |
| PSO | Particle Swarm Optimization |
| DUT | Device Under Test |
| FFT | Fast Fourier Transform |
| DDS | Digital Detection System |
| DCDL | Digitally-Controlled Delay Line |
| EF | Error Function |
| VHDL | VHSIC Hardware Description Language |
| ASIC | Application-Specific Integrated Circuit |
| PRG | Pseudo Random Generator |
| IoT | Internet-of-Things |
| EA | Error Amplifier |
| SR | Slew Rate |
| LC | Loop Controller |
| FSM | Finite State Machine |
| SRB | Shift-Register-Based |

## TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
CONTRIBUTORS AND FUNDING SOURCES
NOMENCLATURE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

1. INTRODUCTION
1.1 Motivation of Power-efficient Circuit Designs
1.1.1 Power-efficient Pipeline and Time-interleaved ADCs
1.1.2 Power-efficient Low Dropout Regulator Design
1.2 Thesis Organization
2. A 13-BIT $260 \mathrm{MS} / \mathrm{S}$ POWER-EFFICIENT PIPELINE ADC WITH CURRENT-MODE MDACS EMPLOYING A CURRENT-REUSE TECHNIQUE AND INTERSTAGE GAIN AND NONLINEARITY ERRORS CALIBRATION
2.1 Introduction
2.2 Current-Mode MDAC Architecture with a Current-Reuse Technique
2.3 Systematic Errors for a Pipeline ADC with CM MDACs
2.3.1 Comparator Offset Errors
2.3.2 Interstage Gain and Nonlinearity Errors
2.4 Circuit Implementations
2.4.1 Implementation of a PMOS Current-steering DAC and an NMOS OTA
2.4.2 Implementation of a TIA Using a Two-stage Amplifier with Feedforward Compensation
2.4.3 Implementation of a 4-bit Flash ADC
2.5 Calibration of Interstage Gain and Nonlinearity Errors
2.6 Measurement Results
2.7 Conclusion
3. A 4-CHANNEL TIME-INTERLEAVED ADC WITH DIGITAL BACKGROUND CALIBRATION USING A DIGITAL-CIRCUIT-BASED OPTIMIZATION ALGORITHM
3.1 Introduction
3.1.1 TI ADC Architecture
3.1.2 TI ADC Issues
3.1.3 Effects of Timing Skews in a 4-channel System
3.2 Proposed Time-interleaved ADC Architecture
3.3 Circuit Implementation of the Digital Background Calibration
3.3.1 Particle Swarm Optimizer (PSO)
3.3.2 Pseudo Random Number Generator
3.4 Experimental Results
3.4.1 Simulation Results of the TI ADC with/without Calibration
3.4.2 Circuit Implementation and Simulation Results for Digital Detection System in the TI ADC
3.5 Conclusion
4. A 245 MA DIGITALLY-ASSISTED DUAL-LOOP LOW DROPOUT REGULATOR
4.1 Introduction
4.1.1 Conventional Analog LDOs
4.1.2 Digital LDOs
4.2 Proposed Dual-loop Low Dropout Regulator Architecture
4.2.1 Circuit Implementation of a 3-bit Flash ADC
4.2.2 Circuit Implementation of Pass Transistor Units
4.2.3 Loop Controller
4.3 Digital Loop Analysis and Operations
4.3.1 Shift-Register-Based Digital LDO Architecture
4.3.2 ADC-Based Digital LDO Architecture
4.4 Analog Loop Operations
4.4.1 Stability of the Analog Loop
4.5 Measurement Results
4.6 Conclusion
5. CONCLUSION
REFERENCES

## LIST OF FIGURES

1.1 Applications for Required Bandwidth and Resolution
1.2 Resolution and Bandwidth for Different Types of ADCs
2.1 A Conventional Pipeline ADC Architecture
2.2 A Conventional Flip-Around Switch-Capacitor MDAC with a Half Bit Redundancy for Comparator Offsets
2.3 (Reprinted from [1]) A Pipeline ADC with Virtual Ground Buffers
2.4 A Proposed Current-Mode MDAC Architecture with Current-Reuse Technique
2.5 Minimum $G_{m}$ Versus the Number of Bits Solved by the MDAC Stage ($G_{m 2}$ denotes the minimum transconductance required by the SC MDAC solving 1.5 bits/stage; $N=2$ for 1.5-bit/stage)
2.6 Total Power Consumption for Both 3.5-bit/stage SC and CM MDACs (Power consumption is normalized to the RA's power in the CM MDAC)
2.7 A 13-bit Pipeline ADC with Three 3.5-bit/stage CM MDACs and a 4-bit sub-ADC
2.8 A Residue Curve for a 3.5-bit/stage MDAC without/with Comparator Offset Errors
2.9 A Transfer Curve for a Pipeline ADC with Interstage Gain and Nonlinearity Errors
2.10 Circuit Implementation of a PMOS Current-steering DAC and an NMOS OTA
2.11 Small-signal Equivalent Circuits for the OTA
2.12 Small-signal Noise Model of the OTA+DAC Architecture
2.13 Equivalent Circuit from All Noisy Components: a) Noise from the PMOS Bias Transistors and the Current-steering DAC; b) Noise from Transistor $M_{1}$; c) Noise from Transistor $M_{n, 1}$ and the Source Degeneration Resistor
2.14 Power Consumption of the Current-steering DAC and the OTA with/without Current-reuse Technique
2.15 A TIA Using a Two-stage Amplifier with Feedforward Compensation
2.16 Circuit Implementation of the Two-stage Amplifier
2.17 Equivalent Circuit of the CMFB Loop: a) CMFB Loop; b) Equivalent Loop from the Breaking Point
2.18 Frequency Response of the CMFB Loop with/without $2 C_{c 1}$ and $R_{f b}$
2.19 Circuit Implementation of the 4-bit Flash ADC
2.20 Circuit Implementation: a) Pre-amplifier; b) Strong-arm Latch and c) Optimized SR Latch
2.21 Off-line Interstage Gain and Nonlinearity Calibration between the First and the Second Pipelined Stages
2.22 Flow Chart of the PSO Algorithm for Interstage Gain and Nonlinearity Errors Calibration
2.23 Die Photo of the Proposed ADC
2.24 Test Bench of the Proposed ADC
2.25 Measured Results for the Output Spectrum with an Input Tone at $4.1736-\mathrm{MHz}$ before Interstage Gain and Nonlinearity Calibration
2.26 Measured Results for the Output Spectrum with an Input Tone at $4.1736-\mathrm{MHz}$ after Interstage Gain and Nonlinearity Calibration
2.27 Measured Results for the Output Spectrum with an Input Tone at $123.129-\mathrm{MHz}$ after Interstage Gain and Nonlinearity Calibration
2.28 Measured SNDR/SFDR against Normalized Input Amplitude in dB at 4.1736-MHz
2.29 Measured SNDR/SFDR against Input Frequency with Full-scale Amplitude
2.30 Measured DNL before/after Calibration
2.31 Measured INL before/after Calibration
3.1 A General M-channel Time-interleaved ADC
3.2 Power Consumption vs Bandwidth for M-channel Time-interleaved ADCs and Single-channel ADCs
3.3 FFT of a 4-channel TI ADC with/without Offset Mismatches Effect
3.4 FFT of a 4-channel TI ADC with/without Gain Mismatches Effect
3.5 Sampling Process of the Input Signal in Time and Frequency Domains
3.6 Sampling Process for a 4-channel Time-interleaved System in Time Domain
3.7 FFT of a 4-channel TI ADC with/without Timing Skews Effect
3.8 Maximum SQNR versus Timing Skews Percentage for the 4-channel System
3.9 Proposed 4-channel Time-interleaved ADC Architecture with Digital Background Calibration
3.10 Simplified Model of the Time-interleaved ADC System Employing the DDS for Gain, Offset Mismatches and Timing Skews Calibration
3.11 Timing Diagram of the Proposed TI ADC
3.12 Circuit Implementation of the Pseudo Random Generator: a) 8-bit PRG b) 14-bit PRG
3.13 Simulation Results with a $152.832-\mathrm{MHz}$ Input Tone before Calibration
3.14 Simulation Results with a $152.832-\mathrm{MHz}$ Input Tone after Calibration
3.15 $G_{k}$ Values against the Number of Iterations for 4 sub-ADCs
3.16 $C_{k}$ Values against the Number of Iterations for 4 sub-ADCs
3.17 $t_{k}$ Values against the Number of Iterations for 4 sub-ADCs
3.18 Waveform for the DDS Simulation with Given Stimuli
4.1 Conventional Analog LDO Architecture
4.2 Conventional Digital LDO Architecture
4.3 Proposed Digitally-assisted Dual-loop LDO Architecture: a) Dual-loop LDO b) Reference Levels for the Flash ADC c) Transient Response to a Max Load Step
4.4 Implementation of a 3-bit Flash ADC
4.5 Circuit Implementation of a Comparator a) Pre-amplifier b) Strong-Arm Latch c) Optimized SR Latch
4.6 Circuit Implementation of Pass Transistor Units
4.7 Finite State Machine Diagram Employed in the LC
4.8 Timing Diagram of the Proposed LDO with a Large Load Current Step
4.9 Behavioral Model: a) SRB Digital LDO and b) Transient Response
4.10 Behavioral Model for an ADC-based Digital LDO
4.11 Circuit Implementation of the Error Amplifier
4.12 Analog Loop Equivalent Circuit
4.13 Loop Magnitude \& Phase under Different Loads
4.14 Die Photo of the Proposed Dual-loop LDO
4.15 Test Bench Setup of the Proposed Dual-loop LDO
4.16 Load Regulation Case a): $I_{\text {load }}$ from 0.5 mA to 240 mA
4.17 Load Regulation Case b): $I_{\text {load }}$ from 25 mA to 165 mA
4.18 Load Regulation Case c): $I_{\text {load }}$ from 75 mA to 110 mA
4.19 PSR Measurement under Different Load Currents

## LIST OF TABLES

2.1 Summary of the Two-stage Amplifier
2.2 List of Pole-zero Locations for the CMFB Loop
2.3 ADCs Summary and Comparison
3.1 Pseudo-Random Polynomials
3.2 Mismatches and Timing Skews for the 4-channel TI ADC
3.3 Synthesis Summary of the DDS on an Altera FPGA
4.1 Digital \& Analog LDOs' Comparison Table
4.2 Summary of States \& Inputs for the Finite State Machine
4.3 ADC Output Levels and the Feedback Voltage Range
4.4 Parameters of the EA from Simulations
4.6 Comparison of the Proposed LDO with Published State-of-the-art LDOs

## 1. INTRODUCTION

### 1.1 Motivation of Power-efficient Circuit Designs

Portable electronic devices are becoming more popular nowadays. Mobile phones, smart devices, biomedical sensors, IoT devices, and similar products have become part of people's daily lives. All of these devices are powered by batteries, so their battery life is critical. There are two ways to extend it: one is to increase the battery capacity; the other is to reduce the power consumption of the power-hungry chips and components inside the portable devices. This dissertation takes the second approach and focuses on power-efficient designs of critical on-chip building blocks, such as ADCs for wireless communications and linear regulators for power supplies.


Figure 1.1: Applications for Required Bandwidth and Resolution

### 1.1.1 Power-efficient Pipeline and Time-interleaved ADCs

In many smart devices, analog-to-digital converters (ADCs) are essential components. An ADC converts an analog input signal into digital output codes, which can then be processed by microprocessors or controllers. ADCs therefore have a wide range of applications. Figure 1.1 shows these application areas according to the required bandwidth and resolution of the ADC.


Figure 1.2: Resolution and Bandwidth for Different Types of ADCs

Biomedical sensors and audio devices need high-resolution data for digital signal processing; the ADCs used in these applications typically require more than 13 bits of resolution. In exchange, their bandwidth requirement is relaxed and can be as low as the kilohertz range. At the other extreme are very high-speed applications, such as optical communications and high-speed data communications, where the ADC bandwidth can reach 10 GHz or higher. Applications such as Wi-Fi and LTE-A need medium-to-high resolution and bandwidths in the megahertz range. The figure shows that, for a given power consumption, there is a tradeoff between the bandwidth and the resolution of an ADC.

Many ADC architectures have been developed by researchers and designers. Figure 1.2 shows the most popular architectures in terms of bandwidth and resolution; each architecture has its own application areas. For example, $\Sigma \Delta$ ADCs are widely used for high-resolution applications. They are oversampling ADCs: unlike Nyquist-rate ADCs, the $\Sigma \Delta$ architecture shapes the quantization noise away from the signal band, which allows it to achieve high resolution. On the other hand, oversampling sacrifices ADC bandwidth.
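As an idealized illustration of the oversampling and noise-shaping trade-off described above, the following sketch evaluates the textbook peak-SQNR expression for an $L$-th order $\Sigma \Delta$ modulator with an $N$-bit quantizer and oversampling ratio OSR. The function name is hypothetical, and the formula assumes a full-scale sine input, white quantization noise, and an ideal noise transfer function $(1-z^{-1})^{L}$; practical modulators fall short of it because of stability limits and circuit noise.

```python
import math


def ideal_sigma_delta_sqnr_db(n_bits: int, order: int, osr: float) -> float:
    """Peak SQNR in dB of an ideal order-L sigma-delta modulator.

    Assumptions: full-scale sine input, N-bit quantizer modeled as additive
    white noise, ideal noise transfer function (1 - z^-1)^L, and a brick-wall
    decimation filter. order = 0 is plain oversampling without noise shaping.
    """
    nyquist_sqnr = 6.02 * n_bits + 1.76
    # Penalty for the in-band portion of the shaped noise (grows with L).
    shaping_penalty = 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    # Each doubling of OSR adds (2L + 1) * 3.01 dB.
    oversampling_gain = (2 * order + 1) * 10 * math.log10(osr)
    return nyquist_sqnr - shaping_penalty + oversampling_gain


if __name__ == "__main__":
    # Same 1-bit quantizer at OSR = 64: oversampling alone vs. 1st/2nd-order shaping.
    for order in (0, 1, 2):
        print(f"L = {order}: {ideal_sigma_delta_sqnr_db(1, order, 64):.1f} dB")
```

Each doubling of OSR buys roughly $(2L+1) \times 3$ dB, which is why $\Sigma \Delta$ converters reach high resolution only by giving up signal bandwidth.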

This project targets a power-efficient ADC for wireless communications, which calls for an architecture with medium-to-high resolution and medium bandwidth. According to Figure 1.2, the pipeline architecture is therefore selected. A pipeline ADC amplifies the residue in each pipelined stage, and this residue amplification consumes considerable power. To address this issue, a pipeline ADC with current-mode MDACs is introduced in Chapter 2. The CM MDAC has inherent interstage gain and nonlinearity errors, so a calibration algorithm for these interstage errors is also proposed.

To further increase the bandwidth beyond that of a single-channel ADC, the time-interleaved ADC architecture is investigated in Chapter 3. This TI ADC project is a system-level design, in which a calibration algorithm is proposed for gain mismatches, offset mismatches, and timing skews in the TI ADC architecture.

### 1.1.2 Power-efficient Low Dropout Regulator Design

LDOs are widely used to adaptively provide the current demanded by a load. For this reason, the voltage supplies used on a large chip are normally regulated by on-chip LDOs. There are two main types of LDOs. The conventional LDO is implemented with analog circuits and includes an error amplifier in the loop. However, the error amplifier usually consumes a large amount of power, especially when the pass transistor is large. To solve this issue, digital LDOs have been proposed.

Based on recently published research and designs, digital LDOs have become more popular for several reasons. First, digital LDOs are implemented mainly with digital circuits, which benefit more from technology scaling than analog circuits do: transistors become smaller and faster, and their power consumption becomes lower. Second, digital circuits are portable across technologies, so re-designing a digital LDO for a new process requires much less work than re-designing an analog one. However, the digital LDO architecture has two main drawbacks: output ripple and poor PSR performance. To solve these issues, a digitally-assisted dual-loop LDO is proposed in this project. The proposed LDO combines advantages of both the analog and the digital approaches.

### 1.2 Thesis Organization

This dissertation is organized as follows. Chapter 2 presents a pipeline ADC that uses a current-reuse technique in current-mode MDACs together with interstage gain and nonlinearity error calibration; both the analysis and the results are presented in that chapter. Chapter 3 describes the system-level design of a 4-channel time-interleaved ADC architecture, proposes a background calibration algorithm, and shows the experimental results at the end of the chapter. Chapter 4 describes the design, analysis, and measurement results of a digitally-assisted dual-loop LDO. The dual-loop LDO can provide a maximum load current of $245 \mathrm{~mA}$, and its undershoot is around 70 mV for a $240-\mathrm{mA}$ load step with a $300-\mathrm{ns}$ rise time.

## 2. A 13-BIT $260 \mathrm{MS} / \mathrm{S}$ POWER-EFFICIENT PIPELINE ADC WITH CURRENT-MODE MDACS EMPLOYING A CURRENT-REUSE TECHNIQUE AND INTERSTAGE GAIN AND NONLINEARITY ERRORS CALIBRATION*

### 2.1 Introduction

ADCs are essential building blocks in many systems and applications. There are several types of ADC architectures, such as flash, SAR, pipeline, and $\Sigma \Delta$, each suited to particular applications according to its specifications. Among them, pipeline ADCs provide medium-to-high resolution (10-14 bits) and medium bandwidth ($>100 \mathrm{MHz}$) [2]. Thus, pipeline architectures are widely used to process baseband signals in data communication systems.

A conventional pipeline ADC architecture is shown in Figure 2.1. The ADC consists of several pipelined stages. Adjacent stages use complementary clock phases ($\Phi_{1}$ and $\Phi_{2}$) at the same frequency, and a small non-overlap time is inserted between $\Phi_{1}$ and $\Phi_{2}$ to avoid switching errors. Each stage consists mainly of four parts: a sample-and-hold block, a sub-ADC, a DAC, and a residue amplifier. The input signal is first sampled by the sample-and-hold circuit. Then, the w-bit sub-ADC in Figure 2.1 quantizes the sampled input and generates w bits of output data. To find the residue left by the quantization of the sub-ADC, a w-bit DAC is employed: the residue is obtained by subtracting the DAC output from the sampled input. Finally, the residue is amplified for the next stage. Such a pipeline stage is called a multiplying DAC (MDAC). In a pipeline ADC, similar or even identical MDACs are used for the following back-end stages, which reduces the design complexity, especially when all pipelined stages use the same MDAC.
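The stage-by-stage operation described above can be sketched with a minimal behavioral model of an ideal pipeline ADC built from identical w-bit stages. It is an illustration under simplifying assumptions, not the MDAC used in this work: the input is unipolar in $[0, V_{FS})$, the stages are error-free, there is no redundancy, and the function name is hypothetical. Each stage quantizes its input, subtracts the DAC level, and amplifies the residue by $2^{w}$, while the stage outputs are concatenated into the final code.

```python
def pipeline_convert(v_in: float, v_fs: float, bits_per_stage: int, n_stages: int) -> int:
    """Ideal pipeline ADC built from identical w-bit stages (no redundancy).

    Assumes a unipolar input 0 <= v_in < v_fs and error-free stages.
    Returns the combined output code of n_stages * bits_per_stage bits.
    """
    levels = 2 ** bits_per_stage
    code = 0
    residue = v_in
    for _ in range(n_stages):
        # Sub-ADC: w-bit quantization of the held stage input.
        d = int(residue * levels // v_fs)
        d = min(max(d, 0), levels - 1)  # clamp to the valid sub-ADC codes
        # DAC subtraction, then residue amplification by 2^w for the next stage.
        residue = levels * (residue - d * v_fs / levels)
        # Digital combination: this stage's bits sit below the earlier stages' bits.
        code = code * levels + d
    return code


if __name__ == "__main__":
    # Three ideal 2-bit stages form a 6-bit ADC: 0.3 * 2**6 = 19.2, so code 19.
    print(pipeline_convert(0.3, 1.0, 2, 3))  # prints 19
```

The model is static; in hardware, each stage processes a different sample in every clock period, so the pipeline delivers one output per period after a latency of several periods.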

Figure 2.1: A Conventional Pipeline ADC Architecture

Compared with other types of ADCs, the pipeline architecture offers two main advantages. First, the total number of bits is divided among several stages, while every stage operates at the full conversion rate. Therefore, a pipeline ADC does not need to resolve all bits in one clock cycle, as a flash ADC does, or to spend multiple clock cycles per sample, as a SAR ADC does. Second, because the residue is amplified, it has a large swing at the input of the following stage, which relaxes the design specifications of the sub-ADCs in the second and back-end stages.

In recent years, the SAR architecture has become a strong competitor to the pipeline ADC, especially for low-power applications. A SAR ADC relies more on digital circuits, such as comparators and logic controllers, than on purely analog blocks. With technology scaling, digital circuits benefit from higher speed and lower power consumption, so SAR ADCs are becoming increasingly popular for power-efficient designs. However, SAR ADCs cannot fully replace pipeline ADCs because of their inherent limitations, as described in [3]. First, SAR ADCs use passive charge-redistribution operations and require multiple clock cycles to obtain one digital output sample; therefore, high-speed comparators and DACs are necessary. The second challenge comes from the comparator resolution and offset: without residue amplification, the comparator must be carefully designed to achieve high resolution and good offset tolerance, which may cost considerable power. Pipeline ADCs do not suffer from these issues, so they are still employed, especially for resolutions higher than 10 bits.
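For contrast, an idealized SAR conversion is a binary search that needs one comparator decision per bit, and each decision must be accurate to the full converter resolution because no residue gain is applied. The sketch below uses a hypothetical function name and assumes an ideal DAC and a noise-free, offset-free comparator; it returns the code together with the number of decisions taken.

```python
def sar_convert(v_in: float, v_fs: float, n_bits: int) -> tuple[int, int]:
    """Ideal SAR ADC (binary search). Returns (code, comparator_decisions).

    Assumes 0 <= v_in < v_fs, an ideal DAC and an offset-free comparator.
    """
    code = 0
    decisions = 0
    for bit in reversed(range(n_bits)):       # MSB first
        trial = code | (1 << bit)             # tentatively set the next bit
        v_dac = trial * v_fs / (1 << n_bits)  # DAC output for the trial code
        decisions += 1                        # one comparison per bit
        if v_in >= v_dac:
            code = trial                      # keep the bit
    return code, decisions


if __name__ == "__main__":
    # 0.3 * 2**8 = 76.8, so an 8-bit conversion returns code 76 after 8 decisions.
    print(sar_convert(0.3, 1.0, 8))  # prints (76, 8)
```

An N-bit conversion therefore costs N comparator decisions per sample, so the comparator and DAC must operate at least N times faster than the sample rate, which is the speed and resolution burden discussed in the preceding paragraph.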

According to published work [4, 5, 6, 1, 7], the residue amplifiers in the MDACs are the most power-hungry building blocks of a pipeline ADC. Recently published works therefore focus on reducing the power consumption of the residue amplifiers without decreasing the overall bandwidth of the pipeline ADC. The power/transconductance requirement of the amplifier in a conventional flip-around switch-capacitor MDAC is analyzed in the following paragraphs.


Figure 2.2: A Conventional Flip-Around Switch-Capacitor MDAC with a Half Bit Redundancy for Comparator Offsets

In Figure 2.2, the MDAC consists of an N-bit sub-ADC, a multiplexer, and a flip-around SC circuit. The SC building block uses K unit capacitors $\left(C_{1}, C_{2}, \ldots, C_{K}\right)$ and an amplifier $\left(A_{0}\right)$. The clock period of the pipeline stage is $t_{c l k}$, and there are two phases, $\Phi_{1}$ and $\Phi_{2}$, for sampling and settling, respectively. During the sampling phase, all unit capacitors sample the input signal, $V_{i}$, and the sub-ADC resolves N bits. The N-bit code selects the proper reference voltages for $C_{2}, \ldots, C_{K}$. During $\Phi_{2}$, $C_{1}$ is flipped around and connected to the residue output, $V_{\text {res }}$. Based on charge conservation at the virtual ground node of the amplifier, the residue output at the end of $\Phi_{2}$ is given by

$$
\begin{equation*}
V_{r e s}=A_{C L} \times\left(V_{i}-\frac{d \cdot V_{F S}}{2^{N}}\right) \tag{2.1}
\end{equation*}
$$

In Equation 2.1, $d$ denotes the decimal value of the sub-ADC output, where $d=-\left(2^{N-1}-1\right), \ldots, 0, \ldots, 2^{N-1}-1$; $V_{F S}$ is the full-scale range of the input signal; and $A_{C L}$ represents the closed-loop gain. During the settling phase, $A_{C L}$ can be further expressed as

$$
\begin{equation*}
A_{C L}=\frac{\frac{\sum_{k=1}^{K} C_{k}}{C_{1}}}{1+\frac{1}{A_{0}} \cdot \frac{\sum_{k=1}^{K} C_{k}}{C_{1}}}=\frac{\frac{1}{\beta}}{1+\frac{1}{A_{0} \cdot \beta}} \approx 2^{N-1} \tag{2.2}
\end{equation*}
$$

In Equation 2.2, $\beta=C_{1} /\left(\sum_{k=1}^{K} C_{k}\right)$ is the feedback factor, which equals $1 / 2^{N-1}$ when mismatches among the unit capacitors are small, and $A_{0}$ denotes the low-frequency gain of the amplifier. For $A_{C L}$ to approach $2^{N-1}$, $A_{0}$ must be very large. Consequently, the amplifier specifications dominate the design of the entire MDAC stage.
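Equations 2.1 and 2.2 can be exercised together in a behavioral model of one MDAC stage. The sketch below is a minimal illustration under stated assumptions, not the circuit of this work: an ideal sub-ADC that picks the nearest level $d$ (thresholds at $(d+1/2) \cdot V_{FS}/2^{N}$, clipped to $\pm(2^{N-1}-1)$), an input range of $\pm V_{FS}/2$, matched capacitors ($\beta = 1/2^{N-1}$), and an optional comparator offset modeled as a uniform shift of all thresholds. The function names and the offset model are hypothetical. The sketch shows that the half-bit redundancy keeps the residue inside $\pm V_{FS}/2$ for offsets below $V_{FS}/2^{N+1}$, and that a finite $A_{0}$ leaves a relative gain error of $1/(1+A_{0}\beta)$.

```python
import math


def closed_loop_gain(a0: float, beta: float) -> float:
    """Eq. 2.2: A_CL = (1/beta) / (1 + 1/(A0*beta)); a0 = math.inf gives 1/beta."""
    return (1.0 / beta) / (1.0 + 1.0 / (a0 * beta))


def mdac_stage(v_i: float, v_fs: float, n: int,
               a0: float = math.inf, offset: float = 0.0) -> tuple[int, float]:
    """Behavioral N-bit MDAC stage following Eq. 2.1 with A_CL from Eq. 2.2.

    Returns (d, v_res) for an input in [-v_fs/2, v_fs/2]. `offset` is a
    hypothetical comparator offset that shifts every sub-ADC threshold; it
    only affects the decision d, not the analog residue path.
    """
    beta = 1.0 / 2 ** (n - 1)          # matched unit capacitors
    d_max = 2 ** (n - 1) - 1
    # Sub-ADC: nearest level d, thresholds at (d + 1/2) * V_FS / 2^N, clipped.
    d = math.floor((v_i + offset) * 2 ** n / v_fs + 0.5)
    d = max(-d_max, min(d_max, d))
    # Eq. 2.1: subtract the DAC level d * V_FS / 2^N and amplify by A_CL.
    v_res = closed_loop_gain(a0, beta) * (v_i - d * v_fs / 2 ** n)
    return d, v_res


if __name__ == "__main__":
    v_fs, n = 1.0, 4                   # 3.5-bit/stage example, input in [-0.5, 0.5]
    max_offset = 0.99 * v_fs / 2 ** (n + 1)
    worst = max(abs(mdac_stage(-0.5 + k / 1000, v_fs, n, offset=off)[1])
                for k in range(1001)
                for off in (-max_offset, 0.0, max_offset))
    print(f"worst |V_res| with offsets up to {max_offset:.4f}: {worst:.4f}")
    beta = 1.0 / 2 ** (n - 1)
    for a0 in (100.0, 1000.0, 10000.0):
        gain_error = 1.0 - closed_loop_gain(a0, beta) * beta  # = 1/(1 + A0*beta)
        print(f"A0 = {a0:g}: relative gain error = {gain_error:.2e}")
```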

To find the speed of the loop and the accuracy of $V_{\text {res }}$ during the settling phase, assume that the amplifier has a low-frequency gain $A_{0}$ and a dominant pole at $\omega_{p 0}$. The closed-loop transfer function is then

$$
\begin{equation*}
A_{C L}(s)=\frac{\frac{A_{0}}{1+s / \omega_{p 0}}}{1+\frac{A_{0}}{1+s / \omega_{p 0}} \cdot \beta}=\frac{A_{0}}{\left(1+A_{0} \cdot \beta\right)\left(1+\frac{s}{\left(1+A_{0} \cdot \beta\right) \cdot \omega_{p 0}}\right)} \tag{2.3}
\end{equation*}
$$

where $A_{0} \cdot \beta$ denotes the low-frequency loop gain, which is assumed to be much larger than 1. Hence, the closed-loop transfer function has a pole at $\left(1+A_{0} \cdot \beta\right) \cdot \omega_{p0} \approx A_{0} \beta \cdot \omega_{p0}$. Assuming that the amplifier has only one dominant pole below its unity-gain frequency, $A_{0} \cdot \omega_{p0}$ equals the gain-bandwidth product of the amplifier, which can also be expressed as $GBW=G_{m} / C_{L}=A_{0} \cdot \omega_{p0}$.
During $\Phi_{2}$, the settling time available to the residue amplifier is less than $t_{clk} / 2$ because part of the half period is consumed by the nonoverlapping interval and by slewing. In this analysis, $t_{clk} / 3$ is allotted to the settling of the residue signal. The step-response settling error of $A_{CL}(s)$ must be smaller than the resolution of the back-end stages of the pipeline ADC, which gives the following inequality between the back-end resolution and the settling error:

$$
\begin{equation*}
e^{-G B W \cdot \beta \frac{t_{c l k}}{3}} \leq \frac{1}{2^{N_{B}}} \tag{2.4}
\end{equation*}
$$

From Equation 2.4, the total transconductance of the amplifier must satisfy

$$
\begin{equation*}
G_{m} \geq \frac{3 \cdot \ln (2)}{t_{c l k}} \cdot \frac{C_{L} \cdot N_{B}}{\beta} \tag{2.5}
\end{equation*}
$$

where $t_{clk}$ is the clock period; $C_{L}$ denotes the total load capacitance, which is set by the $K T / C$ noise requirement; and $N_{B}=N_{ADC}-N$ is the total number of bits resolved by the back-end stages. According to Equation 2.5, the power consumption of the amplifier in the MDAC is proportional to $G_{m}$. To reduce $G_{m}$, only $N_{B}$ and $\beta$ can be manipulated, since $t_{clk}$ and $C_{L}$ are fixed by the design specifications and the $K T / C$ noise requirement, respectively. Furthermore, the minimum value of $C_{L}$ is limited by the unit capacitor size: small unit capacitors match poorly and are sensitive to parasitic capacitance, so the MDAC performance degrades if very small unit capacitors are used.
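The Python sketch below turns Equations 2.4 and 2.5 into numbers. It assumes a hypothetical 1 pF load, the 260 MS/s clock of this work, a 13-bit ADC, and a 1.5-bit/stage SC MDAC ($N=2$, $\beta=1/2$); the helper names `gm_min` and `settling_error` are illustrative only. At exactly the minimum $G_{m}$, the settling error equals $2^{-N_{B}}$.

```python
import math


def gm_min(t_clk: float, c_load: float, n_b: int, beta: float) -> float:
    """Minimum amplifier transconductance from Eq. 2.5 (settling within t_clk/3)."""
    return 3.0 * math.log(2) / t_clk * c_load * n_b / beta


def settling_error(gm: float, t_clk: float, c_load: float, beta: float) -> float:
    """Left-hand side of Eq. 2.4, with GBW = Gm / C_L in rad/s."""
    gbw = gm / c_load
    return math.exp(-gbw * beta * t_clk / 3.0)


# Hypothetical example: 260 MS/s clock, 1 pF load, 13-bit ADC, 1.5-bit/stage SC MDAC (N = 2)
t_clk = 1.0 / 260e6
c_load = 1e-12
n_adc, n = 13, 2
n_b = n_adc - n                 # bits resolved by the back-end stages
beta = 1.0 / 2 ** (n - 1)       # SC feedback factor with matched unit capacitors

gm = gm_min(t_clk, c_load, n_b, beta)
print(f"Gm_min = {gm * 1e3:.2f} mS")
print(f"settling error {settling_error(gm, t_clk, c_load, beta):.3e} vs 2^-N_B = {2.0 ** -n_b:.3e}")
```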

In the conventional SC MDAC, increasing $N$ reduces $N_{B}$, which lowers the minimum $G_{m}$ requirement. However, the feedback factor, $\beta=1 / 2^{N-1}$, decreases exponentially with $N$ and appears in the denominator of Equation 2.5. The benefit of increasing $N$ is therefore limited, since $N_{B}$ and $\beta$ decrease together.

To solve this issue, a virtual ground buffer is added for all unit capacitors except the flip-around capacitor, as described in [1]. Figure 2.3 shows this MDAC architecture. During the settling phase, the input of the unity-gain buffer is connected to the virtual ground node of the amplifier. The same voltage change then appears at both plates of $C_{1}$, which means $\Delta V=0$. In this case, the feedback factor, $\beta$, is ideally 1. In real implementations, $\beta$ is approximately 0.3 because of the non-ideal level-shifting buffer and the parasitic capacitors at the virtual ground node. According to the measured results in [1], the N-bit MDAC with a virtual ground buffer can improve the power efficiency of the pipeline ADC. A more power-efficient architecture using current-mode MDACs is proposed in the next subsection.

Figure 2.3: (Reprinted from [1]) A Pipeline ADC with Virtual Ground Buffers

### 2.2 Current-Mode MDAC Architecture with a Current-Reuse Technique

In a conventional SC MDAC, the amplifier is the power-hungry device in the closed-loop residue amplifier, and the interstage gain is set by the capacitor ratio during the settling phase. Resolving more bits $(N>1)$ in the first pipeline stage relaxes the design of the back-end stages. However, according to Equation 2.5, the amplifier in the first stage may then consume more power because $\beta$ decreases. To avoid reducing $\beta$ while increasing $N$, this dissertation proposes the current-mode MDAC architecture shown in Figure 2.4.


Figure 2.4: A Proposed Current-Mode MDAC Architecture with Current-Reuse Technique

The CM MDAC consists of a sub-ADC, an OTA, a current-steering DAC and a TIA. The input voltage signal is sampled by a sampling capacitor, $C_{in}$, and then quantized by the sub-ADC, which produces an N-bit digital output. This output controls the current-steering DAC, which delivers a discrete current, $I_{DAC}$. At the same time, the OTA converts the sampled voltage into a current signal, $I_{\text{OTA}}$. At the output of the OTA+DAC building block, the residue current is expressed as

$$
\begin{equation*}
I_{r e s}=I_{O T A}-I_{D A C} \tag{2.6}
\end{equation*}
$$

In Equation 2.6, the single-ended form of $I_{\text{res}}$ is used for simplicity, and $I_{\text{res}}$ represents an AC signal. For DC operation, both the PMOS current-steering DAC and the NMOS OTA are biased by current-mirror sources. Since the DAC employs PMOS transistors only and the OTA uses NMOS devices, the DC bias current flowing from the supply to ground can be reused by both blocks, and no separate bias sources are needed for the DAC or the OTA. The OTA+DAC architecture therefore saves DC bias current. Finally, $I_{\text{res}}$ enters the following TIA, which converts the current residue into the voltage output, $V_{\text{res}}$. $V_{\text{res}}$ can then be sampled and quantized by a similar CM MDAC circuit in the next stage.

The overall residue signal is obtained as

$$
\begin{equation*}
V_{r e s}=G_{m} \cdot R_{f b} \cdot\left(V_{i}-d \times \frac{I_{L S B}}{G_{m}}\right) \tag{2.7}
\end{equation*}
$$

where $G_{m}$ is the total OTA transconductance; $R_{f b}$ denotes the feedback resistor in the following TIA; $d=0, \pm 1, \pm 2, \ldots$ represents the output code from the sub-ADC; $I_{L S B}$ is the current provided by one current DAC cell.
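A behavioral sketch of Equation 2.7 for one 3.5-bit/stage CM MDAC is given below in Python. It assumes a normalized $V_{FS}=1$, an ideal $G_{m} \cdot R_{fb}=8$, and $I_{LSB}/G_{m}=V_{FS}/16$ (one sub-ADC step referred to the input); the names `sub_adc` and `cm_mdac_residue` are hypothetical. With an ideal rounding sub-ADC clipped to $d=-7,\ldots,7$, the residue stays within $\pm V_{FS}/2$ for any input in $\pm V_{FS}/2$.

```python
import math

V_FS = 1.0           # normalized full-scale range (assumption)
GAIN = 8.0           # ideal G_m * R_fb for a 3.5-bit/stage MDAC
V_STEP = V_FS / 16   # assumed I_LSB / G_m, one sub-ADC step referred to the input


def sub_adc(v_in: float) -> int:
    """Ideal 3.5-bit sub-ADC: nearest step, clipped to d = -7..+7."""
    d = math.floor(v_in / V_STEP + 0.5)
    return max(-7, min(7, d))


def cm_mdac_residue(v_in: float, d: int) -> float:
    """Eq. 2.7: V_res = G_m R_fb (V_i - d * I_LSB / G_m)."""
    return GAIN * (v_in - d * V_STEP)


for v in (-0.5, -0.2, 0.0, 0.13, 0.5):
    d = sub_adc(v)
    print(f"V_i = {v:+.2f}  d = {d:+d}  V_res = {cm_mdac_residue(v, d):+.3f}")
```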

In the CM MDAC, the TIA is composed of a high-gain amplifier with a feedback resistor. Thus, the feedback factor of the TIA can be defined as

$$
\begin{equation*}
\beta=\frac{R_{O T A} \| R_{D A C}}{R_{O T A} \| R_{D A C}+R_{f b}} \tag{2.8}
\end{equation*}
$$

where $R_{OTA}$ and $R_{DAC}$ are the equivalent output resistances of the OTA and the current-steering DAC, respectively. Because both the OTA and the DAC have current outputs, $R_{OTA}$ and $R_{DAC}$ are much larger than $R_{fb}$, and therefore $\beta \approx 1$. Unlike the capacitive feedback in the SC architecture, $\beta$ decreases only slightly as the number of bits resolved by the first stage increases. Setting $\beta=1$ in Equation 2.5, the transconductance requirement of the amplifier in the CM MDAC becomes

$$
\begin{equation*}
G_{m} \geq \frac{3 \cdot \ln (2)}{t_{c l k}} \cdot C_{L} \cdot N_{B} \tag{2.9}
\end{equation*}
$$



Figure 2.5: Minimum $G_{m}$ Versus the Number of Bits Solved by the MDAC Stage ($G_{m2}$ is the minimum transconductance required by the SC MDAC solving 1.5 bits/stage; $N=2$ for 1.5-bit/stage.)

To find the relationship between the minimum $G_{m}$ and the number of bits, $N$, resolved by the first stage, the same MDAC is assumed to be used for the second stage of the pipeline ADC. The load capacitance of the SC MDAC, $C_{L}$, set by the $K T / C$ noise requirement, is then

$$
\begin{equation*}
C_{L}=\frac{C_{u} \cdot\left(A_{C L}-1\right) \cdot C_{u}}{A_{C L} \cdot C_{u}}+\frac{C_{u} \cdot A_{C L}}{A_{C L}^{2}}=\left(1-\frac{1}{A_{C L}}\right) \cdot C_{u}+\frac{C_{u}}{A_{C L}} \tag{2.10}
\end{equation*}
$$

where $C_{u}$ is the unit capacitor of the first stage. The first term in Equation 2.10 is the equivalent capacitance of the feedback network (the flip-around capacitor $C_{u}$ in series with the remaining $\left(A_{CL}-1\right) \cdot C_{u}$), and the second term is the equivalent input capacitance of the next stage. For the CM architecture, only the second term exists in $C_{L}$ because of the resistive feedback. To compare the $G_{m}$ requirements of both types of MDACs, the same clock frequency and the same ADC resolution are assumed. Figure 2.5 plots the minimum $G_{m}$ versus $N$.
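One possible way to compute such a comparison from Equations 2.5, 2.9 and 2.10 is sketched below in Python. The sketch assumes that $C_{u}$ is held constant as $N$ changes, uses $N_{B}=N_{ADC}-N$ with $N_{ADC}=13$, and drops the common factor $3\ln 2/t_{clk}$; the function `gm_relative` is hypothetical, and the sketch is not claimed to reproduce Figure 2.5 quantitatively. Results are normalized to the SC MDAC at $N=2$, i.e. to $G_{m2}$.

```python
def gm_relative(n: int, architecture: str, n_adc: int = 13, c_u: float = 1.0) -> float:
    """Minimum Gm from Eq. 2.5 / 2.9 without the common factor 3*ln(2)/t_clk."""
    a_cl = 2 ** (n - 1)          # interstage gain
    n_b = n_adc - n              # bits left for the back-end stages
    if architecture == "SC":
        c_load = (1 - 1 / a_cl) * c_u + c_u / a_cl   # Eq. 2.10 (reduces to C_u)
        beta = 1 / a_cl                              # capacitive feedback factor
    elif architecture == "CM":
        c_load = c_u / a_cl      # only the next-stage term of Eq. 2.10
        beta = 1.0               # resistive feedback, beta close to 1
    else:
        raise ValueError(f"unknown architecture: {architecture}")
    return c_load * n_b / beta


gm2 = gm_relative(2, "SC")       # reference: SC MDAC at 1.5 bits/stage
for n in range(2, 7):
    print(f"N = {n}: SC {gm_relative(n, 'SC') / gm2:6.3f}   CM {gm_relative(n, 'CM') / gm2:6.3f}")
```

Under this fixed-$C_{u}$ assumption, the SC requirement grows with $N$ because $1/\beta$ grows faster than $N_{B}$ shrinks, while the CM requirement keeps falling.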


Figure 2.6: Total Power Consumption for Both 3.5-bit/stage SC and CM MDACs (Power consumption is normalized to the RA's power in the CM MDAC)

Figure 2.5 shows that the residue amplifier (RA) in the CM MDAC requires a much smaller $G_{m}$ than the RA in the SC MDAC, especially when more bits are resolved in the first stage. The RA in an ideal CM MDAC could therefore save a large amount of power. In practice, however, several factors limit these savings. For instance, parasitic capacitance at the output increases the total load capacitance. The value of $\beta$ is also slightly less than 1 according to Equation 2.8, since the output resistance of the OTA+DAC block is finite. Moreover, very small capacitors are strongly affected by parasitics, so the lower limit on the capacitor value also restricts the power reduction of the RA. From Figure 2.5, the 3.5-bit/stage configuration already yields a small normalized $G_{m}$, while resolving more bits per MDAC increases the digital logic complexity and makes the RA harder to design. The 3.5-bit/stage MDAC architecture is therefore selected for this project.

Besides the residue amplifiers, other building blocks also consume power in both types of MDACs. Figure 2.6 shows the total power normalized to the power of the RA in the CM MDAC. In the 3.5-bit/stage SC MDAC, the RA requires a high power budget for the settling phase. Even though the OTA+DAC block consumes more power than the RA in the CM architecture, the total power consumption of the CM MDAC is still lower than that of the SC MDAC. Therefore, a pipeline ADC with CM MDACs is designed and implemented in this project.

### 2.3 Systematic Errors for a Pipeline ADC with CM MDACs

In this project, 3.5-bit/stage current-mode MDACs are employed as shown in Figure 2.7. The first three stages each resolve 3.5 bits with a half-bit redundancy. The fourth stage consists of a 4-bit sub-ADC, since no residue amplification is needed for the last stage of the pipeline ADC. The digital codes obtained from the four stages are denoted as $D_{1}, D_{2}, D_{3}$ and $D_{4}$.


Figure 2.7: A 13-bit Pipeline ADC with Three 3.5-bit/stage CM MDACs and a 4-bit sub-ADC.

In the pipeline, the input signal is processed with a half-clock-cycle delay between two consecutive stages. Digital codes from the back-end stages must be scaled according to the interstage gain values. For example, when 3.5 bits are resolved in the first stage, the interstage gain is theoretically 8, so $D_{2}$ must be multiplied by $1/8$ when it is used to reconstruct the digital output. In this design, the reconstruction of $D_{\text{out}}$ is performed off-chip. The output code is

$$
\begin{equation*}
D_{\text {out }}=D_{1}[n]+\frac{1}{8} \cdot D_{2}\left[n-\frac{1}{2}\right]+\frac{1}{8^{2}} \cdot D_{3}[n-1]+\frac{1}{8^{3}} \cdot D_{4}\left[n-\frac{3}{2}\right] \tag{2.11}
\end{equation*}
$$
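The weighting in Equation 2.11 can be checked with the idealized Python sketch below. It assumes ideal 3.5-bit/stage MDACs (gain exactly 8, sub-ADC step $V_{FS}/16$, codes $-7,\ldots,7$), codes that are already time-aligned, and a last stage modeled as an unlimited rounding quantizer rather than the actual 4-bit flash; `stage`, `convert` and `reconstruct` are hypothetical names. Under these assumptions the reconstruction error is at most $V_{FS}/2^{14}$, half of a 13-bit LSB.

```python
import math

V_FS = 1.0          # normalized full-scale range (assumption)
STEP = V_FS / 16    # assumed sub-ADC step referred to each stage input
GAIN = 8            # ideal interstage gain of a 3.5-bit/stage MDAC


def stage(v_in):
    """Idealized 3.5-bit/stage MDAC: code d in -7..7 and residue 8*(v_in - d*STEP)."""
    d = max(-7, min(7, math.floor(v_in / STEP + 0.5)))
    return d, GAIN * (v_in - d * STEP)


def convert(v_in):
    """Stage codes D1..D4 for one sample, already time-aligned."""
    codes, v = [], v_in
    for _ in range(3):                              # three 3.5-bit/stage MDACs
        d, v = stage(v)
        codes.append(d)
    codes.append(math.floor(v / STEP + 0.5))        # last stage: idealized rounding quantizer
    return codes


def reconstruct(d1, d2, d3, d4):
    """Eq. 2.11 with weights 1, 1/8, 1/8^2, 1/8^3, scaled to volts by STEP."""
    return STEP * (d1 + d2 / 8 + d3 / 8**2 + d4 / 8**3)


v = 0.123456
codes = convert(v)
print(codes, reconstruct(*codes), abs(reconstruct(*codes) - v))
```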

### 2.3.1 Comparator Offset Errors

The first type of systematic error in the CM MDAC is caused by comparator offsets. The same error also appears in a conventional SC MDAC. To eliminate or reduce the comparator offset errors, a half-bit redundancy is employed as shown in Figure 2.8.


Figure 2.8: A Residue Curve for a 3.5-bit/stage MDAC without/with Comparator Offset Errors

The offset errors, such as $V_{off,1}, V_{off,2}, V_{off,3}$, etc., are the differences between the actual transition points and the transition points of the ideal residue curve. With the half-bit redundancy, offset errors of up to $\pm V_{LSB} / 2$ can be tolerated, which equals $\pm V_{FS} / 32$. Figure 2.8 illustrates the comparator offset correction at the red crossing points. On the residue curve with offset errors, the digital codes obtained from the current and the next stages are $d_{1}=1$ and $d_{2}=12$. In this case, the final digital output is

$$
\begin{equation*}
d_{o}=d_{1} \times 8+d_{2}=1 \times 8+12=20 \tag{2.12}
\end{equation*}
$$

Without comparator offset errors, the corresponding digital code obtained from the ideal residue curve is

$$
\begin{equation*}
d_{o, \text { ideal }}=d_{1, \text { ideal }} \times 8+d_{2, \text { ideal }}=2 \times 8+4=20 \tag{2.13}
\end{equation*}
$$

Comparing Equation 2.12 with Equation 2.13, both cases yield the same result. Therefore, the half-bit redundancy tolerates a maximum offset of $\pm V_{LSB} / 2$ for each comparator in the sub-ADC of a 3.5-bit/stage MDAC.
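The Python sketch below checks the redundancy argument over many inputs rather than a single crossing point. It assumes an ideal residue gain of 8, 14 stage-1 comparators at $(k+0.5)\,V_{FS}/16$ with individual offsets, and an ideal second stage that can only represent residues within $\pm V_{FS}/2$; it uses signed codes ($d_{1}=-7,\ldots,7$) instead of the unsigned codes of Equations 2.12 and 2.13, and the names and offset values are hypothetical. Offsets below $V_{FS}/32$ leave every combined code $8 d_{1}+d_{2}$ unchanged, whereas a single offset of $3V_{FS}/64$ pushes the residue out of range and corrupts some codes.

```python
import math

V_FS = 1.0
STEP = V_FS / 16                                            # 3.5-bit/stage decision spacing
THRESHOLDS = [(k + 0.5) * STEP for k in range(-7, 7)]       # 14 ideal comparator thresholds


def two_stage_code(v_in, offsets):
    """Combined code 8*d1 + d2 (Eq. 2.12 form) with per-comparator offsets in stage 1."""
    d1 = sum(v_in > t + off for t, off in zip(THRESHOLDS, offsets)) - 7
    res = 8 * (v_in - d1 * STEP)                           # ideal residue amplification
    d2 = max(-8, min(8, math.floor(res / STEP + 0.5)))     # back end limited to +/- V_FS/2
    return 8 * d1 + d2


def ideal_code(v_in):
    """Code of an ideal quantizer with step STEP / 8."""
    return math.floor(v_in / (STEP / 8) + 0.5)


inputs = [(k + 0.25) * STEP / 8 for k in range(-64, 64)]   # points away from decision ties
small = [(-1) ** k * V_FS / 40 for k in range(14)]          # every |offset| < V_FS/32
large = [0.0] * 14
large[7] = 3 * V_FS / 64                                    # one offset beyond V_FS/32

print("small offsets, all codes correct:", all(two_stage_code(v, small) == ideal_code(v) for v in inputs))
print("one large offset, all codes correct:", all(two_stage_code(v, large) == ideal_code(v) for v in inputs))
```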

### 2.3.2 Interstage Gain and Nonlinearity Errors



Figure 2.9: A Transfer Curve for a Pipeline ADC with Interstage Gain and Nonlinearity Errors

The interstage gain and nonlinearity errors exist in both CM and SC pipeline stages. In an SC MDAC, the interstage gain approximately equals the capacitor ratio when the loop gain is large. It is also very linear because it is set by the ratio of two passive elements. However, the interstage gain of a CM MDAC may become inaccurate and nonlinear under different conditions.

In the CM MDAC, the interstage gain is $G_{m} \cdot R_{fb}$, where $G_{m}$ is the total transconductance of the OTA and $R_{fb}$ is the transimpedance gain of the following TIA. Both parameters are sensitive to process, voltage and temperature (PVT) variations. For a 3.5-bit/stage MDAC, $G_{m} \cdot R_{fb}$ should equal 8. After fabrication, the interstage gain values are not exactly 8 and cannot be measured directly at the MDAC outputs. If $1 / 8$ is still applied for the code reconstruction between two adjacent stages as in Figure 2.7, the interstage gain errors degrade the ADC performance. Figure 2.9 shows the transfer curve of the pipeline ADC with interstage gain errors.

Secondly, nonlinearity errors also exist in the OTA transconductance and the TIA gain. Thus, the ADC performance is limited by nonlinearities in $G_{m} \cdot R_{f b}$ values. The nonlinearity errors are also shown in Figure 2.9.
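The effect of the gain error alone can be illustrated with the Python sketch below. It assumes a hypothetical actual gain $G_{m} \cdot R_{fb}=7.8$ (2.5% low), an ideal first-stage sub-ADC with step $V_{FS}/16$, and an ideal, unquantized back end, so only the interstage gain error remains; `two_stage_estimate` is a hypothetical name. Reconstructing with the nominal $1/8$ weight leaves an error of $(7.8/8-1)\left(V_{i}-d_{1} V_{FS}/16\right)$, a sawtooth that jumps at every sub-ADC transition, while a weight matched to the actual gain removes it.

```python
import math

V_FS = 1.0
STEP = V_FS / 16   # assumed 3.5-bit/stage decision spacing


def two_stage_estimate(v_in, g_actual, g_assumed):
    """Input estimate from stage-1 code plus residue divided by the assumed gain."""
    d1 = max(-7, min(7, math.floor(v_in / STEP + 0.5)))
    residue = g_actual * (v_in - d1 * STEP)        # actual G_m * R_fb may differ from 8
    return d1 * STEP + residue / g_assumed         # back end modeled as ideal and unquantized


inputs = [k / 1000 - 0.5 for k in range(1001)]
err_nominal = max(abs(two_stage_estimate(v, 7.8, 8.0) - v) for v in inputs)   # 1/8 used blindly
err_matched = max(abs(two_stage_estimate(v, 7.8, 7.8) - v) for v in inputs)   # gain known exactly
print(f"max error, 1/8 weight: {err_nominal:.2e}; matched weight: {err_matched:.2e}; 13-bit LSB: {1 / 8192:.2e}")
```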

In this project, both the gain errors and the nonlinearity errors of the CM MDACs are treated as systematic errors that are independent of the input signal. To calibrate the gain and nonlinearity errors, an off-line calibration algorithm is proposed in the following section. All input-dependent errors and circuit mismatches in the CM MDAC are minimized by careful design.

### 2.4 Circuit Implementations

The proposed pipeline ADC with CM MDACs is composed of four stages, as shown in Figure 2.7. The first three stages are implemented by 3.5-bit/stage MDACs. Among them, the $1^{\text{st}}$ stage consumes more power than the following two stages, since it requires higher linearity and lower input-referred noise. The $2^{\text{nd}}$ and $3^{\text{rd}}$ stages are scaled down to reduce the total power consumption. The $4^{\text{th}}$ stage is implemented by a 4-bit flash ADC, since no residue amplification is needed for the last stage. The 3.5-bit/stage MDAC mainly consists of an OTA+DAC building block, a TIA and a flash ADC, which are described in the following subsections.

### 2.4.1 Implementation of a PMOS Current-steering DAC and an NMOS OTA

In Figure 2.10, the circuit consists of a PMOS current-steering DAC and an NMOS OTA. The PMOS DAC uses P-type transistors only, while the OTA uses N-type transistors. The OTA and the DAC are combined so that they reuse the same DC bias current [8]. For AC signals, the two devices still operate properly. With the current-reuse technique, neither the current-steering DAC nor the OTA needs separate bias sources. Thus, the OTA+DAC architecture saves static power compared with two standalone devices.

The PMOS current-steering DAC contains 14 PMOS unit cells in total, and each cell provides $I_{LSB}$, the LSB current of the DAC. In this project, the total bias current provided by the DAC is smaller than the OTA bias current. Therefore, two identical bias current sources, $I_{bp}$, are added for proper DC operation. The NMOS OTA is biased by three NMOS transistors, $M_{n,1}, M_{n,2}$ and $M_{n,3}$, whose DC currents are $I_{n,1}, I_{n,2}$ and $I_{n,3}$. The bias currents of the DAC and the OTA are therefore related by

$$
\begin{equation*}
14 \times I_{L S B}+2 \times I_{b p}=I_{n, 1}+I_{n, 2}+I_{n, 3} \tag{2.14}
\end{equation*}
$$

The total bias current of the current-steering DAC in the $1^{\text{st}}$ stage is 1.05 mA, so each DAC cell carries $75~\mu\mathrm{A}$. The total current of the OTA in the $1^{\text{st}}$ stage is set to 1.8 mA, so the PMOS bias source in each branch provides $375~\mu\mathrm{A}$. In the $2^{\text{nd}}$ and $3^{\text{rd}}$ MDAC stages, the DC bias current is reduced because of the relaxed design requirements: the total DC current of the OTA is set to approximately 0.9 mA, half of the total OTA current in the $1^{\text{st}}$ stage.

Figure 2.10: Circuit Implementation of a PMOS Current-steering DAC and an NMOS OTA
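The first-stage bias numbers follow directly from Equation 2.14, as the short Python calculation below shows. It uses the 14 DAC cells, the 1.05 mA DAC current and the 1.8 mA OTA current given above, together with the $25\%$/$50\%$/$25\%$ split among $M_{n,1}$, $M_{n,2}$ and $M_{n,3}$ adopted in this subsection; the variable names are illustrative only.

```python
import math

N_CELLS = 14            # PMOS unit cells in the current-steering DAC
I_DAC_TOTAL = 1.05e-3   # A, first-stage DAC bias current
I_OTA_TOTAL = 1.8e-3    # A, first-stage OTA bias current (I_n,1 + I_n,2 + I_n,3)

i_lsb = I_DAC_TOTAL / N_CELLS                    # current per DAC cell
i_bp = (I_OTA_TOTAL - N_CELLS * i_lsb) / 2       # Eq. 2.14 solved for each I_bp
# 25 % / 50 % / 25 % split of the OTA bias among M_n,1, M_n,2, M_n,3
i_n1, i_n2, i_n3 = 0.25 * I_OTA_TOTAL, 0.50 * I_OTA_TOTAL, 0.25 * I_OTA_TOTAL

assert math.isclose(N_CELLS * i_lsb + 2 * i_bp, i_n1 + i_n2 + i_n3)   # Eq. 2.14 balance
print(f"I_LSB = {i_lsb * 1e6:.0f} uA, I_bp = {i_bp * 1e6:.0f} uA, "
      f"I_n = {i_n1 * 1e6:.0f}/{i_n2 * 1e6:.0f}/{i_n3 * 1e6:.0f} uA")
```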

The implementation of the NMOS OTA is shown in Figure 2.10. It consists of a common-source amplifier with source degeneration resistors, which converts the voltage input ($V_{in\pm}$) into a current output ($I_{OTA\pm}$). The degeneration resistors are used to improve the OTA linearity.

Figure 2.11: Small-signal Equivalent Circuits for the OTA

For simplicity, half-circuit analysis is used for the fully-differential OTA, and the small-signal equivalent circuit is presented in Figure 2.11. The small-signal transconductance gain of the OTA is

$$
\begin{equation*}
G_{m}=\frac{I_{O T A}}{V_{i n}}=-\frac{g_{m 1} \times\left(\frac{1}{R_{1}}+\frac{1}{r_{o n, 1}}\right)}{g_{m 1}+\frac{1}{R_{1}}+\frac{1}{r_{o 1}}+\frac{1}{r_{o n, 1}}} \tag{2.15}
\end{equation*}
$$

where $g_{m1}$ is the transconductance of $M_{1}$; $R_{1}$ denotes the source degeneration resistor; and $r_{o1}$ and $r_{on,1}$ are the output resistances of $M_{1}$ and $M_{n,1}$, respectively. In the OTA, the output resistances are assumed to be much larger than $R_{1}$, so Equation 2.15 can be simplified as

$$
\begin{equation*}
G_{m} \approx-\frac{g_{m 1} \times \frac{1}{R_{1}}}{g_{m 1}+\frac{1}{R_{1}}}=-\frac{g_{m 1}}{1+g_{m 1} \cdot R_{1}}=-\frac{g_{m 1}}{1+\eta} \tag{2.16}
\end{equation*}
$$

where $\eta=g_{m1} \cdot R_{1}$ is the source degeneration factor. A larger $\eta$ improves the OTA linearity, as analyzed in the following paragraphs.
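The Python sketch below compares the full expression of Equation 2.15 with the simplified Equation 2.16. The device values are hypothetical: $g_{m1}=10$ mS and $R_{1}=500~\Omega$ give $\eta=5$, matching the degeneration factor chosen later in this design, and $r_{o1}=r_{on,1}=20~\mathrm{k}\Omega$ are assumed. The two results differ by only a few percent and converge as the output resistances grow.

```python
def gm_exact(gm1, r1, ro1, ron1):
    """Eq. 2.15: degenerated transconductance with finite output resistances."""
    g_src = 1.0 / r1 + 1.0 / ron1          # conductance of R1 in parallel with r_on,1
    return -gm1 * g_src / (gm1 + g_src + 1.0 / ro1)


def gm_approx(gm1, r1):
    """Eq. 2.16: -gm1 / (1 + eta) with eta = gm1 * R1."""
    return -gm1 / (1.0 + gm1 * r1)


gm1, r1, ro1, ron1 = 10e-3, 500.0, 20e3, 20e3     # hypothetical values, eta = 5
print(f"Eq. 2.15: {gm_exact(gm1, r1, ro1, ron1) * 1e3:.4f} mS, "
      f"Eq. 2.16: {gm_approx(gm1, r1) * 1e3:.4f} mS")
```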

In the CM MDAC, the OTA converts the input voltage into a current output, which is connected directly to the current-steering DAC to obtain the current residue signal. The OTA transconductance is part of the interstage gain coefficient ($G_{m} \cdot R_{fb}$), so a highly linear OTA is essential for keeping the distortion of $G_{m} \cdot R_{fb}$ low. Low-distortion interstage gains in turn allow the pipeline ADC to achieve better SFDR and SNDR.

According to [9] and [10], the third-order intermodulation distortion ($IM_{3}$) of a conventional common-source amplifier is

$$
\begin{equation*}
I M_{3}=\frac{3 V_{i n, A}^{2}}{128 V_{d, s a t}^{2}} \tag{2.17}
\end{equation*}
$$

where $V_{in,A}$ is the signal amplitude at the gate of the common-source transistor and $V_{d,sat}$ is the saturation voltage of the transistor. To improve the linearity of the OTA, source degeneration resistors are employed, and the $IM_{3}$ of the common-source amplifier with source degeneration becomes

$$
\begin{equation*}
I M_{3}=\frac{3 V_{i n, A}^{2}}{128(1+\eta)^{3} V_{d, s a t}^{2}} \tag{2.18}
\end{equation*}
$$

In Equation 2.18, $\eta$ is the source degeneration factor defined in Equation 2.16. $IM_{3}$ is proportional to the square of the input signal amplitude and inversely proportional to $(1+\eta)^{3}$ and $V_{d,sat}^{2}$. Therefore, a large $\eta$ improves the linearity of the OTA. Based on Equation 2.16, $\eta$ can be increased by increasing either $g_{m1}$ or $R_{1}$. Increasing $g_{m1}$ raises the power consumption of the OTA, while increasing $R_{1}$ results in a higher voltage drop across $R_{1}$ if the same tail current flows into $M_{n,2}$ in Figure 2.10. Neither method comes for free, and in this design $\eta$ is set to approximately 5 as a compromise.
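To put the chosen $\eta \approx 5$ into perspective, the Python sketch below evaluates Equations 2.17 and 2.18 at a hypothetical gate amplitude and saturation voltage of 0.2 V each. It assumes that the equations give an amplitude ratio, so that $20\log_{10}$ is the appropriate dB conversion, and it compares the cases at equal $V_{in,A}$ and $V_{d,sat}$, ignoring the headroom cost discussed above. At $\eta=5$ the improvement factor is $(1+\eta)^{3}=216$, about 46.7 dB.

```python
import math


def im3(v_in_a, v_dsat, eta=0.0):
    """Eq. 2.17 (eta = 0) and Eq. 2.18: third-order intermodulation ratio."""
    return 3.0 * v_in_a**2 / (128.0 * (1.0 + eta) ** 3 * v_dsat**2)


v_in_a, v_dsat = 0.2, 0.2    # hypothetical gate amplitude and saturation voltage (V)
for eta in (0.0, 2.0, 5.0):
    print(f"eta = {eta:.0f}: IM3 = {20 * math.log10(im3(v_in_a, v_dsat, eta)):.1f} dB")
```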

Besides the linearity of the OTA, a proper noise design of the OTA is very important to the performance of the MDAC and even of the entire pipeline ADC. To analyze the noise contribution of each building block, the small-signal noise model of the OTA+DAC architecture is shown in Figure 2.12.

Figure 2.12: Small-signal Noise Model of the OTA+DAC Architecture

In the noise model, $I_{N,bp+}^{2}+I_{N,DAC+}^{2}$ and $I_{N,bp-}^{2}+I_{N,DAC-}^{2}$ denote the current noise sources of the two branches of the PMOS bias sources and the current-steering DAC. The noise sources of $M_{1}$ and $M_{2}$ are $I_{N,M1}^{2}$ and $I_{N,M2}^{2}$. The NMOS bias transistors ($M_{n,1}$, $M_{n,2}$ and $M_{n,3}$) contribute $I_{N,n,1}^{2}$, $I_{N,n,2}^{2}$ and $I_{N,n,3}^{2}$ to the OTA. The noise sources of the two source degeneration resistors are $I_{N,R1}^{2}$ and $I_{N,R2}^{2}$, and each can be split into two parts. For example, one part of $I_{N,R1}^{2}$ is connected to Node X and the other is tied to the virtual ground of the fully-differential circuit. Hence, $I_{N,n,1}^{2}+I_{N,R1}^{2}$ and $I_{N,n,3}^{2}+I_{N,R2}^{2}$ contribute noise to Node X and Node Y in Figure 2.12, respectively. The noise sources at the virtual ground node, $I_{N,R1}^{2}+I_{N,n,2}^{2}+I_{N,R2}^{2}$, are ignored.

To find the influence of each noisy component of the OTA, the input-referred noise, $V_{N,in+}^{2}$ or $V_{N,in-}^{2}$, is investigated. For simplicity, half-circuit analysis is applied to the fully-differential circuit. By superposition, $V_{N,in+}^{2}$ is the sum of three parts ($V_{N,in1+}^{2}$, $V_{N,in2+}^{2}$ and $V_{N,in3+}^{2}$). Figure 2.13 shows the current noise contributions of the different components.

Figure 2.13: Equivalent Circuit from All Noisy Components: a) Noise from the PMOS Bias Transistors and the Current-steering DAC; b) Noise from Transistor $M_{1}$; c) Noise from Transistor $M_{n, 1}$ and the Source Degeneration Resistor

The first term, $V_{N, i n 1+}^{2}$, represents the total input-referred noise from the PMOS bias transistors and the current-steering DAC. Based on Figure 2.13 a), $I_{N, b p+}^{2}$ and $I_{N, D A C+}^{2}$ directly flow into the OTA output. The OTA gain is denoted as $G_{m}$. Hence, $V_{N, i n 1+}^{2}$ can be defined as

$$
\begin{equation*}
V_{N, i n 1+}^{2}=\frac{I_{N, O 1}^{2}}{G_{m}^{2}}=\frac{I_{N, b p+}^{2}+I_{N, D A C+}^{2}}{G_{m}^{2}} \tag{2.19}
\end{equation*}
$$

Secondly, $V_{N,in2+}^{2}$ is caused by transistor $M_{1}$, whose small-signal equivalent circuit is shown in Figure 2.13 b). In this case, both the input and the output are grounded, and the other noise sources are ignored. $I_{N,M1}^{2}$ is connected between the drain and source terminals of $M_{1}$. Using the current-divider rule, the output current noise, $I_{N,O2+}^{2}$, is

$$
\begin{equation*}
I_{N, O 2+}^{2}=I_{N, M 1}^{2} \times\left(\frac{\frac{1}{R_{1}}}{g_{m 1}+\frac{1}{r_{o 1}}+\frac{1}{R_{1}}}\right)^{2} \approx I_{N, M 1}^{2} \times\left(\frac{1}{1+g_{m 1} R_{1}}\right)^{2} \tag{2.20}
\end{equation*}
$$

Hence, the input-referred noise caused by $M_{1}$ is obtained as

$$
\begin{equation*}
V_{N, i n 2+}^{2}=\frac{I_{N, O 2+}^{2}}{G_{m}^{2}} \approx I_{N, M 1}^{2} \times\left(\frac{1}{1+g_{m 1} R_{1}}\right)^{2} \times\left(\frac{1+g_{m 1} R_{1}}{g_{m 1}}\right)^{2}=\frac{I_{N, M 1}^{2}}{g_{m 1}^{2}} \tag{2.21}
\end{equation*}
$$

Figure 2.13 c) shows the small-signal equivalent circuit for the current noise from $M_{n,1}$ and $R_{1}$. The output current noise is

$$
\begin{equation*}
I_{N, O 3+}^{2} \approx\left(I_{N, n, 1}^{2}+I_{N, R 1}^{2}\right) \times\left(\frac{g_{m 1} R_{1}}{1+g_{m 1} R_{1}}\right)^{2} \tag{2.22}
\end{equation*}
$$

Then, the input-referred noise of the OTA bias transistor ($M_{n,1}$) and the source degeneration resistor ($R_{1}$) is calculated as

$$
\begin{equation*}
V_{N, i n 3+}^{2} \approx\left(I_{N, n, 1}^{2}+I_{N, R 1}^{2}\right) \times\left(\frac{g_{m 1} R_{1}}{1+g_{m 1} R_{1}} \cdot \frac{1+g_{m 1} R_{1}}{g_{m 1}}\right)^{2}=\left(I_{N, n, 1}^{2}+I_{N, R 1}^{2}\right) \times R_{1}^{2} \tag{2.23}
\end{equation*}
$$

Combining Equations 2.19 to 2.23, the overall input-referred noise at the OTA input is

$$
\begin{align*}
V_{N, i n+}^{2} & =V_{N, i n 1+}^{2}+V_{N, i n 2+}^{2}+V_{N, i n 3+}^{2} \\
& =\frac{I_{N, b p+}^{2}+I_{N, D A C+}^{2}}{G_{m}^{2}}+\frac{I_{N, M 1}^{2}}{g_{m 1}^{2}}+\left(I_{N, n, 1}^{2}+I_{N, R 1}^{2}\right) \times R_{1}^{2} \tag{2.24}
\end{align*}
$$

According to Equation 2.24, the $1^{\text{st}}$ term represents the noise from the current-steering DAC and the PMOS bias source. Because this term does not depend on the OTA design, it is left unchanged when optimizing the input-referred noise of the OTA; the $2^{\text{nd}}$ and $3^{\text{rd}}$ terms are manipulated to reduce the total noise instead. Based on the noise models of all components (assuming $\gamma \approx 1$ in the TSMC 40 nm process), the noise contribution from $M_{1}$, $M_{n,1}$ and $R_{1}$ becomes

$$
\begin{equation*}
V_{N, i n 2+}^{2}+V_{N, i n 3+}^{2}=\frac{4 k T}{G_{m}} \times\left(1+\frac{\eta^{2}}{1+\eta} \times \frac{g_{m n, 1}}{g_{m 1}}\right) \tag{2.25}
\end{equation*}
$$

In Equation 2.25, $\eta$ is the source degeneration factor as in Equation 2.16; $g_{m1}$ and $g_{mn,1}$ are the transconductances of $M_{1}$ and $M_{n,1}$, respectively; $k$ is the Boltzmann constant; and $T$ is the absolute temperature in kelvin. For the OTA implementation, the preferred way to reduce the total input-referred noise is to decrease $g_{mn,1}$, because the other parameters in Equation 2.25 ($G_{m}$, $\eta$ and $g_{m1}$) are interdependent, and changing one of them may affect the performance of the OTA or of the entire MDAC. For example, $G_{m}$ is strongly tied to the total transconductance gain of the OTA and to its linearity. Therefore, $g_{mn,1}$ is reduced to improve the noise performance of the OTA.
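The step from Equations 2.21 and 2.23 to the closed form of Equation 2.25 can be verified numerically, as in the Python sketch below. It uses the standard per-hertz thermal noise models $4kT\gamma g_{m}$ for a transistor (with $\gamma=1$, as stated above) and $4kT/R$ for a resistor, together with hypothetical values $g_{m1}=10$ mS and $R_{1}=500~\Omega$ ($\eta=5$); the function names and the swept $g_{mn,1}/g_{m1}$ ratios are illustrative. With $\eta=5$, every unit of $g_{mn,1}/g_{m1}$ adds about $4.2$ times the $4kT/G_{m}$ floor, which is why $g_{mn,1}$ is kept small.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def noise_from_eqs_2_21_2_23(gm1, r1, gmn1, temp=300.0):
    """V_N,in2+^2 + V_N,in3+^2 per Hz, using 4kT*gm (gamma = 1) and 4kT/R models."""
    four_kt = 4.0 * K_B * temp
    i_m1 = four_kt * gm1          # drain noise of M1
    i_n1 = four_kt * gmn1         # drain noise of M_n,1
    i_r1 = four_kt / r1           # thermal noise of R1
    return i_m1 / gm1**2 + (i_n1 + i_r1) * r1**2   # Eq. 2.21 + Eq. 2.23


def noise_eq_2_25(gm1, r1, gmn1, temp=300.0):
    """Closed form of Eq. 2.25."""
    eta = gm1 * r1
    gm = gm1 / (1.0 + eta)        # magnitude of Eq. 2.16
    return 4.0 * K_B * temp / gm * (1.0 + eta**2 / (1.0 + eta) * gmn1 / gm1)


gm1, r1 = 10e-3, 500.0            # hypothetical values, eta = 5
for ratio in (0.0, 0.1, 0.2):     # hypothetical g_mn,1 / g_m1
    a = noise_from_eqs_2_21_2_23(gm1, r1, ratio * gm1)
    b = noise_eq_2_25(gm1, r1, ratio * gm1)
    print(f"g_mn1/g_m1 = {ratio:.1f}: {a:.3e} V^2/Hz (Eqs. 2.21+2.23), {b:.3e} V^2/Hz (Eq. 2.25)")
```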

However, a small $g_{mn,1}$ (or even $g_{mn,1}=0$) means that less bias current flows through $M_{n,1}$ and more flows through $R_{1}$. The voltage drop across the resistor then increases, which reduces the voltage headroom and $V_{d,sat}$, and according to Equation 2.18 the OTA linearity degrades. The same applies to $M_{n,3}$ and $R_{2}$ in the fully-differential OTA. To balance the noise and the linearity of the OTA, the bias transistors $M_{n,1}$, $M_{n,2}$ and $M_{n,3}$ are set to carry $25\%$, $50\%$ and $25\%$ of the total OTA bias current, respectively.


Figure 2.14: Power Consumption of the Current-steering DAC and the OTA with/without Current-reuse Technique

With the current-reuse technique, the total power consumption of the OTA and the current-steering DAC is reduced. As shown in Figure 2.14, the supply voltage of the OTA+DAC architecture is 2.5 V. The OTA needs 1.8 mA of bias current, while the DAC requires 1.05 mA. Therefore, the PMOS DAC reuses 1.05 mA of the OTA bias current, and the current-reuse technique saves approximately $37\%$ of the power compared with biasing the OTA and the DAC from separate sources.
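The quoted saving can be reproduced with the Python arithmetic below. It assumes that separately biased blocks would each draw their current from the same 2.5 V supply; under that assumption the saving is $1.05/(1.8+1.05) \approx 36.8\%$.

```python
VDD = 2.5          # V, supply of the OTA+DAC stack (Figure 2.14)
I_OTA = 1.8e-3     # A, first-stage OTA bias current
I_DAC = 1.05e-3    # A, first-stage DAC bias current

p_separate = VDD * (I_OTA + I_DAC)   # assumption: both blocks biased separately from 2.5 V
p_reuse = VDD * I_OTA                # DAC current is reused from the OTA branch
saving = 1.0 - p_reuse / p_separate
print(f"separate: {p_separate * 1e3:.3f} mW, reused: {p_reuse * 1e3:.3f} mW, saving: {saving:.1%}")
```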

### 2.4.2 Implementation of a TIA Using a Two-stage Amplifier with Feedforward Compensation

After the OTA+DAC architecture, the TIA is employed to convert the current signal into the voltage output. The TIA is implemented as presented in Figure 2.15 a).


Figure 2.15: A TIA Using a Two-stage Amplifier with Feedforward Compensation

The TIA consists of a two-stage amplifier with a feedback resistor, $R_{fb}$. $C_{fb}$ generates a high-frequency LHP zero to compensate the high-frequency poles in the feedback loop. $R_{eq}=R_{OTA} \| R_{DAC}$ is the equivalent output resistance of the preceding OTA+DAC building block. $R_{eq}$ is much larger than $R_{fb}$ because the TIA is driven by a current signal ($I_{res}$). The voltage output ($V_{res}$) is also indicated in Figure 2.15 a). The low-frequency gain of the TIA is

$$
\begin{equation*}
A_{T I A}=\frac{R_{f b}}{1+\frac{1}{A_{1} A_{2}} \cdot \frac{R_{f b}+R_{e q}}{R_{e q}}}=\frac{R_{f b}}{1+\frac{1}{A \cdot \beta}} \approx R_{f b} \tag{2.26}
\end{equation*}
$$

In Equation 2.26, $A=A_{1} A_{2}$ is the DC gain of the two-stage amplifier, and $\beta=\frac{R_{eq}}{R_{eq}+R_{fb}}$ is the feedback factor, so $A \cdot \beta$ is the loop gain. In a reasonable design, the loop gain is much larger than 1, and the TIA gain approximately equals $R_{fb}$. Since the feedback resistor is a passive element, it is highly linear. The two-stage amplifier is used to increase $A \cdot \beta$.
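Because $R_{fb}$ sets half of the interstage gain $G_{m} \cdot R_{fb}$, the finite loop gain in Equation 2.26 appears directly as a gain error. The Python sketch below evaluates Equation 2.26 for hypothetical values $R_{fb}=2~\mathrm{k}\Omega$, $R_{eq}=50~\mathrm{k}\Omega$ and several amplifier gains $A$; the function name `tia_gain` is illustrative. The relative deviation from $R_{fb}$ equals $1/(1+A\beta)$.

```python
def tia_gain(r_fb, r_eq, a):
    """Eq. 2.26: low-frequency transimpedance of the TIA."""
    beta = r_eq / (r_eq + r_fb)          # feedback factor
    return r_fb / (1.0 + 1.0 / (a * beta))


r_fb, r_eq = 2e3, 50e3                   # hypothetical resistances (ohm)
for a in (30.0, 100.0, 1000.0):          # hypothetical two-stage DC gains A = A1 * A2
    g = tia_gain(r_fb, r_eq, a)
    print(f"A = {a:6.0f}: A_TIA = {g:8.2f} ohm, error = {(r_fb - g) / r_fb:.3%}")
```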

However, the two-stage amplifier makes the stability of the TIA loop a challenge, and the loop requires compensation to improve the phase margin. The conventional way to compensate a two-stage amplifier is Miller compensation, which splits the dominant pole and the second pole of the amplifier. Miller compensation, however, sacrifices the bandwidth of the amplifier. When the bandwidth is reduced, the settling time of the closed-loop step response may increase according to Equation 2.3.

To compensate the loop without reducing the bandwidth, a feedforward compensation technique is employed, as shown in Figure 2.15 b). The two-stage amplifier is formed by $-g_{m1}$ and $-g_{m2}$, and the feedforward path $g_{m3}$ is added for compensation. Under this circumstance, an LHP zero is generated through the feedforward path. In the figure, $C_{c}$ and $R_{c}$ form a high-pass filter that blocks low-frequency signals from the feedforward path. The small-signal voltage gain of the two-stage amplifier with a feedforward path is

$$
\begin{align*}
A_{v}(s)= & \left(\frac{g_{m 1} R_{1} g_{m 2}}{1+s / \omega_{p 1}}+g_{m 3} \frac{s / \omega_{f}}{1+s / \omega_{f}}\right) \cdot \frac{R_{2}}{1+s / \omega_{p 2}} \\
& =\frac{g_{m 1} R_{1} g_{m 2} R_{2}\left[1+\frac{s}{\omega_{f}}+\frac{s}{K \omega_{f}}+\frac{s^{2}}{K \omega_{p 1} \omega_{f}}\right]}{\left(1+s / \omega_{p 1}\right)\left(1+s / \omega_{p 2}\right)\left(1+s / \omega_{f}\right)} \tag{2.27}
\end{align*}
$$

where $\omega_{p 1}=1 /\left(R_{1} C_{1}\right) ; \omega_{p 2}=1 /\left(R_{2} C_{2}\right) ; \omega_{f}=1 /\left(R_{c} C_{c}\right) ; K=g_{m 1} R_{1} g_{m 2} / g_{m 3}$.
Equation 2.27 shows that the voltage gain of the two-stage amplifier has three poles and two zeros. In this design, $\omega_{p1}$ is the dominant pole and $\omega_{f} \approx \omega_{p1}$ is maintained, so the amplifier's voltage gain simplifies to

$$
\begin{equation*}
A_{v}(s) \approx \frac{g_{m 1} R_{1} g_{m 2} R_{2}\left(1+\frac{s}{\omega_{f}}\right)\left(1+\frac{s}{K \omega_{p 1}}\right)}{\left(1+\frac{s}{\omega_{p 1}}\right)\left(1+\frac{s}{\omega_{p 2}}\right)\left(1+\frac{s}{\omega_{f}}\right)}=\frac{g_{m 1} R_{1} g_{m 2} R_{2}\left(1+\frac{s}{K \omega_{p 1}}\right)}{\left(1+\frac{s}{\omega_{p 1}}\right)\left(1+\frac{s}{\omega_{p 2}}\right)} \tag{2.28}
\end{equation*}
$$

Therefore, the pole-zero pair introduced by the high-pass filter ($R_{c}$ and $C_{c}$) cancels. The amplifier gain retains two poles, $\omega_{p 1}$ and $\omega_{p 2}$, and one LHP zero, $\omega_{z 1}=K \omega_{p 1}$. Since $K=g_{m 1} R_{1} g_{m 2} / g_{m 3}$ is normally larger than 1, the zero lies above $\omega_{p 1}$, and it is used to compensate $\omega_{p 2}$ to ensure loop stability under PVT variations.
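A quick numerical check of the cancellation argument: the sketch below evaluates Equation 2.27 and the simplified Equation 2.28 over frequency with $\omega_f$ set exactly equal to $\omega_{p1}$. It borrows the stage gains and pole/zero frequencies summarized in Table 2.1, takes $K = f_{z1}/f_{p1}$, and writes $g_{m3}R_2$ as $g_{m1}R_1 g_{m2}R_2/K$. The exact equality $\omega_f = \omega_{p1}$ is an idealizing assumption (the design only keeps $\omega_f \approx \omega_{p1}$), and the function names are hypothetical.

```python
import numpy as np

def db_to_linear(db):
    """Convert a voltage gain in dB to V/V."""
    return 10 ** (db / 20)

A1 = db_to_linear(21.3)       # g_m1*R_1 from the 1st-stage gain in Table 2.1
A2 = db_to_linear(19.6)       # g_m2*R_2 from the 2nd-stage gain in Table 2.1
w_p1 = 2 * np.pi * 76.8e6     # dominant pole
w_p2 = 2 * np.pi * 812e6      # 2nd-stage pole
K = 844.6 / 76.8              # K = w_z1 / w_p1
w_f = w_p1                    # assumption: high-pass corner exactly at w_p1

def av_full(s):
    """Equation 2.27 with g_m3*R_2 written as A1*A2/K."""
    main = A1 * A2 / (1 + s / w_p1)
    feedforward = (A1 * A2 / K) * (s / w_f) / (1 + s / w_f)
    return (main + feedforward) / (1 + s / w_p2)

def av_simplified(s):
    """Equation 2.28: two poles and one LHP zero at K*w_p1."""
    return A1 * A2 * (1 + s / (K * w_p1)) / ((1 + s / w_p1) * (1 + s / w_p2))

f = np.logspace(6, 10, 9)     # 1 MHz to 10 GHz
s = 2j * np.pi * f
mismatch = np.max(np.abs(av_full(s) / av_simplified(s) - 1))
print(mismatch)               # rounding-level when w_f equals w_p1
```

If `w_f` is detuned from `w_p1`, the two expressions start to differ, which is the doublet that the design avoids by keeping the two frequencies close.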

Figure 2.16 shows the circuit implementation of the two-stage amplifier with a feedforward path. Transistors $M_{1}$ and $M_{2}$ form the first stage and are self-biased through a pair of large resistors, $R_{c m 1}$ and $R_{c m 2}$. Two small capacitors, $C_{c m 1}$ and $C_{c m 2}$, connected in parallel with $R_{c m 1}$ and $R_{c m 2}$, stabilize the local common-mode feedback loop formed by these resistors. The output of the first stage drives the gates of $M_{8}$ and $M_{9}$, which form the second stage. The feedforward path is realized by reusing the bias transistors of the second stage. At low frequencies, $M_{6}$ and $M_{7}$ are controlled by a common-mode feedback (CMFB) circuit to bias the second stage. At high frequencies, however, the RC network ($R_{c 1}$, $R_{c 2}$, $C_{c 1}$ and $C_{c 2}$) acts as a high-pass filter that lets the AC input signal pass, producing the high-frequency poles and zeros described by Equation 2.28. The common-mode voltage of the second stage is sensed by $R_{c m 3}$ and $R_{c m 4}$. Then,


Figure 2.16: Circuit Implementation of the Two-stage Amplifier
the CMFB loop drives the common-mode output voltage of the amplifier toward $V_{c m}$.

| Architecture | Differential <br> Gain (dB) | Frequency <br> (MHz) | Bias <br> Current |
| :---: | :---: | :---: | :---: |
| $1^{\text {st }}$ Stage | 21.3 | $f_{p 1}=76.8$ | $127.3 \mu \mathrm{~A}$ |
| $2^{\text {nd }}$ Stage | 19.6 | $f_{p 2}=812$ | 1.55 mA |
| Feedforward <br> Path | - | $f_{z 1}=844.6$ | Reuse bias current <br> from $2^{\text {nd }}$ Stage |

Table 2.1: Summary of the Two-stage Amplifier

Table 2.1 lists the simulated performance of the amplifier. The $1^{\text {st }}$-stage low-frequency gain is approximately 21.3 dB, and the $1^{\text {st }}$-stage pole, $f_{p 1}$, is set at around 77 MHz; $f_{p 1}$ is the dominant pole of the two-stage amplifier. The low-frequency gain and the pole of the $2^{\text {nd }}$ stage are approximately 19.6 dB and 812 MHz, respectively. The LHP zero at 845 MHz is generated by the feedforward path. According to Equation 2.28, the LHP zero is used


Figure 2.17: Equivalent Circuit of the CMFB loop a) CMFB Loop b) Equivalent Loop from the Breaking Point
to compensate the $2^{\text {nd }}$ pole. The total voltage gain of the amplifier is therefore about 41 dB. This amplifier is employed in the TIA with a feedback resistor, $R_{f b}$. A capacitor $C_{f b}$ is also added to generate an LHP high-frequency zero in the feedback loop, which compensates the pole generated at the TIA input by parasitic capacitance. As a result, the phase margin of the TIA feedback loop exceeds $70^{\circ}$, and the unity-gain frequency is 6.2 GHz.
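To see how the feedforward zero helps, the sketch below adds the two stage gains in dB and evaluates the phase of Equation 2.28 at $f_{p2}$ with and without the LHP zero, using the Table 2.1 values. It covers only the open-loop amplifier, not the complete TIA loop with $R_{fb}$ and $C_{fb}$, so it does not reproduce the phase margin quoted above; `phase_deg` is a hypothetical helper.

```python
import numpy as np

f_p1, f_p2, f_z1 = 76.8e6, 812e6, 844.6e6   # Table 2.1
total_gain_db = 21.3 + 19.6                  # cascaded stage gains add in dB

def phase_deg(f, with_zero=True):
    """Phase (degrees) of Equation 2.28 at frequency f; the DC gain sign is ignored."""
    ph = -np.degrees(np.arctan(f / f_p1)) - np.degrees(np.arctan(f / f_p2))
    if with_zero:
        ph += np.degrees(np.arctan(f / f_z1))  # the LHP zero adds phase lead
    return ph

print(round(total_gain_db, 1))               # 40.9, i.e. about 41 dB
print(phase_deg(f_p2, with_zero=False))      # about -130 degrees
print(phase_deg(f_p2, with_zero=True))       # about -86 degrees
```

Because $f_{z1}$ sits close to $f_{p2}$, the zero recovers most of the roughly 45° that the second pole would otherwise subtract at that frequency.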

To make the TIA work properly, the stability of the CMFB loop must also be guaranteed. The equivalent circuit of the CMFB loop is presented in Figure 2.17 a). The CMFB amplifier output, $V_{\text {cmo }}$, biases the $2^{\text {nd }}$-stage transistors ($M_{6}$ and $M_{7}$). In the equivalent circuit, the common-mode gain of the $1^{\text {st }}$ stage is small, as shown in the grey part of Figure 2.17 a), so the CMFB loop is dominated by the path through $M_{6}$. The small-signal equivalent circuit of the CMFB loop is presented in Figure 2.17 b). The capacitor ($2 C_{c 1}$) and the feedback resistor ($R_{f b} / 2$) act as the familiar Miller capacitor and resistor that compensate the CMFB loop. To simplify the analysis, the influence of $C_{c 1}$ and $R_{f b}$ is initially ignored, and the resulting CMFB loop transfer function is

$$
\begin{equation*}
L G(s)=-\frac{g_{m,\mathrm{CMFB}} R_{o,\mathrm{CMFB}}}{1+s A_{1}+s^{2} B_{1}} \times \frac{g_{m 6} R_{o,\mathrm{M6,M8}}\left(1+s R_{c m 3} C_{c m 3}\right)}{1+s A_{2}+s^{2} B_{2}} \tag{2.29}
\end{equation*}
$$

In Equation 2.29, the coefficients $A_{1}$, $B_{1}$, $A_{2}$ and $B_{2}$ in the two denominators are

- $A_{1}=R_{o,\mathrm{CMFB}} C_{o,\mathrm{CMFB}}+2 R_{o,\mathrm{CMFB}} C_{in,\mathrm{M6}}+R_{c 1} C_{in,\mathrm{M6}}$
- $B_{1}=R_{o,\mathrm{CMFB}} C_{o,\mathrm{CMFB}} R_{c 1} C_{in,\mathrm{M6}}$
- $A_{2}=R_{o,\mathrm{M6,M8}}\left(C_{\mathrm{load}}+0.5 C_{in,\mathrm{CMFB}}\right)+R_{c m 3}\left(C_{c m 3}+0.5 C_{in,\mathrm{CMFB}}\right)$
- $B_{2}=R_{o,\mathrm{M6,M8}} R_{c m 3}\left(C_{\mathrm{load}} C_{c m 3}+0.5 C_{\mathrm{load}} C_{in,\mathrm{CMFB}}+0.5 C_{c m 3} C_{in,\mathrm{CMFB}}\right)$

The values of the parameters in the CMFB loop are estimated as follows: $g_{m,\mathrm{CMFB}}=920~\mu\mathrm{A/V}$; $R_{o,\mathrm{CMFB}}=16~\mathrm{k}\Omega$; $C_{o,\mathrm{CMFB}}=6~\mathrm{fF}$; $R_{c 1}=1~\mathrm{k}\Omega$; $C_{in,\mathrm{M6}}=240~\mathrm{fF}$; $g_{m 6}=18.5~\mathrm{mA/V}$; $R_{o,\mathrm{M6,M8}}=400~\Omega$; $C_{\mathrm{load}}=320~\mathrm{fF}$; $R_{c m 3}=21~\mathrm{k}\Omega$; $C_{c m 3}=40~\mathrm{fF}$; $C_{in,\mathrm{CMFB}}=20~\mathrm{fF}$.
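As a cross-check of the Stage-1 entries and the zero in Table 2.2, the sketch below finds the roots of the first denominator of Equation 2.29 ($1 + sA_1 + s^2B_1$) and the zero $1/(2\pi R_{cm3}C_{cm3})$ from the estimated values above. It gives about 19.9 MHz, roughly 55.4 GHz (listed as 55.3 GHz) and 189.5 MHz. The helper name `pole_freqs` is hypothetical.

```python
import numpy as np

# Estimated CMFB values from the text, in SI units
R_o_cmfb, C_o_cmfb = 16e3, 6e-15
R_c1, C_in_m6 = 1e3, 240e-15
R_cm3, C_cm3 = 21e3, 40e-15

def pole_freqs(a, b):
    """Pole frequencies (Hz) of 1 + a*s + b*s^2, i.e. the roots of b*s^2 + a*s + 1."""
    roots = np.roots([b, a, 1.0])
    return np.sort(np.abs(roots)) / (2 * np.pi)

A1 = R_o_cmfb * C_o_cmfb + 2 * R_o_cmfb * C_in_m6 + R_c1 * C_in_m6
B1 = R_o_cmfb * C_o_cmfb * R_c1 * C_in_m6
f_p1_cm, f_p4_cm = pole_freqs(A1, B1)          # dominant pole and far pole of Stage 1
f_z1_cm = 1 / (2 * np.pi * R_cm3 * C_cm3)      # zero from (1 + s*R_cm3*C_cm3)

print(f"{f_p1_cm / 1e6:.1f} MHz, {f_p4_cm / 1e9:.1f} GHz, {f_z1_cm / 1e6:.1f} MHz")
```

Because $B_1$ is small compared with $A_1^2$, the two roots are widely separated and are close to $1/A_1$ and $A_1/B_1$, which is why the dominant pole is set mainly by $R_{o,\mathrm{CMFB}}$ and $C_{in,\mathrm{M6}}$.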

| Poles/Zeros | Frequency | Stage |
| :---: | :---: | :---: |
| $f_{p 1 \_c m}$ | 19.9 MHz | CMFB Stage 1 |
| $f_{p 2 \_c m}$ | 151.4 MHz | CMFB Stage 2 |
| $f_{p 3 \_c m}$ | 1.214 GHz | CMFB Stage 2 |
| $f_{p 4 \_c m}$ | 55.3 GHz | CMFB Stage 1 |
| $f_{z 1 \_c m}$ | 189.5 MHz | CMFB Stage 2 |

Table 2.2: List of Pole-zero Locations for the CMFB Loop


Figure 2.18: Frequency Response of the CMFB Loop with/without $2 C_{c 1}$ and $R_{f b}$

According to Equation 2.29, the loop gain without $2 C_{c 1}$ and $R_{f b} / 2$ has one zero and four poles, whose locations are calculated and listed in Table 2.2. The dominant pole, set by the large values of $C_{in,\mathrm{M6}}$ and $R_{o,\mathrm{CMFB}}$, is located at 19.9 MHz. $f_{p 2 \_c m}$ and $f_{p 3 \_c m}$ are the $2^{\text {nd }}$ and $3^{\text {rd }}$ poles from CMFB Stage 2, as presented in Figure 2.17 b), and lie well above the dominant pole. $R_{c m 3}$ and $C_{c m 3}$ in Stage 2 also generate an LHP zero, $f_{z 1 \_c m}$, at 189.5 MHz, which helps compensate the CMFB loop.

To obtain the actual frequency response of the CMFB loop, however, $2 C_{c 1}$ and $R_{f b} / 2$ must be included; they act as Miller compensation for the two-stage CMFB loop. The results with Miller compensation, obtained from Cadence simulation, are plotted in Figure 2.18 and compared with the results that ignore $2 C_{c 1}$ and $R_{f b} / 2$. Miller compensation reduces the CMFB loop bandwidth. With $2 C_{c 1}$ and $R_{f b} / 2$, the phase margin of the loop becomes $104^{\circ}$ because the dominant and second poles are split and an LHP zero is generated, and the loop unity-gain frequency is 720 MHz. All other poles are at much higher frequencies and are not expected to affect CMFB loop stability.



Figure 2.19: Circuit Implementation of the 4-bit Flash ADC

### 2.4.3 Implementation of a 4-bit Flash ADC

The flash ADC used in the CM MDAC is composed of 14 comparator cells and a resistor ladder, as displayed in Figure 2.19. Its output is a thermometer code. Only 14 comparators are employed, instead of 15, because of the half-bit redundancy


Figure 2.20: Circuit Implementation: a) Pre-amplifier; b) Strong-arm Latch and c) Optimized SR Latch
used to tolerate comparator offset errors, as shown in Figure 2.8. The resistor ladder is built from unit resistors, whose size is chosen for matching and to reduce ripple on the reference voltages. The ladder generates all reference voltages between the lowest voltage ($V_{\text {refl }}$) and the highest voltage ($V_{\text {refh }}$).
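A behavioral sketch of the ladder and the thermometer code follows. It assumes, for illustration only, 15 equal unit resistors between $V_{\text{refl}}$ and $V_{\text{refh}}$ giving 14 evenly spaced taps; the actual tap placement for the half-bit redundancy follows Figure 2.8 and may differ. The output code is taken as the number of fired comparators, a ones-counting encoder that tolerates isolated bubbles. Function names are hypothetical.

```python
import numpy as np

def ladder_taps(v_refl, v_refh, n_comparators=14):
    """Tap voltages of a ladder of (n_comparators + 1) equal unit resistors (assumed)."""
    step = (v_refh - v_refl) / (n_comparators + 1)
    return v_refl + step * np.arange(1, n_comparators + 1)

def flash_convert(v_in, taps):
    """Return the thermometer code and its ones-count (0 .. len(taps))."""
    thermometer = (v_in > taps).astype(int)      # comparator i fires when v_in exceeds tap i
    return thermometer, int(thermometer.sum())   # counting ones tolerates single bubbles

taps = ladder_taps(0.0, 1.5)                     # illustrative 0 V to 1.5 V references
code_word, count = flash_convert(0.75, taps)     # 7 of the 14 comparators fire
```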

Every comparator cell has three parts: a pre-amplifier, a strong-arm latch and an optimized SR latch. Figure 2.20 a) shows the pre-amplifier implementation. The pre-amplifier suppresses the kickback noise from the clocked latch and amplifies the difference between the input signal and the reference, which relaxes the design of the following strong-arm latch. The implementations of the strong-arm latch [11] and the optimized SR latch [12] are shown in Figure 2.20 b) and c), respectively. The strong-arm latch can be driven by a clock signal running at up to 2 GHz.

### 2.5 Calibration of Interstage Gain and Nonlinearity Errors

Section 2.3 introduced and analyzed the interstage gain and nonlinearity errors. These errors arise between pipelined stages because of large variations of the interstage gain under PVT conditions and the nonlinearity of the product of the OTA transconductance ($G_{m}$) and the TIA transimpedance ($R_{f b}$). To calibrate these errors, the following off-line calibration algorithm is proposed.


Figure 2.21: Off-line Interstage Gain and Nonlinearity Calibration between the First and the Second Pipelined Stages

Figure 2.21 illustrates the off-line calibration method. $D_{1}$ and $D_{2}$ are the output codes of the $1^{\text {st }}$ and $2^{\text {nd }}$ stages, respectively. $V_{\text {res } 1}$ denotes the residue output voltage of the $1^{\text {st }}$ stage, which is sampled and digitized by the sub-ADC in the $2^{\text {nd }}$ stage and expressed as $D_{2}$. When $G_{m} \cdot R_{f b}$ does not exactly equal 8 or contains nonlinearity errors, such as $3^{\text {rd }}$- or $5^{\text {th }}$-order distortion, $D_{2}$ inherits these errors through $V_{\text {res } 1}$. To eliminate or reduce the errors, a correction polynomial is applied to $D_{2}$ to reconstruct the digital output ($D_{\text {out }}$), instead of the fixed coefficient $(1 / 8)$ used for $D_{2}$ in Equation 2.11. The correction polynomial for $D_{2}$ is defined as

$$
\begin{equation*}
f_{K_{2}}\left(D_{2}\right)=K_{2,1} \cdot D_{2}+K_{2,3} \cdot D_{2}^{3}+K_{2,5} \cdot D_{2}^{5} \tag{2.30}
\end{equation*}
$$

In Equation 2.30 and Figure 2.21, $K_{2,1}$ is a gain coefficient that compensates the interstage gain error, while $K_{2,3}$ and $K_{2,5}$ are the $3^{\text {rd }}$- and $5^{\text {th }}$-order coefficients for the interstage nonlinearity errors. The same approach is applied to $D_{3}$ and $D_{4}$, so the ADC output data, $D_{\text {out }}$, is obtained as

$$
\begin{equation*}
D_{\text {out }}=D_{1}+f_{K_{2}}\left(D_{2}\right)+f_{K_{3}}\left(D_{3}\right)+f_{K_{4}}\left(D_{4}\right) \tag{2.31}
\end{equation*}
$$
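Equations 2.30 and 2.31 translate directly into two small functions, sketched below with hypothetical names. The codes are treated as plain numbers (scalars or NumPy arrays). With $K_{2,1} = 1/8$ and $K_{2,3} = K_{2,5} = 0$, the $D_2$ term reduces to the fixed $1/8$ weighting of Equation 2.11; the nominal weights for $D_3$ and $D_4$ depend on how those codes are scaled, which this section does not spell out, so the usage below leaves them at zero.

```python
def correction_poly(d, k1, k3, k5):
    """Equation 2.30: odd-order correction polynomial applied to a back-end code."""
    return k1 * d + k3 * d ** 3 + k5 * d ** 5

def reconstruct_dout(d1, d2, d3, d4, coeffs):
    """Equation 2.31: D_out = D1 + f_K2(D2) + f_K3(D3) + f_K4(D4).

    coeffs maps the stage index (2, 3, 4) to a (k1, k3, k5) tuple."""
    d_out = d1
    for stage, d in ((2, d2), (3, d3), (4, d4)):
        d_out = d_out + correction_poly(d, *coeffs[stage])
    return d_out

# Usage: ideal interstage gain of 8 on D2 only (illustrative coefficients)
ideal = {2: (1 / 8, 0.0, 0.0), 3: (0.0, 0.0, 0.0), 4: (0.0, 0.0, 0.0)}
d_out = reconstruct_dout(1.0, 8.0, 0.0, 0.0, ideal)   # 1 + 8/8
```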

Therefore, once proper values are found for the coefficients of the correction polynomials, the interstage gain and nonlinearity errors are eliminated or at least reduced. However, the coefficients have no closed-form expressions and cannot be solved for analytically. Instead, a test tone is applied at the input of the pipeline ADC, as described in Figure 2.21, and the coefficients are adjusted by a particle swarm optimization (PSO) algorithm [13], [14] based on the ADC response to that tone. Once the coefficients are found with the test tone, they are used for all types of input signals.

Figure 2.22 displays the complete procedure for finding the coefficients of the correction polynomials. The objective of the optimization is to maximize the SNDR of $D_{\text {out }}$ for a given test signal by adjusting the coefficients. Initially, the PSO randomly generates a group of candidate solutions ($X_{p}$) for the coefficients. The algorithm substitutes $X_{p}$ into Equation 2.30 and Equation 2.31 to obtain a group of SNDR values for $D_{\text {out }}$. The solution with the largest SNDR is selected as $X_{p, \max }$ and is used to create a new group of candidate solutions, $X_{n p}$; the method for generating new solutions follows the PSO algorithm proposed in [13]. The SNDR values obtained from $X_{n p}$ are compared with those from $X_{p}$, and the coefficient groups with larger SNDR values are kept in $X_{p}$ for the next iteration. After many iterations, $X_{p, \max }$ gives the maximum SNDR of $D_{\text {out }}$. Therefore, the coefficients in


Figure 2.22: Flow Chart of the PSO Algorithm for Interstage Gain and Nonlinearity Errors Calibration
$X_{p, \max }$ calibrate the interstage gain and nonlinearity errors and are then used for any other input signal.
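The loop in Figure 2.22 can be sketched with the widely used global-best PSO with an inertia weight. This is a textbook variant chosen for illustration; the update rule of [13] is not reproduced here, and the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are assumptions. In the calibration, `objective` would rebuild $D_{\text{out}}$ from the captured stage codes with candidate coefficients (Equations 2.30 and 2.31) and return its SNDR; a toy quadratic stands in below so the example runs on its own.

```python
import numpy as np

def pso_maximize(objective, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization that maximizes objective(x)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # random initial solutions (X_p)
    v = np.zeros_like(x)
    p_best = x.copy()                                        # best position of each particle
    p_score = np.array([objective(xi) for xi in x])
    g_best = p_best[np.argmax(p_score)].copy()               # best solution so far (X_p,max)
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lower, upper)                     # new candidate solutions (X_np)
        score = np.array([objective(xi) for xi in x])
        better = score > p_score                             # keep the solutions that improved
        p_best[better] = x[better]
        p_score[better] = score[better]
        g_best = p_best[np.argmax(p_score)].copy()
    return g_best, p_score.max()

# Usage with a stand-in objective whose peak is at K_2,1 = 0.125, K_2,3 = 0.01
def toy_objective(k):
    return -((k[0] - 0.125) ** 2 + (k[1] - 0.01) ** 2)

best, score = pso_maximize(toy_objective, lower=[0.0, -0.1], upper=[0.5, 0.1])
```

PSO needs only objective evaluations, not gradients, which suits an SNDR objective computed from an FFT of reconstructed codes.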

### 2.6 Measurement Results

Figure 2.23 shows the die photo of the proposed pipeline ADC. The ADC core consists of several building blocks, including a clock buffer, sampling capacitors and 4 pipelined stages. The core occupies around $400 \mu m \times 690 \mu m$, i.e., about $0.276 \mathrm{~mm}^{2}$. The chip was fabricated in a TSMC 40 nm 1P8M process.


Figure 2.23: Die Photo of the Proposed ADC


Figure 2.24: Test Bench of the Proposed ADC

A test bench was constructed to evaluate the performance of the pipeline ADC; Figure 2.24 shows its setup. Commercial power management chips (ADP223) were used to provide low-noise power supplies for the device under test (DUT). The analog blocks on the chip use 2.5 V and 1.1 V supplies, and the digital circuitry uses a 1.1 V supply. The common-mode voltage for the balun on the PCB was also generated by an ADP223. An Agilent signal generator produced the input signal. To reduce the noise and distortion from the signal generator, the input signal passed through passive narrow-band bandpass filters before reaching the balun; for instance, a bandpass filter centered at 4.15 MHz was used for a low-frequency input at 4.1732 MHz. The balun then converted the single-ended input signal into a fully differential input with a common-mode voltage, $V_{c m \_i n}$. The proposed ADC has 4 stages with 4 bits each. To capture all these bits, a high-speed data capture card (TSW1405EVM) with an internal FPGA was employed. The card is programmable, can capture the 16-bit output codes at up to 500 MHz, and stored all digital codes in its memory. All clock signals in the test bench were provided by the Si5341-EVB board from Silicon Labs, whose internal master clock made it easy to synchronize the DUT and the data capture card at the same frequency. Finally, the data was read out from the memory and processed in MATLAB.


Figure 2.25: Measured Results for the Output Spectrum with an Input Tone at $4.1736-\mathrm{MHz}$ before Interstage Gain and Nonlinearity Calibration

To measure the performance of the proposed ADC, a full-scale sinusoidal signal running at


Figure 2.26: Measured Results for the Output Spectrum with an Input Tone at $4.1736-\mathrm{MHz}$ after Interstage Gain and Nonlinearity Calibration
4.1736 MHz is applied at the input of the proposed ADC. With the output codes captured from the 4 stages, $D_{\text {out }}$ is reconstructed according to Equation 2.11. Figure 2.25 shows the 16384-point FFT of $D_{\text {out }}$ without interstage gain and nonlinearity calibration. In the measured output spectrum, the $3^{\text {rd }}$- and $5^{\text {th }}$-order harmonic distortions are -60 dBc and -75 dBc, respectively. Before calibration, the proposed ADC achieves 57.58 dB SNDR and 63.9 dB SFDR.
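The spectra here were processed in MATLAB; the Python sketch below shows one common way to obtain SNDR and SFDR from a coherently sampled record, such as the 16384-point captures used in this section. It assumes the tone falls exactly on one FFT bin (so no window is needed) and excludes only the DC bin. As a sanity check, an ideally quantized 13-bit full-scale tone should land near the textbook $6.02N + 1.76 \approx 80$ dB; the test signal is synthetic, not measured data.

```python
import numpy as np

def sndr_sfdr_db(x, signal_bin):
    """SNDR and SFDR (dB) of a coherently sampled single tone; the DC bin is excluded."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    spec[0] = 0.0                               # drop DC
    p_sig = spec[signal_bin]
    others = np.delete(spec, signal_bin)        # noise, harmonics and spurs
    sndr = 10 * np.log10(p_sig / others.sum())
    sfdr = 10 * np.log10(p_sig / others.max())
    return sndr, sfdr

# Usage: ideal 13-bit quantization of a full-scale tone over 16384 points
n_fft, cycles = 16384, 1531                     # odd cycle count -> coherent sampling
n = np.arange(n_fft)
codes = np.round(4095 * np.sin(2 * np.pi * cycles * n / n_fft))
sndr, sfdr = sndr_sfdr_db(codes, cycles)
```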

To improve the ADC performance, $D_{\text {out }}$ is reconstructed based on Equation 2.31, with the coefficients of the correction polynomials obtained by the PSO algorithm. Figure 2.26 presents the output FFT after gain and nonlinearity calibration. The SNDR



Figure 2.27: Measured Results for the Output Spectrum with an Input Tone at $123.129-\mathrm{MHz}$ after Interstage Gain and Nonlinearity Calibration
of the proposed ADC improves by approximately 10 dB to 68.1 dB, and the SFDR reaches 82.3 dB after calibration. The $3^{r d}$ and $5^{\text {th }}$ harmonic distortions are also suppressed, and the higher-order distortions are at similar power levels, i.e., lower than -80 dBc.

Figure 2.27 shows the output FFT for a close-to-Nyquist (123.129 MHz) input signal with interstage gain and nonlinearity calibration. The calibration coefficients are the same as those obtained with the 4.1736 MHz input. The SNDR and SFDR for the 123.129 MHz input are 66.3 dB and 78.22 dB, respectively, both slightly lower than the results for the low-frequency input.


Figure 2.28: Measured SNDR/SFDR against Normalized Input Amplitude in dB at 4.1736-MHz

The ADC dynamic range is characterized by sweeping the amplitude of a 4.1736 MHz sinusoidal input signal. Figure 2.28 presents the measured SNDR/SFDR versus the input amplitude normalized to the full-scale input voltage, which is


Figure 2.29: Measured SNDR/SFDR against Input Frequency with Full-scale Amplitude
$20 \log _{10}\left(V_{I N} / V_{F S}\right)$. The SNDR increases linearly with the normalized amplitude. The SFDR shows two discontinuities where the input amplitude approaches the resolution of the first and the second stages. Figure 2.29 shows the ADC performance across the entire Nyquist bandwidth, measured by sweeping the input frequency of a full-scale input signal. The SNDR measured with a 123.129 MHz input degrades by less than 2 dB relative to the SNDR for low-frequency signals, which verifies stable behavior throughout the entire ADC bandwidth.

The differential nonlinearity (DNL) and integral nonlinearity (INL) before and after calibration are shown in Figure 2.30 and Figure 2.31, respectively. Without calibration, many missing codes appear in both the DNL and INL plots: DNL is around $\pm 4$ LSB, while INL exceeds $\pm 5$ LSB. However, all


Figure 2.30: Measured DNL before/after Calibration
missing codes disappear after calibration, and both DNL and INL are within $\pm 1$ LSB. The calibration therefore significantly improves the linearity and the SFDR of the proposed ADC.
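The text does not state how DNL and INL were extracted. A common method is the sine-wave code-density (histogram) test, sketched below under stated assumptions: the input sine has no DC offset and slightly overdrives the ADC, the two end codes are discarded, and INL is referenced to the end points. The usage drives an ideal 6-bit quantizer with evenly spread phases, so DNL and INL should come out close to zero; a missing code would show up as DNL = -1.

```python
import numpy as np

def sine_histogram_dnl_inl(codes, n_bits):
    """Sine-wave code-density estimate of DNL and INL in LSB (end codes excluded)."""
    n_codes = 2 ** n_bits
    hist = np.bincount(codes, minlength=n_codes).astype(float)
    cum = np.cumsum(hist) / hist.sum()             # fraction of samples at or below each code
    t = -np.cos(np.pi * cum[:-1])                  # transition levels in units of the sine amplitude
    widths = np.diff(t)                            # widths of codes 1 .. n_codes - 2
    dnl = widths / widths.mean() - 1.0             # a missing code gives DNL = -1
    inl = np.concatenate(([0.0], np.cumsum(dnl)))  # end-point INL at each transition
    return dnl, inl

# Usage: an ideal 6-bit quantizer driven by a sine with 5 % overdrive
n_bits = 6
k = np.arange(2 ** 18)
phase = 2 * np.pi * ((k * (np.sqrt(5) - 1) / 2) % 1.0)   # evenly spread sampling phases
x = 1.05 * np.sin(phase)
codes = np.clip(np.floor((x + 1) / 2 * 2 ** n_bits), 0, 2 ** n_bits - 1).astype(int)
dnl, inl = sine_histogram_dnl_inl(codes, n_bits)
```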

Table 2.3 summarizes the performance of this work and compares it with recently published ADCs. The table collects the most power-efficient Nyquist ADCs with more than 10-bit resolution and sampling rates around 250 MS/s. Among these ADCs, this work achieves competitive performance, especially in SNDR and SFDR, and both its Walden and Schreier FoMs are among the best in the range of $200 \mathrm{MS} / \mathrm{s}$ to $300 \mathrm{MS} / \mathrm{s}$.

The proposed pipeline ADC consumes 15.38 mW in total. Table 2.4 lists the power consumption and its percentage for every building block. The first MDAC stage consumes the most power because of its stringent design specifications. In every pipelined stage, the OTA+DAC consumes the largest share of power among the building


Figure 2.31: Measured INL before/after Calibration

| Publications | This Work | [7] | [1] | [15] | [6] | [16] | [17] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | JSSC18 | JSSC15 | JSSC16 | JSSC17 | JSSC15 | JSSC18 |
|  |  | C. Briseno | H. Boo | Y. Zhu | H. Huang | Y. Lim | K. Chang |
| Architecture | Current mode | Current mode | $\begin{gathered} \text { Virtual } \\ \text { GND Ref } \end{gathered}$ | Time Interleave | Dynamic Amp | Ring Amp | SAR |
| Process (nm) | 40 | 40 | 65 | 65 | 65 | 65 | 40 |
| Sampling rate (MHz) | 260 | 200 | 250 | 450 | 330 | 100 | 150 |
| SNDR (dB) | 68.1 | 61.3 | 65 | 60.8 | 67.7 | 57.9 | 61.7 |
| SFDR (dB) | 82.3 | 74 | 84.6 | 70 | 83.4 | 65 | 74.4 |
| Full Scale (Vppd) | 1.25 | 1 | 1.5 | 1.2 | - | 2 | - |
| Power (mW) | 15.38 | 8.4 | 49.7 | 7.4 | 6.23 | 2.46 | 1.5 |
| Resolution (bits) | 13 | 11 | 12 | 11 | 12 | 10.5 | 12 |
| FoM$^{\mathrm{a}}$ (fJ/conv-step) | 28.3@low<br>34.8@Nyq | 44@low<br>50@Nyq | 108.5 | 21@low<br>32@Nyq | 9.5@low<br>15.4@Nyq | 38.4@low<br>44.5@Nyq | 10.3@low<br>18.9@Nyq |
| $\mathrm{FoM}^{\mathrm{b}}$ (dB) | 167 | 161 | 159 | 165.6 | 172 | 161 | 168.7 |
| Calibration | Off-chip | Off-chip | Off-chip | On-chip | - | - | Off-chip |
| Supply (V) | 1.1/2.5 | 1.1/1.8 | 1.2 | 1.2 | 1.25-1.35 | 0.75/1.2 | 0.9 |

${ }^{\text {a }}$ Walden FoM $=$ Power $/\left[2 \cdot B W \cdot 2^{(S N D R-1.76) / 6.02}\right]$
${ }^{\mathrm{b}}$ Schreier FoM $=DR(\mathrm{dB})+10 \cdot \log_{10}\left(f_{s} / 2 / \text{Power}\right)$

Table 2.3: Summary and Comparison of ADCs
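The two figures of merit in the table footnotes can be evaluated as below, assuming Nyquist operation ($BW = f_s/2$) and, for the Schreier FoM, using SNDR in place of DR because this section does not report DR separately. With this work's low-frequency numbers (15.38 mW, 260 MS/s, 68.1 dB), the sketch gives roughly 28.5 fJ/conv-step and 167.4 dB, close to the tabulated 28.3 fJ/conv-step and 167 dB; the small differences presumably come from rounding or from the exact SNDR/DR values used, which are not given here.

```python
import math

def walden_fom(power_w, fs_hz, sndr_db):
    """Walden FoM in J/conv-step with BW = fs/2 (Nyquist operation assumed)."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2 * (fs_hz / 2) * 2 ** enob)

def schreier_fom(power_w, fs_hz, dr_db):
    """Schreier FoM in dB: DR + 10*log10((fs/2) / power)."""
    return dr_db + 10 * math.log10(fs_hz / 2 / power_w)

low = walden_fom(15.38e-3, 260e6, 68.1)          # low-frequency input
nyq = walden_fom(15.38e-3, 260e6, 66.3)          # close-to-Nyquist input
fom_s = schreier_fom(15.38e-3, 260e6, 68.1)      # SNDR used as a stand-in for DR
print(f"{low * 1e15:.1f} / {nyq * 1e15:.1f} fJ/conv-step, {fom_s:.1f} dB")
```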

| Components |  | Power (mW) | Percentage (\%) |
| :---: | :---: | :---: | :---: |
| Stage 1 | OTA+DAC | 4.5 | 29.26 |
|  | Sub-ADC$^{\mathrm{a}}$ | 0.31 | 2.02 |
|  | TIA | 2.1 | 13.64 |
| Stage 2 | OTA+DAC | 2.3 | 14.95 |
|  | Sub-ADC$^{\mathrm{a}}$ | 0.31 | 2.02 |
|  | TIA | 1.21 | 7.87 |
| Stage 3 | OTA+DAC | 2.3 | 14.95 |
|  | Sub-ADC$^{\mathrm{a}}$ | 0.31 | 2.02 |
|  | TIA | 1.21 | 7.87 |
| Stage 4 | Sub-ADC$^{\mathrm{a}}$ | 0.31 | 2.02 |
| Clock Buffer |  | 0.52 | 3.38 |
| Total |  | $\mathbf{15.38}$ | 100 |

${ }^{\text {a }}$ The power of the resistor ladders is included in all sub-ADCs.

Table 2.4: Summary of Power Consumption
blocks. As discussed in Section 2.2, the power consumption of the TIA is relatively small.

### 2.7 Conclusion

To summarize, a 13-bit 260 MS/s pipeline ADC with current-mode MDACs is proposed in this chapter and implemented in TSMC 40 nm technology. The PMOS current-steering DAC and the NMOS OTA are connected using the current-reuse technique, which reduces the DC bias current compared with biasing the two building blocks separately. To improve TIA linearity and accuracy, the TIA uses a two-stage amplifier with a feedforward compensation technique. For the CM MDACs, the interstage gain and nonlinearity errors are calibrated using the PSO algorithm. With calibration, the DNL and INL of the ADC are both within $\pm 1$ LSB, and the SNDR reaches 68.1 dB and 66.3 dB at low and close-to-Nyquist input frequencies, respectively. The proposed ADC demonstrates a 28.3 fJ/conv-step Walden FoM and a 167 dB Schreier FoM at low input frequencies. The chip core area is around $0.276 \mathrm{~mm}^{2}$, and the total power consumption is 15.38 mW at a $260 \mathrm{MS} / \mathrm{s}$ sampling rate.

## 3. A 4-CHANNEL TIME-INTERLEAVED ADC WITH DIGITAL BACKGROUND CALIBRATION USING A DIGITAL-CIRCUIT-BASED OPTIMIZATION ALGORITHM

### 3.1 Introduction

Over the years, the demand for wide-bandwidth ADCs in applications such as wireless communications, RF circuits and high-speed receivers has increased dramatically. As described in Section 1.1, each type of ADC architecture has its own range of applications. Time-interleaved (TI) ADCs are normally used for very high bandwidths because their parallel architecture employs other types of ADCs as sub-ADCs. According to previous studies [18], increasing the bandwidth of a single-channel ADC while maintaining the required resolution may increase its power consumption exponentially. To avoid this large power consumption, the time-interleaved array architecture was first proposed by Black and Hodges in 1980 [19].

### 3.1.1 TI ADC Architecture

In a time-interleaved architecture, multiple ADCs are organized in parallel to process the same input. Figure 3.1 displays a general TI ADC with M channels. All sub-ADCs in the TI ADC channels are physically identical and run at the same clock frequency, $f_{s}=1 / t_{s}$, but with different clock phases; the phase difference between adjacent channels is $360^{\circ} / M$. Hence, the overall sampling period of the TI ADC is

$$
\begin{equation*}
T_{s}=\frac{t_{s}}{M} \tag{3.1}
\end{equation*}
$$

In this way, the TI ADC increases the sampling rate of a single channel by a factor of M. Ideally, because the sub-channels are identical, the total power consumption of the TI ADC also increases by a factor of M when the power of the controllers and multiplexers is neglected. In contrast, increasing the bandwidth of a single-channel ADC by M times may increase its power consumption exponentially, as depicted in Figure 3.2. This is the main benefit


Figure 3.1: A General M-channel Time-interleaved ADC
of the TI ADC architecture. Furthermore, any ADC type, such as flash, pipeline or SAR, can be used to implement the sub-ADCs, so the TI architecture is widely used in high-speed ADC designs.
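To make Equation 3.1 and the channel ordering concrete, the sketch below (hypothetical helper names) lists the ideal sampling instants of each channel of an M-channel TI ADC and merges the per-channel streams the way an output multiplexer would, taking sample n from channel n mod M. It models ideal timing only; the 4-channel, 250 MS/s-per-channel numbers in the usage are illustrative assumptions.

```python
import numpy as np

def sampling_instants(m_channels, t_s, n_per_channel):
    """Ideal sampling times: channel m samples at k*t_s + m*T_s with T_s = t_s / M."""
    T_s = t_s / m_channels                         # overall period, Equation 3.1
    k = np.arange(n_per_channel)
    return [k * t_s + m * T_s for m in range(m_channels)]

def interleave(channel_outputs):
    """Output multiplexer: sample n of the merged stream comes from channel n % M."""
    stacked = np.stack(channel_outputs, axis=1)    # shape (n_per_channel, M)
    return stacked.reshape(-1)

# Usage (illustrative): 4 channels at 250 MS/s each (t_s = 4 ns) give 1 GS/s overall
times = interleave(sampling_instants(4, 4e-9, 5))
print(np.diff(times))                              # every step equals T_s = 1 ns
```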

### 3.1.2 TI ADC Issues

Time-interleaved ADCs are the preferred option for ultra-broadband applications. However, because of process variations and unequal routing wires on the TI ADC chip [20], [21], TI ADCs suffer from three main types of errors: offset mismatches, gain mismatches and timing skews. Gain and offset mismatches are caused by process variations and are unavoidable in real


Figure 3.2: Power Consumption vs Bandwidth for M-channel Time-interleaved ADCs and Single-channel ADCs
implementations. Timing skews are induced by different delays due to mismatches of clock buffers and routing wires.

Offset mismatches in TI ADCs are systematic errors that are independent of the input signal in both the time and frequency domains. For instance, the FFTs of a 4-channel TI ADC with and without offset mismatches are presented in Figure 3.3. The spurs generated in the spectrum appear at

$$
\begin{equation*}
f_{o f f s e t}=\frac{n}{M} \times F_{s} \tag{3.2}
\end{equation*}
$$

where $F_{s}=f_{s} \times M$ is the total sampling frequency of the TI ADC; M is the number of parallel channels; $n=0,1,2, \ldots$ is an integer number.

Gain mismatches among sub-channels are also systematic errors; however, they depend on the input frequency. Figure 3.4 shows the FFT of a 4-channel TI ADC with and without gain


Figure 3.3: FFT of a 4-channel TI ADC with/without Offset Mismatches Effect


Figure 3.4: FFT of a 4-channel TI ADC with/without Gain Mismatches Effect
mismatches. In the figure, the spurs due to gain mismatches emerge at

$$
\begin{equation*}
f_{\text {gain }}= \pm f_{i n}+\frac{n}{M} \times F_{s} \tag{3.3}
\end{equation*}
$$

where $f_{\text {in }}$ denotes the frequency of the input sinusoidal signal; other parameters are the same as Equation 3.2.
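The spur locations of Equations 3.2 and 3.3 can be checked with a small behavioral model. The sketch below (hypothetical helper names; the offset and gain values are arbitrary assumptions) applies per-channel offsets or gains to a coherently sampled tone at bin 1531 of a 4096-point FFT, lists the FFT bins that rise above a -100 dB floor relative to the tone, and compares them with the bins predicted by $nF_s/M$ and $\pm f_{in} + nF_s/M$ after folding into the first Nyquist zone.

```python
import numpy as np

def ti_output(x, offsets, gains):
    """Sample n is converted by channel n % M with that channel's gain and offset."""
    ch = np.arange(len(x)) % len(offsets)
    return np.asarray(gains)[ch] * x + np.asarray(offsets)[ch]

def spur_bins(y, signal_bin, floor_db=-100.0):
    """One-sided FFT bins (other than the tone) above floor_db relative to the tone."""
    spec = np.abs(np.fft.rfft(y))
    rel_db = 20 * np.log10(np.maximum(spec, 1e-300) / spec[signal_bin])
    return [k for k in range(len(spec)) if k != signal_bin and rel_db[k] > floor_db]

def fold(k, n_fft):
    """Map an FFT bin index onto the one-sided range 0 .. n_fft/2."""
    k %= n_fft
    return min(k, n_fft - k)

def predicted_spurs(signal_bin, n_fft, m, gain=True):
    """Equation 3.3 (gain) or Equation 3.2 (offset) expressed as folded FFT bins."""
    step = n_fft // m
    if gain:
        bins = {fold(sign * signal_bin + i * step, n_fft) for sign in (1, -1) for i in range(1, m)}
    else:
        bins = {fold(i * step, n_fft) for i in range(m)}
    return sorted(bins - {signal_bin})

n_fft, tone, m = 4096, 1531, 4
x = np.sin(2 * np.pi * tone * np.arange(n_fft) / n_fft)
y_off = ti_output(x, offsets=[0.0, 1e-3, -2e-3, 0.5e-3], gains=[1.0] * m)
y_gain = ti_output(x, offsets=[0.0] * m, gains=[1.0, 1.001, 0.998, 1.0005])
print(spur_bins(y_off, tone), predicted_spurs(tone, n_fft, m, gain=False))
print(spur_bins(y_gain, tone), predicted_spurs(tone, n_fft, m, gain=True))
```

Both lines print matching pairs: bins 0, 1024 and 2048 (DC, $F_s/4$, $F_s/2$) for the offsets, and bins 507, 517 and 1541 for the gains.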

### 3.1.3 Effects of Timing Skews in a 4-channel System

Timing skews in a TI system occur during the sampling process. To analyze their influence, a 4-channel time-interleaved system model is used as an example. The sampling process is modeled in both the time and frequency domains, as described in Figure 3.5.


Figure 3.5: Sampling Process of the Input Signal in Time and Frequency Domains

In the time domain, $x(t)$ denotes the input signal and $s(t)$ is an impulse train defined as

$$
\begin{equation*}
s(t)=\sum_{n} \delta\left(t-n t_{s}\right) \tag{3.4}
\end{equation*}
$$

where $t_{s}=1 / f_{s}$ is the time between two adjacent pulses in $s(t)$. The Fourier transform pair of the pulse train is

$$
\begin{equation*}
s(t)=\sum_{n} \delta\left(t-n t_{s}\right) \stackrel{F}{\Longleftrightarrow} S(f)=f_{s} \sum_{n} \delta\left(f-n f_{s}\right) \tag{3.5}
\end{equation*}
$$

Hence, the sampled output, $x_{s}(t)$, is the product $x(t) \times s(t)$ in the time domain; in the frequency domain, the output spectrum is the convolution of $X(f)$ and $S(f)$.

Figure 3.6 shows a 4-channel sampling system. The sampling period of each channel is $t_{s}$, and $T_{s}=t_{s} / 4$ denotes the total sampling period of the system; $f_{s}$ and $F_{s}$ are the sampling frequencies of a single channel and of the entire system, respectively. With timing skews, the sampling functions of the four channels and their Fourier transforms are

$$
\left\{\begin{array}{l}
s_{1}(t)=\sum_{n} \delta\left(t-n t_{s}-\Delta t_{1}\right) \stackrel{F}{\Longleftrightarrow} S_{1}(f)=\frac{F_{s}}{4} \sum_{n} \delta\left(f-n \frac{F_{s}}{4}\right) e^{-j 2 \pi \Delta t_{1} f} \\
s_{2}(t)=\sum_{n} \delta\left(t-n t_{s}-\left(T_{s}+\Delta t_{2}\right)\right) \stackrel{F}{\Longleftrightarrow} S_{2}(f)=\frac{F_{s}}{4} \sum_{n} \delta\left(f-n \frac{F_{s}}{4}\right) e^{-j 2 \pi\left(T_{s}+\Delta t_{2}\right) f} \\
s_{3}(t)=\sum_{n} \delta\left(t-n t_{s}-\left(2 T_{s}+\Delta t_{3}\right)\right) \stackrel{F}{\Longleftrightarrow} S_{3}(f)=\frac{F_{s}}{4} \sum_{n} \delta\left(f-n \frac{F_{s}}{4}\right) e^{-j 2 \pi\left(2 T_{s}+\Delta t_{3}\right) f} \\
s_{4}(t)=\sum_{n} \delta\left(t-n t_{s}-\left(3 T_{s}+\Delta t_{4}\right)\right) \stackrel{F}{\Longleftrightarrow} S_{4}(f)=\frac{F_{s}}{4} \sum_{n} \delta\left(f-n \frac{F_{s}}{4}\right) e^{-j 2 \pi\left(3 T_{s}+\Delta t_{4}\right) f}
\end{array}\right. \tag{3.6}
$$



Figure 3.6: Sampling Process for a 4-channel Time-interleaved System in Time Domain
where $\Delta t_{1}, \Delta t_{2}, \Delta t_{3}$ and $\Delta t_{4}$ denote the timing skews of the 4 channels.
Based on the convolution theorem and Equation 3.6, the sampled output signal in the frequency domain is

$$
X_{s}(f)=F_{s} \sum_{n}\left\{\begin{array}{l}
X\left(f-n F_{s}\right) \\
+X\left[f-\left(n+\frac{1}{4}\right) F_{s}\right] \times \frac{1}{4}\left[(-j)^{\frac{\Delta t_{1}}{T_{s}}}-j \cdot(-j)^{\frac{\Delta t_{2}}{T_{s}}}-(-j)^{\frac{\Delta t_{3}}{T_{s}}}+j \cdot(-j)^{\frac{\Delta t_{4}}{T_{s}}}\right] \\
+X\left[f-\left(n+\frac{1}{2}\right) F_{s}\right] \times \frac{1}{4}\left[(-1)^{\frac{\Delta t_{1}}{T_{s}}}-(-1)^{\frac{\Delta t_{2}}{T_{s}}}+(-1)^{\frac{\Delta t_{3}}{T_{s}}}-(-1)^{\frac{\Delta t_{4}}{T_{s}}}\right] \\
+X\left[f-\left(n+\frac{3}{4}\right) F_{s}\right] \times \frac{1}{4}\left[j^{\frac{\Delta t_{1}}{T_{s}}}+j \cdot j^{\frac{\Delta t_{2}}{T_{s}}}-j^{\frac{\Delta t_{3}}{T_{s}}}-j \cdot j^{\frac{\Delta t_{4}}{T_{s}}}\right]
\end{array}\right\} \tag{3.7}
$$

From Equation 3.7, the first summation term, $X\left(f-n F_{s}\right)$, denotes the original input signal


Figure 3.7: FFT of a 4-channel TI ADC with/without Timing Skews Effect
in the frequency domain. The other terms are shifted copies of the original signal: the second term, $X\left[f-\left(n+\frac{1}{4}\right) F_{s}\right]$, is the original spectrum shifted by $F_{s} / 4$, and the third and fourth terms are shifted by $F_{s} / 2$ and $3 F_{s} / 4$, respectively. The coefficients of all shifted terms depend on the timing skews. If there are no timing skews in the system $\left(\Delta t_{1}=\Delta t_{2}=\Delta t_{3}=\Delta t_{4}=0\right)$, the contributions within each coefficient of the second, third and fourth terms cancel, and the coefficients equal 0. For instance, the second term becomes

$$
\begin{align*}
X\left[f-\left(n+\frac{1}{4}\right) F_{s}\right] & \times \frac{1}{4}\left[(-j)^{0}-j \cdot(-j)^{0}-(-j)^{0}+j \cdot(-j)^{0}\right] \\
& =X\left[f-\left(n+\frac{1}{4}\right) F_{s}\right] \times \frac{1}{4}(1-j-1+j)=0 \tag{3.8}
\end{align*}
$$

Conversely, when timing skews are present, the shifted copies in Equation 3.7 no longer cancel, and the TI sampling system suffers from aliasing errors. Like those caused by gain mismatches, these errors are input-dependent. Figure 3.7 shows the FFT of a 4-channel TI ADC tested with a sinusoidal input, with and without timing skews. The timing-skew spurs in the FFT appear at

$$
\begin{equation*}
f_{\text {timing }}= \pm f_{\text {in }}+\frac{n}{M} \times F_{s} \tag{3.9}
\end{equation*}
$$

where $n$ is an integer and $M$ is the number of channels ($M=4$ here).
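To see where Equation 3.9 places the spurs in practice, the short Python sketch below folds $\pm f_{in} + nF_s/M$ into the first Nyquist zone for the 2 GS/s, 4-channel, 152.832 MHz case simulated in Section 3.4.1. The function name is hypothetical, and the sketch only locates the image frequencies; it says nothing about their amplitudes.

```python
def timing_spur_frequencies(f_in, fs, m_channels):
    """Fold the Eq. 3.9 image frequencies (+/- f_in + n*fs/M) into [0, fs/2].

    Returns the sorted image frequencies, excluding the input tone itself.
    Units are whatever f_in and fs use (MHz below).
    """
    spurs = set()
    for n in range(m_channels):
        for sign in (+1, -1):
            f = (sign * f_in + n * fs / m_channels) % fs  # wrap into [0, fs)
            if f > fs / 2:
                f = fs - f                                 # fold into first Nyquist zone
            spurs.add(round(f, 6))                         # suppress float noise
    spurs.discard(round(f_in, 6))                          # the n = 0 term is the signal
    return sorted(spurs)


# 4-channel TI ADC at 2 GS/s (2000 MHz) with a 152.832 MHz tone
print(timing_spur_frequencies(152.832, 2000.0, 4))  # -> [347.168, 652.832, 847.168]
```

Offset-mismatch spurs, by contrast, sit at $nF_s/M$ regardless of $f_{in}$, which is why the offset spur in Section 3.4.1 appears at 500 MHz.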



Figure 3.8: Maximum SQNR versus Timing Skews Percentage for the 4-channel System

To isolate the influence of a timing skew in one channel, such as Channel 2, the first channel is taken as the reference and the third and fourth channels are assumed to have no timing skews, so $\Delta t_{1}, \Delta t_{3}$ and $\Delta t_{4}$ equal 0. The timing skew of Channel 2 is expressed as the percentage $\Delta t_{2} / T_{s}$. Based on Equation 3.7, the maximum SQNR is calculated as the ratio of the original signal power to the total power of all shifted signals. Figure 3.8 plots the maximum SQNR against the timing-skew percentage. The SQNR is very sensitive to timing skew, especially when the SQNR target exceeds 70 dB. When the TI ADC has no timing skews and no gain/offset mismatches, the SQNR is limited only by the performance of the sub-ADCs.
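The SQNR calculation described above can be sketched by evaluating the three shifted-term coefficients of Equation 3.7 for a skew in Channel 2 only. This is a minimal Python sketch under two stated assumptions: the complex powers such as $(-j)^{\Delta t_k/T_s}$ take the principal values returned by Python's `**` operator, and the unshifted term has unit coefficient. It is not meant to reproduce the exact data of Figure 3.8.

```python
import math


def shifted_coefficients(skews):
    """Coefficients of the F_s/4, F_s/2 and 3F_s/4 terms in Equation 3.7.

    skews: normalized timing skews (dt_1/T_s, ..., dt_4/T_s).
    Python's ** on complex numbers uses the principal branch (an assumption).
    """
    d1, d2, d3, d4 = skews
    mj, m1, pj = -1j, complex(-1, 0), 1j   # bases -j, -1 and +j
    c1 = 0.25 * (mj ** d1 - 1j * mj ** d2 - mj ** d3 + 1j * mj ** d4)
    c2 = 0.25 * (m1 ** d1 - m1 ** d2 + m1 ** d3 - m1 ** d4)
    c3 = 0.25 * (pj ** d1 + 1j * pj ** d2 - pj ** d3 - 1j * pj ** d4)
    return c1, c2, c3


def max_sqnr_db(skews):
    """Original-signal power (unit coefficient) over the total shifted-term power."""
    image_power = sum(abs(c) ** 2 for c in shifted_coefficients(skews))
    if image_power == 0:
        return math.inf  # no skew: the shifted terms cancel exactly (Eq. 3.8)
    return 10 * math.log10(1.0 / image_power)


# Sweep the Channel 2 skew as a percentage of T_s, other channels ideal
for pct in (0.01, 0.1, 1.0):
    print(f"{pct:5.2f}% -> {max_sqnr_db((0.0, pct / 100, 0.0, 0.0)):.1f} dB")
```

For this single-skew case the image power reduces to $\frac{1}{4}\left[2\sin^2(\pi\delta/4)+\sin^2(\pi\delta/2)\right]$ with $\delta=\Delta t_2/T_s$, so for small skews the SQNR approaches $-10\log_{10}(3\pi^2\delta^2/32)$, about 60.3 dB at $\delta=0.1\%$.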

### 3.2 Proposed Time-interleaved ADC Architecture ${ }^{\star}$

The proposed 4-channel time-interleaved ADC is shown in Figure 3.9. The architecture contains 4 sub-ADCs and 1 calibration ADC, and all 5 ADCs receive the same input signal, $x(t)$. The digital outputs of all ADCs are captured by the digital circuit, whose objective is to match each sub-ADC output to the calibration ADC output by applying a gain and an offset to each sub-ADC output and by adjusting the clock phases $C L K_{1}, C L K_{2}, C L K_{3}$ and $C L K_{4}$. A digital detection system (DDS) is implemented in the TI ADC architecture to reduce the mismatches and errors between $A D C_{k}$ and $A D C_{c a l}$. Once the 4 sub-ADCs match the calibration ADC, a multiplexer generates the final output, $y[n]$, from the outputs of the 4 sub-ADCs.

[^1]

Figure 3.9: Proposed 4-channel Time-interleaved ADC Architecture with Digital Background Calibration


Figure 3.10: Simplified Model of the Time-interleaved ADC System Employing the DDS for Gain, Offset Mismatches and Timing Skews Calibration

The simplified model for the error detection is presented in Figure 3.10. In the digital circuit, the offset value for the $\mathrm{k}^{t h}$ channel $\left(C_{k}\right)$ and the gain value $\left(G_{k}\right)$ are applied to the output code of $A D C_{k}$; both $C_{k}$ and $G_{k}$ are adjusted by the DDS. For timing skews, the DDS also generates a value $d_{k}$ for the $\mathrm{k}^{\text {th }}$ channel, which controls a delay line able to shift the clock phase over a certain range. To match $A D C_{k}$ with $A D C_{c a l}$, the DDS searches for the proper values of $G_{k}, C_{k}$ and $d_{k}$. An error function is therefore proposed to quantify the difference between each sub-ADC and the calibration ADC; it is defined as

$$
\begin{equation*}
E F\left(G_{k}, C_{k}, d_{k}\right)=\sum_{m=m_{0}}^{m_{0}+M-1}\left|x_{k}\left(m, G_{k}, C_{k}, d_{k}\right)-x_{c a l}(m)\right| \tag{3.10}
\end{equation*}
$$

where $\operatorname{EF}\left(G_{k}, C_{k}, d_{k}\right)$ denotes the error function (EF); $k$ is the channel index ($k=1,2,3$ or 4); $G_{k}, C_{k}$ and $d_{k}$ are the gain-mismatch, offset-mismatch and timing-skew correction values for the $\mathrm{k}^{\text {th }}$ channel; $x_{k}(m)$ represents the digital output code from $A D C_{k}$, while $x_{\text {cal }}(m)$ is the output code produced by $A D C_{\text {cal }}$; $m$ is the index of the digital output samples; $m_{0}$ is the starting sample; and $M$ denotes the number of output-code pairs from $A D C_{k}$ and $A D C_{c a l}$ used in the EF. $M$ must be larger than 3 because the EF contains 3 unknowns ($G_{k}, C_{k}$ and $d_{k}$); with too few samples, the solution may not be unique.
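A behavioral reading of Equation 3.10 for one channel is sketched below in Python. It is an illustration under stated assumptions, not the DDS implementation: the correction is assumed to act as $x_k = G_k \cdot \mathrm{raw} + C_k$, $d_k$ is modeled as an ideal shift of the sub-ADC sampling instant, quantization is ignored, the input is a unit-amplitude sine, and the channel errors are hypothetical values borrowed from the $A D C_{1}$ column of Table 3.2 (with the offset taken relative to unit amplitude).

```python
import math


def sub_adc_code(t, gain, offset, skew, f_in):
    """Hypothetical behavioral sub-ADC: samples a unit-amplitude sine at
    t + skew, then applies its gain and offset errors (no quantization)."""
    return gain * math.sin(2 * math.pi * f_in * (t + skew)) + offset


def error_function(G, C, d, sample_times, f_in, channel_errors):
    """Equation 3.10: sum over M samples of |x_k(m, G, C, d) - x_cal(m)|.

    Assumed correction model: x_k = G * raw + C, with d added to the
    sub-ADC sampling instant (the DCDL delay).
    channel_errors = (gain, offset, skew) of the uncalibrated sub-ADC.
    """
    gain, offset, skew = channel_errors
    ef = 0.0
    for t in sample_times:
        raw = sub_adc_code(t + d, gain, offset, skew, f_in)
        x_k = G * raw + C
        x_cal = math.sin(2 * math.pi * f_in * t)  # ideal reference ADC
        ef += abs(x_k - x_cal)
    return ef


# Hypothetical ADC_1 errors from Table 3.2: gain 1.02, offset -0.01, skew 15 ps
errors = (1.02, -0.01, 15e-12)
f_in = 152.832e6
times = [m * 4.5e-9 for m in range(30)]  # M = 30 calibration-ADC samples

print(error_function(1.0, 0.0, 0.0, times, f_in, errors))                  # uncorrected
print(error_function(1 / 1.02, 0.01 / 1.02, -15e-12, times, f_in, errors))  # compensated
```

At the exact compensating values the EF drops to numerical zero, which is the condition the calibration searches for.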


Figure 3.11: Timing Diagram of the Proposed TI ADC

Gain and offset mismatches can be corrected in the digital domain, as described in Figure 3.10. The phase of each sub-ADC, however, must be adjusted through the digitally-controlled delay line (DCDL) in the feedback loop. For a 4-channel system, the ideal phase difference between two adjacent channels is $90^{\circ}$. To generate this phase difference, the clock period of $A D C_{c a l}$ is defined as

$$
\begin{equation*}
t_{c a l}=\frac{4 n+1}{4} \times t_{s} \tag{3.11}
\end{equation*}
$$

where $t_{c a l}$ and $t_{s}$ are the clock periods of the calibration ADC and the sub-ADCs, respectively, and $n$ is a positive integer. It follows that $t_{c a l}>t_{s}$, i.e., the calibration ADC runs at a lower speed than each sub-ADC.

The timing diagram of the TI ADC is presented in Figure 3.11, where $n=2$ is assumed as an example. The first falling edge of $C L K_{\text {cal }}$ is aligned with the falling edge of $C L K_{1}$. Since $C L K_{\text {cal }}$ has a period of $(9 / 4) \times t_{s}$, its next falling edge aligns with the falling edge of $C L K_{2}$, and the following two edges align with those of $C L K_{3}$ and $C L K_{4}$. On the fifth cycle, the falling edges of $C L K_{\text {cal }}$ and $C L K_{1}$ align again because of their periodicity. Because each period of $C L K_{\text {cal }}$ advances by exactly $t_{s} / 4$ relative to the sub-ADC clocks, aligning every channel to $C L K_{\text {cal }}$ enforces the $90^{\circ}$ phase difference between adjacent channels.
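The edge alignment described above follows directly from Equation 3.11 and can be checked with exact rational arithmetic. In this Python sketch, time is normalized to $t_s = 1$, channel $k$ is assumed to sample at offsets $(k-1)t_s/4$ within each sub-ADC period, and the function name is hypothetical.

```python
from fractions import Fraction


def aligned_channels(n, num_cal_edges):
    """For the k-th CLK_cal edge at k * t_cal (Eq. 3.11), return the sub-ADC
    (1..4) whose sampling edge it coincides with; time normalized to t_s = 1."""
    t_s = Fraction(1)
    t_cal = Fraction(4 * n + 1, 4) * t_s      # Eq. 3.11
    slot = t_s / 4                            # 90 degrees of the sub-ADC clock
    channels = []
    for k in range(num_cal_edges):
        position = (k * t_cal) % t_s          # where the edge lands within one t_s
        index = position / slot
        if index.denominator != 1:            # cannot happen for integer n
            raise ValueError("edge falls between sub-ADC sampling instants")
        channels.append(int(index) + 1)
    return channels


print(aligned_channels(2, 8))                 # n = 2 -> [1, 2, 3, 4, 1, 2, 3, 4]
print(float(Fraction(9, 4) * 2), "ns")        # t_cal when t_s = 2 ns -> 4.5 ns
```

Exact fractions are used so that the alignment check is not blurred by floating-point rounding.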

Under this arrangement, the background calibration of the TI ADC is equivalent to solving $E F\left(G_{k}, C_{k}, d_{k}\right)=0$ for $\left(G_{k}, C_{k}, d_{k}\right)$; in other words, the calibration must find the $\left(G_{k}, C_{k}, d_{k}\right)$ values that minimize $E F\left(G_{k}, C_{k}, d_{k}\right)$. However, the EF has no closed form and varies in real time. A population-based evolutionary algorithm, the particle swarm optimizer (PSO) described in [13], is therefore employed for the background calibration. With the PSO, the EF is minimized by searching for the gain mismatches, offset mismatches and timing skews simultaneously.

The calibration algorithm is input-blind, since the additional ADC samples the same input as each sub-ADC at the corresponding instant, as shown in Figure 3.11. The input signal can be arbitrary, such as a sinusoidal tone, a ramp, or a PAM/QAM-modulated signal. However, a DC input must be excluded: a DC input carries no time information, so changing $d_{k}$ does not change the EF, and the optimization cannot find the $d_{k}$ values. Because the calibration ADC runs alongside the sub-ADCs, the normal operation of the 4-channel architecture is not interrupted while the calibration is active. The calibration can therefore be regarded as a background algorithm that runs in real time.

According to Equation 3.10, the total number of samples ($M$) in the EF must be larger than 3. A small $M$ may make the EF overly sensitive to variations in $\left(G_{k}, C_{k}, d_{k}\right)$. A large $M$ accumulates more error samples in the summation, but increases the cost of computing the EF. As a compromise, $M$ is selected between 20 and 50 in this project, rather than near the lower bound of 3.

### 3.3 Circuit Implementation of the Digital Background Calibration

The goal of the calibration is to minimize the EF by finding proper values of $\left(G_{k}, C_{k}, d_{k}\right)$. To achieve this, the particle swarm optimization (PSO) algorithm is implemented in the DDS shown in Figure 3.9. The digital-circuit-based PSO is synthesized from VHDL code.

Gain mismatches $\left(G_{k}\right)$, offset mismatches $\left(C_{k}\right)$ and timing skews $\left(d_{k}\right)$ are the three unknowns in the EF. To find the set of ($G_{k}, C_{k}, d_{k}$) values minimizing the EF, the PSO evaluates the EF for candidate sets. If a set that makes the EF equal to 0 is found, the errors in the TI architecture are fully corrected. In practice, the PSO may not find a set that makes the EF exactly 0, but the set it returns still reduces all three types of errors and thus improves the performance of the TI ADC.

### 3.3.1 Particle Swarm Optimizer (PSO)

PSO is a population-based stochastic optimization technique introduced in [22]. It belongs to the field of evolutionary computation, which also includes genetic algorithms (GA), differential evolution and swarm algorithms. In recent years, PSO has found many applications in both academia and industry. Compared with a genetic algorithm, PSO requires fewer parameters to be tuned to optimize an objective function. Hence, it is employed in this project for digital circuit implementation in ASIC or FPGA chips.

In this project, the PSO algorithm is used to minimize the EF by adjusting the set of $\left(G_{k}, C_{k}, d_{k}\right)$ values. Since the EF has three unknowns, the PSO solves a three-dimensional optimization problem. The flow of the PSO algorithm for this problem is detailed below.

- Randomly generate N particles, $X_{n}=\left(x_{g n}, x_{o n}, x_{t n}\right)$, where n is the particle index and $X_{n}$ denotes the $\mathrm{n}^{\text {th }}$ particle (corresponding to $X_{I}$ in [13]). In each particle, $x_{g n}$ is the gain mismatch, $x_{o n}$ the offset mismatch and $x_{t n}$ the timing skew.
- Generate another N particles, $Y_{n}=\left(y_{g n}, y_{o n}, y_{t n}\right)$, with the same dimension as $X_{n}$. $Y_{n}$ stores the best position found by the $\mathrm{n}^{\text {th }}$ particle over all iterations, i.e., the position that has produced the minimum EF value for that particle so far.
- The previous best particles are stored in $Y_{n}$ and the current particles in $X_{n}$. Among all of these particles, the one achieving the minimum EF value is the global best particle, denoted $Y_{b}$.
- Then, generate N velocity vectors, $V_{n}=\left(v_{g n}, v_{o n}, v_{t n}\right)$. Initially, all velocity vectors can be set to zero. $V_{n}$ controls the step length for the $\mathrm{n}^{t h}$ particle.
- Substitute the current N particles $\left(X_{n}\right)$ and the previous best particles $\left(Y_{n}\right)$ into the EF.
- Compare the results of all particles. If $E F\left(Y_{n}\right)>E F\left(X_{n}\right)$, $Y_{n}$ is replaced by $X_{n}$; otherwise, $Y_{n}$ is retained.
- Update the global best particle, $Y_{b}$, to the particle that has given the minimum EF value over all iterations.
- According to [13], the update of the velocity vectors and the new groups of particles is defined as

$$
\begin{gather*}
V_{n}=\omega V_{n}+r n d_{n 1} \times c_{1} \times\left(Y_{n}-X_{n}\right)+r n d_{n 2} \times c_{2} \times\left(Y_{b}-X_{n}\right)  \tag{3.12}\\
X_{n}=X_{n}+V_{n} \tag{3.13}
\end{gather*}
$$

where $c_{1}$ and $c_{2}$ are fixed acceleration coefficients; $\omega$ is the inertia weight of the velocity, usually set between 0.8 and 1.2; and $r n d_{n 1}$ and $r n d_{n 2}$ are two uniformly distributed random numbers in the range $[0,1]$.

- Finally, the newly generated particles, $X_{n}$, are substituted into the EF again, and the same process is repeated for a fixed number of iterations. Afterwards, $\operatorname{EF}\left(Y_{b}\right)$ becomes very small, possibly close to zero, and $\left(y_{g b}, y_{o b}, y_{t b}\right)$ stored in $Y_{b}$ represents the solution for $\left(G_{k}, C_{k}, d_{k}\right)$. In this project, the number of iterations is set larger than 300. Gain mismatches, offset mismatches and timing skews in the TI ADC can thus be reduced or eliminated by applying $Y_{b}$.
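The PSO flow above, with the updates of Equations 3.12 and 3.13, can be sketched in a few lines of Python. This is a minimal global-best PSO under assumptions not taken from the original design: positions are clamped to a search box, Python's `random.Random` stands in for the hardware PRGs, $\omega = 0.8$ and $c_1 = c_2 = 1.2$ are illustrative choices, and the objective is a hypothetical single-channel EF with the phase skew expressed in radians rather than picoseconds.

```python
import math
import random


def pso_minimize(ef, bounds, n_particles=20, iterations=300,
                 w=0.8, c1=1.2, c2=1.2, seed=0):
    """Global-best PSO following Equations 3.12 and 3.13.

    ef: error function to minimize, called with a list [G, C, d].
    bounds: [(lo, hi), ...] search range per dimension; positions are clamped.
    Returns (Y_b, EF(Y_b)).
    """
    rng = random.Random(seed)             # stands in for the hardware PRGs
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * len(bounds) for _ in range(n_particles)]   # velocities start at 0
    Y = [x[:] for x in X]                 # personal bests Y_n
    y_cost = [ef(y) for y in Y]
    b = min(range(n_particles), key=y_cost.__getitem__)
    Yb, yb_cost = Y[b][:], y_cost[b]      # global best Y_b

    for _ in range(iterations):
        for n in range(n_particles):
            r1, r2 = rng.random(), rng.random()                # rnd_n1, rnd_n2
            for i, (lo, hi) in enumerate(bounds):
                V[n][i] = (w * V[n][i]
                           + r1 * c1 * (Y[n][i] - X[n][i])
                           + r2 * c2 * (Yb[i] - X[n][i]))      # Eq. 3.12
                X[n][i] = min(max(X[n][i] + V[n][i], lo), hi)  # Eq. 3.13 + clamp
        for n in range(n_particles):
            cost = ef(X[n])
            if cost < y_cost[n]:          # replace Y_n when X_n gives a lower EF
                Y[n], y_cost[n] = X[n][:], cost
        b = min(range(n_particles), key=y_cost.__getitem__)
        if y_cost[b] < yb_cost:           # update the global best
            Yb, yb_cost = Y[b][:], y_cost[b]
    return Yb, yb_cost


def make_ef(gain=1.02, offset=-0.01, skew=0.15, samples=30):
    """Hypothetical single-channel EF (Eq. 3.10) with the skew in radians."""
    phases = [0.37 * m for m in range(samples)]

    def ef(p):
        G, C, d = p
        return sum(abs(G * (gain * math.sin(t + d + skew) + offset) + C - math.sin(t))
                   for t in phases)
    return ef


ef = make_ef()
best, cost = pso_minimize(ef, [(0.8, 1.2), (-0.1, 0.1), (-0.5, 0.5)])
print(best, cost)  # the EF is exactly zero at G = 1/1.02, C = 0.01/1.02, d = -0.15
```

Clamping is one common way to keep particles inside the DCDL and digital-word ranges; how out-of-range particles are handled in hardware is an implementation choice.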


### 3.3.2 Pseudo Random Number Generator

The PSO is a population-based stochastic optimizer. According to Equation 3.12, random coefficients are used to update the velocity vectors, and the initial particles must be random as well. Random number generation is therefore essential for a digital implementation of the PSO. In this project, exclusive-OR gates and D flip-flops are cascaded and connected in a loop, i.e., as linear-feedback shift registers, to form the pseudo-random generators (PRGs). Random numbers in digital circuits can thus be realized by synthesizing the PRGs into ASIC or FPGA chips.




Figure 3.12: Circuit Implementation of the Pseudo Random Generator: a) 8-bit PRG b) 14-bit PRG

| Resolution (bits) | Feedback Polynomial | Period |
| :---: | :---: | :---: |
| 8 | $x^{8}+x^{6}+x^{5}+x^{4}+1$ | 255 |
| 14 | $x^{14}+x^{13}+x^{12}+x^{2}+1$ | 16383 |

Table 3.1: Pseudo-Random Polynomials

The PSO algorithm uses several variables, such as $X_{n}, Y_{n}$ and $V_{n}$. For the circuit implementation, all of these variables are represented as multi-bit digital words, and to reduce the area of the digital circuits and the total silicon cost, each variable is given a custom-sized resolution. For the gain and offset mismatches, a 14-bit integer format is used. Because of the limited resolution of the DCDL, an 8-bit integer is used for the timing skews. The PRGs must therefore generate random 14-bit and 8-bit integers. Figure 3.12 shows the circuit implementation of the 8-bit and 14-bit PRGs, which is based on the feedback polynomials listed in Table 3.1.
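The PRG structure can be illustrated with a software linear-feedback shift register. The Python sketch below assumes a Fibonacci (external-XOR) configuration that shifts right and feeds the XOR of the tapped stages into the most significant bit; the exact stage ordering of Figure 3.12 may differ. It checks that the Table 3.1 polynomials produce the listed periods.

```python
def lfsr_states(n_bits, taps, seed=1):
    """Fibonacci LFSR: shift right, feed the XOR of the taps into the MSB.

    taps are the exponents of the feedback polynomial, e.g. (8, 6, 5, 4)
    for x^8 + x^6 + x^5 + x^4 + 1. Yields successive n-bit states, which
    can serve as pseudo-random integers.
    """
    mask = (1 << n_bits) - 1
    if not 0 < seed <= mask:
        raise ValueError("seed must be a nonzero n-bit value")
    state = seed
    while True:
        bit = 0
        for t in taps:
            bit ^= (state >> (n_bits - t)) & 1   # XOR of the tapped flip-flops
        state = (state >> 1) | (bit << (n_bits - 1))
        yield state


def lfsr_period(n_bits, taps, seed=1):
    """Number of clock cycles until the register returns to its seed."""
    for count, state in enumerate(lfsr_states(n_bits, taps, seed), start=1):
        if state == seed:
            return count


print(lfsr_period(8, (8, 6, 5, 4)))       # Table 3.1, 8-bit  -> 255
print(lfsr_period(14, (14, 13, 12, 2)))   # Table 3.1, 14-bit -> 16383
```

A state can be scaled by $1/(2^n-1)$ to obtain a number in $[0,1]$; how the original circuit maps PRG outputs onto $rnd_{n1}$ and $rnd_{n2}$ is not specified here.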

### 3.4 Experimental Results

### 3.4.1 Simulation Results of the TI ADC with/without Calibration

To verify the digital background calibration, a 12-bit 4-channel time-interleaved ADC model is built. The total sampling rate of the model is 2 GS/s, so each sub-ADC runs at $500 \mathrm{MS} / \mathrm{s}$ ($t_{s}=2 \mathrm{~ns}$). The model also includes the calibration ADC, whose sampling period is set to $9 / 4 \times t_{s}=9 / 4 \times 2 \mathrm{~ns}=4.5 \mathrm{~ns}$. The calibration ADC serves as the reference, and gain mismatches, offset mismatches and timing skews relative to it are manually added to all sub-ADCs in the model; their values are listed in Table 3.2.

|  | $A D C_{1}$ | $A D C_{2}$ | $A D C_{3}$ | $A D C_{4}$ |
| :---: | :---: | :---: | :---: | :---: |
| Gain Mismatches | 1.02 | 0.985 | 0.973 | 1.034 |
| Offset Mismatches | -0.01 | -0.015 | 0.035 | -0.042 |
| Timing Skews (ps) | 15 | -25 | 22.5 | -17.5 |

Table 3.2: Mismatches and Timing Skews for the 4-channel TI ADC

Figure 3.13 shows the simulation results of the 4-channel TI ADC without calibration for an input tone at 152.832 MHz. The FFT of the digital output contains large spurs. The highest spur reaches about -30.7 dBc; it is caused by the offset mismatches, since it lies at half of the Nyquist bandwidth ($F_{s} / 4=500$ MHz). The SNDR before calibration is 26.1 dB.

The simulation results for the same input signal after calibration are presented in Figure 3.14. All spurs are strongly suppressed and lie below -83.7 dBc. The SNDR after calibration reaches 77.1 dB, a 51 dB improvement over the uncalibrated result. This confirms that the PSO algorithm can find proper values of $\left(G_{k}, C_{k}, d_{k}\right)$ for each channel, minimizing the EF and reducing the influence of all three types of errors in the TI architecture.

In the system-level simulations, the number of iterations is set to 500. The values of $\left(G_{k}, C_{k}, d_{k}\right)$ against the number of iterations are shown in Figure 3.15, Figure 3.16 and Figure 3.17. Taking $d_{k}$ as an example, the $d_{k}$ values finally converge to approximately -15 ps for $A D C_{1}$, +25 ps for $A D C_{2}$, -22.5 ps for $A D C_{3}$ and +17.5 ps for $A D C_{4}$. These values compensate the timing skews assumed in Table 3.2.


Figure 3.13: Simulation Results with a $152.832-\mathrm{MHz}$ Input Tone before Calibration


Figure 3.14: Simulation Results with a $152.832-\mathrm{MHz}$ Input Tone after Calibration


Figure 3.15: $G_{k}$ Values against the Number of Iterations for 4 sub-ADCs


Figure 3.16: $C_{k}$ Values against the Number of Iterations for 4 sub-ADCs


Figure 3.17: $t_{k}$ Values against the Number of Iterations for 4 sub-ADCs

### 3.4.2 Circuit Implementation and Simulation Results for Digital Detection System in the TI ADC

To verify the functionality and cost of the digital-circuit-based calibration algorithm, the DDS was implemented on an FPGA. A behavioral model of the DDS was also tested using the Cadence Virtuoso simulator.

In this project, the main TI ADC model consists of 4 sub-ADCs, 1 calibration ADC, the DDS block and the DCDL block, as displayed in Figure 3.9. The DDS and the DCDL are used only for the calibration. In the model, all ADCs and the DDS circuit were implemented in Verilog-A, while the DCDLs were implemented at the transistor level. The sub-ADC models ran at $2 \mathrm{GS} / \mathrm{s}$, and all ADCs, including the calibration ADC, had 12-bit resolution. The output data of all ADCs were obtained from the simulator, and the data from the behavioral-model simulation were used as stimuli for the DDS implemented on the FPGA chip. The results of the DDS on the FPGA were then compared with those of the DDS modeled in the Cadence simulator. The FPGA results were obtained by simulating the design with the ModelSim software tool.


Figure 3.18: Waveform for the DDS Simulation with Given Stimuli

The ModelSim simulation results are shown in Figure 3.18. Because of the simulation cost of the transistor-level circuits, the behavioral model in Cadence was run for only 5 iterations. The TI ADC outputs are loaded into the DDS, and `/tb/d1`, `/tb/d2`, `/tb/d3` and `/tb/d4` denote the real-time output codes for $d_{k}$. The waveform indicates that $A D C_{1}$ is calibrated first, followed by $A D C_{2}, A D C_{3}$ and $A D C_{4}$. The zoomed-in area reveals the details for $A D C_{1}$ within the first iteration: while the $d_{k}$ value of one channel is being adjusted, all other $d_{k}$ values are temporarily held.

The DDS clock period was set to 20 ns in the waveform. Calibrating the 4 channels for 5 iterations takes 1665990 ns, i.e., about 83300 clock cycles. Based on the simulation results in subsection 3.4.1, the $\left(G_{k}, C_{k}, d_{k}\right)$ values of all sub-ADCs converge after approximately 500 iterations, improving the SNDR of the 12-bit TI ADC by more than 50 dB. If the DDS synthesized on the FPGA chip is used for the calibration, the total time required for convergence can be estimated as $1665990 \mathrm{~ns} \times 500 / 5 \approx 166.6 \mathrm{~ms}$. Because the calibration runs in the background, it can be turned on or off at any time; the DDS therefore only needs to be active for about 166.6 ms, after which it can be switched off to save power.
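The convergence-time estimate above is a linear extrapolation, shown here as a short Python calculation. It assumes, as the estimate does, that every iteration costs the same number of DDS clock cycles; the function name is hypothetical.

```python
def calibration_time(measured_ns=1_665_990, measured_iters=5,
                     target_iters=500, clk_ns=20):
    """Scale the 5-iteration ModelSim run to a full calibration.

    Returns (DDS clock cycles in the measured run, total time in ms).
    """
    cycles = measured_ns / clk_ns
    total_ms = measured_ns * target_iters / measured_iters / 1e6
    return cycles, total_ms


cycles, total_ms = calibration_time()
print(round(cycles), "cycles;", round(total_ms, 1), "ms")  # -> 83300 cycles; 166.6 ms
```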

| Analysis \& Synthesis Summary |  |
| :---: | :---: |
| Top-level Entity Name | dds_4ch |
| Family | Cyclone IV E |
| Total logic elements | $13,236 / 114,480(12 \%)$ |
| Total combinational functions | $8,163 / 114,480(7 \%)$ |
| Dedicated logic registers | $5,731 / 114,480(5 \%)$ |
| Total pins | $118 / 529(22 \%)$ |
| Total virtual pins | 0 |
| Total memory bits | $0 / 3,981,312(0 \%)$ |
| Embedded Multiplier 9-bit elements | $78 / 531(15 \%)$ |
| Total PLLs | $0 / 4(0 \%)$ |

Table 3.3: Synthesis summary of the DDS on an Altera FPGA

The DDS was implemented on an Altera FPGA (Cyclone IV E device EP4CE115F29C7) to evaluate the digital circuit cost; the detailed information is listed in Table 3.3. The DDS for a 4-channel TI ADC requires 13,236 of the 114,480 logic elements, i.e., only about $12 \%$ of the logic resources of the Altera FPGA. This indicates that the digital background calibration algorithm is feasible for transistor-level circuit implementation.

### 3.5 Conclusion

This project proposes a 4-channel time-interleaved ADC architecture with digital background calibration to extend the bandwidth of a single-channel ADC. The three main types of errors in TI ADCs (gain mismatches, offset mismatches and timing skews) are characterized. To calibrate them, the output of each sub-ADC is compared with the output of the calibration ADC, and an EF is proposed to quantify the difference between the two. The calibration of the TI ADC thus becomes the minimization of the EF with respect to its parameters $\left(G_{k}, C_{k}, d_{k}\right)$, for which the PSO algorithm is used. Both the PSO algorithm and its digital implementation are verified, and the algorithm is feasible for real-time background operation.

## 4. A 245 MA DIGITALLY-ASSISTED DUAL-LOOP LOW DROPOUT REGULATOR

### 4.1 Introduction

In recent years, portable devices, e.g. smart phones, smart devices, biomedical equipment or sensors and internet-of-things (IoT) chips, have seen rapid market growth. All of these devices are powered by batteries, whose energy is limited after each full charge. Most of the devices therefore work in different modes in daily use, such as a sleeping mode and an operation mode. Frequent switching between these modes requires fast and accurate control of the power supplies for different parts of the chip. Therefore, on-chip power-efficient linear regulators with fast transient response have become an active area for researchers and designers.

Based on [23] and [24], the relationship between the output voltage change ($\Delta V_{\text {out }}$) and the load current change ($\Delta I_{\text {load }}$) can be approximated as

$$
\begin{equation*}
\Delta V_{\text {out }}=\frac{\Delta I_{\text {load }}}{C_{\text {load }}} \times\left(t_{B W}+t_{S R}\right) \tag{4.1}
\end{equation*}
$$

In Equation 4.1, $C_{\text {load }}$ denotes the load capacitor; $t_{B W}$ is the time associated with the closed-loop bandwidth of the LDO; and $t_{S R}$ is the time associated with the slew rate of the error amplifier (EA) in the LDO loop. To design an LDO with fast transient response and low quiescent current $\left(I_{Q}\right)$, much research focuses on reducing $t_{B W}$ and $t_{S R}$ in Equation 4.1 while using less DC current.
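Equation 4.1 can be evaluated directly to see how the EA slew time enters the output voltage change. The Python sketch below is illustrative only: the load capacitance, $t_{BW}$, gate capacitance, gate swing and EA current are hypothetical values, and $t_{SR}$ is estimated with the usual slew-limited relation $t_{SR} \approx C_{gate}\,\Delta V_{gate}/I_{EA}$, an assumption consistent with the static slew rate $I_{EA}/C_{gate}$ listed for analog LDOs in Table 4.1.

```python
def slew_time(c_gate, dv_gate, i_ea):
    """Slew-limited estimate of t_SR: time for i_ea to move c_gate by dv_gate."""
    return c_gate * dv_gate / i_ea


def output_change(di_load, c_load, t_bw, t_sr):
    """Equation 4.1: first-order output voltage change for a load-current step."""
    return di_load / c_load * (t_bw + t_sr)


# Hypothetical example: 1 mA -> 240 mA step, 10 uF load, 100 ns bandwidth-related
# delay, 100 pF pass-transistor gate moved by 0.5 V with 50 uA of EA current.
t_sr = slew_time(100e-12, 0.5, 50e-6)
dv = output_change(239e-3, 10e-6, 100e-9, t_sr)
print(f"t_SR = {t_sr * 1e9:.0f} ns, dV_out = {dv * 1e3:.1f} mV")
```

With these numbers the slew term dominates the delay, which illustrates why reducing $t_{SR}$ without raising the static EA current is attractive.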

### 4.1.1 Conventional Analog LDOs

The conventional analog LDO architecture is shown in Figure 4.1. It mainly consists of an error amplifier, a large pass transistor that supplies the load current, and feedback resistors. Because of the large size of the pass transistor, the parasitic capacitance at its gate is very large. To drive this gate, the EA must burn a large DC current, which sets the slew rate and the bandwidth of the negative feedback loop. The speed of the analog loop is thus maintained at the cost of power.


Figure 4.1: Conventional Analog LDO Architecture

To reduce the power consumption of high-SR EAs, [25, 26, 27] employ dynamic SR-boosting techniques, in which the parasitic capacitance at the pass-transistor gate is charged or discharged only when necessary. The SR boosting draws no extra quiescent current, so these LDOs consume less DC current while handling large load steps. Another way to speed up the transient response of the LDO loop is the bulk modulation technique described in [28] and [29], in which the bulk of the pass transistor becomes a signal path that boosts the loop speed. The drawback of bulk modulation is the unavoidable extra circuitry and control blocks. Most LDOs using dynamic SR boosting or bulk modulation are output-capacitor-less architectures, which usually place the dominant pole at the gate of the pass transistor instead of at the output node. The maximum load current of capacitor-less LDOs is usually limited to less than 100 mA. Even though a capacitor-less LDO has a large loop bandwidth, its undershoot for a large load current step is much larger than that of LDOs with large load capacitors.

The quiescent current can also be optimized with adaptive biasing circuits. In [30, 31, 32], the EAs are biased adaptively according to the load status, so $I_{Q}$ changes with the load current requirement. The technique is very useful when the load circuit is frequently switched into a power-saving mode; in this case, the total power consumed by the adaptively biased EA over a long period is very low. Furthermore, both dynamic SR and adaptive biasing techniques are used in $[23,33,34]$ to further reduce $I_{Q}$ while achieving a fast loop transient response. The main concern with adaptive biasing is the stability of the LDO loop, which must be well compensated throughout the load current range.

### 4.1.2 Digital LDOs

In recent years, digital LDOs have become increasingly popular compared with the conventional analog topology, especially for on-chip applications. There are many types of digital LDO architectures: [35] uses a discrete phase-locked technique in the LDO loop, and [36] introduces a successive-approximation recursive method for output regulation. The most frequently used digital LDO architecture, however, is presented in Figure 4.2. It mainly consists of a comparator, a bi-directional shift register, discrete pass-transistor units and feedback resistors.

As the figure shows, most building blocks of the digital LDO are digital circuits, which is why the digital LDO is becoming more popular as technology advances. As transistor sizes shrink, digital circuits benefit more than analog circuits, since both their area and their power consumption decrease. For a single transistor, the intrinsic frequency increases roughly quadratically with technology scaling because of the reduced parasitic capacitance. Hence, digital LDOs are widely employed to regulate on-chip digital blocks, such as microprocessors, digital controllers and memories.

However, the drawbacks of digital LDOs are also obvious. Because the pass-transistor units provide discrete current, the output of a digital LDO has voltage ripples. Moreover, the PSR of digital regulators is quite low, since all pass elements operate in the triode region: when a pass element is turned on, it acts as a resistor connected between the supply voltage and the output, so any noise at the supply couples to the output. To reduce ripples at the digital LDO output, multiple-loop architectures with coarse and fine tuning loops were proposed in [37] and [38], and [39] uses a hybrid architecture that combines digital and analog LDOs.


Figure 4.2: Conventional Digital LDO Architecture

|  | Digital LDO | Analog LDO |
| :---: | :---: | :---: |
| Dropout Voltage | Low | Medium to High |
| Benefits from <br> Technology Scaling | More | Fewer |
| Area Efficiency | High | Low |
| Loop Compensation | Less Load <br> Dependent | Load Dependent |
| Settling Time | Fast | Slow |
| Slew Rate | Dynamic | Static $\left(I_{E A} / C_{g a t e}\right)$ |
| Current Efficiency | High | Medium to High |
| Power Supply Rejection | Poor | Good |
| Quantization Error | Yes | No |
| Ripples @ Output | Yes | No |

Table 4.1: Digital \& Analog LDOs’ Comparison Table
[37] summarizes and compares the two types of LDOs, as listed in Table 4.1. A digital LDO can achieve an ultra-low dropout voltage because all activated pass elements operate in the triode region; for the same reason, its area efficiency is much higher than that of an analog LDO. Digital circuits are also more portable across technology nodes: the same design can be reused in a different process. The digital LDO also becomes faster and more power-efficient with smaller transistor sizes and lower supply voltages. The slew rates of digital LDOs are dynamic, whereas the error amplifiers in analog LDOs require a static bias current. However, analog LDOs offer high PSR and a clean, ripple-free output. Unlike digital LDOs, analog LDOs can filter out noise from the input supply ($V_{i n}$ in Figure 4.1). In practice, $V_{i n}$ usually comes from a DC-DC converter output that contains large switching noise. Since the pass transistor is continuously controlled by the EA output, the analog LDO architecture has no ripple issues.

### 4.2 Proposed Dual-loop Low Dropout Regulator Architecture

The proposed digitally-assisted dual-loop (DADL) LDO is presented in Figure 4.3. The architecture has three main parts: a digitally-assisted (DA) loop, referred to below as the digital loop, an analog loop and a loop controller.

The digital loop performs coarse tuning when the load current change is large. In the digital loop, a 3-bit flash ADC and D flip-flops select the number of activated transistors. The pass transistor is divided into 7 segments, and the thermometer-coded output of the 3-bit ADC selects the transistor units, so the pass transistors provide a discrete current to match the load current ($I_{\text {load }}$). $C_{\text {load }}$ denotes the output load capacitor, which is relatively large and off-chip. According to Equation 4.1, a large $C_{\text {load }}$ reduces the output voltage variation for the same $\Delta I_{\text {load }}$. The feedback signal $\left(V_{f}\right)$ is generated by two relatively large resistors, $R_{1}$ and $R_{2}$; from the voltage-divider rule, the feedback factor is $\beta=R_{2} /\left(R_{1}+R_{2}\right)$.
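The feedback factor fixes the regulated output once the loop forces $V_f$ to the reference. The Python sketch below assumes ideal regulation ($V_f = V_{ref}$) and uses hypothetical resistor and reference values; it only restates the voltage-divider relation above.

```python
def feedback_factor(r1, r2):
    """beta = R2 / (R1 + R2) from the voltage-divider rule."""
    return r2 / (r1 + r2)


def regulated_output(v_ref, r1, r2):
    """In ideal regulation the loop forces V_f = beta * V_out to equal V_ref."""
    return v_ref / feedback_factor(r1, r2)


# Hypothetical values: R1 = 100 kOhm, R2 = 300 kOhm, V_ref = 0.6 V
print(feedback_factor(100e3, 300e3), regulated_output(0.6, 100e3, 300e3))
```

Large $R_1$ and $R_2$, as in the text, keep the divider current small without changing $\beta$.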

The analog loop in Figure 4.3 a) consists of an error amplifier, the pass-transistor units and the feedback resistors, and performs fine tuning after the digital loop has completed. A switch $\left(T_{1}\right)$ selects the gate voltage between $V_{B}$ and $V_{A}$: $V_{B}$ is used for the digital loop and is generated by a fixed voltage source, while $V_{A}$ is the EA output. The switch is controlled by the loop controller of the dual-loop LDO architecture. When $T_{1}$ is closed, the gates of the activated pass elements are set to $V_{B}$, since the voltage source has a smaller resistance than the output resistance of the EA. When $T_{1}$ is open, the EA takes over the pass transistors, and the proposed LDO operates in the analog mode.

The loop controller is implemented with digital logic and controls when the digital and analog loops operate, moving the dual-loop system between different states. Details of this block are described in a later subsection.

Figure 4.3: Proposed Digitally-assisted Dual-loop LDO Architecture: a) Dual-loop LDO b) Reference Levels for the Flash ADC c) Transient Response to a Max Load Step

The reference levels of the 3-bit flash ADC are displayed in Figure 4.3 b). All levels are generated by a resistor ladder in the ADC. There are 7 levels in total $\left(V_{r 1}, V_{r 2}, \ldots, V_{r 7}\right)$, associated with 7 output bits ($D_{o}[7: 1]$). The thermometer output codes, from '0000000' to '1111111', directly activate or deactivate the 7 transistor units. $V_{r 1}$ and $V_{r 7}$ are the overshoot and undershoot threshold voltages, i.e., the highest and lowest reference levels, respectively. The reference voltage of the analog loop $\left(V_{r e f}\right)$ is set between $V_{r 1}$ and $V_{r 2}$. Consequently, during coarse tuning under large load-current variations, the undershoot is expected to be much larger than the overshoot. Figure 4.3 c) shows the transient response of the dual-loop LDO. When $I_{\text {load }}$ rises from 1 mA to 240 mA, the digital loop is turned on first to track the load variation. Then, the ADC holds its output bits and the analog loop is turned on; in this state, the EA tunes all enabled units. Finally, the LDO enters a steady state in which $V_{\text {out }}$ is regulated by the analog loop.

### 4.2.1 Circuit Implementation of a 3-bit Flash ADC



Figure 4.4: Implementation of a 3-bit Flash ADC

The 3-bit flash ADC consists of 7 comparator cells and a resistor ladder, as presented in Figure 4.4. The resistor ladder is connected to two reference voltages, $V_{\text {refh }}$ and $V_{\text {refl }}$, which are provided by off-chip devices in this project. All 7 cells use the same clock signal, Clk. Each comparator generates two complementary outputs, one of which is connected to digital drivers to generate $D_{o}[7: 1]$.
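The ladder-and-comparator structure can be illustrated with a short behavioral model. This is a sketch under stated assumptions, not the fabricated circuit: it assumes ideal comparators and an ideal ladder of 8 equal resistors between hypothetical values $V_{\text {refh }}=1.015$ V and $V_{\text {refl }}=0.935$ V, chosen only because they place the seven taps at the 945 mV to 1005 mV thresholds listed in Table 4.3.

```python
# Behavioral sketch of the 3-bit flash ADC (thermometer output D_o[7:1]).
# ASSUMPTIONS (hypothetical, not from the design): ideal comparators and an
# ideal ladder of 8 equal resistors between V_REFH and V_REFL below.
V_REFH = 1.015  # hypothetical top of the ladder (V)
V_REFL = 0.935  # hypothetical bottom of the ladder (V)
N_SEGMENTS = 8  # 8 equal resistors -> 7 internal taps


def ladder_taps(v_refh: float = V_REFH, v_refl: float = V_REFL) -> list[float]:
    """Return the 7 tap voltages from the lowest (V_r7) to the highest (V_r1)."""
    step = (v_refh - v_refl) / N_SEGMENTS
    return [v_refl + k * step for k in range(1, N_SEGMENTS)]


def flash_adc(v_f: float) -> str:
    """Return D_o[7:1] as a string, with D_o[7] printed first.

    D_o[k] is 1 when V_f is at or below threshold V_rk, so a lower feedback
    voltage (an undershoot) enables more pass-transistor units.
    """
    taps_top_down = ladder_taps()[::-1]            # V_r1 (highest) ... V_r7 (lowest)
    bits = [1 if v_f <= v_th else 0 for v_th in taps_top_down]  # D_o[1] ... D_o[7]
    return "".join(str(b) for b in reversed(bits))  # D_o[7] ... D_o[1]


print(flash_adc(0.980))  # a V_f between 975 mV and 985 mV
```

For $V_{f}=980$ mV the three highest thresholds are exceeded, which gives '0000111', the same code Table 4.3 lists for that range.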


Figure 4.5: Circuit Implementation of a Comparator a) Pre-amplifier b) Strong-Arm Latch c) Optimized SR Latch

Figure 4.5 shows the implementation of the comparator, which consists of three building blocks. The pre-amplifier, shown in a), slightly amplifies the difference between the input and the reference. It also reduces the kickback noise that the clocked latch couples back to the input. The strong-arm latch and the optimized SR latch used in the comparator are shown in b) and c); they have the same architectures as described in subsection 2.4.3.


Figure 4.6: Circuit Implementation of Pass Transistor Units

### 4.2.2 Circuit Implementation of Pass Transistor Units

The circuit implementation of the pass transistor units is presented in Figure 4.6. In every segment, the pass transistor is designed to provide approximately 35 mA at maximum, so the total maximum load current of the LDO is 245 mA. The gate voltage of every pass transistor is selected by an analog multiplexer. For instance, when $D_{o}[7]=1$, $M_{3}$ in the unit is turned off and the transmission gate formed by $M_{1}$ and $M_{2}$ is on. In this case, the gate of $M_{p}[7]$ is driven by $V_{b p}$, which is connected to $V_{B}$ or $V_{A}$ by the loop controller, as described in Figure 4.3. When $D_{o}[7]=0$, however, $M_{3}$ is turned on and the gate voltage is pulled up to the supply voltage, completely disabling the pass transistor unit.
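The multiplexer behavior of each unit can be summarized in a few lines. The sketch below treats every enabled unit as an ideal source of at most 35 mA (the text says "approximately") and uses a hypothetical supply of 1.2 V for the pulled-up gates; both numbers are placeholders, not measured values.

```python
# Sketch of the per-unit gate selection in Figure 4.6.
# ASSUMPTIONS: each enabled unit supplies at most an ideal 35 mA, and
# VDD = 1.2 V is a hypothetical supply value used only for illustration.
I_UNIT_MAX_MA = 35.0
VDD = 1.2  # hypothetical supply voltage (V)


def gate_voltages(d_o: str, v_bp: float, vdd: float = VDD) -> list[float]:
    """Gate voltages of M_p[7] ... M_p[1] for a code string D_o[7:1].

    Bit = 1: transmission gate (M1/M2) on, M3 off -> gate driven by V_bp.
    Bit = 0: M3 on -> gate pulled up to the supply, unit fully off.
    """
    return [v_bp if bit == "1" else vdd for bit in d_o]


def max_current_ma(d_o: str) -> float:
    """Upper bound on the current the enabled units can deliver (mA)."""
    return d_o.count("1") * I_UNIT_MAX_MA


print(max_current_ma("1111111"))
```

With all seven units enabled, the bound is 7 x 35 mA = 245 mA, the maximum load quoted above.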

### 4.2.3 Loop Controller

The loop controller (LC) in the dual-loop LDO architecture prevents conflicts between the digital loop and the analog loop. A finite state machine (FSM) is designed to guarantee the correct operation of the proposed LDO. Figure 4.7 shows the FSM diagram.


Figure 4.7: Finite State Machine Diagram Employed in the LC

According to the FSM diagram, there are three states ($S_{0}$, $S_{1}$ and $S_{2}$) and four input signals ($R S T$, $A$, $B$ and $C$). Table 4.2 lists the meaning of the FSM states and the function of every input. $S_{0}$ indicates that the digitally-assisted loop is on, which occurs when there are large variations at the load. $S_{1}$ is the state associated with the analog loop: in $S_{1}$, the EA takes over the activated pass units. $S_{2}$ is the state in which the analog loop has reached the steady-state condition; in this state, the analog loop regulates the output voltage under small load variations.

| State/Input <br> Name | Category | Comments |
| :---: | :---: | :---: |
| $\mathrm{S}_{0}(00)$ | State | Digitally-assisted loop on <br> (Coarse tuning) |
| $\mathrm{S}_{1}(01)$ | State | Analog loop on (Fine tuning) |
| $\mathrm{S}_{2}(10)$ | State | Analog loop holding |
| RST | Input | Asynchronous reset button |
| A | Input | Whether the coarse tuning from <br> the digital loop is completed |
| B | Input | Whether the given time for the loop <br> swap is reached |
| C | Input | Whether $V_{f}$ is beyond the <br> range of the analog loop |

Table 4.2: Summary of States \& Inputs for the Finite State Machine

$R S T$ denotes an asynchronous reset that returns the LDO to $S_{0}$; it can be controlled manually off-chip. $A$ tells the LC when to jump to the next state: when the ADC output starts toggling by one LSB, $A$ is set to '1'; otherwise, it is '0'. The input $B$ indicates the delay time that the LC stays in $S_{1}$. In this project, the delay time is set to 8 clock cycles: after entering $S_{1}$, the LC stays in $S_{1}$ for 8 clock cycles before jumping to $S_{2}$. This delay allows for the loop switching and the settling of the analog loop. $C$ determines whether the LC keeps using the analog loop or activates the digital loop for coarse tuning. The value of $C$ depends on whether the feedback voltage, $V_{f}$, exceeds the threshold voltages of the analog loop.

To analyze the complete operation of the FSM implemented in the LC, a large load current step is assumed at the LDO output. The timing diagram of critical nodes in the proposed LDO is shown in Figure 4.8 as an example. Initially, the load current has been 1 mA for a long time, and the proposed LDO is in $S_{2}$. In this case, $V_{\text {out }}$ is regulated at the 1.1 V target, and the gate voltage of the pass elements comes from $V_{A}$, according to Figure 4.3. Since $I_{\text {load }}=1 \mathrm{~mA}$, only one pass unit is activated, so $D_{o}[7: 1]=$ '0000001'.


Figure 4.8: Timing Diagram of the Proposed LDO with a Large Load Current Step

After that, the load current starts rising, and $V_{\text {out }}$ drops immediately. The LC monitors $V_{\text {out }}$: when $V_{\text {out }}$ leaves the target range of the analog loop, $C$ in the FSM is set to '1'. The LC then jumps to $S_{0}$ and turns the digital loop on. In this state, $V_{\text {gate }}$ is switched to the fixed voltage $V_{B}$, and the flash ADC in the digital loop activates more pass units to track the load current. When $I_{\text {load }}$ approaches its maximum value, the ADC output starts toggling between '0111111' and '1111111'. The LC detects this toggling and sets $A$ in the FSM to '1'.

Based on the FSM diagram, the LC enters $S_{1}$ when $A=$ '1'. In this state, the flash ADC holds its output code, which fixes the number of selected pass transistors. The gate voltage of all selected transistors is connected to $V_{A}$, and the EA takes over the loop and tracks the load current change. In this project, 8 clock cycles are allowed for the swap between the digital and the analog loop. An internal digital counter in the LC counts these 8 cycles, after which $B$ becomes '1' and the LC goes into State $S_{2}$.

In $S_{2}$, the analog loop regulates the LDO output. $C$ can only be set to '1' in $S_{2}$; this is the main difference between $S_{1}$ and $S_{2}$. The reason is that the loop swap and the fine tuning of the analog loop require some time, so 8 clock cycles are allotted in $S_{1}$. During these cycles, variations at $V_{\text {out }}$ are regarded as part of the loop tracking process rather than the effect of a large load current change. Hence, the proposed LDO stays in the analog loop holding state until $C=$ '1'.
$R S T$ is an asynchronous reset button that can force the LC into $S_{0}$ at any time. When $R S T$ is released, the dual-loop system starts working, and the LC manipulates the two loops according to the designed FSM.
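The state transitions described above can be captured in a cycle-level sketch. It is a hypothetical model, not the synthesized LC: it assumes $A$ and $C$ arrive as booleans sampled once per clock edge through `step()`, models $R S T$ as an immediate `reset()` call, and generates $B$ internally from the 8-cycle counter that the text says the LC contains.

```python
from enum import Enum


class State(Enum):
    S0 = 0b00  # digitally-assisted loop on (coarse tuning)
    S1 = 0b01  # analog loop on (fine tuning)
    S2 = 0b10  # analog loop holding


SWAP_CYCLES = 8  # cycles allotted to the loop swap (from the text)


class LoopController:
    """Hypothetical cycle-level model of the LC finite state machine."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """RST: asynchronously return to S0 and clear the swap counter."""
        self.state = State.S0
        self.count = 0

    def step(self, a: bool, c: bool) -> State:
        """Advance one clock cycle given inputs A and C."""
        if self.state is State.S0:
            # Stay in coarse tuning until the ADC output toggles by one LSB.
            if a:
                self.state, self.count = State.S1, 0
        elif self.state is State.S1:
            # C is ignored here: output variations are treated as settling.
            self.count += 1
            b = self.count >= SWAP_CYCLES  # input B from the internal counter
            if b:
                self.state = State.S2
        elif self.state is State.S2:
            # V_f left the analog-loop range: re-enable coarse tuning.
            if c:
                self.state = State.S0
        return self.state


lc = LoopController()
print(lc.step(a=True, c=False))  # S0 -> S1 once coarse tuning is done
```

Ignoring $C$ inside $S_{1}$ is the design choice that keeps the loop swap from being mistaken for a new large load step.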

### 4.3 Digital Loop Analysis and Operations

### 4.3.1 Shift-Register-Based Digital LDO Architecture

Subsection 4.1.2 describes the conventional digital LDO architecture. The digital LDOs shown in Figure 4.2 are called shift-register-based (SRB) LDOs. This architecture uses a single comparator and a shift register. The comparator detects the sign of the error in the feedback voltage, and the error sign is accumulated by the shift register. In this way, the proper number of pass transistor units is turned on through the negative feedback loop.


Figure 4.9: Behavioral Model: a) SRB Digital LDO and b) Transient Response

Figure 4.9 a) shows a behavioral model of the SRB architecture. In the model, the comparator is regarded as a single-bit ADC, so it is represented by adding a quantization error, $e_{q}$. The shift register can be represented by a digital integrator, and the pass elements in the SRB LDO act as a current-steering DAC with transconductance gain $I_{L S B} / V_{L S B}$. The total current provided by the pass elements is denoted as $I_{P}$. The difference between $I_{P}$ and $I_{\text {load }}$ flows into the load impedance, $Z_{\text {load }}$, which sets the output voltage. Finally, $V_{\text {out }}$ is fed back through the voltage divider, whose feedback factor is $\beta=R_{2} /\left(R_{1}+R_{2}\right)$ according to Figure 4.2. From the model, the loop gain transfer function can be defined as

$$
\begin{equation*}
L G(s)=\frac{\beta \cdot I_{L S B} \cdot R_{L}}{2 \pi \cdot V_{L S B}} \times \frac{1}{\frac{s}{\omega_{s}}\left(1+\frac{s}{\omega_{p 0}}\right)} \tag{4.2}
\end{equation*}
$$

where $\beta$ denotes the feedback factor; $R_{L}$ is the equivalent load resistance; $\omega_{p 0}=1 /\left(R_{L} C_{\text {load }}\right)$ is the dominant pole at the output node; and $\omega_{s}$ is the angular frequency of the integrator clock. In Equation 4.2, the factor $s / \omega_{s}$ comes from the digital integrator. In the z-domain, the integrator is $1 /\left(1-z^{-1}\right)$. Using the backward difference method, this z-domain transfer function maps to the s-domain function $1 /\left(s T_{s}\right)$, where $T_{s}$ is the clock period; hence $\omega_{s}=2 \pi / T_{s}$, which accounts for the factor $2 \pi$ in Equation 4.2. The loop gain therefore has two poles, the integrator pole at DC and $\omega_{p 0}$. The phase margin of this two-pole system is limited, so the SRB LDO may ring while settling in response to a transient step.
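The backward-difference mapping behind Equation 4.2 can be written out explicitly. The derivation below uses only the definitions above ($T_{s}$ as the integrator clock period) and the standard substitution $s \approx\left(1-z^{-1}\right) / T_{s}$:

$$
\frac{1}{1-z^{-1}} \approx \frac{1}{s T_{s}}=\frac{1}{2 \pi} \cdot \frac{\omega_{s}}{s}, \qquad \omega_{s}=\frac{2 \pi}{T_{s}}
$$

Multiplying by the DAC gain $I_{L S B} / V_{L S B}$, the load $R_{L} /\left(1+s / \omega_{p 0}\right)$ and $\beta$ recovers Equation 4.2. At the unity-gain frequency $\omega_{c}$, the loop phase is $-90^{\circ}-\arctan \left(\omega_{c} / \omega_{p 0}\right)$, so

$$
P M=90^{\circ}-\arctan \left(\frac{\omega_{c}}{\omega_{p 0}}\right),
$$

which approaches zero when $\omega_{c} \gg \omega_{p 0}$. This is the sense in which the two-pole loop limits the phase margin.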

The transient response of the SRB loop to a current step is presented in Figure 4.9 b). The fast change of the load current pulls $V_{\text {out }}$ low, generating a voltage undershoot whose peak value depends on the tracking speed of the digital feedback loop. The SRB digital LDO can achieve high resolution. However, only one comparator is used in the loop, so only one pass unit is turned on or off per clock cycle. The SRB LDO therefore has a slow transient response, especially when the pass transistor is divided into many fine units. To avoid this issue, this project employs the other digital LDO architecture, the ADC-based architecture.
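The difference in tracking rate can be shown by comparing the per-cycle update rules of the SRB loop and the ADC-based loop described in the next subsection. This sketch deliberately holds $V_{f}$ constant and ignores the load and capacitor dynamics, so it shows only how fast the unit count can move, not a full transient. It assumes 7 units, ideal comparators, a hypothetical 1 V comparator reference for the SRB loop, and the Table 4.3 thresholds for the flash ADC.

```python
# Per-cycle update rules of the two digital LDO types (sketch).
# ASSUMPTIONS: 7 units, ideal comparators, V_f held constant; the SRB
# reference of 1.0 V is a hypothetical value.
N_UNITS = 7
THRESHOLDS = [1.005, 0.995, 0.985, 0.975, 0.965, 0.955, 0.945]  # V_r1..V_r7 (V)


def srb_update(units_on: int, v_f: float, v_ref: float = 1.0) -> int:
    """Shift-register-based: one comparator, so at most one unit changes per cycle."""
    step = 1 if v_f < v_ref else -1
    return min(N_UNITS, max(0, units_on + step))


def adc_update(v_f: float) -> int:
    """ADC-based: the flash ADC sets the unit count directly in one cycle."""
    return sum(1 for v_th in THRESHOLDS if v_f <= v_th)


def srb_cycles_to_reach(target: int, units_on: int, v_f: float,
                        max_cycles: int = 100) -> int:
    """Clock cycles the SRB loop needs to move from units_on to target."""
    for cycle in range(1, max_cycles + 1):
        units_on = srb_update(units_on, v_f)
        if units_on == target:
            return cycle
    raise RuntimeError("target not reached within max_cycles")


print(srb_cycles_to_reach(7, 1, 0.93), adc_update(0.93))
```

Going from one to seven units takes the SRB loop six cycles (12 ns at a 500 MHz clock), whereas the flash ADC reaches the same code in a single conversion.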

### 4.3.2 ADC-Based Digital LDO Architecture

The ADC-based LDO is presented in Figure 4.10 a). The feedback voltage is directly quantized by a flash ADC, and the ADC output selects the number of pass transistor units needed to provide enough current to the load.


Figure 4.10: Behavioral Model for an ADC-based Digital LDO

In this project, the ADC-based architecture is used as the digital loop in the dual-loop LDO.

Compared with the SRB architecture, the ADC-based LDO directly digitizes the feedback voltage, so it reacts quickly to load transients. However, the output voltage error can be large, because the ADC only confines $V_{f}$ to a quantization window rather than regulating it to a single reference level. Reducing this error requires an ADC with both high resolution and high speed, and such ADCs pose many design challenges and are costly when used in ADC-based LDOs.

Thanks to the dual-loop architecture, the ADC-based digital loop is only employed for coarse tuning in this project. The ADC in the digital loop has 3 bits, and the large pass transistor is divided into 7 segments. A $500 \mathrm{MHz}-800 \mathrm{MHz}$ clock is used for the 3-bit flash ADC. A higher clock frequency increases the digital loop speed, but it may also increase the power consumption of the ADC and other digital blocks. Thus, the clock frequency in this project is limited to at most 800 MHz.

Figure 4.10 b) shows a behavioral model of the ADC-based architecture. In the model, the flash ADC output is obtained by adding a quantization error, as in Figure 4.9. The ADC cascaded with the current-steering DAC can then be considered a transconductance amplifier with transconductance gain $\Delta I_{M A X} / V_{F S-A D C}$. The rest of the model is the same as the SRB LDO model. Hence, the loop gain transfer function can be obtained as

$$
\begin{equation*}
L G(s)=\frac{\beta \cdot \Delta I_{M A X} \cdot R_{L}}{V_{F S-A D C}} \times \frac{1}{1+\frac{s}{\omega_{p 0}}}=\frac{\beta \cdot I_{L S B} \cdot R_{L}}{V_{L S B}} \times \frac{1}{1+\frac{s}{\omega_{p 0}}} \tag{4.3}
\end{equation*}
$$

where $R_{L}$ denotes the equivalent load resistance and $\omega_{p 0}=1 /\left(R_{L} \cdot C_{\text {load }}\right)$ is the dominant pole at the output. In Equation 4.3, the loop gain has only one pole, because the ADC sets the pass-unit count directly instead of through an integrator. Thus, the ADC-based system is more stable than the SRB LDO.
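The stability difference between Equations 4.2 and 4.3 can be checked numerically. Every operating-point value below is a hypothetical placeholder ($R_{L}$, $C_{\text {load }}$ and the clock are assumed; $I_{L S B}$, $V_{L S B}$ and $\beta$ follow the text), so only the comparison between the two architectures is meaningful, not the absolute numbers.

```python
import cmath
import math

# HYPOTHETICAL operating point, for illustration only.
BETA = 10 / 11    # feedback factor from the text
I_LSB = 35e-3     # current per pass unit (A), "approximately 35 mA"
V_LSB = 10e-3     # threshold spacing at V_f (V), from Table 4.3
R_L = 11.0        # assumed equivalent load resistance (ohm)
C_LOAD = 1e-6     # assumed off-chip load capacitor (F)
F_CLK = 500e6     # assumed integrator/ADC clock (Hz)

K0 = BETA * I_LSB * R_L / V_LSB  # DC factor shared by Eqs. 4.2 and 4.3
W_P0 = 1.0 / (R_L * C_LOAD)      # output pole (rad/s)
W_S = 2 * math.pi * F_CLK        # omega_s = 2*pi/T_s


def lg_srb(w: float) -> complex:
    """Eq. 4.2: integrator (shift register) plus the output pole."""
    s = 1j * w
    return (K0 / (2 * math.pi)) * (W_S / s) / (1 + s / W_P0)


def lg_adc(w: float) -> complex:
    """Eq. 4.3: the flash ADC acts directly, leaving only the output pole."""
    s = 1j * w
    return K0 / (1 + s / W_P0)


def crossover(lg, lo: float = 1.0, hi: float = 1e12, iters: int = 200) -> float:
    """Unity-gain frequency by geometric bisection (|lg| must fall with w)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if abs(lg(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_deg(lg) -> float:
    return 180.0 + math.degrees(cmath.phase(lg(crossover(lg))))


pm_srb = phase_margin_deg(lg_srb)
pm_adc = phase_margin_deg(lg_adc)
print(f"SRB PM = {pm_srb:.2f} deg, ADC-based PM = {pm_adc:.2f} deg")
```

With these placeholders the SRB loop is barely stable (well under 1 degree of margin), while the single-pole ADC-based loop keeps more than 90 degrees, consistent with the argument above.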

Table 4.3 shows the 3-bit flash ADC output levels and the associated ranges of the feedback voltage, $V_{f}$. The thresholds span a window around 1 V. When $V_{f}>1005 \mathrm{mV}$, the ADC output is '0000000', which turns off all pass transistor units, whereas the ADC output turns all pass transistor units on if $V_{f} \leq 945 \mathrm{mV}$. In this project, the expected $V_{\text {ref }}$ is set to 1 V and $\beta=10 / 11$. Thus, the output voltage after the analog loop settles should be 1.1 V. Even though the digital loop regulates to a voltage range instead of a single value, the output voltage is eventually regulated by the analog loop, which avoids large output errors.

| $D_{o}[7: 1]$ | Range of $V_{f}(\mathrm{mV})$ |
| :---: | :---: |
| 0000000 | $V_{f}>1005$ |
| 0000001 | $1005 \geq V_{f}>995$ |
| 0000011 | $995 \geq V_{f}>985$ |
| 0000111 | $985 \geq V_{f}>975$ |
| 0001111 | $975 \geq V_{f}>965$ |
| 0011111 | $965 \geq V_{f}>955$ |
| 0111111 | $955 \geq V_{f}>945$ |
| 1111111 | $V_{f} \leq 945$ |

Table 4.3: ADC Output Levels and the Feedback Voltage Range
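Referring the Table 4.3 thresholds to the LDO output shows how coarse the digital loop is on its own. The sketch below uses only $\beta=10 / 11$ and $V_{r e f}=1$ V from the text; the divider values $R_{1}=10 \mathrm{k} \Omega$ and $R_{2}=100 \mathrm{k} \Omega$ are hypothetical, picked only because they give that $\beta$.

```python
# Refer the Table 4.3 V_f thresholds to V_out = V_f / beta.
BETA = 10 / 11  # R2 / (R1 + R2), from the text
V_REF = 1.0     # analog-loop reference (V)
VF_THRESHOLDS_MV = [1005, 995, 985, 975, 965, 955, 945]  # V_r1 .. V_r7


def beta_from_divider(r1: float, r2: float) -> float:
    """Feedback factor of the resistive divider, beta = R2 / (R1 + R2)."""
    return r2 / (r1 + r2)


def vout_thresholds_mv(beta: float = BETA) -> list[float]:
    """Each V_f threshold expressed at the LDO output (mV)."""
    return [v / beta for v in VF_THRESHOLDS_MV]


v_out_target = V_REF / BETA
print(round(v_out_target, 3), [round(v, 1) for v in vout_thresholds_mv()])
print(beta_from_divider(10e3, 100e3))  # hypothetical R1 = 10 kOhm, R2 = 100 kOhm
```

Each 10 mV step at $V_{f}$ becomes an 11 mV step at $V_{\text {out }}$, so the digital loop alone can leave the output anywhere inside an 11 mV window; the analog loop removes this residual error.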

### 4.4 Analog Loop Operations

The EA is employed in the analog loop to continuously tune the gate voltage of the pass transistor units. Figure 4.11 shows the circuit implementation of the EA, which has two stages. To realize a large low-frequency gain, as in conventional analog LDOs, a folded-cascode architecture is used. In Figure 4.11, $M_{1}-M_{11}$ form the folded-cascode stage, which has a differential input, $V_{i+}$ and $V_{i-}$, and a single-ended output, $V_{o 1}$. The output resistance of the folded-cascode stage is very large. When the analog loop is on, the EA drives the gates of the activated pass transistors, which present a large parasitic capacitance. If the folded-cascode stage drove these gates directly, the pole at the gates would be at a very low frequency and would become the dominant pole instead of the pole at the LDO output. Since the pole at the LDO output is also a low-frequency pole, the speed of the analog loop would be reduced. To avoid this issue, a buffer stage is added after the folded-cascode stage. $M_{12}$ and $M_{13}$ form a source follower whose output resistance is relatively small, so the pole at the gates of the pass transistors is moved to a high frequency.

In Figure 4.11, $M_{1}$ and $M_{2}$ form a differential pair. The differential signal is converted into a single-ended output through a self-biased current mirror, $M_{10}$ and $M_{11}$. The folded-cascode transistors, $M_{8}$ and $M_{9}$, further amplify the input signal and are biased by the cascode current sources $M_{4}-M_{7}$. Transistors in both branches are sized identically, so differential-circuit analysis can be applied to the folded-cascode stage. The folded-cascode stage output is connected to the source follower, $M_{12}$. Therefore, the transfer function of the EA can be obtained as (ignoring the body effect for simplicity)

$$
\begin{equation*}
\frac{V_{o}}{V_{i}}(s) \approx \frac{G_{m} \cdot R_{o 2} \cdot A_{b u f f}}{\left(1+s / \omega_{p 2}\right)\left(1+s / \omega_{p g}\right)}=\frac{G_{m} \cdot R_{o 2} \cdot A_{b u f f}}{\left(1+s R_{o 2} C_{p 2}\right)\left(1+s R_{b u f f} C_{p_{-} g a t e}\right)} \tag{4.4}
\end{equation*}
$$

Figure 4.11: Circuit Implementation of the Error Amplifier

In Equation 4.4, $G_{m}$ is the total transconductance of the folded-cascode amplifier; $R_{o 2}$ denotes the output resistance at the $V_{o 2}$ node; and $A_{b u f f}$ is the approximately unity gain of the voltage buffer implemented by $M_{12}$ and $M_{13}$. According to Figure 4.11, these parameters can be defined as

$$
\begin{gather*}
G_{m}=-g_{m 2} \cdot \frac{g_{m 9}+\frac{1}{r_{o 9}}}{g_{m 9}+\frac{1}{r_{o 9}}+\frac{1}{r_{o 11}}} \approx-g_{m 2}  \tag{4.5}\\
R_{o 2}=\left[g_{m 9} \cdot r_{o 9} \cdot\left(r_{o 11} \parallel r_{o 2}\right)+r_{o 9}+\left(r_{o 11} \parallel r_{o 2}\right)\right] \parallel\left[g_{m 7} \cdot r_{o 7} \cdot r_{o 5}+r_{o 7}+r_{o 5}\right]  \tag{4.6}\\
A_{b u f f}=\frac{g_{m 12}}{g_{m 12}+\frac{1}{r_{o 12}}+\frac{1}{r_{o 13}}} \approx 1 \tag{4.7}
\end{gather*}
$$

| Stage | Parameters | Pole Location |
| :---: | :---: | :---: |
| Folded-cascode stage | $G_{m}=-850 \mu A / V$, $R_{o 2}=95.2 k \Omega$, $C_{p 2}=3.5 f F$ | $f_{p 2} \approx 476 \mathrm{MHz}$ |
| Voltage buffer stage | $g_{m 12}=213 \mu A / V$, $R_{b u f f}=4.2 \mathrm{k} \Omega$ |  |
|  | $C_{p_{-} g a t e}=1.12 p F$ (light load) | $f_{p g} \approx 33.8 \mathrm{MHz}$ |
|  | $C_{p_{-} g a t e} \approx 7.2 p F$ (heavy load) | $f_{p g} \approx 5.26 \mathrm{MHz}$ |

Table 4.4: Parameters of the EA from Simulations

In addition, the EA has two main poles, $\omega_{p 2}$ and $\omega_{p g}$. The other poles in the EA, such as the pole at the EA input, are at very high frequencies and can be neglected in Equation 4.4. $\omega_{p 2}=1 /\left(R_{o 2} C_{p 2}\right)$ denotes the pole located at $V_{o 2}$, set by the large value of $R_{o 2}$; $R_{o 2}$ is given in Equation 4.6, and $C_{p 2}$ denotes the total parasitic capacitance at the folded-cascode stage output. $\omega_{p g}=1 /\left(R_{b u f f} C_{p_{-} g a t e}\right)$ is the pole at the gates of the pass transistors, where $R_{b u f f}$ is the output resistance of the voltage buffer and $C_{p_{-} g a t e}$ denotes the parasitic capacitance at the gates of the pass transistors. $R_{b u f f}$ is defined as

$$
\begin{equation*}
R_{b u f f}=\frac{1}{g_{m 12}+\frac{1}{r_{o 12}}+\frac{1}{r_{o 13}}} \approx \frac{1}{g_{m 12}} \tag{4.8}
\end{equation*}
$$

The simulated small-signal parameters and pole locations are summarized in Table 4.4. In this design, the value of $C_{p_{-} g a t e}$ depends on the number of activated transistors and therefore varies with the load current: it is relatively small at a light load and becomes very large when all 7 transistors are activated. Therefore, $\omega_{p g}$ spans a range instead of taking a fixed value. The influence of $\omega_{p g}$ on the analog loop stability is described in the next subsection.
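The pole frequencies in Table 4.4 follow from $f=1 /(2 \pi R C)$, and recomputing them from the listed values is a quick consistency check. The only assumption is $A_{b u f f}=1$, per Equation 4.7. The arithmetic also shows how the two $C_{p_{-} g a t e}$ values pair with the pole frequencies: 1.12 pF gives about 33.8 MHz, and 7.2 pF gives about 5.26 MHz.

```python
import math


def pole_hz(r_ohm: float, c_farad: float) -> float:
    """Single-pole frequency f = 1 / (2*pi*R*C)."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)


# Values from Table 4.4 (simulation); A_BUFF ~ 1 from Eq. 4.7.
G_M = 850e-6   # |G_m| (A/V)
R_O2 = 95.2e3  # ohm
C_P2 = 3.5e-15 # F
R_BUFF = 4.2e3 # ohm (below 1/g_m12 ~ 4.7 kOhm, as Eq. 4.8 implies)
A_BUFF = 1.0

dc_gain_db = 20 * math.log10(G_M * R_O2 * A_BUFF)
f_p2 = pole_hz(R_O2, C_P2)
f_pg_light = pole_hz(R_BUFF, 1.12e-12)  # few units on
f_pg_heavy = pole_hz(R_BUFF, 7.2e-12)   # all 7 units on

print(f"EA DC gain ~ {dc_gain_db:.1f} dB")
print(f"f_p2 ~ {f_p2 / 1e6:.0f} MHz")
print(f"f_pg ~ {f_pg_light / 1e6:.1f} MHz (light) / {f_pg_heavy / 1e6:.2f} MHz (heavy)")
```

The EA low-frequency gain, $\left|G_{m}\right| R_{o 2} A_{b u f f}$, comes out at roughly 38 dB with these values.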

### 4.4.1 Stability of the Analog Loop

Figure 4.12: Analog Loop Equivalent Circuit

The equivalent circuit of the analog loop is presented in Figure 4.12. In $S_{1}$ and $S_{2}$, the analog loop is turned on while the digitally-assisted loop is disabled. The number of pass transistors connected to the EA depends on the load current, so $\omega_{p g}$ spans the range shown in Table 4.4. The loop gain has three main poles: $\omega_{p o}$, $\omega_{p g}$ and $\omega_{p 2}$. $\omega_{p g}$ and $\omega_{p 2}$ come from the EA, while $\omega_{p o}$ denotes the dominant pole at the LDO output due to the large load capacitor, $C_{\text {load }}$. $R_{1}$ and $R_{2}$ are the feedback resistors, and $r_{o_{-} p t}$ is the equivalent output resistance of the pass transistors. Since $r_{o_{-} p t}$ is much smaller than $R_{1}+R_{2}$, the total output resistance can be approximated by $r_{o_{-} p t}$. To maintain loop stability, an equivalent series resistance (ESR) is employed. With the ESR, the load impedance, $Z_{\text {load }}$, can be simplified as

$$
\begin{equation*}
Z_{l o a d} \approx \frac{r_{o-p t}\left(1+s R_{E S R} C_{l o a d}\right)}{1+s\left(R_{E S R}+r_{o_{-} p t}\right) C_{l o a d}} \tag{4.9}
\end{equation*}
$$

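where $R_{E S R} \ll r_{o_{-} p t}$. The ESR generates a left-half-plane (LHP) zero at $\omega_{z}=1 /\left(R_{E S R} C_{\text {load }}\right)$, while the output pole is $\omega_{p o}=1 /\left[\left(R_{E S R}+r_{o_{-} p t}\right) C_{\text {load }}\right] \approx 1 /\left(r_{o_{-} p t} C_{\text {load }}\right)$. Thus, the loop gain can be obtained as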

$$
\begin{equation*}
L G(s)=\frac{G_{m} R_{o 2} A_{b u f f} g_{m p t} r_{o_{-} p t} \beta\left(1+\frac{s}{\omega_{z}}\right)}{\left(1+\frac{s}{\omega_{p o}}\right)\left(1+\frac{s}{\omega_{p g}}\right)\left(1+\frac{s}{\omega_{p 2}}\right)} \tag{4.10}
\end{equation*}
$$

where $g_{m p t}$ denotes the total transconductance of pass elements; $\beta=R_{2} /\left(R_{1}+R_{2}\right)$ is the feedback factor.

From Table 4.4, the second and third poles are $\omega_{p g}$ and $\omega_{p 2}$. In this circuit, $\omega_{p g}$ changes with the load current. Under a light load, only one of the 7 pass elements is enabled, so the parasitic capacitance at the gates of the pass elements is small and $\omega_{p g}$ is located at a high frequency. The light load also places the dominant pole at low frequencies, so $\omega_{p o}$ and $\omega_{p g}$ are widely separated. Under a heavy load, however, the digitally-assisted loop activates all pass elements, and $C_{p_{-} g a t e}$ becomes larger than in the light-load scenario. At the same time, $\omega_{p o}$ moves to higher frequencies because $r_{o_{-} p t}$ becomes small. Therefore, the heavy-load scenario is the worst case for loop stability.

Figure 4.13: Loop Magnitude \& Phase under Different Loads

To investigate the loop stability, different loads are applied at the output: the light, medium and heavy loads are 1, 50 and 240 mA. The simulated loop magnitude and phase are plotted in Figure 4.13. The dominant pole of the loop is $f_{p o}$, the pole at the LDO output, which varies with the load condition because $r_{o_{-} p t}$ changes. Under a light load, the low-frequency loop gain is large and the dominant pole is located at very low frequencies. As the load current increases, $f_{p o}$ moves to higher frequencies. The second pole $\left(f_{p g}\right)$ of this system is located at the gates of the pass transistors, and it moves in the opposite direction to the dominant pole as the load current increases. The LHP zero $f_{z}$ from the ESR does not change with the load condition, because it depends only on $R_{E S R}$ and $C_{\text {load }}$. The simulation results show that the analog loop is stable in all cases ($I_{\text {load }}$ from $100 \mu \mathrm{~A}$ to 245 mA). The worst case occurs under the heavy load, where $f_{p o}$ and $f_{p g}$ approach each other; even in this case, the phase margin of the loop is more than $50^{\circ}$.
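Equation 4.10 can be evaluated numerically to see how the poles and the ESR zero combine into a phase margin. The EA values below come from Table 4.4 (heavy-load $f_{p g}$), but the pass-device and load values ($g_{m p t}$, $r_{o_{-} p t}$, $C_{\text {load }}$, $R_{E S R}$) are hypothetical placeholders, so the printed margin illustrates the method and is not the simulated result of Figure 4.13.

```python
import cmath
import math

# EA values from Table 4.4; |G_m| is used because the EA inversion
# supplies the negative feedback.
GM_RO2 = 850e-6 * 95.2e3  # |G_m| * R_o2
A_BUFF = 1.0              # Eq. 4.7
BETA = 10 / 11
W_P2 = 2 * math.pi * 476e6   # folded-cascode pole (rad/s)
W_PG = 2 * math.pi * 5.26e6  # gate pole with all units on (rad/s)

# HYPOTHETICAL pass-device and load values, for illustration only.
G_MPT = 1.0    # total pass-device transconductance (A/V)
R_O_PT = 10.0  # pass-device output resistance (ohm)
C_LOAD = 1e-6  # load capacitor (F)
R_ESR = 0.1    # ESR (ohm)

W_Z = 1 / (R_ESR * C_LOAD)              # LHP zero from Eq. 4.9
W_PO = 1 / ((R_ESR + R_O_PT) * C_LOAD)  # output pole from Eq. 4.9
DC_GAIN = GM_RO2 * A_BUFF * G_MPT * R_O_PT * BETA


def loop_gain(w: float) -> complex:
    """Eq. 4.10 evaluated at s = jw."""
    s = 1j * w
    return DC_GAIN * (1 + s / W_Z) / ((1 + s / W_PO) * (1 + s / W_PG) * (1 + s / W_P2))


def crossover(lo: float = 1.0, hi: float = 1e12, iters: int = 200) -> float:
    """Unity-gain frequency by geometric bisection (|LG| falls with w here)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if abs(loop_gain(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


wc = crossover()
pm = 180.0 + math.degrees(cmath.phase(loop_gain(wc)))
print(f"UGF ~ {wc / (2 * math.pi) / 1e6:.1f} MHz, PM ~ {pm:.1f} deg")
```

The bisection relies on $|L G|$ decreasing monotonically, which holds here because the ESR zero sits above the output pole ($\omega_{z}>\omega_{p o}$).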

### 4.5 Measurement Results

The dual-loop LDO was fabricated in TSMC 40-nm 1P8M technology. The die photo of the main LDO is shown in Figure 4.14. The LDO area is less than $0.056 \mathrm{~mm}^{2}$, and the pass transistors occupy around $0.023 \mathrm{~mm}^{2}$. The performance of the proposed LDO was measured with the test bench shown in Figure 4.15. Several commercial regulators (ADP223) provide low-noise supplies for the device under test (DUT), such as the digital logic supply voltage (VDD) and the supply of the EA and pass transistors (VIN).


Figure 4.14: Die Photo of the Proposed Dual-loop LDO


Figure 4.15: Test Bench Setup of the Proposed Dual-loop LDO

In Figure 4.15, the input clock of the flash ADC is generated by a signal generator. Dual NPN bipolar transistors are used to generate a load current step. The two bipolar transistors are identical and form a current mirror. They are controlled through a driver (LTC1693) and a power MOSFET, and the control signal runs much slower than $C L K_{i n}$. When the control signal is '0', the power MOSFET is off. In this case, $I_{L 1}$ can be adjusted through $R_{1}$, and the DUT must provide $I_{L 1}+I_{L 2}$ to maintain $V_{o}$. When the control signal is '1', the power MOSFET is turned on and pulls the node $V_{I L}$ down, so the total load current is $I_{L 2}$. Both $V_{o}$ and $V_{I L}$ are observed on an oscilloscope to test the load transient response of the proposed LDO. For the PSR measurement, an AC signal is coupled through $C_{2}$ onto $V I N$ of the DUT, and the output signal is measured with a spectrum analyzer, as shown in Figure 4.15. The frequency of the AC input signal is swept from 1 kHz to 40 MHz to obtain the PSR of the proposed LDO versus frequency.

Figure 4.16: Load Regulation Case a): $I_{\text {load }}$ from 0.5 mA to 240 mA

Figure 4.17: Load Regulation Case b): $I_{\text {load }}$ from 25 mA to 165 mA

Figure 4.18: Load Regulation Case c): $I_{\text {load }}$ from 75 mA to 110 mA

Figure 4.16, Figure 4.17 and Figure 4.18 show the measured transient responses for different load steps. Figure 4.16 depicts the maximum load step: $I_{L 1}$ changes from 0 mA to 240 mA, and $I_{L 2}=0.5 \mathrm{~mA}$. Thus, the total load step $\Delta I_{\text {load }}$ is 240 mA, with a 300 ns rise time and a 100 ns fall time. The output undershoot is approximately 71 mV with a 520 ns settling time, and the overshoot is around 10 mV with a 220 ns settling time.

In Figure 4.17, the total load current steps between 25 mA and 165 mA: $I_{L 1}$ varies between 140 mA and 0 mA, and $I_{L 2}$ is set to 25 mA. Both the undershoot and the overshoot at the output are smaller than in Case a). The settling time for Case b) is $300 \mathrm{~ns} / 90 \mathrm{~ns}$ for the rising/falling current step, respectively.

A 25 mA current step is tested in Case c), as shown in Figure 4.18. In this test, $I_{\text {load }}$ steps from 75 mA to 100 mA with rise/fall times of $200 \mathrm{~ns} / 90 \mathrm{~ns}$. The $V_{o}$ waveform shows that the digital loop is not triggered by this small current step ($\Delta I_{\text {load }}=25 \mathrm{~mA}$), so the load transient response is very fast: the oscilloscope shows zero undershoot and a 5 mV overshoot in Case c).


Figure 4.19: PSR Measurement under Different Load Current

Figure 4.19 shows the measured PSR under different load current conditions. The PSR under heavy loads is better than under light loads. When $I_{\text {load }}=0.5 \mathrm{~mA}$, the LDO achieves more than 45 dB of rejection at low frequencies and 35 dB at 1 MHz. At approximately the maximum load ($I_{\text {load }}=240 \mathrm{~mA}$), the PSR stays below -42 dB up to 1 MHz.

The quiescent current, $I_{Q}$, is drawn by several building blocks and components of the proposed LDO; their values and percentages are listed in Table 4.5. The total $I_{Q}$ is approximately $300 \mu \mathrm{~A}$. The EA accounts for $60 \%$ of $I_{Q}$, and the ADC and the resistor ladder contribute the remaining $40 \%$. Because these blocks are used in different modes, they can be temporarily disabled to save more power when they are not in use.

| Building Blocks | DC Current $(\mu \mathbf{A})$ | Percentage |
| :---: | :---: | :---: |
| Error Amplifier | 180 | $60 \%$ |
| 3-bit ADC | 90 | $30 \%$ |
| Resistor Ladder | 30 | $10 \%$ |

Table 4.5: Composition of $I_{Q}$


Table 4.6: Comparison of the Proposed LDO with Published State-of-the-art LDOs

Table 4.6 compares the proposed LDO with recently published state-of-the-art works, including digital, analog and hybrid-mode architectures. All LDOs except [39] support more than 100 mA of load current. Analog architectures achieve better PSR than digital and hybrid LDOs, and this work shows the best PSR at 1 MHz among the selected publications. Digital and hybrid LDOs, however, settle faster after sharp load steps. With the digitally-assisted loop, the load transient response of the proposed LDO is much faster than that of the other analog LDOs in Table 4.6.

### 4.6 Conclusion

In this project, a digitally-assisted dual-loop LDO is proposed and analyzed. The proposed LDO is implemented in TSMC 40-nm 1P8M technology. A digital loop is turned on to boost the tracking speed when a large load current step occurs. In this case, the power consumption of the EA is reduced, since the coarse tuning of the pass transistors is completed by the digitally-assisted block. Moreover, the proposed LDO shows strong supply-noise rejection compared with purely digital and hybrid-mode LDOs. The LDO supports a maximum load current of 245 mA. The PSR is $-43 /-35 \mathrm{~dB}$ at 1 MHz for a $240 / 0.5 \mathrm{~mA}$ load, respectively. The LDO consumes around $300 \mu \mathrm{~A}$ of quiescent current. The maximum voltage droop and recovery time are 71 mV and 520 ns for a 240 mA load step. The FOM is 7.4 ps, which makes this design competitive among digital, analog and hybrid architectures. Thus, the digitally-assisted dual-loop LDO is well suited to applications that require frequent mode switching, short wake-up time and high supply rejection.

## 5. CONCLUSION

Limited battery life has recently become one of the most critical issues in the development of portable electronic devices. From a circuit designer's point of view, power-efficient design is the key to solving this issue. As technology develops, power-efficient on-chip integrated circuits not only extend battery life but also improve the performance of electronic devices.

There are three projects presented in this dissertation. The first project is a 13-bit 260MS/s pipeline ADC design. Using a CM MDAC with a current-reuse technique and interstage gain calibrations, the pipeline ADC achieves 68.1/66.3 dB SNDR and 82.3/78.2 dB SFDR for sinusoidal inputs at $4.1736 / 123.129 \mathrm{MHz}$, respectively. The total power consumption of the proposed ADC is around 15.38 mW. The Walden FoM with a low-frequency input tone is 28.3 $\mathrm{fJ} /$conv-step, and the Schreier FoM is 167 dB. The performance of the ADC degrades as the input signal approaches the Nyquist frequency. The chip core area is around $0.276 \mathrm{~mm}^{2}$, and the ADC was implemented in TSMC 40-nm technology.
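The two figures of merit quoted for the pipeline ADC can be recomputed from the reported power, sampling rate and low-frequency SNDR. This sketch uses the standard definitions, Walden FoM $=P /\left(2^{E N O B} f_{s}\right)$ with $E N O B=(S N D R-1.76) / 6.02$ and Schreier FoM $=S N D R+10 \log _{10}(B W / P)$, and assumes a Nyquist bandwidth $B W=f_{s} / 2$. The result (about 28.5 fJ/conv-step and 167.4 dB) is close to the reported values; the small gap may come from rounding or a slightly different convention.

```python
import math


def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_j(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden FoM, energy per conversion step (J)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)


def schreier_fom_db(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM (dB)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)


P = 15.38e-3    # W, reported total power
FS = 260e6      # Hz, sampling rate
SNDR_LF = 68.1  # dB, low-frequency input

walden_fj = walden_fom_j(P, SNDR_LF, FS) * 1e15
schreier_db = schreier_fom_db(SNDR_LF, FS / 2, P)  # assumes BW = fs/2
print(f"Walden ~ {walden_fj:.1f} fJ/conv-step, Schreier ~ {schreier_db:.1f} dB")
```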

The second project is a time-interleaved (TI) ADC with digital background calibrations. The proposed ADC is implemented at the system level. The project investigates a 4-channel time-interleaved architecture with a calibration ADC. Thanks to the additional ADC used for calibration, the three main types of errors in a TI architecture (gain mismatch, offset mismatch and timing skew) can be eliminated or reduced. In the proposed architecture, the output codes from the four sub-ADCs are adjusted dynamically to match the calibration ADC output. The gain and offset mismatches are calibrated in the digital domain, whereas timing skews among sub-channels are controlled in a feedback system using digitally controlled delay lines. The calibration algorithm is implemented and functionally verified using a field programmable gate array (FPGA) and commercial ADCs (ADS4126).

The third project demonstrates an LDO with dual loops: a digital loop and an analog loop. In the proposed dual-loop architecture, the analog loop regulates the output during normal operation, while the digitally assisted loop improves the slew rate and settling time under large load-current variations. To avoid conflicts between the two loops, a loop controller based on a finite-state machine (FSM) is employed. The maximum load current of the proposed LDO is 245 mA. Under a heavy load ($I_{\text{load}}=240$ mA), the PSR is about -50 dB at low frequencies and -42 dB at 1 MHz. The load transient response was tested with $\Delta I_{\text{load}}=240$ mA: measured results show a 71/10 mV peak undershoot/overshoot and a settling time of 520/220 ns for the rising/falling edge of the load current, respectively. Finally, the proposed LDO achieves a FoM of 7.4 ps.
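The conclusion states that an FSM-based loop controller prevents the two loops from conflicting but does not list its states. The sketch below is a minimal, hypothetical illustration of how such an arbiter could hand control between the loops; the class `LoopController`, the two modes, the 1.0 V reference, the 30 mV entry and 5 mV exit thresholds, the settle count, and the one-LSB bang-bang update of the digital code are all assumptions, not the dissertation's design.

```python
from enum import Enum


class Mode(Enum):
    ANALOG = 0   # analog loop regulates alone
    DIGITAL = 1  # digital loop steps the power-switch code toward V_ref


class LoopController:
    """Hypothetical FSM arbiter for a dual-loop (analog + digital) LDO.

    All thresholds and the settle count are illustrative assumptions.
    """

    def __init__(self, v_ref=1.0, v_enter=0.03, v_exit=0.005, settle_cycles=4):
        self.v_ref = v_ref
        self.v_enter = v_enter              # |error| that wakes the digital loop
        self.v_exit = v_exit                # |error| considered settled
        self.settle_cycles = settle_cycles  # settled cycles needed before release
        self.mode = Mode.ANALOG
        self.code = 0                       # digital code; positive = more pass current
        self._settled = 0

    def step(self, v_out):
        """Advance one clock cycle given the sampled output voltage."""
        err = self.v_ref - v_out  # positive on undershoot, negative on overshoot
        if self.mode is Mode.ANALOG and abs(err) > self.v_enter:
            self.mode = Mode.DIGITAL  # large load step: digital loop takes over
            self._settled = 0
        if self.mode is Mode.DIGITAL:
            if abs(err) < self.v_exit:
                self._settled += 1
                if self._settled >= self.settle_cycles:
                    self.mode = Mode.ANALOG  # hand back; code is held, not reset
            else:
                self._settled = 0
                self.code += 1 if err > 0 else -1  # one-LSB bang-bang step
        return self.mode, self.code


if __name__ == "__main__":
    ctrl = LoopController()
    # Illustrative V_out samples around an undershoot event.
    for v in [1.0, 1.0, 0.93, 0.95, 0.97, 0.998, 0.999, 1.0, 1.001, 1.0]:
        print(v, *ctrl.step(v))
```

Using a larger entry threshold than exit threshold adds hysteresis, so small ripple that the analog loop can handle does not repeatedly wake the digital loop, and holding the digital code on exit is one simple way to keep the two loops from fighting over the same error.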

## REFERENCES

[1] H. H. Boo, D. S. Boning, and H. Lee, "A 12b 250 MS/s Pipelined ADC With Virtual Ground Reference Buffers," in IEEE Journal of Solid-State Circuits, vol. 50, pp. 2912-2921, December 2015.
[2] B. Murmann, "ADC Performance Survey 1997-2018," [Online] https://web.stanford.edu/~murmann/adcsurvey.html, July 2018.
[3] B. Razavi, "A Tale of Two ADCs: Pipelined Versus SAR," in IEEE Solid-State Circuits Magazine, vol. 7, pp. 38-46, June 2015.
[4] Y. Lim and M. P. Flynn, "A 1 mW 71.5 dB SNDR 50 MS/s 13-bit Fully Differential Ring Amplifier Based SAR-Assisted Pipeline ADC," in IEEE Journal of Solid-State Circuits, vol. 50, pp. 2901-2911, December 2015.
[5] J. Lin, D. Paik, S. Lee, M. Miyahara, and A. Matsuzawa, "An Ultra-Low-Voltage 160 MS/s 7 Bit Interpolated Pipeline ADC Using Dynamic Amplifiers," in IEEE Journal of Solid-State Circuits, vol. 50, pp. 1399-1411, June 2015.
[6] H. Huang, H. Xu, B. Elies, and Y. Chiu, "A Non-Interleaved 12-b 330-MS/s Pipelined-SAR ADC With PVT-Stabilized Dynamic Amplifier Achieving Sub-1-dB SNDR Variation," in IEEE Journal of Solid-State Circuits, vol. 52, pp. 3235-3247, December 2017.
[7] C. Briseno-Vidrios, D. Zhou, S. Prakash, Q. Liu, A. Edward, E. G. Soenen, M. Kinyua, and J. Silva-Martinez, "A 44-fJ/conversion Step 200-MS/s Pipeline ADC Employing Current-Mode MDACs," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 3280-3292, November 2018.
[8] S. Ryu, B. Song, and K. Bacrania, "A 10-bit 50-MS/s Pipelined ADC With Opamp Current Reuse," in IEEE Journal of Solid-State Circuits, vol. 42, pp. 475-485, March 2007.
[9] B. Razavi, Design of Analog CMOS Integrated Circuits. McGraw-Hill Education, 2000.
[10] W. Sansen, "Distortion in Elementary Transistor Circuits," in IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, pp. 315-325, March 1999.
[11] M. Matsui, H. Hara, Y. Uetani, K. Lee-Sup, T. Nagamatsu, and Y. Watanabe, "A 200 MHz 13 mm² 2-D DCT Macrocell using Sense-amplifying Pipeline Flip-flop Scheme," in IEEE Journal of Solid-State Circuits, vol. 29, pp. 1482-1490, September 1994.
[12] D. Schinkel, E. Mensink, E. Klumperink, E. V. Tuijl, and B. Nauta, "A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time," in 2007 IEEE International Solid-State Circuits Conference (ISSCC), pp. 314-316, February 2007.
[13] Y. Shi and R. Eberhart, "A Modified Particle Swarm Optimizer," in IEEE Int. Conf. Evol. Comput., pp. 69-73, May 1998.
[14] D. Zhou, C. Talarico, and J. Silva-Martinez, "A Digital-circuit-based Evolutionary-computation Algorithm for Time-interleaved ADC Background Calibration," in 2016 29th IEEE International System-on-Chip Conference (SOCC), pp. 13-17, September 2016.
[15] Y. Zhu, C. Chan, S. U, and R. P. Martins, "An 11b 450 MS/s Three-Way Time-Interleaved Subranging Pipelined-SAR ADC in 65 nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 51, pp. 1223-1234, May 2016.
[16] Y. Lim and M. P. Flynn, "A 100 MS/s, 10.5 Bit, 2.46 mW Comparator-Less Pipeline ADC Using Self-Biased Ring Amplifiers," in IEEE Journal of Solid-State Circuits, vol. 51, pp. 2331-2341, October 2015.
[17] K. Chang and C. Hsieh, "A 12-bit 150-MS/s Sub-Radix-3 SAR ADC With Switching Miller Capacitance Reduction," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 1755-1764, June 2018.
[18] S. M. Jamal et al., "A 10-b 120-Msample/s Time-interleaved Analog-to-digital Converter with Digital Background Calibration," in IEEE Journal of Solid-State Circuits, vol. 37, pp. 1618-1627, December 2002.
[19] W. C. Black and D. A. Hodges, "Time-interleaved converter arrays," in IEEE Journal of Solid-State Circuits, vol. 15, pp. 1022-1029, December 1980.
[20] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K. Kobayashi, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," in IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 48, pp. 261-271, March 2001.
[21] S. Singh, L. Anttila, M. Epp, W. Schlecker, and M. Valkama, "Frequency Response Mismatches in 4-channel Time-Interleaved ADCs: Analysis, Blind Identification, and Correction," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, pp. 2268-2279, September 2015.
[22] J. Kennedy and R. C. Eberhart, "Particle Swarm Optimization," in Proc. IEEE International Conference on Neural Networks, vol. 4, pp. 1942-1948, 1995.
[23] R. Magod, B. Bakkaloglu, and S. Manandhar, "A 1.24 μA Quiescent Current NMOS Low Dropout Regulator With Integrated Low-Power Oscillator-Driven Charge-Pump and Switched-Capacitor Pole Tracking Compensation," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 2356-2367, August 2018.
[24] G. A. Rincon-Mora, Analog IC Design with Low-Dropout Regulators. McGraw-Hill, 2014.
[25] P. Y. Or and K. N. Leung, "An Output-Capacitorless Low-Dropout Regulator With Direct Voltage-Spike Detection," in IEEE Journal of Solid-State Circuits, vol. 45, pp. 458-466, February 2010.
[26] C. Huang, Y. Ma, and W. Liao, "Design of a Low-Voltage Low-Dropout Regulator," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, pp. 1308-1313, June 2014.
[27] C. Park, M. Onabajo, and J. Silva-Martinez, "External Capacitor-Less Low Drop-Out Regulator With 25 dB Superior Power Supply Rejection in the 0.4-4 MHz Range," in IEEE Journal of Solid-State Circuits, vol. 49, pp. 486-501, February 2014.
[28] K. Keikhosravy and S. Mirabbasi, "A 0.13-μm CMOS Low-Power Capacitor-Less LDO Regulator Using Bulk-Modulation Technique," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, pp. 3105-3114, November 2014.
[29] F. Lavalle-Aviles, J. Torres, and E. Sánchez-Sinencio, "A High Power Supply Rejection and Fast Settling Time Capacitor-Less LDO," in IEEE Transactions on Power Electronics, vol. 34, pp. 474-484, January 2019.
[30] X. Ming, Q. Li, Z. Zhou, and B. Zhang, "An Ultrafast Adaptively Biased Capacitorless LDO With Dynamic Charging Control," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, pp. 40-44, January 2012.
[31] D. Mandal, C. Desai, B. Bakkaloglu, and S. Kiaei, "Adaptively Biased Output Cap-less NMOS LDO With 19 ns Settling Time," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 66, pp. 167-171, February 2019.
[32] X. Han, T. Burger, and Q. Huang, "An output-capacitor-free adaptively biased LDO regulator with robust frequency compensation in 0.13 μm CMOS for SoC application," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2699-2702, 2016.
[33] A. Maity and A. Patra, "Dynamic slew enhancement technique for improving transient response in an adaptively biased low-dropout regulator," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, pp. 626-630, July 2015.
[34] A. Maity and A. Patra, "A hybrid-mode operational transconductance amplifier for an adaptively biased low dropout regulator," in IEEE Transactions on Power Electronics, vol. 32, pp. 1245-1254, February 2017.
[35] S. Gangopadhyay, D. Somasekhar, J. W. Tschanz, and A. Raychowdhury, "A 32 nm Embedded, Fully-Digital, Phase-Locked Low Dropout Regulator for Fine Grained Power Management in Digital Circuits," in IEEE Journal of Solid-State Circuits, vol. 49, pp. 2684-2693, November 2014.
[36] L. G. Salem, J. Warchall, and P. P. Mercier, "A Successive Approximation Recursive Digital Low-Dropout Voltage Regulator with PD Compensation and Sub-LSB Duty Control," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 35-49, January 2018.
[37] Y. Lee, W. Qu, S. Singh, D. Kim, K. Kim, S. Kim, J. Park, and G. Cho, "A 200-mA Digital Low Drop-Out Regulator With Coarse-Fine Dual Loop in Mobile Application Processor," in IEEE Journal of Solid-State Circuits, vol. 52, pp. 64-76, January 2017.
[38] M. Huang, Y. Lu, S. U, and R. P. Martins, "An Analog-Assisted Tri-Loop Digital Low-Dropout Regulator," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 20-34, January 2018.
[39] S. B. Nasir, S. Sen, and A. Raychowdhury, "Switched-Mode-Control Based Hybrid LDO for Fine-Grain Power Management of Digital Load Circuits," in IEEE Journal of Solid-State Circuits, vol. 53, pp. 569-581, February 2018.
[40] F. Yang and P. K. T. Mok, "A Nanosecond-Transient Fine-Grained Digital LDO With Multi-Step Switching Scheme and Asynchronous Adaptive Pipeline Control," in IEEE Journal of Solid-State Circuits, vol. 52, pp. 2463-2474, September 2017.
[41] X. L. et al., "A Modular Hybrid LDO with Fast Load-Transient Response and Programmable PSRR in 14 nm CMOS Featuring Dynamic Clamp Tuning and Time-Constant Compensation," in 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 234-236, 2019.
[42] M. Ho, J. Guo, K. H. Mak, W. L. Goh, S. Bu, Y. Zheng, X. Tang, and K. N. Leung, "A CMOS Low-Dropout Regulator With Dominant-Pole Substitution," in IEEE Transactions on Power Electronics, vol. 31, pp. 6362-6371, September 2016.
[43] Q. D. et al., "Multiple-Loop Design Technique for High-Performance Low-Dropout Regulator," in IEEE Journal of Solid-State Circuits, vol. 52, pp. 2533-2549, October 2017.


[^0]:    ${ }^{*}$ Part of the data reported in this chapter is reprinted with permission from "A 44-fJ/Conversion Step 200-MS/s Pipeline ADC Employing Current-Mode MDACs" by C. Briseno-Vidrios, Dadian Zhou, et al., 2018, IEEE Journal of Solid-State Circuits, vol. 53, no. 11, pp. 3280-3292, Copyright © 2018, IEEE.

[^1]:    ${ }^{*}$ Part of the data reported in this chapter is reprinted with permission from "A Digital-circuit-based Evolutionary-computation Algorithm for Time-interleaved ADC Background Calibration" by Dadian Zhou, Claudio Talarico and Jose Silva-Martinez, 2016 29th IEEE International System-on-Chip Conference (SOCC), Seattle, WA, 2016, pp. 13-17, Copyright © 2016, IEEE.

