# Design and Calibration of High-Speed and Power-Efficient Flash ADC in FDSOI CMOS

By

**Yulang Feng** 

A dissertation submitted to the Department of Electrical and Computer Engineering, Cullen College of Engineering in partial fulfillment of the requirements for the degree of

# **DOCTOR OF PHILOSOPHY**

### IN ELECTRICAL ENGINEERING

Chair of Committee: Dr. Jinghong Chen

Committee Member: Dr. Xin Fu

Committee Member: Dr. Yuhua Chen

Committee Member: Dr. Yi-Lung Mo

Committee Member: Dr. Wanda Zagozdzon-Wosik

Committee Member: Dr. Xingpeng Li

University of Houston

December 2021

Copyright 2021, Yulang Feng

#### ACKNOWLEDGMENTS

First, I would like to sincerely express my greatest appreciation to my advisor, Dr. Jinghong Chen, for his continuous research supervision, guidance, assistance, patience, and valuable suggestions throughout my Ph.D. adventure. I would like to express my gratitude to my committee members, Dr. Xin Fu, Dr. Yuhua Chen, Dr. Yi-Lung Mo, Dr. Wanda Zagozdzon-Wosik, and Dr. Xingpeng Li, for their advice and guidance in my proposal and defense.

Furthermore, I would like to thank my labmates, Hao Deng, Qingjun Fan, Yuxuan Tang, and Bozorgmehr Vosooghi for their technical insight and friendship.

In addition, I would like to thank Alphacore Inc. for its sponsorship and for the tape-out opportunities.

Finally, I am greatly thankful to my parents. I couldn't have reached this point in my life without their unconditional support and encouragement.

## ABSTRACT

High-speed analog-to-digital converters (ADCs) with medium resolution are used in a range of wideband applications, including wireline and wireless communications, radar systems, and electronic test instruments. While time-interleaved (TI) successive-approximation-register (SAR) ADCs have been widely investigated to achieve low-power, high-speed performance, the large number of sub-ADC channels makes the TI-SAR architecture more susceptible to mismatches, including offset and gain mismatches among the sub-ADC channels and timing skew of the clocks distributed to them. The objective of this dissertation is to investigate power-efficient high-speed flash ADCs while alleviating the timing skew and inter-channel mismatches.

The flash ADC provides the highest conversion speed. However, it requires a large number of comparators to carry out the quantization process. As the ADC resolution increases, the number of comparators increases exponentially, resulting in high power consumption. To improve power efficiency while benefiting from the high-speed performance of the flash architecture, three flash ADCs are developed in this research: an ADC with a partially active 2-stage comparison and 2× time-domain latch interpolation (TDI), a 2-way TI-flash ADC with voltage-domain interpolation, and a pipelined flash ADC with a ping-pong structure in the second stage.

The first flash ADC employs a partially active 2-stage comparison and 2× TDI to reduce power consumption while avoiding calibrations that are sensitive to process-voltage-temperature (PVT) variations, such as time reference and voltage reference calibrations. To enhance the conversion speed of the 2-stage structure, the stringent timing constraint is resolved by a 25%-75% duty-cycle clock scheme, a 0.5-bit redundancy in the first comparison stage, and an embedded second-stage slice selection logic. The bandwidth requirements of the track-and-hold (T/H) and T/H buffer under the 25%-75% duty-cycle clock are also analyzed. Fabricated in a 28-nm fully-depleted silicon-on-insulator (FDSOI) CMOS process, the 5-GS/s 6-bit ADC achieves a signal-to-noise and distortion ratio (SNDR) of 32.8 dB and a spurious-free dynamic range (SFDR) of 41.82 dB at Nyquist frequency while consuming 15.07 mW power, translating into a Walden figure-of-merit (FOM<sub>W</sub>) of 84.5 fJ/conv.-step. In the second work, a 2-way TI-flash ADC is developed, which employs dynamic comparators with a pre-amplifier stage to achieve 10 GS/s conversion speed for the sub-channel ADC and voltage-domain interpolation to reduce power consumption. Fabricated in a 28-nm FDSOI CMOS process, the 20-GS/s 6-bit 2-way TI-flash ADC achieves an SNDR of 31.2 dB and an SFDR of 38.5 dB at Nyquist frequency while consuming 204 mW power. The FOM<sub>W</sub> is 344 fJ/conv.-step. To further increase the flash ADC speed, a pipelined flash ADC is also developed, where the first stage employs current-mode logic (CML) comparators to enhance the speed and the second stage employs a ping-pong structure with dynamic comparators to achieve high power efficiency. Designed in a 22-nm FDSOI CMOS process, the 15-GS/s 7-bit pipelined single-channel flash ADC achieves an SNDR of 41.34 dB and an SFDR of 49.36 dB at Nyquist frequency with a power consumption of 97.5 mW. The corresponding FOM<sub>W</sub> is 72 fJ/conv.-step.

Furthermore, an on-chip comparator offset calibration approach based on a successive-approximation (SA) search algorithm and the FDSOI back-gate bias is developed to provide sufficient comparator offset calibration range while avoiding comparator speed degradation.


# **TABLE OF CONTENTS**

- ACKNOWLEDGMENTS
- ABSTRACT
- TABLE OF CONTENTS
- LIST OF TABLES
- LIST OF FIGURES
- CHAPTER I INTRODUCTION
  - 1.1 Motivation
  - 1.2 Main Contribution
  - 1.3 Dissertation Organization
- CHAPTER II HIGH-SPEED ADC REVIEW
  - 2.1 Time-Interleaved ADC Architectures
  - 2.2 Design Considerations of Time-Interleaved ADCs
    - 2.2.1 Timing Skew
    - 2.2.2 Gain and Offset
    - 2.2.3 Bandwidth
    - 2.2.4 Summary
  - 2.3 Review of Power Efficient Flash ADC Structures
    - 2.3.1 Folding Flash ADC
    - 2.3.2 Partially Active Two-stage Flash ADC
    - 2.3.3 Sub-Ranging Flash ADC
    - 2.3.4 Interpolation in Flash ADC
  - 2.4 Flash ADC Offset Calibration Techniques
  - 2.5 State-of-the-Art Flash ADCs
- CHAPTER III FLASH ADC WITH PARTIALLY ACTIVE 2-STAGE COMPARISON AND 2× TIME-DOMAIN INTERPOLATION
  - 3.1 Motivation
  - 3.2 ADC Top Level Architecture
  - 3.3 Design Considerations of 2-Stage Comparison and 2× TDI
  - 3.4 Bandwidth Analysis of the T/H with a 25%-75% Duty-Cycle Clock
  - 3.5 Circuits Implementation
    - 3.5.1 T/H and T/H Buffer
    - 3.5.2 Comparator with Kick-Back Noise Mitigation
    - 3.5.3 25%-75% Duty-Cycle Clock Generation
  - 3.6 Measurements
    - 3.6.1 Measurement Setup
    - 3.6.2 Measurement Results
  - 3.7 Conclusion
- CHAPTER IV TWO-WAY TIME-INTERLEAVED FLASH ADC WITH SUCCESSIVE-APPROXIMATION COMPARATOR OFFSET CALIBRATION
  - 4.1 Motivation
  - 4.2 ADC Top Level Architecture
  - 4.3 High-Speed Comparator with 2× Voltage-Domain Interpolation
  - 4.4 Comparator Offset Calibration Analysis
    - 4.4.1 Transistor Threshold Voltage with FDSOI Back-Gate Bias
    - 4.4.2 Offset Calibration Loop Using the SA-Search Algorithm
  - 4.5 Bandwidth Analysis of the High-Speed T/H
  - 4.6 Circuit Implementation
    - 4.6.1 Comparator Offset Calibration Circuits
    - 4.6.2 Wideband High-Speed Dynamic Encoder
    - 4.6.3 High-Speed Clock Generation and Distribution
    - 4.6.4 Decimation Network
  - 4.7 High-Speed Flash ADC Measurement
    - 4.7.1 ADC Chip and PCB Board
    - 4.7.2 Measurement Setup
    - 4.7.3 Measurement Results
  - 4.8 Conclusion
- CHAPTER V PIPELINED FLASH ADC WITH A PING-PONG STRUCTURE IN THE SECOND STAGE
  - 5.1 Motivation
  - 5.2 ADC Top Level Architecture
  - 5.3 Design Considerations of the Pipelined Flash ADC
    - 5.3.1 ADC Timing Analysis
    - 5.3.2 ADC Power Analysis
    - 5.3.3 Analysis of the 0.5-Bit Redundancy in the Coarse Flash ADC
    - 5.3.4 ADC Bandwidth Analysis
    - 5.3.5 Comparator Noise Analysis
  - 5.4 Circuits Implementation
    - 5.4.1 Source-Follower Based Bootstrapped T/H
    - 5.4.2 Comparator Offset Calibration Loop
    - 5.4.3 Clock Generation and Distribution
    - 5.4.4 Data Alignment and Process
  - 5.5 Simulation Results
  - 5.6 Conclusions
- CHAPTER VI CONCLUSION AND FUTURE DIRECTIONS
  - 6.1 Conclusions
  - 6.2 Future Directions
- REFERENCES

# **LIST OF TABLES**

- Table 1. Performance Summary and Comparison with State-of-the-Art Flash ADCs.
- Table 2. Performance Summary and Comparison with State-of-the-Art Flash ADCs.
- Table 3. Performance Summary and Comparison with State-of-the-Art Flash ADCs.

# LIST OF FIGURES

- Fig. 1.1. Block diagram of converting an analog signal into digital data.
- Fig. 2.1. The block diagram of the M-way time-interleaved SAR ADC.
- Fig. 2.2. TI-Flash ADC structure.
- Fig. 2.3. Principles of the sign equality-based estimation method [52], [53].
- Fig. 2.4. Principle of estimation technique with MAD [53], [54].
- Fig. 2.5. (a) Waveforms showing the effect of timing error. (b) Cross-correlation-based timing skew detection block diagram [55].
- Fig. 2.6. Inline demux sampling network [6].
- Fig. 2.7. Folding flash ADC block diagram [23], [24].
- Fig. 2.8. Two-stage flash ADC block diagram [31], [32].
- Fig. 2.9. Sub-ranging flash ADC block diagram [33].
- Fig. 2.10. (a) Voltage-domain interpolation [57]–[59] and (b) time-domain interpolation [25]–[30].
- Fig. 2.11. Comparator offset calibration approaches (a) current DACs, (b) combinatorial redundancy, (c) capacitor DACs, (d) threshold voltage adjustment and (e) clocks-skew adjustment.
- Fig. 3.1. (a) The block diagram of the proposed flash ADC, (b) the timing diagram of the proposed flash ADC, and (c) the slice selection mechanism.
- Fig. 3.2. Partial activation mechanism of two-stage comparators with 2× time-domain interpolation and 0.5-bit redundancy.
- Fig. 3.3. (a) 2× TDI implementation. (b) Voltage-to-time conversion in 2× TDI.
- Fig. 3.4. Interpolation error vs. $g_m$ mismatch between two neighboring comparators.
- Fig. 3.5. The 25%-75% duty-cycle clocking scheme and timing budgets.
- Fig. 3.6. T/H buffer settling error tolerance (a) without 0.5-bit redundancy and (b) with 0.5-bit redundancy.
- Fig. 3.7. (a) T/H buffer input and output waveforms. (b) T/H buffer bandwidth vs. settling time with 5-GHz sampling frequency and 6-bit accuracy.
- Fig. 3.8. Simulated SNDR versus T/H buffer bandwidth w/ and w/o 0.5-bit redundancy.
- Fig. 3.9. (a) Slice selection logic with OR function embedded in the second-stage comparator. (b) Selection signal with embedded OR saves 16 ps.
- Fig. 3.10. (a) T/H input and output under the 5-GHz 25%-75% duty-cycle clock condition. (b) T/H circuit during the tracking period and its equivalent first-order RC model.
- Fig. 3.11. Schematics of the T/H and T/H buffer.
- Fig. 3.12. (a) First-stage comparator with kickback noise mitigation. (b) Simulation results of the kickback noise mitigation.
- Fig. 3.13. Block diagram of 25% duty cycle clock generation.
- Fig. 3.14. (a) Die photo, (b) chip layout, and (c) PCB.
- Fig. 3.15. Measurement setup block diagram.
- Fig. 3.17. Data capture.
- Fig. 3.18. Measured DNL and INL before and after offset calibration.
- Fig. 3.19. Measured output spectrum before and after comparator calibration with a low frequency input (decimated by 55).
- Fig. 3.20. Measured output spectrum before and after comparator calibration with a near Nyquist frequency input (decimated by 55).
- Fig. 3.21. Measured SNDR/SFDR vs. sampling frequency with a 200 MHz input.
- Fig. 3.22. Measured SNDR/SFDR vs. input frequency at 5 GS/s.
- Fig. 3.23. ADC power breakdown at 5 GS/s.
- Fig. 4.1. The proposed 2-way time-interleaved flash ADC with SA-based comparator offset calibration scheme.
- Fig. 4.2. Schematic of the pre-amplifier and the StrongArm latch.
- Fig. 4.3. (a) Interpolation block diagram and (b) voltage interpolation curve.
- Fig. 4.4. Simplified cross-sections of (a) Bulk CMOS and (b) FDSOI CMOS [65].
- Fig. 4.5. (a) Simulation results of the threshold voltage vs the back-gate voltage of LVT NMOS in 28nm FDSOI. (b) Simulated input-referred offset of the comparator.
- Fig. 4.6. The SA-based offset calibration flow.
- Fig. 4.7. Calibrated comparator input-referred offset over PVT variations.
- Fig. 4.8. SA-based automatic comparator offset calibration diagram.
- Fig. 4.9. Schematics of the enable logic and the offset sign latch.
- Fig. 4.10. Schematic of the modified R-2R DAC with split-2R units.
- Fig. 4.11. Back-gate bias voltage generation in the initialization phase.
- Fig. 4.12. Simulated back-gate bias voltages and input-referred offset of the SA calibration process with an input-referred offset being 30.4 mV.
- Fig. 4.13. Simulated comparator offset calibration range.
- Fig. 4.14. Fat-tree based 4-bit high-speed encoder.
- Fig. 4.15. Schematic of the input clock buffer and clock distribution circuits.
- Fig. 4.16. Decimation network block diagram.
- Fig. 4.17. Chip micrograph.
- Fig. 4.18. Custom-designed PCB for ADC testing.
- Fig. 4.19. Measurement setup block diagram.
- Fig. 4.20. Lab measurement setup.
- Fig. 4.21. Data capture.
- Fig. 4.22. Measured (a) DNL and (b) INL before and after offset calibration.
- Fig. 4.23. FFT plot when $f_{in}$ is 0.2343 GHz with comparator offset calibration and timing skew calibration (decimated by 113).
- Fig. 4.24. FFT plot when $f_{in}$ is 0.2343 GHz without comparator offset calibration and timing skew calibration (decimated by 113).
- Fig. 4.25. FFT plot when $f_{in}$ is 9.921875 GHz with comparator offset calibration and timing skew calibration (decimated by 113).
- Fig. 4.26. FFT plot when $f_{in}$ is 9.921875 GHz without comparator offset calibration and timing skew calibration (decimated by 113).
- Fig. 4.27. Measured SNDR and SFDR versus input frequency.
- Fig. 4.28. ADC power breakdown.
- Fig. 5.1. The proposed pipelined ping-pong flash ADC architecture.
- Fig. 5.2. (a) The simplified pipelined flash ADC block diagram and (b) the timing diagram.
- Fig. 5.3. Simulation result of the comparator power vs. speed.
- Fig. 5.4. (a) The conversion error induced by comparator offset and (b) the 0.5-bit redundancy tolerates the comparator offset.
- Fig. 5.5. Simulated input-referred offset of the comparator in the coarse flash ADC.
- Fig. 5.6. ADC input network.
- Fig. 5.7. Modified StrongArm latch to improve noise performance.
- Fig. 5.8. Simulated input referred noise of the conventional and improved StrongArm latches.
- Fig. 5.9. The schematic of the source-follower based bootstrapped T/H.
- Fig. 5.10. Simulated SNDR vs. input frequency of the source-follower based bootstrapped T/H.
- Fig. 5.11. The 7-bit SA-based comparator offset calibration diagram.
- Fig. 5.12. The schematic of the 7-bit modified R-2R DAC.
- Fig. 5.13. The schematic of clock generation and distribution circuit.
- Fig. 5.14. Block diagram of the high-speed data alignment and process.
- Fig. 5.15. The layout of the proposed ADC.
- Fig. 5.16. Simulated DNL and INL of the proposed ADC.
- Fig. 5.17. Simulated output spectrum before and after comparator offset calibration with a low-frequency input.
- Fig. 5.18. Simulated output spectrum before and after comparator offset calibration with a Nyquist-rate input.
- Fig. 5.19. ADC dynamic performance.
- Fig. 5.20. ADC power breakdown.

# **CHAPTER I**

# INTRODUCTION

# 1.1 Motivation

An analog-to-digital converter (ADC) converts analog signals into digital data, as shown in Fig. 1.1. ADCs with conversion speeds of tens of giga-samples per second (GS/s) are now used in a variety of high-speed applications. In wireline communication systems, a front-end ADC is often employed to support advanced equalization techniques that mitigate the inter-symbol interference (ISI) caused by high-speed signaling over bandwidth-limited channels. In wireless communication systems, a direct-RF-sampling receiver employs a high-speed ADC to digitize the received signal without a down-conversion stage, which reduces form factor and design cost. Other applications, such as radar systems and modern electronic test instruments, also demand high-speed ADCs for reliable detection and acquisition.

Recent works [1]–[20] have demonstrated that time-interleaved (TI) successive-approximation-register (SAR) ADCs in advanced CMOS technologies can achieve conversion speeds of up to 90 GS/s with highly competitive power efficiency. However, the large number of sub-channels required makes TI-SAR ADCs more susceptible to inter-channel offset, gain, and bandwidth mismatches and to timing skew, which require substantial on-chip [21] or off-chip [22] calibration to mitigate performance degradation. In addition, the inter-stage buffers, the multi-channel reference generation, and the multi-phase clock generation and distribution consume a substantial amount of power in TI-SAR ADCs. These considerations motivate the development of a faster sub-channel ADC, which reduces the total number of sub-channels required.

Fig. 1.1. Block diagram of converting an analog signal into digital data.

The flash ADC provides the highest conversion speed among ADC architectures and can therefore serve as the sub-channel ADC of a TI system to alleviate inter-channel mismatches, timing skew, and the associated calibrations. However, the flash ADC suffers from low power efficiency: the number of comparators grows exponentially with the resolution, so power dissipation is high. For the TI-flash ADC to be competitive, the flash ADC power consumption must be reduced. Various techniques have recently been proposed to improve flash ADC power efficiency. A folding structure is developed in [23], [24] to reduce the number of comparators; however, the use of a chopper degrades conversion speed. Time-domain latch interpolation (TDI) has been widely investigated to reduce the number of comparators [25]–[30], yet the nonlinearity of the voltage-to-time conversion (VTC) limits the interpolation accuracy in high-order TDI. A 2-stage comparison structure that selectively activates comparators is developed in [31]–[33]; nevertheless, the conversion speed suffers because two consecutive comparisons must be carried out in one conversion phase.

Furthermore, comparator offset in a flash ADC needs to be calibrated to ensure ADC accuracy, which calls for an offset calibration scheme that offers sufficient calibration range without degrading comparator speed. Various techniques have been developed to calibrate comparator offset, such as capacitive DACs [34], [35], reference pair redundancy [36], and combinatorial redundancy [37]. However, these approaches introduce capacitive loads in the signal path, inevitably degrading comparator speed. In [38], transistor bulk voltage adjustment is applied to calibrate comparator offset and avoid speed degradation, but the threshold voltage adjustment range in bulk CMOS is very limited. A time-based comparator offset calibration approach is developed in [39]; it avoids the comparator speed penalty but is subject to process-voltage-temperature (PVT) variations.

# **1.2 Main Contribution**

In this dissertation, three power-efficient high-speed flash ADCs are developed: an ADC with a partially active 2-stage comparison and 2× time-domain latch interpolation (TDI), a 2-way TI-flash ADC with voltage-domain interpolation, and a pipelined flash ADC with a ping-pong structure in the second stage. All three flash ADCs achieve competitive power efficiency and high-speed performance compared to the state-of-the-art.

The first flash ADC jointly employs a partially active 2-stage comparison structure and 2× TDI to improve power efficiency and avoid PVT-sensitive calibrations, such as time reference and voltage reference calibrations. The stringent timing constraint of the 2-stage structure is addressed by developing a 25%-75% duty-cycle clock, a 0.5-bit redundancy in the first comparison stage, and an embedded second-stage slice selection logic. The bandwidth requirements of the track-and-hold (T/H) and T/H buffer under the 25%-75% duty-cycle clock are also analyzed. Fabricated in a 28-nm FDSOI CMOS process, the 5-GS/s 6-bit ADC achieves an SNDR of 32.8 dB and an SFDR of 41.82 dB at Nyquist frequency while consuming 15.07 mW power, translating into a FOM<sub>W</sub> of 84.5 fJ/conv.-step.

In the second work, a 2-way TI-flash ADC is developed, which employs dynamic comparators with a pre-amplifier stage to achieve 10 GS/s conversion speed for the sub-channel ADC and voltage-domain interpolation to reduce power consumption. In addition, an on-chip comparator offset calibration approach utilizing a successive-approximation (SA) search algorithm and FDSOI back-gate bias is developed, which provides sufficient calibration range without impairing comparator speed. Fabricated in a 28-nm FDSOI CMOS process, the 20-GS/s 6-bit 2-way TI-flash ADC achieves an SNDR of 31.2 dB and an SFDR of 38.5 dB at Nyquist frequency while consuming 204 mW power. The FOM<sub>W</sub> is 344 fJ/conv.-step.
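The idea of an SA search for comparator offset can be illustrated with a generic behavioral sketch. Everything specific in it is an assumption made for illustration and is not taken from the dissertation: the linear trim model (the actual calibration acts through the FDSOI back-gate bias), the 7-bit code width, the 1-mV trim step, and the names `sa_offset_calibration` and `comparator_output`.

```python
def sa_offset_calibration(offset_mv: float, n_bits: int = 7,
                          lsb_mv: float = 1.0) -> int:
    """Binary-search a trim code that cancels a comparator's input offset.

    Hypothetical model: with the comparator inputs shorted, its decision is
    the sign of (offset - trim), where trim = (code - mid_code) * lsb_mv.
    """
    mid_code = 2 ** (n_bits - 1)

    def comparator_output(code: int) -> bool:
        # True means the residual offset is still positive for this code.
        return offset_mv - (code - mid_code) * lsb_mv > 0

    code = 0
    for bit in reversed(range(n_bits)):      # test bits from MSB to LSB
        trial = code | (1 << bit)
        if comparator_output(trial):         # trim still too small: keep the bit
            code = trial
    return code


# Example with a 30.4-mV offset (an illustrative input value).
code = sa_offset_calibration(30.4)
residual_mv = 30.4 - (code - 64) * 1.0
print(code, round(residual_mv, 3))
```

The search needs only one comparator decision per bit, so an `n_bits`-wide trim code is found in `n_bits` steps, and the residual offset ends up within one trim step as long as the initial offset lies inside the trim range.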

In the third work, to further increase the flash ADC speed, a pipelined flash ADC is developed, where the first stage employs current-mode logic (CML) comparators to enhance the speed and the second stage employs a ping-pong structure with dynamic comparators to achieve high power efficiency. In addition, the partial activation of comparators and the 2× TDI are employed to improve power efficiency. Designed in a 22-nm FDSOI CMOS process, the 15-GS/s 7-bit single-channel pipelined flash ADC achieves an SNDR of 41.34 dB and an SFDR of 49.36 dB at Nyquist frequency with a power consumption of 97.5 mW. The corresponding FOM<sub>W</sub> is 72 fJ/conv.-step.
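The Walden figures of merit quoted for the two fabricated 28-nm prototypes can be cross-checked with a short calculation. The sketch below assumes the commonly used definitions ENOB = (SNDR − 1.76)/6.02 and FOM<sub>W</sub> = P / (2<sup>ENOB</sup> · f<sub>s</sub>); the text does not spell out the formula, so treat these definitions and the function names as assumptions.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w: float, fs_hz: float, sndr_db: float) -> float:
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2.0 ** enob_from_sndr(sndr_db) * fs_hz) * 1e15


# Power, sampling rate, and Nyquist SNDR as quoted in the text.
fom_5gs = walden_fom_fj(power_w=15.07e-3, fs_hz=5e9, sndr_db=32.8)
fom_20gs = walden_fom_fj(power_w=204e-3, fs_hz=20e9, sndr_db=31.2)

print(f"5-GS/s ADC:  {fom_5gs:.1f} fJ/conv.-step")
print(f"20-GS/s ADC: {fom_20gs:.1f} fJ/conv.-step")
```

Under these assumptions the two results agree with the quoted 84.5 and 344 fJ/conv.-step to within rounding.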

# 1.3 Dissertation Organization

The dissertation is organized as follows. Chapter II starts with a review of high-speed TI-SAR and TI-flash ADCs. Then, the latest design techniques for flash ADC power efficiency improvement and comparator offset calibration are analyzed. Finally, the performance of state-of-the-art flash ADCs is discussed.

Chapter III presents the design and implementation of the flash ADC with a partially active 2-stage comparison and 2× TDI. The architecture of the 2-stage comparison in conjunction with 2× TDI is described first. Then, the ADC power efficiency, conversion speed, and bandwidth requirements for the T/H and T/H buffer are analyzed. After that, the circuit implementations of the major building blocks in this ADC are presented. Finally, measurement results are provided and the ADC performance is compared with the state-of-the-art.

Chapter IV elaborates on the implementation of the 2-way TI-flash ADC. The StrongArm latch comparator with a pre-amplifier stage to enhance speed and the 2× voltage-domain interpolation to reduce power consumption are analyzed first. Then, the SA-based comparator offset calibration employing FDSOI back-gate bias is presented. Next, the bandwidth requirement for the high-speed T/H is discussed. After that, detailed circuit implementations are presented. Finally, measurement results of the prototype flash ADC are provided and compared with the state-of-the-art.

Chapter V details the design of the pipelined flash ADC. First, the proposed flash ADC architecture is presented. Then, the design considerations including ADC timing, power efficiency, bandwidth, and comparator noise are analyzed. Next, implementations of the major building blocks are shown. Finally, the simulated ADC performance and a comparison with the state-of-the-art are presented.

Chapter VI concludes this dissertation and discusses future directions.

# **CHAPTER II**

# **HIGH-SPEED ADC REVIEW**

In this chapter, TI-SAR and TI-flash ADC architectures are discussed. Timing skew and inter-channel mismatches in the TI structure are then briefly introduced, and the corresponding calibration techniques are analyzed. The TI-flash ADC is further discussed as a way to alleviate timing skew, inter-channel mismatches, and the associated calibrations. Next, the latest power-efficiency techniques and comparator offset calibration methods for flash ADCs are analyzed. Finally, the performance of state-of-the-art flash ADCs is analyzed.

# 2.1 Time-Interleaved ADC Architectures

Fig. 2.1 presents a block diagram of an *M*-way TI-SAR ADC with an overall conversion speed of  $f_s$ , where *M* identical SAR ADCs operate at the speed of  $f_s/M$  and their sampling phases  $(\phi_1...\phi_M)$  are equally spaced with an interval of  $2\pi/M$ . With a large number of sub-channel ADCs employed, the conversion speed of each sub-channel ADC can be significantly relaxed, thus allowing the low-power SAR ADC to be applied. As depicted in the figure, the sub-channel SAR ADC consists of a track-and-hold (T/H) network, a feedback capacitive digital-to-analog converter (CDAC), a comparator, and SAR logic. Since no active amplification is involved and only one comparator is required, the sub-channel SAR ADC achieves a high power efficiency. However, the TI-SAR ADC suffers from inter-channel mismatches and timing skews, which require complicated calibration approaches to mitigate performance degradation.

Fig. 2.1. The block diagram of the M-way time-interleaved SAR ADC.

The TI-flash ADC [40]–[44] is another approach to achieve a high conversion speed. A block diagram of an *M*-way TI-flash ADC with an *N*-bit resolution is depicted in Fig. 2.2, where each sub-channel ADC consists of a T/H, a resistor ladder,  $2^{N}-1$  comparators, and a thermometer-to-binary encoder. The sampled input signal is resolved by the  $2^{N}-1$  comparators in parallel and the *N*-bit conversion is completed in one clock cycle, thus achieving a high conversion speed. With this high-speed advantage, the number of sub-channels can be significantly reduced as compared to the TI-SAR ADC. However, all  $2^{N}-1$  comparators are activated in each clock cycle while only a few make a critical impact on the final decision, thus degrading power efficiency [45]–[51]. Furthermore, due to device mismatch, the random offset of the comparators can cause comparison errors, thus degrading the signal-to-noise-and-distortion ratio (SNDR) of the flash ADC.

Fig. 2.2. TI-Flash ADC structure.

# 2.2 Design Considerations of Time-Interleaved ADCs

#### 2.2.1 Timing Skew

Timing skew among sub-ADCs results in the non-uniform sampling of the input signal, which causes undesired input-frequency-dependent spurs at the ADC output spectrum, thus degrading ADC SNDR. To reduce such spurs, timing skew calibration consisting of timing skew detection and correction is required. Various approaches have been developed to detect the timing skew. Reference [52] develops a sign-equality-based (SEB) method for timing skew estimation. The principle of the timing skew estimation method is depicted in Fig. 2.3. The timing skew polarity of sub-ADCs (leading or lagging the ideal sampling phase) can be detected by observing the sign equality of two



Fig. 2.3. Principles of the sign equality-based estimation method [52], [53].



Fig. 2.4. Principle of estimation technique with MAD [53], [54].



Fig. 2.5. (a) Waveforms showing the effect of timing error. (b) Cross-correlation-based timing skew detection block diagram [55].

coefficients, *r* and *e*, which respectively represent the output difference between two adjacent sub-ADCs and the output difference between the sub-ADC to be calibrated and a reference ADC. While the timing skew can be detected with the SEB method, it requires a reference ADC running at the full sampling speed, which consumes considerable power, especially when the sampling frequency increases to the GHz range. Reference [54] introduces a simplified timing skew polarity extraction algorithm based on mean absolute deviation (MAD), which requires no reference ADC. The principle of the timing skew polarity extraction is shown in Fig. 2.4. The algorithm leverages the absolute value difference between sub-ADC output data to detect the timing skew polarity. When the sampled signal is located near a zero-crossing point, the sub-ADC output should also be very close to zero; otherwise, it is the timing skew that deviates the sub-ADC output from zero. While this detection approach is low-cost, it requires the analog input signal to have zero-crossing points and sufficient slopes, which limits its use in practical applications. Reference [55] develops a cross-correlation-based method to detect timing skew, which requires neither a reference ADC nor specific input signal patterns. The principle of the cross-correlation method is depicted in Fig. 2.5. However, the data post-processing for the cross-correlation computations introduces a high power penalty.

After the timing skew is obtained, the timing skew correction is carried out either in the analog domain or in the digital domain. One common approach is to adjust the sampling clock delay to correct the timing skew. However, the multi-phase clock generation and distribution, along with the programmable delay cells, add clock jitter, thus degrading the overall ADC signal-to-noise ratio (SNR). To correct timing skew without introducing extra jitter to the sampling clock, adaptive finite-impulse-response (FIR) filters can be used. However, this approach is implemented in the digital domain and incurs high power dissipation. For the residual timing skew to be tolerable by the ADC, its variance should satisfy [40]

$$\sigma_{\tau}^{2} \leq \left(\frac{N}{N-1}\right) \cdot \left(\frac{2}{3 \cdot 2^{2B}}\right) \cdot \left(\frac{1}{(2\pi f)^{2}}\right), \tag{2.1}$$

where  $\sigma_{\tau}$ , *N*, *B*, and *f* represent the standard deviation of the timing skew, the number of interleaved channels, the bit resolution, and the frequency of the input sinusoidal signal, respectively.
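As a quick numerical illustration of (2.1), the sketch below evaluates the largest tolerable timing-skew standard deviation. The function name and the example operating point (8 channels, 6 bits, and a 10-GHz input, i.e. the Nyquist frequency of a 20-GS/s converter) are illustrative assumptions rather than values taken from [40].

```python
import math

def max_timing_skew_std(n_channels, n_bits, f_in_hz):
    """Upper bound on the timing-skew standard deviation from (2.1), in seconds."""
    variance = (
        (n_channels / (n_channels - 1))
        * (2.0 / (3.0 * 2 ** (2 * n_bits)))
        / (2.0 * math.pi * f_in_hz) ** 2
    )
    return math.sqrt(variance)

# Hypothetical operating point: 8-way interleaving, 6-bit resolution, 10-GHz input.
sigma = max_timing_skew_std(n_channels=8, n_bits=6, f_in_hz=10e9)
print(f"sigma_tau <= {sigma * 1e15:.0f} fs")
```

Under these assumed numbers the bound is on the order of 0.2 ps, and it scales inversely with the input frequency, which illustrates why skew calibration becomes more demanding as the conversion speed increases.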

#### 2.2.2 Gain and Offset

In the TI structure, gain and offset mismatches between sub-ADCs cause spurs at the ADC output spectrum, which degrade the time-interleaved ADC performance. While gain and offset mismatches can be detected and corrected by leveraging adaptive FIR filters, the corresponding power overhead highly relies on the activity factor of logic gates [56]. With the number of sub-ADC channels increasing, the power overhead due to the gain and offset mismatch calibrations is increased, thus limiting ADC power efficiency.

#### 2.2.3 Bandwidth

The bandwidth of TI ADCs is heavily affected as multiple capacitive loads from the sub-channel ADCs are connected to the input at the same time. To alleviate bandwidth degradation, the common approach is to insert track-and-hold amplifiers (THAs) to drive the sub-channel ADCs, isolating the capacitive loads from the TI ADC input. For applications requiring a large number of sub-channel ADCs to achieve a conversion speed of several tens of GS/s, inserting THAs to relax the input bandwidth incurs tremendous power dissipation. An inline demux sampling network is proposed in [6] to reduce the capacitive loads at the ADC input. As depicted in Fig. 2.6, the inline demux sampling network stacks multiple-stage samplers, where the first stage tracks and samples the analog signal while the remaining stages serve as de-multiplexers. The inline demux sampling network reduces the number of sub-channel ADCs connected to the input by turning on only a fraction of the samplers.



Fig. 2.6. Inline demux sampling network [6].

However, inter-stage buffers are still required in the inline demux sampling network, which inevitably degrades ADC power efficiency.

#### 2.2.4 Summary

The TI-SAR ADC achieves a high conversion speed with a large number of sub-channels. Due to the low-power characteristics of the SAR structure, the overall power efficiency is improved. However, the large number of sub-channels inevitably makes the TI-SAR ADC susceptible to timing skews and inter-channel mismatches, such as offset and gain mismatches. This requires substantial on-chip or off-chip calibrations as discussed above. The timing skew and inter-channel mismatch calibrations become even more complicated as the number of sub-channels is increased to achieve a higher conversion speed. Therefore, the TI-flash ADC is investigated; owing to its reduced number of sub-channels, the timing skew and inter-channel mismatch calibrations are relaxed. However, the power consumption penalty of the flash ADC is large. The following sections discuss the latest power consumption reduction methods and comparator offset calibration approaches for the high-speed flash ADC.

# 2.3 Review of Power Efficient Flash ADC Structures

#### 2.3.1 Folding Flash ADC

The folding flash ADC [23], [24] completes an *N*-bit conversion by carrying out 2-stage comparisons where the first and the second stages resolve *K* most significant bits (MSBs) and *L* least significant bits (LSBs), respectively (*N*=*K*+*L*). The number of comparators is reduced from 2<sup>N</sup>-1 to 2<sup>K</sup>+2<sup>L</sup>-2. To achieve this 2-stage comparison, the sampled signal must be rectified into an *L*-bit subrange based on the *K*-MSB quantization results, thus requiring a chopper or a folding amplifier. Fig. 2.7 shows the corresponding flash ADC block diagram with *K*=1, where the chopper rectifies the sampled signal if the quantization result of the MSB comparator is 0. This requires the *K*-bit comparators to cover only half of the signal range, thus reducing the power consumption by half.

Fig. 2.7. Folding flash ADC block diagram [23], [24].
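To make the comparator-count reduction of the folding structure concrete, the following sketch evaluates both expressions given above. The 6-bit resolution and the two *K*/*L* splits are example values chosen for illustration, and the function names are hypothetical.

```python
def flash_comparators(n_bits):
    """Comparators in a conventional N-bit flash ADC: 2^N - 1."""
    return 2 ** n_bits - 1

def folding_comparators(k_msb, l_lsb):
    """Comparators in a 2-stage folding flash ADC: 2^K + 2^L - 2."""
    return 2 ** k_msb + 2 ** l_lsb - 2

# Example: 6-bit resolution with two possible K/L splits (N = K + L).
full = flash_comparators(6)
split_1_5 = folding_comparators(1, 5)
split_3_3 = folding_comparators(3, 3)
print(full, split_1_5, split_3_3)
```

For a fixed *N*, the count is smallest when *K* and *L* are balanced, at the cost of a more complex folding operation.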

While the folding technique improves power efficiency, ADC conversion speed is significantly affected due to the signal rectification process. The rectified signal must be settled to N-bit accuracy for the second-stage quantization and the settling process degrades the ADC conversion speed.



Fig. 2.8. Two-stage flash ADC block diagram [31], [32].

#### 2.3.2 Partially Active Two-stage Flash ADC

To avoid the signal rectification process and relax the speed degradation effect, a partially active two-stage *N*-bit (N=K+L) flash ADC is proposed [31], [32], where the second stage employs *N*-bit comparators to cover the full-scale range, but only *L*-bit comparators are activated at a given time based on the first-stage outcomes. The partially active 2-stage flash ADC structure is depicted in Fig. 2.8. Due to the full-scale range coverage by the second-stage comparators, the partially active two-stage structure doesn't require the signal rectification process, thus relaxing the conversion speed degradation. Meanwhile, flash ADC power efficiency is improved by activating only a fraction of the second-stage comparators. However, the conversion speed of the two-stage structure is still limited due to two consecutive comparisons in one conversion phase.



Fig. 2.9. Sub-ranging flash ADC block diagram [33].

#### 2.3.3 Sub-Ranging Flash ADC

The sub-ranging flash structure with an *N*-bit resolution also applies coarse *K*-MSB and fine *L*-LSB quantizations. However, different from the two-stage comparison structure that employs *N*-bit comparators to cover the full-scale range at the second stage, the sub-ranging flash ADC employs *L*-bit comparators with adjustable references to dynamically cover the full signal range. The sub-ranging flash ADC structure is depicted in Fig. 2.9. Such a structure reduces the core ADC size. Besides, a reference-embedded comparator can be developed to further reduce ADC power consumption [33]. However, the conversion speed is still affected due to two consecutive comparisons in one conversion phase. Furthermore, the foreground reference calibration of the comparator is susceptible to voltage-and-temperature (VT) variations, thus making the design less robust.



Fig. 2.10. (a) Voltage-domain interpolation [57]–[59] and (b) time-domain interpolation [25]–[30].

#### 2.3.4 Interpolation in Flash ADC

Flash ADCs can adopt interpolation to reduce the number of pre-amplifiers or dynamic comparators. Interpolation techniques are categorized into voltage-domain interpolation (VDI) [57]–[59] and time-domain interpolation (TDI) [25]–[30]. Voltage-domain interpolation is illustrated in Fig. 2.10(a), where the output voltage levels of adjacent pre-amplifiers are used to generate extra zero-crossing points, thus achieving the interpolation. The number of pre-amplifiers is reduced with the VDI. TDI leverages the input-dependent latching time difference between adjacent dynamic comparators to extract LSB information as depicted in Fig. 2.10(b). The nonlinearity of the voltage-to-time conversion (VTC), however, degrades the interpolation accuracy in high-order TDIs. In [29], an 8× TDI is developed with the VTC operating in the linear region to alleviate the VTC nonlinearity problem. The method, however, requires heavy time reference calibrations, which are sensitive to process, voltage, and temperature (PVT) variations and incur more power consumption.

# 2.4 Flash ADC Offset Calibration Techniques

In designing high-speed flash ADCs, comparators must achieve a high operation speed while maintaining a low offset error. While CMOS technology scaling has greatly enhanced comparator speed, the comparator offset has become worse. To calibrate the comparator offset, several techniques have been developed in recent works as shown in Fig. 2.11. Reference [36] proposes to apply current DACs, introducing a current imbalance on the signal path to compensate for the offset. Combinatorial redundancy, proposed in [37], is another approach to offset calibration, where the size of the input pairs is trimmed to cancel out the offset. It is also popular to apply a capacitive loading imbalance to calibrate the comparator offset as presented in [34], [35]. The aforementioned comparator offset calibration approaches, however, inevitably add capacitive loads to the signal path, thus degrading comparator speed. Transistor bulk voltage trimming is used to calibrate the comparator offset and avoid the speed penalty [38]. Nevertheless, the threshold voltage adjustment range in bulk CMOS is limited, which can hardly support practical applications. A recent publication [39] proposes a clock-skew-based offset calibration approach, which provides a wide calibration range and



Fig. 2.11. Comparator offset calibration approaches: (a) current DACs, (b) combinatorial redundancy, (c) capacitor DACs, (d) threshold voltage adjustment, and (e) clock-skew adjustment.

avoids speed penalty. The delay cells for clock skew adjustment, however, are sensitive to PVT variations.

# 2.5 State-of-the-Art Flash ADCs

The Walden figure-of-merit ( $FOM_W$ ) is used to quantitatively evaluate ADC overall performance, which considers ADC power consumption, conversion speed, and the effective number of bits (ENOB). The  $FOM_W$  is expressed in the following equation

$$FOM_{W} = \frac{Power}{2^{ENOB} \times Speed}. \tag{2.2}$$

With the latest power-efficient flash structures and comparator offset calibrations, the reported flash ADCs have achieved very low FOM<sub>w</sub>. In [33], a 4-GS/s 6-bit flash ADC is developed in a 28-nm FDSOI CMOS technology, employing a sub-ranging structure and a reference-embedded comparator to achieve a FOM<sub>w</sub> of 26.8 fJ/conv.-step. Reference [29] develops a 6-GS/s 6-bit flash ADC in a 65-nm CMOS technology, which applies an 8× TDI with the VTC operating in the linear region to alleviate the VTC nonlinearity problem. The 6-GS/s 6-bit flash ADC achieves a FOM<sub>w</sub> of 85 fJ/conv.-step. While [33] and [29] achieve such a low FOM<sub>w</sub> and such a high conversion speed, respectively, they require complex architecture-level calibrations including embedded reference calibration and time reference calibration. Besides, those are foreground architecture-level calibrations and are thus sensitive to PVT variations. In this dissertation, a power-efficient flash ADC structure employing partially active 2-stage comparison in conjunction with the 2× TDI is developed, which avoids the architecture-level calibrations. The timing constraint in the 2-stage comparison structure is also resolved to enhance the conversion speed.

Time-interleaved flash ADCs are often adopted [31], [32], [40]–[44] to achieve a higher conversion speed. Reference [41] develops a 20-GS/s 6-bit flash ADC with 8 sub-channels, which achieves a FOM<sub>w</sub> of 130 fJ/conv.-step. While the timing skew calibration is greatly relaxed due to the reduced number of sub-channels as compared to the TI-SAR ADC, it still requires calibrating the timing skews among 8 sub-ADCs. This timing skew calibration becomes even more complicated as the overall conversion speed keeps increasing. Therefore, the sub-channel ADC conversion speed needs to be further increased to reduce the number of sub-channel cores and the timing skew calibration effort. In this dissertation, a high-speed sub-channel flash ADC with a reasonable power efficiency is proposed, which employs the StrongArm latch comparator with a pre-amplifier stage to enhance the speed to 10 GS/s and 2× voltage-domain interpolation to reduce power consumption. The sub-channel flash ADC is utilized in a two-way time-interleaved structure, which achieves an overall conversion speed of 20 GS/s with a significantly reduced timing skew calibration as compared to [41]. To further enhance the single-channel flash ADC conversion speed, a pipelined 2-stage flash ADC is developed, where the first stage employs CML comparators to enhance speed and the second stage employs a ping-pong structure with dynamic comparators to achieve high power efficiency.

# **CHAPTER III**

# FLASH ADC WITH PARTIALLY ACTIVE 2-STAGE COMPARISON AND 2× TIME-DOMAIN INTERPOLATION

# 3.1 Motivation

The latest 2-stage work [33] applies an embedded-reference comparator to further reduce power consumption, which, however, requires architecture-level calibration to ensure the embedded reference accuracy. Besides, such architecture-level calibration is foreground-based and thus sensitive to VT variations. Furthermore, the conversion speed is still affected due to the two consecutive comparisons in one conversion phase. In this work, a partially active 2-stage comparison structure in conjunction with 2× time-domain interpolation (TDI) is developed to reduce ADC power consumption. The 2× TDI further reduces the number of comparators almost by half and avoids any architecture-level calibration. To enhance the conversion speed of the 2-stage comparison structure, a 25%-75% duty-cycle clock, a 0.5-bit redundancy in the first comparison stage, and an embedded second-stage slice selection logic are developed. The T/H and T/H buffer bandwidth requirements under the 25%-75% duty-cycle clock are also analyzed. The 5-GS/s 6-bit flash ADC prototype is designed and fabricated in a 28-nm FDSOI CMOS process. Measurement results show that this ADC achieves an SNDR of 32.8 dB at the Nyquist frequency with a power consumption of 15.07 mW, translating into a FOM<sub>w</sub> of 84.5 fJ/conv.-step. This chapter is organized as follows. Section 3.2 describes the proposed flash ADC structure. Section 3.3 presents the design considerations of the 2-stage comparison



Fig. 3.1. (a) The block diagram of the proposed flash ADC, (b) the timing diagram of the proposed flash ADC, and (c) the slice selection mechanism.

structure in conjunction with 2× TDI. Section 3.4 analyzes the T/H tracking bandwidth requirement under a 25%-75% duty-cycle clock. Section 3.5 introduces the transistor-level design of the proposed flash ADC. Section 3.6 provides the ADC testing methods and experimental results and Section 3.7 summarizes the ADC performance.
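As a consistency check, the reported figure of merit can be reproduced from (2.2) and the measured numbers quoted above. The sketch assumes the standard sine-wave relation ENOB = (SNDR - 1.76)/6.02, which is not stated explicitly in this section; the function names are illustrative.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (assumed standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w, sndr_db, f_s_hz):
    """Walden figure of merit from (2.2), in joules per conversion step."""
    return power_w / (2 ** enob_from_sndr(sndr_db) * f_s_hz)

# Measured values quoted for the prototype: 15.07 mW, 32.8 dB SNDR, 5 GS/s.
fom = walden_fom(power_w=15.07e-3, sndr_db=32.8, f_s_hz=5e9)
print(f"FOM_W = {fom * 1e15:.1f} fJ/conv.-step")
```

With these inputs the computed value agrees with the 84.5 fJ/conv.-step stated above.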

# 3.2 ADC Top Level Architecture

Figs. 3.1(a) and (b) depict the block and timing diagrams of the proposed flash ADC. An external 5-GHz clock terminated with an on-chip 50- $\Omega$  resistor is converted to a 25%-75% duty-cycle clock, allocating 50 ps to the T/H and 150 ps to the 2-stage comparison. A T/H buffer is designed to drive the comparators. The first stage carries out a 2.5-bit coarse quantization utilizing 4 voltage-domain comparators and 2 interpolation latches (ILs), dividing the full-scale range into seven subsections,  $V_{ref1\sim12}$ ,  $V_{ref13\sim20}$ ,  $V_{ref21\sim28}$ ,  $V_{ref29\sim36}$ ,  $V_{ref37\sim44}$ ,  $V_{ref45\sim52}$ , and  $V_{ref53\sim64}$ . The second stage consists of 32 voltage-domain comparators and 32 interpolation latches, which are segmented into eight slices,  $CMP_{1\sim8}$ ,  $CMP_{9\sim16}$ ,  $CMP_{17\sim24}$ ,  $CMP_{25\sim32}$ ,  $CMP_{33\sim40}$ ,  $CMP_{41\sim48}$ ,  $CMP_{49\sim56}$ , and  $CMP_{57\sim64}$ . Two adjacent slices are activated at a given time to carry out the 4-bit fine quantization. The partial activation of the second-stage slices is achieved by a slice selection logic, which generates the 1-out-of-N selection signals  $S_{1\sim7}$  based on the first-stage comparison results  $D_{1\sim6}$ . The slice selection mechanism depicted in Fig. 3.1(c) provides a ±4-LSB redundancy to tolerate the T/H buffer settling error. The slice selection logic circuit consists of a 1-out-of-N encoder followed by an OR logic stage. To meet the timing requirement in the 2-stage structure, the OR logic is embedded in the second-stage comparator circuit. The comparator offset is calibrated by adjusting the input pair's back-gate bias voltages through a successive-approximation (SA) algorithm. To start the back-gate bias from the middle point of the adjustment range, a modified R-2R digital-to-analog converter (DAC) is developed.
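The partial-activation scheme can be summarized in a few lines of code. The sketch below is a behavioral model only: it assumes that selection signal S_j enables slices j and j+1, which is inferred from the statement that two adjacent slices are active at a time and is not a transcription of the actual logic. All function names are hypothetical.

```python
# First-stage subsections of the 64 reference levels (inclusive bounds).
SUBSECTIONS = [(1, 12), (13, 20), (21, 28), (29, 36), (37, 44), (45, 52), (53, 64)]

def coarse_decision(ref_index):
    """Return j (1..7) such that selection signal S_j would be asserted."""
    for j, (low, high) in enumerate(SUBSECTIONS, start=1):
        if low <= ref_index <= high:
            return j
    raise ValueError("reference index must lie in 1..64")

def slice_is_active(slice_index, j):
    """Embedded OR (assumed mapping): slice k is clocked when S_(k-1) or S_k is asserted."""
    return j in (slice_index - 1, slice_index)

def active_comparator_span(j):
    """Comparators covered by the two active slices j and j+1 (8 per slice)."""
    return 8 * (j - 1) + 1, 8 * (j + 1)

# Example: an input near reference level 25.
j = coarse_decision(25)
low, high = active_comparator_span(j)
print(f"S_{j} asserted, CMP{low}~{high} active")
```

In this model every interior subsection is covered with a margin of 4 reference levels on each side, which corresponds to the ±4-LSB redundancy described above.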

The operation mechanism of the proposed flash ADC is depicted in Fig. 3.2. With  $V_{in} = V_{ref25}$ , the first-stage comparison identifies that  $V_{in}$  falls into the region between  $V_{ref21}$  and  $V_{ref28}$ . Then, the second-stage slice [3] (CMP<sub>17~24</sub>) and slice [4] (CMP<sub>25~32</sub>) are active to perform a fine 4-bit quantization. The 2-stage comparison results are aligned and added to generate the final 6-bit output. Both the first- and second-stage comparators utilize  $2 \times$  TDI to reduce the number of comparators by half.

Fig. 3.2. Partial activation mechanism of two-stage comparators with 2× time-domain interpolation and 0.5-bit redundancy.

# 3.3 Design Considerations of 2-Stage Comparison and 2× TDI

The 2-stage comparison in conjunction with  $2 \times$  TDI activates 12 voltage-domain comparators and 10 interpolation latches to complete a 6-bit conversion. As shown in Fig. 3.3(a), the voltage-domain comparator is implemented using a StrongArm latch and the interpolation latch is implemented using a set-reset (SR) latch. Since the power consumption of the SR latch is much less than that of the StrongArm comparator, the power efficiency is thus improved. For a dynamic latch comparator, the relation between input difference  $V_{in}$ - $V_{ref}$  and regeneration time  $T_{comp}$  is expressed by the following equation



Fig. 3.3. (a)  $2 \times TDI$  implementation. (b) Voltage-to-time conversion in  $2 \times TDI$ .

$$V_{\text{out}} = (V_{\text{in}} - V_{\text{ref}}) \times \exp\left(\frac{g_{\text{m}} T_{\text{comp}}}{C_{\text{load}}}\right), \qquad (3.1)$$

where  $g_{\rm m}$ ,  $C_{\rm load}$ ,  $V_{\rm out}$ , and  $V_{\rm ref}$  are the transconductance of the cross-coupled inverters, the load capacitance, the output voltage, and the reference voltage, respectively. With some derivations from (3.1),  $T_{\rm comp}$  can be expressed as

$$T_{\rm comp} = \frac{C_{\rm load}}{g_{\rm m}} \ln\left(\frac{V_{\rm out}}{V_{\rm in} - V_{\rm ref}}\right). \tag{3.2}$$

Fig. 3.3(b) shows the regeneration times of comparators CMP<sub>0</sub> and CMP<sub>2</sub>, and the regeneration time difference  $\Delta T_{\text{CMP}}$ . As shown in the figure, even though  $\Delta T_{\text{CMP}}$  versus  $V_{\text{in}}$  is nonlinear,  $\Delta T_{\text{CMP}} < 0$  whenever  $V_{\text{in}} < V_{\text{ref1}}$ . In that case, the SR latch output is set to 0, and vice versa. The 2× TDI is achieved by comparing the regeneration time of CMP<sub>0</sub> with that of CMP<sub>2</sub> without a time reference, thus avoiding the time reference calibration. The  $g_{\text{m}}$  mismatch between CMP<sub>0</sub> and CMP<sub>2</sub> can affect the interpolation accuracy. Monte Carlo



Fig. 3.4. Interpolation error vs.  $g_m$  mismatch between two neighboring comparators.



Fig. 3.5. The 25%-75% duty-cycle clocking scheme and timing budgets.

simulation is performed, and the results show that the 3- $\sigma$  g<sub>m</sub> mismatch between CMP<sub>0</sub> and CMP<sub>2</sub> is about 14%. This causes an interpolation error of 0.58 mV (0.074 LSB) as shown in Fig. 3.4, which can be neglected.
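The sign-based decision behind the 2× TDI can be illustrated with (3.2). In the sketch below, the  $g_m$ ,  $C_{load}$ , and output-swing values, the reference voltages, and the sign convention  $\Delta T_{CMP} = T_{CMP2} - T_{CMP0}$  are assumptions chosen only so that the behavior matches the description above. The model is ideal and ignores noise and mismatch.

```python
import math

def regeneration_time(v_in, v_ref, gm=1e-3, c_load=5e-15, v_out=1.0):
    """Regeneration time from (3.2); gm, c_load and v_out are placeholder values."""
    return (c_load / gm) * math.log(v_out / abs(v_in - v_ref))

def interpolated_bit(v_in, v_ref0, v_ref2):
    """Emulate the SR-latch decision at the interpolated midpoint level.

    Assumed convention: delta_t = T(CMP2) - T(CMP0), so delta_t < 0 when the
    input lies below the midpoint between the two reference levels.
    """
    delta_t = regeneration_time(v_in, v_ref2) - regeneration_time(v_in, v_ref0)
    return 1 if delta_t > 0 else 0

# Hypothetical reference levels; the interpolated threshold is their midpoint (0.308 V).
V_REF0, V_REF2 = 0.300, 0.316
print(interpolated_bit(0.305, V_REF0, V_REF2), interpolated_bit(0.311, V_REF0, V_REF2))
```

Because only the sign of the time difference is used, the nonlinearity of the voltage-to-time conversion does not shift the interpolated threshold in this ideal model.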

The 2-stage comparison structure, while reducing the number of activated comparators, imposes a stringent speed requirement on the comparators. To solve this problem, a 25%-75% duty-cycle clock scheme is developed as depicted in Fig. 3.5, where a total of 120 ps is assigned to the first- and second-stage comparators. The T/H buffer and the second-stage slice selection logic occupy 11 ps and 19 ps, respectively. The 25%-75%



Fig. 3.6. T/H buffer settling error tolerance (a) without 0.5-bit redundancy and (b) with 0.5-bit redundancy.

clock scheme increases the power consumption of the clock generation and the T/H circuit by 0.52 mW and 0.2 mW, respectively. On the other hand, the power saved in the activated comparators due to the relaxed speed requirement is around 4.9 mW. Hence, the overall ADC power efficiency is improved. The metastability error rate of this 2-stage work is around  $3 \times 10^{-7}$ . To further reduce the metastability error rate, soft-decision selection [32] or a metastability detector with bypass logic [60] can be applied. It should be mentioned that a 2-stage pipelined structure can also relax the comparator speed requirement. However, its overall conversion speed is typically limited by the required inter-stage residual amplifier (RA) [61]. While passive residual transfer with a TI structure in the second stage eliminates the RA [62], it requires inter-channel mismatch calibration and increases die area.

With an 11-ps timing budget, the T/H buffer requires a high bandwidth in order to maintain the effective resolution of the ADC. However, it is difficult to achieve a high bandwidth as the buffer needs to drive both the first- and second-stage comparators. To address this problem, a 0.5-bit redundancy is developed in the first comparison stage. As depicted in Fig. 3.6, both cases start the coarse quantization when the T/H buffer output has a 4-LSB settling error. Only the case with a 0.5-bit redundancy produces a correct digital output, leveraging the redundant ±4-LSB coverage. The following quantitatively analyzes



Fig. 3.7. (a) T/H buffer input and output waveforms. (b) T/H buffer bandwidth vs. settling time with 5-GHz sampling frequency and 6-bit accuracy.

the T/H buffer bandwidth requirement with and without redundancy bit. Fig. 3.7(a) depicts the T/H buffer input and output waveforms. At the end of the tracking period, the maximum voltage deviation  $\Delta V_{buf}$  between the input and the output of the T/H buffer can be obtained as [46]

$$\Delta V_{\rm buf} = \left| 2A \times \sin\left(0.5 \tan^{-1}\left(\frac{f_{\rm in}}{f_{\rm buf}}\right)\right) \right|,\tag{3.3}$$

where A and  $f_{in}$  are the amplitude and frequency of the input signal applied to the T/H, and  $f_{buf}$  is the T/H buffer bandwidth. During the hold period, the input signal of the buffer is constant, and the output signal approaches the constant input exponentially over time. After a settling time of  $\Delta t_s$ , the  $\Delta V_{buf}$  is reduced to  $\Delta V_{buf, settle}$ . The  $\Delta V_{buf, settle}$  is designed to be less than 0.5 LSB. Such a requirement can be described as

$$\left|2A \times \sin\left(0.5 \tan^{-1}\left(\frac{f_{\text{in}}}{f_{\text{buf}}}\right)\right)\right| e^{-2\pi \,\Delta t_{\text{s}} f_{\text{buf}}} \le \frac{2A}{2^{N+1}},\tag{3.4}$$

where N is the resolution of the ADC. From (3.4),  $\Delta t_s$  versus  $f_{buf}$  in the Nyquist condition is obtained as



Fig. 3.8. Simulated SNDR versus T/H buffer bandwidth w/ and w/o 0.5-bit redundancy.

$$\Delta t_{\rm s} \ge \frac{1}{2\pi f_{\rm buf}} \ln \left\{ 2^{\rm N+1} \sin \left[ 0.5 \tan^{-1} \left( \frac{0.5 f_{\rm s}}{f_{\rm buf}} \right) \right] \right\},\tag{3.5}$$

where  $f_s$  is the sampling frequency. In this design, the 0.5-bit redundancy in the first stage is used to tolerate a settling error  $\Delta V_{\text{buf, settle}}$  of up to ±4 LSBs as described in Section 3.2. This is equivalent to a 3-bit redundancy when referred to the LSB. As a result, the effective *N* in (3.5) is reduced by 3. With  $f_s$ =5 GHz, the  $f_{\text{buf}}$  versus  $\Delta t_s$  curves with and without the 0.5-bit redundancy in the first stage are plotted in Fig. 3.7(b). As shown in the figure, with an 11-ps timing budget, the 0.5-bit redundancy greatly relaxes the T/H buffer bandwidth requirement from 26.2 GHz to 9.9 GHz. The analysis results are also validated by a behavioral model simulation of a 2-stage (2.5 MSBs + 4 LSBs) flash ADC as depicted in Fig. 3.8. With the ADC maintaining an SNDR of 37.88 dB, the T/H buffer bandwidth requirement is alleviated from 26.8 GHz to 9.6 GHz by employing the 0.5-bit redundancy, which is in line with the analysis results. Although the 0.5-bit redundancy requires two more comparators and one more interpolation latch, the power consumption of the T/H buffer is substantially reduced due to the alleviated bandwidth requirement, which reduces the overall ADC power consumption by about 2.3 mW.
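The bandwidth figures above can be cross-checked by solving (3.5) numerically for  $f_{buf}$ . The sketch below uses a simple bisection (an implementation choice made here, not part of the original analysis) with the 11-ps budget,  $f_s$ = 5 GHz, and an effective resolution of 6 bits without and 3 bits with the redundancy. The function names and the search interval are illustrative.

```python
import math

def settling_time(f_buf_hz, f_s_hz, n_bits):
    """Minimum settling time from (3.5) for a Nyquist-rate input, in seconds."""
    phase_term = math.sin(0.5 * math.atan(0.5 * f_s_hz / f_buf_hz))
    return math.log(2 ** (n_bits + 1) * phase_term) / (2 * math.pi * f_buf_hz)

def required_bandwidth(budget_s, f_s_hz, n_bits, lo=1e9, hi=100e9):
    """Bisect for the smallest bandwidth whose settling time fits the budget."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if settling_time(mid, f_s_hz, n_bits) > budget_s:
            lo = mid  # settling too slow: more bandwidth is needed
        else:
            hi = mid
    return hi

F_S = 5e9
BUDGET = 11e-12
bw_without = required_bandwidth(BUDGET, F_S, n_bits=6)  # no redundancy
bw_with = required_bandwidth(BUDGET, F_S, n_bits=3)     # effective N reduced by 3
print(f"{bw_without / 1e9:.1f} GHz without, {bw_with / 1e9:.1f} GHz with redundancy")
```

Under these assumptions the solver returns values close to the 26.2 GHz and 9.9 GHz obtained in the analysis.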

Under the 9.9 GHz T/H buffer bandwidth condition, we next compare the comparator speed requirement in the proposed two-stage structure with the conventional single-stage design. As shown in Fig. 3.7(b), the T/H buffer requires a settling time of 44 ps in the conventional structure, and thus the timing budget available for the comparator is 56 ps. In the proposed 2-stage flash structure, a total of 120 ps is allocated to both comparator stages. Thus, the averaged timing budget for each stage is 60 ps, which is comparable to that of the conventional structure.
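The timing budgets discussed above reduce to simple arithmetic, sketched below. The 50% duty cycle used for the conventional single-stage case is an assumption inferred from the quoted 56-ps comparator budget; the helper name is illustrative.

```python
def track_and_hold_ps(f_s_hz, track_fraction):
    """Split one clock period into tracking and hold times, in picoseconds."""
    period_ps = 1e12 / f_s_hz
    track_ps = period_ps * track_fraction
    return track_ps, period_ps - track_ps

F_S = 5e9

# Proposed 25%-75% clock: the hold phase is shared by the T/H buffer settling
# (11 ps), the slice selection logic (19 ps), and the two comparator stages.
track_ps, hold_ps = track_and_hold_ps(F_S, 0.25)
comparator_budget_ps = hold_ps - 11 - 19
per_stage_ps = comparator_budget_ps / 2

# Conventional single-stage flash (assumed 50% duty cycle) with 44-ps settling.
_, conv_hold_ps = track_and_hold_ps(F_S, 0.5)
conv_comparator_ps = conv_hold_ps - 44

print(track_ps, hold_ps, comparator_budget_ps, per_stage_ps, conv_comparator_ps)
```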

The second-stage slice selection logic, consisting of a 1-out-of-N encoder followed by an OR logic stage, has a timing budget of 19 ps. In this design, the OR function is embedded in the second-stage comparator as shown in Fig. 3.9(a). The comparator in slice [4] is used as an example. It has two clocked tail transistors controlled by  $S_3$  and  $S_4$ , respectively, and only one selection signal is active at a time. Similarly, each reset transistor is implemented by two series-connected transistors which are also controlled by  $S_3$  and  $S_4$ . The slice selection logic with and without the embedded OR function is simulated, and



Fig. 3.9. (a) Slice selection logic with OR function embedded in the second-stage comparator. (b) Selection signal with embedded OR saves 16 ps.

the result is plotted in Fig. 3.9(b). As shown in the figure, the embedded OR function shortens the selection logic delay by 16 ps. The 19-ps timing budget is sufficient for the 1-out-of-N encoder, which is implemented using AND logic.

# 3.4 Bandwidth Analysis of the T/H with A 25%-75% Duty-Cycle Clock

The 25%-75% duty-cycle clock, while alleviating the comparator speed requirement, reduces the T/H tracking time by half. This demands a higher T/H tracking bandwidth to maintain the tracking accuracy. The tracking bandwidth requirement of the T/H under the 25%-75% clock is studied in this section. As depicted in Fig. 3.10(a), where  $T = \frac{1}{f_s}$ , the sinusoidal signal applied to the T/H can be expressed as

$$V_{\rm in, T/H}(t) = A \left[ 1 + \sin \left( 2\pi f_{\rm in} t + \varphi_0 \right) \right], \tag{3.6}$$



Fig. 3.10. (a) T/H input and output under the 5-GHz 25%-75% duty-cycle clock condition. (b) T/H circuit during the tracking period and its equivalent first-order RC model.

where  $\varphi_0$  is the phase of the input signal at the sampling clock rising edge, *t*=0. During the tracking period, the T/H can be modeled as a first-order low-pass filter with a time constant  $\tau = R_{ON} \cdot (C_S + C_G)$ , as depicted in Fig. 3.10(b). Here,  $R_{ON}$  is the drain-to-source on-resistance of the NMOS transistor, and  $C_S$  and  $C_G$  are the sampling capacitor and the T/H buffer gate capacitance, respectively. The T/H output  $V_{out, T/H}(t)$  satisfies the following differential equation,

$$\tau \frac{dV_{\text{out, T/H}}(t)}{dt} + V_{\text{out, T/H}}(t) = V_{\text{in, T/H}}(t). \tag{3.7}$$

Solving (3.7), $V_{\text{out, T/H}}(t)$ can be obtained as

$$V_{\text{out, T/H}}(t) = A \left[ 1 + \frac{\sin\left(2\pi f_{\text{in}}t + \varphi_0 + \Delta\varphi\right)}{\alpha} \right] + \left[ V_{\text{out, T/H}}(0) - A - A \frac{\sin(\varphi_0 + \Delta\varphi)}{\alpha} \right] e^{-\frac{t}{\tau}}, \tag{3.8}$$

where  $\Delta \varphi = \tan^{-1}(2\pi \tau f_{in})$  and  $\alpha = \sqrt{1 + (2\pi \tau f_{in})^2}$  are the phase shift and the signal amplitude attenuation, respectively.  $V_{\text{out,T/H}}(0)$  is derived with  $V_{\text{out,T/H}}(0) = V_{\text{out,T/H}}(-0.75\text{T}) \approx V_{\text{in,T/H}}(-0.75\text{T})$  as shown in Fig. 3.10(a). Since  $2\pi \tau f_{in} \ll 1$ ,  $\Delta \varphi$  and  $\alpha$  are thus approximately equal to 0 and 1, respectively. Therefore,  $V_{\text{out,T/H}}(t)$  can be approximated as

$$V_{\text{out, T/H}}(t) \approx V_{\text{in, T/H}}(t) + A \left[ \sin\left(-\frac{3\pi f_{\text{in}}}{2f_{\text{s}}} + \varphi_0\right) - \sin(\varphi_0) \right] e^{-\frac{t}{\tau}}, \tag{3.9}$$

where the first term is the desired T/H output signal, and the second term is the tracking error,  $\Delta V_{\text{T/H}}(t)$ . It can be shown that in the Nyquist condition,  $f_{\text{in}} = 0.5 f_{\text{s}}$ , the worst-case  $\Delta V_{\text{T/H}}(t)$  happens at  $\varphi_0 = \frac{3\pi}{8}$ , and is

$$\left|\Delta V_{\text{T/H, worst}}(t)\right| \approx 1.848A \cdot e^{-\frac{t}{\tau}}. \tag{3.10}$$

Under the 25%-75% clocking condition, the tracking time is 50 ps. In this design, the



Fig. 3.11. Schematics of the T/H and T/H buffer.



Fig. 3.12. (a) First-stage comparator with kickback noise mitigation. (b) Simulation results of the kickback noise mitigation.

tracking error at the end of the tracking period is designed to be less than 0.1 LSB, and thus the T/H tracking bandwidth requirement is derived as

$$f_{\text{tracking, T/H}} \ge \frac{2f_{\text{s}}\ln(9.24 \cdot 2^{N})}{\pi}. \tag{3.11}$$

With $f_s$ = 5 GHz and N = 6, the T/H tracking bandwidth requirement under the 25%-75% clock is 20.32 GHz according to Eq. 3.11. Although a sinusoidal waveform is used in the above analysis, the methodology is also applicable to other input waveforms. With a step input, the tracking bandwidth requirement is 20.56 GHz. The proposed ADC has sufficient T/H tracking bandwidth because the T/H buffer substantially reduces the T/H load capacitance.
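As a numerical cross-check of Eqs. 3.9 to 3.11, the following Python sketch sweeps $\varphi_0$ to find the worst-case error coefficient at Nyquist and then evaluates the tracking bandwidth bound. It assumes an LSB of $2A/2^N$ (full scale of $2A$) and a tracking window of $T/4$. The function names are illustrative, and the step-input case assumes a full-scale step of $2A$, which is one plausible reading rather than a derivation given in the text.

```python
import math


def worst_case_coeff(hold_fraction, fin_over_fs=0.5, steps=100_000):
    """Sweep phi0 over [0, pi) and return (max |error|/A at t = 0, phi0 at the max).

    The coefficient is |sin(phi0 - shift) - sin(phi0)|, where shift is the input
    phase accumulated over the hold phase (hold_fraction of a clock period).
    """
    shift = 2.0 * math.pi * fin_over_fs * hold_fraction
    best_coeff, best_phi = 0.0, 0.0
    for k in range(steps):
        phi0 = math.pi * k / steps  # |error| repeats with period pi in phi0
        coeff = abs(math.sin(phi0 - shift) - math.sin(phi0))
        if coeff > best_coeff:
            best_coeff, best_phi = coeff, phi0
    return best_coeff, best_phi


def tracking_bandwidth_hz(fs_hz, n_bits, track_fraction, coeff, err_lsb=0.1):
    """Minimum first-order bandwidth so that coeff*A*exp(-t/tau) <= err_lsb LSB."""
    ratio = coeff * 2 ** n_bits / (2.0 * err_lsb)  # uses LSB = 2A / 2^N
    t_track = track_fraction / fs_hz               # tracking window in seconds
    tau_max = t_track / math.log(ratio)
    return 1.0 / (2.0 * math.pi * tau_max)


coeff, phi0 = worst_case_coeff(hold_fraction=0.75)
f_sine = tracking_bandwidth_hz(5e9, 6, 0.25, coeff)
f_step = tracking_bandwidth_hz(5e9, 6, 0.25, 2.0)  # assumed full-scale step of 2A
print(f"worst-case coefficient {coeff:.3f} at phi0 = {phi0 / math.pi:.3f} pi")
print(f"sine input: {f_sine / 1e9:.2f} GHz, step input: {f_step / 1e9:.2f} GHz")
```

Under these assumptions the sweep recovers the 1.848 coefficient at $\varphi_0 = 3\pi/8$, and both bandwidth bounds land within rounding of the values quoted above.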

### **3.5** Circuit Implementation

### 3.5.1 T/H and T/H Buffer

The T/H and T/H buffer schematics are illustrated in Fig. 3.11. The T/H circuit is bootstrapped to achieve an $R_{on}$ of 12 $\Omega$ and has a loading capacitance of 62 fF, including 40 fF from the sampling capacitor and 22 fF from the T/H buffer parasitics. Moreover, a pair of cross-coupled dummy switches is employed to avoid signal coupling [63]. The T/H buffer is based on a source-follower structure with an embedded RC degeneration filter to improve the bandwidth [32]. The T/H buffer drives a total capacitance of 380 fF, including 234 fF from the routing interconnects and 146 fF from the comparator bank input parasitics.
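To relate these component values to the requirement of Section 3.4, the sketch below evaluates the first-order tracking bandwidth $1/(2\pi R_{on} C)$ of the sampling network and the residual tracking error after the 50 ps window. It assumes the simple RC model of Fig. 3.10(b) with the 62 fF total load and ignores the source impedance and the bootstrap circuit dynamics, so it is an idealized estimate rather than a simulated value.

```python
import math

R_ON = 12.0               # ohm, bootstrapped switch on-resistance (from the text)
C_LOAD = 40e-15 + 22e-15  # F, sampling capacitor plus T/H buffer parasitic
T_TRACK = 50e-12          # s, tracking window at 5 GS/s with a 25% duty cycle
N_BITS = 6

tau = R_ON * C_LOAD
f_3db = 1.0 / (2.0 * math.pi * tau)

# Worst-case sine tracking error of Eq. 3.10, expressed in LSB (LSB = 2A / 2^N).
err_lsb = 1.848 * math.exp(-T_TRACK / tau) * 2 ** N_BITS / 2.0

print(f"tau = {tau * 1e12:.3f} ps, f_3dB = {f_3db / 1e9:.0f} GHz")
print(f"residual tracking error = {err_lsb:.2e} LSB")
```

Under this idealized model the switch bandwidth is roughly ten times the 20.32 GHz requirement.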

#### **3.5.2** Comparator with Kick-Back Noise Mitigation

Both the first- and second-stage dynamic comparators use a StrongArm latch structure, in which MOS capacitors are employed to mitigate the kickback noise [64]. Using the first-stage comparator as an example, as shown in Fig. 3.12(a), cross-coupled MOS capacitors $M_1$ and $M_2$ are employed to neutralize the kickback noise from the cross-coupled inverters ($M_3$-$M_6$), and MOS capacitors $M_7$ and $M_8$ are utilized to mitigate the kickback noise from the tail transistor ($M_9$). The simulated kickback noise is depicted in Fig. 3.12(b), which shows that the MOS capacitors reduce the kickback noise by $5\times$ when $|V_{in}-V_{ref}|$ reaches the full-scale voltage of 500 mV.

# 3.5.3 25%-75% Duty-Cycle Clock Generation

Fig. 3.13 shows the schematics of the input clock buffer and the generation of the 25% duty-cycle sampling clock. The input clock buffer, along with the differential-to-single-ended converter, converts the sinusoidal input into a square-wave clock. The 25%-75% duty-cycle clock is generated with an AND gate using a 50% duty-cycle clock and its 50-ps-delayed version as inputs. This delay is generated by inverter-based delay cells with MOS capacitor loads. To compensate for PVT variations, the MOS capacitors are programmable through 6-bit control registers, and the LSB MOS capacitor is sized at 2 $\mu$m/30 nm, which is equivalent to 1.8 fF. The 25%-75% duty-cycle clock circuit consumes 0.52 mW from a 0.9 V supply.
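The AND-based pulse generation can be illustrated with ideal waveforms. The sketch below assumes perfect 50% duty-cycle inputs with zero rise and fall times, evaluated on a 1 ps grid; `clock_high` and `and_gate_duty_cycle` are illustrative names, not blocks of the actual design.

```python
def clock_high(t_ps, period_ps, delay_ps=0):
    """Ideal 50% duty-cycle clock: high during the first half of each period."""
    return (t_ps - delay_ps) % period_ps < period_ps // 2


def and_gate_duty_cycle(period_ps=200, delay_ps=50):
    """Duty cycle of clk AND delayed-clk, evaluated on a 1 ps grid."""
    high = sum(
        1
        for t in range(period_ps)
        if clock_high(t, period_ps) and clock_high(t, period_ps, delay_ps)
    )
    return high / period_ps


# 5 GHz clock (200 ps period) with the nominal 50 ps delay.
print(and_gate_duty_cycle(200, 50))
# A 10 ps delay error changes the pulse width by the same amount.
print(and_gate_duty_cycle(200, 60))
```

In this idealized model the output pulse width equals half the period minus the delay, so a delay error maps directly onto a duty-cycle error, which is consistent with making the delay programmable.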



Fig. 3.13. Block diagram of 25% duty cycle clock generation.

# 3.6 Measurements

# 3.6.1 Measurement Setup

The 5-GS/s 6-bit 2-stage flash ADC prototype is fabricated in a 28-nm FDSOI CMOS process. The ADC die photo and layout are presented in Fig. 3.14(a) and



Fig. 3.14. (a) Die photo, (b) chip layout, and (c) PCB.



Fig. 3.15. Measurement setup block diagram.

(b) with an active area of 240 μm × 390 μm. To reduce the signal and clock path routing parasitics, the ADC core is positioned in the upper left corner of the chip. The T/H buffer and the two-stage comparator banks are placed close together to minimize the parasitics as well. Both the first- and second-stage comparator banks are laid out as vertical strips to simplify the routing of the subsection activation encoder. The offset calibration blocks are grouped into 2 parts, which are placed at the top and bottom of the core area, respectively. The bare die is packaged in a quad flat no-lead (QFN) package and then surface-mounted on a custom-designed PCB. The overall PCB is shown in Fig. 3.14(c). The ADC measurement setup diagram is illustrated in Fig. 3.15. Two high-performance RF and microwave signal generators, SMA 100B-B112 and MG3692B, are used to generate the ADC input signal and the clock source, respectively. Two broadband baluns HL9402 are employed to convert the single-ended high-frequency signals into fully



Fig. 3.16. Lab measurement setup.



Fig. 3.17. Data capture.

differential ones. The ADC output data is decimated by a factor of 55, and the decimated data is captured by the Tektronix 8-channel MSO58 oscilloscope. A laptop is used to analyze the captured data from the oscilloscope and evaluate the ADC performance. The Xilinx SP6 FPGA programs the ADC chip through the SPI (Serial Peripheral Interface) for initial ADC configuration, offset calibration, and measurement. The measurement setup in the laboratory and the data capture are presented in Fig. 3.16 and Fig. 3.17, respectively.

### 3.6.2 Measurement Results

Fig. 3.18 shows the measured differential and integral nonlinearities (DNL/INL). Before calibration, the peak DNL and INL are +1.27/-1.02 and +2.48/-1.42 LSB,



Fig. 3.18. Measured DNL and INL before and after offset calibration.

respectively. They are reduced to +0.5/-0.41 and +0.54/-0.71 LSB after calibration. The FFT plots before and after comparator offset calibration with a 117.18 MHz input are illustrated in Fig. 3.19. Since the ADC output codes are decimated by 55, the signal tone in the output spectrum is folded back to 26.28 MHz. With the comparator offset calibration, the SNDR and SFDR are improved from 27.8 dB and 33.97 dB to 36.1 dB and 43.1 dB, respectively. With a 2.451 GHz input, the signal tone in the output spectrum is folded back



Fig. 3.19. Measured output spectrum before and after comparator calibration with a low frequency input (decimated by 55).



Fig. 3.20. Measured output spectrum before and after comparator calibration with a near Nyquist frequency input (decimated by 55).

to 3.545 MHz after a decimation of 55. As depicted in Fig. 3.20, after the comparator offset calibration, the SNDR and SFDR are improved from 27.4 dB and 32.8 dB to 35.46 dB and 41.82 dB, respectively. The dynamic performance of the proposed ADC at 5 GS/s is shown in Fig. 3.21, where the ADC achieves an SNDR greater than 32.8 dB over the entire Nyquist bandwidth. The measured SNDR versus the sampling frequency with a 200 MHz input is depicted in Fig. 3.22. It shows an SNDR over 33.6 dB with the sampling frequency



Fig. 3.21. Measured SNDR/SFDR vs. input frequency at 5 GS/s.



Fig. 3.22. Measured SNDR/SFDR vs. sampling frequency with a 200 MHz input.

increasing up to 5 GHz. However, when the sampling rate reaches 5.5 GS/s, the SNDR drops to 23.3 dB due to insufficient time for the second-stage comparison. The prototyped ADC achieves an SNDR of 32.8 dB and an SFDR of 41.8 dB at Nyquist frequency and consumes 15.07 mW, translating into a Walden FOM of 84.5 fJ/conv.-step. Fig. 3.23 shows the power breakdown of the ADC, where the first- and second-stage comparators consume 51.6% of the total power. Table 1 summarizes the ADC



Fig. 3.23. ADC power breakdown at 5 GS/s.

performance and compares it with recently published multi-GS/s 6-bit flash ADCs. The proposed ADC achieves an SNDR of 32.8 dB at Nyquist frequency with competitive power efficiency. With the timing constraint addressed by the 25%-75% duty-cycle clock, the 0.5-bit redundancy, and the embedded slice selection logic, the conversion speed of the proposed ADC reaches 5 GS/s, which is the highest among the 2-stage works [31]–[33]. Furthermore, as compared to [29], [33], which require PVT-sensitive and complex architecture-level calibrations, the proposed ADC only calibrates the comparator offset and needs no architecture-level calibration. The comparator offset is calibrated by adjusting the transistor back-gate bias voltages, and the adjustment is conducted with a successive-approximation search algorithm, which will be discussed in the next chapter.
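The folded tone locations quoted in the measurement results follow from aliasing at the decimated output rate of $f_s/55$. A minimal sketch of that calculation is given below, assuming plain sub-sampling without filtering. `folded_tone_hz` is an illustrative helper, and the input frequencies are the rounded values given in the text, so the low-frequency case is reproduced only to within that rounding.

```python
def folded_tone_hz(f_in_hz, fs_hz, decimation):
    """Apparent tone frequency after sub-sampling by an integer factor."""
    fs_dec = fs_hz / decimation          # effective output sample rate
    f = f_in_hz % fs_dec                 # alias into [0, fs_dec)
    return min(f, fs_dec - f)            # fold into the first Nyquist zone


FS = 5e9
DECIMATION = 55
for f_in in (117.18e6, 2.451e9):
    f_out = folded_tone_hz(f_in, FS, DECIMATION)
    print(f"{f_in / 1e6:8.2f} MHz input -> {f_out / 1e6:.3f} MHz after decimation")
```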

| Specifications                | Wang [23]   | Liu [28]   | Cai [32]           | Yi [29]  | Yang [33]                    | This work                  |
|-------------------------------|-------------|------------|--------------------|----------|------------------------------|----------------------------|
| Publication                   | TCAS-II '17 | A-SSCC '15 | JSSC '17           | JSSC '21 | TCAS-II '21                  |                            |
| Architecture                  | 2× Folding  | 4× TDI     | 2-Stage            | 8× TDI   | Subrange with ER<sup>b</sup> | 2-Stage with 2× TDI        |
| Technology [nm]               | 16          | 65         | 65                 | 65       | 28 FDSOI                     | 28 FDSOI                   |
| Resolution [bits]             | 6           | 6          | 6                  | 6        | 6                            | 6                          |
| Sampling Rate [GS/s]          | 4.0625      | 3.4        | 3.125<sup>a</sup>  | 6        | 4                            | 5                          |
| Supply Voltage [V]            | 0.9         | 1          | 1                  | 1        | 1                            | 0.9                        |
| SNDR<sub>@Nyquist</sub> [dB]  | 30.6        | 34.2       | 29.57<sup>a</sup>  | 31.8     | 30.7                         | 32.8                       |
| SFDR<sub>@Nyquist</sub> [dB]  | 40.8        | 46.08      | 41<sup>a</sup>     | 41       | 40                           | 41.82                      |
| ENOB<sub>@Nyquist</sub> [bits] | 4.79       | 5.39       | 4.62<sup>a</sup>   | 4.99     | 4.8                          | 5.15                       |
| Power [mW]                    | 34.4        | 12.6       | 7.025<sup>a</sup>  | 15.1     | 3                            | 15.07                      |
| FOM [fJ/conv.-step]           | 306         | 89         | 90<sup>a</sup>     | 85       | 26.8                         | 84.5                       |
| Core Area [mm<sup>2</sup>]    | N/A         | 0.034      | 0.22<sup>a</sup>   | 0.021    | 0.034                        | 0.094                      |
| Comparator Offset Cal.        | Off-Chip    | On-Chip    | Off-Chip           | Off-Chip | On-Chip                      | On-Chip (SA<sup>c</sup>)   |
| Architecture-Level Cal.       | No          | No         | No                 | Yes      | Yes                          | No                         |

Table 1 Performance Summary and Comparison with State-of-the-Art Flash ADCs.

<sup>a</sup>Single channel, <sup>b</sup>embedded reference, and <sup>c</sup>successive approximation (SA).
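The ENOB and Walden FOM entries for this work can be reproduced from the measured SNDR, power, and sampling rate using the standard definitions $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$ and $\mathrm{FOM}_W = P / (2^{\mathrm{ENOB}} f_s)$. The short sketch below is a sanity check with illustrative function names, applied only to the proposed ADC.

```python
def enob_bits(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2.0 ** enob_bits(sndr_db) * fs_hz) * 1e15


enob = enob_bits(32.8)
fom = walden_fom_fj(15.07e-3, 32.8, 5e9)
print(f"ENOB = {enob:.2f} bits, FOM = {fom:.1f} fJ/conv.-step")
```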

# 3.7 Conclusion

This chapter presents a 5-GS/s 6-bit 15.07-mW flash ADC in 28-nm FDSOI CMOS technology. The ADC jointly employs partially active 2-stage comparison and 2× time-domain latch interpolation to improve power efficiency and avoid architecture-level calibration. To address the stringent timing constraint in the 2-stage comparison structure, three methods are developed: a 25%-75% duty-cycle clock, a 0.5-bit redundancy in the first comparison stage, and an embedded second-stage slice selection logic. The T/H and T/H buffer bandwidth requirements under the 25%-75% duty-cycle clock condition are theoretically analyzed. Measurement results show that the ADC achieves an SNDR of 32.8 dB and an SFDR of 41.82 dB at Nyquist frequency, leading to a Walden FOM of 84.5 fJ/conv.-step.

# **CHAPTER IV**

# TWO-WAY TIME-INTERLEAVED FLASH ADC WITH SUCCESSIVE-APPROXIMATION COMPARATOR OFFSET CALIBRATION

### 4.1 Motivation

To achieve tens of giga-samples-per-second speed, TI flash ADCs have been widely studied [31], [32], [40]–[44]. Although [41] achieves a competitive FOM<sub>w</sub> of about 130 fJ/conv.-step, its 8 sub-channels operating at 2.5 GS/s still require timing skew calibration. [46] develops a single-core flash ADC that achieves a conversion speed of 24 GS/s in a 28 nm LP CMOS process. Although timing skew calibration is eliminated in that work, the ADC power dissipation is about 0.4 W. To further alleviate the timing skew calibration as compared to [41] while avoiding the high power consumption of [46], a two-way time-interleaved flash ADC with voltage-domain interpolation is developed. The speed of each sub-ADC, as compared to that of the ADC in Chapter III, is increased to 10 GS/s by adding a pre-amplifier stage in front of the StrongArm latch comparator. Meanwhile, to maintain ADC power efficiency, 2× voltage-domain interpolation (VDI) is developed to reduce the number of pre-amplifiers by half. To calibrate the comparator offset without a speed penalty, the FDSOI back-gate bias is applied. The comparator offset calibration loop is designed with a successive-approximation (SA) search algorithm and implemented on-chip. Fabricated in a 28-nm FDSOI CMOS process, the 20-GS/s 6-bit two-way time-interleaved flash ADC achieves an SNDR of 31.2 dB at Nyquist frequency with a power consumption of 204 mW, translating into a Walden FOM of 344 fJ/conv.-step. This chapter is organized as follows. Section 4.2 describes the architecture



Fig. 4.1. The proposed 2-way time-interleaved flash ADC with SA-based comparator offset calibration scheme.

of the proposed 2-way time-interleaved flash ADC. Section 4.3 presents the high-speed comparator design and 2× voltage-domain interpolation. Section 4.4 introduces the SA-based comparator offset calibration scheme and design details. Section 4.5 presents the wideband T/H bandwidth analysis and derivations. Section 4.6 briefly describes the circuit implementation of the major blocks in this ADC. Section 4.7 provides the ADC testing methods and experimental results, and Section 4.8 concludes the chapter.

# 4.2 ADC Top Level Architecture

Fig. 4.1 depicts the block diagram of the proposed flash ADC with 2 sub-channels driven by a 10-GHz differential clock. In each sub-channel, the high-frequency analog input signal is directly sampled by a 48 $\mu$m/30 nm low-$V_{TH}$ NMOS transistor with a 1-V forward back-gate bias, which reduces the transistor's on-resistance to around 14 $\Omega$. The total sampling capacitance, consisting of the sampling capacitor and the parasitics from the T/H buffer, is around 100 fF. The T/H buffer employs an RC source-degenerated structure to improve the bandwidth and drives the 6-bit comparator array with 2× VDI. Each comparator consists of a pre-amplifier followed by a StrongArm latch and has a dedicated on-chip offset calibration loop, which stores the comparator offset polarity information and then adjusts the back-gate bias of the StrongArm latch input pair to compensate for the comparator offset. The thermometer data from the 6-bit comparator array are converted to binary codes by a static-logic fat-tree encoder. The high-speed comparison data from both sub-ADCs are decimated by a ratio of 113 and then multiplexed for off-chip measurement.

# 4.3 High-Speed Comparator with 2× Voltage-Domain Interpolation

Fig. 4.2 shows the high-speed comparator, which consists of a pre-amplifier followed by a StrongArm latch. The objectives of the pre-amplifier are to improve the comparator gain, thereby enhancing the regeneration speed, and to isolate the T/H output signal from the kickback noise. In this work, the pre-amplifier is designed to achieve 7-dB gain with a bandwidth of 12 GHz, which allows the comparator to operate at 10 GHz. To further enhance the speed, a fully CML comparator [46] could be used, but at the cost of significantly higher power consumption.



Fig. 4.2. Schematic of the pre-amplifier and the StrongArm latch.



Fig. 4.3. (a) Interpolation block diagram and (b) voltage interpolation curve.

To reduce the comparator power consumption, 2× VDI is applied to the comparator array as depicted in Fig. 4.3(a). Since the output signal of the pre-amplifier is proportional to its input signal, the zero-crossing of the adjacent pre-amplifier outputs can be used to extract the LSB, as shown in Fig. 4.3(b). The power consumptions of the pre-amplifier and the StrongArm latch are 0.8 mW and 1 mW, respectively. The interpolation saves half of the pre-amplifiers, and thus the power consumption of the 6-bit comparator bank is reduced by 19% as compared to the conventional 6-bit design.

### 4.4 Comparator Offset Calibration Analysis

Due to transistor mismatches, comparator random offset is a critical issue that significantly degrades the flash ADC performance. Various offset calibration approaches have been proposed to calibrate the comparator offset. However, they either introduce a speed penalty to the comparator or have a limited calibration range. In this section, transistor back-gate bias in the FDSOI CMOS technology is investigated to perform the comparator offset calibration, providing a sufficient calibration range without impairing speed performance. Furthermore, the comparator offset calibration loop is designed with a successive-approximation (SA) search algorithm and implemented on-chip.

### 4.4.1 Transistor Threshold Voltage Adjustment with FDSOI Back-Gate Bias

Adjusting the transistor bulk voltage  $V_{\rm B}$  changes the transistor threshold voltage  $V_{\rm TH}$  accordingly. The body bias effect can be expressed as follows

$$V_{\rm TH} = V_{\rm TH0} + \gamma \left( \sqrt{\left| V_{\rm SB} + 2\phi_{\rm F} \right|} - \sqrt{\left| 2\phi_{\rm F} \right|} \right), \tag{4.1}$$

where $V_{SB}$ is the transistor source-to-bulk voltage, $V_{TH0}$ is the threshold voltage when $V_{SB}$ is 0 V, $\gamma$ is the body effect coefficient, and $\phi_F$ is the Fermi potential. While body biasing can be employed in both bulk CMOS and FDSOI CMOS, the latter enables a much wider



Fig. 4.4. Simplified cross-sections of (a) Bulk CMOS and (b) FDSOI CMOS [65].



Fig. 4.5. (a) Simulation results of the threshold voltage vs the back-gate voltage of LVT NMOS in 28nm FDSOI. (b) Simulated input-referred offset of the comparator.

adjustment range. Fig. 4.4 shows the simplified cross sections of bulk CMOS and FDSOI CMOS. Unlike bulk CMOS, where the maximum body bias voltage is constrained by the source-to-bulk P-N junction leakage and potential latch-up, FDSOI CMOS employs a buried oxide (BOX) layer for isolation and a ground plane (GP) implant for improving the body biasing efficiency. In addition, the isolation provided by the BOX layer allows a much higher bulk voltage $V_{\rm B}$, therefore achieving a wider $V_{\rm TH}$ adjustment range. The $V_{\rm TH}$ of a low-threshold-voltage (LVT) NMOS transistor with a size of 6 µm/30 nm versus the back-gate bias voltage $V_{\rm B}$ in the 28-nm FDSOI technology is simulated as shown in Fig. 4.5(a), where the transistor $V_{\rm TH}$ adjustment ratio is approximately 84 mV/V. Meanwhile, Monte-Carlo simulations show that the 6-$\sigma$ input-referred offset of the comparator in this design is ±48.6 mV, as shown in Fig. 4.5(b). The FDSOI back-gate bias provides a sufficient calibration range covering the 6-$\sigma$ offset and



Fig. 4.6. The SA-based offset calibration flow.



Fig. 4.7. Calibrated comparator input-referred offset over PVT variations.

is thus adopted. The back-gate bias-based offset calibration does not introduce any capacitive load in the signal path, and therefore the comparator speed is not affected.

### 4.4.2 Offset Calibration Loop Using the SA-Search Algorithm

Since a flash ADC consists of a large number of comparators, manually calibrating each comparator's offset is time-consuming and prone to error. To address this issue, a foreground comparator offset calibration loop employing a successive-approximation (SA) algorithm is developed. It automatically calibrates the comparator offsets at power-up.

The SA calibration loop consists of 6 latches (SAL[1–6]) to store the offset polarities, 2 customized R-2R DACs to generate the back-gate bias voltages $V_{BP}$ and $V_{BN}$, and 6 enable logic blocks (EN[1–6]) to sequentially carry out the 6 SA cycles. The offset calibration flow chart is depicted in Fig. 4.6. In the initialization phase (*Rst*=1), $V_{BP}$ and $V_{BN}$ are reset to 0.492V<sub>DD</sub>, the midpoint of the back-gate bias adjustment range. The outputs of the SA latches and the enable logic, $V_{SAL[1-6]_P}$, $V_{SAL[1-6]_N}$, and $V_{EN[1-6]}$, are all reset to 0. The first SA cycle starts by setting *Rst*=0 and $V_{EN[1]}$=1. The offset polarity indicated by the comparator outputs, $V_{CMP_P}$ and $V_{CMP_N}$, is saved into SAL[1]. If the offset polarity is positive, $V_{BP}$ is reduced by 0.25V<sub>DD</sub> and $V_{BN}$ is increased by 0.25V<sub>DD</sub>, and vice versa. Meanwhile, $V_{EN[1]}$ is set to 0 to complete the first SA cycle, and $V_{EN[2]}=1$ is generated and synchronized with a clock signal to enable the second SA cycle. The remaining SA cycles are carried out in the same manner. The offset calibration range is designed to be 66 mV at the nominal condition (0.9 V, 27°C, TT corner). With the back-gate bias voltages obtained under this condition, the input-referred offset residual over PVT is simulated. As shown in Fig. 4.7, the worst-case input-referred offset residual is 0.28 LSB.
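The SA search described above can be summarized in a short behavioral sketch. The linear offset model, in which the residual offset equals the initial offset minus a gain times $(V_{BN} - V_{BP})$, the sign convention, and the gain value chosen so that the full differential bias swing corrects 66 mV are all assumptions made for illustration; they are not extracted from the circuit.

```python
def sa_offset_calibration(v_os_mv, gain_mv_per_vdd, n_cycles=6):
    """Behavioral model of the SA loop; bias voltages are normalized to VDD.

    Assumed (hypothetical) model: residual = v_os - gain * (V_BN - V_BP).
    """
    # Reset value: midpoint of the adjustment range, 0.4921875 for 6 cycles.
    v_bp = v_bn = sum(0.5 ** (i + 1) for i in range(1, n_cycles + 1))
    residual = v_os_mv
    polarities = []                      # contents of SAL[1..n_cycles]
    for i in range(1, n_cycles + 1):
        positive = residual > 0          # comparator decision in cycle i
        polarities.append(positive)
        step = 0.5 ** (i + 1)            # 0.25, 0.125, ... in units of VDD
        if positive:
            v_bp, v_bn = v_bp - step, v_bn + step
        else:
            v_bp, v_bn = v_bp + step, v_bn - step
        residual = v_os_mv - gain_mv_per_vdd * (v_bn - v_bp)
    return v_bp, v_bn, residual, polarities


GAIN = 66.0 / 0.984375   # mV per VDD of differential bias (assumed scale)
v_bp, v_bn, residual, pol = sa_offset_calibration(30.4, GAIN)
print(f"V_BP = {v_bp:.4f} VDD, V_BN = {v_bn:.4f} VDD, residual = {residual:.3f} mV")
```

With this model the residual after six cycles is bounded by the correction of the last cycle, roughly 1 mV for the assumed gain.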

## 4.5 Bandwidth Analysis of the High-Speed T/H

As the sampling frequency goes to 10 GHz, a high T/H tracking bandwidth is required to maintain the sampling accuracy. This section analyzes the T/H tracking bandwidth requirement at the 10-GHz sampling frequency. Similar to Section 3.4, the sinusoidal input signal applied to the T/H can be expressed as

$$V_{\rm in, T/H}(t) = A \left[ 1 + \sin \left( 2\pi f_{\rm in} t + \varphi_0 \right) \right]. \tag{4.2}$$

The T/H output  $V_{\text{out, T/H}}(t)$  can be obtained as

$$V_{\text{out, T/H}}(t) = A \left[ 1 + \frac{\sin\left(2\pi f_{\text{in}}t + \varphi_0 + \Delta\varphi\right)}{\alpha} \right] + \left[ V_{\text{out, T/H}}(0) - A - A \frac{\sin(\varphi_0 + \Delta\varphi)}{\alpha} \right] e^{-\frac{t}{\tau}}, \tag{4.3}$$

where  $\Delta \varphi = \tan^{-1}(2\pi \tau f_{in})$  and  $\alpha = \sqrt{1 + (2\pi \tau f_{in})^2}$  are the phase shift and the signal amplitude attenuation, respectively. Under the 50%-50% duty-cycle clock condition,  $V_{out,T/H}(0)$  is derived with  $V_{out,T/H}(0) = V_{out,T/H}(-0.5T) \approx V_{in,T/H}(-0.5T)$ . Since  $2\pi \tau f_{in} \ll 1$ ,  $\Delta \varphi$  and  $\alpha$  are thus approximately equal to 0 and 1, respectively. Therefore,  $V_{out,T/H}(t)$  can be approximated as

$$V_{\text{out, T/H}}(t) \approx V_{\text{in, T/H}}(t) + A \left[ \sin\left(-\frac{\pi f_{\text{in}}}{f_{\text{s}}} + \varphi_0\right) - \sin(\varphi_0) \right] e^{-\frac{t}{\tau}}, \tag{4.4}$$

where the first term represents the desired T/H output signal, and the second term represents tracking error,  $\Delta V_{\text{T/H}}(t)$ . It can be shown that in the Nyquist condition,  $f_{\text{in}}=0.5 f_{\text{s}}$ , the worst-case  $\Delta V_{\text{T/H}}(t)$  happens at  $\varphi_0 = \frac{\pi}{4}$ , and is

$$\Delta V_{\text{T/H, worst}}(t) = -\sqrt{2}Ae^{-\frac{t}{\tau}}. \tag{4.5}$$

With the sampling frequency being 10 GHz, the tracking time is 50 ps.  $\Delta V_{T/H, \text{ worst}}(t)$  is designed to be less than 0.1 LSB. Thus, the T/H bandwidth requirement can be expressed as

$$f_{3-dB, \sin} \ge \frac{f_{\rm s} \ln(2^{N+0.5} \cdot 5)}{\pi}. \tag{4.6}$$

With $f_s=10$ GHz and N=6, the T/H tracking bandwidth requirement is obtained as 19.46 GHz. The designed T/H has an $R_{on}$ of around 14 $\Omega$ and a loading capacitance of around 100 fF, thus meeting the T/H tracking bandwidth requirement with margin.
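Eq. 4.6 follows from requiring $\sqrt{2} A e^{-t/\tau} \le 0.1 \cdot 2A/2^N$ at the end of the tracking window $t = 1/(2 f_s)$ and converting $\tau$ to a bandwidth through $f_{3\text{-dB}} = 1/(2\pi\tau)$. The sketch below evaluates this bound and compares it with the first-order bandwidth of the sampling network, assuming an LSB of $2A/2^N$ and an ideal RC model with the approximate 14 Ω and 100 fF values quoted above.

```python
import math


def th_bandwidth_requirement_hz(fs_hz, n_bits, err_lsb=0.1):
    """Bandwidth bound of Eq. 4.6 (50% duty-cycle clock, Nyquist-rate sine input)."""
    # sqrt(2) * 2^N / (2 * err_lsb) equals 5 * 2^(N + 0.5) for err_lsb = 0.1.
    ratio = math.sqrt(2.0) * 2 ** n_bits / (2.0 * err_lsb)
    return fs_hz * math.log(ratio) / math.pi


def rc_bandwidth_hz(r_ohm, c_farad):
    """-3 dB bandwidth of a first-order RC network."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


f_req = th_bandwidth_requirement_hz(10e9, 6)
f_rc = rc_bandwidth_hz(14.0, 100e-15)
print(f"required: {f_req / 1e9:.2f} GHz, first-order RC estimate: {f_rc / 1e9:.1f} GHz")
```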



Fig. 4.8. SA-based automatic comparator offset calibration diagram.

# 4.6 Circuit Implementation

### 4.6.1 Comparator Offset Calibration Circuits

Fig. 4.8 shows the diagram of the SA-based comparator offset calibration circuits. The comparator offset polarities saved into SAL[1–6] are used to control the R-2R DAC1 and DAC2, and the EN[1–6]. DAC1 and DAC2 generate the back-gate bias voltages, $V_{BP}$ and $V_{BN}$, respectively. Fig. 4.9 depicts the schematics of the SAL and the EN circuits. The SAL is implemented using cross-coupled inverters (M<sub>1</sub>–M<sub>4</sub>) to store the offset polarity. The SAL output can be reset to V<sub>SS</sub> using M<sub>5</sub> and M<sub>6</sub>. The comparator outputs are written to the SAL using M<sub>7</sub> and M<sub>12</sub>. As depicted in the figure, $V_{EN[i]}$ is generated from the output signals of SAL[i-1], $V_{SAL[i-1]_P}$ and $V_{SAL[i-1]_N}$, through a NOR gate, and is further synchronized with the clock signal, $Clk_n$ or $Clk_p$. The EN[i] enables the i-th SA cycle and also turns on transistors M<sub>7</sub> and M<sub>12</sub>, passing the i-th polarity result to the SAL[i]. Once the offset polarity is saved into the SAL[i], $V_{SAL[i]_P}$ and $V_{SAL[i]_N}$ set $V_{EN[i]}$ to 0, thus ending the i-th SA cycle.



Fig. 4.9. Schematics of the enable logic and the offset sign latch.



Fig. 4.10. Schematic of the modified R-2R DAC with split-2R units.

Fig. 4.10 shows the circuit implementation of the customized R-2R DAC. To ensure that the $V_{BP}$ and $V_{BN}$ search starts at 0.492V<sub>DD</sub>, each 2R unit is split into two parallel 4R resistors, thus providing 3 voltage levels, V<sub>SS</sub>, V<sub>DD</sub>, and 0.5V<sub>DD</sub>. In the initialization phase, each split-2R unit has one of the 4R resistors tied to V<sub>DD</sub> and the other tied to V<sub>SS</sub>, thus generating the back-gate bias voltage of 0.492V<sub>DD</sub>. The back-gate bias generation in the initialization phase is depicted in Fig. 4.11. During the first SA cycle, if $V_{SAL[1]_P}=1$ and $V_{SAL[1]_N}=0$, $V_{BP}$ is reduced by 0.25V<sub>DD</sub>, which is achieved by tying the first 4R resistor pair of DAC1 to V<sub>SS</sub>. Meanwhile, $V_{BN}$ is increased by 0.25V<sub>DD</sub>, which is achieved by connecting the first 4R resistor pair of DAC2 to V<sub>DD</sub>. The remaining $V_{BP}$ and $V_{BN}$ values are generated in the same manner. The $V_{BP}$ and $V_{BN}$ generation is expressed by the following equations

$$V_{\rm BP} = \begin{cases} 0.492 \cdot V_{\rm DD} & Rst=1 \text{ (reset)} \\ V_{\rm BP} - 0.5^{i+1} \cdot V_{\rm DD} & V_{\rm SAL[i]\_P} > V_{\rm SAL[i]\_N} \\ V_{\rm BP} + 0.5^{i+1} \cdot V_{\rm DD} & V_{\rm SAL[i]\_P} < V_{\rm SAL[i]\_N} \end{cases} \tag{4.7}$$

and

$$V_{\rm BN} = \begin{cases} 0.492 \cdot V_{\rm DD} & Rst=1 \text{ (reset)} \\ V_{\rm BN} + 0.5^{i+1} \cdot V_{\rm DD} & V_{\rm SAL[i]\_P} > V_{\rm SAL[i]\_N} \\ V_{\rm BN} - 0.5^{i+1} \cdot V_{\rm DD} & V_{\rm SAL[i]\_P} < V_{\rm SAL[i]\_N} \end{cases} \tag{4.8}$$
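The 0.492V<sub>DD</sub> reset level and the $0.5^{i+1} V_{DD}$ step sizes are consistent with an ideal R-2R ladder in which each split-2R unit is driven at 0, 0.5, or 1 times V<sub>DD</sub>. The sketch below models that behavior by superposition, assuming ideal resistors and an ideally terminated ladder; `split_r2r_output` is a hypothetical helper, not the actual DAC netlist.

```python
def split_r2r_output(legs):
    """Ideal R-2R output as a fraction of VDD.

    legs[i - 1] is the drive level of the i-th split-2R unit (MSB first):
    0.0 = both 4R resistors tied to VSS, 1.0 = both tied to VDD,
    0.5 = one tied to VDD and the other to VSS.
    """
    return sum(level * 0.5 ** i for i, level in enumerate(legs, start=1))


reset = [0.5] * 6                       # initialization phase
print(split_r2r_output(reset))          # midpoint of the 6-bit range

dac1 = [0.0] + [0.5] * 5                # first unit of DAC1 pulled to VSS
dac2 = [1.0] + [0.5] * 5                # first unit of DAC2 pulled to VDD
print(split_r2r_output(dac1) - split_r2r_output(reset))
print(split_r2r_output(dac2) - split_r2r_output(reset))
```

In this model, switching the i-th unit from 0.5 to 0 or 1 moves the output by $0.5^{i+1}$ V<sub>DD</sub>, which matches the per-cycle steps in the two case equations above.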

Fig. 4.12 shows an example of the offset calibration loop, where the comparator input-referred offset ( $V_{OS}$ ) is initially set as 30.4 mV. After 6 SA cycles, the V<sub>OS</sub> is reduced



Fig. 4.11. Back-gate bias voltage generation in the initialization phase.



Fig. 4.12. Simulated back-gate bias voltages and input-referred offset of the SA calibration process with an input-referred offset being 30.4 mV.



Fig. 4.13. Simulated comparator offset calibration range.

to 0.46 mV. The simulated calibration range is depicted in Fig. 4.13, which sufficiently covers the $6-\sigma$ comparator offset.

# 4.6.2 Wideband High-Speed Dynamic Encoder

The thermometer outputs of the comparator array are converted into 6-bit binary data utilizing a thermometer-to-binary encoder. In this work, a fat-tree structure [66] is



Fig. 4.14. Fat-tree based 4-bit high-speed encoder.

adopted to implement this high-speed encoder. The main advantage of the fat-tree encoder is its high encoding speed. Algorithmically, the fat-tree circuit signal delay is $O(\log_2 N)$, whereas the ROM-based encoder signal delay is $O(N)$ and the Wallace-tree encoder signal delay is $O(\log_{1.5} N)$, making the fat-tree encoder the fastest of the three. For simplicity, a 4-bit fat-tree encoder is depicted in Fig. 4.14. The encoder consists of two stages. The first stage converts the thermometer code to a one-out-of-N code using 3-input AND gates, which reduces bubble errors. The second stage converts the one-out-of-N code to binary code using OR gates.
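The two encoder stages can be illustrated with a small behavioral model of the 4-bit case. The exact gate inputs are not listed in the text, so the 3-input AND used here, $T_{k-1} \cdot T_k \cdot \overline{T_{k+1}}$, is an assumed choice that rejects an isolated bubble. The function names are illustrative, and the OR stage is written as a flat OR, since the tree arrangement affects only the delay and not the logic function.

```python
def thermometer_to_one_hot(therm):
    """Stage 1: one-out-of-N code from a thermometer code (index 0 = lowest level)."""
    # Pad with a virtual always-high comparator below and always-low one above.
    padded = [1] + list(therm) + [0]
    one_hot = []
    for k in range(1, len(therm) + 1):
        # Assumed 3-input AND: T[k-1] AND T[k] AND NOT T[k+1].
        one_hot.append(padded[k - 1] & padded[k] & (1 - padded[k + 1]))
    return one_hot


def one_hot_to_binary(one_hot, n_bits):
    """Stage 2: each output bit is the OR of the one-hot lines whose code sets it."""
    code = 0
    for b in range(n_bits):
        bit = 0
        for j, h in enumerate(one_hot):
            if ((j + 1) >> b) & 1:       # line j represents code j + 1
                bit |= h
        code |= bit << b
    return code


def encode(therm, n_bits=4):
    return one_hot_to_binary(thermometer_to_one_hot(therm), n_bits)


clean = [1] * 9 + [0] * 6               # nine comparators tripped
bubble = [1, 1, 1, 0, 1] + [0] * 10     # isolated bubble above level 3
print(encode(clean), encode(bubble))
```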

#### 4.6.3 High-Speed Clock Generation and Distribution

In this work, a differential 10 GHz clock, *clk\_n* and *clk\_p*, is used to drive the 2 sub-ADCs. The differential clock with a non-50% duty cycle causes the sampling time of the 2 sub-ADCs to be deviated from the ideal ones, thus incurring a systematic timing skew. To address this issue, duty-cycle correction circuits are employed. Fig. 4.15 depicts the schematic of the input clock buffer and the clock distribution circuit. The input clock buffer is source-degenerated and employs a replica bias. The distribution clock circuits utilize



Fig. 4.15. Schematic of the input clock buffer and clock distribution circuits.

cross-coupled inverters to correct the non-50% duty cycle problem. Another consideration for the clock generation and distribution is that the clock root-mean-square (RMS) jitter, $t_{\rm rms}$, should be small enough to meet the ADC speed and SNR requirements [67]. This requirement can be expressed as

$$SNR_{ADC} = -20 \log_{10}\left(2\pi f_{in, Nyquist} t_{rms}\right), \tag{4.9}$$

where $f_{in,Nyquist}$ is the Nyquist input frequency. To achieve the 20-GS/s conversion speed and 6-bit resolution, $t_{rms}$ must be less than 200 fs according to Eq. 4.9. The input clock buffer along with the distribution circuits is simulated and shows a $t_{rms}$ of 86 fs.
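The jitter budget can be checked directly from the SNR expression above. The sketch assumes that the target is the ideal 6-bit quantization-limited SNR of $6.02N + 1.76$ dB and that $f_{in,Nyquist}$ = 10 GHz for the 20-GS/s ADC; both are a plausible reading of the requirement rather than values stated explicitly in the text.

```python
import math


def jitter_limited_snr_db(f_in_hz, t_rms_s):
    """SNR ceiling set by RMS clock jitter for a full-scale sine at f_in."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * t_rms_s)


def max_rms_jitter_s(f_in_hz, snr_db):
    """Largest RMS jitter that still allows the given SNR at f_in."""
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_in_hz)


F_NYQUIST = 10e9                       # assumed Nyquist input for 20 GS/s
snr_target = 6.02 * 6 + 1.76           # assumed ideal 6-bit SNR target, in dB

t_max = max_rms_jitter_s(F_NYQUIST, snr_target)
snr_sim = jitter_limited_snr_db(F_NYQUIST, 86e-15)
print(f"jitter budget: {t_max * 1e15:.0f} fs")
print(f"SNR ceiling with 86 fs jitter: {snr_sim:.1f} dB")
```

Under these assumptions the budget comes out at roughly 200 fs, and the simulated 86 fs leaves several dB of margin above the 6-bit quantization limit.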

#### 4.6.4 Decimation Network

In this ADC, a decimation network is applied to down-sample the high-speed data, which facilitates the off-chip measurement. Fig. 4.16 shows the block diagram of the decimation network. The mod-113 counter is implemented with CML logic and triggers



Fig. 4.16. Decimation network block diagram.

the DFFs to down-sample the high-speed data from the sub-ADCs. The down-sampled data are then muxed and brought out off-chip with a speed of 176 MS/s.

# 4.7 High-Speed Flash ADC Measurement

#### 4.7.1 ADC Chip and PCB Board

The 20-GS/s 6-bit 2-way interleaved flash ADC prototype is fabricated in a 28-nm FDSOI CMOS process. The ADC chip has 48 pins and its core occupies an active area of 500 $\mu$m × 600 $\mu$m as shown in Fig. 4.17. The comparator array is laid out in a "C" shape, which avoids an extremely long strip layout. One major advantage of the "C"-shaped comparator array is that the analog input and the clock are routed to each comparator over the same length, so that the signal and the clock reach the comparators with the same delay. This is critical to the high-speed flash ADC, as a delay difference between the input signal and the clock may result in different sampled input voltages to



Fig. 4.17. Chip micrograph.



Fig. 4.19. Custom-designed PCB for ADC testing.



Fig. 4.18. Measurement setup block diagram.

comparators. The bare die is surface-mounted on a custom-designed PCB using chip-on-board (COB) technology, and the PCB is depicted in Fig. 4.18.



Fig. 4.20. Lab measurement setup.

#### 4.7.2 Measurement Setup

The general measurement setup block diagram is illustrated in Fig. 4.19. High-performance RF and microwave signal generators SMA 100B-B112 and MG3692B are used to generate the ADC input signal and the clock source, respectively. Two broadband baluns HL9402 are used to convert the single-ended signal and clock into fully differential ones. Phase-matched cables are used to avoid any phase imbalance in the input and clock signals. The input signal and clock traces on the PCB are kept as short as possible to reduce losses. The 176 MS/s 6-bit decimated data stream from the ADC output is captured with the MSO58 oscilloscope. A laptop is used to analyze the captured data from the oscilloscope and evaluate the ADC performance. The Xilinx SP6 FPGA is used to program the ADC chip through the SPI for initial ADC configuration, offset



Fig. 4.21. Data capture.

calibration, and measurement. The measurement setup in the laboratory is presented in Fig. 4.20. The ADC data captured by the MSO58 oscilloscope is presented in Fig. 4.21.

#### 4.7.3 Measurement Results

Fig. 4.22 shows the DNL/INL before and after comparator offset error calibration. The measured DNL/INL before and after comparator offset calibration are 2.89/-5.91 LSB and 0.58/-0.61 LSB, respectively, which validates the effectiveness of the comparator offset calibration based on the back-gate bias adjustment in the FDSOI technology. The FFT plots with and without comparator offset calibration and timing skew calibration when f<sub>in</sub> is 0.2343 GHz are illustrated in Fig. 4.23 and Fig. 4.24. The ADC SNDR is improved from 26.8 dB to 33.1 dB and the SFDR is improved from 35.8 dB to 39.8 dB. When f<sub>in</sub> reaches a near-Nyquist frequency, 9.921875 GHz, the ADC output spectra with and



Fig. 4.22. Measured (a) DNL and (b) INL before and after offset calibration.

without comparator offset calibration and timing skew calibration are depicted in Fig. 4.25 and Fig. 4.26. The SNDR is improved from 25.57 dB to 31.2 dB and the SFDR is improved from 29.26 dB to 38.5 dB.

The measured dynamic performance of the 20 GS/s 6-bit flash ADC is illustrated in Fig. 4.27. The ADC achieves an SNDR of 33.1 dB and an SFDR of 39.8 dB at low input frequencies. At Nyquist input, the ADC achieves an SNDR of 31.2 dB and SFDR of 38.5 dB. The power breakdown of the ADC at 20 GS/s is shown in Fig. 4.28. The 6-bit comparator array consumes 79.8% of the total power. The clock buffers, T/H buffer, and



Fig. 4.24. FFT plot when f<sub>in</sub> is 0.2343 GHz with comparator offset calibration and timing skew calibration (decimated by 113).



Fig. 4.23. FFT plot when f<sub>in</sub> is 0.2343 GHz without comparator offset calibration and timing skew calibration (decimated by 113).

the remaining circuit blocks consume 9.8%, 6.4%, and 3.9% of the total power, respectively. With the $2 \times$ VDI, the total power consumption of all pre-amplifiers is reduced from 82 mW to



Fig. 4.26. FFT plot when $f_{in}$ is 9.921875 GHz with comparator offset calibration and timing skew calibration (decimated by 113).



Fig. 4.25. FFT plot when f<sub>in</sub> is 9.921875 GHz without comparator offset calibration and timing skew calibration (decimated by 113).

43 mW. Since only two interleaved sub-ADCs are employed, the timing skew calibration and its related hardware are greatly relaxed.



Fig. 4.28. Measured SNDR and SFDR versus input frequency.



Fig. 4.27. ADC power breakdown.

This work is compared with recently published high-speed flash ADCs in Table 2. As can be seen from the table, while [31], [32], and [41] achieved a low figure of merit by utilizing relatively large interleave factors, the ADC cores running at 2.5 GS/s or 3.125 GS/s limit their practical use when a higher sampling rate is required. It is not possible to apply the time interleaving

| Specifications                 | Tretter [46]<br>MTT '16 | Yang [31]<br>TCASI '14 | Cai [32]<br>JSSC '17 | Chen [41]<br>JSSC '14 | This work  |
|--------------------------------|-------------------------|------------------------|----------------------|-----------------------|------------|
| Architecture                   | Flash                   | TI-flash               | TI-flash             | TI-flash              | TI-flash   |
| Technology (nm)                | 28                      | 65                     | 65                   | 32                    | 28 (FDSOI) |
| Sample rate (GS/s)             | 24                      | 10                     | 25                   | 20                    | 20         |
| Interleave factor              | 1                       | 4                      | 8                    | 8                     | 2          |
| Resolution (bits)              | 3                       | 6                      | 6                    | 6                     | 6          |
| Power supply (V)               | 1.4                     | N/A                    | 1                    | 0.9                   | 1          |
| SNDR <sub>@Nyquist</sub> (dB)  | 15                      | 32                     | 29.7                 | 30.7                  | 31.2       |
| SFDR <sub>@Nyquist</sub> (dB)  | N/A                     | 45                     | 40                   | 39.4                  | 38.5       |
| Power (mW)                     | 400                     | 83                     | 88                   | 69.5                  | 204        |
| FOM<sub>W</sub> (fJ/conv.-step) | 3600                    | 259                    | 143                  | 124                   | 344        |
| Active area (mm <sup>2</sup> ) | 0.1                     | 0.2                    | 0.2                  | 0.25                  | 0.3        |
| Comparator offset cal.         | Off-chip                | Off-chip               | Off-chip             | On-chip               | On-chip    |

Table 2 Performance Summary and Comparison with State-of-the-Art Flash ADCs.

architecture at an arbitrary scale due to jitter in multi-phase clock generation and distribution, clock transition times, input capacitance, and the bandwidth requirements on the T/H and the T/H buffer. While the single-core flash ADC in [46] avoids the aforementioned issues, it dissipates significantly more power. The presented flash ADC employing two 10 GS/s sub-ADCs provides a reasonable tradeoff between the interleaving factor and power consumption. The proposed ADC facilitates a larger interleaving factor for a higher conversion speed. The foreground comparator offset calibration in this work is implemented on-chip using the SA search algorithm and FDSOI back-gate bias without degrading comparator speed. With the back-gate bias voltages

obtained under the nominal condition, the worst-case comparator input-referred offset residual over PVT is 0.28 LSB.

# 4.8 Conclusion

This chapter presents a 20 GS/s 6-bit flash ADC in a 28-nm FDSOI CMOS process. A 2-way time-interleaved structure is adopted to trade off the interleaving factor against the power efficiency. To enhance sub-channel speed, the StrongArm latch comparator with a pre-amplifier stage is employed. Meanwhile, to maintain flash ADC power efficiency, the 2× voltage-domain interpolation is applied to reduce the number of pre-amplifiers by half. An on-chip successive-approximation-based comparator offset calibration employing FDSOI back-gate bias is also developed, providing sufficient calibration range without impairing comparator speed performance. Measurement results show that the ADC achieves an SNDR of 31.2 dB and an SFDR of 38.5 dB at 20 GS/s while dissipating 204 mW from a 1.0 V power supply, translating into a Walden FOM of 344 fJ/conv.-step.
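The Walden figure of merit quoted above can be cross-checked with a short calculation. The Python sketch below uses the commonly used definitions ENOB = (SNDR − 1.76)/6.02 and FOM<sub>W</sub> = P / (2<sup>ENOB</sup> · f<sub>s</sub>); the text reports the result but does not spell out these expressions, so treating them as the ones used here is an assumption. Function names are illustrative.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR (standard sine-wave definition)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden FOM in fJ per conversion step: P / (2**ENOB * fs)."""
    return power_w / (2.0 ** enob(sndr_db) * fs_hz) * 1e15


# Measured values of the 20-GS/s ADC reported in this chapter.
fom = walden_fom_fj(power_w=204e-3, sndr_db=31.2, fs_hz=20e9)
print(f"ENOB = {enob(31.2):.2f} bits, FOM_W = {fom:.0f} fJ/conv.-step")
```

With the measured 204 mW, 31.2 dB, and 20 GS/s, this evaluates to approximately 344 fJ/conv.-step, consistent with the value reported above.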

# **CHAPTER V**

# PIPELINED FLASH ADC WITH A PING-PONG STRUCTURE IN THE SECOND STAGE

# 5.1 Motivation

With the number of sub-channels reduced, the calibrations of timing skew and inter-channel mismatches are alleviated in the TI-flash ADC. However, the higher speed requirement on the sub-channels leads to higher power consumption. To further increase ADC speed while maintaining a high power efficiency, a pipelined flash ADC is developed where the first stage employs CML comparators to enhance the speed and the second stage employs a ping-pong structure with dynamic comparators to achieve high power efficiency. Besides, the partial activation of comparators and the 2× TDI are also employed to improve power efficiency. A 15-GS/s 7-bit pipelined flash ADC with a ping-pong structure in the second stage is designed in 22-nm FDSOI CMOS technology. Post-layout simulation results show that this ADC achieves an SNDR of 41.34 dB and an SFDR of 49.36 dB at Nyquist frequency with a power consumption of 97.5 mW, translating into a FOM<sub>w</sub> of 72 fJ/conv.-step. This chapter is organized as follows. Section 5.2 describes the architecture of the proposed flash ADC. Section 5.3 presents the analysis and design considerations of the proposed pipelined flash ADC. Section 5.4 introduces the circuit implementations of the major building blocks. Section 5.5 provides the ADC simulation

results and Section 5.6 summarizes the ADC performance and compares it with state-of-the-art flash ADCs.

# 5.2 ADC Top-Level Architecture

Fig. 5.1 depicts the architecture of the proposed pipelined flash ADC. An external 15-GHz clock, $V_{\text{clk}}$, is terminated by an on-chip 50-$\Omega$ resistor and converted into a square-wave clock, $V_{\text{sw}}$, with a CML-to-CMOS conversion logic. $V_{\text{sw}}$ is used to drive the 2.5-bit coarse flash ADC, as well as a divide-by-two circuit. The divide-by-two circuit generates a differential 7.5-GHz clock, $V_{\text{ssp}}$ and $V_{\text{ssn}}$, to drive two 7-bit flash ADCs, which are partially activated to carry out the 5-bit fine conversion in a ping-pong manner. The 2.5-bit coarse flash ADC is implemented with the T/H<sub>1</sub> and 6 CML comparators, dividing the



Fig. 5.1. The proposed pipelined ping-pong flash ADC architecture.

full signal range into 7 subsections, $V_{ref1-24}$, $V_{ref25-40}$, $V_{ref41-56}$, $V_{ref57-72}$, $V_{ref73-88}$, $V_{ref89-104}$, $V_{ref105-128}$. The 7-bit flash ADC with 2× TDI is implemented with the T/H<sub>21</sub>, T/H<sub>22</sub>, 64 dynamic comparators, and 64 interpolation latches (ILs), which are segmented into eight slices, CMP<sub>1-16</sub>, CMP<sub>17-32</sub>, CMP<sub>33-48</sub>, CMP<sub>49-64</sub>, CMP<sub>65-80</sub>, CMP<sub>81-96</sub>, CMP<sub>97-112</sub>, and CMP<sub>113-128</sub>. Only two slices are activated at a given time to conduct the 5-bit conversion. The partial activation is enabled by a slice selection logic, which generates the 1-out-of-N selection signals *S*<sub>1-7</sub> for the eight slices based on the outcomes of the 2.5-bit coarse flash ADC. The comparator offset is calibrated by the proposed SA-based calibration loop employing FDSOI back-gate bias.

## 5.3 Design Considerations of the Pipelined Flash ADC

The proposed pipelined flash ADC reduces power consumption by partially activating the 7-bit flash ADC, while the ping-pong structure enables a power-efficient dynamic comparator design. Furthermore, the 2× TDI in the 7-bit flash ADC also reduces the ADC power consumption without the need for architecture-level calibrations. However, several design considerations of the pipelined flash ADC need to be analyzed and are detailed in the following sections.

#### 5.3.1 ADC Timing Analysis

Fig. 5.2 (a) and (b) depict the simplified block and the timing diagrams of the proposed pipelined flash ADC structure. The 2.5-bit coarse comparison and the 5-bit fine comparison are carried out in two consecutive clock cycles. Driven by the 15-GHz clock,



Fig. 5.2. (a) The simplified pipelined flash ADC block diagram and (b) the timing diagram.

the $T/H_1$ in the 2.5-bit coarse flash ADC is allotted 33 ps for tracking the input signal and 33 ps for holding the sampled signal. During the holding period, the 2.5-bit comparators resolve the sampled signal. Based on the outcome of the 2.5-bit comparators, two slices in the 7-bit flash ADC are activated in the next clock cycle to complete the 5-bit quantization. To ensure that the coarse and fine comparators resolve the same sampled signal, $T/H_{21}$ and $T/H_{22}$ are utilized.

Due to the ping-pong operation in the second stage, only one of the two T/Hs is in the tracking mode and the other is in the holding mode at any time, which ensures that the two 7-bit flash ADCs are not connected to the input port simultaneously, thus avoiding significant input bandwidth degradation. Furthermore, the applied 2× TDI reduces the number of dynamic comparators by half, not only reducing the power consumption but also reducing the input loading capacitance by half. The output data from the ping-pong flash ADCs are multiplexed and re-sampled, and are then aligned with and added to the data from the 2.5-bit coarse flash ADC.

#### 5.3.2 ADC Power Analysis

In the proposed pipelined flash ADC, three techniques are developed to substantially reduce power consumption: a ping-pong structure with dynamic comparators, the partial activation of the slices, and the 2× TDI.

In the 2.5-bit coarse quantization, six CML comparators are adopted to achieve high-speed performance. In the 5-bit fine quantization, a ping-pong structure is employed, thus allowing two sets of comparators to alternately resolve the sampled signal. Such a method relaxes the pipelined second-stage comparator speed requirement by half and helps to design the comparator using a dynamic StrongArm latch structure for improving power



Fig. 5.3. Simulation result of the comparator power vs. speed.

efficiency. Fig. 5.3 depicts the simulation result of the comparator power versus its speed performance. With the same speed, two ping-pong dynamic comparators consume much less power than the high-speed CML comparator does. In this work, the CML comparator with a 15-GHz operation speed consumes a power of 2.45 mW. As a comparison, two ping-pong StrongArm latch-based comparators consume a total power of 1.24 mW to achieve the equivalent high-speed performance.

The pipelined structure enables the partial activation of the 7-bit flash ADC based on the outcomes of the 2.5-bit coarse flash ADC. It should be noted that the 2-stage comparison structure achieves a similar partial activation but requires the coarse and the fine quantization to be completed in one clock cycle, thus limiting the overall ADC conversion speed. In this work, the partial activation is achieved in a pipelined manner, which allows the coarse and fine comparisons to be completed in two consecutive clock cycles. To further reduce power consumption, 2× TDI is employed in the pipelined second stage, which reduces the number of the StrongArm latch-based comparators by half. To complete a 7-bit conversion, this structure only needs to activate 6 CML-based comparators, 32 StrongArm latch-based comparators, and 32 interpolation latches. As a comparison, the conventional flash ADC activates 127 CML-based comparators.

#### 5.3.3 Analysis of the 0.5-Bit Redundancy in the Coarse Flash ADC

The 0.5-bit redundancy in the coarse flash ADC is used to tolerate the comparators' offset errors, thus requiring no comparator offset calibration in the coarse flash ADC. As depicted in Fig. 5.4(a), due to the 8-LSB offset in the coarse flash ADC, the slice [5] and



Fig. 5.4. (a) The conversion error induced by comparator offset and (b) the 0.5-bit redundancy tolerates the comparator offset.



Fig. 5.5. Simulated input-referred offset of the comparator in the coarse flash ADC.

[6] in the fine flash ADC are activated, thus causing a 5-LSB decision error. The 0.5-bit redundancy in the first comparison stage provides a $\pm$8-LSB coverage to absorb the comparator offset as shown in Fig. 5.4(b). While comparator offset calibration could be applied to address this issue, the calibration hardware for the coarse flash ADC would be larger than that of the 0.5-bit redundancy. The LSB in this work is 6.25 mV, so the $\pm$8-LSB coverage provided by the 0.5-bit redundancy in the coarse flash ADC can tolerate a $\pm$50 mV offset, which sufficiently covers the CML-based comparator's 6-$\sigma$ offset of $\pm$48 mV as depicted in Fig. 5.5.
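The ±8-LSB coverage can be checked with a small behavioral model of the coarse decision and the slice activation. The Python sketch below is an illustration under stated assumptions, not the implemented selection logic: the coarse thresholds (24, 40, 56, 72, 88, 104 in fine-LSB units) are inferred from the seven reference subsections listed in Section 5.2, coarse section k is assumed to activate the two adjacent 16-level slices k and k+1, reference indices run from 1 to 128 following the numbering in the text, and only the two extreme cases (all coarse thresholds shifted up or all shifted down by the same amount) are examined.

```python
LSB_MV = 6.25                                 # fine LSB from the text
COARSE_THRESHOLDS = [24, 40, 56, 72, 88, 104]  # assumed, in fine-LSB units
SLICE_WIDTH = 16                              # levels per fine slice


def coarse_section(level, offsets_lsb):
    """Number of coarse comparators that trip, given per-comparator offsets."""
    return sum(
        1 for thr, off in zip(COARSE_THRESHOLDS, offsets_lsb) if level > thr + off
    )


def active_level_range(section):
    """Levels covered by the two adjacent slices assumed active for a section."""
    low = SLICE_WIDTH * section + 1
    return low, low + 2 * SLICE_WIDTH - 1


def all_levels_covered(max_offset_lsb):
    """True if every level lands inside its activated slices for both
    extreme offset patterns (all thresholds shifted up, all shifted down)."""
    for sign in (-1, 1):
        offsets = [sign * max_offset_lsb] * len(COARSE_THRESHOLDS)
        for level in range(1, 129):
            low, high = active_level_range(coarse_section(level, offsets))
            if not low <= level <= high:
                return False
    return True


print(all_levels_covered(8), all_levels_covered(9))
print(f"tolerated offset: +/-{8 * LSB_MV} mV")
```

In this model an 8-LSB shift of the coarse thresholds still leaves every level inside the activated pair of slices, while a 9-LSB shift does not, which matches the ±8-LSB (±50 mV) margin described above.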

#### 5.3.4 ADC Bandwidth Analysis

According to the Nyquist sampling theorem, the proposed pipelined flash ADC should achieve an input bandwidth of at least 7.5 GHz. The proposed ADC input network is depicted in Fig. 5.6. The input signal $V_{in}$ is transmitted through a 50-$\Omega$ transmission line. The inductance of the wire bond is considered to be 1 nH. The ESD and pad parasitics are around 350 fF. The $V_{in}$ is terminated by an on-chip 50-$\Omega$ poly resistor and then sampled by



Fig. 5.6. ADC input network

the T/H<sub>1</sub> in the 2.5-bit coarse flash ADC with an $R_{on}$ of 3 $\Omega$ and a loading capacitor of 84 fF. The loading capacitor includes the sampling capacitor, the 2.5-bit coarse comparators' gate capacitance, and the interconnect parasitics. The sampled signal is then transferred to the T/H (T/H<sub>21</sub> or T/H<sub>22</sub>) in the fine flash ADC with an $R_{on}$ of 3 $\Omega$ and a loading capacitor of 310 fF. The loading capacitor includes the sampling capacitor, the 7-bit comparators' gate capacitance, and the interconnect parasitics. With the T/H<sub>21</sub> and T/H<sub>22</sub> operating in a ping-pong mode, only one is turned on at a time, thus avoiding doubling the loading capacitors at the pipelined ADC input. The 3-dB bandwidth of the ADC is simulated to be 7.68 GHz. To further improve the bandwidth, a T-coil [68], [69] or an internal signal buffer can be applied.

#### 5.3.5 Comparator Noise Analysis

With the ADC resolution increased from 6 bits to 7 bits, the comparator noise requirement becomes more stringent. The CML-based comparator has a pre-amplifier which is designed to achieve a 7 dB gain to improve the noise performance. The StrongArm latch-based comparator is modified to improve the noise performance. The modified



Fig. 5.7. Modified StrongArm latch to improve noise performance.

StrongArm latch schematic is depicted in Fig. 5.7, where 2 tail switches and 2 sets of reset switches are used to realize the embedded OR logic for the partial activation logic. To improve the comparator noise performance, the reset switches ( $M_{3-4}$  and  $M_{8-9}$ ) are controlled by  $S_{3_d}$  and  $S_{4_d}$ , respectively, which are the delayed versions of  $S_3$  and  $S_4$ . This effectively extends the amplification phase of the comparator, leading to reduced input-referred noise.

Conventionally,  $M_{3-4}$  or  $M_{8-9}$  are turned off simultaneously along with  $M_{2,5}$  or  $M_{7,10}$ . This causes the comparator amplification phase to come to an end once the drains of the input pair are discharged to  $V_{DD}$ - $|V_{TH,P}|$ . Therefore, the amplification time,  $\Delta t_A$ , is expressed as

$$\Delta t_{\rm A} = \frac{C_{\rm P} \cdot \left| V_{\rm TH, P} \right|}{I_{\rm DS}},\tag{5.1}$$



Fig. 5.8. Simulated input referred noise of the conventional and improved StrongArm latches.

where  $C_P$  is the parasitic capacitance at the drain of the input pair and  $I_{DS}$  is the averaged current flowing through the input pair during the amplification phase. The gain of the dynamic amplifier can then be expressed as

$$A_{\rm V} = \frac{g_{\rm m}}{C_{\rm P}} \cdot \Delta t_{\rm A}.\tag{5.2}$$

With  $M_{3-4}$  and  $M_{8-9}$  controlled by the delayed signals,  $S_{3_d}$  and  $S_{4_d}$ , the amplification phase can be effectively extended. As depicted in Fig. 5.8, simulation results show that the modified StrongArm latch with a delay of 13 ps achieves an extracted RMS input-referred noise of 0.85 mV as compared to the conventional StrongArm latch whose RMS noise is 0.96 mV.

## 5.4 Circuits Implementation

#### 5.4.1 Source-Follower Based Bootstrapped T/H

To achieve a 7-bit tracking accuracy while maintaining a high sampling speed, a source-follower-based bootstrapped T/H is employed [31]. As depicted in Fig. 5.9, when



Fig. 5.10. The schematic of the source-follower based bootstrapped T/H.



Fig. 5.9. Simulated SNDR vs. input frequency of the source-follower based bootstrapped T/H.

the T/H is in the sampling phase, the gate-to-source voltage of the sampling switch  $M_1$  is approximately constant, which is achieved by the source follower implemented by  $M_{2-5}$ . When  $M_6$  is on and  $M_5$  is off, the gate voltage of the sampling switch is  $V_{SS}$ , thus making the T/H turn into the hold mode. As shown in the figure, a voltage booster is applied to double the clock voltage to turn on or off  $M_5$ . Simulation results are plotted in Fig. 5.10, which shows that the T/H achieves an SNDR of 45 dB at the Nyquist frequency over PVT with a power of 7.1 mW.

#### 5.4.2 Comparator Offset Calibration Loop

The successive-approximation (SA) based comparator offset calibration employing FDSOI back-gate bias is also used in this work. As the ADC resolution is increased to 7



Fig. 5.11. The 7-bit SA-based comparator offset calibration diagram.



Fig. 5.12. The schematic of the 7-bit modified R-2R DAC.

bits and the LSB is 6.25 mV, the comparator offset calibration resolution is also increased to 7 bits with an accuracy of 0.5 mV (0.08 LSB). The comparator offset calibration diagram is depicted in Fig. 5.11, which includes 7 latches (SAL[1–7]) and 7 enable logics (EN[1–7]). The R-2R DAC is also modified to achieve a 7-bit resolution as shown in Fig. 5.12. Compared to the 6-bit SA calibration, the 7-bit calibration requires one additional SAL, one additional EN, and one additional split-2R unit.
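The successive-approximation search behind this calibration can be sketched behaviorally. The Python model below is a simplified illustration and not the circuit of Fig. 5.11: it assumes an ideal 7-bit DAC with a 0.5 mV step, a bipolar correction range centered at mid-code, shorted comparator inputs during calibration, and a noiseless comparator. The function name and the sign convention are hypothetical.

```python
def sa_offset_calibration(offset_mv, n_bits=7, step_mv=0.5):
    """Binary-search the DAC code whose correction best cancels the offset.

    Returns (code, residual_mv). The correction applied for a code is
    (code - 2**(n_bits - 1)) * step_mv, i.e. bipolar around mid-code
    (an assumption made for this sketch).
    """
    mid = 1 << (n_bits - 1)
    code = 0
    for bit in reversed(range(n_bits)):      # MSB first, one decision per bit
        trial = code | (1 << bit)
        residual = offset_mv - (trial - mid) * step_mv
        # Comparator decision with inputs shorted: keep the bit while the
        # offset is still under-corrected.
        if residual >= 0:
            code = trial
    return code, offset_mv - (code - mid) * step_mv


code, residual = sa_offset_calibration(10.3)
print(code, round(residual, 3))
```

Within the assumed ±32 mV range, the search converges in seven decisions and leaves a residual smaller than one DAC step, which is the behavior the 0.5 mV accuracy figure refers to.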

#### 5.4.3 Clock Generation and Distribution

The clock generation and distribution circuit is depicted in Fig. 5.13. The 15-GHz differential signal is terminated by an on-chip 100-$\Omega$ termination and then buffered by the input clock buffer. The output signal of the clock buffer is firstly converted into square-wave signals,



Fig. 5.13. The schematic of clock generation and distribution circuit.



Fig. 5.14. Block diagram of the high-speed data alignment and process.

$V_{T/H}$ and $V_{CMP}$, which drive the coarse flash ADC's T/H<sub>1</sub> and the 2.5-bit comparators, respectively. Secondly, the clock buffer's output signal is used to drive the divide-by-2 circuit, generating a differential 7.5-GHz clock for the T/H<sub>21</sub> and T/H<sub>22</sub> in the fine flash ADCs. To achieve 15-GHz speed and 7-bit accuracy, the maximum tolerable RMS jitter is 130 fs. In simulation, the RMS jitter of the clock signal, $V_{T/H}$, is 60 fs, thus meeting the requirement. The clock generation and distribution consume a total power of 18 mW under a 0.9-V power supply.

#### 5.4.4 Data Alignment and Process

The 2.5-bit coarse and the 5-bit fine data, MSB[1:3] and LSB[1:5], are aligned and processed at the digital backend as depicted in Fig. 5.14. The data are firstly aligned by the true single-phase clock (TSPC) logic D flip-flops. After that, the aligned data are added by using dynamic half adders. Besides, multiple delay cells are inserted in the data path to match the delay generated by the half adders. Therefore, each bit of D[1:7] has the same

delay, facilitating the output measurement. The high-speed data alignment and process consume a total power of 3.2 mW under a 0.9V power supply.



Fig. 5.15. The layout of the proposed ADC.



Fig. 5.16. Simulated DNL and INL of the proposed ADC.

# 5.5 Simulation Results

The proposed 15-GS/s 7-bit power-efficient pipelined flash ADC is designed in the 22-nm FDSOI CMOS technology. As depicted in Fig. 5.15, the ADC core layout occupies an area of 1160  $\mu$ m × 380  $\mu$ m. The proposed ADC is simulated with ESD parasitics and wire-bonding inductance considered. The overall on-chip decoupling capacitance is around 1.1 nF. Fig. 5.16 shows the simulated differential and integral nonlinearities (DNL/INL).



Fig. 5.17. Simulated output spectrum before and after comparator offset calibration with a low-frequency input.

Before calibration, the peak DNL and INL are +1.09/-1.44 and +2.62/-1.48 LSB, respectively. They are reduced to +0.43/-0.63 and +0.72/-0.88 LSB after comparator offset calibration.
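For readers unfamiliar with these metrics, the sketch below shows one common way to compute DNL and INL from a set of code transition levels. The text does not state how the DNL and INL in Fig. 5.16 were extracted, so this is a generic Python illustration with hypothetical names and made-up input values, not the procedure used for the reported numbers.

```python
def dnl_inl_from_transitions(transitions, lsb=None):
    """DNL and INL (in LSB) from an ordered list of code transition levels.

    DNL[k] = (T[k+1] - T[k]) / LSB - 1, and INL is the running sum of DNL.
    If no LSB is given, the average step between the first and last
    transition is used (end-point fit).
    """
    if lsb is None:
        lsb = (transitions[-1] - transitions[0]) / (len(transitions) - 1)
    dnl = [(hi - lo) / lsb - 1.0 for lo, hi in zip(transitions, transitions[1:])]
    inl = []
    running = 0.0
    for step_error in dnl:
        running += step_error
        inl.append(running)
    return dnl, inl


# Illustrative transition levels in units of one ideal LSB (not measured data):
# the third transition is shifted up by half an LSB.
dnl, inl = dnl_inl_from_transitions([0.0, 1.0, 2.5, 3.0], lsb=1.0)
print(dnl, inl)
```

A comparator offset shifts one transition level, which widens one code and narrows its neighbor. This is why offset calibration reduces both the peak DNL and the accumulated INL.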

Fig. 5.17 shows the simulated output spectra with a low-frequency input before and after comparator offset calibration. After comparator offset calibration, the SNDR is



Fig. 5.18. Simulated output spectrum before and after comparator offset calibration with a Nyquist-rate input.



Fig. 5.20. ADC dynamic performance.



Fig. 5.19. ADC power breakdown.

improved from 35.11 dB to 42.24 dB and the SFDR is improved from 41.68 dB to 53.2 dB. Fig. 5.18 shows the simulated output spectra with a near-Nyquist-rate input before and after comparator offset calibration. After comparator offset calibration, the SNDR is improved from 32.51 dB to 41.34 dB and the SFDR is improved from 39.64 dB to 49.36 dB.

The ADC dynamic performance at 15 GS/s is shown in Fig. 5.19. The ADC achieves an SNDR greater than 41.05 dB over the entire Nyquist bandwidth. Fig. 5.20 shows the power breakdown of the ADC. The total power consumption of the proposed ADC is 97.5 mW, 43% of which is consumed by the comparators in coarse and fine flash ADCs. Table 3 summarizes the ADC performance and compares it with recently published

| Specifications                 | Tretter [46] | Lotfi [70] | Cai [32] | Chen [41] | This work  |
|--------------------------------|--------------|------------|----------|-----------|------------|
|                                | MTT '16      | ISCAS '19  | JSSC '17 | JSSC '14  |            |
| Architecture                   | Flash        | Flash      | TI-flash | TI-flash  | Flash      |
| Technology (nm)                | 28           | 22 (SOI)   | 65       | 32        | 22 (FDSOI) |
| Sample rate (GS/s)             | 24           | 18.5       | 25       | 20        | 15         |
| Interleave factor              | 1            | 1          | 8        | 8         | 1          |
| Resolution (bits)              | 3            | 5          | 6        | 6         | 7          |
| Power supply (V)               | 1.4          | 0.9/1.5    | 1        | 0.9       | 0.9/1.6    |
| SNDR <sub>@Nyquist</sub> (dB)  | 15           | 28.5*      | 29.7     | 30.7      | 41.05*     |
| SFDR <sub>@Nyquist</sub> (dB)  | N/A          | 40*        | 40       | 39.4      | 49*        |
| Power (mW)                     | 400          | 140        | 88       | 69.5      | 97.5       |
| FOM <sub>W</sub> (fJ/conv.-step) | 3600         | 348        | 143      | 124       | 72         |
| Active area (mm <sup>2</sup> ) | 0.1          | N/A        | 0.2      | 0.25      | 0.44       |

Table 3 Performance Summary and Comparison with State-of-the-Art Flash ADCs.

\*Simulation results.

tens-of-GS/s flash ADCs. The proposed single-channel ADC achieves 15-GS/s conversion speed and 7-bit resolution with highly competitive power efficiency. The proposed flash ADC power is significantly reduced by jointly employing a ping-pong structure with dynamic comparators, the partial activation of the comparators, and the 2× TDI.

## 5.6 Conclusions

This chapter presents a 15-GS/s 7-bit pipelined flash ADC in the 22-nm FDSOI CMOS technology. To improve ADC speed while maintaining a high power efficiency, the pipelined first stage employs current-mode logic comparators and the pipelined second stage employs a ping-pong structure with dynamic comparators. Besides, the partial activation of comparators, and 2× TDI are employed to reduce ADC power consumption. Post-layout simulation results show that the proposed ADC achieves an SNDR and an SFDR of 41.34 dB and 49.36 dB, respectively, at the Nyquist frequency, with a power consumption of 97.5 mW, translating into a Walden FOM of 72 fJ/conv.-step.

# **CHAPTER VI**

# **CONCLUSION AND FUTURE DIRECTIONS**

### 6.1 Conclusions

High-speed ADCs with medium resolutions find various applications in wireline transceivers, wireless communication systems, and electronic test instruments. To achieve high power efficiency and high-speed performance while alleviating the timing skew and inter-channel mismatches, three flash ADCs are developed in this research including an ADC with a partially active 2-stage comparison and 2× time-domain latch interpolation (TDI), a 2-way TI-flash ADC with voltage-domain interpolation, and a pipelined flash ADC with a ping-pong structure in the second stage.

The first work employs a partially active 2-stage comparison and the 2× TDI to improve flash ADC power efficiency and avoid PVT-sensitive calibrations, such as time reference and voltage reference calibration. To enhance the conversion speed of the 2-stage structure, the stringent timing constraint is resolved by a 25%-75% duty-cycle clock scheme, a 0.5-bit redundancy in the first comparison stage, and an embedded second-stage slice selection logic. The bandwidth requirements of the T/H and T/H buffer under the 25%-75% duty-cycle clock are also analyzed. Fabricated in a 28-nm FDSOI CMOS process, the 5-GS/s 6-bit ADC achieves an SNDR of 32.8 dB and an SFDR of 41.82 dB at Nyquist frequency while consuming 15.07 mW power, translating into a FOM<sub>w</sub> of 84.5 fJ/conv.-step.

In the second work, a 2-way TI-flash ADC is developed, which employs dynamic comparators with a pre-amplifier stage to achieve 10 GS/s conversion speed for the sub-channel ADC and voltage-domain interpolation to reduce power consumption. Fabricated in a 28-nm FDSOI CMOS process, the 20-GS/s 6-bit 2-way TI-flash ADC achieves an SNDR of 31.2 dB and an SFDR of 38.5 dB at Nyquist frequency while consuming 204 mW power. The FOM<sub>w</sub> is 344 fJ/conv.-step.

To further increase the flash ADC speed, a pipelined flash ADC is also developed, where the first stage employs current-mode logic comparators to enhance the speed and the second stage employs a ping-pong structure with dynamic comparators to achieve high power efficiency. Designed in a 22-nm FDSOI CMOS process, the 15-GS/s 7-bit pipelined single-channel flash ADC achieves an SNDR of 41.34 dB and an SFDR of 49.36 dB at Nyquist frequency with a power consumption of 97.5 mW. The corresponding FOM<sub>w</sub> is 72 fJ/conv.-step.

### 6.2 Future Directions

To further improve flash ADC power efficiency, comparators with embedded references can be investigated, which reduces comparator power by half. To embed the reference, an intentional size imbalance between the comparator input transistors can be explored; the resulting offset serves as the embedded reference of the comparator. To further improve flash ADC conversion speed while maintaining high power efficiency, a moderately time-interleaved flash ADC that uses the proposed 5-GS/s 6-bit power-efficient flash ADC as the sub-channel can be developed. This moderately TI-flash ADC needs fewer sub-channels than a TI-SAR ADC, which eases the calibration of timing skew and inter-channel mismatches.

The proposed comparator offset calibration employing FDSOI back-gate bias is a foreground methodology, and the offset calibration residual varies from 0.13 LSB to 0.28 LSB over PVT variation, as shown in Fig. 4.7. To address this variation of the calibration residual, a mechanism that senses the PVT variation and re-calibrates the comparator offset accordingly can be further developed. To achieve the PVT sensing, one dedicated comparator along with its calibration circuits, including SALs, EN logics, and R-2R DACs, can be used. The dedicated comparator is first calibrated and then keeps operating with $V_{in} = 0$. Over a given time window, the comparator output data are collected and the probabilities P<sub>out=Vdd</sub> and P<sub>out=Vss</sub> are calculated. If P<sub>out=Vdd</sub> = P<sub>out=Vss</sub> = 50%, no PVT variation is detected. Otherwise, PVT variation is detected and all the comparators re-calibrate their offsets to ensure their accuracy.
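The sensing idea can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the comparator is modeled as the sign of its residual offset plus Gaussian input-referred noise, the noise level and sample count are arbitrary, and the decision uses a tolerance band around 50% because a finite number of samples will not yield exactly 50%. The names `comparator_outputs`, `pvt_drift_detected`, and `tolerance` are hypothetical and are not part of the proposed circuit.

```python
import random


def comparator_outputs(offset_v, noise_sigma_v, n_samples, rng):
    """Behavioral comparator with V_in = 0.

    Returns 1 (Vdd) when the residual offset plus noise is positive, else 0 (Vss).
    Gaussian input-referred noise is an assumption of this model.
    """
    return [1 if offset_v + rng.gauss(0.0, noise_sigma_v) > 0.0 else 0
            for _ in range(n_samples)]


def pvt_drift_detected(outputs, tolerance=0.05):
    """Flag drift when P(out = Vdd) deviates from 50% by more than `tolerance`.

    `tolerance` is an assumed design parameter that absorbs the statistical
    spread of a finite observation window.
    """
    p_vdd = sum(outputs) / len(outputs)
    return abs(p_vdd - 0.5) > tolerance, p_vdd


rng = random.Random(0)  # fixed seed so the run is repeatable

# Freshly calibrated comparator: zero residual offset, 1 mV rms noise (assumed).
calibrated = comparator_outputs(0.0, 1e-3, 10_000, rng)
# After an assumed PVT shift, the offset has drifted to 1 mV.
drifted = comparator_outputs(1e-3, 1e-3, 10_000, rng)

for label, data in (("calibrated", calibrated), ("drifted", drifted)):
    flag, p_vdd = pvt_drift_detected(data)
    print(f"{label}: P(out=Vdd) = {p_vdd:.3f}, re-calibrate = {flag}")
```

In this model a longer observation window narrows the statistical spread of the estimated probability, so a tighter tolerance and a smaller detectable offset drift become possible at the cost of a slower reaction to PVT changes.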

# REFERENCES

- P. Schvan, J. Bach, C. Falt, P. Flemke, R. Gibbins, Y. Greshishchev, N. B.-Hamida,
   D. Pollex, J. Sitch, S.-C. Wang, and J. Wolczanski, "A 24 GS/s 6b ADC in 90nm
   CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2008, pp. 544–545.
- [2] C.-H. Chan, Y. Zhu, S.-W. Sin, S.-P. U, and R. P. Martins, "A 5.5mW 6b 5GS/s 4×interleaved 3b/cycle SAR ADC in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Mar. 2015, pp. 466–468.
- C.-H. Chan, Y. Zhu, I.-M. Ho, W.-H. Zhang, S.-P. U, and R. P. Martins, "A 5mW 7b
   2.4GS/s 1-then-2b/cycle SAR ADC with background offset calibration," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 282–284.
- Y.-M. Creshishchev, J. Aguirre, M. Besson, R. Gibbins, C. Falt, P. Flemke, N. B.-Hamida, D. Pollex, P. Schvan, and S.-C. Wang, "A 40GS/s 6b ADC in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2010, pp. 390–391.
- [5] Y. Duan, and E. Alon, "A 6b 46GS/s ADC with >23 GHz BW and sparkle-code error correction," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2015, pp. C162–C163.
- [6] L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T. M. Andersen, and Y. Leblebici, "A 90 GS/s 8 b 667 mW 64× interleaved SAR ADC in 32 nm digital SOI CMOS," in *IEEE Int. Solid-State Circuits Conf.* (ISSCC) Dig. Tech. Papers, 2014, pp. 378–379.

- [7] K. Sun, G. Wang, P. Gui, Q. Zhang, and S. Elahmadi, "A 31.5-GHz BW 6.4-b ENOB 56-GS/s ADC in 28nm CMOS for 224-Gb/s DP-16QAM coherent receivers," in *IEEE Custom Integrated Circuits Conf. (CICC)*, San Diego, CA, Apr. 2018, pp. 1–4.
- [8] Y. Frans, J. Shin, L. Zhou, P. Upadhyaya, J. Im, V. Kireev, M. Elzeftawi, H. Hedayati, T. Pham, S. Asuncion, C. Borrelli, G. Zhang, H. Zhang, and K. Chang, "A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1101–1110, Apr. 2017.
- [9] E.-Z. Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6b 10GS/s TI-SAR ADC with embedded 2-tap FFE/1-tap DFE in 65nm CMOS," in *Symp. VLSI Circuits*, Jun. 2013, pp. 274–275.
- [10] L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T. M. Andersen, and Y. Leblebici, "A 35mW 8b 8.8GS/s SAR ADC with low-power capacitive reference buffers in 32 nm digital SOI CMOS," in *VLSI Circuits Symp.*, 2013, pp. 260–261.
- [11] S.-L. Tual, P. N. Singh, C. Curis, P. Dautriche, "A 20GHz-BW 6b 10GS/s 32 mW time-interleaved SAR ADC with master T&H in 28nm UTBB FDSOI technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Mar. 2014, pp. 382–383.
- [12] L. Kull, J. Pliva, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T. M. Andersen, and Y. Leblebici, "A 110 mW 6 bit 36 GS/s interleaved SAR ADC for 100 GBE occupying 0.048 mm<sup>2</sup> in 32nm SOI CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2014, pp. 89–92.

- [13] Q. Fan, Y. Hong, and J. Chen, "A time-interleaved SAR ADC with bypass-based opportunistic adaptive calibration," *IEEE J. Solid-State Circuits*, Vol. 55, pp. 2082– 2093, Aug. 2020.
- [14] E. Martens, D. Dermit, M. Shrivas, S. Nagata, and J. Craninckx, "A compact 8-bit 8 GS/s 8×TI SAR ADC in 16nm with 45dB SNDR and 5GHz ERBW," in *IEEE Symp.* VLSI Circuits, Jun. 2021, pp. 1–2.
- [15] D.-J. Chang, M. Choi, and S.-T. Ryu, "A 28-nm 10-b 2.2-GS/s 18.2-mW relativeprime time-interleaved sub-ranging SAR ADC with on-chip background skew alibration," *IEEE J. Solid-State Circuits*, vol. 56, no. 9, pp. 2691–2699, Sep. 2021.
- [16] S. Linnhoff, E. Sippel, F. Buballa, M. Reinhold, M. Vossiek, and F. Gerfers, "A 12 bit 8 GS/s randomly-time-interleaved SAR ADC with adaptive mismatch correction," in *Proc. IEEE International Symposium on Circuits and Systems* (ISCAS), May 2021.
- [17] E. Swindlehurst, H. Jensen, A. Petrie, Y. Song, Y.-C. Kuan, Y. Qu, M.-C. F. Chang,
  J. -T. Wu, and S.-H. W. Chiang, "An 8-bit 10-GHz 21-mW time-interleaved SAR
  ADC with grouped DAC capacitors and dual-path bootstrapped Switch," *IEEE J. Solid-State Circuits*, vol. 56, no. 8, pp. 2347–2357, Aug. 2021.
- [18] W. Jiang, Y. Zhu, C.-H. Chan, B. Murmann, R. P. Martins, "A 7-bit 2 GS/s timeinterleaved SAR ADC with timing skew calibration based on current integrating sampler," *IEEE Trans. Circuits Syst. I*, vol. 68, no. 2, pp. 557–567, Feb. 2021.
- [19] C.-Y. Lin, Y.-H. Wei, and T.-C. Lee, "A 10b 2.6GS/s time-interleaved SAR ADC with background timing-skew calibration," in *Proc. Int. Solid-State Circuits Conf.* (*ISSCC*), Feb. 2016, pp. 468–469.

- [20] X. Wang, F. Li, W. Jia, and Z. Wang, "A 14-Bit 500-MS/s time-interleaved ADC with autocorrelation-based time skew calibration," *IEEE Trans. Circuits Syst. II*, vol. 66, no. 3, pp. 322–326, June. 2018.
- [21] J.-W. Nam, M. Hassanpourghadi, A. Zhang, and M. S.-W. Chen, "A 12-bit 1.6, 3.2, and 6.4 GS/s 4-b/cycle time-interleaved SAR ADC with dual reference shifting and interpolation," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1765–1779, Jun. 2018.
- [22] B. Xu, Y. Zhou, and Y. Chiu, "A 23mW 24GS/s 6b time-interleaved hybrid two-step ADC in 28nm CMOS," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2016, pp. 1–2.
- [23] L. Wang, M.-A. LaCroix, and A.-C. Carusone, "A 4-GS/s single channel reconfigurable folding flash ADC for wireline applications in 16-nm FinFET," *IEEE Trans. Circuits Syst. II*, vol. 64, no. 12, pp. 1367–1371, Dec. 2017.
- [24] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. Van der Plas, "A 2.2mW 5b 1.75GS/s folding flash ADC in 90nm digital CMOS," in *Proc. Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2008, pp. 252–253.
- [25] Y.-S. Shu, "A 6b 3GS/s 11mW fully dynamic ADC in 40nm CMOS with reduced number of comparators," in *IEEE Symp. VLSI Circuits*, 2012, pp. 26–27.
- [26] J.-I. Kim, B. Sung, W. Kim, and S.-T. Ryu, "A 6-b 4.1-GS/s flash ADC with time-domain latch interpolation in 90-nm CMOS," *IEEE J. Solid-State Circuit*, vol. 48, no. 6, pp. 1429–1441, Jun. 2013.
- [27] J.-I. Kim, D.-R. Oh, D.-S. Jo, B.-R.-S. Sung, and S.-T. Ryu, "A 65 nm CMOS 7b 2
   GS/s 20.7 mW flash ADC with cascaded latch interpolation," *IEEE J. Solid-State Circuit*, vol. 50, no. 10, pp. 2319–2330, Oct. 2015.

- [28] J. Liu, C.-H. Chan, S.-W. Sin, S.-P. U, and R.-P. Martins, "A 89fJ-FOM 6-bit 3.4GS/s flash ADC with 4× time-domain interpolation," in *Proc. IEEE Asian Solid-State Circuits Conf. (ASSCC)*, 2015, pp. 1–4.
- [29] I.-M. Yi, N. Miura, H. Fukuyama, and H. Nosaka, "A 15.1-mW 6-GS/s 6-bit singlechannel flash ADC with selectively activated 8× time-domain latch interpolation," *IEEE J. Solid-State Circuits*, vol. 56, no. 2, pp. 455–464, Feb. 2021.
- [30] D.-R. Oh, J.-I. Kim, D.-S. Jo, W.-C. Kim, D.-J. Chang, and S.-T. Ryu, "A 65-nm CMOS 6-bit 2.5-GS/s 7.5-mW 8× time-domain interpolating flash ADC with sequential slope-matching offset calibration," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 288–297, Jan. 2019.
- [31] X. Yang and J. Liu, "A 10 GS/s 6b time-interleaved partially active fash ADC," *IEEE Trans. Circuits Syst. I*, vol. 61, no. 8, pp. 2272–2280, Aug. 2014.
- [32] S. Cai, E.-Z. Tabasy, A. Shafik, S. Kiran, S. Hoyos, and S. Palermo, "A 25-GS/s 6b TI two-stage multi-bit search ADC with soft-decision selection algorithm in 65 nm CMOS," *IEEE J. Solid-State Circuit*, vol. 52, no. 8, pp. 2168–2179, Aug. 2017.
- [33] C. Yang and T. Kuo, "A 3 mW 6-bit 4 GS/s subranging ADC with subrangedependent embedded references," *IEEE Trans. Circuits Syst. II*, vol. 68, no. 7, pp. 2312–2316, Jul. 2021.
- [34] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16pJ/conversion-step 2.5mW
   1.25GS/s 4b ADC in a 90nm digital CMOS process," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2006, pp. 566–567.

- [35] A. Nikoozadeh and B. Murmann, "An analysis of latch comparator offset due to load capacitor mismatch," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 12, pp. 1398–1402, Dec. 2006.
- [36] B. Verbruggen, P. Wambacq, M. Kuijk, and G. Van der Plas, "A 7.6 mW 1.75 GS/s
  5 bit flash A/D converter in 90-nm digital CMOS," in *Proc. IEEE Int. Symp. VLSI Circuits (VLSIC)*, Jun. 2008, pp. 14–15.
- [37] V. H.-C. Chen and L. Pileggi, "An 8.5-mW 5GS/s 6b flash ADC with dynamic offset calibration in 32nm CMOS SOI," in *Proc. IEEE Int. Symp. VLSI Circuits (VLSIC)*, Jun. 2013, pp. 264–265.
- [38] J. Yao, J. Liu, and H. Lee, "Bulk voltage trimming offset calibration for high-speed flash ADCs," *IEEE Trans. Circuits Syst. II*, vol. 57, pp. 110–114, Feb. 2010.
- [39] X. Yang, S.-J. Bae, and H.-S. Lee, "An 8-bit 2.8 GS/s flash ADC with time-based offset calibration and interpolation in 65 nm CMOS," in *Proc. IEEE European Solid State Circuits Conf. (ESSCIRC)*, 2019, pp. 305–308.
- [40] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing skew calibration," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 834–847, Apr. 2011.
- [41] V. H.-C. Chen and L. Pileggi, "A 69.5-mW 20-GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32 nm CMOS SOI," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 2891–2901, Dec. 2014.
- [42] S. Verma, A. Kasapi, L.-M. Lee, D. Liu, D. Loizos, S.-H. Paik, A. Varzaghani, S. Zogopoulos, S. Sidiropoulos, "A 10.3-GS/s 6b flash ADC for 10G ethernet

applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2013, pp. 462–463

- [43] C.-C. Huang, C.-Y. Wang, and J.-T. Wu, "A CMOS 6-bit 16-GS/s time-interleaved ADC using digital background calibration techniques," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 848–858, Apr. 2011.
- [44] M. M. Ayesh, S. Ibrahim, and M. M. Aboudina, "A 15.5-mW 20-GSps 4-bit chargesteering flash ADC," in *IEEE Midwest Symposium on Circuits and Systems* (MWSCAS), Oct. 2015, pp. 1–4.
- [45] D. Ferenci, M. Grozing, F. Lang, and M. Berroth, "A 3-bit 20-GS/s flash ADC in 65 nm low power CMOS technology," in *IEEE European Microwave Integrated Circuit Conf. (EMICC)*, Sep. 2010.
- [46] G. Tretter, M. M. Khafaji, D. Fritsche, C. Carta, and F. Ellinger, "Design and characterization of a 3-bit 24-GS/s flash ADC in 28-nm low-power digital CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 64, no. 4, pp. 1143–1152, Apr. 2016.
- [47] S. Sharamian, S. P. Voinigescu, and A. C. Carusone, "A 35-GS/s 4-bit flash ADC with active data and clock distribution trees," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1709–1720, Jun. 2009.
- [48] Y. Feng, Y. Tang, Q. Fan, and J. Chen, "A 25-GS/s 4-bit single-core flash ADC in 28nm FDSOI CMOS," in IEEE Aisa Pacific Conference on Circuits and Systems (APCCAS), Oct. 2018.
- [49] H. Chung, A. Rylyakov, Z. T. Deniz, J. Bulzacchelli, G.-Y. Wei, and D. Friedman,
   "A 7.5-GS/s 3.8-ENOB 52-mW flash ADC with clock duty cycle control in 65 nm
   CMOS," in *Dig. Symp. VLSI Circuits*, 2009, pp. 268–269.

- [50] W. A. Qureshi, E. Bonizzoni, and F. Maloberti, "A 5-bit 10-GS/sec flash ADC with resolution enhancement using metastability detection," in *Proc. IEEE International Symposium on Circuits and Systems (ISCAS)*, May, 2019.
- [51] X.-Q. Du, M. Grozing, M. Buck, and Manfred Berroth, "A 40 GS/s 4 bit SiGe BiCMOS flash ADC," in *IEEE Bipolar/BiCMOS Circuit and Technology Meeting* (BCTM), Oct. 2017.
- [52] H.-W. Kang, H.-K. Hong, W. Kim, and S.-T. Ryu, "A time-interleaved 12-b 270-MS/s SAR ADC with virtual-timing-reference timing-skew calibration scheme," *IEEE J. Solid-State Circuits*. vol. 53, pp. 2584–2594, Sept. 2018.
- [53] X. Li, C. Huang, D. Ding, and J. Wu, "A review on calibration methods of timing-skew in time-interleaved ADCs," *Journal of Circuits, Systems and Computers*, Vol. 29, No. 2, 2020.
- [54] J. Song and N. Sun, "A 10-b 600-MS/s 2-way time-interleaved SAR ADC with mean absolute deviation based background timing-skew calibration," in *IEEE Custom Integrated Circuits Conf. (CICC)*, San Diego, CA, 2018, pp. 1–4.
- [55] H. Wei, P. Zhang, B. D. Sahoo, and B. Razavi, "An 8-bit 4-GS/s 120-mW CMOS ADC," in *IEEE Custom Integrated Circuits Conf. (CICC)*, San Diego, CA, Aug. 2013, pp. 1–4.
- [56] H. Mohammadnezhad, H. Wang, A. Cathelin, and P. Heydari, "A 115-135-GHz 8PSK receiver using multi-phase RF-correlation-based direct-demodulation method," *IEEE J. Solid-State Circuits*, vol. 54, no. 9, pp. 2435–2448, Sep. 2019.

- [57] K. Deguchi, N. Suwa, M. Ito, T. Kumamoto, and T. Miki, "A 6-bit 3.5 GS/s 0.9-V
  98-mW flash ADC in 90 nm CMOS," *IEEE J. Solid-State Circuit*, vol. 43, no. 10, pp. 2303–2310, Oct. 2008.
- [58] R. E. J. van de Grift, I. W. J. M. Rutten, and M. van der Veen, "An 8-bit video ADC incorporating folding and interpolation techniques," *IEEE J. Solid-State Circuits*, vol. 22, no. 22, pp. 944–953, Dec. 1987.
- [59] K. Sushihara, H. Kimura, Y. Okamoto, K. Nishimura, and A. Matsuwasa, "A 6b 800 MSample/s CMOS A/D converter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers.*, 2001, pp. 428–429.
- [60] X. Yang, G. Cui, Y. Zhang, J. Ren, and J. Liu, "A metastability error detection and reduction technique for partially active flash ADCs," *IEEE Trans. Circuits Syst. II*, vol. 63, no. 4, pp. 331–335, Apr. 2016.
- [61] Z. Zheng, L. Wei, J. Lagos, E. Martens, Y. Zhu, C.-H. Chan, J. Craninckx, and R. P. Martins, "A 3.3-GS/s 6b fully dynamic pipelined ADC with linearized dynamic amplifier," *IEEE J. Solid-State Circuits*, DOI: 10.1109/JSSC.2021.3096938.
- [62] Q. Fan, and J. Chen, "A 500-MS/s 13-bit SAR-assisted time-interleaved digital-slope
   ADC," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2019, pp. 1–5.
- [63] C.-C. Liu, C.-H. Kuo, and Y.-Z. Lin, "A 10 bit 320 MS/s low-cost SAR ADC for IEEE 802.11ac applications in 20 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 11, pp. 2645–2654, Nov. 2015.
- [64] K. -M. Lei, P.-I. Mak, and R. P. Martins, "Systematic analysis and cancellation of kickback noise in a dynamic latched comparator," *Analog Integr. Circuits Signal Process.*, vol. 77, no. 2, pp. 277–284, Nov. 2013.

- [65] D. Rossi, A. Pullini, M. Gautschi, I. Loi, F. K. Gurkaynak, P. Flatresse, and L. Benini,
  "A -1.8V to 0.9V body bias, 60 GOPS/W 4-core cluster in low-power 28nm UTBB
  FD-SOI technology," in *IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S)*, Oct. 2015.
- [66] D. Lee, J. Yoo, K. Choi, and J. Ghaznavi, "Fat tree encoder design for ultra-highspeed flash A/D converters," in *IEEE Midwest Symposium on Circuits and Systems* (MWSCAS), 2002, pp. 87–90.
- [67] M. Shinagawa, Y. Akazawa, and T. Wakimoto, "Jitter analysis of high-speed sampling systems," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 220–224, Feb. 1990.
- [68] S. Galal, and B. Razavi, "Broadband ESD protection circuits in CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2334–2340, Dec. 2003.
- [69] S. Kim, S. Kim, G. Jung, K.-W. Kwon, and J.-H. Chun, "Design of a reliable broadband I/O employing T-coil," *Journal of Semiconductor Technology and Science*, vol. 9, no.4, Dec. 2009.
- [70] N. Lotfi, P. L. Ibanez, M. Runge, and F. Gerfers, "A single-channel 18.5 GS/s 5-bit flash ADC using a body-biased comparator architecture in 22nm FDSOI," in *Proc. IEEE International Symposium on Circuits and Systems (ISCAS)*, May, 2019.