# DESIGN TECHNIQUES TO MITIGATE THE IMPACT OF PVT VARIATIONS IN NANOMETER CIRCUITS

## ANDRÉS FELIPE AMAYA BELTRÁN

Thesis submitted in fulfillment of the requirements for the degree of Doctor of Engineering, Electronic Engineering area

Advisor: ÉLKIM FELIPE ROA FUENTES, Electronic Engineer, PhD.

UNIVERSIDAD INDUSTRIAL DE SANTANDER
FACULTAD DE INGENIERÍAS FISICOMECÁNICAS
ESCUELA DE INGENIERÍA ELÉCTRICA, ELECTRÓNICA Y DE TELECOMUNICACIONES
BUCARAMANGA
2020

#### **Acknowledgments**

It is not possible to complete the road towards a Ph.D. without the help of many people. My advisor, my friends, my co-workers at OnChip, my wife, and my daughter each contributed a step towards accomplishing this stage.

First, I would like to thank Prof. Elkim Roa, who has been my advisor and my colleague during these years. His guidance re-shaped the scope of this thesis and was crucial to achieving the best results.

Second, I would like to thank my friends and colleagues at OnChip. I want to thank Luis, Javier, and Héctor for their help in tuning up each aspect of this work, and also for the non-technical conversations about great past stories. Also, I want to thank Ckristian for his massive help with the digital and programming aspects of each circuit. Moreover, I would like to thank Rolando, Juan Sebastian, Hansel, Wilmer, Juan Pablo, and all the OnChip members who contributed to completing this work.

Third, I want to thank Universidad Industrial de Santander for the economic support during the first years of this road, for the resources to tape out and test each circuit, and for funding all the patent applications. Thanks to Prof. Rodolfo Villamizar for his support and guidance during the first years.
Also, thanks to CERN and Prof. Hugo Hernandez for my first tape-out.

Finally, I want to thank my wife Mildred, who has been my right hand during all these years. She gave me enough motivation at the last stage to finish writing this thesis. Thanks for understanding when you had to go to bed alone on so many occasions. Thanks to my father Jesús (R.I.P) for his huge support; this work is especially dedicated to him. And thanks to Gabriela, my daughter, for giving me a new reason to complete this road.

Dedicated to all the people who contributed to finishing this work. Especially dedicated to my father (R.I.P)

# CONTENT

- 1 Introduction
  - 1.1 Key aspects on robust SoC design
    - 1.1.1 High-speed serial links
    - 1.1.2 Digital-to-analog conversion in SoC
    - 1.1.3 Power supply regulation
  - 1.2 Contributions
  - 1.3 Thesis overview
- 2 An Offset Reduction Technique for Dynamic Voltage Comparators
  - 2.1 Introduction
  - 2.2 Common Approach for Offset Calibration in High Speed Links
  - 2.3 Proposed Offset Reduction Technique
    - 2.3.1 Performance Description
  - 2.4 Residual offset
  - 2.5 Experimental Results for the proposed PORT sub-circuit
  - 2.6 Summary
- 3 On-chip eye diagram Measurement for Comparator Characterization
  - 3.1 Introduction
  - 3.2 Measurement Strategy
  - 3.3 Circuit Implementation
  - 3.4 Experimental Results
  - 3.5 Summary
- 4 A 12b 10MHz Capacitive Digital to Analog Converter
  - 4.1 Introduction
  - 4.2 Capacitive DAC topology
  - 4.3 Calibration and Trimming
    - 4.3.1 Layout
  - 4.4 Experimental Results
    - 4.4.1 Debugging DAC performance
  - 4.5 Summary
- 5 Improving LDO Stability by Exploiting the Equivalent Series Resistor of Compensation Capacitor
  - 5.1 Introduction
  - 5.2 External Compensation of LDOs
  - 5.3 Adaptive Control of the LDO's Power Transistor
  - 5.4 Experimental Results
  - 5.5 Summary
- 6 Conclusions
  - 6.1 Conclusions
  - 6.2 List of Publications
    - 6.2.1 Conference papers
    - 6.2.2 Journal papers in examination process
    - 6.2.3 Patents
    - 6.2.4 Patent requests
    - 6.2.5 Other publications
  - 6.3 Future work
- BIBLIOGRAPHY
- A Process-Compatible DRAM Row-Hammering Mitigation Technique
  - A.1 Introduction
  - A.2 Pseudo-parallel Memory Cell Emulation
  - A.3 On Deployability of a Weak-Cells-Based Monitoring System
  - A.4 Simulation Results
  - A.5 Conclusions

# LIST OF FIGURES

1. Traditional RX front-end for high-speed interfaces: a) General Implementation; b) Calibration of sampler C; c) Calibration of sampler B. Only even (0°) phase components are shown for explanation purposes.
2. Traditional offset-correction scheme: Detailed circuit including switches for commutation.
3. Alternative implementations to reduce the number of switches: a) use of offset-correction switches at each local summing point; b) turning off the global summer.
4. Power increment when using classical offset-correction techniques in high-speed signal interfaces: a) load capacitance, b) bias current.
5. Block diagram of the proposed offset reduction technique.
6. Implemented phase and frequency detector.
7. State diagram of the FSM.
8. Timing diagram of the proposed technique: a) saturated comparator due to large offset; b) calibration evolution and convergence; c) comparator calibrated.
9. Block diagram of the proposed technique including a majority-voting block to perform a low-pass filter.
10. Behavior description of a MJV filter.
11. Linear model of the calibration loop.
12. Timing diagram of the proposed technique considering a uniform sequence (1010) for offset reduction during link tuning.
13. Measured and simulated (TT) output voltage of calibration DACs during a calibration process.
14. Traditional high-speed serial link: use of equalizers at both TX and RX sides.
15. System implemented for offset-reduction technique validation.
16. Conceptual diagram of voltage shifting to build the eye diagram at the pre-amplifier output.
17. Conceptual diagram of phase and voltage shifting to build the eye diagram at the pre-amplifier's output.
18. Implemented sampling circuitry.
19. Implemented programmable Gm-C filter for channel emulation.
20. Nauta-OTA for Gm-stages implementation.
21. Implemented Phase-Interpolator.
22. Testing board setup.
23. Micro-photography of the proposed offset-correction technique.
24. Eye diagram at the pre-amplifier's output using the method described in section 3.2. The average value of each section was calculated using 50k samples of $D_{OUT}$.
25. Measured on-chip eye diagram at the pre-amplifier's output using the method of section 3.2, and before offset calibration. The filter attenuation is 26 dB and induced offset is: a) +44 mV, b) -56 mV.
26. Measured on-chip eye diagram after calibration for filter attenuation of 26 dB.
27. Measured on-chip eye diagram before calibration with 23 dB of attenuation and offset of a) +37 mV, b) -37 mV; c) measurement after calibration.
28. Capacitive array used to implement D/A conversion.
29. Inclusion of parasitic components and calibration capacitors in DAC core.
30. Implemented DAC including calibration circuits and buffers.
31. Effect of $C_{cal}$ trimming on correction of quantization step.
32. Traditional placement of CDACs: a) common-centroid layout of an 80-bit binary-weighted DAC, b) layout of circuit from Fig. 29.
33. Placement of capacitors of implemented DAC (single-ended array).
34. Coupling capacitance in CDACs.
35. Microphotography of the designed DAC.
36. DAC's differential output voltage.
37. DAC's differential output voltage.
38. DAC's differential output voltage: $F_s = 10$ MHz, $F_{signal} = 78$ kHz.
39. Frequency spectrum of signal of Fig. 38.
40. Traditional LDO topology based on a source-follower PMOS power transistor.
41. Bandwidth improvement by ESR @ $I_L=20\,\mathrm{mA}$.
42. Phase margin vs. ESR, including PVT variations @ $I_L=20\,\mathrm{mA}$.
43. Implementation of width and parasitic capacitance control of Power MOSFET.
44. Phase margin vs. bias current of the error amplifier, including PVT variations @ $I_L=20\,\mathrm{mA}$.
45. Phase margin vs. load current without adaptive power transistor control, and including PVT variations.
46. Phase margin vs. temperature with adaptive power transistor control and error amplifier biasing, and including PV variations: a) @ $I_L=20\,\mathrm{mA}$, b) @ $I_L=10\,\mu\mathrm{A}$.
47. BOD circuit for detection of LDO dynamics.
48. Microphotography of the fabricated system highlighting the LDO, the BOD and biasing circuitry.
49. Output voltage variation for a change in load current of 5 mA at 27 °C and considering: a) Low-ESR, b) ESR $\sim 1\,\Omega$, c) ESR $\sim 10\,\Omega$, d) ESR $\sim 20\,\Omega$. Vertical scale is 100 mV/div, while horizontal scale is 100 ns/div.
50. Output voltage variation for a change in load current of 5 mA at 125 °C and considering: a) Low-ESR, b) ESR $\sim 1\,\Omega$, c) ESR $\sim 10\,\Omega$, d) ESR $\sim 20\,\Omega$. Vertical scale is 200 mV/div, while horizontal scale is 100 ns/div.
51. Measured load regulation: Output voltage as a function of load current.
52. Cell with increased leakage susceptibility using non-standard sized cell.
53. Pseudo-parallel connection between two DRAM cells. Both cells share their bit line and word line, emulating a cell of twice the size of a standard DRAM cell.
54. DRAM array including the proposed monitoring system. Pseudo-parallel cells consist of one modified cell and one dummy cell, exploiting the unusable bit of dummy cells.
55. Standard and monitoring cell voltage discharge including PVT variations: a) Typical process corner and 50 °C, b) Fast process corner and 125 °C, c) Slow process corner and -40 °C.

# LIST OF TABLES

1. Comparison of the proposed technique with other works.
2. Performance summary of the designed DAC.
3. Typical values for ESR of capacitors made of diverse materials.
4. Performance summary of the designed LDO.

## **List of Acronyms**

- **PVT**: Fabrication Process, Voltage and Temperature
- **SoC**: System on Chip
- **PORT**: Phase-based Offset Reduction Technique
- **DFE**: Decision Feedback Equalizer
- **CTLE**: Continuous Time Linear Equalizer
- **PD**: Phase Detector
- **FSM**: Finite State Machine
- **MJV**: Majority Voting
- **PRBS**: Pseudo-Random Bit Sequence
- **BER**: Bit Error Rate
- **SPI**: Serial Peripheral Interface
- **FPGA**: Field Programmable Gate Array
- **DAC**: Digital to Analog Converter
- **CDAC**: Capacitive Digital to Analog Converter
- **DNL**: Differential Non-Linearity
- **INL**: Integral Non-Linearity
- **MSB**: Most Significant Bit
- **LSB**: Least Significant Bit
- **CMRR**: Common Mode Rejection Ratio
- **LDO**: Low Drop-out Linear Regulator
- **ESR**: Equivalent Series Resistance
- **PM**: Phase Margin
- **BOD**: Brown-out Detector

**RESUMEN**

**TITLE:** DESIGN TECHNIQUES TO MITIGATE THE IMPACT OF PVT VARIATIONS IN NANOMETER CIRCUITS \*

**AUTHOR:** ANDRÉS FELIPE AMAYA BELTRÁN †

**KEYWORDS:** Offset reduction, PVT variations, DNL calibration, voltage regulator, data conversion.
\* Research Work.

† Facultad de Ingenierías Fisicomecánicas. Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones. Advisor: Élkim Felipe Roa Fuentes.

The impact of variations of the fabrication process, operating temperature, and supply voltage (PVT) on the performance of Systems-on-Chip (SoC) is usually mitigated by means of calibration algorithms. These algorithms (generally executed in the background) use data from PVT sensors to adjust operation at the expense of additional hardware, latency, and power consumption.

This work presents three novel, low-complexity design techniques to reduce the incidence of global, local, and random PVT variations on the performance of a SoC. The first alternative addresses offset calibration in decision feedback equalizers (DFE), used in serial links. Offset is detected in the phase domain using a phase detector at the comparator output. This detection makes it possible to eliminate the classical common-mode connection at the comparator input. The method enables the implementation of an on-the-fly calibration without affecting the load on the signal path. The second technique consists of a calibration algorithm to adjust differential non-linearity (DNL) in capacitive digital-to-analog converters. The algorithm reduces the need to connect the capacitive array to Vcm while calibrating, which reduces circuit complexity, power, and area consumption. The third technique focuses on improving the stability robustness of linear regulators. Frequency stability is improved by two aspects: a lead-lag compensator, and an adaptive scheme for the bias current and the size of the power transistor. The compensator is implemented using the equivalent series resistance of the external capacitor. In addition, an undershoot estimation made by the brown-out detector of conventional power management units sets the bias current and the size of the pass transistor.

**ABSTRACT**

**TITLE:** DESIGN TECHNIQUES TO MITIGATE THE IMPACT OF PVT VARIATIONS IN NANOMETER CIRCUITS ‡

**AUTHOR:** ANDRÉS FELIPE AMAYA BELTRÁN §

**KEYWORDS:** Offset reduction, PVT variations, DNL calibration, voltage regulator, data conversion

‡ Research Work.

§ Facultad de Ingenierías Fisicomecánicas. Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones. Advisor: Élkim Felipe Roa Fuentes.

The impact of variations of the fabrication process, operating temperature, and supply voltage (PVT) on the performance of Systems-on-Chip (SoC) is typically mitigated using calibration algorithms. These algorithms (usually executed in the background) use data from PVT sensors to adjust operation at the expense of extra hardware, latency, and power consumption. Even for mature technologies (>100 nm), PVT sensing has a crucial role in complex aspects of a SoC, such as voltage regulation, data conversion, and interfacing. Moreover, PVT sensors cannot sense the effect of local and random variations on SoC performance. Specifications such as offset (produced mainly by mismatch) require the design of dedicated calibration procedures, increasing hardware overhead.

This work introduces three novel and low-overhead design techniques to reduce the incidence of global, local, and random PVT variations on SoC performance. The first alternative addresses offset calibration in decision feedback equalizers (DFE), used in serial links. Offset is sensed in the phase domain using a phase detector at the comparator output. The phase-domain sensing allows eliminating the classical common-mode connection at the comparator's input. The method enables the implementation of an on-the-fly calibration without affecting the load on the signal path.
The second technique consists of a lightweight calibration algorithm to adjust differential non-linearity (DNL) in split-capacitor digital-to-analog converters. The algorithm reduces the need to connect the capacitive array to Vcm while calibrating, thus reducing circuit complexity, power, and area consumption. The third technique concentrates on improving the stability robustness of linear low-dropout regulators. Frequency stability is improved in two ways: a lead-lag compensator, and an adaptive scheme for bias current and power transistor size. The compensator is implemented by exploiting the equivalent series resistor of the external capacitor. Also, an undershoot estimation made by the brown-out detector of conventional power management units sets the bias current and pass transistor size.

#### 1. Introduction

The continuous scaling of CMOS technologies has allowed for the development of complex, high-performance systems-on-chip (SoCs). Typically, a SoC integrates onto a single substrate functions such as volatile and non-volatile memory, multiple levels of data processing, I/O subsystems, and data conversion. Having these types of functions available in a single chip has been crucial to boosting the growth of both low-end and high-performance applications. The interconnection of everything to the internet (which constitutes the well-known internet-of-things (IoT) movement) is the best-known example of the importance of being able to read data from sensors, process it in the digital domain, and send it to the cloud, all in a single chip. Furthermore, the use of artificial intelligence to solve driving issues or facial recognition challenges is an example of how a high-performance SoC can deal with daily situations [1]. Even today's low-cost smartphones can perform 3D-video tasks due to the inclusion of hardware accelerators and high-speed links in the same chip.

As technology continues to reduce device dimensions, and as SoC complexity keeps growing, reliability has become one of the main design issues. This concern is related to guaranteeing that all the subsystems always perform according to the initial specifications. Reliability also involves handling the effects of environmental changes on performance, as well as the variations inherent in any large-scale production system. Optimal SoC design should include the ability to measure performance deviation and to make decisions about how to adjust small subsystems (or the whole application) to meet the intended throughput.

From the design point of view, a typical method to quantify the reliability of a SoC is to evaluate its performance under variations of the fabrication process, operating temperature, and supply voltage (known as PVT variations). PVT analysis considers a set of extreme operating conditions, with the purpose of evaluating possible worst-case situations that lead to malfunction or a reduced lifetime. PVT-oriented design guarantees circuit robustness against the uncertainty of the physical parameters of silicon devices (always present in any mass-scale production line). Moreover, a PVT-aware design considers that a single SoC can be used in many environmental conditions. For instance, it is possible to find the same IoT SoC in both automotive (high temperature and corrosive environment) and human body movement applications (low temperature and stress).

A typical solution to reduce the impact of PVT variations is to use calibration circuits. This alternative tries to fix the circuit performance once it has been fabricated or during its operation. Calibration mainly occurs in the digital domain and involves a set of algorithms that compare an output signal with a reference signal. Calibration adjusts parameters such as amplifier gain, bias signals, common-mode levels, or load capacitances to optimize performance.
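
As an illustration of such a digital calibration loop, the following minimal Python sketch adjusts a single correction coefficient with a least-mean-square (LMS) update until a gain-error output matches its reference. It is only a behavioural example under stated assumptions: the function name `lms_gain_calibration`, the gain error of 0.8, the stimulus values, and the step size `mu` are hypothetical and are not taken from this thesis.

```python
def lms_gain_calibration(gain_true, stimulus, mu=0.1, iterations=200):
    """Estimate the correction factor that undoes a gain error.

    gain_true  -- hypothetical circuit gain (unknown to the loop itself)
    stimulus   -- reference samples applied repeatedly during calibration
    mu         -- LMS step size (assumed value)
    """
    w = 1.0  # correction coefficient, starts uncorrected
    for n in range(iterations):
        ref = stimulus[n % len(stimulus)]  # reference signal
        out = gain_true * ref              # circuit output with gain error
        err = ref - w * out                # corrected output vs. reference
        w += mu * err * out                # LMS update of the coefficient
    return w


w = lms_gain_calibration(gain_true=0.8, stimulus=[1.0, -1.0, 0.5, -0.5])
print(f"correction factor: {w:.4f} (ideal {1 / 0.8:.4f})")
```

Even this toy loop needs a multiplier, an accumulator, and many iterations to converge, which hints at why calibration hardware can become a noticeable part of the power and area budget.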
Calibration is the most popular way to fix the performance of circuits such as data converters, analog and digital filters, instrumentation systems, and wireline transceivers. However, the hardware overhead needed to perform this operation adds significant power consumption and silicon area, and can degrade performance. Calibration involves the solution of optimization algorithms such as Least Mean Square (LMS) [2], requiring complex digital implementations. In some cases, such algorithms can demand more current than the main application, especially when calibration must be performed during normal operation of the system. Moreover, applications that include calibration circuits incur additional costs related to verification and testing.

#### 1.1. Key aspects on robust SoC design

Some of the most critical aspects of any SoC are data transfer, data conversion, and power supply regulation. These three aspects involve the design of pure analog or mixed-signal circuits, whose robustness is lower than that of digital-only subsystems. Digital circuits have the advantage of using transistors solely as switches, so the probability of a functional failure is lower and depends on other subsystems such as power supply regulation [3, 4]. The following subsections detail the key aspects of adding robustness to a SoC design.

**1.1.1. High-speed serial links**

Transfer of large amounts of data between integrated commercial devices is typical nowadays, as modern SoCs come readily equipped with high-speed interfaces such as USB 3.0 or Gigabit Ethernet ports [5, 6]. Gigabit data rates are common due to the implementation of reconfigurable TX and RX blocks. These blocks can adapt circuit parameters according to the current transmission channel, as well as counter PVT variations. From the RX point of view, the maximum speed is strongly limited by equalization capability and by sampler sensitivity.
Maximum equalization is a problem that has been solved mainly from a high-level perspective, since equalizers are often treated as adaptable high-pass filters. However, sampler sensitivity is an issue strongly linked to PVT, especially for random or local process variations. Sensitivity gives a measure of the minimum signal amplitude that the RX block can sense, thus limiting channel attenuation and link speed. This specification is a function of transistor intrinsic gain, noise, and offset, the latter being caused by mismatch or intra-die variations. As a result, any SoC with a high-speed serial link must have an offset calibration routine, resulting in an increase in silicon area and power consumption. Moreover, the SoC has to use part of its processing resources to execute the calibration algorithms, and sometimes it is even necessary to stop data transmission.

**1.1.2. Digital-to-analog conversion in SoC**

Digital-to-analog conversion is a crucial task when using a SoC for signal generation and audio applications. High-resolution DACs are present in many SoCs used in 3D-video and gaming platforms. SoCs are also used in feedback control systems and wireless applications, which typically include medium- and low-resolution converters. Despite the existence of several methods for D/A conversion, the capacitive topology is preferred over resistive and current-based methods. A capacitive DAC has reduced power consumption and noise level, and the matching of capacitors is better than the matching of resistors [7]. Moreover, a capacitive DAC can be integrated into many analog-to-digital converters, such as the successive-approximation-register (SAR) converter.

Linearity is the main issue when designing a robust DAC. Parasitic elements, produced by the layout pattern, strongly increase distortion. Furthermore, PVT variations impact dielectric properties, expanding the variation of unit and parasitic capacitance even more, and introducing offset.
As a result, data conversion in a SoC must include a calibration method that reduces the spread of the quantization step across the dynamic range and for each digital input code. Calibration is often executed in the background, demanding a high additional computational load. In some cases, because a capacitive DAC only consumes dynamic power, the background execution of calibration algorithms requires more power than the power delivered to the capacitors by the reference signal. The need to calibrate a converter in IoT applications has a high impact on lifetime and final cost.

**1.1.3. Power supply regulation**

Another essential subsystem in any SoC is the power-management unit (PMU). A PMU has the function of setting the required supply voltage for the other subsystems, according to operating speed and available energy. A PMU is typically composed of a switched conversion stage (DC-DC), followed by linear low-dropout regulators (LDOs). A DC-DC converter transforms (with high energy efficiency) the battery voltage into the standard level required by each subsystem. Given the switched nature of a DC-DC converter, its output presents a large ripple. That ripple is reduced by an LDO, which is a linear feedback amplifier with an output stage. Because an LDO has a feedback network, stability is the primary design concern. Open-loop gain and phase shift are a function of the physical and electrical parameters of transistors, especially the intrinsic gain and parasitic capacitance of the power device. Therefore, PVT variations have a high impact on performance, causing a weak and slow transient response, or a complete malfunction of other subsystems. A robust LDO must have a compensation strategy that counters PVT variations according to the load current and input voltage.

#### 1.2. Contributions

This thesis describes three low-complexity design techniques and circuit alternatives to mitigate the impact of PVT variations in circuits such as voltage comparators for decision feedback equalizers (DFE) in serial links, digital-to-analog converters (DAC), and low-dropout voltage regulators (LDO). For voltage comparators, the proposed circuit reduces offset without the traditional connection of both inputs to a common-mode voltage. For DACs, the design technique is focused on reducing the impact of parasitic capacitance on linearity. In the case of LDOs, the proposal is related to improving circuit stability through the parasitic components of an external capacitor. The contributions are summarized as follows:

1. A fully-digital, low-hardware-overhead offset reduction technique for dynamic voltage comparators. The technique senses offset in the phase domain using a classical phase-and-frequency detector, and can be applied to decision feedback equalizers in high-speed serial interfaces. In contrast to traditional methods, the proposed alternative does not require setting the comparator input to a common-mode voltage when calibrating offset, thus enabling the possibility of implementing an on-the-fly correction. Furthermore, a method for non-invasive eye-diagram construction was implemented for comparator characterization and validation of the proposed technique.
2. A low-complexity DNL calibration algorithm for split-capacitor digital-to-analog converters that does not require the use of an additional reference voltage ($V_{CM}$, for instance) to measure and compensate DNL. Thus, it reduces circuit complexity and power consumption. The proposed algorithm is also tied to an analysis of the impact of traditional layout techniques on DAC linearity.
3. An alternative low-dropout regulator frequency compensation based on the implementation of a lead-lag compensator using the equivalent series resistor (ESR) of the external capacitor.
In contrast to traditional methods, which implement additional circuits to eliminate the dependence on ESR, the proposed circuit takes advantage of the zero-pole pair at the LDO output to increase the phase margin. Moreover, a method for controlling the width of the power MOSFET and the bias current of the error amplifier is discussed, based on the interaction between the LDO and the brown-out detector.

#### 1.3. Thesis overview

This document is organized as follows: Chapter 2 describes a fully-digital, low-complexity technique to compensate offset in voltage comparators that can be extended to decision feedback equalizers in high-speed serial links. Chapter 3 presents a method to build an eye diagram for comparator characterization without the need for external probes. Chapter 4 presents a low-hardware-overhead DNL calibration and some design considerations about the impact of using traditional common-centroid layout techniques on linearity. Chapter 5 shows how to use the parasitic equivalent series resistor (ESR) of an LDO compensation capacitor to improve the robustness of frequency stability, and discusses the interaction of a brown-out detection circuit with bias current and power transistor setup. Finally, Chapter 6 presents some conclusions about the results and recommendations for future work.

### 2. An Offset Reduction Technique for Dynamic Voltage Comparators

This chapter introduces a low-complexity technique to reduce the offset voltage of dynamic comparators used as samplers in decision feedback equalizers (DFE). The proposed method leverages an all-digital phase estimation technique based on the output data, in which the comparator's input does not need to be set to a common-mode voltage ($V_{CM}$) during offset compensation. While traditional techniques might break the data link for offset adjustment, this work allows the comparator to be calibrated on the fly.
This chapter explains the behaviour of the proposed technique and validates its performance with preliminary simulations in a 180 nm node.

#### 2.1. Introduction

Offset reduction is one of the major concerns at the front-end of high-speed wireline receivers. An offset-reduction technique has to be carefully chosen considering the additional circuit complexity and the capacitive load penalty on the signal path. Comparators, used as samplers or slicers in decision feedback equalizers (DFE), face the challenge of sensing signals at data rates above 20 Gb/s with limited input signal swing. Considering the aggregated losses of low-pass channels, which can reach up to 40 dB (a factor of 100 in amplitude), the signal amplitude at the comparator input could be as low as 20 mV [8]. As a result, comparator sensitivity specifications become limited by the accuracy of the offset correction scheme. Furthermore, any load added to the signal path to set up an offset-calibration scheme has a highly negative impact on signal amplitude and power consumption.

Traditional offset correction methods break the communication link to perform calibration. A typical correction scheme sets the comparator input to a common-mode voltage ($V_{CM}$) for offset sensing and compensation. This scheme requires the use of additional circuits (switches are often used for this task) to open the input signal path and connect the sampler input to $V_{CM}$. If the application demands on-the-fly operation, an extra signal path must inevitably be included to process input data while calibration is executed. Therefore, extra circuitry must be added and, consequently, the capacitive load is increased, demanding more power and area to meet timing specifications.

This thesis presents a novel scheme to reduce the offset of dynamic comparators used in DFE circuits for high-speed interfaces. The chapter describes an integrated receiver scheme that implements the phase-domain offset reduction technique (PORT).
Measurement results show its potential application for on-the-fly offset correction in high-speed link receivers. PORT works based on the output signal phase, has low complexity, and offers the possibility of a digital implementation without compromising speed and power. The main characteristic of PORT is that calibration does not require setting the input of the comparator to a common-mode level, paving the way to eliminate the need for the alternative signal path.

This chapter is organized as follows: Section 2.2 shows common alternatives to compensate offset in DFEs; Section 2.3 describes PORT's operation principle and its circuit structure; Section 2.4 presents a residual offset analysis; and Sections 2.5 and 2.6 introduce experimental results and conclusions, respectively.

### 2.2. Common Approach for Offset Calibration in High Speed Links

Fig. 1a shows a traditional double data rate (DDR) receiver front-end composed of a resistive termination (T-coil), a continuous-time linear equalizer (CTLE), two decision feedback equalizers for data and timing (edge) sampling, respectively, and a clock and data recovery (CDR) block.

Figure 1. Traditional RX front-end for high-speed interfaces: a) General implementation; b) Calibration of sampler C; c) Calibration of sampler B. Only even (0°) phase components are shown for explanation purposes.

The first tap of both data and edge equalizers uses a predictive or partial response implementation (prDFE) to meet timing requirements [9]. Commonly, a third sampler adapts equalizer coefficients, performs eye diagram monitoring, and is used for offset-correction purposes. During a tuning process, the third sampler extracts the error signal (dLev) required for the adaptation algorithm. The clock signal of the adaptive comparator ( $clk_{TRAIN}$ ) has a different phase from that of the data samplers ( $clk_{EVEN}$ and $clk_{ODD}$ ), which is necessary for eye diagram monitoring during normal operation [10].
The third comparator also allows performing an offset calibration on a specific sampler while maintaining data transmission, in contrast to the typical offset correction at the beginning of link operation [10]. Calibrating samplers only before data transmission begins means losing the option to track offset changes due to temperature and power supply (VT) variations during link operation. Including an on-the-fly calibration allows compensating samplers for link variability due to VT variations.

Figs. 1b and 1c present the traditional concept of on-the-fly calibration. In Fig. 1b the signal path (yellow line) includes samplers 1 and 2 as the prDFE section (red blocks), through the setup of multiplexers A and B. At the same time, offset calibration is performed on sampler 3 (gray block) through multiplexer C and using a third DAC connected to the local summing point. A similar procedure is used to compensate the offset of sampler 2, resulting in an equalizer formed by samplers 1 and 3, as Fig. 1c shows.

An on-the-fly offset reduction on the samplers of Fig. 1 implies that during calibration each comparator input has to be disconnected from the signal path $(V_{IN})$ and connected to a common-mode voltage $V_{CM}$ , as presented in Figs. 1b and 1c [11]. Sampler input swapping is done by switches at the input of each path, as Fig. 2 shows. In Fig. 2 calibration is done on the third sampler, so its input is connected to $V_{CM}$ , while comparators one and two equalize the input signal, so their inputs are connected to the summing circuit.

Figure 2. Traditional offset-correction scheme: Detailed circuit including switches for commutation.

The main problem with the topology of Fig. 2 is the load added by switches and extra signal paths, which increases total losses and degrades signal amplitude. The extra circuitry also affects power consumption and area. Fig. 3 shows two alternatives to reduce the number of switches. The circuit of Fig. 3a uses switches at the output of each local summer (the summer of each prDFE section), thus reducing the total load of the global summer. However, the method of Fig. 3a only reduces the comparator's offset, so the offset of the summer amplifiers still affects performance. The implementation of Fig. 3b uses switches only at the samplers' input while turning off the summer that receives the signal from high-order taps. Load capacitance is cut to 50% of the original value at the cost of losing the capability to perform on-the-fly correction.

Furthermore, the works presented in [12–15] calibrate offset using digital algorithms at the back-end during regular operation, and offset sampling techniques based on setting $V_{CM}$ at the comparator input. Back-end routines increase complexity and thus area and power, while traditional offset sampling methods add loading to the signal path. Other alternatives, such as the one presented in [16], achieve a fully on-the-fly operation by doubling the number of samplers and multiplexers, thus increasing power.

Figure 3. Alternative implementations to reduce the number of switches: a) use of offset-correction switches at each local summing point; b) turning off the global summer.

Figure 4. Power increment when using classical offset-correction techniques in high-speed signal interfaces: a) load capacitance, b) bias current.

Figure 5. Block diagram of the proposed offset reduction technique.

An alternative to overcome the aggregated losses due to the inclusion of extra signal paths and parasitic capacitance is to increase the peaking in the frequency response of the previous equalization stages. An increment in equalization results in additional power consumption. Fig. 4 presents the simulated increment of load capacitance and bias current of the DFE summing circuit and the continuous-time linear equalizer (CTLE), respectively.
Those circuits are part of two different serial links: a 28 nm, 28 Gbps DDR link and a 130 nm, 8 Gbps link using quad data rate (QDR), each with and without offset correction. For the 28 nm technology, the input and output capacitance of a complementary switch corresponds to 20% of the total load. Furthermore, the parasitic component increases up to 30% when including routing and interconnection paths. In order to guarantee that the CTLE and DFE summing circuit achieve the required bandwidth and equalization gain with the additional load, it is necessary to increase the CTLE bias current (and thus circuit dimensions) by more than 50%. This increment strongly impacts overall power consumption. Similar behavior is observed in the 8 Gbps link implemented in the 130 nm technology; the capacitance increment is 25%, thus demanding 50% more bias current. By eliminating the need to connect the sampler's input to $V_{CM}$ for offset correction, the power increment of Fig. 4 can be mitigated. Therefore, there is a need for an alternative way to measure offset that does not imply inserting switches at the comparator input.

Figure 6. Implemented phase and frequency detector.

## 2.3. Proposed Offset Reduction Technique

The phase-domain offset reduction technique (PORT) is an alternative way to sense and compensate offset in dynamic voltage comparators with no need to connect their input to a common-mode voltage. PORT works by sensing the comparator offset through the phase of its output signals, as shown in Fig. 5. Considering that the comparator is dynamic, its outputs change continuously between reset and comparison states, even if the input data is the same. The comparator outputs can be seen as two different oscillations, whose phase difference gives information about the offset. The phase is measured using a phase detector (PD), in a similar way as in a PLL. The PD senses the phase difference between $V_{OUT1}$ and $V_{OUT2}$ , and its output controls the transitions of a finite-state machine (FSM).
Thus, the FSM outputs $X_1$ and $X_2$ set the bias current of a preamplifier. The correct adjustment of currents $I_1$ and $I_2$ reduces the offset introduced by the system, which combines the comparator offset, referred to the preamplifier input, with the offset of the preamplifier itself.

A phase detector and the FSM compose the core of PORT. Fig. 6 shows the PD structure, consisting of two D-type flip-flops and an AND gate at the output, a classical phase-frequency detector [17]. The flip-flops use a master-slave pass-gate topology, and the AND gate is implemented using standard static CMOS logic. The FSM consists of two 8-bit UP/DOWN counters, allowing a differential variation of $X_1$ and $X_2$ . Fig. 7 presents the state diagram of the FSM. Finally, $X_1$ and $X_2$ control two DACs. One relevant aspect of the proposed technique is that the calibration circuit can be synthesized using digital standard cells, which allows migration between different technology nodes.

Figure 7. State diagram of the FSM.

**2.3.1. Performance Description**

Fig. 8 illustrates the calibration process. The total input-referred offset at the preamplifier input without calibration (Fig. 5) is:

$$V_{OFF} = V_{off1}/A_v + V_{off2} \tag{1}$$

where $A_v$ is the gain of the pre-amplifier, and $V_{off1}$ and $V_{off2}$ are the offsets of the comparator and the pre-amplifier, respectively. Assuming an input sequence as shown at the top of Fig. 8a, and a positive offset so that $|V_{in}| < V_{OFF}$ , the comparator cannot differentiate between a logic one and a logic zero at its input. Therefore, output $V_{out1}$ is clamped to $V_{DD}$ , and $V_{out2}$ oscillates between $V_{DD}$ and ground. The comparator's continuous alternation between reset and comparison phases causes $V_{out2}$ to oscillate (Fig. 8a). The calibration circuit is turned on at point A of Fig. 8b; while the comparator is saturated, outputs UP and DW of the phase detector are always high and low, respectively. This behavior produces an increment in $X_1$ while $X_2$ decreases, producing a differential change in bias currents $I_1$ and $I_2$ by means of the DACs. The change in bias currents eventually produces an additional offset $V_{\rm CORR}$ in the opposite direction of $V_{OFF}$ . If the offset is negative, the behavior of the UP and DW signals is exchanged, as is that of $X_1$ and $X_2$ . A detailed implementation of each block in Fig. 5 is addressed in the next chapter.

Figure 8. Timing diagram of the proposed technique: a) saturated comparator due to large offset; b) calibration evolution and convergence; c) calibrated comparator.

If $|V_{in}| > (V_{OFF} - V_{\text{CORR}})$ and the next logic zero reaches the input, the comparator output $V_{out1}$ will go low (point B of Fig. 8b). Then, at the next rising edge of $V_{out1}$ (point C), the DW signal will go high, causing the phase detector to reset (point D). Consequently, the increment of the bias currents stops, and $V_{out1}$ and $V_{out2}$ exchange roles. Thus, the process can restart in the opposite direction, making the bias currents oscillate around the newly reached DC level (Fig. 8c). These final current conditions are used as the stop criterion of the calibration process. Furthermore, Fig. 8c shows a half clock cycle as the duration of the DW signal because the reset path of Fig. 6 includes delay stages to eliminate glitches.

The described behavior does not include setting the comparator input to a common-mode voltage $V_{CM}$ while calibration is carried out. Therefore, PORT avoids all the switches at the input of each sampler in Fig. 2. Moreover, the feedback loop includes only an accumulator (the FSM), so the system behaves as a single-dominant-pole system.
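The convergence behaviour described above can be illustrated with a small behavioural sketch. It is not the thesis circuit: as a stated assumption, the phase detector and the two UP/DOWN counters are collapsed into a single accumulator that integrates the sign of an ideal comparator decision, and the names `run_calibration`, `v_off_uv`, `v_in_uv` and `lsb_uv` are hypothetical. All voltages are integers in microvolts to avoid floating-point ties.

```python
def run_calibration(v_off_uv, v_in_uv, lsb_uv, n_cycles):
    """Sign-driven accumulator loop (simplified stand-in for PD + FSM + DACs).

    v_off_uv : total input-referred offset, in microvolts
    v_in_uv  : amplitude of the DC-balanced input, in microvolts
    lsb_uv   : correction step per accumulator count, in microvolts
    """
    code = 0          # accumulator state (plays the role of X1 - X2)
    trace = []
    for n in range(n_cycles):
        bit = 1 if n % 2 == 0 else -1               # 1010... pattern, zero DC
        v_eff = bit * v_in_uv + v_off_uv - code * lsb_uv
        decision = 1 if v_eff > 0 else -1           # ideal comparator
        code += decision                            # accumulate the decision
        trace.append(code)
    return code, trace


# Offset (30 mV) larger than the input amplitude (10 mV): the comparator
# starts saturated, the code ramps, then dithers once data are resolved.
code, trace = run_calibration(v_off_uv=30_000, v_in_uv=10_000,
                              lsb_uv=1_000, n_cycles=200)
residual_uv = 30_000 - code * 1_000
print("final code:", code, "residual offset (uV):", residual_uv)
```

In this simplified model the code ramps while the comparator is saturated and then dithers between two adjacent values once the input amplitude exceeds the remaining offset, so the residual is bounded by the input amplitude rather than driven to zero.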
Having only an accumulator allows stable performance over a large range of bias currents and quantization steps, offering a large tolerance to PVT variations. The correct selection of $I_{1,2}$ and the DAC reference voltage creates a circuit behaving as a dominant-pole system, whose phase margin is $90^{\circ}$ . Additionally, because the FSM is an up/down counter, PORT achieves a reduced convergence time, since its critical path does not limit the settling time of the bias currents. For that reason, calibration speed is limited only by the DACs.

#### 2.4. Residual offset

PORT can be summarized as follows: first, a bit sequence is applied at the slicer input; then, the phase difference between the sampler outputs is measured to estimate the offset; finally, the preamplifier bias currents are adjusted based on the phase-detector output, using the FSM and DACs. Considering the feedback loop formed by the calibration circuit, the correction signal $V_{CORR}$ tries to follow the total input-referred offset $V_{OFF}$ . Thus, the residual offset $V_{RES}$ (defined as $V_{RES} = V_{OFF} - V_{CORR}$ ) gets lower as the technique converges. Reduction of $V_{RES}$ leads to an improvement in the slicer's sensitivity, as offset is a key factor in the minimum signal amplitude that a slicer can process. The instant at which the magnitude of the input signal becomes larger than the residual offset, i.e., $|V_{in}| > V_{RES}$ , gives the stop criterion, as Fig. 8c shows. This behavior does not imply the cancellation of $V_{RES}$ . PORT tries to find an equilibrium point at which the residual offset remains below the input signal amplitude, so that offset does not bias the slicer decision.

Figure 9. Block diagram of the proposed technique including a majority-voting block to perform a low-pass filter.

Figure 10. Behavior description of a MJV filter.

In order to minimize the residual offset, the circuit of Fig. 5 can be modified in two ways. First, a low-pass filter (LPF) can be inserted between the phase detector and the FSM, as Fig. 9 suggests. Second, the DC component of the input data has to be zero, i.e., the number of logic ones is equal to the number of zeros. The function of the filter is to extract any long-term DC level of the slicer's output so that the negative feedback can cancel it out. The filter also reduces the ripple of the $X_1$ and $X_2$ signals, which is beneficial to achieve a more accurate offset cancellation.

Figure 11. Linear model of the calibration loop.

The low-pass filter can be implemented using a majority-voting (MJV) algorithm in the same way as in a clock-and-data recovery circuit (CDR) [18,19]. This type of filter uses N samples of the UP and DOWN signals and a voting function to calculate its output. The chosen voting function is the average of the UP and DOWN signals because of its simple hardware implementation. So, the filter will produce an effective UP signal $(UP_{EFF})$ if the number of UP samples is larger than the number of DOWN samples, and vice versa $(DW_{EFF})$ , as Fig. 10 shows. Using an MJV filter at the output of the phase detector avoids the use of multiplication blocks, which would be necessary when using a classical digital filter at the output of the FSM. Furthermore, the input signals of the MJV block are 1 bit wide, in contrast with the 8-bit output of the FSM, resulting in a low hardware overhead and low impact on the critical path.

The main drawback of including a filter in the calibration loop is an increase in convergence time. Even with a first-order filter, the extra pole might lead to stability issues. For that reason, the magnitude of the feedback currents and the DAC's reference voltage have to be selected so that the open-loop gain satisfies gain and phase margin requirements. A large bias current and a large DAC quantization step lead to a large overshoot and non-linear behaviour at the pre-amplifier. Furthermore, a low open-loop gain results in slow convergence.
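The majority-voting rule described above can be sketched in a few lines. This is only an illustration: the function names, the use of non-overlapping windows, and the choice to emit no effective pulse on a tie are assumptions not stated in the text.

```python
def majority_vote(up_samples, dw_samples):
    """Return (UP_EFF, DW_EFF) for one window of N one-bit samples.

    Assumption: on a tie neither effective signal is asserted.
    """
    if len(up_samples) != len(dw_samples):
        raise ValueError("UP and DW windows must have the same length")
    ups = sum(up_samples)
    dws = sum(dw_samples)
    return int(ups > dws), int(dws > ups)


def filter_stream(up, dw, n):
    """Apply the vote over consecutive, non-overlapping windows of n samples."""
    return [majority_vote(up[i:i + n], dw[i:i + n])
            for i in range(0, len(up) - n + 1, n)]


# Two windows of N = 4: the first is dominated by UP, the second by DW.
up = [1, 1, 0, 1, 0, 0, 0, 1]
dw = [0, 0, 1, 0, 1, 1, 1, 0]
print(filter_stream(up, dw, 4))
```

Only one-bit counts and a comparison are needed, which is consistent with the low hardware overhead argued above; the FSM would then step once per window instead of once per sample.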
The lower the pole frequency of the filter, the higher the low-frequency feedback gain and thus the possibility of minimizing the residual offset. However, stability becomes critical as the filter approaches an integrator.

The need for a DC-balanced input can be explained using a linear model of the calibration circuit (Fig. 11). Gain blocks model the comparator and the phase detector. The output $D_{OUT}$ is:

$$D_{OUT} = \frac{(V_{IN} + V_{OFF2} + V_{OS_{DAC}})K_{PRE} \cdot K_C (1 - z^{-1})}{K_{PD} \cdot K_{DAC} \cdot MV \cdot K_{PRE} \cdot K_C + 1 - z^{-1}} + \frac{V_{OFF1} \cdot K_C (1 - z^{-1}) + V_{OS_{PD}} \cdot MV \cdot K_{DAC} \cdot K_{PRE} \cdot K_C}{K_{PD} \cdot K_{DAC} \cdot MV \cdot K_{PRE} \cdot K_C + 1 - z^{-1}} \tag{2}$$

where $K_C$ , $K_{PRE}$ , $K_{DAC}$ and $K_{PD}$ represent the gains of the comparator, preamplifier, DAC, and phase detector, respectively. MV is the gain of the majority-voting block [19], the accumulator models the FSM, $V_{OS_{DAC}}$ is the offset of the DACs, and $V_{OS_{PD}}$ corresponds to an equivalent offset caused by mismatch between the UP and DW paths of the phase detector. Equation 2 shows a high-pass behavior because of a zero at z=1, which is a consequence of the accumulator in the feedback path. Having a zero at z=1 implies that the calibration loop will attenuate any DC component of $V_{IN}$ , as well as the signals $V_{OFF1}$ , $V_{OFF2}$ and $V_{OS_{DAC}}$ , once it is turned on and reaches steady state. In other words, the average value of the output tends to zero. The only offset contribution that still affects the output is $V_{OS_{PD}}$ , which, however, is attenuated by $K_{PD}$ . When the calibration process finishes and the circuit changes to normal operation, the last value of the FSM output is stored, generating a constant signal $V_{CORR}$ that is continuously subtracted from the input.
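As a short worked check of the DC behaviour, taking Equation 2 exactly as written and evaluating it at $z = 1$ (so that every factor $1 - z^{-1}$ vanishes) gives:

$$D_{OUT}\big|_{z=1} = \frac{(V_{IN} + V_{OFF2} + V_{OS_{DAC}})K_{PRE} \cdot K_C \cdot 0}{K_{PD} \cdot K_{DAC} \cdot MV \cdot K_{PRE} \cdot K_C} + \frac{V_{OFF1} \cdot K_C \cdot 0 + V_{OS_{PD}} \cdot MV \cdot K_{DAC} \cdot K_{PRE} \cdot K_C}{K_{PD} \cdot K_{DAC} \cdot MV \cdot K_{PRE} \cdot K_C} = \frac{V_{OS_{PD}}}{K_{PD}}$$

Under this linear model, the DC terms of $V_{IN}$ , $V_{OFF1}$ , $V_{OFF2}$ and $V_{OS_{DAC}}$ drop out in steady state, and only the phase-detector mismatch remains, divided by $K_{PD}$ .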
Therefore, if the DC component of $V_{IN}$ is zero while calibrating, it is possible to cancel the contribution of $V_{OFF1}$ and $V_{OFF2}$ during normal operation. If the input signal does not have a null average while performing offset correction, $\overline{V_{IN}}$ will influence the calculation of the compensation signal $V_{CORR}$ , i.e., the system processes $\overline{V_{IN}}$ as another offset source. For instance, if the input data corresponds to a bitstream generated by a 15th-order pseudo-random bit sequence (PRBS), logic ones occupy 49.9% of the total sequence length (32 kb). So, the DC component, and thus the residual offset, is 61 $\mu$ V. Having a DC-balanced $V_{IN}$ signal during offset calibration has the same effect as connecting the input to a common-mode voltage, which is the main advantage of the proposed technique.

Considering that the transmitter of many high-speed standards has a scrambler (whose primary function is to reorganize the transmitted data to avoid undesired sequences such as a large number of consecutive logic ones or zeros), there is no need to include additional hardware to randomize data and reduce its average level.

Figure 12. Timing diagram of the proposed technique considering a uniform sequence (1010...) for offset reduction during link tuning.

A high-speed link also has to execute a training and calibration procedure before data transmission starts. In a training process, the transmitter and receiver communicate with each other mainly in order to tune equalization and clock-and-data recovery parameters. Therefore, a group of specific data sequences is produced at the transmitter to adjust the DFE coefficients ( $h_{1,2...n}$ ) and the CDR loop. Traditional training data sequences have a period composed of a logic one followed by a zero, as Fig. 12 shows. This sequence has a zero DC value, so PORT is compatible with current training procedures without the need for additional hardware.
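The DC-balance argument can be checked numerically with a small sketch. It generates one full period of a 15th-order PRBS with a Fibonacci LFSR and counts ones and zeros; the tap positions (stages 15 and 14, a commonly used PRBS15 generator), the seed, and the function name `prbs15` are assumptions, since the text does not specify the polynomial.

```python
def prbs15(seed=0x7FFF):
    """One full period (2**15 - 1 bits) of a PRBS15 from a 15-bit LFSR.

    Assumption: feedback taken from stages 15 and 14.
    Returns the bit list and the LFSR state after the last shift.
    """
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range((1 << 15) - 1):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1   # XOR of the two taps
        state = ((state << 1) | new_bit) & 0x7FFF       # shift in the new bit
        bits.append(new_bit)
    return bits, state


bits, final_state = prbs15()
ones = sum(bits)
zeros = len(bits) - ones
dc_prbs = (ones - zeros) / len(bits)        # mean with bits mapped to +1/-1

training = [1, 0] * 16                      # uniform 1010... pattern
dc_training = (2 * sum(training) - len(training)) / len(training)

print(len(bits), ones, zeros, dc_prbs, dc_training)
```

A maximal-length sequence always has one more symbol of one polarity than of the other (which one depends on the output polarity), so its normalised mean is small but not zero, whereas the 1010... training pattern is exactly balanced. Scaling the normalised mean by the signal amplitude gives the equivalent DC level seen by the calibration loop.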
Although offset reduction has to be executed before equalization tuning (because equalization depends on the sampling precision of the input signal), the pattern shown in Fig. 12 does not need to be equalized by the receiver because of its uniform transition from one to zero each clock period.

## 2.5. Experimental Results for the proposed PORT sub-circuit

Fig. 13 shows the measured DAC signals during the calibration process. PORT was implemented in a 130 nm CMOS technology, using a 1.2 V supply voltage. The DACs reach steady state after 400 ns, indicating that PORT has finished. In this test, the calibration loop does not include the low-pass filter (MJV filter), which results in a higher ripple at the DAC outputs. Fig. 13 shows the results for the typical-case simulation and the measurement of one sample. A difference of 35 mV between the two signals indicates the influence of mismatch on the circuit. PORT's average current consumption is 550 $\mu$ A including the DACs. It is important to highlight that the calibration time could be reduced by implementing a faster DAC. A detailed validation of the proposed technique is presented in the next chapter.

Figure 13. Measured and simulated (TT) output voltage of calibration DACs during a calibration process.

# 2.6. Summary

In this chapter, a low-hardware-overhead calibration technique for dynamic voltage comparators has been proposed and verified experimentally. The proposed technique uses the output-data phase as the variable to measure offset. A relevant characteristic of the proposed technique is that the input of each comparator does not need to be set to $V_{CM}$ while offset calibration is performed. Furthermore, the proposed method tracks the influence of temperature and supply voltage variations on offset during data transfer. Finally, the calibration sub-circuit was fully synthesized, which allows the technique to be extended to different fabrication processes and applications.

## 3. On-chip eye diagram Measurement for Comparator Characterization

#### 3.1. Introduction

The maximum data rate that can be transmitted through a serial link is limited mainly by intersymbol interference (ISI). Crosstalk between channels and lanes (FEXT and NEXT losses), impedance mismatch, and the dielectric and ohmic losses of the channel are the main sources of ISI. While a differential implementation can reduce crosstalk, impedance mismatch can be minimized using digital trimming of both transmitter and receiver termination resistance. However, channel losses can only be compensated using equalization [20], [21].

A high-speed link has a series of continuous- and discrete-time equalizers at both the transmitter and receiver blocks, as Fig. 14 shows. The transmitter block mainly has a discrete feed-forward FIR filter that implements pre-emphasis or de-emphasis equalization. Pre/de-emphasis produces a pre-distorted signal whose high-frequency components are boosted, aiming to counter the low-pass characteristic of channels. The receiver typically has a continuous-time linear equalizer (CTLE) and a decision feedback equalizer (DFE), with an additional feed-forward filter (FFE). A CTLE is characterized by its high-pass transfer function (with the same purpose as that of a TX filter), and can also attenuate low-frequency components to prevent over-equalization and saturation for short channels.

Figure 14. Traditional high-speed serial link: use of equalizers at both TX and RX sides.

A DFE is a non-linear mixed-signal equalizer whose main task is to mitigate post-cursor interference based on previous samples or symbols. A DFE is adequate for links with non-smooth or highly dispersive channels. This type of equalizer can cancel ISI for 15 or more unit intervals (UI) after the main cursor, where a UI is the period of one bit. However, pre-cursor ISI can be reduced only by previous linear equalization, which aims to modify the phase response of the link.
Additional RX FFEs can work together with TX equalizers to emulate a minimum-phase system, thus reducing pre-cursor ISI [22]. Combining all the characteristics of TX and RX equalizers, it is possible to emulate the inverse of the channel's transfer function to minimize ISI. With proper equalizer tuning for the channel in use, it is possible to achieve ISI cancellation.

Considering that equalizers can be modeled by a continuous or discrete transfer function, a crucial feature of a serial link is the ability to adjust each filter's coefficients based on the channel. Equalizer adaptation is performed by specific training algorithms, depending on whether the filter is continuous or discrete. For CTLEs, training is based on power-spectral-density measurements at the circuit's output. For DFEs and FFEs, the least-mean-square (LMS) and minimum-mean-square-error (MMSE) algorithms are executed, respectively.

Once equalizer training is executed and data transmission begins, it is crucial to continuously monitor the link efficiency and its ability to adapt to different channel and operating conditions. The bit error rate (BER) is a common parameter for link characterization, which measures the number of wrong bits received, given a fixed transmitted bitstream. Depending on the application the link will be used in, it must comply with a specific value of BER. For instance, USB 3.1 operating at 10 Gbps and PCIe4 at 16 Gbps tolerate at most one wrong bit for every 1000 Gb transmitted (BER = $10^{-12}$ ). BER can be measured using specific equipment such as BER analyzers, which include a series of data sources and checkers. A pseudo-random binary sequence (PRBS) generator is the main type of data source because it emulates a random bitstream that, given a particular seed, can be modeled deterministically [23].
Since a PRBS is a deterministic source, it is possible to implement a circuit that checks whether the recovered data is correct, which is called a PRBS checker.

The main issue with BER characterization is that it is not possible to measure the number of wrong bits received when the link is transmitting real data. Information is considered a completely random signal, so it is not possible to check whether the received bits are correct. Although an interface might have error detection mechanisms, such as a cyclic redundancy check (CRC), error detection is done after a complete information package is received. As a consequence, BER is a specification that can be measured at link setup, before the transmission of random information.

Given this restriction on BER calculation, eye diagrams become the main performance metric for high-speed interfaces during operation [24]. An eye diagram is a plot composed of the superposition of several transmitted or recovered bits taken during one UI. Traditional methods to obtain an eye diagram involve oversampling the input signal with enough resolution to capture signal transitions from 0 to 1 and vice versa. Moreover, the oversampling ratio should be greater than 10X so that timing can be sensed correctly. An eye diagram allows calculating parameters such as jitter tolerance and voltage sensitivity, which are crucial for determining the maximum transfer speed for a given channel.

One of the major drawbacks of measuring an eye diagram is the load imposed by the equipment connected to the circuit. High-performance probes can add up to 100 fF, which can degrade link performance dramatically, especially when the transfer speed is close to the technology node limit. Performing an external measurement involves adding buffers to drive the pad, wire-bonding and external capacitance, thus increasing area and power consumption.
Moreover, external load can affect the rising and falling times of the measured waveform, thus introducing asymmetry in the eye diagram. Furthermore, a high-speed oscilloscope is not always available when debugging commercial applications.

This chapter addresses the design and implementation of an on-chip, non-invasive method and system to measure an eye diagram for high-speed applications. The measurement is completely digital and does not require external probes on the signal path. The method was validated experimentally using a 130 nm high-speed analog front end. The eye diagram is used to measure slicer offset and sensitivity, so the experimental results focus on showing the effectiveness of PORT [25].

## 3.2. Measurement Strategy

An eye diagram can be obtained by means of the system implemented on silicon and described in the block diagram of Fig. 15. The eye diagram measurements are performed on-chip without external probes. A pseudo-random bitstream is sent through an emulated low-pass channel and recovered using the slicer. The testing system is composed of a programmable pseudo-random bit sequence (PRBS) generator, a digitally programmable low-pass filter, a digitally controlled phase mixer, a strong-arm comparator with a current-controlled pre-amplifier, and an SPI interface. The PORT core is on the feedback path of the comparator and, given its fully digital implementation, its performance (and the operation of the other blocks) can be controlled through the SPI.

Figure 15. System implemented for offset-reduction technique validation.

Figure 16. Conceptual diagram of voltage shifting to build the eye diagram at the pre-amplifier output.

The procedure for constructing an eye diagram can be explained as follows. The output signal of the pre-amplifier in Fig. 15 ( $v_{PRE}$ ), which is represented by the circuit of Fig. 16, is:

$$v_{PRE} = v_{AMP} \pm V_{DC}, \quad \text{with } v_{AMP} = A_V \times v_{CH} \tag{3}$$

where $A_V$ is the pre-amplifier gain, $v_{CH}$ is the output of the low-pass filter, and $V_{DC}$ is a DC unbalance caused by the difference (in magnitude and sign) between the two bias currents controlled by $DAC_3$ and $DAC_4$ . The larger the difference between $I_{B1}$ and $I_{B2}$ , the larger the unbalance at the pre-amplifier output. The inherent offsets of the low-pass filter and the pre-amplifier also affect $v_{CH}$ and $V_{DC}$ , respectively. A DC unbalance at the pre-amplifier's output produces a vertical shift in $v_{AMP}$ . A voltage comparator, whose decision threshold is ideally zero, samples the pre-amplifier output. However, by using only one comparator, which performs as a 1-bit ADC, it is not possible to measure the full eye diagram aperture.

There are three different options to sample the pre-amplifier's output with enough detail to measure the eye diagram aperture: increasing the number of voltage comparators with different thresholds [26,27], varying the threshold of only one comparator, or shifting the pre-amplifier output through $V_{DC}$ . The first alternative implies a large power consumption because of the increased number of comparators performing as an ADC. This alternative is also not compatible with traditional DFE topologies. The second option is disadvantageous for high-speed operation because of the increased load at the comparator input needed to produce a variable threshold. The third alternative implies that the shift caused by $V_{DC}$ displaces the upper and lower limits of $v_{AMP}$ ( $v_{eye1}$ and $v_{eye2}$ ) up to the comparator's threshold, as Fig. 16 shows. If $|V_{DC}|$ is greater than $v_{eye1}$ , so that $v_{eye1} - V_{DC} < 0$ , the comparator's output is always 0 (frame A of Fig. 16); otherwise, when $v_{eye1} - V_{DC} > 0$ , $D_{OUT}$ varies between 1 and 0 as a function of the input data (frames B, C and D). The point where $v_{eye1}$ is equal to the vertical displacement $V_{DC}$ ( $v_{eye1}-V_{DC}=0$ ) sets the upper aperture of the eye diagram. Taking into account that $V_{DC}$ can be set digitally using $DAC_{3,4}$ , it is possible to have a digital representation of $v_{eye1}$ . Following the same procedure, $v_{eye2}$ can be measured by varying $V_{DC}$ so that the lower aperture of the eye diagram reaches the comparator threshold, i.e., $v_{eye2}+V_{DC}=0$ (frame E).

Figure 17. Conceptual diagram of phase and voltage shifting to build the eye diagram at the pre-amplifier's output.

To find out whether $V_{DC}$ lies below or above the upper and lower apertures $v_{eye1}$ and $v_{eye2}$ , it is necessary to capture $D_{OUT}$ and measure its mean value. When the vertical displacement $V_{DC}$ adjusts $v_{AMP}$ so that the comparator's threshold is lower than $v_{eye1}$ and greater than $v_{eye2}$ , the output data $D_{OUT}$ coincides with a recovered version of the input data $D_{IN}$ . The data source is a PRBS generator whose mean value is 0 over the whole sequence, i.e., it produces the same number of ones and zeros. Hence, if $D_{OUT}=D_{IN}$ , the recovered data has the same statistical properties as the input stream, and thus $\mu_{D_{OUT}}$ is equal to 0 too:

$$\mu_{D_{OUT}} = \begin{cases} 0 & v_{eye2} \le V_{DC} \le v_{eye1} \\ -1 & V_{DC} < v_{eye2} \\ 1 & V_{DC} > v_{eye1} \end{cases} \tag{4}$$

If the mean value of $D_{OUT}$ is equal to 0, the comparator can recover data and $V_{DC}$ is bounded within an open region of the eye diagram. However, if the mean value is greater or lower than 0, $V_{DC}$ corresponds to a closed region of the eye diagram. When a DC level is added to the pre-amplifier's output, $D_{OUT}$ is biased toward 1 (and thus $\mu=1$ ) if $V_{DC}$ is larger than the maximum value of $v_{eye1}$ , and vice versa.
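The mean-value criterion can be sketched as a sweep over the vertical shift. This is an illustrative model only: the sample values, step size and function names are hypothetical, the comparator is ideal with a zero threshold, and the polarity is assumed to be $v_{PRE} = v_{AMP} - V_{DC}$ (with the opposite polarity the saturated regions swap sign, but the open region where the mean is zero is found in the same way).

```python
def mean_decision(v_amp_samples, v_dc):
    """Mean of the comparator output, with decisions mapped to +1 / -1."""
    decisions = [1 if (v - v_dc) > 0 else -1 for v in v_amp_samples]
    return sum(decisions) / len(decisions)


def vertical_aperture(v_amp_samples, v_dc_steps, tol=0.0):
    """Range of V_DC steps for which the mean of D_OUT stays within tol of 0."""
    open_steps = [v for v in v_dc_steps
                  if abs(mean_decision(v_amp_samples, v)) <= tol]
    if not open_steps:
        return None                      # eye closed at this sampling phase
    return min(open_steps), max(open_steps)


# Hypothetical pre-amplifier samples in mV at one sampling phase:
# four ones and four zeros (DC-balanced), with some amplitude spread.
samples_mv = [40, 55, 60, 45, -40, -50, -35, -60]
sweep_mv = range(-80, 85, 5)             # DAC-controlled shift, 5 mV steps

print(vertical_aperture(samples_mv, sweep_mv))
```

Repeating the sweep for each phase-mixer setting would give one vertical aperture per sampling instant. With a real PRBS the mean is not exactly zero, so a small non-zero `tol` would be needed in practice.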
The average value is calculated from a collection of 50k samples of $D_{OUT}$ for each step of the $DAC_{3,4}$ digital words, which is adequate for $PRBS_7$ and $PRBS_{15}$ sources (128b and 32kb length).

The horizontal aperture of the eye diagram at the pre-amplifier's output can be measured by repeating the previous procedure for a given phase difference between the PRBS and comparator input clocks. Using a phase mixer, it is possible to calculate the vertical amplitude at different sampling instants, as Fig. 17 shows. A phase mixer can be configured digitally for four-quadrant operation, which allows shifting the comparator's clock along an entire unit interval. As a result, by combining vertical and horizontal displacement through $DAC_{3,4}$ and the phase mixer, respectively, it is possible to measure the eye diagram at the pre-amplifier's output without physical access.

Figure 18. Implemented sampling circuitry.

Figure 19. Implemented programmable Gm-C filter for channel emulation.

## 3.3. Circuit Implementation

The implementation of each building block of the scheme in Fig. 5 and Fig. 15 is based on classical structures, as follows:

- Dynamic voltage comparator: This circuit is implemented using a StrongARM topology [28]. The pre-amplifier is based on a degenerated common-source circuit with active load (Fig. 18). Two current mirrors form its bias current for calibration, and another two are used for twisting and eye diagram construction.
- PRBS: It is implemented using a shift-register counter with programmable word length, producing pseudo-random sequences based on 7th, 15th, 21st, and 31st order polynomials.
- Low-pass filter: The filter emulates a channel and corresponds to a Gm-C topology. The gain and bandwidth can be controlled by varying the number of input transconductors and the total capacitance of each node, respectively (Fig. 19). Each transconductance stage was implemented using Nauta amplifiers, as in Fig. 20, due to their high bandwidth and rapid prototyping using digital standard cells [29].
- Phase mixer: It corresponds to the well-known analog phase interpolator that uses in-phase and quadrature input clock signals provided by an external source (Fig. 21) to produce 32 different output phases (from $0^{\circ}$ to $360^{\circ}$) [30].
- DAC: We selected a classical R-2R 8-bit DAC to simplify the design [7].

Figure 20. Nauta-OTA for Gm-stages implementation.

Figure 21. Implemented Phase-Interpolator.

Considering that the phase detector is connected directly to the output of the comparator, the additional load imposed by the flip-flops (Fig. 6) could be critical in high-speed applications. A typical DFE structure uses a comparator to resolve 1-tap and 2-tap within 1 UI, so additional loading could degrade timing performance. However, a traditional 18T flip-flop only adds a load of about a 1X fanout-of-four (FO4) inverter.

Figure 22. Testing board setup.

Figure 23. Micro-photography of the proposed offset-correction technique.

Figure 24. Eye diagram at the pre-amplifier's output using the method described in section 3.2. The average value of each section was calculated using 50k samples of $D_{OUT}$.

## 3.4. Experimental Results

Experimental validation of PORT was achieved using the setup of Fig. 22. The testing board contains the fabricated circuit mounted with the chip-on-board technique. An FPGA was used to set up the configuration registers and to extract data by communicating with the on-chip SPI interface. Fig. 23 presents the micro-photography of the implemented system, which was taped out in a standard 130 nm CMOS technology with a 1.2 V supply voltage to prove the concept. The dimensions of the calibration circuit are $134 \, \mu \text{m} \times 35 \, \mu \text{m}$. Both the phase detector and the FSM are fully synthesizable, allowing migration between different technology nodes.
Moreover, the FSM occupies $55 \, \mu \text{m} \times 35 \, \mu \text{m}$ and includes features such as variable output resolution for coarse and fine calibration, variation of feedback gain and convergence time, and sign control for negative feedback testing.

As a first test, the filter is configured to provide a 26 dB attenuation at 800 Mbps, and the data source is a PRBS<sub>15</sub>. As a consequence, the filter output is 60 mV, since the output signal of the PRBS has an amplitude of 1.2 V (the supply voltage) and $1.2\,\text{V} \times 10^{-26/20} \approx 60\,\text{mV}$. Although this data rate is lower than that of state-of-the-art serial links, the filter configuration emulates the same attenuation as a common 3-meter cable for USB3.1 at 7 Gbps [31]. The main purpose of this prototype is to serve as a proof-of-concept for PORT.

Figure 24 shows an eye diagram at the pre-amplifier's output using the methodology described in section 3.2. The data rate is 800 Mbps (generated from a PRBS<sub>15</sub> source) and the filter is configured to have an attenuation of 26 dB. The yellow area corresponds to an open region of the eye diagram because the average value of $D_{OUT}$ is 0.5, which implies that $V_{DC}$ is bounded between $-v_{eye2}$ and $v_{eye1}$ (Fig. 16). The blue area corresponds to the closed region because the mean of $D_{OUT}$ is different from 0.5.

The vertical amplitude of Fig. 24 is quantified by calculating the DC unbalance at the pre-amplifier's output, considering the bias current of each transistor:

$$\begin{aligned} V_{DC} &= \frac{1}{2} K_N \frac{W}{L} \left( V_{OV3}^2 - V_{OV4}^2 \right) R_D + V_{OFF1} + V_{OFF2} \\ &= \frac{1}{2} K_N \frac{W}{L} \left[ \frac{4V_{REF}}{2^N} (2^{N_3} - 2^{N_4}) V_{OV3,4} \right] R_D + V_{OFF1} + V_{OFF2} \end{aligned}$$ (5)

where $V_{OFF1}$ is the offset of the comparator and $V_{OFF2}$ is the offset of the pre-amplifier. Furthermore, $N_3$ and $N_4$ are the digital words that control the bias of the twisting transistors, and $V_{REF}$ is the reference voltage for both DACs.
$N_3$ and $N_4$ are initially set to produce $V_{REF}/2$ at the DAC outputs and are then varied differentially: first, $N_3$ increases while $N_4$ decreases to find $v_{eye2}$; then $N_3$ decreases while $N_4$ increases to measure $v_{eye1}$. As a result, the maximum and minimum values of the yellow region are 65 mV and 55 mV, respectively, implying an inherent offset of 5 mV.

Figure 25a,b shows the eye diagram at the input of the slicer without applying PORT. The input data also corresponds to a PRBS<sub>15</sub> source (32kb). This test also considers 50 mV for both positive and negative offset. The offset was measured from the difference between the maximum and minimum values of each diagram. The measurement also includes the contributions of the pre-amplifier, the comparator and the twister's DACs. The vertical amplitude of both eye diagrams is 113 mV for a filter attenuation of 26 dB, while the time window is 1.25 ns, which indicates a data rate of 800 Mb/s. This diagram was constructed with a 5-bit time (phase difference) resolution, which is set by the resolution of the phase mixer, and an 8-bit resolution for amplitude shifts, which is the resolution of the DACs. These values impose a step of 78 ps and 4.7 mV for the X-axis and Y-axis, respectively.

Figure 25. Measured on-chip eye diagram at the pre-amplifier's output using the method of section 3.2, and before offset calibration. The filter attenuation is 26 dB and induced offset is: a) +44 mV, b) -56 mV.

Figure 26. Measured on-chip eye diagram after calibration for filter attenuation of 26 dB.

Figure 27. Measured on-chip eye diagram before calibration with 23 dB of attenuation and offset of a) +37 mV, b) -37 mV, c) Measurement after calibration.

Figure 26 shows the eye diagram after applying PORT. A majority-voting-based (MJV) digital low-pass filter was included to minimize the residual offset. The MJV block was implemented in software using data extracted through an FPGA and applied via the SPI interface.
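As an illustration only, majority voting over a stream of binary decisions can be sketched in Python as follows. The window length, the $\pm 1$ encoding of the decisions, and the function name are assumptions; the text does not specify how the software MJV block was written.

```python
def majority_vote(decisions, window):
    """Collapse a stream of +1/-1 decisions into one vote per full window.

    Each vote is +1 or -1 according to the majority inside the window and
    0 on a tie. A trailing partial window is discarded. The window length
    is an assumed parameter, not a value taken from the measurements.
    """
    votes = []
    for start in range(0, len(decisions) - window + 1, window):
        total = sum(decisions[start:start + window])
        votes.append((total > 0) - (total < 0))
    return votes
```

Under these assumptions, isolated wrong decisions inside a window do not change the vote, which is the low-pass behavior that the MJV block relies on.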
The diagram is now centered around 0 V, showing the effectiveness of the proposed technique. The residual offset is 6 mV, which is caused mainly by the DAC's resolution and corresponds to the minimum value that can be sensed according to equation 5. Fig. 27a,b presents another two eye diagrams with an offset of 30 mV and a filter attenuation of 23 dB, and Fig. 27c shows the corrected version, which also indicates a successful offset correction.

A higher DAC resolution results in a smaller residual offset. For instance, if the DAC's resolution is increased by 3 bits, the residual offset is reduced by a factor of 8 (to 750 $\mu$V). The main issue of modifying the DACs is the need for a high-resolution converter (greater than 12 bits) with a highly linear behavior (low DNL and INL) to achieve an offset lower than 1 mV. Any DAC non-ideality will affect the effectiveness of the offset correction.

Another experiment was carried out to find the maximum offset that the circuit can compensate. It is analogous to displacing the eye diagram vertically and finding the maximum displacement that the calibration circuit can reduce. First, an unbalance larger than the one measured in Fig. 25 and Fig. 27 is induced by the twister, and then the calibration circuit is turned on; next, the unbalance is increased even more and the calibration is performed again. Using the twister's DACs, it was possible to generate an offset of 245 mV, which again resulted in a successful calibration. The residual offset is also 6 mV, caused by the minimum quantization step of the DAC.

Finally, Table 1 compares the proposed circuit with other works reported in the literature on offset correction in dynamic comparators. The table includes offset calibration techniques for both high-speed links and data converters. It is important to highlight that PORT is an on-the-fly technique that does not require breaking the communication link.
Other techniques, although implemented within the DFE, are not on-the-fly calibrations and need to stop the transmission chain completely. A comparison with comparators for other applications is also included, although it might not be a fair comparison.

| Ref | Tech. | $V_{DD}$ | Freq | Power | DFE Compatible | On-Fly |
|-----------|--------|-------|----------|--------------|-----|-----|
| This work | 130 nm | 1.2 V | 0.8 Gb/s | 550 μA | Yes | Yes |
| [12] | 90 nm | 1.2 V | 1.5 GHz | 91.5 μW\*\* | No | No |
| [16] | 45 nm | _ | 16 Gb/s | 385 mW\* | Yes | No |
| [32] | 130 nm | 1.7 V | 8 Gb/s | 280 mW\* | Yes | No |
| [33] | 28 nm | 1 V | 10 Gb/s | 4.1 mW\* | Yes | No |
| [34] | 65 nm | 1 V | 160 MHz | 23 mW\*\*\* | No | No |

\*Power of the whole link. \*\*Does not include calibration circuit power. \*\*\*Power of the whole converter.

Table 1. Comparison of the proposed technique with other works.

## 3.5. Summary

In this chapter, a method for eye diagram measurement applied to high-speed serial links was addressed. The eye diagram is calculated by processing the comparator's output data, which is affected by voltage twisting and phase shifting. Voltage twisting allows calculating the vertical amplitude, while phase shifting allows computing the horizontal aperture. The eye diagram calculation is carried out digitally and without external probes, so the technique can be extended to high-performance nodes. The procedure was used to validate the offset reduction technique presented in chapter 2.

# 4. A 12b 10MHz Capacitive Digital to Analog Converter

This chapter presents the design of a 12-bit reconfigurable capacitive digital-to-analog converter operating at 10 MHz. The circuit includes calibration of DNL to adjust linearity. Moreover, some aspects of layout design are discussed with the aim of minimizing parasitic components.
The DAC was included in the E31 Coreplex RISC-V platform, which allows validating its performance as a microcontroller peripheral.

## 4.1. Introduction

Digital-to-analog conversion is a key aspect in System-on-Chip design, either as a stand-alone device for analog signal generation or for internal trimming of analog circuits. Although resolution is the main specification of a DAC, linearity, area and power consumption have a strong impact on the performance and restrict the applications in which the DAC can be used.

A digital-to-analog conversion can be carried out using resistive, capacitive, or current-controlled circuits. The main drawback of resistive and current-steering DACs is their static power consumption, while the capacitive version drains only dynamic current. Increasing the resistor size is an alternative for reducing static power, at the cost of larger area and noise level. In current-steering DACs, decreasing the total current leads to more noise and offset and to lower speed. A capacitive DAC combines dynamic power consumption with an inverse relation between speed and area, making it suitable for low-power and low-cost applications. However, the smaller the unit capacitance, the higher the sensitivity to parasitic components and mismatch.

Calibration is a critical aspect in DAC design, because specifications such as differential and integral nonlinearity (DNL and INL), offset and gain are affected by process, voltage and temperature (PVT) variations. Parasitic capacitance, due to metal interconnections and coupling with the substrate, degrades charge distribution in capacitive DACs, thus affecting DNL and INL. Moreover, parasitic components add delay in resistive and current-steering converters, thereby reducing bandwidth and sampling frequency. Furthermore, mismatch affects offset, DNL, and monotonicity, thus impacting SNR and total distortion.
Therefore, it is crucial to include an additional degree of freedom in DAC design that allows adjusting the performance during operation while accounting for PVT variations.

This chapter presents the design of a 12-bit, 10 MHz, differential capacitive digital-to-analog converter, focusing on design considerations that mitigate the impact of PVT variations on the performance. DNL and offset calibration are included by using a voltage comparator with its own offset trimming. The circuit is used as an analog output signal generator, so output buffers are included, as well as a buffer for the reference voltage. The DAC has a dedicated synthesizable digital circuit that controls the capacitive array and the calibration routines. This dedicated digital interface enables features such as multi-resolution (from 6 to 12 bits), power-down mode and single-ended operation. Moreover, the converter was integrated within the E31 RISC-V Coreplex platform, giving the opportunity to implement a complete programmable SoC.

## 4.2. Capacitive DAC topology

Figure 28 shows the capacitive array used to implement the DAC's core; two arrays are used to produce a differential conversion. The circuit is based on the split-capacitance topology, which reduces the total capacitance and hence the parasitic components. A traditional 12-bit differential capacitive DAC requires $2 \times 2^{12}$ unit capacitors to perform a D/A conversion. By using a split-capacitor topology, the area budget is reduced down to $2 \times 2^6$ capacitors. Moreover, if the DAC performs a pseudo-differential operation, the total area is 50% less, at the cost of having a variable common-mode voltage.

Figure 28. Capacitive array used to implement D/A conversion.
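The capacitor counts quoted above can be checked with a short Python calculation. The helper name is hypothetical, and the sketch only reproduces the arithmetic of the two expressions $2 \times 2^{12}$ and $2 \times 2^{6}$; it does not model the array itself.

```python
def unit_capacitors(exponent, arrays=2):
    """Number of unit capacitors for `arrays` arrays of 2**exponent units."""
    return arrays * 2 ** exponent


traditional = unit_capacitors(12)   # binary-weighted, differential
split = unit_capacitors(6)          # split-capacitor, differential
reduction = traditional // split    # ratio between both area budgets
```

With these expressions, the traditional array needs 8192 unit capacitors and the split-capacitor version needs 128, a reduction by a factor of 64.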
The output voltage of one array is calculated based on the charge re-distribution principle [35]:

$$V_{OUT1} = \sum_{i=7}^{11} \frac{X_i \times 2^{i-6}}{32} + \frac{1}{32} \left( \sum_{i=3}^{6} \frac{X_i \times 2^i}{32} + \frac{X_2 + X_1}{64} \right)$$ (6)

where $X_i$ corresponds to the $i$-th bit of the input word of the first array, which is set by connecting the capacitor's bottom plate to the reference voltage or to ground. The first and second summing terms of equation 6 refer to the capacitors of the MSB and LSB banks, respectively. Although the DAC resolution is 12 bits, equation 6 includes only the first eleven bits of the input word. The least significant bit is obtained by subtracting the contribution of $Y_1$ (the LSB of the second array), so that the differential output is:

$$\begin{aligned} V_{OUT} &= V_{OUT1} - V_{OUT2} \\ &= \sum_{i=7}^{11} \frac{X_i \times 2^{i-6}}{32} + \frac{1}{32} \left( \sum_{i=3}^{6} \frac{X_i \times 2^i}{32} + \frac{X_2 + X_1 - Y_1}{64} \right) \end{aligned}$$ (7)

A 12-bit DAC can perform 4096 D/A conversions with an LSB voltage of $V_{REF}/(2^{12}-1)$. However, equation 7 achieves only 2048 steps, so completing the full-scale range requires a pseudo-differential operation. The main drawback is that the output common-mode voltage varies between $V_{REF}$ and $V_{REF}/2$, which makes the output buffer design more challenging. Moreover, Fig. 28 shows that the value of the last two capacitors of the LSB bank is half of the unit capacitance. Considering mismatch and PVT constraints, two capacitors in series are used to emulate $C/2$.

Finally, each array has two additional switches, $SW_1$ and $SW_2$, for pre-charging operations. Each time the circuit turns on, it is necessary to drain all the electric charge stored in the top plate to prevent hysteresis and memory effects. Both switches connect each capacitor's top plate to ground, ensuring that no charge is stored.

Figure 29. Inclusion of parasitic components and calibration capacitors in DAC core.

## 4.3. Calibration and Trimming

Each array of Fig. 28 has 63 unit capacitors, implying a large parasitic component due to the interconnection metal layers. Fig. 29 shows the same capacitive array used as the DAC's core, but including two capacitors, $C_{P1}$ and $C_{P2}$, which represent the parasitic components. These additional capacitors impact the charge re-distribution process, so that equation 7 is modified as:

$$\begin{aligned} V_{OUT} &= V_{OUT1} - V_{OUT2} \\ &= \sum_{i=7}^{11} \frac{X_i \times 2^{i-6}}{32 + C_{P1}} + \frac{1}{32 + C_{P1}} \left( \sum_{i=3}^{6} \frac{X_i \times 2^i}{32 + C_{P2}} + \frac{X_2 + X_1 - Y_1}{2(32 + C_{P2})} \right) \end{aligned}$$ (8)

The main impact of $C_{P1,2}$ on the output voltage is the reduction in the gain of the LSB bank. The charge re-distributed by the LSB bank is smaller than the charge stored in the smallest capacitor of the MSB side, because $C_{P2}$ is always connected to ground. Therefore, the quantization step is not uniform when the input data changes in the $7^{th}$ bit, which mainly affects DNL.

A group of five different capacitors is added in parallel to the LSB bank of each array for calibration purposes, as Fig. 29 shows [36]. Each capacitor can be connected to ground or disconnected from the array. Capacitors smaller than the unit cell are emulated by a series connection of two or four devices. The purpose of calibration is to match the charge re-distributed (and thus the voltage produced) by all the capacitors of the LSB bank with the contribution of the smallest capacitor of the MSB group. As a result, the quantization step becomes uniform because the contributions of both banks are the same.

Since parasitic components attenuate the voltage produced by the LSB bank compared with the MSB group, it is necessary to increase the gain of the capacitive voltage divider so that the calibration capacitors can equalize the contribution of both sides. The array of Fig.
29 shows a split capacitance $C_s$ of twice the unit cell, which doubles the voltage from the LSB bank. Therefore, the new output voltage is:

$$\begin{aligned} V_{OUT} &= V_{OUT1} - V_{OUT2} \\ &= \sum_{i=7}^{11} \frac{X_i \times 2^{i-6}}{33 + C_{P1}} + \frac{2}{33 + C_{P1}} \left( \sum_{i=3}^{6} \frac{X_i \times 2^i}{33 + C_{P2} + C_{cal}} + \left(\frac{1}{2}\right) \frac{X_2 + X_1 - Y_1}{33 + C_{cal} + C_{P2}} \right) \end{aligned}$$ (9)

where $C_{cal}$ is the sum of the calibration capacitors connected to the LSB bank. The purpose of the calibration is to match the contribution of the first term of the first summation with that of the second summation of equation 9, so:

$$\begin{aligned} \frac{2}{33 + C_{P1}} &= \frac{2}{33 + C_{P1}} \left( \sum_{i=3}^{6} \frac{2^{i}}{33 + C_{P2} + C_{cal}} + \frac{1/2}{33 + C_{cal} + C_{P2}} \right) \\ \implies 1 &= \sum_{i=3}^{6} \frac{2^{i}}{33 + C_{P2} + C_{cal}} + \frac{1/2}{33 + C_{cal} + C_{P2}} \end{aligned}$$ (10)

Figure 30. Implemented DAC including calibration circuits and buffers.

The adjustment of $C_{cal}$ to satisfy equation 10 relies on a voltage comparator connected at the output of both capacitive arrays, as Fig. 30 presents. The calibration process begins by setting the inputs of $CDAC_1$ and $CDAC_2$ to $000001111111_2$ and $00001000000_2$, respectively, and disconnecting all the calibration capacitors from the LSB bank. Because the split capacitor $C_s$ is twice the unit cell, the gain of the LSB bank is doubled too. Hence, the output of $CDAC_2$ is larger than that of $CDAC_1$, and the comparator's output is zero. Then, $C_{cal}$ increases by $C/4$ and a new comparison is done. Each time $C_{cal}$ increases, it is necessary to turn on the pre-charge switches to erase the previous charge. The calibration finishes when the comparator output goes high as a consequence of an increment of $C_{cal}$, implying that the LSB bank has the same weight as the smallest capacitor of the MSB bank.

Figure 31. Effect of $C_{cal}$ trimming on correction of quantization step.

Fig.
31 shows the effect of calibration on the DAC's output voltage. When $C_{cal}$ is equal to zero, the difference between the levels $V_{o1}$ and $V_{o2}$, which correspond to the output voltages of two consecutive input digital codes, is greater than the quantization step; thus DNL is larger than 1 LSB and the DAC is not monotonic. As $C_{cal}$ increases, $V_{o1} - V_{o2}$ decreases down to the point where the comparator output is one, indicating that DNL was reduced.

**4.3.1. Layout**

Minimization of parasitic capacitance is critical to improve DAC accuracy and linearity. Any unbalance of parasitics between both arrays also introduces offset. Hence, the placement and routing of each capacitor have to be optimized to minimize metal trace length and increase proximity. Although calibration can reduce the impact of additional load in the LSB bank, there will always be a penalty related to an increase in area and circuit complexity.

Figure 32. Traditional placement of CDACs: a) common-centroid layout of a 80-bit binary-weighted DAC, b) layout of circuit from Fig. 29.

Mismatch and process variations are other aspects that have to be considered when defining the placement and routing of capacitors. Common-centroid is a well-known methodology that places the smallest capacitance at the center of the array, surrounded by groups of larger capacitors. Furthermore, devices are interdigitated to reduce planarization effects and temperature gradients, as Fig. 32 shows [37–39].

Fig. 32a shows the typical placement of a binary-weighted capacitive DAC, which is symmetric with respect to the center of the layout. Larger capacitors surround smaller ones, also following an interdigitation pattern. Dummy devices fill the spaces between C7 and C6 and form a shield for the whole array. From this pattern we can note the creation of large parasitic capacitance between C7 and C4-C5-C6.

| D | D | D | D | D | D | D | D | D |
|---|------|------|------|-----|-------|-------|-------|-------|
| D | CC5A | CC6A | CC8A | D | CC9A | CC9A | CC10A | CC10A |
| D | CC2A | CC4A | CC8A | D | CC9A | CC9A | CC10A | CC10A |
| D | CC1A | CC3A | CC7A | D | CC10A | CC10A | CC10A | CC10A |
| D | C16C | C16C | C16C | D | C16A | C16A | C16A | C16A |
| D | C16C | C16C | C16C | D | C16A | C16A | C16A | C16A |
| D | C16C | C16C | C16C | D | C16A | C16A | C16A | C16A |
| D | C16C | C16C | C16C | D | C16A | C16A | C8A | C8A |
| D | C16C | C16C | C16C | D | C16A | C16A | C8A | C8A |
| D | C16C | C8C | C8C | D | C4A | C4A | C8A | C8A |
| D | C4C | C8C | C8C | D | C4A | C4A | C8A | C8A |
| D | C4C | C8C | C8C | D | CS4A | CS3A | C2A | C2A |
| D | C4C | C8C | C8C | D | CS2A | CS1A | CX4A | CX2A |
| D | C4C | C2C | C2C | C1C | D | CS0A | CX3A | CX1A |
| D | D | D | D | D | D | D | D | D |

Figure 33. Placement of capacitors of implemented DAC (single-ended array).

Figure 32b presents the capacitor placement of the array of Fig. 29, also following common-centroid guidelines. The placement includes both positive and negative arrays, and the calibration devices. The design of Fig. 32b also shows the creation of parasitic capacitance between the LSB and MSB banks, especially between the bottom plate of the **C-D** capacitors (MSB bank) and the top layer of the **A-B** devices. Charge re-distributed by a parasitic capacitor from the LSB to the MSB bank that does not pass through the bridge capacitor $C_s$ can increase (or decrease) the output voltage by more than one quantization step, thus degrading linearity. For instance, in Fig. 34, $C_{c1}$ is a coupling capacitor between the bottom plate of $C_{LSB}$ and the output node.
Each time the switch $SW_{LSB}$ changes from ground to $V_{REF}$, the charge re-distributed by $C_{LSB}$ modifies the output voltage through the capacitive divider formed by $C_s$. However, the charge induced by $C_{c1}$ affects the output node without being scaled by the LSB bank gain, which is 1/32 as equation 7 shows. As a result, a parasitic capacitance $C_{c1}$ of $C/32$ changes the output voltage by 1 LSB, thus increasing DNL. If the unit capacitance is 50 fF, a parasitic element of 1.5 fF ($\sim C/32$) is enough to degrade linearity.

Figure 34. Coupling capacitance in CDACs.

A DAC implemented with a split-capacitor topology is therefore not a good candidate for common-centroid layout and interdigitation, especially if the D/A conversion is pseudo-differential. Figure 33 shows the capacitor placement of the implemented DAC (single-ended), where the smallest capacitor is at the bottom of the array, surrounded by larger and dummy devices. Calibration capacitors are placed at the top of the array because of their lower sensitivity to process variations. Minimization of parasitic capacitance includes insulating the LSB bank from the MSB bank and routing with top metal layers while avoiding parallel strips. Moreover, shielding of both arrays (left and right) is crucial to achieve a high CMRR.

Figure 35. Microphotography of the designed DAC.

Figure 36. DAC's differential output voltage.

## 4.4. Experimental Results

The DAC was fabricated in a TSMC 180 nm CMOS digital process and occupies an area of $500\,\mu\mathrm{m} \times 550\,\mu\mathrm{m}$, including output and reference buffers, as Fig. 35 shows. The supply voltage is 1.8 V for the calibration logic and the capacitive array, while the output buffers have a power supply of 3.3 V. The buffers were designed to drive an output capacitance of 16 pF at 10 MHz, which corresponds to the refresh rate of the capacitive arrays.

DNL was measured by generating a 12-bit upward digital sequence that changes the output voltage in increments of one quantization step.
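A DNL calculation of this kind can be sketched in Python as shown next. This is a generic post-processing example and not the script used for the measurements: the function names are hypothetical, and 1 LSB is taken as the average step between the first and last measured levels, which is an assumption about the reference step.

```python
def dnl_lsb(voltages):
    """DNL of each code transition, in LSB, from an upward ramp.

    `voltages` holds one measured output level per input code, in code
    order. One LSB is assumed to be the average step between the first
    and last levels (end-point definition).
    """
    lsb = (voltages[-1] - voltages[0]) / (len(voltages) - 1)
    return [(voltages[k + 1] - voltages[k]) / lsb - 1.0
            for k in range(len(voltages) - 1)]


def is_monotonic(dnl):
    """A ramp is monotonic when no step is negative, i.e. DNL > -1 LSB."""
    return all(d > -1.0 for d in dnl)
```

For example, `dnl_lsb([0.0, 1.0, 2.5, 3.0])` gives steps of 0, +0.5 and -0.5 LSB, and `is_monotonic` confirms that no code reverses the output.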
Figure 36 shows the differential analog output voltage and Fig. 37 shows the measured DNL after calibration. The maximum DNL is 0.4 LSB, which ensures a monotonic converter.

Figure 37. DAC's differential output voltage.

Fig. 38 shows the DAC's analog output for a sinusoidal input digital sequence. The sampling frequency is 10 MHz and each period was constructed using 128 samples, so the output frequency is 78.1 kHz. Figure 39 presents the spectrum of the signal of Fig. 38. The spurious-free dynamic range (SFDR) is a critical performance metric that can be evaluated by measuring the difference between the amplitude of the fundamental tone and that of the harmonic with maximum power. According to Fig. 39, the second harmonic is the largest spur, with a power of -40.1 dB relative to the fundamental, resulting in an SFDR of 40.1 dBc. Finally, Table 2 summarizes the performance of the designed DAC.

Figure 38. DAC's differential output voltage: $F_s$=10 MHz, $F_{signal}$=78 kHz.

Figure 39. Frequency spectrum of signal of Fig. 38.

| Specification | Value |
|------------------------------------------|-------------------------------------------------------|
| Technology | 180 nm GP |
| Supply Voltage | Analog & Dig.: 1.8 V<br>Buffers: 3.3 V |
| Resolution | 12-bit |
| DNL | 0.4 LSB |
| SFDR | 40.1 dBc |
| Analog Current | Cap. Core: $1.5\,\mu A$<br>Buffers: $2272\,\mu A$\* |
| Digital Current | 45 $\mu$A |
| Area | 0.275 mm<sup>2</sup> |

\*Including output and reference buffers.

Table 2. Performance summary of the designed DAC.

**4.4.1. Debugging DAC performance**

The designed DAC was used as a peripheral for the Tucan microcontroller [40]. This microcontroller was designed by the OnChip group at UIS in collaboration with SiFive Inc. (USA). Tucan is composed of a 32-bit RISC-V digital core, a TileLink bus, and peripherals such as SPI, I<sup>2</sup>C and UART modules, a TRNG, eleven GPIO, PWM generation, and the DAC.
The main advantage of testing the DAC with a complete microcontroller is the possibility of setting each control or calibration signal using standard C code, which is executed directly by the core. Specific routines that cover signal generation and calibration can be scheduled taking into account, for instance, core frequency and activity, and current consumption. All the input data used for DNL and offset measurement was stored in the Tucan RAM and transmitted to the input register through the bus.

Each time the DAC's input code changes, the core has to copy the new word from a specific RAM address into its own registers. Then, the data has to pass through the bus and the register mapper before finally reaching the DAC. Each step in the data transmission takes some clock cycles, so the refresh frequency of the DAC input register is lower than the clock rate. As a result, obtaining an effective sampling frequency of 10 MHz requires the core to execute a program dedicated to DAC testing at a clock rate of 40 MHz, without any other task inside the loop that refreshes the input registers.

Evaluating conditions to change resolution, waveform type, trimming and other functions demands additional instructions, and hence clock cycles, inside the main loop, which reduces the effective sampling frequency. The DAC was intended for use as a signal generator able to produce square, sinusoidal and triangle waveforms with variable frequency and amplitude. All the instructions needed to adjust waveform parameters limit the refresh rate to 1 MHz (10 times lower than the maximum DAC frequency), because the maximum core frequency is 80 MHz.

An alternative to increase the effective sampling frequency is to include, at hardware level, a first-in-first-out (FIFO) buffer in the DAC control logic. Data can be preloaded into that buffer before enabling the DAC. Then, the control logic can copy each word to the capacitive array without the intervention of the core.
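The relation between core clock and DAC refresh rate described in this subsection can be written as a small Python calculation. The cycle counts below are not stated in the text; they are inferred from the quoted figures (40 MHz for 10 MHz, and 80 MHz for 1 MHz), and the function names are hypothetical.

```python
def effective_rate_hz(core_hz, cycles_per_update):
    """Refresh rate of the DAC input register for a software-driven loop."""
    return core_hz / cycles_per_update


def cycles_per_update(core_hz, refresh_hz):
    """Clock cycles spent per DAC update, inferred from two known rates."""
    return core_hz / refresh_hz


# Dedicated test loop: 40 MHz core, 10 MHz refresh -> 4 cycles per update.
test_loop_cycles = cycles_per_update(40e6, 10e6)
# Waveform generator: 80 MHz core, 1 MHz refresh -> 80 cycles per update.
generator_cycles = cycles_per_update(80e6, 1e6)
```

Under this simple model, every instruction added to the loop raises the cycle count per update and lowers the refresh rate proportionally, which is why a hardware FIFO that bypasses the core is attractive.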
## 4.5. Summary

This chapter presented the design and implementation of a capacitive 12-bit digital-to-analog converter working as a peripheral for the Tucan microcontroller. The DAC is based on a pseudo-differential split-capacitor topology and includes a DNL calibration to improve linearity. The layout of the capacitive array was designed considering the generation of parasitic coupling capacitances between the LSB and MSB banks, thus improving tolerance to mismatch and process variations. Because of the dynamic operation, power consumption is concentrated in the output buffers, which allows using the DAC for internal trimming tasks.

# 5. Improving LDO Stability by Exploiting the Equivalent Series Resistor of Compensation Capacitor

This chapter explores the impact of the equivalent series resistor (ESR) on the stability of linear regulators. While traditional compensation schemes seek to mitigate the effect of the ESR on regulator performance, the advantages of using the ESR as a lag-lead compensator to improve phase and gain margin are presented below, notwithstanding the possibility of using low ESR values. In addition, an adaptive biasing strategy is presented that: 1) reduces the variation of non-dominant poles; 2) improves efficiency for low load currents [41].

## 5.1. Introduction

Stability is a critical constraint in the design of low-dropout regulators (LDO), since their open-loop parameters exhibit a strong dependence on the load current. The power transistor is designed to operate in strong inversion to achieve a large transconductance when delivering maximum current, as in Fig. 40. In contrast, for low loading, the power transistor might operate in sub-threshold. A change from strong to weak inversion produces a variation of more than 200% in the transistor's intrinsic gain, specifically in its output resistance, thereby affecting its bandwidth as well.

There are two scenarios for addressing LDO frequency compensation: internal and external.
Internal compensation is preferred for low-current applications, while external compensation is used for medium to high currents, given the additional degree of freedom that an external capacitor provides. Furthermore, an external method is preferred when the LDO drives an off-chip load. External compensation demands a dominant pole at the LDO output and requires the frequency of the non-dominant poles to remain higher than the unity-gain bandwidth. The non-dominant poles are produced by the large parasitic capacitance of the power transistor and the output resistance of the error amplifier. However, a high-frequency non-dominant pole entails reducing the error amplifier's output resistance at the cost of losing output voltage tracking. A common practice for circumventing this trade-off is to add an output stage between the power transistor and the error amplifier, but the new stage introduces another non-dominant pole. Additionally, the LDO bandwidth must be reduced even more to mitigate phase margin deviations due to process, voltage, and temperature (PVT) variations.

A low-bandwidth LDO helps to filter noise from the voltage reference (inherent thermal and flicker noise, and noise coupled from the supply voltage through the bandgap reference), as well as noise from the LDO amplifiers and resistors. However, a low bandwidth entails slower settling and lower PSRR. Extending the LDO bandwidth when using external compensation is crucial for achieving better line and load regulation, especially for the rejection of ground variations. Although the literature reports that the ESR of an output capacitor inserts a zero that helps with LDO compensation [42], choosing a capacitor with the correct ESR is still a challenge. In contrast to reported LDOs [43–46], which look for an ESR-independent phase margin, this chapter presents a design methodology that takes advantage of the extra zero inserted by the ESR to cancel phase shift and improve bandwidth.
An adaptive biasing scheme is also presented, aimed at reducing the spread of the non-dominant poles when the load changes from its full to its minimum value.

Figure 40. Traditional LDO topology based on a source-follower PMOS power transistor.

## 5.2. External Compensation of LDOs

Figure 40 shows the typical topology of a PMOS LDO with external compensation. ESR and $C_{EXT}$ model the external compensation capacitor. $L_B$ represents the wirebond inductance of the output voltage and ground terminals, while $C_L$ is the internal decoupling capacitor. Resistors $R_{F1}$ and $R_{F2}$ form the feedback network; trimming is added to $R_{F2}$ to adjust the output voltage and to compensate for the offset caused by the error amplifier and the voltage reference. The regulator includes a brown-out detector and a power-on-reset circuit to monitor the output voltage during the power-up sequence and regular operation. $C_{EXT}$, together with the parallel connection of the output resistance of the power transistor ($r_{oP}$) and the feedback network, sets the dominant pole of Fig. 40, such that:

$$\omega_{p1} = \begin{cases} \dfrac{1}{(ESR + r_{oP})C_{EXT}} & \text{for } I_L = I_{LMAX} \\[2ex] \dfrac{1}{(ESR + R_{F1} + R_{F2})C_{EXT}} & \text{for } I_L = I_{LMIN} \end{cases} \tag{11}$$

The parasitic capacitance of the power transistor also produces a non-dominant pole, located at:

$$\omega_{p2} = \frac{1}{R_1(C_{GS}+A_{V2}C_{GD})}, \quad \text{with } A_{V2} = gm_2\,(r_{oP}\,\|\,(R_{F1}+R_{F2})) \tag{12}$$

where $R_1$ is the output resistance of the error amplifier. To guarantee that the regulator is stable, the non-dominant pole and the right-half-plane zero have to be placed at frequencies higher than the GBW to achieve a phase margin higher than $45^{\circ}$.
Specifically:

$$\begin{aligned} \omega_{p2} &\ge A_{V1}A_{V2}\,\omega_{p1} \\ \omega_{z1} = \frac{gm_2}{C_{GD}} &\ge 10\,A_{V1}A_{V2}\,\omega_{p1} \end{aligned} \tag{13}$$

Equations 12 and 13 show that when the LDO delivers a low current, the open-loop gain increases and the right-half-plane zero moves closer to $\omega_{p2}$ and the GBW. A reduction in load current implies that both the intrinsic gain of the power transistor and the LDO DC gain increase, which suggests that $\omega_{p1}$ has to be low enough to achieve a large phase margin.

A series combination of a capacitor and a resistor produces a zero that can be used together with the LDO's output resistance to implement a lag-lead compensator. The ESR produces a left-half-plane zero that can be used to reduce phase shift and to increase bandwidth and phase margin, as Fig. 41 shows. The frequency of the extra zero is:

$$\omega_{zc} = \frac{1}{ESR \times C_{EXT}} \tag{14}$$

while the compensator transfer function is:

$$C(s) = \frac{1+s\,(ESR\times C_{EXT})}{1+s\,(ESR+R_{OUT})\,C_{EXT}}, \quad \text{with } R_{OUT}=r_{oP}\,\|\,(R_{F1}+R_{F2}) \tag{15}$$

Figure 41. Bandwidth improvement by ESR @ $I_L=20\,\mathrm{mA}$

The zero frequency in Eq. 15 is always higher than the pole frequency, resulting in a lag-lead compensator. A lag-lead network may be used to set the dominant pole, and the zero frequency may be placed near the non-dominant poles or the unity-gain frequency. The dominant pole strongly depends on the regulator output resistance, and the compensation zero is a function of the ESR. The external compensation capacitor can be selected based on Eqs. 13 and 15. The idea is to select a capacitor whose series resistance sets the compensation zero near the second pole. $\omega_{p2}$ is calculated based on the parasitic capacitance of the power MOSFET and the variations of $R_1$.

Figure 42. Phase margin vs. ESR, including PVT variations @ $I_L=20\,\mathrm{mA}$

| Type | C = 22 µF | C = 100 µF |
|---------------------|--------------------------|--------------------------|
| Std. Aluminum | $7\text{-}30\,\Omega$ | $2\text{-}7\,\Omega$ |
| Low-ESR Aluminum | $1\text{-}3\,\Omega$ | $0.3\text{-}1.6\,\Omega$ |
| Std. Solid Tantalum | $1.1\text{-}2.5\,\Omega$ | $0.9\text{-}1.5\,\Omega$ |
| Low-ESR Tantalum | $0.2\text{-}1\,\Omega$ | $0.08\text{-}0.4\,\Omega$ |
| Ceramic | $\sim 0.1\text{-}10\,\Omega$ | _ |

Table 3. Typical values for ESR of capacitors made of diverse materials.

The ESR varies from $10\,\text{m}\Omega$ to $1\,\Omega$ for ceramic capacitors, and from $1\,\Omega$ to $30\,\Omega$ for electrolytic devices, as Table 3 shows. Some applications, such as high-frequency filtering, might need capacitors with a corner frequency higher than the circuit bandwidth. The higher the capacitor corner frequency, the lower the ESR, and thus the higher the cost. High-frequency capacitors are usually made from tantalum and aluminum, or are composite structures made from thin-film layers. The use of non-conventional materials results in high-cost capacitors. If the ESR is used to compensate the LDO, it is possible to use very-low-cost capacitors with large ESRs, instead of designing complex compensation networks capable of driving expensive low-ESR capacitors.

Figure 42 shows the simulated phase margin of a 1.8 V LDO as a function of the ESR. The regulator is implemented in a TSMC 180 nm CMOS technology. The input voltage varies from 2 V to 3.3 V, and the maximum output current is 20 mA. Simulations also include fabrication process corners and temperature variation from -40 °C to 120 °C.

Figure 43. Implementation of width and parasitic capacitance control of Power MOSFET.

The phase margin is lower than 45° for very low ESRs, which degrades the transient response (especially overshoot and undershoot). If the ESR is higher than 0.5 $\Omega$, the phase margin is greater than 70°, so the LDO performs better when the capacitor has a low quality factor.
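The pole and zero expressions of Eqs. 11, 14, and 15 can be evaluated numerically to see how the ESR shifts the compensation zero. The following is a minimal sketch, not the simulation setup of this chapter: the 4.7 µF capacitor is the value mentioned in the experimental results, while the output resistance `R_OUT_OHM` and the probe frequency `PROBE_HZ` are assumed values chosen only for illustration, and all function names are hypothetical.

```python
import math

C_EXT_F = 4.7e-6     # external capacitor (value mentioned in the measurements)
R_OUT_OHM = 100.0    # ASSUMED regulator output resistance, illustration only
PROBE_HZ = 100e3     # ASSUMED frequency at which the phase is evaluated


def zero_freq_hz(esr_ohm, c_ext_f=C_EXT_F):
    """Frequency of the ESR zero, 1 / (2*pi*ESR*C_EXT), as in Eq. 14."""
    return 1.0 / (2.0 * math.pi * esr_ohm * c_ext_f)


def pole_freq_hz(esr_ohm, r_out_ohm=R_OUT_OHM, c_ext_f=C_EXT_F):
    """Frequency of the compensator pole, 1 / (2*pi*(ESR+R_OUT)*C_EXT)."""
    return 1.0 / (2.0 * math.pi * (esr_ohm + r_out_ohm) * c_ext_f)


def compensator_phase_deg(freq_hz, esr_ohm, r_out_ohm=R_OUT_OHM, c_ext_f=C_EXT_F):
    """Phase of C(s) at freq_hz: lead from the zero minus lag from the pole."""
    w = 2.0 * math.pi * freq_hz
    lead = math.atan(w * esr_ohm * c_ext_f)
    lag = math.atan(w * (esr_ohm + r_out_ohm) * c_ext_f)
    return math.degrees(lead - lag)


for esr in (0.01, 1.0, 10.0, 20.0):
    print(
        f"ESR = {esr:5.2f} ohm: "
        f"f_zero = {zero_freq_hz(esr) / 1e3:9.2f} kHz, "
        f"f_pole = {pole_freq_hz(esr):7.2f} Hz, "
        f"phase at {PROBE_HZ / 1e3:.0f} kHz = "
        f"{compensator_phase_deg(PROBE_HZ, esr):7.2f} deg"
    )
```

Under these assumptions the zero always sits above the pole, and a larger ESR pulls the zero down in frequency, so the phase lag contributed by the output network at the probe frequency shrinks. This is the qualitative trend that the phase margin of Fig. 42 reflects.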
Given that the bandwidth of the implemented LDO is lower than 1 MHz, the ESR of low-cost X7R ceramic capacitors is larger than 1 $\Omega$.

Figure 44. Phase margin vs. bias current of the error amplifier, including PVT variations @ $I_L=20\,\mathrm{mA}$

Figure 45. Phase margin vs. load current without adaptive power transistor control, and including PVT variations.

## 5.3. Adaptive Control of the LDO's Power Transistor

One way to increase the phase margin for low output currents is to decrease the parasitic capacitance of the power MOSFET, as Eq. 12 shows. When the LDO drives low loads, the power transistor can operate in the moderate or weak inversion region, which increases its gain. An increase in the gain of $M_P$ affects the second pole because the Miller effect of $C_{GD}$ becomes stronger.

Figure 46. Phase margin vs. temperature with adaptive power transistor control and error amplifier biasing, and including PV variations: a) @ $I_L=20\,\mathrm{mA}$ b) @ $I_L=10\,\mu\mathrm{A}$.

If the regulator is driving low loads, the efficiency improves by reducing the bias current of the error amplifier. This reduction increases the output resistance of the error amplifier. Consequently, stability is affected, because the frequency of the second pole decreases. An alternative for preventing the second pole from getting close to the unity-gain frequency is to decrease the effective capacitance at the error amplifier's output by means of an adaptive control of the power MOSFET's width. $M_P$ is divided into fifteen different transistors (4-bit control), as Fig. 43 shows. Each $M_P$ section is connected to $V_{DDIN}$ to turn it off when its current capability is not needed. The purpose of the adaptive control of the power MOSFET is to connect to the error amplifier's output the minimum number of power transistor sections required to deliver a specific current, thus minimizing $C_{GS}$ and $C_{GD}$.
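The section-selection idea can be sketched in a few lines. This is an illustration under stated assumptions, not the implemented control logic: it assumes the fifteen sections are identical, that each one delivers an equal share of the 20 mA maximum current, that at least one section always stays connected, and that the gate capacitance scales linearly with the number of connected sections. The names `sections_for_load` and `relative_gate_capacitance` are hypothetical.

```python
N_SECTIONS = 15          # power-transistor sections (4-bit control)
I_LOAD_MAX_UA = 20_000   # maximum output current, 20 mA, in microamperes


def sections_for_load(i_load_ua: int) -> int:
    """Smallest number of sections able to deliver i_load_ua.

    ASSUMPTION: identical sections, each rated for I_LOAD_MAX_UA / N_SECTIONS,
    and at least one section always connected.
    """
    if i_load_ua < 0:
        raise ValueError("load current must be non-negative")
    # Integer ceiling division avoids floating-point rounding at the limits.
    needed = -(-i_load_ua * N_SECTIONS // I_LOAD_MAX_UA)
    return max(1, min(N_SECTIONS, needed))


def relative_gate_capacitance(sections: int) -> float:
    """Gate capacitance relative to the full-width transistor.

    ASSUMPTION: C_GS and C_GD scale linearly with the connected width.
    """
    return sections / N_SECTIONS


for load_ua in (10, 5_000, 20_000):
    n = sections_for_load(load_ua)
    print(
        f"I_L = {load_ua:6d} uA -> {n:2d} sections, "
        f"relative gate capacitance = {relative_gate_capacitance(n):.2f}"
    )
```

With these assumptions, a 10 µA load keeps a single section connected, so the capacitance seen by the error amplifier is a small fraction of the full-width value, whereas the 20 mA load requires all fifteen sections.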
Adapting the error amplifier's bias current also implies varying the power MOSFET's width. When the LDO drives a large load, the power MOSFET's width has to be maximum, which increases its parasitic capacitance and reduces the frequency of the second pole. Therefore, the output resistance of the error amplifier has to be reduced, by increasing its bias current, in order to reduce the time constant associated with $\omega_{p2}$. Moreover, when the load current drops, the width of $M_P$ and the amplifier's bias current decrease in order to improve regulator efficiency.

Figure 44 shows the phase margin of the LDO as a function of the bias current of the error amplifier for a load of 20 mA. A low bias current reduces the phase margin because the time constant of the second pole increases, especially when PVT variations are included. The phase margin is higher than $60^{\circ}$ when $I_B$ is larger than $2\,\mu$A, which results in an inefficient regulator. Figure 45 presents the phase margin as a function of load current when $I_B$ is $500\,\text{nA}$, including PVT variations. A high load leads to a phase margin of $35^{\circ}$, causing significant overshoot and undershoot in the transient response. Figures 46a and 46b present the phase margin for output currents of 20 mA and 10 $\mu$A, respectively, with the adaptive control included. The minimum phase margin is 72°, implying that the regulator behaves as a dominant-pole system for all load currents. Moreover, $I_B$ can be reduced down to 300 nA for $I_L=10\,\mu$A.

A brown-out detector (BOD) may control the power MOSFET's width and the biasing of the error amplifier. A BOD is a circuit that senses the regulator output in order to detect voltage drops and glitches. Fig. 47 shows a classical BOD circuit composed of a voltage comparator and a temperature-compensated voltage reference. The output of the BOD comparator goes low when the output voltage is lower than the reference.
The minimum supply voltage required by the load circuits sets the reference signal. When the LDO passes from high to low load, the phase margin gets lower, thus increasing overshoot, undershoot, and settling time. When the undershoot of $V_{OUT}$ falls below the BOD reference, the comparator's output goes low, indicating that the power MOSFET's width and the error amplifier's bias current have to be adapted. A fully digital circuit connected at the BOD's output carries out the adjustment in response to the detected voltage drop.

Figure 47. BOD circuit for detection of LDO dynamics.

## 5.4. Experimental Results

The regulator was taped out in a TSMC 180 nm general-purpose technology, occupying an area of $15200\,\mu\text{m}^2$, as Fig. 48 shows. The fabricated circuit includes a bandgap reference, a bias current generator, a power-on-reset circuit, and a brown-out detector, which together allow complete power-supply monitoring. Measuring the output voltage overshoot and settling time gives information about the LDO's stability. Figure 49 shows $V_{OUT}$ for a load current step of 5 mA at room temperature and for different values of ESR. Figure 49a shows $V_{OUT}$ with a low-ESR capacitor of 4.7 $\mu$F, while Figs. 49b, 49c, and 49d use an ESR of 1 $\Omega$, 10 $\Omega$, and 20 $\Omega$, respectively. A low-ESR capacitor produces an overshoot of 100 mV, while a large ESR produces an overshoot of 20 mV.

Figure 48. Microphotograph of the fabricated system highlighting the LDO, the BOD, and the biasing circuitry.

Figure 49. Output voltage variation for a change in load current of 5 mA at 27 °C and considering: a) Low-ESR b) ESR $\sim 1\,\Omega$ c) ESR $\sim 10\,\Omega$ d) ESR $\sim 20\,\Omega$. Vertical scale is 100 mV/div, while horizontal scale is 100 ns/div.

Furthermore, settling time varies from 600 ns (low ESR) to 200 ns (large ESR).
The reduction of overshoot and settling time implies an enhancement of the phase margin and bandwidth of the LDO, because the zero frequency of the lead-lag compensator decreases as the ESR increases. As a result, low-cost electrolytic capacitors will produce a better LDO transient response than high-cost tantalum or ceramic capacitors. Figure 50 shows the LDO's output voltage at 125 °C for the same current step, showing that an increase in ESR has the same effect as presented in Fig. 49.

Figure 50. Output voltage variation for a change in load current of 5 mA at 125 °C and considering: a) Low-ESR b) ESR $\sim 1\,\Omega$ c) ESR $\sim 10\,\Omega$ d) ESR $\sim 20\,\Omega$. Vertical scale is 200 mV/div, while horizontal scale is 100 ns/div.

Table 4 summarizes the performance of the designed LDO, where 80% of the quiescent current is set by the size of the feedback resistors and the other 20% is used to bias the error amplifier. Line regulation is measured for a change in the input voltage from 2 V to 3.3 V. Figure 51 shows the load regulation, indicating a maximum variation of 40 mV in the output voltage.

Figure 51. Measured load regulation: Output voltage as a function of load current.

| Specification | Value |
|---------------------|----------------------|
| Technology | 180 nm GP |
| Nominal voltage | 1.802 V |
| Supply voltage | 2-3.3 V |
| Max. output current | 20 mA |
| Settling time | 200 ns |
| Line regulation | 114 $\mu$V/mV |
| Load regulation | 2 mV/mA |
| Quiescent current | 8 $\mu$A |
| Area | $0.275\,\mathrm{mm}^2$ |

Table 4. Performance summary of the designed LDO.

## 5.5. Summary

An LDO voltage regulator delivering 1.8 V was implemented in a standard 180 nm CMOS technology, focusing on compensation by means of the ESR and on an adaptive biasing strategy. Results show that stability can be improved by using capacitors with a low quality factor, which is beneficial for low-cost applications.
A BOD-based adaptive control of the power MOSFET's width and the error amplifier's bias current has also been introduced, in order to minimize the variation of the frequency of the non-dominant poles with load current. Measurement results show a phase margin greater than 70° for load currents of both 10 $\mu\text{A}$ and 20 mA.

# 6. Conclusions

The main contributions of this thesis are:

- An offset reduction technique for dynamic comparators based on output data phase difference [47–49].
- An on-chip all-digital method for eye diagram construction for comparator and transmission link characterization [50, 51].
- A lightweight calibration method for DNL reduction in split-capacitor DACs.
- A design methodology to improve the robustness of frequency compensation in LDO regulators [41].

The results that validate all of these contributions were taken from silicon prototypes and contrasted with state-of-the-art design techniques.

## 6.1. Conclusions

Reliability is now one of the most important design problems in today's SoCs. This dissertation proposed three different techniques and circuit topologies to enhance the PVT robustness of three key aspects of an SoC. Contributions include the reduction of offset in wireline receivers, the correction of distortion in DACs, and the improvement of frequency compensation in linear LDO regulators. Based on the research work done, the following conclusions are offered:

- Digital calibration of analog circuits does not always imply the execution of complex algorithms that demand a large increase in area and power. As a proof, this dissertation proposed an offset-reduction method, based on sensing the phase of the comparator's output signal, that requires only a classical phase-and-frequency detector and a simple finite state machine (FSM). The technique is suitable for high-speed applications, since the phase detector adds only two inverters to the signal path (the input circuits of the flip-flops).
In addition, the FSM corresponds to an UP/DOWN counter performing as an integrator, thus limiting hardware overhead. Furthermore, the proposed technique can be extended to calibrate a complete analog front-end (including continuous equalizers and summing circuits) or general-purpose circuits such as an operational amplifier, as shown in [48] and [47], respectively. Considering that offset is caused by mismatch, the proposed calibration is oriented to sense local and random intra-die variations. However, because of its low hardware penalty, the technique can be used on the fly to track the influence of temperature and voltage variations, as well as aging.
- Complex layout techniques that reduce the impact of mismatch and temperature gradients in capacitive DACs, such as common-centroid placement with balanced routing, do not reduce the need for calibration circuits. When complex layout schemes are implemented, the creation of parasitic capacitance is inevitable. These parasitic elements have a larger variability with fabrication process and temperature than native capacitors. As this dissertation shows, a granulated layout (which is required for the common-centroid technique) increases the need for executing a calibration algorithm. This observation is opposite to what is typically reported in the literature [37–39].
- The incidence of parasitic capacitance in CDACs can be reduced with simple calibration algorithms. This dissertation proposed a lightweight algorithm to improve linearity in split-capacitor DACs without the need for an extra reference voltage. The algorithm is oriented to reduce the impact of parasitic capacitance on the less-significant-bit side, and needs only 6 clock cycles. Combining the proposed calibration method with a compact layout results in a low-power DAC, suitable for operating as an SoC output peripheral or for being integrated into a SAR ADC.
- The compensation method of an LDO regulator will always be a critical aspect in IoT-oriented low-power SoCs, because it adds a fixed static current consumption even when the SoC has turned on only its digital processing units. The implementation of complex compensation networks for dealing with ideal output capacitors increases hardware overhead and degrades efficiency. As this thesis shows, trying to stabilize the loop using a zero-ESR capacitor is not a realistic scenario. Using the ESR as an element to decrease phase shift relaxes the complexity of the internal compensator and enables the use of simple single-stage circuits as error amplifiers.
- The interaction between the brown-out detector and the supply voltage regulator is another confirmation that digital calibration can be carried out without the need for complex algorithms and tasks. Although a BOD is typically used to warn the SoC about supply voltage drops, its output can also indicate a large overshoot at the LDO output (related to a degraded frequency response and stability). Continuous operation of the BOD makes it possible to track the impact of temperature variations and aging on the LDO response.

## 6.2. List of Publications

### 6.2.1. Conference papers

- C. Duran, M. Wachs, L. Rueda, A. Huntington, J. Ardila, J. Kang, A. Amaya, et al., "An Energy-Efficient RISC-V RV32IMAC Microcontroller for Periodical-Driven Sensing Applications," 2020 Custom Integrated Circuits Conference, United States, 2020.
- A. Amaya, F. Castro and E. Roa, "Improving Low-Dropout Regulator Frequency Stability by Exploiting the Equivalent Series Resistor and Featuring an Adaptive Biasing Strategy," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 2019, pp. 1-5.
- A. Amaya and E. Roa, "On-Fly Offset-Correction Method for High-Speed Comparators using All-Digital Phase Measurement," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-4.
- A. Amaya, J.
Ardila and E. Roa, "A Digital Offset Reduction Method for Dynamic Comparators Based on Phase Measurement," 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Bochum, 2017, pp. 661-664.
- A. Amaya, H. Gomez and E. Roa, "A digital offset correction method for high speed analog front-ends," 2016 29th Symposium on Integrated Circuits and Systems Design (SBCCI), Belo Horizonte, 2016, pp. 1-4.
- A. Amaya, R. Villamizar and E. Roa, "An offset reduction technique for dynamic voltage comparators," 2016 12th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Lisbon, 2016, pp. 1-4.

### 6.2.2. Journal papers under review

- A. Amaya, J. Ardila, E. Roa, "A Digital Phase-Based On-Fly Offset Compensation Method for Decision Feedback Equalizers." Submitted to IET Circuits, Devices and Systems.

### 6.2.3. Patents

- A. Amaya, R. Villamizar and E. Roa, Method and Circuit to Compensate Offset Voltage in Electronic Circuits, at Superintendencia de Industria y Comercio (Colombia). Granted on March 18th 2019.
- J. Ardila, A. Amaya, E. Roa, Method and Circuit to Clock and Data Signals Recovery, at Superintendencia de Industria y Comercio (Colombia). Granted on February 20th 2020.

### 6.2.4. Patent requests

- E. Roa, A. Amaya, Clock Generation Using Digital Synthesis, at Superintendencia de Industria y Comercio (Colombia), December 07, 2017.
- A. Amaya, H. Gomez, E. Roa, Method and Apparatus for RAM Memory Protection Against Row-Hammering Attacks, at Superintendencia de Industria y Comercio (Colombia), December 2018.

### 6.2.5. Other publications

- A. Amaya, L. E. Rueda G and E. Roa, "A Multi-Level Power-on Reset for Fine-Grained Power Management," 2018 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Platja d'Aro, 2018, pp. 129-132.
- L. Fernandez, A. Amaya and E.
Roa, "A 0.007mm<sup>2</sup> 50mA Three-Stage Fully-Integrated Capacitor-Less Low-Dropout Regulator," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-4.
- C. Duran, A. Amaya, et al., "A system-on-chip platform for the internet of things featuring a 32-bit RISC-V based microcontroller," 2017 IEEE 8th Latin American Symposium on Circuits & Systems (LASCAS), Bariloche, 2017, pp. 1-4.
- A. Amaya, H. Gomez and E. Roa, "Mitigating Row Hammer attacks based on dummy cells in DRAM," 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, 2017, pp. 442-443.
- H. Gomez, A. Amaya and E. Roa, "DRAM row-hammer attack reduction using dummy cells," 2016 IEEE Nordic Circuits and Systems Conference (NORCAS), Copenhagen, 2016, pp. 1-4.
- A. Amaya, H. Gomez and G. Espinosa, "An area efficient, high speed, fully on-chip low-dropout -LDO- voltage regulator," Ingenieria y Competitividad, Volume 17, No. 1, Pages 153-160, 2015, ISSN: 0123-3033.
- A. Amaya, G. Espinosa and R. Villamizar, "A robust to PVT variations low-voltage low-power current mirror," 2014 IEEE 5th Latin American Symposium on Circuits and Systems, Santiago, 2014, pp. 1-4.

## 6.3. Future work

The publications listed in this chapter are evidence of the contributions of this dissertation to the state-of-the-art techniques for improving the PVT robustness of an SoC. However, some aspects still need improvement in order to extend the techniques to other parts of an SoC. The following paragraphs therefore present some recommendations for future research work in this field.

This dissertation addressed the implementation of low-complexity design techniques for improving the PVT robustness of specific SoC sub-systems, such as high-speed interfaces, data conversion, and supply voltage regulation. For high-speed links, calibration is oriented to reduce offset without increasing hardware complexity and load capacitance.
The digital construction of eye diagrams validates the technique in a prototype taped out in a 130 nm node. Although the emulated channel imposed an attenuation of 26 dB, the circuit only served as a proof-of-concept prototype because the maximum data rate was 800 Mbps. Therefore, the proposed technique has to be validated in a complete state-of-the-art serial link, whose transfer speed reaches up to 20 Gbps for a single lane. In addition, it is crucial to test its interaction with the clock-and-data recovery (CDR) circuit during link training.

DAC calibration was achieved only for the worst-case input code, which corresponds to half of the dynamic range. Choosing this point makes it possible to charge and discharge the largest number of capacitors for an output voltage variation of only one quantization step. However, given the non-linear behavior of some parasitic capacitors, such as the input capacitance of the output buffers, it is essential to calculate the calibration code for a larger set of input words. As a result, calibration words have to be stored in a separate section of the SoC memory. If the SoC has enough available memory, it is possible to calculate a calibration word for each input code, resulting in an error-correction vector of 4 KB.

The interaction of a brown-out detector with an LDO needs further research in order to improve the dynamics of the control loop. The design of that block must take into consideration that the bandwidth of the BOD loop has to be lower (by at least $10\times$) than the bandwidth of the LDO core. As a result, the adaptation of the power MOSFET's width and the amplifier's bias current will not interfere with the LDO's transient response. If these conditions are not met, the power supply will have two different control loops acting at the same time, which can degrade the transient response.

# **Bibliography**

- [1] Synopsys, *The DNA of an Artificial Intelligence SoC*, 2018. [Online].
Available: https://www.synopsys.com/designware-ip/technical-bulletin/the-dna-of-an-ai-soc-dwtb\_q318.html
- [2] K. Iniewski, *VLSI Circuits for Biomedical Applications*. Artech House, 2008.
- [3] P. Gupta, S. Gourishetty, H. Mandadapu, and Z. Abbas, "PVT variations aware robust transistor sizing for power-delay optimal CMOS digital circuit design," in *2019 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2019, pp. 1–5.
- [4] F. G. R. G. da Silva, P. F. Butzen, and C. Meinhardt, "PVT variability analysis of FinFET and CMOS XOR circuits at 16nm," in *2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, 2016, pp. 528–531.
- [5] R. Foundation, *Raspberry Pi 4 Computer Model B Product Brief*, 2019. [Online]. Available: https://static.raspberrypi.org/files/product-briefs/200206+Raspberry+Pi+4+1GB+2GB+4GB+Product+Brief+PRINT.pdf
- [6] Broadcom, *BCM5871X Series Processors*, 2016. [Online]. Available: https://docs.broadcom.com/docs/1211168571391
- [7] F. Maloberti, *Data Converters*. Springer, 2007.
- [8] T. Norimatsu, T. Kawamoto, K. Kogo, N. Kohmu, F. Yuki, N. N. an Takashi Muto, J. Nasu, T. Komori, H. Koba, T. Usugi, T. Hokari, T. Kawamata, Y. Ito, S. Umai, M. Tsuge, T. Y. M. Hasegawa, and K. Higeta, "A 25Gb/s Multistandard Serial Link Transceiver for 50dB-Loss Copper Cable in 28nm CMOS," *International Solid State Circuit Conference ISSCC 2016*, January 2016.
- [9] V. Stojanovic, A. Ho, B. Garlepp, F. Chen, J. Wei, E. Alon, C. Werner, J. Zerbe, and M. A. Horowitz, "Adaptive Equalization and Data Recovery in a Dual-mode (PAM2/4) Serial Link Transceiver," in *2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525)*, June 2004, pp. 348–351.
- [10] B. Garlepp, A. Ho, V. Stojanovic, F. Chen, C. Werner, G. Tsang, T. Thrush, A. Agarwal, and J.
Zerbe, "A 1-10 Gbps PAM2, PAM4, PAM2 partial response Receiver Analog Front-end with Dynamic Sampler Swapping Capability for Backplane Serial Communications," in *Digest of Technical Papers. 2005 Symposium on VLSI Circuits*, June 2005, pp. 376–379.
- [11] P. A. Francese, T. Toifl, P. Buchmann, M. Brändli, C. Menolfi, M. Kossel, T. Morf, L. Kull, and T. M. Andersen, "A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 kppm Tracking Bandwidth," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2490–2502, Nov 2014.
- [12] C.-H. Chan, Y. Zhu, U.-F. Chio, S.-W. Sin, S.-P. U, and R. Martins, "A Reconfigurable low-noise Dynamic Comparator with Offset Calibration in 90nm CMOS," in *Solid State Circuits Conference (A-SSCC), 2011 IEEE Asian*, Nov 2011, pp. 233–236.
- [13] A. Gines, E. Peralias, and A. Rueda, "Background Digital Calibration of Comparator Offsets in Pipeline ADCs," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 23, no. 7, pp. 1345–1349, July 2015.
- [14] M. Miyahara and A. Matsuzawa, "A low-offset Latched Comparator Using Zero-static Power Dynamic Offset Cancellation Technique," in *Solid-State Circuits Conference, 2009. A-SSCC 2009. IEEE Asian*, Nov 2009, pp. 233–236.
- [15] C. Chen, Z. Feng, H. Chen, M. Wang, J. Xu, F. Ye, and J. Ren, "A Low-offset Calibration-free Comparator with a Mismatch-suppressed Dynamic Preamplifier," in *Circuits and Systems (ISCAS), 2014 IEEE International Symposium on*, June 2014, pp. 2361–2364.
- [16] G. R. Gangasani, C. M. Hsu, J. F. Bulzacchelli, S. Rylov, T. Beukema, D. Freitas, W. Kelly, M. Shannon, J. Qi, H. H. Xu, J. Natonio, T. Rasmus, J. R. Guo, M. Wielgos, J. Garlett, M. A. Sorna, and M. Meghelli, "A 16-Gb/s backplane transceiver with 12-tap current integrating DFE and dynamic adaptation of voltage offset and timing drifts in 45-nm SOI CMOS technology," in *2011 IEEE Custom Integrated Circuits Conference (CICC)*, Sept 2011, pp. 1–4.
- [17] B.-H. Park and P. E.
Allen, "A 1 GHz, low-phase-noise CMOS frequency synthesizer with integrated LC VCO for wireless communications," in *Proceedings of the IEEE 1998 Custom Integrated Circuits Conference (Cat. No.98CH36143)*, May 1998, pp. 567–570.
- [18] M. S. Chen, Y. N. Shih, C. L. Lin, H. W. Hung, and J. Lee, "A Fully-Integrated 40-Gb/s Transceiver in 65-nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 3, pp. 627–640, March 2012.
- [19] J. L. Sonntag and J. Stonick, "A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 8, pp. 1867–1875, Aug 2006.
- [20] V. Stojanovic and M. Horowitz, "Modeling and analysis of high-speed links," in *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference*, Sept 2003, pp. 589–594.
- [21] H. Fu, *Equalization for High-Speed Serial Interfaces in Xilinx 7 Series FPGA Transceivers*, Xilinx, 2006.
- [22] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, "Equalizer for high-speed serial links," *International Journal of High Speed Electronics and Systems*, vol. 15, no. 2, pp. 429–458, 2005.
- [23] E. Laskin, "On-chip Self-test Circuit Block for High-Speed Applications," Master's thesis, University of Toronto, Canada, 2006.
- [24] F. Loi, "Adaptive Analog Transversal Equalizers for High-Speed Serial Links," Ph.D. dissertation, University of Pavia, Italy, 2015.
- [25] A. Amaya and E. Roa, "On-fly offset-correction method for high-speed comparators using all-digital phase measurement," in *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2018, pp. 1–4.
- [26] S. Palermo, S. Hoyos, A. Shafik, E. Z. Tabasy, S. Cai, S. Kiran, and K. Lee, "CMOS ADC-based receivers for high-speed electrical and optical links," *IEEE Communications Magazine*, vol. 54, no. 10, pp. 168–175, October 2016.
- [27] S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S.
Palermo, "A 52-Gb/s ADC-based PAM-4 receiver with comparator-assisted 2-bit/stage SAR ADC and partially unrolled DFE in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, pp. 1–13, 2018.
- [28] B. Razavi, "The StrongARM Latch [A Circuit for All Seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12–17, Spring 2015.
- [29] S. Kumaravel, A. Gupta, and B. Venkataramani, "VLSI Implementation of Gm-C Filter using Modified Nauta OTA with Double CMOS Pair," in *2011 IEEE Recent Advances in Intelligent Computational Systems (RAICS)*, Sept 2011, pp. 216–220.
- [30] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with an Analog Phase Interpolator," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 3, pp. 736–743, March 2005.
- [31] "USB 3.1 SuperSpeed Equalizer Design Guidelines," Standard, 2014.
- [32] J. E. Jaussi, G. Balamurugan, D. R. Johnson, B. Casper, A. Martin, J. Kennedy, N. Shanbhag, and R. Mooney, "8-Gb/s source-synchronous I/O link with adaptive receiver equalization, offset cancellation, and clock deskew," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 1, pp. 80–88, 2005.
- [33] S. Shahramian and A. Chan Carusone, "A 0.41 pJ/bit 10 Gb/s hybrid 2 IIR and 1 discrete-time DFE tap in 28 nm-LP CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 7, pp. 1722–1735, 2015.
- [34] W. Li, F. Li, J. Liu, H. Li, and Z. Wang, "A 13-bit 160MS/s pipelined subranging-SAR ADC with low-offset dynamic comparator," in *2017 IEEE Asian Solid-State Circuits Conference (A-SSCC)*, 2017, pp. 225–228.
- [35] J. Um, J. Kim, J. Sim, and H. Park, "Digital-domain calibration of split-capacitor DAC with no extra calibration DAC for a differential-type SAR ADC," in *IEEE Asian Solid-State Circuits Conference 2011*, Nov 2011, pp. 77–80.
- [36] Y. Chen, X. Zhu, H. Tamura, M. Kibune, Y. Tomita, T. Hamada, M. Yoshioka, K. Ishikawa, T. Takayama, J. Ogawa, S. Tsukamoto, and T.
Kuroda, "Split capacitor DAC mismatch calibration in successive approximation ADC," in *2009 IEEE Custom Integrated Circuits Conference*, Sept 2009, pp. 279–282.
- [37] W. Hsiao, Y. He, M. P. Lin, R. Chang, and S. Lee, "Automatic common-centroid layout generation for binary-weighted capacitors in charge-scaling DAC," in *2012 International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD)*, Sept 2012, pp. 173–176.
- [38] Y. Li, Z. Zhang, D. Chua, and Y. Lian, "Placement for Binary-Weighted Capacitive Array in SAR ADC Using Multiple Weighting Methods," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 33, no. 9, pp. 1277–1287, Sept 2014.
- [39] C. Lin, J. Lin, Y. Chiu, C. Huang, and S. Chang, "Common-centroid capacitor placement considering systematic and random mismatches in analog integrated circuits," in *2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC)*, June 2011, pp. 528–533.
- [40] C. Duran, M. Wachs, L. E. Rueda G., A. Huntington, J. Ardila, J. Kang, A. Amaya, H. Gomez, J. Romero, L. Fernandez, F. Flechas, R. Torres, J. Moya, W. Ramirez, J. Arenas, J. Gomez, H. Morales, C. Rojas, A. Mantilla, E. Roa, and K. Asanovic, "An energy-efficient RISC-V RV32IMAC microcontroller for periodical-driven sensing applications," in *2020 IEEE Custom Integrated Circuits Conference (CICC)*, 2020, pp. 1–4.
- [41] A. Amaya, F. Castro, and E. Roa, "Improving low-dropout regulator frequency stability by exploiting the equivalent series resistor and featuring an adaptive biasing strategy," in *2019 IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2019, pp. 1–5.
- [42] J. Fallin, "ESR, Stability, and the LDO Regulator," Texas Instruments, Report SLVA115, 2002.
- [43] R. Magod, B. Bakkaloglu, and S.
Manandhar, "A 1.24μA Quiescent Current NMOS Low Dropout Regulator With Integrated Low-Power Oscillator-Driven Charge-Pump and Switched-Capacitor Pole Tracking Compensation," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 8, pp. 2356–2367, Aug 2018.
- [44] X. L. Tan, S. S. Chong, P. K. Chan, and U. Dasgupta, "A LDO Regulator With Weighted Current Feedback Technique for 0.47 nF-10 nF Capacitive Load," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2658–2672, Nov 2014.
- [45] T. Y. Man, K. N. Leung, C. Y. Leung, P. K. T. Mok, and M. Chan, "Development of Single-Transistor-Control LDO Based on Flipped Voltage Follower for SoC," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 5, pp. 1392–1401, June 2008.
- [46] Y. Lam and W. Ki, "A 0.9V 0.35μm Adaptively Biased CMOS LDO Regulator with Fast Transient Response," in *2008 IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb 2008, pp. 442–626.
- [47] A. Amaya, R. Villamizar, and E. Roa, "Method and circuit for compensating offset voltage of electronic circuits," Colombian Patent WO2018002843A1, 2016.
- [48] A. Amaya, H. Gomez, and E. Roa, "A digital offset correction method for high speed analog front-ends," in *2016 29th Symposium on Integrated Circuits and Systems Design (SBCCI)*, Aug 2016, pp. 1–4.
- [49] A. Amaya, R. Villamizar, and E. Roa, "An Offset Reduction Technique for Dynamic Voltage Comparators," in *2016 12th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME)*, June 2016, pp. 1–4.
- [50] A. Amaya and E. Roa, "On-Fly Offset-Correction Method for High-Speed Comparators using All-Digital Phase Measurement," in *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2018.
- [51] A. Amaya, J. Ardila, and E. Roa, "A digital offset reduction method for dynamic comparators based on phase measurement," in *2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, July 2017, pp. 661–664.
- [52] D. H. Kim, P. J.
Nair, and M. K. Qureshi, "Architectural Support for Mitigating Row Hammering in DRAM Memories," *IEEE Computer Architecture Letters*, vol. 14, no. 1, pp. 9–12, Jan 2015.
- [53] H. Gomez, A. Amaya, and E. Roa, "DRAM row-hammer attack reduction using dummy cells," in *2016 IEEE Nordic Circuits and Systems Conference (NORCAS)*, Nov 2016, pp. 1–4.
- [54] K. C. H. et al., "A High-Performance, High-Density 28nm eDRAM Technology with High-K Metal-Gate," in *2011 IEEE International Electron Devices Meeting (IEDM)*, Dec 2011, pp. 24.7.1–24.7.4.
- [55] S. K. H. Fung et al., "65nm SOI CMOS Technology for High Performance Microprocessor Application," in *2006 International Symposium on VLSI Technology, Systems, and Applications*, April 2006, pp. 1–2.

## A. Process-Compatible DRAM Row-Hammering Mitigation Technique

#### A.1. Introduction

Awareness of DRAM row-hammering attacks has grown because of their widespread role in system failures and security issues. A row-hammering attack (RHA) corrupts information stored in cells adjacent to a specific memory address. During such an attack, a sequence of instructions forces the system to read a particular row or address continuously. The attack takes advantage of the capacitive coupling between rows to increase leakage and thus charge or discharge the retention capacitors. An RHA decreases the retention time of memory cells, which reduces the effectiveness of refresh operations and enables the attacker to flip stored data.

The literature reports several hardware and software approaches to mitigate the effects of the row-hammering bug [52,53]. Although some works report mitigation approaches at the hardware level [52], the bug is still present in modern chips. Other methods rely on modified DRAM cells and therefore offer low compatibility with current DRAM technologies [53]. In contrast, the two mitigation strategies outlined in this letter share enhanced compatibility.
The first is based on a pseudo-parallel connection that builds monitoring cells from standard DRAM cells only. The second takes advantage of the intrinsic weak cells present in a DRAM process after fabrication. Simulations in a 65 nm CMOS technology validate the pseudo-parallel approach.

#### A.2. Pseudo-parallel Memory Cell Emulation

An RHA may be monitored using DRAM cells with modified leakage susceptibility. Gomez et al. [53] proposed to include an altered cell employing a wider transistor and a reduced capacitance, as Fig. 52 shows. The modified cell exhibits a larger leakage current, resulting in an accelerated discharge during an RHA. A system may recognize an attack by checking the information stored in the modified cells. In normal operation, the information remains fixed. When an attack is carried out, data in the modified cells are corrupted faster than data in regular cells, triggering an attack alert. Although the results in [53] support good performance, wider cells incur extra fabrication costs, since a significant number of fabrication masks must be updated to include a non-regular cell per row.

In contrast to [53], this work proposes a monitoring cell that uses only standard DRAM cells and preserves compatibility with current DRAM technologies. The main idea is to implement cells with a higher retention time than regular ones, but without altering the dimensions of the access transistor and retention capacitor. The proposed cell uses two regular memory units to implement a pseudo-parallel connection, as Fig. 53 shows. If two DRAM cells controlled by the same word line (WLn) share their bit line (BLn), the information stored in both cells is the same, prompting a pseudo-parallel behavior. The pseudo-parallel connection refers to both cells sharing all their nodes except the source terminals.
Those pseudo-parallel cells may act as attack indicators since their information is corrupted later than data stored in regular cells during an attack. A monitoring scheme using the proposed pseudo-parallel cell consists of three conventional DRAM cells per row. Two cells compose the emulated pseudo-parallel cell, and one more cell works as a reference for the alert mechanism. If a pseudo-parallel cell and a reference standard cell are always written with the same information, their data must be equal in regular operation. During a row-hammering attack, a difference may be detected due to the enhanced retention time of the proposed cell. A simple logic comparison can identify this difference to trigger an error correction algorithm.

Figure 52. Cell with increased leakage susceptibility using non-standard sized cell.

Figure 53. Pseudo-parallel connection between two DRAM cells. Both cells share their bit line and word line, emulating a cell of twice the size of a standard DRAM cell.

The proposed system offers an attack indicator without modifying conventional DRAM cells, enhancing compatibility as the pseudo-parallel implementation requires only slightly modified fabrication masks. Furthermore, a reduction in capacitor size, as required in [53], might violate the minimum process dimensions. The proposed alternative avoids these issues since the dimensions of the access transistor and the retention capacitor are not altered.

Figure 54. DRAM array including the proposed monitoring system. Pseudo-parallel cells consist of one modified cell and one dummy cell, exploiting the unusable bit of dummy cells.

The proposed approach also enables a straightforward implementation within a DRAM array. Fig. 54 presents a DRAM array highlighting conventional DRAM cells and the implementation of pseudo-parallel cells with a simple modification of conventional cells.
By mirroring a standard cell, we enable the construction of a pseudo-parallel cell using the bit line associated with the dummy cells commonly employed for layout matching. One regular bit line might be enough to obtain a reference value for monitoring.

#### A.3. On Deployability of a Weak-Cells-Based Monitoring System

The compatibility issues of the monitoring system proposed in [53] may be solved using the intrinsic weak cells of a DRAM process. In conventional DRAM technologies, process variations cause a significant decrease in retention time in some cells. This reduction in retention time might give those cells a leakage susceptibility similar to that of the weak cells described in [53]. Intrinsic weak cells may therefore carry out the same function as the modified weak cells. A monitoring system based on intrinsic weak cells offers enhanced compatibility with current DRAM arrays, given that the system does not require modifications of conventional DRAM cells.

The monitoring system may be set up as follows. First, conventional DRAM fault tests mark the weakest cells. Then, a row-hammering test finds which of those weakest memory units still have a high susceptibility to leakage. Instead of being labeled as unusable, the weakest cells store fixed data. These data should remain unchanged in regular operation; any alteration in the data indicates an attack. When an attack is flagged, the memory controller may trigger a refresh operation to avoid possible bit-flipping in the conventional cells.

#### A.4. Simulation Results

A DRAM array including one pseudo-parallel cell per row was implemented in a 65 nm standard CMOS process. Results in this technology node may be extended to a state-of-the-art DRAM-dedicated process, taking into account that memory-dedicated technologies have devices with reduced current capability and performance in comparison to conventional CMOS nodes [54,55].
We performed a transient analysis, emulating a row-hammering attack in a 64x64 DRAM array for validation purposes. We applied row-hammering access to the second row of the array, while the other 63 rows were disabled. The attacker row (the second row) was continuously read at a frequency of 1 GHz, and the voltage on the cell capacitors within adjacent rows was analyzed. The parasitic coupling was also extracted from the layout implementation.

Figure 55. Standard and monitoring cell voltage discharge including PVT variations: a) Typical process corner and 50°C, b) Fast process corner and 125°C, c) Slow process corner and -40°C

Fig. 55 shows the results over process and temperature variations. All cells in the array (including the pseudo-parallel cells) store a high logic value as the initial condition. A reference threshold is set to flag when data may start to be deliberately altered. This threshold is set to $V_{\rm DD}/2 = 0.5$ V since the bit lines are pre-charged at this value for proper operation of the sensing amplifiers. The induced row-hammering leakage accelerates the discharge of the cell capacitor for standard and pseudo-parallel cells. For the cases in Fig. 55, the blue line represents the discharge in a pseudo-parallel cell, and the red line represents the discharge in a standard cell. Results in Fig. 55a) correspond to typical operating conditions (typical process corner, a temperature of $50^{\circ}$C, and $V_{\rm DD} = 1$ V). Fig. 55b) corresponds to a corner case associated with the slow process and a temperature of $-40^{\circ}$C, and Fig. 55c) describes operation at the fast process corner with a temperature of $120^{\circ}$C.

When a row-hammering attack begins, the coupling signal from the attacker row induces a voltage in the victim rows. The $V_{\rm GS}$ of the victim cells increases continuously during the attack, which raises the leakage current and starts the discharge of the retention capacitors.
Both standard and monitoring cells lose their charge at a similar rate until the main transistor current dominates the discharge over the leakage current. The pseudo-parallel cell experiences a reduced discharge rate due to the double capacitance emulation. As a result, the standard cell discharges faster, crossing the threshold before the pseudo-parallel cell. For the typical corner in Fig. 55a), the pseudo-parallel cell enables a discharge-time difference of $4\,\mu$s with a voltage difference of 150 mV. The results for the extreme corners are consistent with the typical case, showing a discharge-time difference of $3\,\mu$s in the worst case. The resulting time difference ensures that a memory controller may detect that the data values of both cells differ, thus triggering a correction algorithm to re-establish the original data. In addition, Fig. 55 shows that the voltage difference is always larger than 100 mV, giving enough margin to prevent a pseudo-parallel cell from flipping before a standard cell does.

The proposed method not only enables compatibility with current DRAM processes but also offers a low area overhead if the pseudo-parallel cell is constructed with the dummy cells used for layout matching. The whole monitoring system needs an additional bit line, which results in one extra standard memory cell per row. For instance, this additional cell represents an area increase of less than 0.2% in a conventional DDR4 memory that uses rows of 512 bytes (4096 cells).

#### A.5. Conclusions

The literature provides some row-hammering mitigation approaches. However, row-hammering bugs are still present in modern DRAM chips. This letter offers a monitoring system using a proposed pseudo-parallel cell as an attack indicator. The pseudo-parallel cell emulates a double-sized capacitor with an enhanced retention time, maintaining the information for a longer time than a standard cell in the presence of a row-hammering attack.
This feature enables a simple logic comparison to identify the attack, provided that the proposed cell and a standard cell always store the same information. The proposed system is compatible with any DRAM process and adds minimal overhead. Simulations in a 65 nm standard technology validate the concept over process and temperature variations. Lastly, we also discussed an alternative monitoring system using weak cells.
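
The comparison-based detection summarized above can be illustrated with a short behavioral sketch. It is only an illustration under stated assumptions, not the circuit or memory controller used in this work: the names `RowSample`, `attack_suspected`, `rows_to_refresh`, and `area_overhead_percent` are hypothetical, and the sketch assumes that the controller can read, for every row, one reference standard cell and one monitoring (pseudo-parallel) cell that were written with the same value. The last function simply reproduces the area-overhead arithmetic of Section A.4 for one extra cell per row.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RowSample:
    """Bits read back from one row (hypothetical controller view)."""
    row: int
    reference_bit: int  # standard cell, expected to flip first under attack
    monitor_bit: int    # pseudo-parallel cell, retains its data longer


def attack_suspected(sample: RowSample) -> bool:
    """Both cells are written with the same value, so any mismatch is a flag."""
    return sample.reference_bit != sample.monitor_bit


def rows_to_refresh(samples: Sequence[RowSample]) -> List[int]:
    """Return the rows whose reference and monitoring cells disagree."""
    return [s.row for s in samples if attack_suspected(s)]


def area_overhead_percent(cells_per_row: int, extra_cells_per_row: int = 1) -> float:
    """Relative area increase, in percent, from extra standard cells per row."""
    if cells_per_row <= 0:
        raise ValueError("cells_per_row must be positive")
    return 100.0 * extra_cells_per_row / cells_per_row


if __name__ == "__main__":
    # Row 1 is a victim: its standard cell lost the stored '1' first.
    samples = [RowSample(0, 1, 1), RowSample(1, 0, 1), RowSample(2, 1, 1)]
    print("rows flagged:", rows_to_refresh(samples))
    print("overhead for 4096-cell rows (%):", area_overhead_percent(4096))
```

For a 4096-cell row, the overhead evaluates to 100/4096, about 0.024%, which is consistent with the bound of less than 0.2% quoted in Section A.4. The same comparison would apply to the weak-cell alternative of Section A.3 if the known fixed data are used as the reference bit.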