# CLOCK AND DATA RECOVERY TECHNIQUES FOR INTEGRATED HIGH SPEED INTERFACES

**JAVIER FERNEY ARDILA OCHOA**

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor in Engineering

Advisor: ÉLKIM FELIPE ROA FUENTES, INGENIERO ELECTRÓNICO, PhD.

UNIVERSIDAD INDUSTRIAL DE SANTANDER
FACULTAD DE INGENIERÍAS FISICOMECÁNICAS
ESCUELA DE INGENIERÍA ELÉCTRICA, ELECTRÓNICA Y DE TELECOMUNICACIONES
BUCARAMANGA
2021

#### **ACKNOWLEDGEMENTS**

It is incredible how many people have enriched and encouraged me throughout this journey. It is astonishing how fast time goes by and how many people have influenced me during my PhD. The pages of this book would not be enough to list each of them.

First, I want to thank Elkim for agreeing to be my advisor. I could not have finished my PhD without your supervision and guidance. I admire your vision and your courage to take on big challenges and achieve them.

It was a great pleasure to work in the Onchip group. I would like to thank my old friends and colleagues Andrés Amaya, Luis Rueda, and Héctor Gómez for the sharp technical discussions and ideas. I also thank them and Camilo Rojas for the fun conversations during our coffee breaks. Thanks to Néstor Cuevas, Hanssel Morales, Rolando Torres, Julián Arenas, Juan Moya, Ckristian Duran, Luisa Dovale, Sergio Leal, and Alex Mantilla, whose contributions were a valuable help. I would like to thank all the other current and former staff in the group for sharing their thoughts, discussions, and care. Thank you very much.

I also want to thank the great AMSIP design team at NXP Semiconductors in the Netherlands.
I am really proud to have had the opportunity to expand my expertise and technical skills with such an excellent team: Jos, Dobson, Rene, Maoqiang, and the other members. You are not only excellent engineers but also amiable and gentle people. Moreover, I want to thank Oscar for the nice talks and the time we spent in Eindhoven.

Very special gratitude goes to Professors Jaime Barrero and Carlos Fajardo for the nice conversations we have shared and the advice you gave me on life and professional matters.

Beyond professional and research life, plenty of friends have enriched my daily life: Diana, Lache, Juanjo, Jefferson, Convers, Alexandra, Diego Velandia, and all the guys from "Arcos", who have stayed in touch since childhood.

Finally and most importantly, thanks to my family. Many thanks to my grandma and grandpa, Paulina and Pedro, to my father Javier and my mother Maricela, and to my uncles Holger, Nelson, and Néstor. Special thanks to my uncle Nelson Ochoa. Your unconditional support helped me countless times throughout this process and journey. You are always supporting and backing me in everything I do. The unselfish love you have given me is priceless.

## **TABLE OF CONTENTS**
- INTRODUCTION
- 1 PROJECT OVERVIEW
  - 1.1 CLOCK AND DATA RECOVERY BACKGROUND
    - 1.1.1 High-Speed Interfaces in Communication Systems
    - 1.1.2 Clock and Data Recovery Circuits
    - 1.1.3 State of the Art
  - 1.2 TECHNICAL CHALLENGES IN HIGH-SPEED CDRS
  - 1.3 DISSERTATION AIM AND SCOPE
    - 1.3.1 Scope of this Dissertation
  - 1.4 ORIGINAL CONTRIBUTIONS
    - 1.4.1 Channel Losses Impact on Digital CDRs
    - 1.4.2 Stochastic Resonance in CDRs
    - 1.4.3 Design Methodology
    - 1.4.4 Cross-Correlation Based Adaptive Loop Gain - XCALG
    - 1.4.5 Nonlinear Laplacian Spectral Analysis
    - 1.4.6 Additional Contributions
  - 1.5 DISSERTATION OUTLINE
- 2 JITTER AND CHANNEL LOSS IN DIGITAL CDR
  - 2.1 INTRODUCTION
  - 2.2 JITTER NOISE AND EXTRACTION PROCEDURE
  - 2.3 IMPACT OF CHANNEL LOSS ON $K_{PD}$
    - 2.3.1 Channel Loss with Gaussian Noise
    - 2.3.2 Channel Loss with Uniform Noise
    - 2.3.3 Channel Loss with Sinusoidal Noise
    - 2.3.4 Impact on the CDR Dynamics
    - 2.3.5 Channel Loss Probability Density Function
  - 2.4 SUMMARY
- 3 STOCHASTIC RESONANCE IN BANG-BANG PHASE DETECTOR BASED CDR
  - 3.1 INTRODUCTION
  - 3.2 STOCHASTIC RESONANCE
  - 3.3 MATHEMATICAL APPROACH
  - 3.4 SIMULATION RESULTS
    - 3.4.1 Impact on CDR frequency response
    - 3.4.2 Impact on CDR jitter tolerance function
  - 3.5 SUMMARY
- 4 MODELING AND DESIGN METHODOLOGY
  - 4.1 DPLL-CDR MODELING
    - 4.1.1 Linear Frequency Model: the z-model
    - 4.1.2 Time-Step Model: the tstep-model
    - 4.1.3 Verilog Model: the vlog-model
  - 4.2 MODELING CONSIDERATIONS IN MULTI-RATE DPLL-CDR
    - 4.2.1 The Accumulator as Basic Unit
    - 4.2.2 Modeling Update
  - 4.3 NON-LINEAR CONSIDERATIONS
    - 4.3.1 BBPD Gain
    - 4.3.2 MJV nonlinearity
    - 4.3.3 Slew Rate
  - 4.4 MODEL SIMULATIONS
  - 4.5 DESIGN METHODOLOGY
    - 4.5.1 Design Space Generation
    - 4.5.2 Mapping Equations
    - 4.5.3 Including the Noise Profiles
    - 4.5.4 JTF and JTOL Extraction
    - 4.5.5 Filtering Chain
    - 4.5.6 Large Signal Behavior
    - 4.5.7 Go to Verilog
  - 4.6 SUMMARY AND DISCUSSION
- 5 CROSS-CORRELATION BASED LOOP GAIN ADAPTATION FOR BANG-BANG CDR
  - 5.1 INTRODUCTION
  - 5.2 SPECTRAL ANALYSIS OF AUTOCORRELATION AND CROSS-CORRELATION FUNCTIONS
    - 5.2.1 Power Spectral Density and Autocorrelation
    - 5.2.2 Cross-Power Spectral Density and Cross-correlation
    - 5.2.3 Comparison and Discussion
  - 5.3 CROSS-CORRELATION PROPERTIES IN CDRS
    - 5.3.1 Observability Enhancement
    - 5.3.2 Filtering Properties
    - 5.3.3 PI Jitter Impact
  - 5.4 PROPOSED LOOP GAIN ADAPTATION
    - 5.4.1 Adaptation Procedure
    - 5.4.2 Implementation Diagram
  - 5.5 SIMULATIONS AND RESULTS
    - 5.5.1 Behavioral Simulations
    - 5.5.2 Cross-correlation Hardware Implementation
    - 5.5.3 Window Size and Area Penalty
  - 5.6 CONCLUSION
- 6 CONTRIBUTIONS AND CONCLUSIONS
  - 6.1 CONTRIBUTIONS SUMMARY
  - 6.2 CONCLUSIONS
  - 6.3 SUGGESTIONS FOR FUTURE RESEARCH
  - 6.4 PUBLICATIONS
- BIBLIOGRAPHY
- APPENDICES

## LIST OF FIGURES

- Figure 1. Communication interface.
- Figure 2. CDR clocking scheme.
- Figure 3. Typical classification for CDR architectures.
- Figure 4. Typical CDR structure where Din, Rdata, and clk are the input data, the recovery data, and the recovery clock signal respectively.
- Figure 5. Number of presented CDR papers per year in relevant conferences and published in one journal.
- Figure 6. Papers published in ISSCC and JSSC classified into analog, digital or hybrid.
- Figure 7. The Outline of the Dissertation.
- Figure 8. Traditionally discrete linear model of a CDR system.
- Figure 9. $K_{PD}$ gain vs noise for: a) gaussian, b) uniform and sinusoidal case.
- Figure 10. General view of the implementation.
- Figure 11. $K_{PD}$ dependence on $f_c/Drate$ taking into account gaussian jitter noise and channel loss.
- Figure 12. $K_{PD}$ dependence on $f_c/Drate$ taking into account uniform jitter noise and channel loss.
- Figure 13. $K_{PD}$ dependence on $f_c/Drate$ taking into account sinusoidal jitter noise and channel loss.
- Figure 14. Total convolution of PDFs fixing $Sj_{pp}=0.4$ and $Rj=0.02$. Upper-right plot indicates $K_{PD}$ values as function of $Dj_{pp}$.
- Figure 15. Impact of channel loss reflected on a) JTF and b) JTOL.
- Figure 16. Time simulations results vs convolution approach for gaussian noise at low $f_c/Drate$ levels. The $Conv$ graph in the right corresponds to the convolution of gaussian PDF and the extracted PDF at $f_c/Drate \approx 0.23$.
- Figure 17. Discrete-time linear model for typical DPLL-based CDR.
- Figure 18. System performance vs Noise level.
- Figure 19. Typical DPLL-based CDR.
- Figure 20. PDFs for uniform and sinusoidal jitter.
- Figure 21. Mathematical model.
- Figure 22. Mathematical model vs Simulations.
- Figure 23. JTF frequency response.
- Figure 24. Jitter Tolerance Response.
- Figure 25. CDR discrete linear frequency model.
- Figure 26. CDR time domain model.
- Figure 27. RTL description model for the CDR.
- Figure 28. Continuous to discrete time transformation for an integrator (a), and the filter implementation (b).
- Figure 29. Equivalent accumulator gain adjustment due to decimation.
- Figure 30. Majority voting policies and noise effect.
- Figure 31. Frequency (a), and time response (b) for the CDR described by the parameters in Table 3.
- Figure 32. Procedure for frequency and time models comparison.
- Figure 33. Worst case comparison for low frequency (a), peak frequency (b), and high frequency (c) conditions.
- Figure 34. Statistical error between the time and the frequency domain CDR model.
- Figure 35. Implementation of a quad-rate CDR architecture.
- Figure 36. Design methodology process.
- Figure 37. Summary of relevant recent reported works in loop gain adaptation using correlation functions. Left side corresponds to the simplified diagram schemes, and right side illustrates main features. (a) Adaptation using an alike objective function $F_{OBJ}$ based on autocorrelation at BBPD output, (b) adaptation methods using extra filtering at the BBPD output and avoiding some a priori assumptions, and (c) proposed XCALG method.
- Figure 38. Linear z-model of a BB-CDR.
- Figure 39. Magnitude of $H_{ER}(f)H_{CK}^*(f)$ and the power spectrum of $H_{CK}(f)$ and $H_{ER}(f)$.
- Figure 40. Time-step model used for the BB-CDR.
- Figure 41. Observability comparison between $R_X(n)$ and $R_{XY}(n)$ for: a) $K_G$ set to 2.5, and b) $K_G$ set to 1.0.
- Figure 42. Observability improvement on $R_X(n)$ when low-pass filtering is added at BBPD output.
- Figure 43. Case 1. Filtering property comparison for three different frequencies between autocorrelation (red) and cross-correlation (blue) for a well damped condition. PM = 66°.
- Figure 44. Case 2. Filtering property comparison for three different frequencies between autocorrelation (red) and cross-correlation (blue) for an underdamped condition with PM = 22°.
- Figure 45. Bandwidth-limited jitter noise profiles for $J_{PI}$ used to compare the response of $R_X(n)$ and $R_{XY}(n)$. a) Time domain, and b) frequency domain.
- Figure 46. Correlation functions at the BBPD output using the time-step model for different jitter profiles in $J_{PI}$: a) Autocorrelation, b) cross-correlation.
- Figure 47. Definitions for $m_0$ and $m_{peak}$ in the cross-correlation function.
- Figure 48. Ratios $m_{peak}/m_0$ and $n_{peak}/n_0$ as functions of $K_G$ gain.
- Figure 49. Ratios $m_{peak}/m_0$ and $n_{peak}/n_0$ as functions of latency $(N_L)$.
- Figure 50. Phase margin and $m_{peak}/m_0$ relation across variations presented in Figs. 48 and 49.
- Figure 51. Proposed XCALG system diagram.
- Figure 52. Evolution of the adapted $K_G$ for only Gaussian $\Psi_{IN}$.
- Figure 53. Gaussian $\Psi_{in}$ and jitter profiles injected in $J_{PI}$.
- Figure 54. Extracted optimal $K_G$ procedure. (a) Minimum JTOL value obtained by manual seeking and via adaptation technique. (b) Example of how each point of manual extracted $K_G$ is obtained from JTOL curve.
- Figure 55. Extracted optimal $K_G$ procedure. Minimum JTOL value obtained by manual seeking and via adaptation technique for $m_{peak}/m_0$ assuming 1.2, 1.5, and 1.8 values.
- Figure 56. RTL implementation for the proposed cross-correlation estimator circuit.
- Figure 57. Cross-correlation post-synthesis results vs Matlab xcorr.
- Figure 58. Matlab 8192 samples per window vs cross-correlation circuit.
- Figure 59. Layout implementation for the cross-correlation estimator circuit. a) Layout, b) area teardown.
- Figure 60. Chip layout of the High-Speed Serial Interface designed in this work. Size: 1.66 mm x 1.66 mm.
- Figure 61. High-speed interface diagram system and power domains.
- Figure 62. Simplified schematic of the clock and data recovery.
- Figure 63. Sampler circuit schematic.
- Figure 64. Amplifier and comparator schematic.
- Figure 65. StrongArm cell circuit.
- Figure 66. Aligners circuits.
- Figure 67. Deserializer 1 to 2 unit cell.
- Figure 68. D-Flipflop circuit schematic.
- Figure 69. Phase interpolator circuit.
- Figure 70. CML to CMOS cell.
- Figure 71. NLSA reconstruction of a jitter-free signal. Taken from the complementary information presented in <sup>1</sup>.
- Figure 72. NLSA reconstruction of a signal corrupted by Gaussian jitter with $\sigma = 50$ fs. Taken from the complementary information presented in <sup>2</sup>.
- Figure 73. a) Proposed CDR scheme using NLSA processing, b) Phase signals (in UI) in the system: data jitter (blue), recovered clock phase (red), jitter error signal (green), and ideal recovery clock phase (black).
- Figure 74. Differential FGNVR cell concept.
- Figure 75. Bitcell schematic of the FGNVR and operation during reading, programming and stand-by/locking process.
- Figure 76. Detailed schematic and operation of the sense amplifier.
- Figure 77. Block diagram of the FGPUF proposed macro.
- Figure 78. FGPUF testing: a) Micrograph of the FGPUF on the test chip and detailed layout; b) Testboard for the FGPUF macro.
- Figure 79. Raw unstable bits: a) Measured raw unstable bit percentage across 8 chips at nominal conditions; b) Measured raw unstable bit percentage versus $V_{DD}$ variations.
- Figure 80. Normalized Hamming Weight of PUF keys across 8 different chips.
- Figure 81. Measured raw unstable bit percentage after the application of TMV in a single chip.
- Figure 82. Bit count decision for every bit in 1000 readings of a 32 bits cell.
- Figure 83. Final PUFs keys bitmap.

<sup>1, 2</sup> R. FUNG et al. "Dynamics from noisy data with extreme timing uncertainty". In: *Nature* (2016), pp. 471–475. DOI: 10.1038/nature17627.

## LIST OF TABLES

- Table 1. Conditions tested in JTOL performance.
- Table 2. BBPD gain expressions.
- Table 3. Parameters for model comparison.
- Table 4. CDR parameters.
- Table 5. Model parameters used for the linear z-model of Fig. 2.
- Table 6. Test Conditions for Adaptation.
- Table 7. Test Conditions for Adaptation.
- Table 8. Synthesized estimated area among different window sizes for the cross-correlation estimator circuit.
- Table 9. MOSFET size in each type of cell.
- Table 10. Measured Performance Comparison.

## **LIST OF APPENDICES**

- APPENDIX A. ANALOG FRAMEWORK AND SATELLITE PROJECTS
- APPENDIX B. NONLINEAR LAPLACIAN SPECTRAL ANALYSIS - NLSA
- APPENDIX C. NVRAM-BASED STABLE PHYSICALLY UNCLONABLE FUNCTION

#### **RESUMEN**

**TÍTULO:** TÉCNICAS DE RECUPERACIÓN DE DATOS Y RELOJ PARA INTERFACES INTEGRADAS DE ALTA VELOCIDAD \*

**AUTOR:** JAVIER FERNEY ARDILA OCHOA \*\*

**PALABRAS CLAVE:** RELOJ Y RECUPERACIÓN DE DATOS, CDR, SERDES, INTERFAZ SERIAL, ENLACE DE ALTA VELOCIDAD, AUTOCORRELACIÓN, CORRELACIÓN CRUZADA, XCALG.
#### **DESCRIPCIÓN:** La demanda de ancho de banda y el aumento gradual de la densidad de pines en los sistemas electrónicos han impulsado las interconexiones eléctricas y ópticas hacia una mayor tasa de transferencia. Desde dispositivos electrónicos portátiles hasta supercomputadoras, el ancho de banda de comunicación de datos por cable también debe crecer para evitar limitar la escala de rendimiento de estos sistemas. En este trabajo se explora el impacto y modelado de las pérdidas de canal en los sistemas de comunicación serial de alta velocidad, específicamente en los circuitos de recuperación de reloj y datos (CDR). Se presenta y se define una metodología de diseño para los circuitos CDR dentro de las interfaces de comunicación de alta velocidad. Además, se propone el método XCALG como alternativa para la adaptación de la ganancia de lazo en estos sistemas CDR. El principio básico es el uso de la función de correlación cruzada. Las propiedades de filtrado de la densidad espectral de potencia cruzada permiten la adaptación mientras mantienen un margen de fase apropiado en el sistema. Las principales ventajas y limitaciones de esta técnica sobre las tradicionales que utilizan autocorrelación son discutidas. Lo anterior es implementado mediante la fabricación de un circuito integrado en una tecnología CMOS de 0.18um. <sup>\*</sup> Tesis de Doctorado Facultad de Ingenierías Físico-Mecánicas. Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones. Director: Elkim Felipe Roa Fuentes. PhD. #### **ABSTRACT** TITLE: CLOCK AND DATA RECOVERY TECHNIQUES FOR INTEGRATED HIGH SPEED INTERFACES \* **AUTHOR:** JAVIER FERNEY ARDILA OCHOA \*\* **KEYWORDS:** WIRELINE, CLOCK AND DATA RECOVERY, CDR, SERDES, SERIAL INTERFACE, HIGH-SPEED LINK, AUTOCORRELATION, CROSS-CORRELATION, XCALG. 
#### **DESCRIPTION:**

The demand for bandwidth and the gradual increase in pin density in electronic systems have driven electrical and optical interconnections towards higher transfer rates. From handheld electronic devices to supercomputers, wireline data communication bandwidth must also grow to avoid limiting the performance scaling of these systems. This work explores the impact and modeling of channel losses in high-speed serial communication systems, specifically in clock and data recovery (CDR) circuits. A design methodology for CDR circuits within high-speed communication interfaces is presented and defined. Furthermore, the XCALG method is proposed as an alternative for the adaptation of the loop gain in these CDR systems. The basic principle is the use of the cross-correlation function. The filtering properties of the cross-power spectral density allow adaptation while maintaining an appropriate phase margin in the system. The main advantages and limitations of this technique over the traditional ones that use autocorrelation are discussed. The above is implemented by manufacturing an integrated circuit in a 0.18 µm CMOS technology.

<sup>\*</sup> PhD Thesis. Facultad de Ingenierías Físico-Mecánicas. Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones. Advisor: Elkim Felipe Roa Fuentes, PhD.

#### INTRODUCTION

The constant growth of daily data consumption is becoming the norm. As an example, it is estimated that there were 3.9 billion internet users worldwide in 2018, and that this number will be around 5.3 billion by 2023 <sup>1</sup>. The Zettabyte era has thus started, and by 2022 global traffic will reach an average run rate of 4.8 zettabytes ($1\,\mathrm{ZB} = 10^{12}\,\mathrm{GB}$) per year.
This amount of data is (and will be) in constant motion from one place to another at several levels of abstraction: internet connections through modems, communication between devices over coaxial cable or optical fiber, and the exchange of data between chips on the same board or even between circuits inside the same chip. In this regard, wireline transceivers that push the limits of data rate, energy efficiency, and reliability are critical. In this context, clock and data recovery (CDR) circuits are essential in many modern transceiver architectures because they must recover the data and timing information at the receiver end while combining high performance, low cost, low power, and small area.

<sup>1</sup> T. BARNETT J. et al. "Cisco Visual Networking Index (VNI): Complete Forecast Updated, 2017-2022". In: *APJC Cisco Knowledge Network (CKN)* (2018).

#### 1. PROJECT OVERVIEW

This chapter summarizes the background of this work and presents the technical challenges, as well as the contributions, aim, scope, and outline of this dissertation.

#### 1.1. CLOCK AND DATA RECOVERY BACKGROUND

**1.1.1. High-Speed Interfaces in Communication Systems** Communication interfaces are commonly composed of a transmitter (TX), a channel, and a receiver (RX), as the simplified diagram in Fig. 1 illustrates. The TX adapts the data signal to be sent through the channel. On its way to the receiver, the data undergoes attenuation, inter-symbol interference (ISI), and delay due to the frequency response of the channel. It can also experience crosstalk interference and electronic noise disturbances, both in amplitude and in time (jitter). The RX then has the task of recovering the data and the system synchronization, and it may or may not incorporate a clock and data recovery circuit. The goal of the whole system is to transmit data with a low bit-error rate (BER).

Figure 1. Communication interface.
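To give a feel for what a low BER means in a multi-gigabit link, the following minimal sketch computes the expected number of bit errors in a time window and the mean time between errors. The function names and the example numbers (a 10 Gb/s link with a target BER of 1e-12) are illustrative assumptions, not values taken from this work.

```python
def expected_errors(data_rate_bps: float, ber: float, seconds: float) -> float:
    """Expected number of bit errors in `seconds` of continuous traffic."""
    return data_rate_bps * seconds * ber


def mean_time_between_errors(data_rate_bps: float, ber: float) -> float:
    """Average time (in seconds) between two consecutive bit errors."""
    return 1.0 / (data_rate_bps * ber)


if __name__ == "__main__":
    rate = 10e9   # assumed data rate: 10 Gb/s
    ber = 1e-12   # assumed target bit-error rate
    print(f"errors per hour: {expected_errors(rate, ber, 3600.0):.1f}")
    print(f"mean time between errors: {mean_time_between_errors(rate, ber):.0f} s")
```

Under these assumed numbers the link still produces about 36 errors per hour, roughly one every 100 s, which is why BER targets for such interfaces are set so low.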
High-speed interfaces, or high-speed links, are commonly understood as those in which the channel defines the maximum data transmission rate. In this work, the discussion is limited to electrical wireline channels, where applications can operate up to the multi-gigabit per second domain <sup>2</sup>.

Commonly, the clocking schemes at the system level are global clocking (synchronous), source-synchronous clocking (mesochronous), and the CDR scheme (plesiochronous) <sup>2</sup>. The global clocking scheme uses the same clock generator to synchronize the transmitter and the receiver and exchanges the data through a dedicated connection. In source-synchronous clocking, the linked interfaces use wide parallel buses and a clock forwarded along with the data. In contrast, in the CDR scheme the transmitter delivers only the data to the receiver, without any kind of clock signal, and the time synchronization is recovered at the receiver end using a CDR circuit. Fig. 2 shows a CDR clocking scheme.

Figure 2. CDR clocking scheme.

The magnitude of the timing uncertainties, which is directly related to the data rate, compared with the unit interval (UI) determines the type of clocking scheme. For low rates (i.e., less than 100 Mb/s), parallel links using the source-synchronous scheme are suitable <sup>3</sup>; these interconnections were widely used in the past. The global scheme is preferred for short channels, where the time delay is small compared with one UI, which makes it possible to share the same clock between the TX and the RX.

<sup>2</sup> O. TYSHCHENKO. "Clock and Data Recovery for High-Speed ADC-based Receivers". In: *PhD Thesis, University of Toronto* (2011).

<sup>3</sup> S. RAVIKUMAR. "Circuit Architectures for High Speed CMOS Clock and Data Recovery Circuits". In: *Master Thesis, University of Illinois at Urbana-Champaign* (2015).
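The comparison between a timing uncertainty and the unit interval can be made concrete with a short calculation. The sketch below assumes a binary link (one bit per symbol, so the UI is the inverse of the bit rate) and a hypothetical fixed skew of 200 ps; both the helper names and the numbers are illustrative only.

```python
def unit_interval_s(data_rate_bps: float) -> float:
    """Unit interval (one bit period) in seconds for a binary link."""
    return 1.0 / data_rate_bps


def uncertainty_in_ui(uncertainty_s: float, data_rate_bps: float) -> float:
    """Express a timing uncertainty (skew, delay mismatch, jitter) in UI."""
    return uncertainty_s / unit_interval_s(data_rate_bps)


if __name__ == "__main__":
    skew = 200e-12  # hypothetical 200 ps clock-to-data skew
    for rate in (100e6, 1e9, 10e9):
        ui_ps = unit_interval_s(rate) * 1e12
        frac = uncertainty_in_ui(skew, rate)
        print(f"{rate / 1e9:5.1f} Gb/s: UI = {ui_ps:7.1f} ps, skew = {frac:.2f} UI")
```

With these assumed numbers, the same 200 ps skew is 0.02 UI at 100 Mb/s but 2 UI at 10 Gb/s, which illustrates why sharing or forwarding a clock becomes impractical as the data rate grows.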
Eventually, the increase in information density, driven by the rise of cloud computing and mobile communications, created a great need to expand data communication bandwidth. As data rates went beyond several Gb/s, several issues affecting system performance appeared, such as noise, power consumption, crosstalk, skew, channel loss, and routing, all of them threatening transmission reliability. Therefore, parallel links started to be replaced by serial binary links. A simple example is the move from Advanced Technology Attachment (ATA) to Serial ATA (SATA). These serial interfaces and many other applications, such as High-Definition Multimedia Interface (HDMI) <sup>4</sup>, Peripheral Component Interconnect Express (PCIe) <sup>5</sup>, Serial Advanced-Technology Attachment (SATA) <sup>6</sup>, and Universal Serial Bus (USB) <sup>7</sup>, operate at multi-gigabit per second rates, and thus the problem of developing an effective CDR architecture for rates of several Gb/s is becoming increasingly common <sup>8, 9</sup>.

<sup>4</sup> "HDMI Specification Version 1.3a". In: *HDMI Licensing, LLC, Sunnyvale, CA, USA* (2006).

<sup>5</sup> "PCI Express Base 2.1 Specification". In: *PCI-SIG, Beaverton, OR, USA* (2009).

<sup>6</sup> "Serial ATA Revision 3.0 Specification". In: *SATA-IO Administration, Beaverton, OR, USA* (2009).

<sup>7</sup> "Universal Serial Bus 3.1 Specification". In: *Revision 1.0* (2013).

<sup>8</sup> B. RAZAVI. *Design of Integrated Circuits for Optical Communications*. 2nd. Wiley, 2012.

<sup>9</sup> M. T. HSIEH and G. E. SOBELMAN. "Architectures for Multi-Gigabit Wire-Linked Clock and Data Recovery". In: *IEEE Circuits and Systems Magazine* 8.4 (2008), pp. 45–57. DOI: 10.1109/MCAS.2008.930152.

**1.1.2. Clock and Data Recovery Circuits** Transceivers for wireline interfaces have been taking advantage of the proposed recovery techniques over the years.
With technology scaling, bandwidth and power consumption have been improving, as the trends presented in <sup>10</sup> show. By moving to smaller technology nodes, designers can achieve higher maximum data rates while also improving energy efficiency. On the other hand, because power consumption scales differently and also depends on the complexity of the system, the relation between energy efficiency and data rate is not well defined, and only a lower bound of around 4 mW/Gb/s underlying the state-of-the-art transceivers can be identified.

The role of the clock and data recovery system in a high-speed receiver is to extract the symbol timing from the received signal and to use this timing for data recovery in the presence of timing uncertainties, or jitter, in the received signal. Initially, the strategies were simple: sampling at the baud rate and processing the samples to generate a control signal that tracks the phase. An interesting review of the timing recovery problem is presented in <sup>11</sup>.

Recovery techniques have been implemented as analog, digital, or hybrid approaches for more than 20 years <sup>12</sup>. At that time, interpolation methods were implemented in a feedforward path to control the sampling clock in modems using digital signal processing (DSP) <sup>12</sup>. It was not long before several architectures and strategies for recovery appeared, including the CDR schemes. Papers such as <sup>13</sup> present the basis of the first CDR circuit implementations.

One way to classify CDR architectures is presented in <sup>9</sup>, where CDR topologies are grouped according to the phase relationship between the received data and the local clock at the receiver, as Fig. 3 summarizes.

Figure 3. Typical classification for CDR architectures.

According to <sup>9</sup>, CDRs can be classified into:

1. Topologies using feedback phase tracking: PLL-based, DLL-based, phase-interpolator-based (PI-based), and injection-locked (Injection).
2. Oversampling without feedback tracking.
3. Topologies using phase alignment without feedback phase tracking: gated oscillators (GVCO) and high-Q filters.

PLL-based CDRs can be further divided according to whether or not they use a reference clock. Depending on the signal domain, they can also be analog or digital.

Another classification, with more general criteria to group CDR circuits, is presented in <sup>2</sup>. In contrast, this work adopts a different way of organizing the recovery techniques, based on three main aspects: phase sampling, control core, and timing adjustment. This is a structural classification, which helps in understanding the building blocks that compose the CDR. Fig. 4 shows the general conceptual scheme of a CDR, in which these aspects are distinguished as follows:

- According to the phase sampling, CDR circuits can be linear <sup>14</sup>, binary <sup>15, 16, 17</sup>, oversampling <sup>18, 19</sup>, or ADC-based <sup>20, 21, 22</sup>.
- Regarding the control core, they can be analog <sup>23</sup>, digital <sup>15, 24, 25</sup>, or hybrid <sup>16</sup>.
- With respect to the timing adjustment, they can be phase-interpolator-based <sup>23, 26</sup>, voltage-controlled oscillator (VCO) based <sup>27, 18</sup>, or digitally controlled oscillator (DCO) based <sup>24, 28, 29</sup>.

Figure 4. Typical CDR structure where Din, Rdata, and clk are the input data, the recovery data, and the recovery clock signal respectively.

Combinations of the above aspects result in the various practical implementations of CDR circuits. It is important to note from Fig. 4 that, in some cases, CDRs may or may not have a feedback loop and a clock reference.

<sup>10</sup> S. SAXENA et al. "A 2.8 mW/Gb/s, 14 Gb/s Serial Link Transceiver". In: *IEEE Journal of Solid-State Circuits* 52.5 (2017), pp. 1399–1411. DOI: 10.1109/JSSC.2016.2645738.

<sup>11</sup> K. MUELLER and M. MULLER. "Timing Recovery in Digital Synchronous Data Receivers". In: *IEEE Transactions on Communications* 24.5 (1976), pp. 516–531. DOI: 10.1109/TCOM.1976.1093326.

<sup>12</sup> F. M. GARDNER. "Interpolation in digital modems. I. Fundamentals". In: *IEEE Transactions on Communications* 41.3 (1993), pp. 501–507. DOI: 10.1109/26.221081.

<sup>13</sup> B. RAZAVI. "Challenges in the Design High-Speed Clock and Data Recovery Circuits". In: *IEEE Communications Magazine* 40.8 (2002), pp. 94–101. DOI: 10.1109/MCOM.2002.1024421.

<sup>14</sup> D. RENNIE and M. SACHDEV. "A 5-Gb/s CDR Circuit With Automatically Calibrated Linear Phase Detector". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 55.3 (2008), pp. 796–803. DOI: 10.1109/TCSI.2008.916400.

<sup>15</sup> J. L. SONNTAG and J. STONICK. "A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links". In: *IEEE Journal of Solid-State Circuits* 41.8 (2006), pp. 1867–1875. DOI: 10.1109/JSSC.2006.875292.

<sup>16</sup> P. K. HANUMOLU et al. "A 1.6Gbps Digital Clock and Data Recovery Circuit". In: *IEEE Custom Integrated Circuits Conference 2006*. 2006, pp. 603–606. DOI: 10.1109/CICC.2006.320829.

<sup>17</sup> M. S. JALALI et al. "A Reference-Less Single-Loop Half-Rate Binary CDR". In: *IEEE Journal of Solid-State Circuits* 50.9 (2015), pp. 2037–2047. DOI: 10.1109/JSSC.2015.2429714.

<sup>18</sup> N. NEDOVIC et al. "A 40-44 Gb/s 3× Oversampling CMOS CDR/1:16 DEMUX". In: *IEEE Journal of Solid-State Circuits* 42.12 (2007), pp. 2726–2735. DOI: 10.1109/JSSC.2007.908714.

<sup>19</sup> J. SARMENTO and J. T. STONICK. "A Minimal-Gate-Count Fully Digital Frequency-Tracking Oversampling CDR Circuit". In: *Proceedings of 2010 IEEE International Symposium on Circuits and Systems*. 2010, pp. 2099–2102. DOI: 10.1109/ISCAS.2010.5537061.

<sup>20</sup> O. TYSHCHENKO et al. "A Fractional-Sampling-Rate ADC-based CDR with Feedforward Architecture in 65nm CMOS". In: *2010 IEEE International Solid-State Circuits Conference (ISSCC)*. 2010, pp. 166–167. DOI: 10.1109/ISSCC.2010.5434004.

<sup>21</sup> B. ABIRI et al. "A 5Gb/s Adaptive DFE for 2x Blind ADC-Based CDR in 65nm CMOS". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 436–438. DOI: 10.1109/ISSCC.2011.5746386.

<sup>22</sup> C. TING et al. "A Blind Baud-Rate ADC-Based CDR". In: *2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers*. 2013, pp. 122–123. DOI: 10.1109/ISSCC.2013.6487664.

<sup>23</sup> R. KREIENKAMP et al. "A 10-gb/s CMOS Clock and Data Recovery Circuit with an Analog Phase Interpolator". In: *IEEE Journal of Solid-State Circuits* 40.3 (2005), pp. 736–743. DOI: 10.1109/JSSC.2005.843624.

<sup>24</sup> C. C. CHUNG and W. C. DAI. "A Referenceless All-Digital Fast Frequency Acquisition Full-Rate CDR Circuit for USB 2.0 in 65nm CMOS Technology". In: *VLSI Design, Automation and Test (VLSI-DAT), 2011 International Symposium on*. 2011, pp. 1–4. DOI: 10.1109/VDAT.2011.5783614.

<sup>25</sup> T. LEE et al. "A 5-Gb/s 2.67-mW/Gb/s Digital Clock and Data Recovery With Hybrid Dithering Using a Time-Dithered Delta-Sigma Modulator". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 24.4 (2016), pp. 1450–1459. DOI: 10.1109/TVLSI.2015.2449866.

<sup>26</sup> G. WU et al. "A 1-16-Gb/s All-Digital Clock and Data Recovery With a Wideband, High-Linearity Phase Interpolator". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* PP.99 (2016), pp. 1–1. DOI: 10.1109/TVLSI.2015.2418277.

<sup>27</sup> J. LEE and M. LIU. "A 20Gb/s Burst-Mode CDR Circuit Using Injection-Locking Technique". In: *2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*. 2007, pp. 46–586. DOI: 10.1109/ISSCC.2007.373580.

<sup>28</sup> W. YIN et al. "A TDC-Less 7mW 2.5Gb/s Digital CDR with Linear Loop Dynamics and Offset-Free Data Recovery". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 440–442. DOI: 10.1109/ISSCC.2011.5746388.

<sup>29</sup> T. LEE, Y. H. KIM, and L. S. KIM. "A 5-Gb/s Digital Clock and Data Recovery Circuit With Reduced DCO Supply Noise Sensitivity Utilizing Coupling Network". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* PP.99 (2016), pp. 1–5. DOI: 10.1109/TVLSI.2016.2566927.
**1.1.3. State of the Art** The literature on high-speed interfaces and CDR circuits is extensive. Fig. 5 shows the number of CDR papers published in relevant journals and conferences over the decade preceding the writing of this dissertation. The graph also includes transceiver and receiver papers whose design focus is the CDR. The sources searched were the *International Solid-State Circuits Conference* (ISSCC), the Journal of Solid-State Circuits (JSSC), the Symposium on Very Large Scale Integration Circuits (VLSIC) and the Custom Integrated Circuits Conference (CICC). ISSCC and JSSC dominate the total number of papers. On average, 13 papers/year were published before 2012; after 2012 the average drops to about 6 papers/year, with a peak of 12 papers in 2014. In 2012, no paper devoted exclusively to a CDR proposal appeared at ISSCC. Although some work has been reported in other journals and conferences, the main point to highlight is that a mature CDR architecture has been accepted and is used in both research and commercial transceivers. Recent wireline work has refocused on the equalization issues raised by the new standards, which use PAM4 modulation at data rates over 28 Gb/s. The new challenges in equalization and data conversion for PAM4 have come to dominate the work on high-speed interfaces <sup>30,31</sup>. However, the implications of latency and data sampling for jitter tolerance still need to be addressed before a digital CDR can operate under the new standards.

K. GOPALAKRISHNAN et al. "3.4 A 40/50/100Gb/s PAM-4 Ethernet Transceiver in 28nm CMOS". In: 2016 IEEE International Solid-State Circuits Conference (ISSCC). 2016, pp. 62–63. DOI: 10.1109/ISSCC.2016.7417907.

Y. FRANS et al. "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET". In: *IEEE Journal of Solid-State Circuits* 52.4 (2017), pp. 1101–1110. DOI: 10.1109/JSSC.2016.2632300.

Figure 5. Number of presented CDR papers per year in relevant conferences and published in one journal.

According to Section 1.1.2, the papers presented in Fig. 5 can be classified into analog, digital or hybrid in terms of the control core. Fig. 6 shows this classification taking into account only the ISSCC and JSSC papers; it reveals a clear trend towards digital implementations. Analog and hybrid implementations dominate in 2008 because burst-mode CDRs became very popular that year. These CDRs are very common in passive optical networks, where gated-oscillator-based and oversampling CDRs are the prevailing implementations. Digital implementations have notable advantages over their analog counterparts. For example, digital CDRs can be integrated in a small area and are more robust to process, voltage and temperature (PVT) variations. In addition, in terms of testability, it is easier to read register states and quantities in a digital design than a voltage or current in an analog circuit. For these reasons, digital architectures are preferred in high-speed interfaces. Specifically, among digital implementations, the digital phase-locked loop (DPLL) based CDR is widely used due to its power efficiency, flexibility and effective functionality for Gb/s data links <sup>9,15,32,33</sup>.

Figure 6. Papers published in ISSCC and JSSC classified into analog, digital or hybrid.

A hybrid analog/digital loop filter is presented in <sup>34</sup> in order to eliminate the large capacitor used in a fully analog implementation. The proportional path is implemented in the analog domain and the integral path with a digital filter. Multi-rate operation from 155 Mb/s to 2.5 Gb/s is achieved with a jitter tolerance (JTOL) greater than 0.55 UI and a generated jitter (JGEN) of $1.2ps_{rms}$.
However, this approach is power hungry, consuming about 425 mW including drivers from a 2.5 V supply voltage. In the same year, <sup>35</sup> reported a CDR with the highest speed and the lowest power consumption per data rate, 3.9 mW/(Gb/s), among the CMOS implementations reported at that time.

Instead of increasing the loop bandwidth, <sup>36</sup> proposes a different CDR architecture that combines phase tracking and blind oversampling. The main idea is to improve the JTOL response of the system by combining the responses of the classical phase-tracking CDR and a 5x blind oversampling approach. In this implementation, the jitter tolerance of the phase-tracking CDR alone is increased by a factor of 32 at frequencies below its loop filter's bandwidth. Although the JTOL frequency response is improved, this approach is not a good choice for low-power applications because of the power penalty imposed by the additional hardware.

M. TALEGAONKAR, R. INTI, and P. K. HANUMOLU. "Digital Clock and Data Recovery Circuit Design: Challenges and Tradeoffs". In: 2011 IEEE Custom Integrated Circuits Conference (CICC). 2011, pp. 1–8. DOI: 10.1109/CICC.2011.6055346.

A. ZARGARAN-YAZD and W. T. BEYENE. "Discrete-Time Modeling and Simulation Considerations for High-Speed Serial Links". In: 2014 IEEE 23rd Conference on Electrical Performance of Electronic Packaging and Systems. 2014, pp. 165–168. DOI: 10.1109/EPEPS.2014.7103624.

M. H. PERROTT et al. "A 2.5Gb/s Multi-Rate 0.25µm CMOS CDR Utilizing a Hybrid Analog/Digital Loop Filter". In: 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers. 2006, pp. 1276–1285. DOI: 10.1109/ISSCC.2006.1696175.

The work in <sup>37</sup> illustrates the DPLL-based design. It reviews and describes very clearly the main specifications for this type of CDR.
The CDR presented employs a second-order digital loop filter and combines delta-sigma modulation with the analog PLL to achieve sub-picosecond phase resolution and better than 2 ppm frequency resolution. However, the recovered clock jitter measured after fabrication is about $28ps_{rms}$, which is large in comparison with the state of the art. The main reason is the excessive PLL bandwidth, which does not sufficiently filter the shaped noise of the delta-sigma modulator.

<sup>35</sup> C. KROMER et al. "A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects". In: *IEEE Journal of Solid-State Circuits* 41.12 (2006), pp. 2921–2929. DOI: 10.1109/JSSC.2006.884389.

M. VAN IERSSEL et al. "A 3.2 Gb/s CDR Using Semi-Blind Oversampling to Achieve High Jitter Tolerance". In: *IEEE Journal of Solid-State Circuits* 42.10 (2007), pp. 2224–2234. DOI: 10.1109/JSSC.2007.905233.

P. K. HANUMOLU, G. Y. WEI, and U. K. MOON. "A Wide-Tracking Range Clock and Data Recovery Circuit". In: *IEEE Journal of Solid-State Circuits* 43.2 (2008), pp. 425–439. DOI: 10.1109/JSSC.2007.914290.

In <sup>38</sup>, circuit details for the practical implementation of a digital CDR are well described. The main proposal of this work is a novel data rate selection logic, which allows data rates from 5.75 Gb/s to 44 Gb/s to be selected. A typical phase interpolator is used for the timing adjustment, and the power consumption per data rate is about 5.3 mW/(Gb/s) with a frequency offset of 650 ppm. This is a classic example of how to improve system performance by proposing simple but novel ideas in the basic CDR circuit blocks.

By combining a frequency-locked loop (FLL) with a typical DPLL-based CDR, the architecture in <sup>39</sup> achieves a power efficiency 10 times better than all reference-less CDRs reported at that time, together with the widest frequency acquisition range. However, the frequency detector performance depends on the input transition density.
To alleviate this issue, a new scheme that also implements an FLL with the digital CDR is proposed in <sup>40</sup>: a continuous-rate digital CDR with automatic frequency acquisition in 65 nm CMOS. This architecture is immune to variations in transition density and achieves an unlimited range while requiring minimal additional hardware. The same author also proposed in <sup>41</sup> a new architecture which removes the dependence between the jitter transfer and jitter tolerance functions. At 5 Gb/s, the CDR consumes 13.1 mW and achieves a recovered clock long-term jitter of 5.0 ps when operating with PRBS31 input data. Since the DCO was implemented using a ring oscillator, it consumed more than 50% of the CDR power and contributed a large portion of the recovered clock jitter.

L. RODONI et al. "A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS". In: *IEEE Journal of Solid-State Circuits* 44.7 (2009), pp. 1927–1941. DOI: 10.1109/JSSC.2009.2021913.

R. INTI et al. "A 0.5-to-2.5Gb/s Reference-Less Half-Rate Digital CDR with Unlimited Frequency Acquisition Range and Improved Input Duty-Cycle Error Tolerance". In: 2011 IEEE International Solid-State Circuits Conference. 2011, pp. 438–450. DOI: 10.1109/ISSCC.2011.5746387.

<sup>40</sup> G. SHU et al. "8.7 A 4-to-10.5Gb/s 2.2mW/Gb/s continuous-rate digital CDR with automatic frequency acquisition in 65nm CMOS". In: 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). 2014, pp. 150–151. DOI: 10.1109/ISSCC.2014.6757377.

G. SHU et al. "A Reference-Less Clock and Data Recovery Circuit Using Phase-Rotating Phase-Locked Loop". In: *IEEE Journal of Solid-State Circuits* 49.4 (2014), pp. 1036–1047. DOI: 10.1109/JSSC.2013.2296152.

A different alternative is proposed in <sup>22</sup> with a blind baud-rate ADC-based CDR. Feed-forward blind architectures eliminate the feedback loop between the digital and analog domains.
The term "blind" refers to the fact that this kind of architecture does not sample the data at the center of the eye. Fabricated in 65 nm CMOS, the CDR demonstrates at 10 Gb/s a high-frequency jitter tolerance of 0.19 UI with 300 ppm of frequency offset. The digital CDR contains a feedback loop including a data interpolator, a speculative 2-tap DFE, a speculative Mueller-Müller phase detector (MMPD), and a conventional 2nd-order loop filter. The CDR consumes 111.6 mW. The main novel idea here is to add controlled ISI to the data in order to widen the eye opening.

The most recent works on CDR techniques bet on adaptive strategies in some of the building blocks of the digital loop filter. The authors of <sup>42</sup> proposed a frequency detection scheme for automatic adjustment of the phase detector in a reference-less baud-rate CDR. This technique corrects the frequency error and improves the capture range by more than 200x. The paper claims that baud-rate clock and data recovery circuits are becoming more prevalent in high-speed receivers because they consume less power, since the data is sampled only once per UI. In contrast, <sup>43</sup> bets on an adaptive loop gain based on the autocorrelation function, showing better performance in the JTOL function. A complete adaptive block is used after the majority voting circuit in order to change the proportional gain of the DPLL-based CDR.

W. RAHMAN et al. "6.6 A 22.5-to-32Gb/s 3.2pJ/b Referenceless Baud-rate Digital CDR with DFE and CTLE in 28nm CMOS". In: 2017 IEEE International Solid-State Circuits Conference (ISSCC). 2017, pp. 120–121. DOI: 10.1109/ISSCC.2017.7870290.

<sup>43</sup> J. LIANG et al. "A 28Gb/s Digital CDR With Adaptive Loop Gain for Optimum Jitter Tolerance". In: 2017 IEEE International Solid-State Circuits Conference (ISSCC). 2017, pp. 122–123. DOI: 10.1109/ISSCC.2017.7870291.

### 1.2. TECHNICAL CHALLENGES IN HIGH-SPEED CDRS

To understand the main challenges and details of the DPLL-based CDR, the paper in <sup>15</sup> is an excellent reference. It presents the basic architecture, its linear modeling, and implementation details regarding the mapping from the linear model to circuit building blocks. With a similar architecture, the work in <sup>44</sup> introduces classical approaches for digital filters in the control loop. It presents a 3.125 Gb/s CDR using first-, second- and higher-order CDRs. Tracking greater than 5000 ppm is achieved, which makes this approach suitable for applications that require spread-spectrum clocking (SSC). SSC is mandatory in consumer electronics standards such as USB3.1, which is one of the targets of this work.

Haechang LEE et al. "Improving CDR Performance via Estimation". In: 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers. 2006, pp. 1296–1303. DOI: 10.1109/ISSCC.2006.1696177.

B. S. LEIBOWITZ et al. "A 7.5Gb/s 10-Tap DFE Receiver with First Tap Partial Response, Spectrally Gated Adaptation, and 2nd-Order Data-Filtered CDR". In: 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. 2007, pp. 228–599. DOI: 10.1109/ISSCC.2007.373377.

Among transceiver architectures, those that use DPLL-based CDRs became very popular, and it did not take long for this type of CDR to appear in complete transceiver (receiver) systems. For instance, the work in <sup>45</sup> presents a receiver which uses a very simple 2nd-order filter CDR. This CDR also implements data filtering after the BBPD in order to mitigate the generated jitter inherent in this architecture. In addition, data filtering provides another degree of freedom in the effective closed-loop gain. Data filtering is also mentioned in <sup>46</sup>, where a summation over 8 samples is performed in order to smooth the control of the digitally controlled oscillator. A maximum rate of 2.87 Gb/s is achieved with a 1.2 V supply voltage.
The power consumption is 13.2 mW at 2.5 Gb/s for the core only, and the jitter under the same conditions is $7.2ps_{rms}$ and $47.2ps_{pp}$.

Digital CDRs are also found in multi-standard applications, as shown in <sup>47</sup>, where SATA/SAS, USB3.0, and PCIe are supported. To support these standards, the interface requires 5000 ppm of SSC to suppress electromagnetic emissions, so the digital CDR must have a much wider tracking range. In this paper, a tracking range of 15.6 kppm is achieved with a tracking bandwidth between 8 and 10 MHz at 8 Gb/s. The CDR implemented in this transceiver consumes 12 mW at 8 Gb/s.

In general, several approaches improve one or more of the CDR specifications. Some architectures are very elaborate, while others are simple but hardware-efficient solutions, with a dominant trend towards DPLL-based CDRs. The definition of a high-speed interface has also changed: from 10 Gb/s interfaces in 2010, to transceivers over 20 Gb/s in 2014, and to more than 50 Gb/s in 2017. Few papers present detailed work on majority voting strategies, and there is a trend towards adaptive loop gain. Regarding DPLL-based architectures, the challenges translate into the following main issues: proper modeling of CDR dynamics, latency in the control loop, jitter noise, and power consumption.

D. H. OH et al. "A 2.8Gb/s All-Digital CDR with a 10b Monotonic DCO". In: 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. 2007, pp. 222–598. DOI: 10.1109/ISSCC.2007.373374.

<sup>47</sup> H. PAN et al. "A Digital Wideband CDR with +/-15.6kppm Frequency Tracking at 8Gb/s in 40nm CMOS". In: 2011 IEEE International Solid-State Circuits Conference. 2011, pp. 442–444. DOI: 10.1109/ISSCC.2011.5746389.

All of the above leaves us with the main technical challenges.
As discussed in the previous sections, high-speed wireline serial communication interfaces have become more challenging in terms of performance requirements and transmission rates. For this reason, CDR circuits have been widely explored and have grown in popularity over the last decade, with some architectures emerging as alternatives to mitigate the new challenges. However, many transceiver architectures used today still employ strategies proposed ten or more years ago, especially digital architectures, which are the most common alternative adopted by industry in these systems <sup>48,49,50,51,52,53,54,55</sup>. Although some work has been done to partially improve the CDR, new high-speed standards requiring data rates above 10 Gb/s with large associated jitter have imposed the need to look for new techniques when low-performance nodes are used due to cost limitations.

M. POZZONI et al. "A Multi-Standard 1.5 to 10 Gb/s Latch-Based 3-Tap DFE Receiver With a SSC Tolerant CDR for Serial Backplane Communication". In: *IEEE Journal of Solid-State Circuits* 44.4 (2009), pp. 1306–1315. DOI: 10.1109/JSSC.2009.2014203.

<sup>49</sup> G. R. GANGASANI et al. "A 16-Gb/s Backplane Transceiver With 12-Tap Current Integrating DFE and Dynamic Adaptation of Voltage Offset and Timing Drifts in 45-nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 47.8 (2012), pp. 1828–1841. DOI: 10.1109/JSSC.2012.2196313.

J. F. BULZACCHELLI et al. "A 28-Gb/s 4-Tap FFE/15-Tap DFE Serial Link Transceiver in 32-nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 47.12 (2012), pp. 3232–3248. DOI: 10.1109/JSSC.2012.2216414.

P. A. FRANCESE et al. "A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 kppm Tracking Bandwidth". In: *IEEE Journal of Solid-State Circuits* 49.11 (2014), pp. 2490–2502. DOI: 10.1109/JSSC.2014.2344008.

G. R. GANGASANI et al. "A 32 Gb/s Backplane Transceiver With On-Chip AC-Coupling and Low Latency CDR in 32 nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 49.11 (2014), pp. 2474–2489. DOI: 10.1109/JSSC.2014.2340574.

H. KIMURA et al. "A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS". In: *IEEE Journal of Solid-State Circuits* 49.12 (2014), pp. 3091–3103. DOI: 10.1109/JSSC.2014.2349974.

R. NAVID et al. "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 50.4 (2015), pp. 814–827. DOI: 10.1109/JSSC.2014.2374176.

Y. FRANS et al. "A 0.5-16.3 Gb/s Fully Adaptive Flexible-Reach Transceiver for FPGA in 20 nm CMOS". In: *IEEE Journal of Solid-State Circuits* 50.8 (2015), pp. 1932–1944. DOI: 10.1109/JSSC.2015.2413849.

To push wireline communication interfaces to higher speeds, the digital filter used in the CDR must run at the maximum synthesized clock frequency. Further increases in the received data rate bring latency and stability issues in the loop of a DPLL-based CDR, because the digital CDR operates at a lower speed than the data rate. Moore's law has raised the maximum clock frequency of digital circuits with each smaller geometry node, which translates into relaxed latency specifications for the same data rate. This leaves some margin to operate interfaces at the same rate with better latency, or at higher rates without incurring stability issues. However, with interfaces pushing well beyond 10 Gb/s, this margin is not enough, and the maximum data rate is eventually limited by technological constraints.

On the other hand, power consumption increases with the operating frequency. Smaller technology nodes are also preferred to alleviate this issue. However, power efficiency has reached a lower bound of around 4 mW/(Gb/s) in state-of-the-art transceivers <sup>10</sup>, and the benefits of technology and voltage scaling have been tapering off.
At this point, it is important to avoid power-hungry operations in the design and optimization of the recovery functions, so that new techniques can be implemented in a power-efficient manner.

### 1.3. DISSERTATION AIM AND SCOPE

The aim of this dissertation is to improve the performance of CDR circuits for high-speed interfaces in terms of jitter and power consumption, so that they can be implemented in consumer electronics applications. To accomplish that, the following objectives are proposed:

- To propose new techniques for designing CDR systems in links up to 10 Gb/s using low-cost technology nodes, mainly focused on DPLL-based CDRs.
- To explore and propose new solutions for dynamic loop control adaptation, taking into account the spread-spectrum clocking requirement of the USB3.1 standard.
- To design a complete CDR system for USB3.1 with the proposed techniques. In order to test and validate the design, the circuit will be implemented and tested on silicon as part of a transceiver.

**1.3.1. Scope of this Dissertation** The scope of this dissertation is described below.

**Digital Clock and Data Recovery** The work presented here is based on digital clock and data recovery. As described before, there are several approaches to clock and data recovery, each with advantages and drawbacks depending on the application. The main focus of this dissertation is on digital CDRs.

**Bang-Bang Phase Detector** Bang-bang phase detectors (BBPD) are the preferred phase error detection scheme in digital CDRs because of their simplicity and accuracy. However, the contributions presented in this dissertation are not limited by the phase detection scheme used.

**Loop Gain Adaptation** The performance of high-speed communication interfaces would improve through the optimization of the fundamental functions of the CDR building blocks.
Technological limitations have led to the search for new architectural approaches, and dynamic loop gain adaptation is still an open door that can be exploited to bring improved architectures based on further qualitative and quantitative analysis of these blocks. In this dissertation, the CDR system is improved by focusing on loop gain adaptation techniques.

**Analog Framework** CDR systems are not stand-alone blocks; they require a complete analog framework for proper validation. To that end, this work is limited to designing the analog circuitry necessary for a basic but satisfactory analog framework: samplers, aligners, de-serializer, phase interpolator, clock distribution circuits, output buffers, and biasing.

**CMOS Technology** The scope is also limited to CMOS technology. Most of the simulations and implementations presented in this dissertation are performed in a 180 nm CMOS node; only the preliminary implementation costs of the cross-correlation function are discussed using a 65 nm CMOS technology node.

### 1.4. ORIGINAL CONTRIBUTIONS

The main original contributions of this dissertation are presented below.

**1.4.1. Channel Losses Impact on Digital CDRs** This dissertation shows that, under certain conditions of incoming jitter in clock and data recovery (CDR) circuits, the bang-bang phase detector (BBPD) gain can rise even as the channel loss increases. Moreover, it shows how the BBPD gain can increase when sinusoidal and uniform jitter noise are combined, affecting the CDR dynamic response. These observations are not clearly reported in the literature and are presented here through two approaches: first, direct measurements using an extraction procedure that yields the BBPD gain; second, an explanation through the convolution of probability density functions.

**1.4.2. Stochastic Resonance in CDRs** A phenomenon called stochastic resonance (SR) is modeled and validated for clock and data recovery (CDR) circuits. In order to explain the underlying mechanism of the phenomenon in these systems, uniform and sinusoidal jitter noise are combined. This dissertation proposes an analytical model and validates the results on a CDR design with specifications from the USB 3.1 standard <sup>7</sup>.

**1.4.3. Design Methodology** A clear and compact design methodology is proposed and validated through a complete design flow.

**1.4.4. Cross-Correlation Based Adaptive Loop Gain (XCALG)** In this dissertation, the XCALG method is proposed as a new alternative for loop gain adaptation in digital CDRs. The inherent properties of the cross-correlation function and their link with the cross-power spectral density underpin the adaptive loop gain technique, XCALG, for clock and data recovery systems that implement a bang-bang phase detector. XCALG features better observability, less impact from jitter sources, and automatic CDR bandwidth tracking, while keeping a safe phase margin. The theory behind the idea is presented in detail, and XCALG is demonstrated through behavioral system simulations.

**1.4.5. Nonlinear Laplacian Spectral Analysis** Nonlinear Laplacian Spectral Analysis (NLSA) is a mathematical tool which arises as an unexplored alternative for clock and data recovery. The initial step of this study is presented in this dissertation, along with the advantages and drawbacks that can arise from this approach.

**1.4.6. Additional Contributions** Additional contributions of this dissertation include the work done in two microcontroller projects called Tucan and Guerinni, as part of the research activities that complemented the learning process. In addition, a switched capacitive voltage generator and a voltage-controlled oscillator in 130 nm BCD technology were designed during the internship at NXP Semiconductors.
The above activities complement the academic and professional training process.

### 1.5. DISSERTATION OUTLINE

The journey through this dissertation is illustrated in Fig. 7 and described below.

Figure 7. The Outline of the Dissertation.

After the technical background, aim, and scope of this dissertation presented in *Chapter 1*, *Chapter 2* introduces the impact of channel loss on digital CDRs. Several tests and experimental simulations are presented to observe the effect of channel losses on the bang-bang phase detector gain. In *Chapter 3*, the main observations from the previous chapter are formally explained and modeled by introducing the stochastic resonance (SR) phenomenon. *Chapter 4* presents the design methodology developed and used in this dissertation, with a clear explanation and the main design remarks. At the end of that chapter, the methodology is applied to the design of the digital CDR used in the following chapter. *Chapter 5* adds the cross-correlation-based adaptive loop gain technique (XCALG). The filtering properties of the cross-power spectral density enhance the observability of the loop dynamics, allowing adaptation while maintaining the phase margin at a safe value. The main advantages and drawbacks of the novel technique are also discussed in this chapter. *Chapter 6* summarizes the main conclusions and remarks of this dissertation.

In most cases, a dissertation is neither a clear nor a linear journey in which each step is planned or anticipated beforehand. Accordingly, additional projects arose as a byproduct of the work involved in the development of this dissertation. These projects and contributions are placed in the Appendices.

### 2. JITTER AND CHANNEL LOSS IN DIGITAL CDR

### 2.1. INTRODUCTION

Several high-speed link applications incorporate CDR circuits at the receiver end (RX); USB3.1, PCI Express and Serial Advanced Technology Attachment (SATA) are examples of such applications.
The digital phase-locked loop (DPLL) based CDR is widely used due to its power efficiency, flexibility and effective functionality for Gb/s data links compared with analog counterparts <sup>15,32,33,56</sup>. Designing a DPLL-based CDR requires a clear understanding and proper simulation of the basic equivalent linear model shown in Fig. 8, where $K_{PD}$, $K_V$, $K_{DPC}$, $P$ and $F$ are the BBPD gain, the majority voting gain, the digital-to-phase converter (DPC) gain, and the proportional and frequency path gains, respectively. The parameter $N$ represents the latency of the whole system loop, and $\phi_{in}$ and $\phi_{out}$ are the input data phase and the output clock phase, respectively.

M. HSIEH and G. E. SOBELMAN. "Architectures for multi-gigabit wire-linked clock and data recovery". In: *IEEE Circuits and Systems Magazine* 8.4 (2008), pp. 45–57. DOI: 10.1109/MCAS.2008.930152.

Figure 8. Traditional discrete linear model of a CDR system.

The open-loop transfer function is given by:

$$H_o(z) = \left(\frac{K_{PD}K_VK_{DPC}}{1 - z^{-1}}\right) \left(P + \frac{F}{1 - z^{-1}}\right) z^{-N} \tag{1}$$

$K_{PD}$ represents a nonlinear block and is one of the most sensitive parameters in Eq. (1); it changes under different operating conditions such as jitter noise, transition density (TD) and inter-symbol interference (ISI) <sup>338</sup>. This parameter strongly influences the CDR dynamic response; hence, extracting a proper $K_{PD}$ value is critical to obtain well-correlated results between the linear model and the actual behavior of the CDR. One scenario where $K_{PD}$ can change is the synchronization process. When the CDR starts up, the data eye diagram is mostly closed, so many bits are lost and the data TD differs considerably from the average value of 0.5 for random data. Once the CDR circuit approaches the lock state, the data eye diagram opens, TD increases and $K_{PD}$ increases too.
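As a rough numerical illustration of Eq. (1), the sketch below evaluates the open-loop response on the unit circle and estimates the phase margin for different loop latencies and $K_{PD}$ values. All numeric values (gains, $P$, $F$, $N$) and all function names are hypothetical choices made only for illustration, not the design values of this work; frequency is normalized in radians per loop-filter clock cycle, and positive $P$ and $F$ are assumed.

```python
import cmath
import math


def open_loop(omega, k_pd, k_v, k_dpc, p, f, n):
    """Evaluate H_o(z) of Eq. (1) at z = exp(j*omega), omega in rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    integrator = 1.0 / (1.0 - z_inv)
    return k_pd * k_v * k_dpc * integrator * (p + f * integrator) * z_inv ** n


def open_loop_phase(omega, p, f, n):
    """Unwrapped phase of H_o in rad, valid for 0 < omega < pi and p, f > 0."""
    integrator = 1.0 / (1.0 - cmath.exp(-1j * omega))
    phase_integrator = -(math.pi / 2.0 - omega / 2.0)  # phase of 1/(1 - z^-1)
    phase_filter = cmath.phase(p + f * integrator)     # real part > 0, no wrap
    return phase_integrator + phase_filter - n * omega  # latency adds -n*omega


def crossover(k_pd, k_v, k_dpc, p, f, n, iters=200):
    """Unity-gain frequency by bisection; None if |H_o| >= 1 up to Nyquist."""
    lo, hi = 1e-9, math.pi
    if abs(open_loop(hi, k_pd, k_v, k_dpc, p, f, n)) >= 1.0:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # |H_o| decreases monotonically with omega for p, f > 0.
        if abs(open_loop(mid, k_pd, k_v, k_dpc, p, f, n)) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phase_margin_deg(k_pd, k_v, k_dpc, p, f, n):
    """Phase margin in degrees at the unity-gain frequency."""
    wc = crossover(k_pd, k_v, k_dpc, p, f, n)
    if wc is None:
        return None
    return 180.0 + math.degrees(open_loop_phase(wc, p, f, n))


if __name__ == "__main__":
    # Hypothetical example values, chosen only to exercise the functions.
    for k_pd, n in [(13.0, 4), (13.0, 16), (26.0, 16)]:
        pm = phase_margin_deg(k_pd, 1.0, 1.0 / 128.0, 0.5, 2.0 ** -10, n)
        print(f"K_PD={k_pd:5.1f}/UI  N={n:2d}  phase margin={pm:6.1f} deg")
```

Under these assumed values, the phase margin shrinks as $N$ grows and as $K_{PD}$ rises, which is the sensitivity to $K_{PD}$ and loop latency discussed above.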
On the other hand, the $K_{PD}$ value also changes depending on the incoming jitter noise. Both the amplitude and the type of noise modify this gain and, therefore, the system frequency response.

### 2.2. JITTER NOISE AND EXTRACTION PROCEDURE

In industry, it is widely accepted that jitter is decomposed into random and deterministic components that span the end-to-end connections of a transmission link; the USB 3.1 standard <sup>7</sup> is an example. Random sources exist as Gaussian noise generated by the transmitter and receiver PLLs; deterministic sources are typically described as uniform jitter inherent to ISI in the channel and sinusoidal jitter from the power supply <sup>33,57</sup>. For example, Fig. 9 shows the effect of noise on the $K_{PD}$ gain for different types of noise sources. Fig. 9(a) corresponds to Gaussian noise, which is characterized by the standard deviation $\sigma_{gauss}$ in UI (unit interval); Fig. 9(b) refers to uniform noise, described by $Dj_{pp}$ in UI, and sinusoidal noise, described by $Sj_{pp}$, which are the peak-to-peak amplitudes of the distributions. In all cases, $K_{PD}$ decreases nonlinearly as the noise level increases.

Alan BLANKMAN. "Understanding SDAII Jitter Calculation Methods." In: White Paper v 2.01 (2012).

Figure 9. $K_{PD}$ gain vs. noise for: a) Gaussian, b) uniform and sinusoidal cases.

Several analyses relating Gaussian and uniform jitter noise to the $K_{PD}$ gain have been performed and are well summarized in references <sup>58,59,60</sup>. However, the nonlinear reduction for the sinusoidal case is not clearly reported in the literature. This chapter shows and explains how $K_{PD}$ can increase even for higher values of incoming jitter or channel loss when sinusoidal jitter is taken into account. To that end, an extraction procedure for the $K_{PD}$ gain is first implemented.

JRI LEE, K. S. KUNDERT, and B. RAZAVI. "Modeling of jitter in bang-bang clock and data recovery circuits". In: *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference, 2003.* 2003, pp. 711–714. DOI: 10.1109/CICC.2003.1249492.

Jri LEE, K. S. KUNDERT, and B. RAZAVI. "Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits". In: *IEEE Journal of Solid-State Circuits* 39.9 (2004), pp. 1571–1580. DOI: 10.1109/JSSC.2004.831600.

A. GABR and T. KWASNIEWSKI. "Unifying Approach for Jitter Transfer Analysis of Bang-Bang CDR Circuits". In: *Electronics and Information Engineering (ICEIE), 2010 International Conference On.* Vol. 2. 2010, pp. V2–40–V2–44. DOI: 10.1109/ICEIE.2010.5559711.

The diagram of the $K_{PD}$ extraction procedure is shown in Fig. 10. The TX module represents the transmitter. This module contains a clocked, programmable pseudorandom binary sequence (PRBS) generator; in this work a PRBS-7 is used. The clock is generated by the *Clk* module, and a clean or noisy clock can be selected through the *Noise sources* routines. The random data is then passed to the testing block, composed of *Test for BBPD*, a BBPD implementation, and another *Clk* module. To run a time-domain simulation of the procedure shown in Fig. 10, a proper time step must be selected; an oversampling ratio (OSR) greater than 2 is suggested. The *Test for BBPD* block takes the data and clock and stimulates the BBPD by shifting the clock phase over all the phases specified in the system. The output average is taken and saved to compute one point of the BBPD transfer curve. The *Post-processing* block calculates the $K_{PD}$ gain. The *Noise sources* block allows any of the three types of noise mentioned before to be selected in order to generate a noisy clock signal.

Figure 10. General view of the implementation.

### 2.3. IMPACT OF CHANNEL LOSS ON $K_{PD}$

Channel loss is modelled with a simple linear first-order low-pass filter.
This filter is characterized by a DC gain equal to 1 and a cutoff frequency denoted by $f_c$. A more precise channel model is out of the scope of this work, but the linear filter is enough to extract some results on the impact of channel loss on the performance of the CDR system. The input jitter magnitudes used throughout the rest of this work are reasonable values based on the jitter budget of the USB 3.1 standard <sup>7</sup>.

**2.3.1. Channel Loss with Gaussian Noise**

Several cases are evaluated for different levels of Gaussian jitter noise. Fig. 11 shows the behavior of $K_{PD}$ as a function of the input data degradation, which is quantified as the ratio between the channel cutoff frequency $f_c$ and the data rate (Drate), denoted by $f_c/Drate$. In this test, Drate is equal to 10 Gb/s and $\sigma = [0.03, 0.04, 0.05]$ UI. Simulations are performed using the extraction procedure of Section 2.2.

Figure 11. $K_{PD}$ dependence on $f_c/Drate$ taking into account Gaussian jitter noise and channel loss.

The flat region of the curves corresponds to low channel loss, and the $K_{PD}$ values obtained in this region differ because of the different noise levels used. As the loss increases (low $f_c/Drate$), the gain decreases because many transitions are lost in the sampling process of the BBPD, especially for runs of Nyquist data (10101...). For example, $K_{PD}$ decreases from 11.5 per UI at $f_c = 4$ GHz to only 2 per UI at $f_c = 2$ GHz for a $\sigma = 0.03$ UI noise level. However, at low $f_c/Drate$ the gain also becomes higher for higher noise values. The explanation of this effect is postponed until Section 2.3.5; for now, it is enough to note that it is due to the nature of the channel loss.

**2.3.2. Channel Loss with Uniform Noise**

Fig. 12 shows the simulation results when only the uniform noise is considered.
In this case, the injected jitter levels are $Dj_{pp}=[0.3,0.4,0.5]$ UI and Drate corresponds to 10 Gb/s. For low channel loss, $K_{PD}$ is higher for less injected noise; however, the gain falls drastically when $f_c/Drate \leq 0.25$ for all noise levels. In addition, the gain is no longer higher for less noise; in some cases channel loss makes the gain higher for higher injected noise. For example, $K_{PD}=2.1$ per UI when $f_c/Drate=0.18$ and $Dj_{pp}=0.2$ UI, but for the same $f_c/Drate$ and $Dj_{pp}=0.3$ UI the gain increases slightly to 2.5 per UI. Thus, as in the Gaussian case, the behavior of the gain for high levels of channel loss is not easy to predict.

Figure 12. $K_{PD}$ dependence on $f_c/Drate$ taking into account uniform jitter noise and channel loss.

**2.3.3. Channel Loss with Sinusoidal Noise**

Fig. 13 shows simulation results for sinusoidal jitter noise. Here, the gain first increases with channel loss before it starts to fall; this is not the expected behavior and shows up explicitly as peaks in the curves. Below some point, different for each noise level, the gain decreases considerably. Also, as with Gaussian and uniform noise, the gain increases with the injected noise level at low $f_c/Drate$ values.

Figure 13. $K_{PD}$ dependence on $f_c/Drate$ taking into account sinusoidal jitter noise and channel loss.

The presence of these peaks as the channel loss increases is due to the nature of the sinusoidal jitter noise. An interesting explanation arises when the probability density function (PDF) of the noise is studied. The convolution of the noise PDFs present in the system allows $K_{PD}$ to be extracted theoretically <sup>58</sup>. The $K_{PD}$ gain corresponds to the value of this convolution at 0 UI <sup>59,33</sup>.
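As a rough illustration of the convolution-at-0-UI rule, the following minimal sketch convolves binned jitter PDFs numerically and reads the density at 0 UI. It is only a sketch under stated assumptions, not the extraction procedure used in this work: the function names, the bin width and the span are hypothetical choices, and the jitter values in the demo ($Sj_{pp}=0.4$ UI, a small Gaussian term of 0.02 UI, several $Dj_{pp}$ values) are illustrative. Each PDF is represented by the probability mass of its bins, computed from the CDF, so that the singular tails of the sinusoidal PDF never have to be sampled directly.

```python
import math

# Closed-form CDFs of the three jitter types discussed in the text.
def uniform_cdf(x, d_pp):
    return min(1.0, max(0.0, x / d_pp + 0.5))

def sinusoidal_cdf(x, s_pp):
    u = min(1.0, max(-1.0, 2.0 * x / s_pp))
    return 0.5 + math.asin(u) / math.pi

def gaussian_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def bin_masses(cdf, half_bins, step):
    """Probability mass of bins centred at k*step, k = -half_bins..half_bins."""
    return [cdf((k + 0.5) * step) - cdf((k - 0.5) * step)
            for k in range(-half_bins, half_bins + 1)]

def convolve(a, b):
    """Plain discrete convolution of two mass vectors."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def gain_at_zero(cdfs, half_span=0.5, step=0.005):
    """Approximate density of the summed jitter at 0 UI (gain per UI).

    half_span and step (both in UI) are assumed grid settings.
    """
    half_bins = round(half_span / step)
    total = bin_masses(cdfs[0], half_bins, step)
    for cdf in cdfs[1:]:
        total = convolve(total, bin_masses(cdf, half_bins, step))
    centre = (len(total) - 1) // 2      # bin centred at 0 UI
    return total[centre] / step         # mass -> density

if __name__ == "__main__":
    for dj in (0.2, 0.3, 0.4, 0.5):
        k = gain_at_zero([
            lambda x: sinusoidal_cdf(x, 0.4),
            lambda x, d=dj: uniform_cdf(x, d),
            lambda x: gaussian_cdf(x, 0.02),
        ])
        print(f"Dj_pp = {dj:.1f} UI -> gain at 0 UI = {k:.3f} per UI")
```

Working with bin masses instead of sampled PDF values is a design choice made here for robustness; a finer `step` trades run time for accuracy.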
Due to the asymptotic behavior in the tails of a sinusoidal PDF, the total convolution of all types of noise present in the system shows an irregular behavior at 0 UI. For example, Fig. 14 presents the results obtained when sinusoidal and uniform noise are combined, which is a first approximation to the case where sinusoidal jitter is injected into data corrupted by channel loss. In this case, the sinusoidal jitter is fixed at $Sj_{pp}=0.4$ UI and the uniform $Dj_{pp}$ ranges from 0.2 to 0.5 UI; a low $R_j$ is also added only to smooth the curves. The total-convolution curves show that $K_{PD}$ increases even when the uniform jitter increases, as seen for $Dj_{pp}$ from 0.2 to 0.4 UI. This behavior is highlighted with an extra curve that plots the $K_{PD}$ values taken from the convolution as a function of the uniform noise. Finally, at some point between 0.4 and 0.5 UI the gain reaches its maximum and then decreases. Therefore, the gain peaking observed as channel loss increases is due to the interaction between this loss and the sinusoidal noise.

Figure 14. Total convolution of PDFs fixing $Sj_{pp} = 0.4$ and $R_j = 0.02$. Upper-right plot indicates $K_{PD}$ values as a function of $Dj_{pp}$.

**2.3.4. Impact on the CDR Dynamics**

The unexpected behavior of $K_{PD}$ for high channel loss with sinusoidal noise impacts the dynamics of the system. Here, the case for $Sj_{pp}=0.1$ UI presented in Fig. 13 is exercised for no channel loss and for $f_c/Drate=0.35$, which corresponds to the peaking in $K_{PD}$. Parameters other than $K_{PD}$ in the model of Fig. 8 are taken from the 5 Gb/s experiment presented in <sup>15</sup>. The jitter transfer function (JTF) for the digital CDR model is:

$$J_{TF} = \frac{H_o}{1 + H_o},\tag{2}$$

where $H_o$ is the open-loop gain given by Eq. (1). The results are presented in Fig. 15.
In the first case, no channel loss is considered and the associated $K_{PD}$ is 6.6 per UI (flat region in Fig. 13), producing a frequency response with a 1 MHz bandwidth. In contrast, the case for $f_c/Drate = 0.35$ produces a $K_{PD}$ of 8.5 per UI and a bandwidth of approximately 1.3 MHz. These results show that the CDR bandwidth is higher even with more channel loss, an unexpected result that is not reported in the literature. The jitter tolerance function (JTOL) is given by the following equation:

$$J_{TOL}(z) = \left| \frac{\gamma}{1 - J_{TF}(z)} \right|,\tag{3}$$

where $\gamma$ is the timing margin in the data eye in terms of UI and $J_{TF}$ is given by Eq. (2). At high frequencies JTOL is limited by the $\gamma$ margin and is directly related to the amount of noise; Fig. 15(b) shows that the margin is smaller for higher noise levels. However, at low frequencies the case that corresponds to higher sinusoidal noise presents a higher JTOL.

Figure 15. Impact of channel loss reflected on a) JTF and b) JTOL.

**2.3.5. Channel Loss Probability Density Function**

Jitter due to channel loss is a type of deterministic noise, but modelling it with merely a uniform PDF does not explain the other interesting behavior observed at low values of $f_c/Drate$ in Figs. 11, 12 and 13. For some low values of $f_c/Drate$ the channel loss seems to be dominant and the gain is higher even for greater injected noise. This phenomenon suggests that channel loss is not well modelled by a uniform distribution. For this reason, the actual PDF of the channel loss implemented here is extracted and added to the total convolution of PDFs in order to explain the results observed in time simulations at low levels of $f_c/Drate$. Time simulations are used to extract the jitter noise due only to channel loss; then, a fitting procedure is applied to obtain the PDF.
To validate the implemented model, the theoretical extraction of $K_{PD}$ is contrasted with simulation results obtained with the extraction procedure. For instance, Fig. 16 shows the region of low $f_c/Drate$ for the Gaussian case of Fig. 11. In this region, $K_{PD}$ is no longer lower for higher injected noise. The total convolution, now including the extracted PDF for channel loss, is added to the plot in order to show the correlation with the time simulations. Fig. 16 corresponds to Gaussian noise plus channel loss; in this figure $f_c/Drate \approx 0.23$ was selected for explanation, and this value corresponds to a set of three PDFs, one for each Gaussian noise condition. The results of the time simulations (left) match those obtained with the convolution approach (right); thus, the model used for channel loss is better than a uniform PDF alone and can explain the unexpected behavior at low $f_c/Drate$ levels.

Figure 16. Time simulation results vs convolution approach for Gaussian noise at low $f_c/Drate$ levels. The Conv graph on the right corresponds to the convolution of the Gaussian PDF and the extracted PDF at $f_c/Drate \approx 0.23$.

### 2.4. SUMMARY

An extraction procedure was used to obtain the actual value of $K_{PD}$ under different conditions of incoming jitter and channel loss. The non-evident increase of $K_{PD}$ in some cases where the incoming jitter also increases is explained through the extraction and analysis of the PDF for channel loss. Also, an increase of $K_{PD}$ when sinusoidal and uniform jitter are combined is explained, and its impact on the CDR dynamic response is presented. As a final comment, the maximum $K_{PD}$ value is not always reached at 0 UI, which suggests that, for some conditions, the phase sampling point of the data can be moved from 0 UI to the point where the maximum occurs, improving the CDR response.

# 3. STOCHASTIC RESONANCE IN BANG-BANG PHASE DETECTOR BASED CDR

### 3.1. INTRODUCTION

Digital phase-locked loop (DPLL) based CDR systems are widely used in high-speed link applications, e.g., USB 3.1, PCI Express and SATA, because of their power efficiency, flexibility and effective functionality in the Gb/s regime compared with their analog counterparts <sup>15,32</sup>. DPLL-based CDRs commonly use a bang-bang phase detector (BBPD) as the comparison block in the feedback loop, which turns this type of CDR into a nonlinear system. However, works reported in the literature usually employ a discrete-domain linear model, presented in Fig. 17, as a common practice to define initial design parameters, where $\phi_{in}$ and $\phi_{out}$ represent the input data phase and output clock phase, respectively. As a result, nonlinear effects might go unnoticed and the design parameters might not account for undesired operation.

The dynamics of the CDR system are strongly related to the BBPD gain ($K_{BB}$), which is a sensitive parameter that depends on the input jitter noise. Due to the nonlinear nature of the CDR and the different types of noise that can appear at the input data, a phenomenon called stochastic resonance might appear. Although this phenomenon has been reported and discussed in <sup>61</sup> in reference to bang-bang PLLs, its modeling and impact on DPLL-based CDRs have not been studied. In this work we propose an analytical model and validate the results on a CDR design with specifications from the USB 3.1 standard <sup>7</sup>.

G. MARUCCI et al. "Exploiting Stochastic Resonance to Enhance the Performance of Digital Bang-Bang PLLs". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 60.10 (2013), pp. 632–636. DOI: 10.1109/TCSII.2013.2273732.

Figure 17. Discrete-time linear model for typical DPLL-based CDR.

### 3.2. STOCHASTIC RESONANCE

Stochastic resonance (SR) is a physical phenomenon observed in many fields of science, in either natural or artificial systems <sup>62,61,63</sup>.
The most distinctive characteristic of SR is the enhancement of a system performance indicator due to the superposition of two or more types of noise <sup>62</sup>. In fact, this indicator presents a maximum at a single nonzero noise level, as seen in Fig. 18, in which $K_{1,2}(NL)$ represents the noise effect on some parameter $K$ as a function of the noise level (NL). It can be shown that, in different noise level regions, one of the components can dominate the overall performance; thus the total response can be viewed as a piecewise-defined function. This behavior resembles the plot of a frequency-dependent system with an output response at its resonance frequency, from which the name is derived. In this work, the system performance parameter is related to $K_{BB}$ and the noise level is related to the input jitter.

Figure 18. System performance vs Noise level.

M. D. MCDONNELL. "Is Electrical Noise Useful? [Point of View]". In: *Proceedings of the IEEE* 99.2 (2011), pp. 242–246. DOI: 10.1109/JPROC.2010.2090991.

D. G. LUCHINSKY et al. "Stochastic Resonance in Electrical Circuits. I. Conventional Stochastic Resonance". In: *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing* 46.9 (1999), pp. 1205–1214. DOI: 10.1109/82.793710.

Two main necessary, but not sufficient, conditions have been recognized in systems with SR <sup>62</sup>. First, the system must be a strongly nonlinear dynamical system. In a linear system, the performance indicator changes inversely with the input noise; hence, any increase in the input noise results in a decrease of this indicator. The second condition is the presence of a random-noise source within the system, since introducing noise into a system without inherent randomization is unlikely to produce any performance improvement. In high-speed serial-link interfaces, a DPLL-based CDR is a clear example of a system where SR is present.
A BBPD is a highly nonlinear system and suffers from inherent sources of random noise <sup>61,63</sup>. Widely accepted types of noise in industry applications include Gaussian, uniform and sinusoidal random noise. As illustrated in the DPLL-based CDR shown in Fig. 19, the input jitter of the incoming data modulates the $K_{BB}$ value. In this context, it should be mentioned that the SR phenomenon is more noticeable for the sinusoidal case, which in combination with channel loss yields a deterministic-like noise. For a more general case, where the channel loss is modeled by a first-order system and sinusoidal jitter is injected, it can be shown that stochastic resonance behavior is obtained. The general case with channel loss and sinusoidal noise has been presented in <sup>64</sup>, but a formal derivation of a mathematical model has not been given. In order to explain and understand how the SR behavior arises in CDR systems, the modeling for the case of uniform and sinusoidal input noise is presented, as well as the impact on the dynamic response of these systems.

J. ARDILA and E. ROA. "On the Impact of Channel Loss on CDR Locking". In: *2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS)*. 2016, pp. 1–4. DOI: 10.1109/MWSCAS.2016.7870075.

Figure 19. Typical DPLL-based CDR.

### 3.3. MATHEMATICAL APPROACH

The $K_{BB}$ gain can be obtained by extracting the value at 0 UI of the convolution of the probability density functions (PDFs) associated with the input jitter components <sup>33</sup>. For the discussion of the SR phenomenon, uniform and sinusoidal jitter are selected; they are enough to explain the non-evident behavior of $K_{BB}$ as the input jitter increases. Fig. 20 shows the PDFs of these types of jitter.

Figure 20. PDFs for uniform and sinusoidal jitter.
The mathematical expression for the uniform PDF is given by:

$$p_1(x) = \frac{1}{D_{pp}} [u(x + D_{pp}/2) - u(x - D_{pp}/2)], \tag{4}$$

where $D_{pp}$ is the peak-to-peak value of the PDF and represents the maximum span that the jitter can reach. Note that as $D_{pp}$ increases, the value at $x=0$, $p_1(0)=\frac{1}{D_{pp}}$, decreases, which means that a lower $K_{BB}$ is expected in this scenario. For the sinusoidal jitter, the PDF expression is given by:

$$p_2(x) = \frac{1}{\pi \sqrt{(S_{pp}/2)^2 - x^2}},\tag{5}$$

for $|x| < S_{pp}/2$ and zero elsewhere, where $S_{pp}$ is the peak-to-peak value of the sinusoidal jitter. This value corresponds to the peak-to-peak amplitude of a sinusoidal waveform whose amplitude represents the amount of jitter over time. For $x=0$ this expression reduces to $p_2(0)=\frac{2}{\pi S_{pp}}$, which also decreases as $S_{pp}$ increases. Hence, the total $K_{BB}$ gain can be estimated as follows:

$$y(x) = p_1(x) * p_2(x) = \int_{-\infty}^{\infty} p_1(\tau) p_2(x - \tau) d\tau, \tag{6}$$

$$K_{BB} = y(0) = \int_{-\infty}^{\infty} p_1(\tau) p_2(-\tau) d\tau, \tag{7}$$

and, because the functions are even,

$$K_{BB} = \int_{-\infty}^{\infty} p_1(\tau) p_2(\tau) d\tau. \tag{8}$$

Depending on the actual values of $D_{pp}$ and $S_{pp}$, this integral has two different solutions. This is the key condition that allows SR to appear.

**Case for** $D_{pp} > S_{pp}$

When the uniform jitter is dominant, Eq. (8) reduces to:

$$K_{BB} = \int_{-S_{pp}/2}^{S_{pp}/2} \frac{1}{D_{pp}\pi\sqrt{(S_{pp}/2)^2 - \tau^2}} d\tau, \tag{9}$$

with the solution

$$K_{BB} = \frac{1}{D_{pp}}. \tag{10}$$

This result shows that if $S_{pp}$ is fixed and smaller than $D_{pp}$, then increasing $D_{pp}$ decreases the $K_{BB}$ gain. This agrees with the preliminary idea that the gain of the BBPD is reduced when the magnitude of the input jitter increases.
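The step from Eq. (9) to Eq. (10) relies on the arcsine antiderivative. Spelled out, with $a = S_{pp}/2$ introduced here only as shorthand:

$$K_{BB} = \frac{1}{D_{pp}\pi}\int_{-a}^{a}\frac{d\tau}{\sqrt{a^2-\tau^2}} = \frac{1}{D_{pp}\pi}\left[\sin^{-1}\left(\frac{\tau}{a}\right)\right]_{-a}^{a} = \frac{1}{D_{pp}\pi}\left(\frac{\pi}{2}+\frac{\pi}{2}\right) = \frac{1}{D_{pp}}.$$

In other words, when the sinusoidal PDF lies entirely inside the uniform one, the integral collects the full unit area of $p_2$ scaled by the constant height $1/D_{pp}$, so $S_{pp}$ drops out of the result.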
**Case for** $D_{pp} < S_{pp}$

In this case, the dominant noise is sinusoidal and Eq. (8) reduces to:

$$K_{BB} = \int_{-D_{pp}/2}^{D_{pp}/2} \frac{1}{D_{pp}\pi\sqrt{(S_{pp}/2)^2 - \tau^2}} d\tau. \tag{11}$$

This integral has an analytical solution, given by

$$K_{BB} = \frac{2}{\pi D_{pp}} \sin^{-1} \left( \frac{D_{pp}}{S_{pp}} \right). \tag{12}$$

This is an increasing function of $D_{pp}$ for a fixed $S_{pp}$; therefore, in this case $K_{BB}$ increases as the input jitter increases through the $D_{pp}$ value. Although the qualitative result is clearly reported in <sup>64</sup>, here it is demonstrated with a mathematical model and simulations.

In Fig. 21, $K_{BB}$ is plotted as a function of $D_{pp}$ for three different fixed values of $S_{pp}$. As $D_{pp}$ changes from 0.1 to 0.7 for a fixed $S_{pp}$, the $K_{BB}$ gain first rises, reaches a maximum at $D_{pp} = S_{pp}$, and then falls following the asymptotic shape of a $\frac{1}{x}$ function. Note that in all cases the $K_{BB}$ gain reaches its maximum value at $D_{pp} = S_{pp}$, which corresponds to the transition point between the two regions (Eqs. 10 and 12) that describe the gain behavior.

Figure 21. Mathematical model.

In the next section, time-step simulation results are presented in order to validate the proposed mathematical approach.

### 3.4. SIMULATION RESULTS

In order to validate the model, the mathematical model presented here is compared with numeric simulations. The numeric simulations implement the CDR system of Fig. 17 as a discrete model described by difference equations. These equations are used in a time-step simulation, where the nonlinear behavior of the BBPD is captured through a sign function applied to the phase difference between clock and data. Two cases are presented in Fig. 22, for $S_{pp}=0.3$ and 0.5 UI, with $D_{pp}$ swept from 0.1 to 0.8 UI. The region where $D_{pp} < S_{pp}$ is described by Eq. 12 and is well fitted by the mathematical model; the same occurs for $D_{pp} > S_{pp}$, where the behavior follows Eq. 10. However, the maximum error between the equations and the simulations occurs right at the transition point corresponding to the maximum $K_{BB}$; this is due to the numerical error introduced by the finite step size in the numeric simulation and to the limited number of points used. As expected, the maximum value is reached for the condition $D_{pp} = S_{pp}$, which happens at different places for the two tested cases. In addition, the apparently noisy characteristic seen in the time-step curves is related to the numerical nature of the simulations, because the CDR works on averaged signals.

Figure 22. Mathematical model vs Simulations.

**3.4.1. Impact on CDR frequency response**

The behavior of $K_{BB}$ due to SR when sinusoidal and uniform jitter noise are combined impacts the CDR dynamics. The system of Fig. 19 is exercised with the $K_{BB}$ values of the curve in Fig. 21 that corresponds to $S_{pp}=0.2$ UI, varying $D_{pp}$ from 0.1 to 0.6 UI. Parameters other than $K_{BB}$ in the model of Fig. 17 use the following nominal design values: $K_V=2$, $P=5*2^{-5}$, $F=2^{-11}$, $K_{DPC}=2^{-8}$ and $N=20$ UI, where $K_V$, $K_{DPC}$, $P$ and $F$ are the majority voting gain, digital-to-phase converter (DPC) gain, and proportional and frequency path gains, respectively. The parameter $N$ represents the latency of the whole loop. These parameters were selected in order to meet the golden PLL mask of the USB 3.1 Gen1 standard $^7$, as an example to validate the model and to demonstrate that SR can arise in standard applications. The open loop transfer function for the linear system of Fig.
17 is given by:

$$H(z) = \left(\frac{K_{BB}K_{V}K_{DPC}}{1 - z^{-1}}\right) \left(P + \frac{F}{1 - z^{-1}}\right) z^{-N}. \tag{13}$$

The jitter transfer function (JTF) for the digital CDR model is given by:

$$JTF(z) = \frac{H(z)}{1 + H(z)}. \tag{14}$$

The results are shown in Fig. 23, where the blue plot corresponds to the nominal design values and the whole space generated by SR is highlighted as a gray region. For the nominal case, only Gaussian noise described by $\sigma=0.02$ UI is assumed, according to the standard, which corresponds to $K_{BB}=19.94$ per UI. Once the SR conditions presented in this work are taken into account, the $P$ and $F$ gains must be adjusted to $30*2^{-5}$ and $6*2^{-11}$, respectively, in order to set the golden PLL at one of the $K_{BB}$ values evaluated in the test. With this adjustment the same transfer function is obtained at $K_{BB}=3.3$ per UI. It is important to note how the bandwidth (BW) of the CDR system is modified by the SR phenomenon: it goes from 3.6 MHz for $K_{BB}=1.5$ to 17.6 MHz for $K_{BB}=5$. Not only the BW is altered; the peaking and stability can also be affected, with the peaking going from 1.68 dB for low $K_{BB}$ to 2.95 dB for the highest $K_{BB}$. In this last case, the peaking does not satisfy the USB 3.1 Gen 1 specification, which means that SR can degrade performance to a level that takes the CDR system out of specification.

Figure 23. JTF frequency response.

**3.4.2. Impact on CDR jitter tolerance function**

The jitter tolerance function (JTOL) is defined as

$$JTOL(z) = \left| \frac{\gamma}{1 - JTF(z)} \right|. \tag{15}$$

This function depends on the input noise through the term $\gamma$, which is the timing margin in the data eye diagram in terms of UI. As the input jitter increases, the timing margin is reduced regardless of SR. In contrast, $JTF(z)$ (Eq. 14) can be strongly affected by SR, as shown in the previous section.
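Equations (13) to (15) can be evaluated numerically on the unit circle. The following minimal sketch does so with the adjusted gains quoted above ($P=30*2^{-5}$, $F=6*2^{-11}$, $K_V=2$, $K_{DPC}=2^{-8}$, $N=20$). It is an illustration under assumptions rather than the simulation flow used in this work: the function names are hypothetical, and frequency is expressed as a fraction of the loop update rate because the update rate itself is not stated here, so the returned bandwidth is normalized and not in MHz.

```python
import cmath
import math

def open_loop(z, k_bb, k_v, k_dpc, p, f, n):
    """Open-loop transfer function H(z) of Eq. (13)."""
    acc = 1.0 / (1.0 - 1.0 / z)          # accumulator 1 / (1 - z^-1)
    return k_bb * k_v * k_dpc * acc * (p + f * acc) * z ** (-n)

def jtf(z, **loop):
    """Jitter transfer function of Eq. (14)."""
    h = open_loop(z, **loop)
    return h / (1.0 + h)

def jtol(z, gamma, **loop):
    """Jitter tolerance of Eq. (15); gamma is the timing margin in UI."""
    return abs(gamma / (1.0 - jtf(z, **loop)))

def z_at(f_norm):
    """Point on the unit circle for a frequency normalized to the update rate."""
    return cmath.exp(2j * math.pi * f_norm)

def bandwidth_and_peaking(points=4000, **loop):
    """First -3 dB crossing (normalized frequency) and peaking in dB."""
    f_lo, f_hi = 1e-6, 0.5               # assumed log-spaced sweep range
    peak, bw = 0.0, None
    for i in range(points):
        f_norm = f_lo * (f_hi / f_lo) ** (i / (points - 1))
        mag = abs(jtf(z_at(f_norm), **loop))
        peak = max(peak, mag)
        if bw is None and mag < 1.0 / math.sqrt(2.0):
            bw = f_norm
    return bw, 20.0 * math.log10(peak)

if __name__ == "__main__":
    loop = dict(k_v=2.0, k_dpc=2 ** -8, p=30 * 2 ** -5, f=6 * 2 ** -11, n=20)
    for k_bb in (1.67, 3.3, 5.0):
        bw, pk = bandwidth_and_peaking(k_bb=k_bb, **loop)
        print(f"K_BB = {k_bb}: BW = {bw:.5f} x update rate, peaking = {pk:.2f} dB")
```

Multiplying the normalized bandwidth by the actual update rate of a given design gives the bandwidth in hertz; the same `jtol` helper can be swept over `z_at(f_norm)` to compare against a tolerance mask.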
Therefore, for the JTOL test, several cases listed in Table 1 are presented instead of a region, with the aim of understanding the impact of SR on the JTOL function. Each transfer function resulting from Eq. 15 with the values listed in Table 1 is plotted in Fig. 24, including the jitter tolerance mask of the USB 3.1 Gen1 standard. When the noise is high and $K_{BB}$ is reduced, the JTOL response may fail to meet the jitter tolerance mask specified in the standard. However, notice that before this happens, the tolerance can even improve as the noise increases, as is the case at low frequencies. At very high frequencies the JTOL function always reaches the $\gamma$ value and SR has no impact.

Table 1. Conditions tested in JTOL performance.

| $S_{pp}$ [UI] | $D_{pp}$ [UI] | $\gamma$ [UI] | $K_{BB}$ [per UI] |
|---------------|---------------|---------------|-------------------|
| 0.2 | 0.1 | 0.71 | 3.3 |
| 0.2 | 0.15 | 0.67 | 3.6 |
| 0.2 | 0.2 | 0.63 | 5.0 |
| 0.2 | 0.4 | 0.422 | 2.5 |
| 0.2 | 0.6 | 0.2 | 1.67 |

Figure 24. Jitter Tolerance Response.

### 3.5. SUMMARY

A mathematical model for the $K_{BB}$ value when uniform and sinusoidal jitter noise are combined in a DPLL-based CDR was presented and validated through time-step simulations. SR is demonstrated under the interaction between these two types of noise, with $K_{BB}$ presenting a maximum even when one of the noise components is increasing. The impact on the JTF response is discussed, and it is shown how SR can degrade the dynamics and stability of CDR systems. Finally, at low frequencies SR can have a positive impact on the JTOL function in some cases, whereas it has no effect on the high-frequency response.

# 4. MODELING AND DESIGN METHODOLOGY

### 4.1. DPLL-CDR MODELING

**4.1.1. Linear Frequency Model:** the z-model

A digital phase-locked loop based CDR (DPLL-CDR) can be modeled as the linear model shown in Fig.
25 when the small-signal dynamics around the locking condition are of interest $^{15}$. In the model, $K_{BB}$ represents the bang-bang phase detector (BBPD) gain, $K_V$ the equivalent decimation gain of the majority voting (MJV) function after the BBPD, $K_P$ and $K_F$ the proportional and integral path gains, and $K_{DPC}$ the digital-to-phase converter (DPC) gain. Because of the latency, it is necessary to introduce the total equivalent delay around the loop as $N_L$. In terms of jitter sources, $\Psi_{IN}$ is the input data jitter, $J_{Q,BB}$ and $J_{Q,MV}$ are the quantization noise of the BBPD and MJV blocks, respectively, and $J_{PI}$ corresponds to the total jitter contribution coming from the phase interpolator (PI). A PI is used as the DPC in this model. It is important to note that $J_{PI}$ includes quantization noise from the PI and noise from PLL sources. Finally, the output quantities labeled $\Psi_{CK}$ and $\Psi_{ER}$ represent the recovered-clock phase and the error phase, respectively. These outputs allow us to infer and characterize the dynamics of the DPLL-CDR.

Figure 25. CDR discrete linear frequency model.

In order to obtain $\Psi_{ER}(f)$, the loop gain transfer function of the model in Fig. 25 is first calculated as follows:

$$L_G(z) = K_{BB}K_V \left( K_P + \frac{K_F}{1 - z^{-1}} \right) \frac{K_{DPC}}{1 - z^{-1}} z^{-N_L}. \tag{16}$$

The input-output $H_{CK}(z)$ and input-error $H_{ER}(z)$ transfer functions are defined by:

$$H_{CK}(z) = \frac{L_G(z)}{1 + L_G(z)}, \tag{17}$$

$$H_{ER}(z) = \frac{1}{1 + L_G(z)}. \tag{18}$$

All the noise contributions to the phase error $\Psi_{ER}(z)$ may be expressed as
$$\Psi_{ER_{|IN}}(z) = H_{ER}(z)\Psi_{IN}(z), \tag{19}$$

$$\Psi_{ER_{|Q,BB}}(z) = \frac{-H_{CK}(z)}{K_{BB}} J_{Q,BB}(z), \tag{20}$$

$$\Psi_{ER_{|Q,MV}}(z) = \frac{-H_{CK}(z)}{K_{BB}K_{V}} J_{Q,MV}(z), \tag{21}$$

$$\Psi_{ER_{|PI}}(z) = -H_{ER}(z)J_{PI}(z). \tag{22}$$

With the above, a total expression for the phase error can be obtained by adding the contribution of each jitter source:

$$\Psi_{ER}(f) = \Psi_{ER_{|IN}}(f) + \Psi_{ER_{|Q,BB}}(f) + \Psi_{ER_{|Q,MV}}(f) + \Psi_{ER_{|PI}}(f). \tag{23}$$

Similarly, the noise contributions to the recovered clock phase $\Psi_{CK}(z)$ may be written as

$$\Psi_{CK_{|IN}}(z) = H_{CK}(z)\Psi_{IN}(z), \tag{24}$$

$$\Psi_{CK_{|Q,BB}}(z) = \frac{H_{CK}(z)}{K_{BB}} J_{Q,BB}(z), \tag{25}$$

$$\Psi_{CK_{|Q,MV}}(z) = \frac{H_{CK}(z)}{K_{BB}K_{V}}J_{Q,MV}(z), \tag{26}$$

$$\Psi_{CK_{|PI}}(z) = H_{ER}(z)J_{PI}(z). \tag{27}$$

The total expression for the recovered clock phase is obtained from the contribution of each jitter source:

$$\Psi_{CK}(f) = \Psi_{CK_{|IN}}(f) + \Psi_{CK_{|Q,BB}}(f) + \Psi_{CK_{|Q,MV}}(f) + \Psi_{CK_{|PI}}(f). \tag{28}$$

Usually, the major contributions in Eqs. (23) and (28) correspond to the input jitter noise $\Psi_{CK_{|IN}}(f)$ coming from the input data and the jitter $\Psi_{CK_{|PI}}(f)$ coming from the PI $^{65,66}$. These jitter components are shaped by either $H_{CK}(z)$ or $H_{ER}(z)$, which share the same denominator. Thus, the following analysis focuses on the $H_{CK}(z)$ transfer function without loss of generality, and it still yields useful conclusions on which to build a design methodology framework. From Eqs. (16) and (17),

$$H_{CK}(z) = \frac{K_1 \left( K_P (1 - z^{-1}) + K_F \right) z^{-N_L}}{(1 - z^{-1})^2 + K_1 \left( K_P (1 - z^{-1}) + K_F \right) z^{-N_L}}, \tag{29}$$

with

$$K_1 = K_{BB}K_VK_{DPC}. \tag{30}$$

J. LEE, J. YOON, and H. BAE. "A 10-Gb/s CDR With an Adaptive Optimum Loop-Bandwidth Calibrator for Serial Communication Links". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 61.8 (2014), pp. 2466–2472. DOI: 10.1109/TCSI.2014.2309861.

J. LIANG et al. "Loop Gain Adaptation for Optimum Jitter Tolerance in Digital CDRs". In: *IEEE Journal of Solid-State Circuits* 53.9 (2018), pp. 2696–2708. DOI: 10.1109/JSSC.2018.2839038.

In order to extract initial insight for design, the analysis is simplified by means of an approximate continuous-time equivalent, obtained with the backward difference ($z^{-1} \approx 1 - sT$) and assuming $sT \ll 1$ at the frequencies of interest. This yields the transformation $H_{CK}(z) \to H_{CK}(s)$:

$$H_{CK}(s) = \frac{K_1(TK_P s + K_F)(1 - TN_L s)}{T^2 s^2 + K_1(TK_P s + K_F)(1 - TN_L s)}, \tag{31}$$

where the following approximation, also valid at the frequencies of interest, is used to simplify the analysis:

$$(1 - sT)^{N_L} \approx 1 - TN_L s. \tag{32}$$

Eq. (31) can be viewed as a second-order system characterized by the natural frequency $\omega_n$ and the damping factor $\zeta$:

$$H(s) = \frac{\alpha s^2 + 2\zeta \omega_n s + \omega_n^2}{s^2 + 2\zeta \omega_n s + \omega_n^2}, \tag{33}$$

$$\omega_n = \frac{1}{T} \sqrt{\frac{K_1 K_F}{1 - K_1 K_P N_L}}, \tag{34}$$

$$\zeta = \frac{\sqrt{K_1} K_P}{2\sqrt{K_F}} \frac{1 - \frac{K_F N_L}{K_P}}{\sqrt{1 - K_1 K_P N_L}}, \tag{35}$$

$$\alpha = \frac{-K_1 K_P N_L}{1 - K_1 K_P N_L}. \tag{36}$$

Equations (34) and (35) impose the conditions for stability in (33), because $\omega_n$ cannot be imaginary and $\zeta > 0$ is mandatory; otherwise right-half-plane poles appear. Thus,

$$1 - K_1 K_P N_L > 0 \tag{37}$$

and

$$1 - \frac{K_F N_L}{K_P} > 0. \tag{38}$$

Then, a valid interval for $K_P$ that ensures stability can be obtained from Eqs. (37) and (38),

$$K_F N_L < K_P < \frac{1}{K_1 N_L}, \tag{39}$$

where the implicit condition $K_F N_L < \frac{1}{K_1 N_L}$ must be satisfied to guarantee the existence of a valid interval.
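The relations in Eqs. (34) to (39) are straightforward to evaluate for a candidate set of gains. The sketch below is a minimal helper under stated assumptions: the function names are hypothetical, the update period $T$ is normalized to 1, and the numeric values in the demo are borrowed from the nominal design of Section 3.4.1 purely for illustration.

```python
import math

def kp_interval(k1, k_f, n_l):
    """Valid K_P interval of Eq. (39): K_F*N_L < K_P < 1/(K_1*N_L)."""
    lower, upper = k_f * n_l, 1.0 / (k1 * n_l)
    if lower >= upper:
        raise ValueError("no valid K_P interval: K_F*N_L >= 1/(K_1*N_L)")
    return lower, upper

def second_order_params(k1, k_p, k_f, n_l, t=1.0):
    """Natural frequency, damping factor and alpha from Eqs. (34)-(36)."""
    d = 1.0 - k1 * k_p * n_l                 # must be positive, Eq. (37)
    if d <= 0.0 or k_p <= k_f * n_l:         # Eq. (38) violated otherwise
        raise ValueError("stability conditions (37)/(38) are not met")
    w_n = math.sqrt(k1 * k_f / d) / t
    zeta = (math.sqrt(k1) * k_p / (2.0 * math.sqrt(k_f))
            * (1.0 - k_f * n_l / k_p) / math.sqrt(d))
    alpha = -k1 * k_p * n_l / d
    return w_n, zeta, alpha

if __name__ == "__main__":
    # Illustrative values only (taken from the nominal design in Section 3.4.1).
    k1 = 19.94 * 2.0 * 2 ** -8               # K_BB * K_V * K_DPC, Eq. (30)
    k_p, k_f, n_l = 5 * 2 ** -5, 2 ** -11, 20
    print("K_P interval:", kp_interval(k1, k_f, n_l))
    print("w_n*T, zeta, alpha:", second_order_params(k1, k_p, k_f, n_l))
```

Raising an error when a condition fails, instead of returning a complex $\omega_n$, is a design choice made here so that an out-of-range $K_P$ is caught early in a parameter sweep.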
These results show how the proportional gain is related to stability and bounded by other system parameters, especially the latency of the system. The condition $K_P > K_F N_L$ states that the proportional gain must exceed the integral gain by at least a factor of $N_L$. This relation comes from the damping factor in Eq. (35), which means that the actual value of $K_P/K_F$ controls the peaking in the magnitude response. On the other hand, the condition given by Eq. (37) means that excessive loop gain can compromise the system stability. In both cases, latency degrades stability because $N_L$ shrinks the interval.

The continuous equivalent transfer function $H_{CK}(s)$ allows us to analyze the system in the first step of a design methodology. Note that it is only an approximation of the real discrete digital filter. Despite this, relation (39) gives good results when it is used as a starting point to design $K_P$ for stability. In practice, increasing $K_P$ increases $\omega_n$, extending the CDR bandwidth, as expected from Eq. (34). The approximation $sT \ll 1$ then becomes less accurate. As a consequence, the maximum limit $1/(K_1N_L)$ is a conservative estimate and $K_P$ can be extended a little beyond this point in the discrete system.

Figure 26. CDR time domain model.

**4.1.2. Time-Step Model: the tstep-model**

The equivalent time-step simulation model is illustrated in Fig. 26, where the discrete sequences $\Psi_{in}[n]$ and $\Psi_{out}[n]$ correspond to the data and recovered clock phases, respectively. The BBPD model is equivalent to a sign(x) function in the phase domain, and a transition density (TD) mask is used to emulate the TD of a random bit sequence. The accumulators in the gray region update their outputs every L samples because of the decimation by L via majority voting. The input and output sequences run at the fast clock because they are the signals without decimation.
For this reason, in the interval in which the accumulators are not performing an update, the output phase of the DPC is held at its last value. The expressions that describe the relations among the sequences related to the accumulators are:

$$w[n] = \begin{cases} frug \cdot 2^{-D_f} \cdot v[n] + w[n-1], & \text{if } n = Lm \\ w[n-1], & \text{otherwise} \end{cases} \tag{40}$$

where m is an integer and the extra term $2^{-D_f}$ corresponds to the attenuation given by the dithering bits in the frequency accumulator $ACC_F$. Similarly, for the phase accumulator $ACC_P$,

$$y[n] = \begin{cases} 2^{-(N_b + D_p)}(x[n] + w[n]) + y[n-1], & \text{if } n = Lm \\ y[n-1], & \text{otherwise} \end{cases} \tag{41}$$

which presents an attenuation of $2^{-(N_b+D_p)}$ due to the DPC gain and the subresolution bits in the phase accumulator. The output phase is a delayed version of the y[n] sequence,

$$\Psi_{out}[n] = y[n - N_L]. \tag{42}$$

It is important to note that relations (40) and (41) do not include the decimation factor because this effect is implicit in the updating time step. Although the time-step model is slower than the z-model in terms of simulation time, it has several advantages:

- Nonlinear behavior can be studied. The time-step model captures nonlinear behavior such as slew-rate operation and saturation conditions in the accumulators, among others.
- The quantization noise contributions coming from the BBPD and MJV blocks are implicit in the simulation.
- No equivalent gains are needed for the BBPD and MJV blocks; instead, the model reflects the actual behavior of these blocks at each time-step iteration.

The major drawback of the time-step model is the run time, which is imposed by the number of vector points used in the simulation. Thus, for frequency analysis, we prefer to take advantage of the faster run time provided by the z-model.

Figure 27. RTL description model for the CDR.

**4.1.3. Verilog Model: the vlog-model**

Once the proper models for frequency and time domain simulations are explored, the architecture for the digital implementation is selected. Fig. 27 presents a possible circuit implementation for the DPLL-CDR based on the models presented in the previous sections. The register bit widths are shown explicitly, as well as the initial number of pipeline stages defined by the registers. The unit interval (UI) is given by the fast clock period T driven by the DPC. The MJV block imposes the decimation factor L, which defines a lower frequency for the $clk_{CDR}$ clock signal used in the digital filter. $N_b$ corresponds to the number of bits in the DPC, and $D_p$ and $D_f$ are the phase and frequency accumulator subresolutions, respectively. M corresponds to the top bits in the frequency accumulator and determines the maximum number coming from this register. This number must be high enough to meet the desired maximum slew rate (SR). All the parameters mentioned can be made configurable in order to obtain different CDR filters with the same synthesized digital circuit.

The gains phug and frug are implemented as selectable gains using muxes instead of shift registers in order to reduce the number of pipeline stages and hence the loop latency. Reducing the loop latency results in a better response regarding stability, as explained in Sec. 4.1-A. As an example, the Verilog description for the configurable phase accumulator is shown in Listing 4.1.
```
module cdr_phaseacc(phacc_in, subresel, clk, rst, acc_out);
  parameter  ACC_TBITS = 5;  // Nb
  localparam T_SUBBITS = 7;  // max Dp

  input  [ACC_TBITS+T_SUBBITS-1:0] phacc_in;
  input  [1:0] subresel;
  input  clk;
  input  rst;                          // asynchronous reset, active low
  output reg [ACC_TBITS-1:0] acc_out;

  // intermediate sum with subres
  reg [ACC_TBITS+T_SUBBITS-1:0] phacc_out;

  // phase accumulator register
  always @(posedge clk or negedge rst)
    if (!rst)
      phacc_out <= {(ACC_TBITS+T_SUBBITS){1'b0}};
    else
      phacc_out <= phacc_out + phacc_in;

  // subresolution logic: select which Nb-bit slice drives the DPC
  always @(subresel, phacc_out)
    case (subresel)
      2'b00: begin // Sub-resolution = 4 (Dp=4)
        acc_out = phacc_out[ACC_TBITS+3:4];
      end
      2'b01: begin // Sub-resolution = 5 (Dp=5)
        acc_out = phacc_out[ACC_TBITS+4:5];
      end
      2'b10: begin // Sub-resolution = 6 (Dp=6)
        acc_out = phacc_out[ACC_TBITS+5:6];
      end
      2'b11: begin // Sub-resolution = 7 (Dp=7)
        acc_out = phacc_out[ACC_TBITS+6:7];
      end
      default: begin // Default Dp=5
        acc_out = phacc_out[ACC_TBITS+4:5];
      end
    endcase
endmodule
```

Listing 4.1. Verilog code for phase accumulator.

The mapping of the architecture of Fig. 27 into the model of Fig. 25 results in the following relations:

$$K_P = phug, \tag{43}$$

$$K_F = frug \cdot 2^{-D_f} / L, \tag{44}$$

$$K_{DPC} = 2^{-(N_b + D_p)} / L. \tag{45}$$

The decimation factor L is included in the relations that involve any accumulator, as justified in Section 4.2.

### 4.2. MODELING CONSIDERATIONS IN MULTI-RATE DPLL-CDR

Digital logic usually operates at a lower frequency than the incoming data sample rate because of the timing limitations of the CMOS standard cells. To couple the high speed of the input data with the frequency limit of the digital logic, decimation is common in high-speed CDR designs. Decimation allows the digital circuitry to run at a lower rate, which leads to a multi-rate system. This section shows how the model must be updated from an initial rate $F_s$ to a new sampling rate $F_s' = F_s/L$, where L represents the decimation factor or rate scaling factor.
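As a quick numerical preview of this update, the sketch below compares the frequency response of an accumulator clocked every $T$ seconds with that of the same accumulator clocked every $LT$ seconds. It is only an illustration in Python: the helper name `acc_response` and the values chosen for $T$, $L$, and the gain are assumptions. At frequencies well below $F_s/L$ the slow accumulator shows a gain close to $1/L$ times that of the fast one, whereas near $F_s/L$ that simple relation no longer holds.

```python
import cmath
import math


def acc_response(k_d, f, t_update):
    """Frequency response K_d / (1 - exp(-j*2*pi*f*t_update)) of an accumulator
    that updates its output every t_update seconds."""
    return k_d / (1.0 - cmath.exp(-2j * math.pi * f * t_update))


T = 200e-12   # assumed fast sampling period (one UI at 5 Gb/s)
L = 4         # assumed decimation factor
K_D = 1.0     # assumed accumulator gain, deliberately left unscaled


def slow_to_fast_ratio(f):
    """Magnitude ratio between the decimated and the fast accumulator at frequency f."""
    return abs(acc_response(K_D, f, L * T)) / abs(acc_response(K_D, f, T))


ratio_low = slow_to_fast_ratio(1e-4 / T)   # f much lower than F_s / L
ratio_high = slow_to_fast_ratio(0.2 / T)   # f comparable to F_s / L

print("low-frequency ratio :", ratio_low, "(1/L =", 1.0 / L, ")")
print("high-frequency ratio:", ratio_high)
```

The low-frequency ratio is what motivates scaling the accumulator gain by $1/L$ in a single-rate model; the second evaluation is a reminder that the equivalence is a low-frequency approximation.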
In order to understand the proper change in the model, it is necessary to study the impact of decimation on the basic unit of any digital filter, the accumulator.

**4.2.1. The Accumulator as Basic Unit**

A common approach for filter design is to translate the equivalent continuous filter to the discrete time domain. Just as integrators are the building blocks of continuous filters, accumulators are the building blocks of any digital filter, and any digital filter transfer function can be implemented from them. We focus on these blocks for a better understanding of the impact of decimation on the discrete CDR model. Fig. 28(a) shows the mapping process from the continuous integrator with gain K to the digital accumulator using the backward-difference approximation with a sampling period of T seconds. Note that the equivalent discrete gain $K_d = TK$ is a function of the sampling time T used in the transformation. The discrete time model for the resulting accumulator is shown in Fig. 28(b). If the sampling rate $F_s = 1/T$ is changed, we usually need to re-map all the transfer functions using the new sample rate $F_s' = 1/(LT)$. However, instead of remapping the whole transfer function, a simple modification of the accumulators is enough to reflect the proper changes in the whole system. In the time domain, the change in the sampling rate means that the accumulator updates its output value every $LT$ seconds instead of every $T$ seconds.

Figure 28. Continuous to discrete time transformation for an integrator (a), and the filter implementation (b).

**4.2.2. Modeling Update**

For explanation purposes, it is assumed that the CDR accumulators are modeled with a sampling rate $F_s = 1/T$, producing the following expression:

$$A_{CCfast}(z) = \frac{K_d}{1 - z^{-1}}, \tag{46}$$

where $K_d = TK$ as mentioned before. It is of great interest to evaluate the frequency response of the CDR, and therefore the frequency response of the accumulator is examined:
$$A_{CCfast}(e^{j\Omega}) = \frac{K_d}{1 - e^{-j\Omega}}, \tag{47}$$

with $\Omega = 2\pi f/F_s = 2\pi fT$ representing the discrete frequency in rad/sample. Thus, in terms of the real frequency f, Eq. (47) is expressed as:

$$A_{CCfast}(f) = \frac{K_d}{1 - e^{-j2\pi fT}}. \tag{48}$$

If the sampling time is changed by a factor of L, that is, the new sampling rate of the system is $F'_s = F_s/L$, then an equivalent system running at the same initial $F_s$ can be obtained with proper scaling of the accumulator gain. To demonstrate this, let us assume that Eq. (46) represents one of the accumulator transfer functions in the DPLL-CDR running with sampling time T. Now, suppose that decimation by L is added to the system, but the accumulator gain is not updated to account for this change. The new accumulator, called $A'_{CCslow}$, has the same transfer function, but it runs with a new sampling time $LT$:

$$A'_{CCslow}(z) = \frac{K_d}{1 - z^{-1}}. \tag{49}$$

The systems described by Eqs. (46) and (49) are not equivalent because the gain $K_d$ is also a function of the sampling time and the systems run with different sampling times, $T$ and $LT$ respectively. This becomes evident when examining the frequency response of $A'_{CCslow}$:

$$A'_{CCslow}(f) = \frac{K_d}{1 - e^{-j2\pi f LT}}, \tag{50}$$

where it is mandatory to guarantee that $fLT \ll 1$ for all frequencies of interest in the system. This condition becomes Eq. (51), and it is one of the constraints for proper modeling using the backward-difference mapping:

$$f \ll \frac{F_s}{L}. \tag{51}$$

Using this condition, it is possible to approximate the $A'_{CCslow}(f)$ function as:

$$A'_{CCslow}(f) = \frac{K_d}{1 - e^{-j2\pi fLT}} \approx \frac{K_d}{1 - (1 - j2\pi fLT)}, \tag{52}$$

$$A'_{CCslow}(f) \approx \frac{K_d}{j2\pi f LT}. \tag{53}$$

Similarly, for $A_{CCfast}(f)$:

$$A_{CCfast}(f) \approx \frac{K_d}{j2\pi fT}. \tag{54}$$

Observing the results in Eqs. (53) and (54), it can be concluded that:

$$A'_{CCslow}(f) \approx \frac{1}{L} A_{CCfast}(f). \tag{55}$$

The meaning of Eq. (55) is that, for a given accumulator model $A_{CCfast}$ running with sampling time T (or sampling rate $F_s$), if the sampling time is changed by a factor L, then the slower accumulator $A'_{CCslow}$ can be approximated by applying a scaling factor 1/L to the initial gain. The resulting model emulates the slower accumulator with the same time basis T as the original model. Fig. 29 summarizes this result.

Figure 29. Equivalent accumulator gain adjustment due to decimation.

It is preferable to scale the accumulator gains instead of remapping the whole transfer function. The advantage is that, with the proper update, the model can be run at a single sampling rate, which avoids having to account for each different sampling rate domain in the system. Different sampling rate domains may appear in the CDR because the accumulators in the digital circuit may update their outputs at different clock frequencies.

### 4.3. NON-LINEAR CONSIDERATIONS

In this section, we comment on the nonlinear effects that appear in CDR operation. The main nonlinear effects can be summarized as:

- BBPD gain dependence on the nature of the input jitter.
- MJV nonlinearity.
- Stochastic resonance.
- Slew rate limitations.

**4.3.1. BBPD Gain**

The BBPD equivalent gain used in the z-model is nonlinear. However, in steady state $K_{BB}$ can be either estimated or extracted via time simulations using a stand-alone BBPD. It is important to note that the $K_{BB}$ gain depends on the magnitude and statistical properties of $\Psi_{IN}$. Industrial applications consider Gaussian, uniform, and sinusoidal random noise as the main types of jitter noise that can be present in CDRs. Table 2 summarizes the $K_{BB}$ estimations for the aforementioned types of jitter <sup>67</sup>.
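The closed-form gains collected in Table 2 are each jitter PDF evaluated at 0 UI, so they are easy to check numerically. The short Python sketch below is only illustrative; the function names are hypothetical and the uniform and sinusoidal amplitudes are assumed example values.

```python
import math


def kbb_gaussian(sigma):
    """K_BB for Gaussian jitter with standard deviation sigma [UI]."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


def kbb_uniform(d_pp):
    """K_BB for uniform jitter with peak-to-peak amplitude d_pp [UI]."""
    return 1.0 / d_pp


def kbb_sinusoidal(s_pp):
    """K_BB for sinusoidal jitter with peak-to-peak amplitude s_pp [UI]."""
    return 2.0 / (math.pi * s_pp)


# Gaussian jitter of 0.04 UI rms, the value used later for model comparison.
print("Gaussian  :", kbb_gaussian(0.04))
# Assumed example amplitudes for the other two jitter types.
print("Uniform   :", kbb_uniform(0.2))
print("Sinusoidal:", kbb_sinusoidal(0.1))
```

For Gaussian jitter with $\sigma = 0.04$ UI the expression gives about 9.97 per UI, which matches the $K_{BB}$ value listed in Table 3. The same function also makes the trend visible: the larger the jitter, the smaller the BBPD gain.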
In general, this gain can be obtained by extracting the value at 0 UI from the total convolution of the probability density functions (PDFs) associated with the input jitter components <sup>33</sup>. If the PDF is not available, then we can emulate a time sequence for the input jitter and extract $K_{BB}$ using time simulations.

<sup>67</sup> J. ARDILA and E. ROA. "Stochastic Resonance in Bang-Bang Phase Detector Gain and the Impact on CDR Locking". In: *2018 IEEE 9th Latin American Symposium on Circuits Systems (LASCAS)*. 2018, pp. 1–4. DOI: 10.1109/LASCAS.2018.8399933.

Table 2. BBPD gain expressions.

| Jitter type | pdf(x) | $K_{BB}$ |
|-------------|--------|----------|
| Gaussian | $p_G(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{x^2}{2\sigma^2}}$ | $\frac{1}{\sigma\sqrt{2\pi}}$ |
| Uniform | $p_{U}(x) = \frac{1}{D_{pp}}[u(x + D_{pp}/2) - u(x - D_{pp}/2)]$ | $\frac{1}{D_{pp}}$ |
| Sinusoidal | $p_S(x) = \frac{1}{\pi \sqrt{(S_{pp}/2)^2 - x^2}}$ | $\frac{2}{\pi S_{pp}}$ |

**4.3.2. MJV nonlinearity**

The majority voting (MJV) equivalent gain $K_V$ also impacts the dynamic behavior of the CDR system, but it will be shown that this gain is less sensitive to some effects than the phase-detector gain $K_{PD}$. The selection of the implemented policy directly impacts the actual value of $K_V$. As shown before, $K_{PD}$ is sensitive to jitter variation and decreases as the noise level increases. On the other hand, $K_V$ shows smaller variations even with different noise levels and, virtually, depends only on the MJV policy chosen.

**Majority Voting Policies**

In order to understand the impact of MJV policies on the $K_V$ gain, three policies are evaluated. An extraction procedure similar to the one described in Section 3.3 is used to obtain the $K_V$ related to each policy. Each policy takes four samples from the BBPD output; if more samples are taken for the calculation, then the $K_V$ value is reduced, as presented in <sup>33</sup>.
The first policy, P1, takes the samples from the BBPD output and generates a 1, -1, or 0 based on the sign of the sum of the four samples. The second policy, P2, generates a 1 if the sum of the samples is greater than or equal to 2, a -1 if the sum is less than or equal to -2, and a 0 otherwise. Finally, the third policy, P3, is similar to P2 but with upper and lower limits of 3 and -3, respectively. Fig. 30(a) illustrates the policies implemented.

Figure 30. Majority voting policies and noise effect.

**Noise effect on $K_V$**

Fig. 30(b) shows the dependence of $K_V$ on noise for the three policies. The $K_V$ gain is a weak function of the input jitter noise because the MJV block makes decisions based on several samples and not on a single one as the BBPD does; thus, MJV acts as a noise filter. On the other hand, the policy chosen in the MJV block directly affects the nominal value of $K_V$. For example, $K_V$ drops by about 40% from policy P1 to P2 and by about 62% from P2 to P3, which gives a total reduction of about 77% from P1 to P3.

**Stochastic Resonance**

Stochastic resonance can appear and impact the loop gain of the system through the modulation of the $K_{BB}$ gain <sup>67</sup>. This may result in the degradation of the CDR dynamics. The combination of sinusoidal jitter with Gaussian and/or uniform jitter can produce unexpected peaking in the system transfer function. Sinusoidal jitter can appear through supply noise or under the JTOL test condition.

**4.3.3. Slew Rate**

When the phase difference increases considerably, the CDR operates in a nonlinear regime and the system starts slewing. The slewing effect can be seen either at low frequencies, as in the jitter tolerance test, or at high frequencies because of the high slew requirements to track the input signal. The time-step model is used to study the slewing condition instead of the z-model.
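A minimal time-step sketch, written here in Python, makes the slewing regime and the three voting policies concrete. It follows Eqs. (40)-(42) under several simplifying assumptions that are not taken from the original model: the proportional path is taken as $x[n] = phug \cdot v[n]$, the transition density mask is omitted (every bit is treated as a transition), the accumulators never saturate, and all function and argument names are hypothetical. Phases are expressed in UI.

```python
def bbpd(phase_error):
    """Bang-bang phase detector: sign of the phase error (+1, 0, or -1)."""
    return (phase_error > 0) - (phase_error < 0)


def majority_vote(samples, threshold):
    """Majority voting over the BBPD samples.

    threshold = 1 gives P1 (sign of the sum), 2 gives P2, and 3 gives P3.
    """
    total = sum(samples)
    if total >= threshold:
        return 1
    if total <= -threshold:
        return -1
    return 0


def simulate_cdr(psi_in, phug, frug, n_b, d_p, d_f, L, n_l, threshold=1):
    """Time-step CDR model; n_l must be at least 1. Returns the output phase sequence."""
    w = 0.0        # frequency accumulator ACC_F
    y = 0.0        # phase accumulator ACC_P
    y_hist = []    # y[n] history, needed to apply the loop latency
    votes = []     # BBPD decisions collected since the last update
    psi_out = []
    for n, phase in enumerate(psi_in):
        out = y_hist[n - n_l] if n >= n_l else 0.0       # Eq. (42)
        psi_out.append(out)
        votes.append(bbpd(phase - out))
        if len(votes) == L:                              # update once every L samples
            v = majority_vote(votes, threshold)
            w += frug * 2.0 ** (-d_f) * v                # Eq. (40)
            y += 2.0 ** (-(n_b + d_p)) * (phug * v + w)  # Eq. (41), x = phug * v assumed
            votes = []
        y_hist.append(y)                                 # held value between updates
    return psi_out


# A 0.25 UI phase step: the BBPD output stays at +1 and the loop slews toward the input.
psi_out = simulate_cdr([0.25] * 4000, phug=0.625, frug=0.0625,
                       n_b=5, d_p=5, d_f=7, L=4, n_l=20)
print("output phase after 1000 UI:", psi_out[999])
print("maximum output phase      :", max(psi_out))
```

While the phase error keeps the same sign, the output can only move by a bounded amount per update, so the recovered phase ramps toward the input instead of following it instantly. Changing `threshold` to 2 or 3 selects policy P2 or P3.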
The slewing specification for CDRs is important for protocols with a spread-spectrum clock (SSC), which modulates the clock frequency within a narrow band around a nominal frequency in order to reduce EMI effects in the data transmission.

### 4.4. MODEL SIMULATIONS

An initial model comparison is performed using the parameters in Table 3. Considering the proper scaling in multi-rate modeling, we can see a good correlation and consistency between both models in the frequency domain. The amplitude of the input signals was selected to avoid slewing. In addition, an example of a time domain simulation for one of the cases tested is presented in Fig. 31. The input noise phase is filtered in the same manner by both models. To obtain the time response from the z-model, an inverse Fourier transform is performed. Fig. 31 also depicts the frequency response obtained with both the z-model and the time-step model. For the time-step model, several input signals with different frequencies are tested in the time domain.

Table 3. Parameters for model comparison.

| Parameter | Value | Units |
|-----------|-------|-------|
| Data rate | 5 | Gb/s |
| $\sigma_{in}$ | 0.04 | UI,rms |
| $K_{BB}$ | 9.97 | per UI |
| phug | 0.625 | - |
| frug | 0.0625 | - |
| $L$ | 4 | samples |
| $N_b$ | 5 | bits |
| $D_f$ | 7 | bits |
| $D_p$ | 5 | bits |
| $N_L$ | 20 | UI |

Several CDR systems were exercised in order to quantify the error between the frequency and time domain models. The comparison process is described in Fig. 32. First, several initial parameters for the CDR models are set. Those parameters are selected only for comparison purposes rather than for design, so we can generate several systems for simulation, as illustrated in the *System Generation* block. Each of the generated systems is simulated in the time domain by means of the time-step and z models. For the z-model, the transfer function is extracted and then converted into a digital filter.
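The extraction of the z-model response can be sketched as a direct evaluation of Eq. (29) on the unit circle. The Python fragment below uses the Table 3 parameters mapped through Eqs. (30) and (43)-(45); the value of $K_V$ is not listed in Table 3, so $K_V = 1$ is purely an assumption for illustration, as are the function name and the frequency grid. Frequencies are normalized to the fast rate $F_s$.

```python
import cmath
import math


def h_ck(f_norm, k1, kp, kf, n_l):
    """Evaluate H_CK(z) of Eq. (29) at z = exp(j*2*pi*f_norm), with f_norm = f / F_s."""
    z_inv = cmath.exp(-2j * math.pi * f_norm)
    d = 1.0 - z_inv
    loop = k1 * (kp * d + kf) * z_inv ** n_l
    return loop / (d * d + loop)


# Table 3 parameters.
K_BB, PHUG, FRUG = 9.97, 0.625, 0.0625
L, N_B, D_F, D_P, N_L = 4, 5, 7, 5, 20
K_V = 1.0  # assumed majority-voting gain (not given in Table 3)

# Mapping of Eqs. (43)-(45) and loop gain of Eq. (30).
K_P = PHUG
K_F = FRUG * 2.0 ** (-D_F) / L
K_DPC = 2.0 ** (-(N_B + D_P)) / L
K_1 = K_BB * K_V * K_DPC

# Logarithmic grid from 1e-6 to about 0.32 of the fast sampling rate.
freqs = [10.0 ** (-6.0 + 5.5 * k / 2000) for k in range(2001)]
mags = [abs(h_ck(f, K_1, K_P, K_F, N_L)) for f in freqs]

peaking_db = 20.0 * math.log10(max(mags))
f_3db = next(f for f, m in zip(freqs, mags) if m < 1.0 / math.sqrt(2.0))

print("peaking [dB]     :", peaking_db)
print("-3 dB point [F_s]:", f_3db)
```

The same function can be reused to locate the in-band, peaking, and -3 dB test frequencies for other parameter sets, keeping in mind that the result depends on the assumed $K_V$.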
The input signal for testing is generated from the input random jitter $x_{jitt}(n)$ added to a small sinusoidal input phase with amplitude A. In this experiment A = 0.02 UI to keep small-signal behavior, and the input frequency $F_{in}$ is chosen to be: 1) an in-band frequency $(F_{low})$, 2) the peaking frequency $(F_{peak})$, and 3) the -3 dB frequency $(F_{3dB})$. The random noise signal $x_{jitt}(n)$ is generated using several configurations of jitter noise, such as Gaussian and uniform random noise. Gaussian noise is characterized by $\sigma_{in}$ from 0.03 UI to 0.05 UI, and uniform noise by peak-to-peak values from 0 UI to 0.2 UI. Hundreds (even thousands) of systems can be simulated as described above, depending on the number of parameters combined.

The test procedure runs 144 different systems and extracts the error from the time domain responses of the time-step and z models. The worst-case scenarios regarding error among all runs, categorized by $F_{in}$, are shown in Fig. 33. The error is calculated as the ratio of the RMS value of the difference between the two responses to the standard deviation of the equivalent input jitter:

$$e_{\%} = \frac{RMS(\Psi_{OUT,t} - \Psi_{OUT,z})}{\sigma_{eff}}, \tag{56}$$

where RMS() is the root-mean-square function, $\sigma_{eff}$ is the standard deviation of the $x_{jitt}(n)$ component in $\Psi_{IN}(n)$, and $\Psi_{OUT,t}$ and $\Psi_{OUT,z}$ are the output phase responses of the time-step and z models, respectively.

Figure 31. Frequency (a), and time response (b) for the CDR described by the parameters in Table 3.

Figure 32. Procedure for frequency and time models comparison.

Fig. 33(a), (c), and (e) plot the magnitude of the frequency response given by the z-model, and Fig. 33(b), (d), and (f) show the corresponding time responses produced by both models. Statistical results categorized by $F_{in}$ are summarized in Fig. 34 using the 25th and 75th percentiles as boundaries for the blue boxes. Excluding the outliers present in the $F_{3dB}$ cases, we can observe an error below 14% at extreme data points with statistical significance. The outliers present in the $F_{3dB}$ group are a consequence of the high-frequency response, where the slewing effect becomes relevant. We have taken care to use only proper CDR systems that still behave in the small-signal regime, so those outliers are not considered for model comparison. As a final comment on modeling, it is important to note that for slewing conditions the time-step model is preferred because the CDR is working outside the small-signal condition.

With all the above models, the design methodology (DM) can be presented. The DM uses each of the aforementioned models in different stages in order to obtain satisfactory results regarding the communication protocol used.

Figure 33. Worst case comparison for low frequency (a), peak frequency (b), and high frequency (c) conditions.

Figure 34. Statistical error between the time and the frequency domain CDR model.

Figure 35. Implementation of a quad-rate CDR architecture.

### 4.5. DESIGN METHODOLOGY

In order to demonstrate the methodology explained in this section, the architecture shown in Fig. 35 is used. This architecture corresponds to a quad-rate CDR for the standard USB3.0 protocol. The incoming data runs at 5 Gb/s and passes through the samplers. Since this is a quad-rate implementation, a total of 4 in-quadrature phases running at 1.25 GHz are used (8 phases in total). The phases come from a phase interpolator (PI), which is driven by the CDR logic filter. After the samplers, additional deserialization and alignment are performed. A total decimation of 16 is done from the incoming data to the BBPD. With this decimation, it is possible to run the digital filter (in the gray area) at a lower frequency of 312.5 MHz, which is a suitable speed for digital synthesis in a 0.18 µm CMOS node.
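As a quick check of the clock rates quoted above, and assuming that both the quad-rate factor and the total decimation of 16 are referred to the 5 Gb/s line rate (that is, one sample per bit):

$$f_{phase} = \frac{5\ \text{GHz}}{4} = 1.25\ \text{GHz}, \qquad f_{clk_{CDR}} = \frac{5\ \text{GHz}}{16} = 312.5\ \text{MHz}.$$

The symbols $f_{phase}$ and $f_{clk_{CDR}}$ are introduced here only for this illustration.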
The design methodology procedure (DMP) is illustrated in Fig. 36. This DMP is composed of several steps, which are explained as follows.

Figure 36. Design methodology process.

**4.5.1. Design Space Generation**

The main objective of the design space generation is to build a matrix with all the initial combinations of the preselected parameters. We label these parameters as the vector variables Decx, Dpx, Phugx, and Npipes, corresponding to the decimation factor, the subresolution for phase integration, the proportional gain, and the number of pipeline stages, respectively. A simple combinatorial procedure is performed on these vectors. In addition, the $K_V$ gain values are also included to complement the initial matrix. Note that $K_V$ must be correlated with the decimation factor; for this reason, it should be added just after the initial combinatorial process in order to maintain the correlation with each Decx value. The $K_V$ values are extracted from numerical simulations. The Npipes vector is included to keep some margin before the final synthesis. This allows us to explore different pipeline approaches from a system-level point of view well before the RTL compilation. This exploration is a concern especially for time-constrained scenarios.

**4.5.2. Mapping Equations**

Table 4 presents the mapping equations for the CDR performance parameters used in this design.

Table 4. CDR parameters

| Parameter | Symbol | Relation |
|-----------|--------|----------|
| Phase step | $\Delta \phi$ | $\Delta \phi = 1/2^{N_b}$ [UI] |
| Effective phase resolution | $\Delta \phi_{eff}$ | $\Delta \phi_{eff} = \Delta \phi / 2^{D_p}$ [UI] |
| Max. number from proportional | phug | - |
| Pull-in range | - | $phug\,\Delta\phi/2^{D_p}$ [UI] |
| Max. (+) number from ACCf | - | $2^{M-1}-1$ |
| Max. (-) number from ACCf | - | $-2^{M-1}$ |
| Max. phase change (max. slew rate) | $\Delta \phi_{MAX}$ | $\Delta \phi_{MAX} = (2^{M-1} - 1) \Delta \phi_{eff}$ [UI] |
| Tolerance (max. (+) due to ACCf) | - | $\Delta \phi_{MAX} 10^6 / L$ [ppm] |
| Frequency resolution | $\Delta f$ | $\Delta f = \Delta\phi_{eff}/(2^{D_f} L)$ [UI/UI] |
| Max. slope in ACCf | - | $frug\,\Delta f_{[UI/UI]}/(LT)$ [ppm/us] |

Using the mapping equations presented in Table 4, we obtain the remaining dependent variables Kdpcx, Dfx, Frugx, and Kfx, which are the DPC gain, the subresolution in frequency integration, the frequency gain for time-step simulations, and the frequency gain in the linear z-model, respectively. A matrix of CDR model parameters is generated in this stage, where each column represents a complete set of CDR model parameters. In other words, this matrix contains as many CDR systems as columns.

**4.5.3. Including the Noise Profiles**

The matrix of CDR models is enriched with the noise profiles. Noise profiles correspond to the nature of the possible or estimated noise present in the system. Several traditional design approaches assume a value for the noise magnitude in order to obtain an initial design. Here, we generate several plausible noise profiles based on the jitter budget defined in the standard and using the dual-Dirac method. With these noise profiles, we can estimate the gamma factor Gammax (for JTOL) and obtain a space region for the BBPD gain.

**4.5.4. JTF and JTOL Extraction**

Each sheet in the hyper matrix generated in the previous stage represents a complete CDR model with an associated noise condition. We run several simulations using the CDR z-model in order to extract the JTF and JTOL functions. Many of the functions obtained may not meet the system specifications, such as the JTOL mask or proper peaking. For this reason, we filter the results using three filtering stages: stability, JTOL mask, and JTF peaking.

**4.5.5. Filtering Chain**

Three filters are included in this design methodology to capture only valid transfer functions that meet the design specifications.

**Filter 1: Stability** Stability is checked in this stage, providing stable systems as the main candidates for design. The unstable systems are discarded.

**Filter 2: JTOL profile** The JTOL mask depends on the communication protocol; thus, this stage must be supplied with the proper mask in order to obtain all the candidate functions whose frequency response meets the JTOL mask.

**Filter 3: JTF peaking** A peaking check is performed based on the standard in order to avoid solutions with excess peaking, which may give systems with an oscillatory or ringing response even though they are stable. The peaking criterion is also recommended by the standard, and it is a good indicator of proper phase margin.

After the filtering chain stage, the remaining solutions go to the next step, which uses the time-step model in order to perform large-signal analysis. If there are no solutions at this point, then a new set of initial vectors must be selected and all the previous stages run again.

**4.5.6. Large Signal Behavior**

Using the CDR time-step models, we can observe the impact of some nonlinearities such as the BBPD gain, the behavior for large-signal inputs, the slew rate effect, and the step response, among others. Again, only suitable solutions go to the next stage; otherwise, it will be necessary to consider different initial parameters.

**4.5.7. Go to Verilog**

We can generate a Verilog description using the final results. A custom template is used in the Verilog generator, which reads the results and configures the Verilog for proper synthesis. A configurable RTL is generated covering all the solutions. With the output as a Verilog file, we use a standard digital design flow and synthesize the RTL. In the next chapter, we present the digital synthesis, including an adaptive method to control loop gain and CDR stability.

### 4.6. SUMMARY AND DISCUSSION

CDRs are nonlinear systems, and the small-signal linear z-model is just an approximation that captures the system dynamics when the CDR is in a locking condition. On the other hand, the time-step model allows us to capture some of the nonlinear effects at the expense of a higher simulation time. The Verilog model will show the main challenges of a real digital circuit implementation. For nonlinear effects, we prefer to use the time-step calculations for more accuracy in preliminary simulations. We use the z-model just to infer the dynamics in the locking condition, when the small-signal condition is valid. The frequency model is affected by slewing, which is expected in large-signal operation because of the nonlinear behavior of the system, and it must be adjusted when this effect is taken into account. Considering all jitter sources, we can extract proper gains for the z-model using simulations. Stochastic resonance may appear under certain conditions where sinusoidal components are present in the system. Moreover, multi-rate considerations were presented, and they are relevant in order to have equivalent models for the frequency and time domains. Besides, these scaling considerations allow us to have one model running at a single sample rate $F_s$ instead of multiple models with different $F_s$ domains. As a final comment, it was shown that the majority voting gain, $K_V$, is less sensitive to the input jitter noise level but very dependent on the chosen MJV policy.

# 5. CROSS-CORRELATION BASED LOOP GAIN ADAPTATION FOR BANG-BANG CDR

### **5.1. INTRODUCTION**

Commercial wireline receivers commonly employ fully synthesized digital implementations of clock and data recovery circuits (CDRs) <sup>15, 32</sup>. Bang-bang phase detectors (BBPD) are the preferred phase error detection scheme in digital CDRs owing to their simplicity and accuracy. However, jitter sources modulate the BBPD gain, impacting the CDR loop dynamics.
Jitter noise coming from the input data and phase noise from the phase-locked loop (PLL) may alter the BBPD gain and consequently degrade the CDR bandwidth. To overcome this architectural challenge, recent research has proposed loop gain adaptation schemes to compensate for the BBPD gain modulation <sup>68, 69, 66, 70, 43</sup>. In all cases, trade-offs can be detected among output jitter optimization, stability, accuracy, and CDR dynamics tracking performance.

<sup>68</sup> S. JANG et al. "An Optimum Loop Gain Tracking All-Digital PLL Using Autocorrelation of Bang-Bang Phase-Frequency Detection". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 62.9 (2015), pp. 836–840. DOI: 10.1109/TCSII.2015.2435691.

<sup>69</sup> S. KWON et al. "An Automatic Loop Gain Control Algorithm for Bang-Bang CDRs". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 62.12 (2015), pp. 2817–2828. DOI: 10.1109/TCSI.2015.2495725.

<sup>70</sup> T. KUAN and S. LIU. "A Loop Gain Optimization Technique for Integer-N TDC-Based Phase-Locked Loops". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 62.7 (2015), pp. 1873–1882. DOI: 10.1109/TCSI.2015.2423793.

Several attempts to compensate for the BBPD gain modulation exploit the autocorrelation function as an indicator to track loop dynamics <sup>68, 69</sup>. In <sup>69</sup>, the authors introduce an algorithm to perform gain optimization using autocorrelation with the mean-squared-error (MSE) criterion. They demonstrate a criterion for CDR lock in terms of the power spectral density (PSD) by looking at the sign of the autocorrelation of the BBPD output (hereafter $R_X(n)$) at the point D+1, where D is the loop delay. The point n = D+1 falls close to the first zero-crossing point of $R_X(n)$, and even small variations in D could generate different signs in the $R_X(D+1)$ evaluation, making this criterion sensitive to small variations in loop latency.
In other words, the actual zero-crossing point will differ from D+1 even for small latency variations. Nonetheless, the major concern is not the difference between D+1 and the actual zero-crossing point of $R_X(n)$, but the fact that at this point $R_X(D+1) \approx 0$ even for a system with poor phase margin (PM). A similar approach is presented in <sup>68</sup>, where the adaptation is decided based on the sign of $R_X(n)$ at a different reference point, n = 2D+1. In this case, $R_X(2D+1)$ is close to the first peak of the autocorrelation function. The autocorrelation function $R_X(n)$ is impacted not only by the CDR dynamics but also by the jitter profiles coming from different sources in the CDR (data, PLL, etc.). The authors in [3] claim optimization of output jitter, but the authors in [5] have demonstrated that this does not hold for highly jittery data scenarios.

On the other hand, the work presented in $^{71}$ provides a closed expression for the BBPD gain. The adaptation algorithm is based on detecting a pattern at the BBPD output by using autocorrelation. Three autocorrelation measurements are performed and summed to obtain an optimum gain regarding jitter suppression. However, the authors in $^{71}$ conclude that it is challenging to estimate some of the system parameters under PVT variations. Thus, to extract the optimum gain, they chose to define a different objective function ($F_{OBJ}$) to perform the adaptation. This variability problem indeed appears in all situations where a closed expression for a performance parameter is found as a function of BB-CDR parameters.

- <sup>71</sup> T. KUAN and S. LIU. "A Bang Bang Phase-Locked Loop Using Automatic Loop Gain Control and Loop Latency Reduction Techniques". In: *IEEE Journal of Solid-State Circuits* 51.4 (2016), pp. 821–831. DOI: 10.1109/JSSC.2016.2519391.

Figure 37. Summary of relevant recent reported works in loop gain adaptation using correlation functions. The left side corresponds to the simplified diagram schemes, and the right side illustrates the main features. (a) Adaptation using an objective function $F_{OBJ}$ based on autocorrelation at the BBPD output, (b) adaptation methods using extra filtering at the BBPD output and avoiding some *a priori* assumptions, and (c) proposed XCALG method.

So far, previous works can be summarized as Fig. 37(a) shows. All of them use the autocorrelation function as the fundamental tool for gain adaptation; however, each work accomplishes that by defining a different $F_{OBJ}$. Besides, these works lack a rigorous stability analysis and still either require *a priori* assumptions on jitter profiles or employ fixed evaluation points in the autocorrelation function, such as D+1 and 2D+1, as mentioned before. For the above reasons, and in the context of decision-making criteria, non-deterministic decision criteria may be preferred.

Recent works <sup>66, 43</sup> avoid the aforementioned *a priori* assumptions and guarantee a safe PM by implementing an adaptive gain based on data measurements. To accomplish this efficiently, they improve the observability of the autocorrelation function by adding a low-pass filter (LPF) at the BBPD output, as depicted in Fig. 37(b). However, filtering the BBPD output demands a careful selection of the filter bandwidth (BW), which must be chosen *a priori*. Regarding portability, the filter BW must be adjusted according to the specifications of different CDR designs.

In contrast, we propose the cross-correlation-based adaptive loop gain technique (XCALG). We use two strategic points in the system to perform cross-correlation instead of autocorrelation, as shown in Fig. 37(c). The cross-correlation operator is linked to the cross-power spectral density (CPSD) in the frequency domain. We show how to take advantage of the filter properties of the CPSD.
The proposed method considerably enhances the tracking of the CDR dynamics and improves the loop adaptation algorithm. We demonstrate how, through the CPSD, the CDR itself can be seen as the required filter that improves the observability in the system without adding an extra LPF to the design. Such an LPF must be configurable with proper BW values to cover several jitter conditions in the system: if the jitter condition changes, so do the BBPD gain and, as a consequence, the CDR BW, which requires a proper adjustment of the LPF BW. Using cross-correlation instead, it can be shown that the filtering is done by the CPSD, which tracks the CDR BW automatically for any jitter condition. For the above reasons, we envision that XCALG could become the preferred method for gain adaptation.

# 5.2. SPECTRAL ANALYSIS OF AUTOCORRELATION AND CROSS-CORRELATION FUNCTIONS

In this section, we review some fundamental definitions in the context of correlation functions applied to a BB-CDR model in the frequency domain. We compare the frequency description of the autocorrelation and the cross-correlation functions and summarize the leading relations. With this review, we justify the selection of the cross-correlation function as a potential alternative to perform gain adaptation. Consider the conventional linear model for a digital BB-CDR explained in Chapter 4, shown again in Fig. 38 for the following explanation.

Figure 38. Linear z-model of a BB-CDR.

**5.2.1. Power Spectral Density and Autocorrelation**

The autocorrelation operator in the time domain is related to the power spectral density (PSD) in the frequency domain through the Fourier transform.
Alternatively, to obtain the PSD of $\Psi_{ER}$, namely $S_{ER}(f)$, directly from frequency-domain quantities, we proceed as follows:

$$\lim_{N \to \infty} \frac{1}{N} \mathbb{E}[\Psi_{ER}(f) \Psi_{ER}^*(f)] = S_{ER}(f). \tag{57}$$

The operator $\mathbb{E}[\cdot]$ is the expectation operator, N is the number of samples taken to observe the signal over time, and $\Psi_{ER}^*(f)$ is the conjugate of $\Psi_{ER}(f)$. Using Eq. (23) from Chapter 4 and applying Eq. (57), the PSD of $\Psi_{ER}(f)$ results in

$$S_{ER}(f) = |H_{ER}(f)|^{2} \left[ S_{IN}(f) + S_{PI}(f) \right] + |H_{CK}(f)|^{2} \left[ \frac{S_{Q,BB}(f)}{K_{BB}^{2}} + \frac{S_{Q,MV}(f)}{(K_{BB}K_{V})^{2}} \right], \tag{58}$$

where the input-output $H_{CK}(z)$ and input-error $H_{ER}(z)$ transfer functions are given by

$$H_{CK}(z) = \frac{L_G(z)}{1 + L_G(z)}, \tag{59}$$

$$H_{ER}(z) = \frac{1}{1 + L_G(z)}. \tag{60}$$

Notice that the above calculations assume that the noise sources are uncorrelated.

**5.2.2. Cross-Power Spectral Density and Cross-correlation**

The cross-power spectral density (CPSD) is linked to the time domain through the cross-correlation function. Similar to the PSD case, in order to obtain the CPSD between $\Psi_{ER}$ and $\Psi_{CK}$ using frequency-domain quantities, it is necessary to operate as follows:

$$\lim_{N \to \infty} \frac{1}{N} \mathbb{E}[\Psi_{ER}(f) \Psi_{CK}^*(f)] = S_{ER,CK}(f). \tag{61}$$

Considering the noise contributions at the recovered clock phase $\Psi_{CK}(z)$ in Eq. (28), it is possible to express the CPSD associated with $\Psi_{ER}$ and $\Psi_{CK}$ as

$$S_{ER,CK}(f) = H_{ER}(f)H_{CK}^*(f)S_{IN}(f) - |H_{ER}(f)|^2 S_{PI}(f) - |H_{CK}(f)|^2 \left[ \frac{S_{Q,BB}(f)}{K_{BB}^2} + \frac{S_{Q,MV}(f)}{(K_{BB}K_V)^2} \right]. \tag{62}$$

Here, it is essential to note that our key observation lies in the definition of the signal $\Psi_W$, as shown in Fig. 38. Jitter contributions to $\Psi_W$ are scaled versions of Eqs. (24) to (27) with a scale factor of $1/K_{DPC}$.
The only, and fundamental, exception is the contribution of the $J_{PI}$ component, which is recalculated as

$$\Psi_{W_{|PI}}(z) = -\frac{L_G(z)}{K_{DPC}(1 + L_G(z))} J_{PI}(z), \tag{63}$$

$$= -\frac{1}{K_{DPC}} H_{CK}(z) J_{PI}(z). \tag{64}$$

The resulting CPSD is

$$S_{ER,W}(f) = \frac{1}{K_{DPC}} H_{ER}(f) H_{CK}^*(f) \left[ S_{IN}(f) + S_{PI}(f) \right] - \frac{1}{K_{DPC}} |H_{CK}(f)|^2 \left[ \frac{S_{Q,BB}(f)}{K_{BB}^2} + \frac{S_{Q,MV}(f)}{(K_{BB}K_V)^2} \right]. \tag{65}$$

Now the factor $|H_{ER}(f)H_{CK}^*(f)|$ affects both the $S_{IN}(f)$ and $S_{PI}(f)$ PSD functions, which are the major contributors to the total jitter in the system.

**5.2.3. Comparison and Discussion**

Eqs. (58) and (65) indicate how the CDR shapes the PSD of the jitter sources at two different points. The fundamental difference between these expressions is the term that multiplies the input-data jitter component $S_{IN}(f)$ and the PI component $S_{PI}(f)$. In Eq. (58), the term $|H_{ER}(f)|^2$ corresponds to a high-pass filter, whereas the term $H_{ER}(f)H_{CK}^*(f)$ in Eq. (65) has a band-pass response. Fig. 39 shows the power spectrum of $H_{ER}(f)$ and $H_{CK}(f)$, and the filter function $|H_{ER}(f)H_{CK}^*(f)|$. Fig. 39 is obtained using the frequency-domain model described in Fig. 38 and the model parameters listed in Table 5. This set of parameters is intended to meet the USB 3.0 standard when $K_G = 1$. However, here we intentionally set $K_G = 2.5$ in order to observe a peaking response.

Table 5. Model parameters used for the linear z-model of Fig. 38.

| Model Parameter | Symbol | Value |
|---------------------------|---------------|------------|
| BBPD gain | $K_{BB}$ | 9.97 |
| MJV gain | $K_V$ | 3 |
| Adaptive gain | $K_G$ | 2.5 |
| Proportional gain | $K_P$ | 2 |
| Integral gain | $K_F$ | $2^{-9}$ |
| Phase Interpolator gain | $K_{DPC}$ | $2^{-13}$ |
| Latency | $N_L$ | 40 samples |
| **System Condition** | **Symbol** | **Value** |
| Gaussian RMS input jitter | $\sigma_{IN}$ | 0.04 UI |
| Data rate | $F_s$ | 5 GS/s |

The example illustrates a condition with a phase margin (PM) of about 45°, where peaking may be detected. Peaking observed in $|H_{ER}(f)|^2$ suggests oscillations in the CDR due to poor PM. Nonetheless, as Eq. (58) shows, high-frequency components coming from $\Psi_{IN}$ and $J_{PI}$ also appear because of the high-pass nature of $|H_{ER}(f)|^2$. In contrast, using $S_{ER,W}(f)$ (Eq. (65)), the $|H_{ER}(f)H_{CK}^*(f)|$ term filters not just the low-frequency content but also the high-frequency components present in $\Psi_{IN}$ and $J_{PI}$, as shown in Fig. 39. As desired, peaking is still present, and the oscillation due to system dynamics can be observed. In both cases, the high-frequency components of the $J_{Q,BB}$ and $J_{Q,MV}$ contributions are filtered in the same manner by $|H_{CK}(f)|^2$.

The filtering property of the CPSD at in-band and out-of-band frequencies overcomes one of the flaws of the autocorrelation approach, namely its dependence on the PSDs of the various jitter sources. Using cross-correlation, the dependence on the PSD of the input signal can be drastically reduced. Summing up, the cross-correlation function is a viable alternative to monitor the CDR dynamics, featuring a lower impact of the input jitter sources.

Figure 39. Magnitude of $H_{ER}(f)H_{CK}^*(f)$ and the power spectrum of $H_{CK}(f)$ and $H_{ER}(f)$.

Figure 40. Time-step model used for the BB-CDR.

# 5.3. CROSS-CORRELATION PROPERTIES IN CDRS

Three factors are analyzed to further evaluate cross-correlation as a monitoring function of the CDR dynamics: function observability, filter properties, and PI jitter impact. The subsequent analyses use a time-step modeling approach rather than frequency modeling. Time-step simulations provide the following advantages over the frequency-domain approach. First, the time-step model includes the nonlinear behavior of the BBPD and the MJV blocks, so it is not required to model the quantization noise of these blocks. Second, the time-step model avoids recalculating the average gain of the BBPD and MJV blocks; with a frequency model, the $K_{BB}$ and $K_V$ gains need to be recalculated for each noise condition. Third, nonlinear effects may be observed.

Fig. 40 shows the time-step model for the CDR. The BBPD model is equivalent to a sign(x) function in the time domain, and a transition density (TD) mask is added after the BBPD in order to emulate the TD of random bit sequences. The MJV block takes the sign of the sum of several consecutive samples, which adds decimation. For example, a voting-8 policy processes eight consecutive samples, adding a total decimation of L=8. The blocks labeled ACC represent discrete accumulators. Depending on the MJV policy, the accumulators may work at a slower rate than the data rate. For the sake of simplicity, the quantization noise of the PI can be ignored since random jitter sources are dominant <sup>66</sup>. In this context, $J_{PI}$ represents just the phase noise coming from the PLL. If the contribution of the PI quantization noise cannot be ignored, it can be included in $J_{PI}$, and the same procedure can be followed. Using the model in Fig. 40, we compare the autocorrelation $R_X(n)$ of the BBPD output and the cross-correlation $R_{XY}(n)$ between the MJV output and the CDR loop filter output.
We read the phase state in the digital domain through the register at the input of the PI.

**5.3.1. Observability Enhancement**

We refer to the observability of a function as the capability of that function to be measured. Although this simple definition is not rigorous, it is good enough to understand the idea in the following comparison. To simplify the explanation without loss of generality, the $J_{PI}$ contribution is set to 0, and only a Gaussian $\Psi_{in}(n)$ exercises the time-step model to compare the observability of $R_X(n)$ and $R_{XY}(n)$. For this test, $K_G$ is set to 2.5 and 1 in order to obtain a PM of about 45° and 66°, respectively. The Gaussian $\Psi_{in}(n)$ has zero mean and $\sigma=0.04$ UI (UI = Unit Interval), which may represent a typical condition for wireline links used in protocols such as USB 3.1 considering jitter budgeting $^7$. For both conditions, Fig. 41 plots the normalized right-half bands of $R_X$ and $R_{XY}$.

Figure 41. Observability comparison between $R_X(n)$ and $R_{XY}(n)$ for: a) $K_G$ set to 2.5, and b) $K_G$ set to 1.0.

The results show a clear advantage in observability of $R_{XY}(n)$ over the $R_X(n)$ approach. Oscillations are enhanced given the low noise content in $R_{XY}(n)$. Note that oscillations in $R_{XY}(n)$ appear for a PM below 60° (Fig. 41(a)). For PMs higher than 60°, these oscillations in $R_{XY}(n)$ are reduced considerably.

Filtering $R_X(n)$ is presented in <sup>66, 43</sup> as a solution to improve the observability. To illustrate this point, Fig. 42 shows the signals for the case presented in Fig. 41 (underdamped, PM = 45.5°) with additional filtered versions of $R_X(n)$. First-order filters were used with cutoff frequencies ($f_c$) of 2.5 MHz, 5 MHz, and 10 MHz, considering that the CDR peaking frequency is around 20 MHz. As expected, the observability of the functions is improved. However, it can be seen from Fig. 42 that $f_c$ should be chosen carefully; for example, if too much filtering is applied ($f_c = 2.5$ MHz in this example), it is difficult to detect oscillations even for an underdamped condition. The proper value for $f_c$ is also determined by the current condition of the CDR dynamics or, in other words, by the CDR BW. With $R_{XY}(n)$, an additional filtering process is not necessary: the $R_{XY}(n)$ approach takes advantage of the CDR itself as the required filter with automatic BW adjustment. The following sections detail this point.

Figure 42. Observability improvement on $R_X(n)$ when low-pass filtering is added at BBPD output.

**5.3.2. Filtering Properties**

Due to the multiplying nature of the CPSD in the frequency domain, we can take advantage of additional filtering and focus just on the CDR dynamics. The filtering property of the CPSD is studied using the following input signal:

$$\psi_{IN}(t) = \eta(t, \mu, \sigma) + A\sin(2\pi f_{ton}t), \tag{66}$$

where $\psi_{IN}(t)$ is the input jitter signal in the continuous time domain, $t=n/F_s$, and $F_s$ is the sampling frequency of the discrete system; $\eta(t,\mu,\sigma)$ is Gaussian noise with mean $\mu$ and standard deviation $\sigma$, and $f_{ton}$ is the frequency of the test tone with amplitude A. The $f_{ton}$ value takes the following values: 100 kHz, 10 MHz, and 500 MHz for case 1, with $K_G=1$ and PM = 66°; and 100 kHz, 27 MHz, and 500 MHz for case 2, with $K_G=4$ and PM = 22°. The intermediate values of $f_{ton}$ (10 MHz and 27 MHz) are chosen according to $K_G$ to match the peaking frequency of the system. In both cases A = 0.05 UI, in order to operate the system in a small-signal regime. Two sets of $R_X(n)$ and $R_{XY}(n)$ plots are generated for each case to view the filtering effect for different stability conditions. Fig. 43 plots these results for PM = 66°, and Fig. 44 does the same for PM = 22°.
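As an illustration, the test input of Eq. (66) can be generated sample by sample at $t = n/F_s$. The following is a minimal Python sketch, not the simulation code used in this work. The function name `jitter_input`, its arguments, and the fixed seed are assumptions introduced here for illustration, and the noise level $\sigma = 0.04$ UI in the usage line is borrowed from Section 5.3.1 since it is not restated for this test.

```python
import math
import random


def jitter_input(n_samples, fs, sigma, amp, f_tone, mu=0.0, seed=0):
    """Return samples of Eq. (66): Gaussian jitter plus a sinusoidal tone.

    All names are illustrative. Amplitudes are expressed in UI,
    frequencies in Hz.
    """
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    samples = []
    for n in range(n_samples):
        t = n / fs  # discrete time instant t = n / F_s
        noise = rng.gauss(mu, sigma)  # eta(t, mu, sigma)
        tone = amp * math.sin(2.0 * math.pi * f_tone * t)
        samples.append(noise + tone)
    return samples


# Case 1, tone at the peaking frequency: A = 0.05 UI, f_ton = 10 MHz,
# F_s = 5 GS/s (Table 5); sigma = 0.04 UI is an assumed noise level.
psi_in = jitter_input(5000, 5e9, 0.04, 0.05, 10e6)
```

Sweeping `f_tone` over the in-band, peaking, and out-of-band values listed above yields the inputs for the two cases; the resulting sequence would then drive the time-step model of Fig. 40.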
For case 1, a less underdamped response is obtained. When the tone with frequency $f_{ton}$ is in-band with respect to the BB-CDR bandwidth, which is around 10 MHz in this example, both correlation functions filter the tone signal properly. At the peaking frequency, both approaches show oscillations, as expected. However, for high-frequency content, the autocorrelation function $R_X(n)$ cannot extract information from the system because of the high-frequency components. In contrast, $R_{XY}(n)$ tracks the same behavior as in the in-band case despite the presence of the $f_{ton}$ component, thus demonstrating its filtering properties.

Figure 43. Case 1. Filtering property comparison for three different frequencies between autocorrelation (red) and cross-correlation (blue) for a well damped condition. PM = 66°.

Let us now exercise a more underdamped system, which corresponds to the case with PM = 22° shown in Fig. 44. For this case, the BB-CDR dynamics are changed by increasing the loop gain, leading to a new BW close to 27 MHz. Both $R_X(n)$ and $R_{XY}(n)$ detect the oscillation in the system caused by the poor PM. Again, when the test tone is out-of-band, $R_{XY}(n)$ makes it easier to obtain the proper information than $R_X(n)$, as Eq. (65) predicts.

The above examples show a strong capability of $R_{XY}(n)$ to filter out-of-band noise in comparison with $R_X(n)$. At higher frequencies, the responses obtained using $R_{XY}(n)$ are very similar to those at low frequencies regardless of the PM. For a frequency close to the CDR bandwidth (BW), in this case $f_{ton} = 10$ MHz (or 27 MHz), both methods $R_X(n)$ and $R_{XY}(n)$ behave similarly: they show the oscillation condition at the peaking frequency.

Figure 44. Case 2. Filtering property comparison for three different frequencies between autocorrelation (red) and cross-correlation (blue) for an underdamped condition with PM = 22°.

**5.3.3. PI Jitter Impact**

Jitter coming from the PLL and the PI may also impact the BB-CDR dynamics. As mentioned in Section 5.2, the profile present in $J_{PI}$ may influence both $R_X(n)$ and $R_{XY}(n)$. For this reason, we evaluate the impact of jitter coming from $J_{PI}$ under different BW conditions. Three different noise profiles are included. All profiles have a Gaussian $\psi_{IN}$ with $\sigma=0.04$ UI to emulate a common jitter level coming from the input data. For explanation purposes, we use the same flat power level of -112 dBc/Hz for $J_{PI}$. The difference among conditions lies in the bandwidth of each $J_{PI}$ profile: profile 1 has $f_{-3dB}=100$ MHz, profile 2 has $f_{-3dB}=10$ MHz, and profile 3 has $f_{-3dB}=1$ MHz. Fig. 45(a) and (b) show the phase noise in the time and frequency domains, respectively. Note that in this work we are not interested in accurate modeling of noise profiles; instead, we perform the proof of concept using simpler modeling with practical values for communication protocols. We consider this approach good enough to illustrate the fundamental idea.

Figure 45. Bandwidth-limited jitter noise profiles for $J_{PI}$ used to compare the response of $R_X(n)$ and $R_{XY}(n)$. a) Time domain, and b) frequency domain.

Figure 46. Correlation functions at the BBPD output using the time-step model for different jitter profiles in $J_{PI}$: a) Autocorrelation, b) cross-correlation.

Fig. 46(a) plots the results for $R_X(n)$ and Fig. 46(b) for $R_{XY}(n)$. As expected, these results are clear evidence of how the two methods differ when high-frequency band-limited jitter coming from the PLL is injected into the system. The high-frequency content coming from $J_{PI}$ adds to the Gaussian $\psi_{IN}$, leading to a reduction of the average $K_{BB}$ gain. This reduction leads to a less underdamped CDR. The cross-correlation function filters the high-frequency content and reveals a system with less oscillation.
In contrast, the autocorrelation function still presents oscillations because of its high-pass shape, making it difficult to extract information.

Although filtering the BBPD output reduces the oscillation in $R_X(n)$, as proposed in $^{66}$, this approach still has some limitations in comparison with the use of $R_{XY}(n)$. First, the filter bandwidth must be carefully selected by hand. In contrast, $R_{XY}(n)$ performs this selection automatically due to the inherent filtering performed by the CDR itself and reflected in the CPSD. Second, to develop a portable strategy, the filter BW must be adjusted according to the specifications of different CDR designs in the $R_X(n)$ case. Again, this is not a concern with the $R_{XY}(n)$ approach, since the cross-correlation function adopts the filtering BW directly from the CDR frequency response.

### 5.4. PROPOSED LOOP GAIN ADAPTATION

The results of the preliminary analysis offer a compelling basis to explore and develop the cross-correlation-based adaptation scheme XCALG. To do this, we consider the estimation of two key points in the $R_{XY}(n)$ function, namely the first zero-crossing $m_0$ and the first peak point $m_{peak}$. Fig. 47 highlights these two points. From the results shown in Figs. 43, 44, and 45 we see that oscillations in $R_{XY}(n)$ arise when the system presents a poor PM or, in other words, excessive loop gain. A poor PM in terms of stability usually corresponds to a PM below 60°. A near-optimal condition to minimize the total jitter contributions, while meeting the bandwidth requirements for the CDR, is achieved when the PM approaches $60^{\circ}$ <sup>66</sup>. Around the near-optimal stability condition, $R_{XY}(m_{peak})$ vanishes, while for an excessive PM $R_{XY}(m_{peak})$ may reach positive values. In poor PM conditions, $R_{XY}(m_{peak})$ dips into negative values.

**5.4.1. Adaptation Procedure**

The previous observations suggest that a loop gain adaptation scheme may be built by monitoring $R_{XY}(m_{peak})$. The key idea is to estimate the $R_{XY}(m_{peak})$ value based on a first calculation of $R_{XY}(m_0)$. This allows us to calculate the $R_{XY}(n)$ function only up to the first zero-crossing, avoiding extra calculations for $n>m_0$. This estimation also averts the numeric noise issue that appears when the peak is searched using a derivative approach. We assume that the zero-crossing and the first peak points are related by

$$m_{peak} = \zeta m_0, \tag{67}$$

where $\zeta$ is the scaling factor that the CDR system imposes on these two parameters.

Figure 47. Definitions for $m_0$ and $m_{peak}$ in the cross-correlation function.

To find a proper relation between $m_0$ and $m_{peak}$, it is necessary to explore an alternative form of $S_{ER,W}(f)$. We may write

$$H_W(f) = \frac{1}{K_{DPC}} H_{ER}(f) L_G(f), \tag{68}$$

which suggests that $H_W(f)$ is a filtered version of $H_{ER}(f)$. To calculate $S_{ER,W}(f)$, we proceed with the multiplication and conjugation operators,

$$H_{ER}(f)H_W^*(f) = \frac{1}{K_{DPC}}H_{ER}(f)H_{ER}^*(f)L_G^*(f), \tag{69}$$

and then, taking the expectation and the limit as in Eq. (57), we obtain

$$S_{ER,W}(f) = \frac{1}{K_{DPC}} S_{ER}(f) L_G^*(f). \tag{70}$$

Unlike $S_{ER}(f)$ in Eq. (58), which is composed only of magnitude terms, $S_{ER,W}(f)$ in Eq. (70) has magnitude and phase components; in other words, this CPSD suffers from phase distortion. The phase distortion comes from the $L_G^*(f)$ term, the complex conjugate of the loop-gain function $L_G(f)$. We assume that the frequency of interest f is much smaller than the sampling frequency $F_s$, which is the typical case in digital CDR systems. Then, for $f \ll F_s$ we have

$$1 - z^{-1} \to 1 - e^{-j2\pi fT} \approx j2\pi fT, \tag{71}$$

with $T = 1/F_s$.
The loop gain, its magnitude, and its phase may be expressed as follows:

$$L_G(f) = K \left( K_P + \frac{K_F}{j2\pi fT} \right) \frac{e^{-j2\pi fTN_L}}{j2\pi fT},\tag{72}$$

$$|L_G(f)| = \frac{K}{2\pi f T} \sqrt{K_P^2 + \left(\frac{K_F}{2\pi f T}\right)^2},\tag{73}$$

and

$$\Theta(f) = -\tan^{-1} \left( \frac{K_F}{2\pi f T K_P} \right) - \frac{\pi}{2} - 2\pi f T N_L, \tag{74}$$

where K is $K_{BB}K_VK_G$. For a system free of phase distortion, the phase function $\Theta(f)$ must be linear in f, so that both the group delay $\tau_g$ and the phase delay $\tau_{\psi}$, given by Eqs. (75) and (76) respectively, are constant:

$$\tau_g = -\frac{1}{2\pi} \frac{d\Theta}{df},\tag{75}$$

$$\tau_{\psi} = -\frac{\Theta}{2\pi f}.\tag{76}$$

Finding an exact expression for the $m_{peak}/m_0$ ratio is a complex and impractical problem in view of the uncertainty in several parameters, such as latency and total jitter. PVT variations may also exacerbate the problem in real implementations. For these reasons, we chose to use time-step modeling of the CDR in order to extract and analyze the $m_{peak}/m_0$ ratio across variations of the CDR parameters. In particular, we are interested in observing the impact on the $m_{peak}/m_0$ ratio of variations in latency and $K_G$ gain, considering the aforementioned uncertainty of these quantities. We proceed as follows. First, the $m_{peak}/m_0$ ratio is analyzed while varying the adaptive gain $K_G$ and the latency. Second, we track the PM behavior across these variations in order to ensure good stability for the adapted system. Finally, we decide on the proper $m_{peak}/m_0$ ratio.

Figure 48. Ratios $m_{peak}/m_0$ and $n_{peak}/n_0$ as functions of $K_G$ gain.

Using the model described by the parameters in Table 5, Figs. 48 and 49 plot the behavior of $m_{peak}/m_0$ as a function of $K_G$ gain and latency, respectively.
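The phase margin tracked across these sweeps can be approximated directly from Eqs. (72) to (74). The Python sketch below is a hedged illustration rather than the procedure used in this work: it locates the unity-gain point of $|L_G(f)|$ by bisection and evaluates $\Theta(f)$ there. It assumes that the phase interpolator gain $K_{DPC}$ of Table 5 multiplies the loop gain, which Eq. (72) does not show explicitly; the function names and the normalized frequency variable `x` $= 2\pi f T$ are introduced here for illustration only.

```python
import cmath
import math


def loop_gain(x, k_total, k_p, k_f, n_l):
    """Eq. (72) evaluated at the normalized frequency x = 2*pi*f*T."""
    jx = 1j * x
    return k_total * (k_p + k_f / jx) * cmath.exp(-jx * n_l) / jx


def phase_margin_deg(k_total, k_p, k_f, n_l):
    """Return (phase margin in degrees, normalized crossover frequency)."""
    lo, hi = 1e-9, math.pi  # |L_G| > 1 at lo and < 1 at hi for these gains
    for _ in range(200):  # geometric bisection; |L_G| decreases with x
        mid = math.sqrt(lo * hi)
        if abs(loop_gain(mid, k_total, k_p, k_f, n_l)) > 1.0:
            lo = mid
        else:
            hi = mid
    x_c = math.sqrt(lo * hi)
    # Eq. (74) evaluated at the crossover point
    theta = -math.atan(k_f / (x_c * k_p)) - math.pi / 2.0 - x_c * n_l
    return 180.0 + math.degrees(theta), x_c


# Table 5 values; folding K_DPC into the total gain is an assumption.
K_BB, K_V, K_P, K_F, K_DPC, N_L = 9.97, 3.0, 2.0, 2.0**-9, 2.0**-13, 40
for k_g in (1.0, 2.5, 4.0):
    pm, x_c = phase_margin_deg(K_BB * K_V * k_g * K_DPC, K_P, K_F, N_L)
    print(f"K_G = {k_g}: PM = {pm:.1f} deg, x_c = {x_c:.5f} rad/sample")
```

Under these assumptions, the computed margins for $K_G$ = 1, 2.5, and 4 land within about a degree of the 66°, 45°, and 22° values quoted in Section 5.3, which suggests that the linear approximation is a reasonable first check before running time-step sweeps.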
For the sake of completeness and comparison, the $n_{peak}/n_0$ ratio is added for the autocorrelation approach. When $K_G$ is high, the CDR is strongly underdamped, and oscillations become larger in both the autocorrelation and cross-correlation functions. In this condition, the $m_{peak}/m_0$ and $n_{peak}/n_0$ ratios tend to the value 2, as Fig. 48 shows. Because of both the phase and amplitude distortion introduced by the loop gain function, the $m_{peak}/m_0$ ratio changes as a function of $K_G$ and reaches a local minimum. On the other hand, the $n_{peak}/n_0$ ratio is virtually the same for all gain conditions.

Fig. 49 shows the behavior of the ratios as functions of latency when the other parameters are fixed to their default values. With more latency in the loop, the phase of $L_G(f)$, and thus the phase of $S_{ER,W}(f)$, changes more quickly with frequency, reducing the PM and making the system more underdamped. For this reason, oscillations appear in both correlation functions, and the ratios converge again to 2 for high latency conditions. In addition, while $n_{peak}/n_0$ remains almost constant across latency variations, $m_{peak}/m_0$ presents some dependency on latency.

Figure 49. Ratios $m_{peak}/m_0$ and $n_{peak}/n_0$ as functions of latency $(N_L)$.

Due to the above observations in Figs. 48 and 49, we presume that the same local maximum may appear in the PM for both $K_G$ and latency variations. The results presented in Fig. 50 strengthen our claim and make it plausible that a good indicator may be extracted using a relation for $m_{peak}/m_0$ in this optimal region. In this work, we opted to choose the relation between them as

$$m_{peak} = \frac{3}{2}m_0. \tag{77}$$

**5.4.2. Implementation Diagram**

As studied in $^{66}$, the total jitter in the error signal starts to increase when the PM drops below 60°. We take advantage of this criterion by ensuring that the CDR dynamics result in an adequate PM.
Also, with the relation obtained in the previous section between the $m_{peak}$ and $m_0$ points of the $R_{XY}(n)$ function, we assemble the XCALG scheme as depicted in Fig. 51.

Figure 50. Phase margin and $m_{peak}/m_0$ relation across variations presented in Figs. 48 and 49.

XCALG takes the cross-correlation function between the MJV output and the CDR filter output and estimates a value that we call $R_{XY}(m_{peak})$ using two phases. The first phase consists of the estimation of $m_0$, the value at which $R_{XY}(n) \approx 0$. In the second phase, the adaptation calculates $m_{peak} = \alpha m_0$ (where $\alpha = 3/2$ in this case), obtains an estimate of $R_{XY}(m_{peak})$, and compares this result with a given threshold value $R_0$. Setting the $R_0$ threshold to 0 gives a PM of about $60^\circ$ when $K_G$ is adapted. Based on this comparison, the loop filter after the $R_0$ arbiter, which corresponds to an accumulator, increases (or decreases) $K_G$ in steps of $\Delta K_G = 0.05$, and a new $R_{XY}(m_0)$ estimation starts. The adaptation process continues until $R_{XY}(m_{peak})$ reaches a positive value. Note that the block $z^{-k}$ represents an adjustable discrete delay block that performs the signal shifting for the cross-correlation estimation. The value k is just the index of the cross-correlation function $R_{XY}(k)$, which depends on the adaptation phase: in the first phase $k=m_0$, and in the second phase $k=m_{peak}$. Unlike <sup>66</sup> and <sup>43</sup>, which add a filter at the BBPD output to reduce jitter noise, we reuse the CDR loop filter and apply cross-correlation to obtain a result that is more independent of the jitter sources.

Figure 51. Proposed XCALG system diagram.

# **5.5. SIMULATIONS AND RESULTS**

A case study of a compliant USB 3.1 Gen1 CDR model was used to conduct this exploratory study. In particular, we opted to present behavioral simulations to validate the proposed XCALG.
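For reference, one iteration of the two-phase procedure of Section 5.4.2 can be sketched in Python as shown below. This is an illustrative reading of the text, not the circuit of Fig. 51. The lag convention $R_{XY}(k) \approx \mathrm{mean}(x[n]\,y[n-k])$, the assumption that $R_{XY}$ starts positive at small lags, the update direction (decrease $K_G$ when $R_{XY}(m_{peak}) < R_0$, increase otherwise), and all function names and the `k_max` search limit are assumptions introduced here.

```python
import math


def xcorr(x, y, k, k_max):
    """Estimate R_XY(k) as the mean of x[n] * y[n - k] for n >= k_max.

    Starting at n = k_max gives every lag the same number of samples.
    """
    total = sum(x[n] * y[n - k] for n in range(k_max, len(x)))
    return total / (len(x) - k_max)


def first_zero_crossing(x, y, k_max):
    """Phase 1: smallest lag k >= 1 at which R_XY(k) is no longer positive."""
    for k in range(1, k_max + 1):
        if xcorr(x, y, k, k_max) <= 0.0:
            return k
    return None  # no zero-crossing inside the search window


def xcalg_step(x, y, k_g, k_max=64, alpha=1.5, r0=0.0, d_kg=0.05):
    """Phase 2: evaluate R_XY at m_peak = alpha * m0 and step K_G once."""
    m0 = first_zero_crossing(x, y, k_max)
    if m0 is None:
        return k_g, None  # nothing to adapt on; keep the current gain
    m_peak = min(k_max, int(alpha * m0 + 0.5))  # Eq. (77), rounded
    r_peak = xcorr(x, y, m_peak, k_max)
    # Assumed arbiter rule: a negative peak indicates excessive loop gain.
    step = -d_kg if r_peak < r0 else d_kg
    return k_g + step, m0


# Synthetic check with an oscillatory pair (period 42 samples), standing in
# for the MJV output and the loop filter output of an underdamped loop.
x = [math.cos(2.0 * math.pi * n / 42.0) for n in range(64 + 2100)]
new_k_g, m0 = xcalg_step(x, x, k_g=2.5)
```

With this rule the gain keeps toggling around the point where $R_{XY}(m_{peak})$ crosses $R_0$, which is consistent with the dithering that hysteresis is meant to reduce.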
As an additional result, we added hardware implementation of the cross-correlation estimator circuit to demonstrate feasibility on silicon implementations. **5.5.1. Behavioral Simulations** The CDR is modeled using the time-step model of Fig. 40 with the following parameters: $K_P = 2$ , $K_F = 2^{-6}$ , $K_{DPC} = 2^{-10}$ , L = 8, TD = 0.5 and $N_L = 5 * L = 40$ assuming 5 pipeline stages in the digital synthesis of the CDR loop filter. In contrast with linear frequency modeling, this model intrinsically considers quantization noise from the BBPD and the MJV. Then, the simulation setup only requires input data noise and jitter coming from the PI. Random jitter sources exist as Gaussian noise generated by the transmitter and receiver PLL. The latter couples into the system through the PI. We perform simulations of the adaptation using a total of six tests. Tests 1, 2, and 3 correspond to three levels of random Gaussian noise for input data $\sigma_{IN}$ =0.03, 0.04, 0.06UI, and no jitter coming from the PI. This procedure allows seeing the impact of the Table 6. Test Conditions for Adaptation. | Test | Input Data Jitter | PI Jitter Profile | | |------|-----------------------|---------------------------|--| | | (Gaussian Noise) | (Flat Level = -112dBc/Hz) | | | 1 | $\sigma_{IN}=0.06UI$ | 0 | | | 2 | $\sigma_{IN}=0.04$ Ul | 0 | | | 3 | $\sigma_{IN}=0.03UI$ | 0 | | | 4 | $\sigma_{IN}=0.04UI$ | Profile 1: BW=100MHz | | | 5 | $\sigma_{IN}=0.04UI$ | Profile 2: BW=10MHz | | | 6 | $\sigma_{IN}=0.04$ UI | Profile 3: BW=1MHz | | jitter power due only to incoming data. The magnitudes for input jitter noise used in these tests are reasonable values based on the jitter budgeting for the standard USB 3.1 $^{7}$ . On the other hand, tests 4, 5, and 6 fix the Gaussian data jitter to $\sigma_{IN}=0.04$ UI and include BW-limited jitter sources from the PI using the same profiles as in Fig. 45. Table 6 summarizes the tests conditions used to perform the XCALG. Fig. 
52 demonstrates that the XCALG converges to an adapted $K_G$ value for each test case. Fig. 52 shows the results for tests 1, 2, and 3. As expected, a large amount of random noise decreases the CDR loop gain; thus, the $K_G$ obtained from adaptation increases with $\sigma$. Fig. 53 shows the results for tests 4, 5, and 6. As expected, a wider BW (Profile 1) adds more jitter to the input jitter, reducing the equivalent $K_{BB}$ gain. For this reason, the adaptation settles at a higher $K_G$ in order to compensate for the $K_{BB}$ reduction. In other words, more BW-limited jitter coming from the PI results in a less underdamped system for the same $K_G$ level. For all tests, the dithering present at the end of the adaptation may be reduced by adding hysteresis.

Figure 52. Evolution of the adapted $K_G$ for only Gaussian $\Psi_{IN}$.

Figure 53. Gaussian $\Psi_{in}$ and jitter profiles injected in $J_{PI}$.

After $K_G$ adaptation, a CDR with improved BER is obtained because the XCALG results in a PM around 60°, no ringing is observed in the jitter-tolerance function (JTOL), and the eye aperture of the data is maximized. To explore this, tests 1 and 2 are used to extract the optimal $K_G$, which improves the high-frequency JTOL response. First, the XCALG is turned off and the $K_G$ value is manually swept to extract the minimum JTOL value using time-step simulations. Fig. 54(a) plots the results using continuous lines. With this approach, optimal values are obtained with $K_G$ between 1 and 1.5 for $\sigma = 0.03$ UI, and between 0.9 and 1.2 for $\sigma = 0.04$ UI. After that, $K_G$ is set again to a high value and the XCALG is turned on. The highlighted square marker on each curve corresponds to the $K_G$ value reached by our XCALG after dither suppression, demonstrating that the adapted $K_G$ is close to the optimum. Fig. 54(b) illustrates how the minimum JTOL point in Fig. 54(a) is obtained, using test 1 as an example.
For low frequencies, the JTOL decreases with low $K_G$, degrading the tracking capability for low-frequency jitter, as occurs in a sinusoidal jitter (SJ) tolerance test. This is a well-known trade-off between the high- and low-frequency response of the JTOL. Although the work presented in <sup>43</sup> suggests an alternative to alleviate this issue, it neither details how to implement it nor discusses the adaptation criterion. We envision a possible approach as follows: take more samples in the M-size buffers in order to observe lower frequencies in the correlation functions, and keep monitoring $R_{XY}(n)$ to capture oscillations due to SJ. However, trying to monitor low frequencies due to SJ may exacerbate the complexity of the digital implementation and the area penalty, as we see in the following section.

Figure 54. Extracted optimal $K_G$ procedure. (a) Minimum JTOL value obtained by manual seeking and via adaptation technique. (b) Example of how each point of manually extracted $K_G$ is obtained from the JTOL curve.

Figure 55. Extracted optimal $K_G$ procedure. Minimum JTOL value obtained by manual seeking and via adaptation technique for $m_{peak}/m_0$ assuming 1.2, 1.5, and 1.8 values.

Note that the adaptation requires setting a factor between $m_{peak}$ and $m_0$, as Eq. (77) states. This factor arises from the design process; however, for the sake of completeness, we varied the value of $\alpha$ set in the adaptation by ±20% around the nominal value (3/2 in this case). Table 7 summarizes the setup used to analyze the impact of $m_{peak}/m_0$ on the adaptation result. Fig. 55 plots the results obtained by manual search and by the XCALG. In all cases, the adaptation points obtained by XCALG are within less than 3% of the optimal solution even though the $\alpha$ value was chosen sub-optimally.

Table 7. Test Conditions for Adaptation.

| Input Data Jitter (Gaussian Noise) | Adaptation ratio ($\alpha$) |
|------------------------------------|-----------------------------|
| $\sigma_{IN}=0.03$ UI | $\alpha = \zeta = 1.5$ |
| $\sigma_{IN}=0.03$ UI | $\alpha = 1.8$ (+20%) |
| $\sigma_{IN}=0.03$ UI | $\alpha = 1.2$ (-20%) |
| $\sigma_{IN}=0.04$ UI | $\alpha = \zeta = 1.5$ |
| $\sigma_{IN}=0.04$ UI | $\alpha = 1.8$ (+20%) |
| $\sigma_{IN}=0.04$ UI | $\alpha = 1.2$ (-20%) |

It is important to distinguish between the factor $\zeta$ stated in Eq. (77) and the factor $\alpha$ employed in the adaptation. Eq. (77) gives the final relation between $m_{peak}$ and $m_0$ once the system is stabilized with a PM around 60°, as Fig. 50 shows. The adaptation then converges along the curve in Fig. 48 and stops around $R_{XY}(m_{peak})=0$. At the end of the adaptation, the actual ratio $m_{peak}/m_0$ deviates from $\zeta$ because of the inaccurate setting of $\alpha$. In our view, the results emphasize the functionality of XCALG and offer a compelling alternative for loop gain adaptation in BB-CDRs.

**5.5.2. Cross-correlation Hardware Implementation**

Although the focus of this work is on the theoretical framework of XCALG, we are aware that our research raises two main questions related to hardware implementation, namely power and area costs. In this context, it is important to know that the loop adaptation can also run in background mode, considering that taking a decision on the adapted gain is a low duty-cycle operation. Nonetheless, we have decided to explore a custom circuit implementation to show immediate functionality tests and potential silicon manufacturing. With this in mind, we discuss the area costs considering custom logic in this section. Fig. 56 shows the RTL-equivalent cross-correlation estimation circuit.

Figure 56. RTL implementation for the proposed cross-correlation estimator circuit.
This circuit takes three input signals labeled X[m], Y[m], and K, and calculates $R_{XY}(K)$. The input signals X[m] and Y[m] pass through FIFO buffers, which sample and window the signals. These FIFOs are loaded only once before the correlation operation is performed and are loaded again in the second phase, when the adaptation estimates $R_{XY}(m_{peak})$. The circuit selects the outputs of the buffers depending on the Wc and Wc+K values. Wc comes from the current state of a window counter, which performs the discrete signal shifting for calculating each sample of the cross-correlation function. K is the index of the cross-correlation sample being calculated. Therefore, using Wc and Wc+K as indexes, we obtain the behavior of an adjustable discrete delay $z^{-K}$. The reference counter Wc counts from zero to WL - K. Using this reference counter, the circuit takes advantage of the zero-padding technique and reduces the number of operations in the $R_{XY}(K)$ estimation: when the index K increases, the number of operations decreases linearly. The delayed signals pass through a multiplier-accumulator (MAC LOGIC). Depending on the value coming from the MJV (1, 0, or -1), MAC LOGIC decides whether the shifted version of Y[m] is converted to its two's complement (-1 case), is used unchanged (1 case), or the accumulator value is not updated (0 case). This boolean function shortens the combinational delay path compared with a complete multiplier. Once the reference counter reaches its maximum value (WL - K), the value of the MAC LOGIC is sampled, and the circuit provides $R_{XY}(K)$. The circuit in Fig. 56 was implemented in a 65nm CMOS technology node.

Figure 57. Cross-correlation post-synthesis results vs Matlab xcorr.

Figure 58. Matlab 8192 samples per window vs cross-correlation circuit.

**5.5.3. Window Size and Area Penalty**

Window size affects the observability of the cross-correlation function and also impacts the area costs in the system.
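The multiplier-free behavior of the MAC LOGIC in Fig. 56 can be illustrated with a short behavioral model in Python. This is a hedged sketch rather than the RTL itself: the function name `mac_xcorr` is hypothetical, Python integers stand in for fixed-width registers, and the window counter is assumed to visit Wc = 0 ... WL-K-1 so that the read index Wc+K stays inside the buffer.

```python
from typing import Sequence


def mac_xcorr(x: Sequence[int], y: Sequence[int], k: int) -> int:
    """Multiplier-free estimate of R_XY(k) for x[m] in {-1, 0, 1}.

    x: windowed MJV samples; y: windowed loop-filter samples (integers).
    """
    wl = min(len(x), len(y))      # window length WL
    acc = 0
    # Assumption: Wc runs over 0 .. WL-K-1; samples beyond the window are
    # treated as zero padding, so the operation count shrinks linearly with K.
    for wc in range(max(wl - k, 0)):
        sample = y[wc + k]        # shifted version of Y[m]
        if x[wc] == 1:
            acc += sample         # +1 case: accumulate Y unchanged
        elif x[wc] == -1:
            acc += ~sample + 1    # -1 case: two's-complement negation of Y
        # 0 case: the accumulator is not updated
    return acc


x_win = [1, -1, 0, 1, -1, 1]
y_win = [3, -2, 5, 7, -4, 1]
r0_est = mac_xcorr(x_win, y_win, 0)
```

For ternary X[m], the add/negate/hold selection gives the same result as a full multiply-accumulate, which is why the datapath needs no multiplier.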
To explore this impact, we implement a mixed-signal simulation measuring the cross-correlation function between the MJV output and the BB-CDR filter output with $K_G$ set to its nominal value of 1. A BB-CDR described in Verilog generates the test signals for the cross-correlation. Fig. 57 compares the circuit implementation with the MATLAB simulation. Fig. 58 plots the RMS error, normalized to the peak of the cross-correlation function near zero, for different window sizes. In this example, when the number of samples per window is less than 1024, the cross-correlation function deteriorates due to the settling time associated with the CDR dynamics. In contrast, for more than 1024 samples, there are no significant changes in the error. Note that the number of samples also determines the minimum frequency $f_{min}$ that the XCALG can track, which is

$$f_{min} = \frac{F_s}{M},\tag{78}$$

where $M$ is the window size of the buffers. This suggests that in a real implementation, from the point of view of the adaptation system, the in-band noise can be further filtered. This is good in terms of filtering properties but could be a limitation regarding sinusoidal jitter detection. Note that this trade-off occurs regardless of whether the adaptation is implemented with autocorrelation or cross-correlation.

Despite its preliminary character, the work reported here offers a first insight into the hardware costs of the system. Table 8 compares the synthesized area across window sizes. These results show that in all cases most of the area is spent on sample storage, which leads to a trade-off between area and window size. Fig. 59 shows a final layout of the cross-correlation estimator circuit with a window size of 1024 samples.
The figure highlights in pink the area occupied by the FIFO that stores the data samples coming from the CDR filter. The blue area corresponds to the FIFO that stores the data samples coming from the MJV, and the grey region is the cross-correlation datapath. As can be seen in the area teardown, about 70% of the occupied area is due to the FIFO for the signal Y[m] coming from the CDR loop filter output.

Table 8. Synthesized estimated area among different window sizes for the cross-correlation estimator circuit.

| Window size | Datapath Area [$\mu m^2$] | Total Area [$\mu m^2$] |
|-------------|---------------------------|------------------------|
| 256 | 721 | 26682 |
| 512 | 783 | 57047 |
| 1024 | 823 | 116838 |
| 2048 | 888 | 237909 |
| 4096 | 952 | 464359 |
| 8192 | 1016 | 1014172 |

Figure 59. Layout implementation for the cross-correlation estimator circuit. (a) Layout, (b) area teardown.

The penalty for implementing cross-correlation compared to autocorrelation depends on the number of samples taken for the desired accuracy. We can estimate a comparison based on our preliminary results. For this, the same number of samples and similar datapath logic are assumed in both cases. Looking at Fig. 59, we can see that the area is dominated by the FIFO sizes and that the notable difference is the size of the Y[m] FIFO. With the autocorrelation approach, the Y[m] FIFO would have the same size as the X[m] FIFO. With this in mind, we extract the following area teardown from Fig. 59: Y[m] around 68.7%, X[m] around 27.4%, and logic around 3.9%. Cross-correlation implies Y + X + logic = 100%. On the other hand, autocorrelation would require 2X + logic = 58.7%, which corresponds to a reduction of 41.3%. With this approach, we still need to take into account the area cost of the LPF implementation in the autocorrelation case, which increases the area of the autocorrelation + LPF approach. We do not expect that the extra area from LPF will exceed the 41.3% obtained before.
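The area comparison above reduces to simple arithmetic, reproduced here as a small Python check. It is only a worked restatement of the numbers quoted from Fig. 59; the function name `autocorr_area_share` and the notion of an "LPF break-even budget" are illustrative labels, not terms from the original analysis.

```python
def autocorr_area_share(x_fifo: float, logic: float) -> float:
    """Area of an autocorrelation estimator, in % of the cross-correlation area.

    Assumes two FIFOs of X[m] size plus the same datapath logic.
    """
    return 2.0 * x_fifo + logic


# Area teardown of Fig. 59, in percent of the cross-correlation estimator.
Y_FIFO, X_FIFO, LOGIC = 68.7, 27.4, 3.9

auto_share = autocorr_area_share(X_FIFO, LOGIC)  # 2X + logic
lpf_budget = 100.0 - auto_share                  # LPF area at which both approaches tie
print(f"autocorrelation: {auto_share:.1f}%, LPF break-even: {lpf_budget:.1f}%")
# prints "autocorrelation: 58.7%, LPF break-even: 41.3%"
```

Under these assumptions, the autocorrelation approach stays smaller as long as its additional LPF occupies less than the break-even budget.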
Thus, we expect that the cross-correlation approach could be more area-expensive than autocorrelation in a custom circuit logic scenario. Therefore, a trade-off may exist between the benefits of XCALG from a system point of view and the area costs if a custom circuit implementation is adopted.

## 5.6. CONCLUSION

A cross-correlation-based adaptive loop gain technique (XCALG) has been demonstrated. The theoretical framework for the technique is explained in detail, exploiting the link between the cross-correlation function $R_{XY}(n)$ and the cross-power spectral density (CPSD). The filtering properties of the CPSD between the majority voting output and the CDR loop filter decrease the impact of the in-band and out-of-band jitter on the shape of the $R_{XY}(n)$ function while enhancing the observability of the system. Although the autocorrelation approach $R_X(n)$ combined with filtering of the BBPD output may improve the observability, this approach still has some system limitations in comparison with the use of $R_{XY}(n)$. First, the filter bandwidth must be carefully selected by hand. In contrast, $R_{XY}(n)$ performs this selection automatically due to the inherent filtering performed by the CDR itself, which is reflected in the CPSD. Second, to develop a very portable strategy, the filter BW must be adjusted according to the specifications of different CDR designs for the $R_X(n)$ case. Again, this is not a concern when using the $R_{XY}(n)$ approach, since the cross-correlation function adopts the filtering BW directly from the CDR frequency response. Loop latency and gain variation analyses are also included. The XCALG allows the BB-CDR to achieve a near-optimal condition regarding Jitter Tolerance (JTOL) and guarantees a good phase margin. Finally, a preliminary hardware implementation of the cross-correlation function in a 65nm CMOS technology node explores the feasibility of direct application.

#### 6. CONTRIBUTIONS AND CONCLUSIONS

Several conclusions arise from the work done in this dissertation. They are compiled and explained in this chapter.

## 6.1. CONTRIBUTIONS SUMMARY

The key contributions of this dissertation are summarized as follows:

- An analysis of the impact of channel loss on the CDR loop gain, demonstrating a non-obvious increase in loop gain in some cases where the incoming jitter also increases <sup>64</sup>.
- Presentation and modeling of the stochastic resonance phenomenon in clock and data recovery circuits <sup>67</sup>.
- A design methodology for DPLL-based CDR circuits.
- A novel technique for CDR loop gain adaptation using cross-correlation functions to improve system dynamics <sup>72,73</sup>. We call this method XCALG.
- A method proposal for clock and data recovery using Nonlinear Laplacian Spectral Analysis <sup>74</sup>.

<sup>72</sup> J. ARDILA and E. ROA. "A Novel Loop Gain Adaptation Method for Digital CDRs Based on the Cross-Correlation Function". In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). 2019, pp. 1–4. DOI: 10.1109/ISCAS.2019.8702751.

<sup>73</sup> J. ARDILA, H. MORALES, and E. ROA. "On the Cross-Correlation Based Loop Gain Adaptation for Bang-Bang CDRs". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 67.4 (2020), pp. 1169–1180. DOI: 10.1109/TCSI.2019.2952532.

<sup>74</sup> J. ARDILA, A. AMAYA, and E. ROA. "Method and Circuit for Recovering Clock and Data Signals." In: Superintendencia de Industria y Comercio CO2017008770A1 (2020).

In addition, this research work also includes contributions to the analog and mixed-signal circuits found in any SoC, which is also the usual environment where high-speed interfaces reside <sup>75–81</sup>. Based on the above, the following conclusions are offered.

<sup>75</sup> J. ARDILA et al. "A Stable Physically Unclonable Function Based on a Standard CMOS NVR". In: 2020 IEEE International Symposium on Circuits and Systems (ISCAS). 2020, pp. 1–4. DOI: 10.1109/ISCAS45731.2020.9180411.

<sup>76</sup> C. DURAN et al. "A 32-bit RISC-V AXI4-lite bus-based microcontroller with 10-bit SAR ADC". In: 2016 IEEE 7th Latin American Symposium on Circuits Systems (LASCAS). 2016, pp. 315–318. DOI: 10.1109/LASCAS.2016.7451073.

<sup>77</sup> C. DURAN et al. "A system-on-chip platform for the internet of things featuring a 32-bit RISC-V based microcontroller". In: 2017 IEEE 8th Latin American Symposium on Circuits Systems (LASCAS). 2017, pp. 1–4. DOI: 10.1109/LASCAS.2017.8126878.

<sup>78</sup> A. AMAYA, J. ARDILA, and E. ROA. "A Digital Offset Reduction Method for Dynamic Comparators Based on Phase Measurement". In: 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). 2017, pp. 661–664. DOI: 10.1109/ISVLSI.2017.120.

<sup>79</sup> N. CUEVAS, J. ARDILA, and E. ROA. "An All-Thin-Devices Level Shifter in Standard-Cell Format for Auto Place-and-Route Flow". In: 2019 IEEE 10th Latin American Symposium on Circuits Systems (LASCAS). 2019, pp. 45–48. DOI: 10.1109/LASCAS.2019.8667578.

<sup>80</sup> J. SANTAMARIA et al. "A Family of Compact Trim-Free CMOS Nano-Ampere Current References". In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). 2019, pp. 1–4. DOI: 10.1109/ISCAS.2019.8702294.

<sup>81</sup> C. DURAN et al. "An Energy-Efficient RISC-V RV32IMAC Microcontroller for Periodical-Driven Sensing Applications". In: 2020 IEEE Custom Integrated Circuits Conference (CICC). 2020, pp. 1–4. DOI: 10.1109/CICC48029.2020.9075877.

## 6.2. CONCLUSIONS

In certain conditions, when the jitter noise level coming from the input data increases, an increase in the phase detector gain is observed. To explain this, an extraction procedure was presented to obtain the actual value of $K_{PD}$ under different conditions of incoming jitter and channel loss. The unexpected increment in the phase detector gain is explained through the extraction and analysis of the probability density functions for channel loss.
Also, an increment in $K_{PD}$ when sinusoidal and uniform jitter are combined is explained, and its impact on the CDR dynamic response is presented <sup>64</sup>. As a final comment, the maximum $K_{PD}$ value is not always reached at 0 UI, which suggests that, for some conditions, the phase sampling point of the data can be moved from 0 UI to the point where the maximum occurs.

To further explain the above observation, a mathematical model for the $K_{BB}$ value when a DPLL-based CDR faces uniform and sinusoidal jitter noise was presented and validated through time-step simulations <sup>67</sup>. Stochastic resonance (SR) is demonstrated under the interaction between these two types of noise, presenting a maximum value for $K_{BB}$ even when one of the noise components is increased. The impact on the JTF response is discussed, and it is shown how SR can degrade the dynamics and stability of CDR systems. Finally, at low frequencies, SR can impact the JTOL function in a positive way in some cases, while it does not affect the high-frequency response.

Due to the aforementioned dependence of the phase detector gain on jitter sources, the loop gain of CDR systems can vary under different conditions of incoming jitter. In some cases, the resulting loop gain can lead to a low phase margin, causing instability issues. For this reason, an adaptive gain is desired for a safe CDR response across multiple operating conditions and designs. A cross-correlation-based adaptive loop gain technique (XCALG) has been demonstrated. The theoretical framework for the technique is explained in detail, exploiting the link between the cross-correlation function $R_{XY}(n)$ and the cross-power spectral density (CPSD) <sup>72,73</sup>. The filtering properties of the CPSD between the majority voting output and the CDR loop filter decrease the impact of the in-band and out-of-band jitter on the shape of the $R_{XY}(n)$ function while enhancing the observability of the system. Although autocorrelation approaches $R_X(n)$ combined with filtering of the BBPD output may improve the observability, these approaches still have some system limitations in comparison with the use of $R_{XY}(n)$. First, the filter bandwidth must be carefully selected by hand. In contrast, $R_{XY}(n)$ performs this selection automatically due to the inherent filtering performed by the CDR itself, which is reflected in the CPSD. Second, to develop a very portable strategy, the filter BW must be adjusted according to the specifications of different CDR designs for the $R_X(n)$ case. Again, this is not a concern when using the $R_{XY}(n)$ approach, since the cross-correlation function adopts the filtering BW directly from the CDR frequency response. Loop latency and gain variation analyses are also included. The XCALG allows the BB-CDR to achieve a near-optimal condition regarding Jitter Tolerance (JTOL) and guarantees a good phase margin.

In order to design a complete serial interface for the CDR system, co-modeling and co-design are very useful and effective strategies, which means that several blocks are modeled and designed synergistically across several levels of abstraction, as discussed in Chapter 4. As a final product of all the experience and the main contributions presented in this work, a complete high-speed serial interface using XCALG for loop gain adaptation was sent to fabrication in a 180nm CMOS technology. Fig. 60 shows the layout of this interface.

Figure 60. Chip layout of the High-Speed Serial Interface designed in this work. Size: **1.66mm x 1.66mm**.

## 6.3. SUGGESTIONS FOR FUTURE RESEARCH

Although the NLSA-based CDR architecture presents several implementation challenges, as discussed in Appendix 6.4, it is important to highlight that this idea is new. Two approaches to accomplish a hardware implementation may arise as alternatives to be explored.
First, using a traditional CDR and modifying it to exploit the post-processing in electronic instrumentation. Second, and even more challenging, proposing a new CDR architecture that overcomes these challenges and implements the NLSA directly. This novel CDR would probably not fall into any of the classifications presented in Chapter 1. The arduous task now is to find it.

One of the major goals of this thesis was to explore new adaptive loop gain methods. The proof of concept of the XCALG method seems to be good enough for implementation, and the trade-offs regarding the area penalty in comparison with autocorrelation approaches are clear. However, we envision a strategy that, to the author's knowledge, may significantly reduce the area penalty. The strategy involves the use of sub-resolution in the signal coming from the phase register. In a locked condition, the phase register contains the jitter information either around a settled value (if SSC is not present in the system) or along a known SSC ramp. It is then possible to subtract the bias component or to filter out the SSC ramp and consider only the signal component, which can be represented with fewer bits. As a result, the FIFO size for the phase register signal in the XCALG module could be reduced. Further exploration of this idea is strongly recommended for future work.

#### 6.4. PUBLICATIONS

### **Journal Articles**

- **J. Ardila**, H. Morales and E. Roa, "On the Cross-Correlation Based Loop Gain Adaptation for Bang-Bang CDRs," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 4, pp. 1169-1180, April 2020, doi: 10.1109/TCSI.2019.2952532.
- A. Amaya, **J. Ardila**, E. Roa, "A Digital Phase-Based On-Fly Offset Compensation Method for Decision Feedback Equalizers," IET Circuits, Devices and Systems, 2020.

### **Conference Proceedings**

- **J. Ardila**, J. Santamaria, K. Florez and E.
Roa, "A Stable Physically Unclonable Function Based on a Standard CMOS NVR," 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Sevilla, 2020, pp. 1-4, doi: 10.1109/ISCAS45731.2020.9180411.
- **J. Ardila** and E. Roa, "A Novel Loop Gain Adaptation Method for Digital CDRs Based on the Cross-Correlation Function," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 2019, pp. 1-4, doi: 10.1109/ISCAS.2019.8702751.
- **J. Ardila** and E. Roa, "Stochastic resonance in bang-bang phase detector gain and the impact on CDR locking," 2018 IEEE 9th Latin American Symposium on Circuits Systems (LASCAS), Puerto Vallarta, 2018, pp. 1-4, doi: 10.1109/LASCAS.2018.8399933.
- **J. Ardila** and E. Roa, "On the impact of channel loss on CDR locking," 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), Abu Dhabi, 2016, pp. 1-4, doi: 10.1109/MWSCAS.2016.7870075.
- C. Duran, ... **J. Ardila** et al., "An Energy-Efficient RISC-V RV32IMAC Microcontroller for Periodical-Driven Sensing Applications," 2020 IEEE Custom Integrated Circuits Conference (CICC), Boston, MA, USA, 2020, pp. 1-4, doi: 10.1109/CICC48029.2020.9075877.
- J. Santamaria, N. Cuevas, G. L. E. Rueda, **J. Ardila** and E. Roa, "A Family of Compact Trim-Free CMOS Nano-Ampere Current References," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 2019, pp. 1-4, doi: 10.1109/ISCAS.2019.8702294.
- N. Cuevas, **J. Ardila** and E. Roa, "An All-Thin-Devices Level Shifter in Standard-Cell Format for Auto Place-and-Route Flow," 2019 IEEE 10th Latin American Symposium on Circuits Systems (LASCAS), Armenia, Colombia, 2019, pp. 45-48, doi: 10.1109/LASCAS.2019.8667578.
- A. Amaya, **J. Ardila** and E. Roa, "A Digital Offset Reduction Method for Dynamic Comparators Based on Phase Measurement," 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Bochum, 2017, pp. 661-664, doi: 10.1109/ISVLSI.2017.120.
- C. Duran, ... **J.
Ardila** et al., "A system-on-chip platform for the internet of things featuring a 32-bit RISC-V based microcontroller," 2017 IEEE 8th Latin American Symposium on Circuits Systems (LASCAS), Bariloche, 2017, pp. 1-4, doi: 10.1109/LASCAS.2017.8126878.
- C. Duran, ... **J. Ardila** et al., "A 32-bit RISC-V AXI4-lite bus-based microcontroller with 10-bit SAR ADC," 2016 IEEE 7th Latin American Symposium on Circuits Systems (LASCAS), Florianopolis, 2016, pp. 315-318, doi: 10.1109/LASCAS.2016.7451073.

### **Patents**

- **J. Ardila**, A. Amaya, E. Roa. Method and Circuit for Recovering Clock and Data Signals. CO2017008770A1. Superintendencia de Industria y Comercio - Colombia.

### **BIBLIOGRAPHY**

- ABIRI, B. et al. "A 5Gb/s Adaptive DFE for 2x Blind ADC-Based CDR in 65nm CMOS". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 436–438. DOI: 10.1109/ISSCC.2011.5746386 (cit. on p. 27).
- AMAYA, A., J. ARDILA, and E. ROA. "A Digital Offset Reduction Method for Dynamic Comparators Based on Phase Measurement". In: *2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*. 2017, pp. 661–664. DOI: 10.1109/ISVLSI.2017.120 (cit. on p. 128).
- ARDILA, J., A. AMAYA, and E. ROA. "Method and Circuit for Recovering Clock and Data Signals." In: *Superintendencia de Industria y Comercio* CO2017008770A1 (2020) (cit. on p. 127).
- ARDILA, J., H. MORALES, and E. ROA. "On the Cross-Correlation Based Loop Gain Adaptation for Bang-Bang CDRs". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 67.4 (2020), pp. 1169–1180. DOI: 10.1109/TCSI.2019.2952532 (cit. on pp. 127, 129).
- ARDILA, J. and E. ROA. "A Novel Loop Gain Adaptation Method for Digital CDRs Based on the Cross-Correlation Function". In: *2019 IEEE International Symposium on Circuits and Systems (ISCAS)*. 2019, pp. 1–4. DOI: 10.1109/ISCAS.2019.8702751 (cit. on pp. 127, 129).
- ARDILA, J. and E. ROA. "On the Impact of Channel Loss on CDR Locking".
In: *2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS)*. 2016, pp. 1–4. DOI: 10.1109/MWSCAS.2016.7870075 (cit. on pp. 58, 61, 127, 129).
- ARDILA, J. and E. ROA. "Stochastic Resonance in Bang-Bang Phase Detector Gain and the Impact on CDR Locking". In: *2018 IEEE 9th Latin American Symposium on Circuits Systems (LASCAS)*. 2018, pp. 1–4. DOI: 10.1109/LASCAS.2018.8399933 (cit. on pp. 81, 83, 127, 129).
- ARDILA, J. et al. "A Stable Physically Unclonable Function Based on a Standard CMOS NVR". In: *2020 IEEE International Symposium on Circuits and Systems (ISCAS)*. 2020, pp. 1–4. DOI: 10.1109/ISCAS45731.2020.9180411 (cit. on p. 128).
- BLANKMAN, Alan. "Understanding SDAII Jitter Calculation Methods." In: *White Paper v 2.01* (2012) (cit. on p. 46).
- BULZACCHELLI, J. F. et al. "A 28-Gb/s 4-Tap FFE/15-Tap DFE Serial Link Transceiver in 32-nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 47.12 (2012), pp. 3232–3248. DOI: 10.1109/JSSC.2012.2216414 (cit. on p. 38).
- CHUNG, C. C. and W. C. DAI. "A Referenceless All-Digital Fast Frequency Acquisition Full-Rate CDR Circuit for USB 2.0 in 65nm CMOS Technology". In: *VLSI Design, Automation and Test (VLSI-DAT), 2011 International Symposium on*. 2011, pp. 1–4. DOI: 10.1109/VDAT.2011.5783614 (cit. on pp. 27, 28).
- CUEVAS, N., J. ARDILA, and E. ROA. "An All-Thin-Devices Level Shifter in Standard-Cell Format for Auto Place-and-Route Flow". In: *2019 IEEE 10th Latin American Symposium on Circuits Systems (LASCAS)*. 2019, pp. 45–48. DOI: 10.1109/LASCAS.2019.8667578 (cit. on p. 128).
- DURAN, C. et al. "A 32-bit RISC-V AXI4-lite bus-based microcontroller with 10-bit SAR ADC". In: *2016 IEEE 7th Latin American Symposium on Circuits Systems (LASCAS)*. 2016, pp. 315–318. DOI: 10.1109/LASCAS.2016.7451073 (cit. on p. 128).
- DURAN, C. et al. "A system-on-chip platform for the internet of things featuring a 32-bit RISC-V based microcontroller".
In: *2017 IEEE 8th Latin American Symposium on Circuits Systems (LASCAS)*. 2017, pp. 1–4. DOI: 10.1109/LASCAS.2017.8126878 (cit. on p. 128).
- DURAN, C. et al. "An Energy-Efficient RISC-V RV32IMAC Microcontroller for Periodical-Driven Sensing Applications". In: *2020 IEEE Custom Integrated Circuits Conference (CICC)*. 2020, pp. 1–4. DOI: 10.1109/CICC48029.2020.9075877 (cit. on p. 128).
- ESHEL, G. *Spatiotemporal Data Analysis*. Princeton University Press, 2012 (cit. on p. 154).
- FRANCESE, P. A. et al. "A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 kppm Tracking Bandwidth". In: *IEEE Journal of Solid-State Circuits* 49.11 (2014), pp. 2490–2502. DOI: 10.1109/JSSC.2014.2344008 (cit. on p. 38).
- FRANS, Y. et al. "A 0.5-16.3 Gb/s Fully Adaptive Flexible-Reach Transceiver for FPGA in 20 nm CMOS". In: *IEEE Journal of Solid-State Circuits* 50.8 (2015), pp. 1932–1944. DOI: 10.1109/JSSC.2015.2413849 (cit. on p. 38).
- FRANS, Y. et al. "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET". In: *IEEE Journal of Solid-State Circuits* 52.4 (2017), pp. 1101–1110. DOI: 10.1109/JSSC.2016.2632300 (cit. on p. 29).
- FUNG, R. et al. "Dynamics from noisy data with extreme timing uncertainty". In: *Nature* (2016), pp. 471–475. DOI: 10.1038/nature17627 (cit. on pp. 154–156).
- GABR, A. and T. KWASNIEWSKI. "Unifying Approach for Jitter Transfer Analysis of Bang-Bang CDR Circuits". In: *Electronics and Information Engineering (ICEIE), 2010 International Conference On*. Vol. 2. 2010, pp. V2–40–V2–44. DOI: 10.1109/ICEIE.2010.5559711 (cit. on p. 47).
- GANGASANI, G. R. et al. "A 16-Gb/s Backplane Transceiver With 12-Tap Current Integrating DFE and Dynamic Adaptation of Voltage Offset and Timing Drifts in 45-nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 47.8 (2012), pp. 1828–1841. DOI: 10.1109/JSSC.2012.2196313 (cit. on p. 38).
- GANGASANI, G. R. et al. "A 32 Gb/s Backplane Transceiver With On-Chip AC-Coupling and Low Latency CDR in 32 nm SOI CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 49.11 (2014), pp. 2474–2489. DOI: 10.1109/JSSC.2014.2340574 (cit. on p. 38).
- GARDNER, F. M. "Interpolation in digital modems. I. Fundamentals". In: *IEEE Transactions on Communications* 41.3 (1993), pp. 501–507. DOI: 10.1109/26.221081 (cit. on p. 24).
- GIANNAKIS, D. and A. J. MAJDA. "Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability". In: *Proceedings of the National Academy of Sciences of the United States of America* 109.7 (2011), pp. 2222–2227. DOI: 10.1073/pnas.1118984109 (cit. on p. 153).
- GOLYANDINA, N. and A. ZHIGLJAVSKY. *Singular Spectral Analysis for Time Series*. Springer, 2013 (cit. on p. 154).
- GOPALAKRISHNAN, K. et al. "3.4 A 40/50/100Gb/s PAM-4 Ethernet Transceiver in 28nm CMOS". In: *2016 IEEE International Solid-State Circuits Conference (ISSCC)*. 2016, pp. 62–63. DOI: 10.1109/ISSCC.2016.7417907 (cit. on p. 29).
- HANUMOLU, P. K., G. Y. WEI, and U. K. MOON. "A Wide-Tracking Range Clock and Data Recovery Circuit". In: *IEEE Journal of Solid-State Circuits* 43.2 (2008), pp. 425–439. DOI: 10.1109/JSSC.2007.914290 (cit. on p. 32).
- HANUMOLU, P. K. et al. "A 1.6Gbps Digital Clock and Data Recovery Circuit". In: *IEEE Custom Integrated Circuits Conference 2006*. 2006, pp. 603–606. DOI: 10.1109/CICC.2006.320829 (cit. on pp. 27, 28).
- "HDMI Specification Version 1.3a". In: *HDMI Licensing, LLC, Sunnyvale, CA, USA* (2006) (cit. on p. 23).
- HSIEH, M. and G. E. SOBELMAN. "Architectures for multi-gigabit wire-linked clock and data recovery". In: *IEEE Circuits and Systems Magazine* 8.4 (2008), pp. 45–57. DOI: 10.1109/MCAS.2008.930152 (cit. on p. 45).
- HSIEH, M. T. and G. E. SOBELMAN. "Architectures for Multi-Gigabit Wire-Linked Clock and Data Recovery". In: *IEEE Circuits and Systems Magazine* 8.4 (2008), pp. 45–57.
DOI: 10.1109/MCAS.2008.930152 (cit. on pp. 23, 25, 31).
- IERSSEL, M. VAN et al. "A 3.2 Gb/s CDR Using Semi-Blind Oversampling to Achieve High Jitter Tolerance". In: *IEEE Journal of Solid-State Circuits* 42.10 (2007), pp. 2224–2234. DOI: 10.1109/JSSC.2007.905233 (cit. on p. 32).
- INTI, R. et al. "A 0.5-to-2.5Gb/s Reference-Less Half-Rate Digital CDR with Unlimited Frequency Acquisition Range and Improved Input Duty-Cycle Error Tolerance". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 438–450. DOI: 10.1109/ISSCC.2011.5746387 (cit. on p. 33).
- JALALI, M. S. et al. "A Reference-Less Single-Loop Half-Rate Binary CDR". In: *IEEE Journal of Solid-State Circuits* 50.9 (2015), pp. 2037–2047. DOI: 10.1109/JSSC.2015.2429714 (cit. on p. 27).
- JANG, S. et al. "An Optimum Loop Gain Tracking All-Digital PLL Using Autocorrelation of Bang–Bang Phase-Frequency Detection". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 62.9 (2015), pp. 836–840. DOI: 10.1109/TCSII.2015.2435691 (cit. on pp. 95, 96).
- JRI LEE, K. S. KUNDERT, and B. RAZAVI. "Modeling of jitter in bang-bang clock and data recovery circuits". In: *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference, 2003*. 2003, pp. 711–714. DOI: 10.1109/CICC.2003.1249492 (cit. on pp. 47, 51).
- KIMURA, H. et al. "A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS". In: *IEEE Journal of Solid-State Circuits* 49.12 (2014), pp. 3091–3103. DOI: 10.1109/JSSC.2014.2349974 (cit. on p. 38).
- KREIENKAMP, R. et al. "A 10-Gb/s CMOS Clock and Data Recovery Circuit with an Analog Phase Interpolator". In: *IEEE Journal of Solid-State Circuits* 40.3 (2005), pp. 736–743. DOI: 10.1109/JSSC.2005.843624 (cit. on pp. 27, 28).
- KROMER, C. et al. "A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects". In: *IEEE Journal of Solid-State Circuits* 41.12 (2006), pp. 2921–2929. DOI: 10.1109/JSSC.2006.884389 (cit.
on p. 32).
- KUAN, T. and S. LIU. "A Bang Bang Phase-Locked Loop Using Automatic Loop Gain Control and Loop Latency Reduction Techniques". In: *IEEE Journal of Solid-State Circuits* 51.4 (2016), pp. 821–831. DOI: 10.1109/JSSC.2016.2519391 (cit. on p. 96).
- KUAN, T. and S. LIU. "A Loop Gain Optimization Technique for Integer-N TDC-Based Phase-Locked Loops". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 62.7 (2015), pp. 1873–1882. DOI: 10.1109/TCSI.2015.2423793 (cit. on p. 95).
- KWON, S. et al. "An Automatic Loop Gain Control Algorithm for Bang-Bang CDRs". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 62.12 (2015), pp. 2817–2828. DOI: 10.1109/TCSI.2015.2495725 (cit. on p. 95).
- LEE, Haechang et al. "Improving CDR Performance via Estimation". In: *2006 IEEE International Solid State Circuits Conference Digest of Technical Papers*. 2006, pp. 1296–1303. DOI: 10.1109/ISSCC.2006.1696177 (cit. on p. 35).
- LEE, J. and M. LIU. "A 20Gb/s Burst-Mode CDR Circuit Using Injection-Locking Technique". In: *2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*. 2007, pp. 46–586. DOI: 10.1109/ISSCC.2007.373580 (cit. on p. 28).
- LEE, J., J. YOON, and H. BAE. "A 10-Gb/s CDR With an Adaptive Optimum Loop-Bandwidth Calibrator for Serial Communication Links". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 61.8 (2014), pp. 2466–2472. DOI: 10.1109/TCSI.2014.2309861 (cit. on p. 69).
- LEE, Jri, K. S. KUNDERT, and B. RAZAVI. "Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits". In: *IEEE Journal of Solid-State Circuits* 39.9 (2004), pp. 1571–1580. DOI: 10.1109/JSSC.2004.831600 (cit. on pp. 47, 51).
- LEE, T., Y. H. KIM, and L. S. KIM. "A 5-Gb/s Digital Clock and Data Recovery Circuit With Reduced DCO Supply Noise Sensitivity Utilizing Coupling Network". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* PP.99 (2016), pp. 1–5. DOI: 10.1109/TVLSI.2016.2566927 (cit. on p. 28).
- LEE, T.
et al. "A 5-Gb/s 2.67-mW/Gb/s Digital Clock and Data Recovery With Hybrid Dithering Using a Time-Dithered Delta-Sigma Modulator". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 24.4 (2016), pp. 1450–1459. DOI: 10.1109/TVLSI.2015.2449866 (cit. on p. 27).
- LEIBOWITZ, B. S. et al. "A 7.5Gb/s 10-Tap DFE Receiver with First Tap Partial Response, Spectrally Gated Adaptation, and 2nd-Order Data-Filtered CDR". In: *2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*. 2007, pp. 228–599. DOI: 10.1109/ISSCC.2007.373377 (cit. on p. 35).
- LI, YD. et al. "A 562F<sup>2</sup> Physically Unclonable Function with a Zero-Overhead Stabilization Scheme". In: *ISSCC* (2019) (cit. on pp. 159, 169).
- LIANG, J. et al. "A 28Gb/s Digital CDR With Adaptive Loop Gain for Optimum Jitter Tolerance". In: *2017 IEEE International Solid-State Circuits Conference (ISSCC)*. 2017, pp. 122–123. DOI: 10.1109/ISSCC.2017.7870291 (cit. on pp. 34, 95, 97, 105, 115, 118).
- LIANG, J. et al. "Loop Gain Adaptation for Optimum Jitter Tolerance in Digital CDRs". In: *IEEE Journal of Solid-State Circuits* 53.9 (2018), pp. 2696–2708. DOI: 10.1109/JSSC.2018.2839038 (cit. on pp. 69, 95, 97, 104, 105, 110, 114, 115).
- LUCHINSKY, D. G. et al. "Stochastic Resonance in Electrical Circuits. I. Conventional Stochastic Resonance". In: *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing* 46.9 (1999), pp. 1205–1214. DOI: 10.1109/82.793710 (cit. on pp. 57, 58).
- MARUCCI, G. et al. "Exploiting Stochastic Resonance to Enhance the Performance of Digital Bang-Bang PLLs". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 60.10 (2013), pp. 632–636. DOI: 10.1109/TCSII.2013.2273732 (cit. on pp. 56–58).
- MATHEW, S. K. et al. "A 0.19pJ/b PVT-Variation-Tolerant Hybrid Physically Unclonable Function Circuit for 100% Stable Secure Key Generation in 22nm CMOS". In: *ISSCC* (2014) (cit. on pp. 159, 166, 169).
- MCDONNELL, M. D. "Is Electrical Noise Useful? [Point of View]". In: *Proceedings of the IEEE* 99.2 (2011), pp. 242–246. DOI: 10.1109/JPROC.2010.2090991 (cit. on p. 57).
- MUELLER, K. and M. MULLER. "Timing Recovery in Digital Synchronous Data Receivers". In: *IEEE Transactions on Communications* 24.5 (1976), pp. 516–531. DOI: 10.1109/TCOM.1976.1093326 (cit. on p. 24).
- NAVID, R. et al. "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology". In: *IEEE Journal of Solid-State Circuits* 50.4 (2015), pp. 814–827. DOI: 10.1109/JSSC.2014.2374176 (cit. on p. 38).
- NEDOVIC, N. et al. "A 40-44 Gb/s 3× Oversampling CMOS CDR/1:16 DEMUX". In: *IEEE Journal of Solid-State Circuits* 42.12 (2007), pp. 2726–2735. DOI: 10.1109/JSSC.2007.908714 (cit. on pp. 27, 28).
- OH, D. H. et al. "A 2.8Gb/s All-Digital CDR with a 10b Monotonic DCO". In: *2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*. 2007, pp. 222–598. DOI: 10.1109/ISSCC.2007.373374 (cit. on p. 36).
- PAN, H. et al. "A Digital Wideband CDR with +/-15.6kppm Frequency Tracking at 8Gb/s in 40nm CMOS". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 442–444. DOI: 10.1109/ISSCC.2011.5746389 (cit. on p. 36).
- PANG, Y. et al. "A Reconfigurable RRAM Physically Unclonable Function Utilizing Post-Process Randomness Source With <6×10<sup>-6</sup> Native Bit Error Rate". In: *ISSCC* (2019) (cit. on pp. 159, 168, 169).
- "PCI Express Base 2.1 Specification". In: *PCI-SIG, Beaverton, OR, USA* (2009) (cit. on p. 23).
- PERROTT, M. H. et al. "A 2.5Gb/s Multi-Rate 0.25µm CMOS CDR Utilizing a Hybrid Analog/Digital Loop Filter". In: *2006 IEEE International Solid State Circuits Conference Digest of Technical Papers*. 2006, pp. 1276–1285. DOI: 10.1109/ISSCC.2006.1696175 (cit. on p. 31).
- POZZONI, M. et al. "A Multi-Standard 1.5 to 10 Gb/s Latch-Based 3-Tap DFE Receiver With a SSC Tolerant CDR for Serial Backplane Communication".
In: *IEEE Journal of Solid-State Circuits* 44.4 (2009), pp. 1306–1315. DOI: 10.1109/JSSC.2009.2014203 (cit. on p. 38).
- RAHMAN, W. et al. "6.6 A 22.5-to-32Gb/s 3.2pJ/b Referenceless Baud-rate Digital CDR with DFE and CTLE in 28nm CMOS". In: *2017 IEEE International Solid-State Circuits Conference (ISSCC)*. 2017, pp. 120–121. DOI: 10.1109/ISSCC.2017.7870290 (cit. on p. 34).
- RASZKA, J. et al. "Embedded Flash Memory for Security Applications in a $0.13\,\mu\text{m}$ CMOS Logic Process". In: *ISSCC* (2004) (cit. on p. 160).
- RAVIKUMAR, S. "Circuit Architectures for High Speed CMOS Clock and Data Recovery Circuits". In: *Master Thesis, University of Illinois at Urbana-Champaign* (2015) (cit. on p. 22).
- RAZAVI, B. "Challenges in the Design of High-Speed Clock and Data Recovery Circuits". In: *IEEE Communications Magazine* 40.8 (2002), pp. 94–101. DOI: 10.1109/MCOM.2002.1024421 (cit. on p. 25).
- RAZAVI, B. *Design of Integrated Circuits for Optical Communications*. 2nd. Wiley, 2012 (cit. on pp. 23, 46).
- RENNIE, D. and M. SACHDEV. "A 5-Gb/s CDR Circuit With Automatically Calibrated Linear Phase Detector". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 55.3 (2008), pp. 796–803. DOI: 10.1109/TCSI.2008.916400 (cit. on p. 26).
- RODONI, L. et al. "A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS". In: *IEEE Journal of Solid-State Circuits* 44.7 (2009), pp. 1927–1941. DOI: 10.1109/JSSC.2009.2021913 (cit. on p. 33).
- SANTAMARIA, J. et al. "A Family of Compact Trim-Free CMOS Nano-Ampere Current References". In: *2019 IEEE International Symposium on Circuits and Systems (ISCAS)*. 2019, pp. 1–4. DOI: 10.1109/ISCAS.2019.8702294 (cit. on p. 128).
- SARMENTO, J. and J. T. STONICK. "A Minimal-Gate-Count Fully Digital Frequency-Tracking Oversampling CDR Circuit". In: *Proceedings of 2010 IEEE International Symposium on Circuits and Systems*. 2010, pp. 2099–2102. DOI: 10.1109/ISCAS.2010.5537061 (cit. on p. 27).
- SAXENA, S. et al. "A 2.8 mW/Gb/s, 14 Gb/s Serial Link Transceiver". In: *IEEE Journal of Solid-State Circuits* 52.5 (2017), pp. 1399–1411. DOI: 10.1109/JSSC.2016.2645738 (cit. on pp. 24, 39).
- "Serial ATA Revision 3.0 Specification". In: *SATA-IO Administration, Beaverton, OR, USA* (2009) (cit. on p. 23).
- SHU, G. et al. "8.7 A 4-to-10.5Gb/s 2.2mW/Gb/s continuous-rate digital CDR with automatic frequency acquisition in 65nm CMOS". In: *2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*. 2014, pp. 150–151. DOI: 10.1109/ISSCC.2014.6757377 (cit. on p. 33).
- SHU, G. et al. "A Reference-Less Clock and Data Recovery Circuit Using Phase-Rotating Phase-Locked Loop". In: *IEEE Journal of Solid-State Circuits* 49.4 (2014), pp. 1036–1047. DOI: 10.1109/JSSC.2013.2296152 (cit. on p. 33).
- SONNTAG, J. L. and J. STONICK. "A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links". In: *IEEE Journal of Solid-State Circuits* 41.8 (2006), pp. 1867–1875. DOI: 10.1109/JSSC.2006.875292 (cit. on pp. 27, 31, 35, 45, 52, 56, 67, 95).
- STRANG, G. *Linear Algebra and Its Applications*. 4th. Thomson, 2006 (cit. on p. 154).
- T. BARNETT J, et al. "Cisco Visual Networking Index (VNI): Complete Forecast Updated, 2017-2022". In: *APJC Cisco Knowledge Network (CKN)* (2018) (cit. on p. 20).
- TALEGAONKAR, M., R. INTI, and P. K. HANUMOLU. "Digital Clock and Data Recovery Circuit Design: Challenges and Tradeoffs". In: *2011 IEEE Custom Integrated Circuits Conference (CICC)*. 2011, pp. 1–8. DOI: 10.1109/CICC.2011.6055346 (cit. on pp. 31, 45, 56, 95).
- TING, C. et al. "A Blind Baud-Rate ADC-Based CDR". In: *2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers*. 2013, pp. 122–123. DOI: 10.1109/ISSCC.2013.6487664 (cit. on pp. 27, 34).
- TYSHCHENKO, O. "Clock and Data Recovery for High-Speed ADC-based Receivers". In: *PhD Thesis, University of Toronto* (2011) (cit. on pp. 22, 26).
- TYSHCHENKO, O. et al. "A Fractional-Sampling-Rate ADC-based CDR with Feedforward Architecture in 65nm CMOS". In: *2010 IEEE International Solid-State Circuits Conference (ISSCC)*. 2010, pp. 166–167. DOI: 10.1109/ISSCC.2010.5434004 (cit. on p. 27).
- "Universal Serial Bus 3.1 Specification". In: *Revision 1.0* (2013) (cit. on pp. 23, 42, 46, 49, 56, 63, 104, 117).
- VIJAYAKUMAR, Arunkumar, Vinay C. PATIL, and Sandip KUNDU. "On Improving Reliability of SRAM-Based Physically Unclonable Functions". In: *Journal of Low Power Electronics and Applications* 7.1 (2017). DOI: 10.3390/jlpea7010002 (cit. on pp. 164, 166).
- WU, G. et al. "A 1-16-Gb/s All-Digital Clock and Data Recovery With a Wideband, High-Linearity Phase Interpolator". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* PP.99 (2016), pp. 1–1. DOI: 10.1109/TVLSI.2015.2418277 (cit. on p. 28).
- WU, M. et al. "A PUF Scheme Using Competing Oxide Rupture with Bit Error Rate Approaching Zero". In: *ISSCC* (2018) (cit. on pp. 160, 167, 168).
- YIN, W. et al. "A TDC-Less 7mW 2.5Gb/s Digital CDR with Linear Loop Dynamics and Offset-Free Data Recovery". In: *2011 IEEE International Solid-State Circuits Conference*. 2011, pp. 440–442. DOI: 10.1109/ISSCC.2011.5746388 (cit. on p. 28).
- ZARGARAN-YAZD, A. and W. T. BEYENE. "Discrete-Time Modeling and Simulation Considerations for High-Speed Serial Links". In: *2014 IEEE 23rd Conference on Electrical Performance of Electronic Packaging and Systems*. 2014, pp. 165–168. DOI: 10.1109/EPEPS.2014.7103624 (cit. on pp. 31, 45, 46, 51, 59, 81, 82).

## **APPENDICES**

## APPENDIX A. ANALOG FRAMEWORK AND SATELLITE PROJECTS

## **HIGH-SPEED SERIAL INTERFACE**

# **General specifications**

- Data rate: 5 Gb/s, associated with the USB 3.1 Gen 1 standard.
- Channel loss: 26 dB, which defines the minimum input signal and the allowed offset.
- Quad-rate architecture, chosen because of the technology constraints and the desired speed.
- Configurable CDR, so that the loop parameters can be adapted.
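As a rough numerical illustration of these specifications, the sketch below derives the unit interval and the clock rates they imply. It assumes that "quad-rate" means four sampling phases, each running at a quarter of the data rate, and that the CDR logic works on a 16-bit parallel word, as the 4-to-16 deserialization described for the deserializer suggests. The function and constant names are hypothetical and serve only this example.

```python
# Timing figures implied by the general specifications (illustrative only).
DATA_RATE_BPS = 5e9   # 5 Gb/s, from the specification list
CLOCK_DIVISION = 4    # assumption: quad-rate = quarter-rate sampling clocks
PARALLEL_WIDTH = 16   # assumption: 16-bit word after 4-to-16 deserialization


def link_timing(data_rate_bps, clock_division, parallel_width):
    """Return the unit interval (ps) and the clock rates (Hz) of the link."""
    return {
        "ui_ps": 1e12 / data_rate_bps,
        "sampling_clock_hz": data_rate_bps / clock_division,
        "logic_clock_hz": data_rate_bps / parallel_width,
    }


timing = link_timing(DATA_RATE_BPS, CLOCK_DIVISION, PARALLEL_WIDTH)
print(timing)
```

Under these assumptions the unit interval is 200 ps, each sampling phase runs at 1.25 GHz, and the CDR logic runs at 312.5 MHz, which is why a 0.18 µm design would favor sub-rate clocking and extra deserialization.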
# **Constraints**

- Technology: 0.18 µm CMOS node.
- Temperature range: -40 to 125 °C.
- Several power domains, all with ±10% variation.

## **CLOCK AND DATA RECOVERY SYSTEM**

The clock and data recovery architecture is shown in Fig. 62. It is composed of a quad-rate CDR scheme with a phase interpolator for timing adjustment. The XCALG method is included as part of the digital logic of the configurable CDR.

Figure 61. High-speed interface system diagram and power domains. Block labels in the figure: HSS Interface, Front-End, CDR, ESD, CTLE, CML buffers, CMOS buffers, Analog, Digital logic, SPI Controller, BIAS, and XCALG. Power domains: VDDA, analog 1.8 V; VDD3A, analog 3.3 V; VDD, digital 1.8 V; VDDAIO, I/O 1.8 V; VDDHSCK, CML 1.8 V; VSS, global ground.

Figure 62. Simplified schematic of the clock and data recovery.

Figure 63. Sampler circuit schematic.

Figure 64. Amplifier and comparator schematic.

# SAMPLER CIRCUITS

The sampler circuit shown in Fig. 63 is composed of a pre-amplifier, an amplifier with a comparator, inverter buffers, and an SR latch at the output. Circuit details of the amplifier and comparator are shown in Figs. 64 and 65.

## **ALIGNERS**

After the samplers, aligner circuits are necessary to align the incoming sampled data properly. The schematic of the aligners for both data and edges is shown in Fig. 66.

# **DESERIALIZER**

To operate the CDR logic at a lower frequency, extra deserialization is performed by the deserializer circuit. It is composed of two cascaded layers of the unit cell shown in Fig. 67, which together perform an additional 4-to-16 deserialization (equivalent to 1-to-4 for each incoming data or edge line).

Figure 65. StrongArm cell circuit.

Figure 66. Aligners circuits.

Figure 67. Deserializer 1 to 2 unit cell.

Figure 68. D-Flipflop circuit schematic.

Figure 69. Phase interpolator circuit.

Figure 70. CML to CMOS cell.

# PHASE INTERPOLATOR

The phase interpolator (PI) architecture is shown in Fig.
69, which implements a current-mode logic (CML) scheme for phase mixing according to the digital control inputs (\*\_CTRL signals). The PI output is in CML format with a low voltage swing, so a CML-to-CMOS converter is necessary to drive all the clock signals in the system. Fig. 70 shows the CML-to-CMOS schematic.

## APPENDIX B. NONLINEAR LAPLACIAN SPECTRAL ANALYSIS - NLSA

Nonlinear Laplacian Spectral Analysis (NLSA) is a data-analysis technique that could become a useful tool in communication systems [3]. In particular, applying NLSA to clock and data recovery circuits could improve the performance of high-speed serial communication interfaces, provided that the main issues are identified and overcome. For this reason, it is necessary to understand the concept of high-speed interfaces and to review the state of the art of CDR applications; this makes up the first part of this essay. Then, the advantages and main challenges of an NLSA-based CDR implementation are discussed. This is a novel idea: there are no previous works using NLSA in communication systems, nor any prior applications in electronics. Finally, a conclusion section closes this essay.

In general, there are several approaches that improve one or more of the CDR specifications. Some architectures are very elaborate, while others are simple but efficient in terms of hardware, with a dominant trend towards digital implementations. In the following section, the NLSA method is introduced as a novel and completely different approach, with the purpose of extending the performance of communication systems even further.

# **NLSA TOWARDS A CDR IMPLEMENTATION**

NLSA is a technique that demands some background in several topics of linear algebra and functional analysis <sup>82</sup>. Understanding the big picture behind the NLSA technique requires knowledge of singular value decomposition (SVD) <sup>83, 84</sup>, singular spectrum analysis (SSA) <sup>85</sup>, manifolds, empirical orthogonal functions (EOFs) <sup>83</sup>, diffusion maps, and the Laplace-Beltrami operator.

Basically, NLSA is a generalization of SSA, an alternative time-series method that involves decomposition, reconstruction, and the use of SVD. The objective is to decompose the signal into spatial and temporal modes in order to observe the underlying system dynamics. SSA is a linear algebraic method for data analysis that is efficient when the data cloud defines a flat, low-dimensional hypersurface. It uses SVD and allows the principal components of linear systems to be extracted. However, many systems of interest, including the CDR, are intrinsically nonlinear; from a geometric point of view, data from such systems reflect an intrinsically nonlinear evolution.

NLSA is capable of extracting the dynamical evolution of both linear and nonlinear systems, and it is particularly appropriate where noise and timing jitter are relevant. Two examples, for a clean and a noisy signal, are shown in Figs. 71 and 72, respectively <sup>86</sup>. In the first case, a jitter-free signal is perfectly reconstructed using the NLSA technique. In the second scenario, jitter characterized by Gaussian noise with $\sigma = 50\,\text{fs}$ is added to the signal that feeds the NLSA algorithm. The results obtained with the dominant signal modes are a faithful reproduction of the input signal.

<sup>82</sup> D. GIANNAKIS and A. J. MAJDA. "Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability". In: *Proceedings of the National Academy of Sciences of the United States of America* 109.7 (2011), pp. 2222–2227. DOI: 10.1073/pnas.1118984109.

<sup>83</sup> G. ESHEL. *Spatiotemporal Data Analysis*. Princeton University Press, 2012.

<sup>84</sup> G. STRANG. *Linear Algebra and Its Applications*. 4th. Thomson, 2006.

<sup>85</sup> N. GOLYANDINA and A. ZHIGLJAVSKY. *Singular Spectrum Analysis for Time Series*. Springer, 2013.

<sup>86</sup> These figures were taken and adapted from the supplementary information of the study presented in R. FUNG et al. "Dynamics from noisy data with extreme timing uncertainty". In: *Nature* (2016), pp. 471–475. DOI: 10.1038/nature17627. This information was provided directly by the authors.

Figure 71. NLSA reconstruction of a jitter-free signal. Taken from the supplementary information presented in <sup>87</sup>.

These examples suggest that NLSA could be a suitable solution for signal recovery systems, because the clock and data signals suffer from jitter. The main advantage of implementing NLSA in communication circuits is that it reveals the timing evolution, which would improve the precision of the clock and data phase alignment. Nevertheless, implementing this kind of approach in an analog, digital, or even hybrid CDR architecture could be very challenging.

First of all, due to the nature of the analytical approach, NLSA cannot be applied to real-time recovery, because the technique requires a considerable amount of data to be available for processing. An immediate consequence is a power consumption issue in a hardware implementation. Power consumption would be related to the sampling process used to obtain the input data for NLSA. Regardless of whether the signal is captured in the analog or the digital domain, the amount of information necessary to feed the NLSA algorithm suggests that a practical hardware implementation could require a lot of circuitry: high-speed ADCs for analog sampling or several flip-flops for digital sampling. Thus, a thorough study and validation would be necessary to weigh the advantages against the penalties of a hardware implementation.

Figure 72. NLSA reconstruction of a signal corrupted by Gaussian jitter with $\sigma = 50\,\text{fs}$. Taken from the supplementary information presented in <sup>88</sup>.

Another issue is the computational cost.
The algorithm itself requires several intermediate calculations, which involve searching, projections, and calculations on high-dimensional data. These calculations can be done in the back end of the system (a DSP or microprocessor), because the CDR is not a stand-alone system. However, the number of steps needed to reconstruct the signal may introduce a long delay between the start and the end of the NLSA method, thus hindering real-time operation.

Despite the challenges, it is important to note that no application of this technique exists in electronic circuits, so it is not possible to quantify the real trade-off between power, precision, and functionality until the first hardware implementation is designed. In addition, the advantages of the NLSA method could be exploited in post-processing analysis. One plausible application is instrumentation, where high precision in the signal measurements is more relevant than the real-time availability of data. It is also necessary to move beyond the conventional CDR architecture and to think of completely new ideas that pave the way for NLSA-based CDRs without incurring significantly the drawbacks mentioned before. Alternatively, post-processing deployments seem to be the right way to exploit the nature of NLSA.

Figure 73. a) Proposed CDR scheme using NLSA processing; b) Phase signals (in UI) in the system: data jitter (blue), recovered clock phase (red), jitter error signal (green), and ideal recovery clock phase (black).

## CONCLUSION

NLSA is an outstanding technique that allows timing information about the underlying dynamics of complex nonlinear systems to be extracted. Predicting and understanding such systems makes it possible to extract the relevant modes that represent the temporal evolution of noisy signals, and this makes NLSA suitable, at least by its features, for the tracking process between the data and clock signals in CDR circuits.
The implementation of such an NLSA-based CDR could filter the jitter noise and could even improve the CDR response under unexpected perturbations in the system.

# APPENDIX C. NVRAM-BASED STABLE PHYSICALLY UNCLONABLE FUNCTION

Encryption key generation and authentication applications have spurred architecture research towards silicon physical unclonable functions (PUFs) <sup>89, 90, 91</sup>. New PUF designs have become a potential alternative to the traditional approach of storing keys in conventional embedded non-volatile random access memory (NVR). Conventional NVRs are expensive, since they require additional mask layers and fabrication processing steps, apart from licensing costs. Additionally, NVR processes lag behind leading-edge CMOS nodes, which forces the use of off-chip NVR dies and compromises security.

Several reported PUF design methods acquire key outputs by amplifying signals from random physical properties <sup>91</sup>. Among these properties, the most common are propagation delays, ring oscillator jitter, and latch metastability. Unfortunately, these random variations in physical devices are sensitive to deterministic environmental conditions, such as process, supply voltage, and temperature (PVT). Keys generated with these methods are unstable and easily biased by external variables, resulting in unreliable PUF keys.

In this work, we demonstrate a standard CMOS floating-gate PUF (FGPUF) design that provides a net random variation source by taking advantage of the charge storage of a floating-gate NVR (FGNVR) cell. The FGPUF is implemented in a standard CMOS logic process with no additional masks. By storing the randomly generated keys and electrically locking the FGNVR cells against writing, we achieved a stable PUF design reliable enough for encryption key generation and authentication. Measurement results from an array implemented in a $0.18\,\mu\text{m}$ standard CMOS technology suggest potential applications in low-power systems, given the low current consumption of the FGNVR during reading. Along with the work in <sup>92</sup>, which introduces a radically different scheme to overcome the stability issue, this work brings back the potential of FGNVR as a solution for PUF designs in standard CMOS technologies. It indicates a path towards employing FGNVR arrays not just for storing regular data, but also for generating stable encryption keys.

<sup>89</sup> S. K. MATHEW et al. "A 0.19pJ/b PVT-Variation-Tolerant Hybrid Physically Unclonable Function Circuit for 100% Stable Secure Key Generation in 22nm CMOS". In: *ISSCC* (2014).

<sup>90</sup> Y. PANG et al. "A Reconfigurable RRAM Physically Unclonable Function Utilizing Post-Process Randomness Source With <6×10<sup>-6</sup> Native Bit Error Rate". In: *ISSCC* (2019).

<sup>91</sup> YD. LI et al. "A 562F<sup>2</sup> Physically Unclonable Function with a Zero-Overhead Stabilization Scheme". In: *ISSCC* (2019).

## PROPOSED PUF CELL

The PUF key bits are obtained by converting the outcome of the competition between two floating gates charged through Fowler-Nordheim tunneling. Fig. 74 depicts a differential FGNVR cell composed of a duplicated branch of two transistors working as capacitors, $M_{1-4}$, and readout transistors $M_{5,6}$ <sup>93</sup>. The amount of charge residing on each FG of Fig. 74 depends on intrinsic random differences. For unprogrammed cells, the accumulated charge differences come from the equal exposure of the gates to ionization during the fabrication process and from the gate dielectric construction. The charges on $FG_1$ and $FG_2$ may also be modulated by exposing both floating nodes to a high voltage during the programming mode. In this work, we present measurements from raw (unprogrammed) cells after fabrication.

<sup>92</sup> M. WU et al. "A PUF Scheme Using Competing Oxide Rupture with Bit Error Rate Approaching Zero". In: *ISSCC* (2018).

<sup>93</sup> J. RASZKA et al. "Embedded Flash Memory for Security Applications in a $0.13\,\mu\text{m}$ CMOS Logic Process". In: *ISSCC* (2004).

Figure 74. Differential FGNVR cell concept.

Figure 75. Bitcell schematic of the FGNVR and operation during reading, programming and stand-by/locking process.

**PUF Bitcell** The competition between the charges accumulated in the floating gates during the fabrication process produces a differential current $I_{out2} - I_{out1}$ when the cell is read, as depicted in Fig. 74. This differential current is amplified and latched by a sense amplifier during the reading mode. The differential circuit resolves to one of two stable values, determined by the relative strengths of charge accumulation or reduction on both floating gates and by the mismatch of the latch transistors. Fig. 75 shows details of the bitcell architecture of the FGNVR during the reading process. The bitcell is selected for reading by setting EN to 0 V. For the sake of completeness, Fig. 75 also shows the programming and stand-by modes of the FGNVR cell. All devices are thick-oxide I/O devices, in order to withstand the larger operating voltages.

**Sense Amplifier** In reading mode, the currents through branches BL and BLB are converted into output voltages by the latch formed by transistors $M_{15}$-$M_{18}$, as Fig. 76 depicts. Since the accumulated charge differences in the FG nodes after fabrication may be small, the output nodes should exhibit a large resistance and a latching mechanism to capture these small differences. Before the reading mode, BL, BLB, and the output nodes are discharged to 0 V by asserting the SAEN node. Fig. 76 illustrates the operation of the sense amplifier for the case in which output node OUTB, driven by the larger branch current, charges up to the VDD level, while node OUT discharges to 0 V after the latch decision.

Figure 76. Detailed schematic and operation of the sense amplifier.

# **IMPLEMENTED PUF MACRO**

We built a 2×16 FGPUF macro with different cell sizes.
This macro includes one level shifter per row and one sense amplifier per column. The block diagram of the macro is shown in Fig. 77. The first row is selected for reading by setting WL0 to the VDD level and EN0 to 0 V. After asserting SAEN and discharging BL, BLB, OUT, and OUTB to 0 V, the output data is sampled and stored in internal chip registers as 16-bit words. The registers are read externally through the ports of a JTAG interface built into the test chip.

Figure 77. Block diagram of the FGPUF proposed macro.

Table 9. MOSFET size in each type of cell

| Devices\* | Cell size / transistors | W |
|---|---|---|
| FGNVR cells\*\* | 3X | $M_1, M_2 = 3.6\,\mu\text{m}$ |
| | 2.5X | $M_1, M_2 = 3\,\mu\text{m}$ |
| | 2X | $M_1, M_2 = 2.4\,\mu\text{m}$ |
| | 1.5X | $M_1, M_2 = 1.8\,\mu\text{m}$ |
| | 1X | $M_1, M_2 = 1.2\,\mu\text{m}$ |
| Sense amplifier | $M_{11}$-$M_{16}$ | $1\,\mu\text{m}$ |
| | $M_{17}, M_{18}$ | $2\,\mu\text{m}$ |

\* All devices have L = 300 nm.

\*\* $M_3$-$M_{10}$ in all cells have W = 300 nm.

**Transistor sizing** The FGPUF macro contains cells of five different sizes, defined by the sizing of transistors $M_1$ and $M_2$, which operate as coupling capacitors. By changing the size of the coupling capacitor, we modulated the electric field applied to the tunneling devices $M_3$-$M_6$ and, therefore, the capacity to tunnel charges across them. Table 9 lists the transistor sizes used in the five implemented FGNVR bitcells.

**Programming mode and locking/stand-by mode** Once the PUF keys are read for the first time, some of them may present insufficient net random variation to produce a stable PUF bit. Supply voltage, temperature, and thermal noise make unsteady bits resolve to either logic value 0 or 1 across readings. After reading, we apply temporal majority voting (TMV) and the UP/DOWN-counter-based method (UDC) <sup>94</sup> separately to repeated readings of the raw data, in order to stabilize the generated bits and obtain reliable key values. The stable results are programmed back into the FGNVR cells to enhance the net random variations and produce a stable PUF key. To perform the programming operation, the WL nodes are set to ground, the enable node EN is asserted to the VDD level, the TG node is set to a high programming voltage (e.g., 10 V), and the CG and CGB nodes are set to a high voltage according to the value to be written.

In regular stand-by operation, a mechanism that prevents the key values from being rewritten is required to protect against possible writing attacks. To avoid changing the cell values, $M_7$ and $M_8$ are biased in accumulation to absorb possible tunneling charges in case an attacker gains write access. Since $M_7$ and $M_8$ are biased in accumulation, tunneling would appear first across $M_7$ and $M_8$ rather than across the $M_5$ and $M_6$ devices.

Figure 78. FGPUF testing: a) Micrograph of the FGPUF on the test chip and detailed layout; b) Testboard for the FGPUF macro.

<sup>94</sup> Arunkumar VIJAYAKUMAR, Vinay C. PATIL, and Sandip KUNDU. "On Improving Reliability of SRAM-Based Physically Unclonable Functions". In: *Journal of Low Power Electronics and Applications* 7.1 (2017). DOI: 10.3390/jlpea7010002.

# **MEASUREMENT RESULTS**

The FGPUF macro is fabricated in a $0.18\,\mu\text{m}$ standard CMOS technology and occupies an area of $100\,\mu\text{m} \times 188\,\mu\text{m}$, including the accessory circuitry. Along with the FGPUF, a microprocessor was implemented and employed to register the generated PUF key values. Fig. 78(a) shows a micrograph with the location of the FGPUF within the test chip, alongside layout details of the macro. The test board is shown in Fig. 78(b), highlighting the external connections associated with the macro test.

Figure 79. Raw unstable bits: a) Measured raw unstable bit percentage across 8 chips at nominal conditions; b) Measured raw unstable bit percentage versus $V_{DD}$ variations.
**Measured Raw Keys** We report chip measurements after fabrication. The measured data in this subsection represent key outputs before additional post-processing. We examined the impact of noise and insufficient random variations on the differential FGNVRs by repeatedly reading the PUF key outputs. To quantify stability, we counted the unstable bits, i.e., the bits that occasionally flip across the repeated readings of the PUF bitcells. Fig. 79(a) shows the percentage of unstable bits for 1000 consecutive readings. We read the key output of eight FGPUF chips 1000 times and then averaged the readings for each chip, under a 3.3V nominal supply voltage and 27°C. To further estimate how noise affects stability, we plotted the number of unstable bits against $V_{DD}$ variations for one of the chips, as shown in Fig. 79(b). As expected, the impact of noise on stability decreases at larger supply voltages. For 1000 consecutive readings of a PUF key, we found that the average percentage of flipping bits is under 4% in a raw chip after fabrication.

Figure 80. Normalized Hamming Weight of PUF keys across 8 different chips.

We estimated the randomness of the output PUF keys by quantifying the Hamming weight (HW). For an ideal HW, the number of '1's equals the number of '0's. We counted the '1's in each generated bitstream for the eight measured chips and found that the average normalized HW is 0.52 with $\sigma$ = 0.094. Fig. 80 shows the measured HW across the eight available chips, each with 32 bits. Average measurement current of the bitcell during readings is under the nA range.

**Post-processed PUF Keys** We adopted temporal majority voting (TMV) <sup>89</sup> and UP/DOWN-counter (UDC) based <sup>94</sup> schemes to stabilize the raw PUF output key readings. TMV stabilizes noisy bits by computing the mode of each key bit's responses over an odd-sized voting window. Fig. 81 shows the TMV evaluations with 5, 7, and 11 voting samples for 1000 PUF key readings.
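
As an illustration of the two stabilization schemes named above, the following Python sketch models TMV over an odd voting window and a counter-based UDC decision built on top of it. It is a minimal behavioral sketch, not the measured implementation: the `read_bit` callable, the function names, the saturation limits used as the stopping rule, and the `max_rounds` cap are all assumptions introduced here for illustration. The counter update rule (start at $2^{m-1}$, add one on a TMV result of 1, subtract one otherwise) follows the UDC description given in the text that follows.

```python
def tmv(samples):
    """Temporal majority vote over an odd number of 0/1 readings."""
    if len(samples) % 2 == 0:
        raise ValueError("TMV needs an odd voting window")
    # More than half of the readings must be 1 for the vote to be 1.
    return 1 if 2 * sum(samples) > len(samples) else 0


def udc(read_bit, m=9, window=5, max_rounds=10_000):
    """UP/DOWN-counter decision for one PUF bit.

    read_bit   : hypothetical callable returning one raw 0/1 reading
    m          : counter width in bits (counter starts at 2**(m-1))
    window     : odd TMV window used for each counter update
    max_rounds : assumed safety cap; returns None if the bit never resolves
    """
    counter = 1 << (m - 1)       # initial value 2^(m-1)
    top = (1 << m) - 1           # assumed stop: counter saturates high -> 1
    for _ in range(max_rounds):
        vote = tmv([read_bit() for _ in range(window)])
        counter += 1 if vote == 1 else -1
        if counter >= top:
            return 1
        if counter <= 0:         # assumed stop: counter saturates low -> 0
            return 0
    return None                  # unresolved within max_rounds
```

A bit that leans consistently toward one value drives the counter monotonically to one limit, whereas a weakly biased bit wanders before settling. Under the assumptions above, that behavior is what makes the counter trajectory usable as a stability indicator.
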
Unlike the TMV approach, where a fixed number of trials is performed, the UDC method can run indefinitely until the PUF value is resolved. The UDC method uses an m-bit counter with an initial value of $2^{m-1}$ and performs successive TMV evaluations. If the TMV result is a logic 1, the counter is increased by one; otherwise, it is decreased by one. Fig. 82 shows the evolution of the UDC applied to all bits when resolving the PUF key for one of the chips. A 9-bit counter is used, and each bit reaches one final value. In addition, monotonic lines identify the bits with better stability, which lets us revise the design for the non-monotonic bits in future FGPUF implementations.

Figure 81. Measured raw unstable bit percentage after the application of TMV in a single chip.

Figure 82. Bit count decision for every bit in 1000 readings of a 32-bit cell.

Note that UDC allows us to resolve all bits completely, and for this reason it is the selected method for post-processing the PUF keys. Applying UDC to all sample chips, we resolved the golden PUF key outputs. Fig. 83 shows the bitmap of the final PUF keys obtained for our eight chips.

Figure 83. Final PUF keys bitmap.

**Alternative Stabilization Methods and Discussion** A different approach has been reported recently, which employs an extreme alternative to solve the stability issue completely. In <sup>92</sup>, the authors harden the key bits by using the oxide breakdown mechanism in a transistor-pair NVR cell. However, the data may be recovered with imaging techniques that reveal the physical changes created by the stress applied to break the oxide. A different and common approach to further stabilize noisy bits is burn-in hardening. Burn-in is a conventional test procedure that subjects chips to high temperatures in order to screen out chips predisposed to fail in the field. Burn-in hardening may further reduce bit flipping by accelerating aging, which enhances the accumulated differences on the floating gates.
However, burn-in may not be enough to stabilize all cells with insufficient random differences on the floating gates.

Here we report 100% stable and reliable PUF key bits by taking advantage of the inherent memory of the PUF bitcell. After post-processing the PUF keys using the UDC method, we write the obtained keys back to the memory cells by using the programming scheme. Keys hardened by programming them into the memory are protected electrically against writing, as discussed in section 6.4. Similar recent works obtain reliable PUFs by using resistive random access memory technology <sup>90</sup> and one-time programmable cells that employ the oxide rupture mechanism <sup>92</sup>. In contrast to oxide breakdown <sup>92</sup>, hardened PUF bits do not show any physical changes that may be exploited by imaging techniques. Keys of programmed PUF bitcells are 100% stable and reliable with respect to environmental factors. Table 10 summarizes the measurement results and compares them with prior art.

Table 10. Measured Performance Comparison.

| | This Work | 95 | 96 | 97 |
|---|---|---|---|---|
| Technology | $0.18\mu m$ | $0.13\mu m$ | 22nm | 65nm |
| Type | NVR | RRAM | Hybrid | Static |
| Raw Unst. Bits | 4% | NA | 25% | 2.95% |
| (Readings) | (1000) | | (1000) | (2000) |
| Stabilization<br>Method | UDC<br>NVR | Burn-in | TMV<br>+Burn-in<br>+Dark Bits | TMV<br>EVB |
| Unst. Bits<br>After Stab. | 0% | 0% | 0% | 0.024% |
| Hamm. Weight | 52% | 50% | 51% | 50% |
| Voltage Typ. | 3.3V | 1.8V | 0.8V | 1.2V |
| Bit Evaluation<br>Current-Range | 100nA | 500nA | 500nA | NA |
| Reconfigurable | Yes | Yes | No | No |
| Write-Protection | Yes | No | - | - |
| Bit Cell Area | 20μm² | 2.86μm² | 4.66μm² | 562F<sup>2</sup> |

## **CONCLUDING REMARKS**

PUFs are a popular and low-cost alternative for generating secure keys and chip IDs. In this work, a stable PUF bitcell based on CMOS non-volatile random access memory is demonstrated. Using a function as simple as counting to post-process the raw data, 100% stable key outputs are obtained. The obtained keys are hardened into the memory cells by using the programming scheme, and the macro memory is protected electrically against writing. In contrast to recent stable memory-based PUFs, the proposed scheme is resistant to imaging techniques and does not require additional mask layers.