# A 28-Gb/s Transmitter with 3-tap FFE and T-coil Enhanced Terminal in 65-nm CMOS Technology

Naiwen Zhou, Linghan Wu, Ziqiang Wang, Xuqiang Zheng, Weidong Cao, Chun Zhang, Fule Li, Zhihua Wang

Institute of Microelectronics, Tsinghua University, Beijing, China. Email: wangziq@tsinghua.edu.cn

*Abstract*—This paper introduces a fully integrated wireline transmitter operating at 28 Gb/s. The transmitter incorporates a 3-tap feed-forward equalizer (FFE) with flip-flop-based delay to equalize the channel. T-coil networks are used with the ESD protection circuits at the transmitter's output to realize impedance matching and bandwidth enhancement. The transmitter is fabricated in 65-nm CMOS technology. Measurement results show that, when the chip delivers 28 Gb/s PRBS7 data over a 5-cm PCB trace with 14 dB of channel loss at 14 GHz, the output peak-to-peak jitter is 19.34 ps. The power consumption is 124 mW under a 1.0 V supply, and the core area is 0.18 mm<sup>2</sup>.

### Keywords—SerDes; Transmitter; FFE; T-coil; High-Speed

# I. INTRODUCTION

High-speed SerDes circuits continue to play an important role in today's wireline communications. Over the past decades, the data rate of optical and electrical links has been pushed from kb/s toward tens of Gb/s. Such high bandwidth enables many applications, including network storage, multi-core processors with high aggregate I/O bandwidth, and big-data applications. For example, 100 Gb/s Ethernet has been investigated extensively [1]. These applications drive research on high-speed transceivers.



Fig. 1. (a) Measured insertion loss of a 20-cm Rogers channel. (b) Eye diagram when 30-Gb/s data is applied to it (horizontal scale: 5 ps/div, vertical scale: 100 mV/div). (c) Single-bit response of 40-Gb/s data passing through a 20-cm channel (horizontal scale: 20 ps/div, vertical scale: 100 mV/div).

However, realizing an ultra-high-speed transmitter poses many difficulties. First, the channel loss at high frequency is significant, which causes inter-symbol interference (ISI) and degrades the bit error rate (BER). Fig. 1(a) shows the insertion loss of a 20-cm Rogers channel (designed as a 50 $\Omega$ transmission line). Even for such a low-loss material, a 20-cm channel still presents 10 dB of loss at 15 GHz. Applying 30 Gb/s random data to the channel produces the far-end time-domain waveform shown in Fig. 1(b). As expected, the eye diagram is fully closed. In fact, the 30 Gb/s data eye would vanish if the channel were longer than 5 cm (12 dB of loss at the Nyquist frequency). The single-bit response of a 20-cm Rogers channel is shown in Fig. 1(c). The far-end pulse is heavily attenuated and distorted [2]. Second, bandwidth and impedance matching at the transmitter's output become a major issue once the data rate reaches 20 Gb/s. For example, reflection and crosstalk caused by impedance mismatch further degrade the signal.

In [2], a 5-tap FFE with LC-based delay lines improves the bandwidth at the expense of a large area. In [3], a 1-UI delay cell is built from a five-stage buffer chain at the expense of high power consumption.

In the proposed transmitter, a 3-tap FFE is designed to compensate for the channel loss, T-coil networks are employed for impedance matching and bandwidth enhancement [4], and a high-speed clock distribution network is used to meet the timing requirements.

This paper is organized as follows. Section II presents the architecture of the transmitter, followed by a description of the building blocks, including the FFE with 2:1 MUX, the T-coil termination, the clock distribution network, and the PRBS generator. Section III presents the experimental results, and Section IV concludes the paper.

# II. ARCHITECTURE AND CIRCUIT DESIGN

The block diagram of the transmitter with package and channel model is shown in Fig. 2.

Two 14 Gb/s differential data streams generated by the PRBS block are sent into the FFE. The FFE has three taps and uses half-rate 2:1 MUXes as the delay cells. The 28 Gb/s outputs of the FFE are connected to the ESD protection, which adds considerable capacitance. Therefore, a T-coil network is employed to enhance the bandwidth and provide good impedance matching. In the clock distribution network, the single-ended clock is converted into a differential clock pair by a balun.



## A. FFE with 2:1 MUX

An FFE is employed as the transmitter's equalizer. There are three main design challenges: the number of FFE taps, the 1-UI delay cell, and the high bandwidth. The high bandwidth is achieved with CML circuits and the T-coil, as described in Section II-B.

Choice of the number of FFE taps: in theory, the more taps the FFE has, the stronger its ability to compensate for channel loss. However, more taps mean higher power consumption and a larger bandwidth penalty. In general, three to four taps is a good compromise [5].

The structure of the FFE combiner is shown in Fig. 3. The FFE has three taps, named pre, main, and post, and their weights are adjusted by tuning the tail currents through DACs. For example, increasing the tail current of the post tap increases the post tap's share of the summed output and thus strengthens the FFE's compensation of post-cursor ISI.



Fig. 3. The structure of FFE combiner
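The weighted summation performed by the combiner can be illustrated with a short behavioral model. The sketch below is not the circuit itself: it assumes one sample per UI, ideal NRZ levels of ±1, and illustrative tap weights (`c_pre`, `c_main`, `c_post`) that are not taken from the measured design.

```python
def ffe_3tap(bits, c_pre, c_main, c_post):
    """Behavioral 3-tap FFE at one sample per UI.

    y[n] = c_pre * x[n+1] + c_main * x[n] + c_post * x[n-1]
    The weights stand in for the DAC-controlled tail currents and are
    hypothetical values chosen only for illustration.
    """
    x = [2 * b - 1 for b in bits]            # map 0/1 to -1/+1 NRZ levels
    padded = [x[0]] + x + [x[-1]]            # hold the edge symbols
    out = []
    for n in range(1, len(padded) - 1):
        out.append(c_pre * padded[n + 1]     # pre-cursor tap (next symbol)
                   + c_main * padded[n]      # main tap (current symbol)
                   + c_post * padded[n - 1]) # post-cursor tap (previous symbol)
    return out


# Example with assumed weights whose magnitudes sum to 1.
levels = ffe_3tap([0, 0, 1, 1, 1, 0], c_pre=-0.1, c_main=0.7, c_post=-0.2)
print(levels)
```

With these assumed weights, the first bit after a transition is driven to a magnitude of about 0.8, while a bit in the middle of a run settles to about 0.4, which is the high-frequency emphasis the equalizer relies on.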

Realization of the 1-UI delay unit: the inputs of the three FFE taps are produced by delay cells, as shown in Fig. 4. The delay cell uses the proposed half-rate MUX structure, which performs both the final 2:1 multiplexing and the 1-UI delay.

The conventional structure first multiplexes the two 14 Gb/s data lanes into 28 Gb/s data and then delays the 28 Gb/s data to produce three serial data paths spaced 1 UI apart. More delay cells therefore work at the highest speed, which aggravates the bandwidth and power-consumption problems. Implementing the delay cell with a 2:1 MUX instead guarantees the 1-UI delay and eliminates circuits running at the highest speed, such as retiming latches and delay-matching clock-tree buffers. The proposed 1-UI delay unit is therefore conducive to higher bandwidth and lower power consumption.

Data DN0 and DN1 are delayed by latches and then serialized by selectors to produce three serial data paths spaced 1 UI apart, namely pre\_n, main\_n, and post\_n. These 1-UI-spaced data are then supplied to the FFE combiner for equalization. The structures of the latch and the selector used in this design are shown in Fig. 5 and Fig. 6, respectively.
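The idea of obtaining 1-UI-spaced serial streams from half-rate data can be sketched behaviorally as follows. This is a bit-level model under stated assumptions, not the latch-level circuit: `d0` and `d1` stand for the even and odd half-rate lanes, each half-rate delay is modeled as a one-sample shift, and the exact latch arrangement of the actual design is not reproduced.

```python
def mux2(first, second):
    """2:1 MUX: interleave two half-rate streams into one full-rate stream."""
    out = []
    for a, b in zip(first, second):
        out.extend([a, b])
    return out


def delay_half_rate(stream, fill=0):
    """Delay a half-rate stream by one half-rate period (2 UI)."""
    return [fill] + list(stream[:-1])


def three_tap_streams(d0, d1, fill=0):
    """Build pre/main/post full-rate streams using only half-rate delays.

    pre  = s[n], main = s[n-1], post = s[n-2], where s is the serialized data.
    Each tap gets its own 2:1 MUX, so no full-rate delay element is needed.
    """
    d0_z = delay_half_rate(d0, fill)
    d1_z = delay_half_rate(d1, fill)
    pre = mux2(d0, d1)        # d0[k], d1[k]     -> s[n]
    main = mux2(d1_z, d0)     # d1[k-1], d0[k]   -> s[n-1]
    post = mux2(d0_z, d1_z)   # d0[k-1], d1[k-1] -> s[n-2]
    return pre, main, post


pre, main, post = three_tap_streams([1, 0, 1, 1], [0, 0, 1, 0])
print(pre, main, post)
```

In this model, swapping which lane is selected first and delaying only one lane yields the 1-UI shift, which is why every delay element can run at the half rate.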







## B. T-coil Network

The FFE's output termination network is shown in Fig. 7. The core of the network is a T-coil circuit, which realizes impedance matching and bandwidth enhancement by compensating for the output capacitance and the parasitic capacitance of the ESD device. $C_L$ represents the total capacitance at the FFE output termination. The ESD protection is realized with a PN junction, which has lower capacitance and thus results in higher bandwidth than a structure based on MOS transistors [6].



Fig. 7. The output termination network (T-coil)

**Impedance matching:** as shown in Fig. 7, the T-coil is composed of a pair of mutually coupled inductors $L_1$ and $L_2$ with coupling coefficient $k$, and a bridging capacitor $C_B$ across them. In the proposed design, the circuit presents a purely resistive output impedance, independent of frequency and of the value of $C_L$. At low frequency, the capacitor $C_B$ is effectively open and the coupled inductors $L_1$ and $L_2$ are effectively shorts, so the output impedance is $R_D$. At high frequency, $C_B$ is effectively a short and the inductors are effectively open, so the output impedance is again $R_D$. It can be shown that the output impedance remains $R_D$ at all frequencies if the following conditions hold [7]-[10]:

$$L_{1} = L_{2} = \frac{R_{D}^{2}C_{T}}{4} \left(1 + \frac{1}{4\varepsilon^{2}}\right) \tag{1}$$

$$C_B = \frac{C_T}{16\varepsilon^2} \tag{2}$$

$$k = \frac{4\varepsilon^2 - 1}{4\varepsilon^2 + 1} \tag{3}$$

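In the formulas above, $\varepsilon$ is the damping factor of the network's transfer function and $C_T$ is the total capacitance defined in (5). To keep the output resistance constant and the group delay uniform, we let $k = 0.5$, which by (3) corresponds to $4\varepsilon^2 = 3$, so that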

$$C_B = \frac{C_T}{12} \tag{4}$$

$$C_T = C_L + C_{ESD} \tag{5}$$

Therefore, in the symmetrical case, (1) becomes:

$$L_1 = L_2 = \frac{R_D^2 C_T}{3} \tag{6}$$

With the parameters chosen according to the calculation above, good impedance matching is obtained and the bandwidth of the circuit is broadened by a factor of 2.5. The package model is also considered in this design and is simplified to a $\pi$ resonant network. The value of Ln should be designed carefully to obtain a higher resonant frequency.
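As a numerical check of (1)-(6), the design equations can be evaluated directly. The sketch below is only an illustration: the function name `tcoil_values` is hypothetical, and the example values $R_D = 50\ \Omega$ and $C_T = 300$ fF are assumptions, not values reported for this chip.

```python
import math


def tcoil_values(r_d, c_t, eps):
    """Evaluate equations (1)-(3) for a symmetric bridged T-coil.

    r_d : termination resistance in ohms
    c_t : total capacitance C_T = C_L + C_ESD in farads, as in (5)
    eps : damping factor of the network's transfer function
    Returns (L1 = L2 in henries, C_B in farads, coupling coefficient k).
    """
    four_eps2 = 4.0 * eps ** 2
    inductance = (r_d ** 2) * c_t / 4.0 * (1.0 + 1.0 / four_eps2)  # eq. (1)
    c_b = c_t / (4.0 * four_eps2)                                  # eq. (2)
    k = (four_eps2 - 1.0) / (four_eps2 + 1.0)                      # eq. (3)
    return inductance, c_b, k


# k = 0.5 corresponds to 4*eps^2 = 3, i.e. eps = sqrt(3)/2.
# R_D and C_T below are assumed example values only.
L, C_B, k = tcoil_values(r_d=50.0, c_t=300e-15, eps=math.sqrt(3) / 2)
print(f"L1 = L2 = {L * 1e12:.1f} pH, C_B = {C_B * 1e15:.1f} fF, k = {k:.2f}")
```

For the assumed values, this reproduces the special cases (4) and (6): $C_B = C_T/12 = 25$ fF and $L_1 = L_2 = R_D^2 C_T / 3 = 250$ pH.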

## C. Clock Tree Distribution Network

The clock tree block includes two parts, the clock drive block and the clock distribution block, as shown in Fig. 2.



The external input clock, CLK\_IP, is a 14 GHz single-ended clock. CLK\_IP is converted into a differential clock pair, which is amplified to drive the heavy load of the following blocks, as shown in Fig. 8. In addition, the 14 GHz differential clock signals, CLK\_P and CLK\_N, are sent to the clock distribution block, which provides a 7 GHz differential clock for the PRBS generation module.

## D. PRBS7 Generator

As shown in Fig. 9, the PRBS block generates two 14 Gb/s PRBS7 differential data streams as the inputs of the FFE. The block is implemented with seven cascaded TSPC (true single-phase clock) flip-flops. Compared with conventional CMOS flip-flops, TSPC flip-flops are faster and smaller. Two differential signals are XORed to generate signal D, which then serves as the input of the next flip-flop, forming the feedback loop that generates the PRBS7 sequence. The differential signals IN1 (IP1) and IN2 (IP2) carry the 14 Gb/s PRBS7 data.



Fig. 9. PRBS7 data Generating Block
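A PRBS7 sequence of this kind can be modeled in software with a seven-stage linear-feedback shift register. The sketch below is a behavioral model only. The feedback polynomial $x^7 + x^6 + 1$ is an assumption (the paper does not state which taps are XORed), and the helper `split_half_rate` is a hypothetical illustration of how a serial pattern maps onto two half-rate lanes.

```python
def prbs7(seed=0x7F, nbits=127):
    """Generate PRBS7 bits with a 7-stage Fibonacci LFSR.

    Assumed polynomial: x^7 + x^6 + 1 (XOR of stages 7 and 6).
    The sequence repeats every 2**7 - 1 = 127 bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero: the all-zero state locks up")
    out = []
    for _ in range(nbits):
        feedback = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        out.append(feedback)
        state = ((state << 1) | feedback) & 0x7F      # shift feedback bit in
    return out


def split_half_rate(bits):
    """Split a serial stream into even and odd half-rate lanes."""
    return bits[0::2], bits[1::2]


serial = prbs7(nbits=254)            # two full periods
lane0, lane1 = split_half_rate(serial)
print(len(lane0), len(lane1), sum(serial[:127]))
```

Re-interleaving `lane0` and `lane1` with a 2:1 MUX recovers the serial pattern, which mirrors how the two half-rate streams feed the FFE.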

# III. MEASUREMENT RESULT

The transmitter is fabricated in 65-nm CMOS technology. Fig. 10 shows the chip micrograph; the transmitter occupies an area of $0.18 \text{ mm}^2$.



Fig. 10. Chip micrograph

As shown in the test setup block diagram, an external 14 GHz clock is supplied to the transmitter, and the 28 Gb/s PRBS7 output data pass through a 5-cm PCB trace with 14 dB of loss at 14 GHz. The eye diagram of the output data is shown in Fig. 11.



Fig. 11. The testing setup block

Fig. 11 shows the eye diagram of the output data after FFE equalization for the 5-cm PCB trace. The 28 Gb/s PRBS7 output has 19.34 ps of peak-to-peak jitter and a 200 mV eye opening. This result suggests that the designed transmitter supports 28 Gb/s data transmission. The power consumption of the transmitter core is 124 mW under a 1.0 V supply. Fig. 12 shows the simulated power distribution of the transmitter.



Fig. 11. Measured eye diagram at 28Gb/s



Fig. 12. Simulation power distribution
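The measured figures can be put in perspective by relating them to the bit period. The sketch below uses only numbers reported in this paper (28 Gb/s, 19.34 ps, 14 dB); the helper names are hypothetical, and treating the channel loss as a plain voltage ratio at the Nyquist frequency is a simplifying assumption.

```python
def unit_interval_ps(rate_gbps):
    """Bit period (1 UI) in picoseconds for a data rate in Gb/s."""
    return 1e3 / rate_gbps


def nyquist_ghz(rate_gbps):
    """Nyquist frequency of NRZ data: half the bit rate."""
    return rate_gbps / 2.0


def loss_db_to_ratio(loss_db):
    """Voltage ratio remaining after a given loss in dB."""
    return 10.0 ** (-loss_db / 20.0)


ui = unit_interval_ps(28.0)        # about 35.7 ps
jitter_ui = 19.34 / ui             # about 0.54 UI peak-to-peak
f_nyq = nyquist_ghz(28.0)          # 14 GHz, where the 14 dB loss is quoted
ratio = loss_db_to_ratio(14.0)     # about 0.20 of the launched amplitude
print(f"UI = {ui:.2f} ps, jitter = {jitter_ui:.2f} UI, "
      f"Nyquist = {f_nyq:.0f} GHz, amplitude ratio = {ratio:.2f}")
```

In other words, the reported jitter is roughly half of the 35.7 ps bit period, and a 14 dB loss leaves about one fifth of the amplitude at 14 GHz before equalization.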

Table I summarizes the transmitter's performance and compares this design with prior works.

|                             | This work        | [11]    | [12]              | [13]              |
|-----------------------------|------------------|---------|-------------------|-------------------|
| Technology (nm)             | 65               | 65 bulk | 32 SOI (IBM)      | 28                |
| Supply (V)                  | 1.0              | 1.1     | 1.05              | 1.0               |
| FFE                         | 3-tap            | 4-tap   | 4-tap             | 5-tap             |
| Data rate (Gb/s)            | 28               | 20      | 14-28.05          | 28                |
| Chip area (mm<sup>2</sup>)  | 0.18             | 0.025   | 0.81 <sup>a</sup> | 0.62 <sup>a</sup> |
| Power (mW)                  | 124 <sup>b</sup> | 167     | 211 @ 28          | 295 <sup>c</sup>  |
| Energy efficiency (pJ/bit)  | 4.43             | 8.35    | 7.5               | 5.27              |

TABLE I PERFORMANCE SUMMARY

<sup>a</sup> Area of the entire transceiver. <sup>b</sup> Full-chip power. <sup>c</sup> TX+RX power.
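The energy-efficiency row of Table I follows from dividing power by data rate, since 1 mW per Gb/s equals 1 pJ/bit. The short check below uses only the table's own numbers; the function name is hypothetical.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy efficiency in pJ/bit: mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


this_work = energy_per_bit_pj(124.0, 28.0)   # about 4.43 pJ/bit
ref_11 = energy_per_bit_pj(167.0, 20.0)      # 8.35 pJ/bit
ref_12 = energy_per_bit_pj(211.0, 28.0)      # about 7.54 pJ/bit
# For [13], the tabulated 5.27 matches half of the TX+RX power per bit.
ref_13_half = energy_per_bit_pj(295.0 / 2.0, 28.0)
print(round(this_work, 2), round(ref_11, 2),
      round(ref_12, 2), round(ref_13_half, 2))
```

The entry for [13] appears consistent with attributing half of the 295 mW TX+RX power to the transmitter, although the table does not state this explicitly.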

# IV. CONCLUSION

This paper presents a wireline transmitter with a 3-tap FFE and a T-coil network in 65-nm CMOS technology. The transmitter operates properly at 28 Gb/s in measurement. In the design, T-coil networks are used with the ESD protection circuits at the transmitter's output to realize impedance matching and bandwidth enhancement.

#### ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (NSFC), No. 61371011. The authors thank Dr. Fangxun Lv, Shuai Yuan, and Tektronix for their help with testing.

#### REFERENCES

- [1] M. Nowell, "Overview of Requirements and Applications for 40 Gigabit and 100 Gigabit Ethernet," Ethernet Alliance, Beaverton, OR, 2007.
- [2] Ming-Shuan Chen and Yu-Nan Shih, A Fully-Integrated 40-Gb/s Transceiver in 65-nm CMOS Technology, IEEE Journal of solid-state circuits, vol. 47, no. 3, march 2012, pp. 627-639.
- [3] Raghavan B, Cui D, Singh U, et al. A sub-2W 39.8-to-44.6 Gb/s transmitter and receiver chipset with SFI-5.2 interface in 40nm CMOS[C]//Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013: 32-33.
- [4] Chen S, Yang L, Jing H, et al. A novel SST transmitter with mutually decoupled impedance self-calibration and equalization[C]//Circuits and Systems (ISCAS), 2011 IEEE International Symposium on. IEEE, 2011: 173-176.
- [5] Wang H, Lee J. A 21-Gb/s 87-mW transceiver with FFE/DFE/analog equalizer in 65-nm CMOS technology[J]. Solid-State Circuits, IEEE Journal of, 2010, 45(4): 909-920.
- [6] Momtaz A, Green M M. An 80 mW 40 Gb/s 7-tap T/2-spaced feedforward equalizer in 65 nm CMOS[J]. Solid-State Circuits, IEEE Journal of, 2010, 45(3): 629-639.
- [7] Sherif Galal and Behzad Razavi, "Broadband ESD Protection Circuits in CMOS Technology", JSSC, vol. 38, 2003, pp. 2334-2340.
- [8] P. Starič, E. Margan, Wideband Amplifiers-Inductive Peaking Circuits, Springer US, 2006, pp. 2.35-2.48.
- [9] Weidong Cao, A 15Gb/s Wireline Repeater in 65nm CMOS Technology, Electron Devices and Solid-State Circuits (EDSSC), 2015 IEEE International Conference, 2015, pp. 590-593.
- [10] Linghan Wu, Co-design of 40Gb/s Equalizers for Wireline Transceiver in 65nm CMOS Technology, Electron Devices and Solid-State Circuits (EDSSC), 2014 IEEE International Conference, 2014, pp. 1-2
- [11] R. A. Philpott; J. S. Humble, A 20Gb/s SerDes transmitter with adjustable source impedance and 4-tap feed-forward equalization in 65nm bulk CMOS, Custom Integrated Circuits Conference (CICC), 2008. IEEE,2008, pp. 623 – 626.
- [12] John Bulzacchelli, A 28Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32nm SOI CMOS technology, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International,2012, pp. 324 – 326.
- [13] Bo Zhang, Karapet Khanoyan, A 28Gb/s multi-standard serial-link transceiver for backplane applications in 28nm CMOS, Solid- State Circuits Conference - (ISSCC), 2015 IEEE International, 2015, pp. 1 – 3.