# POLITECNICO DI TORINO

#### SCUOLA DI DOTTORATO

Doctoral Program in Electronics and Communications Engineering, XXIV cycle

### Doctoral Thesis

# Digital Signal Processing on FPGA for Short-Range Optical Communications Systems over Plastic Optical Fiber

Experimental investigations to increase the POF channel capacity

Julio César Ramírez Molina

**Supervisor** Prof. Roberto Gaudino

**Doctoral program coordinator** Prof. Ivo Montrosset

March 2012

*To Malui, to Héctor, to Mami and to el Viejo*

# Summary

Bandwidth requirements are growing rapidly. As new ways of sharing information emerge, new ways of accessing the web enter the market. Computers and mobile devices are only the beginning: the spectrum of web products and services such as IPTV, VoIP and on-line gaming has been extended by the possibility to share and store data, and to interact and work, on the Cloud. The rush for bandwidth has led researchers all over the world to investigate how to achieve higher data rates, and thanks to their efforts both long-haul and short-range communication systems have developed enormously during the last few years. However, as the demand for higher information throughput increases, traditional short-range solutions reach their limits. As a result, optical solutions are now migrating from long-haul to short-range communication systems. As part of this trend, plastic optical fiber (POF) systems have arisen as promising candidates for applications where traditional glass optical fibers (GOF) are unsuitable. POF systems feature a series of characteristics that make them well suited to market requirements: they are low cost, robust, easy to handle and install, flexible, and tolerant to bending. Nonetheless, these features come at the expense of a considerably tighter bandwidth limitation than in GOF systems.
This thesis investigates the use of digital signal processing (DSP) algorithms to overcome the bandwidth limitation in short-range optical communication systems based on POF. In particular, this dissertation presents the design and development of DSP algorithms on field programmable gate arrays (FPGAs) with the ultimate purpose of implementing a fully engineered 1Gbit/s Ethernet Media Converter capable of establishing data links over 50+ meters of PMMA-SI POF using an RC-LED as transmitter.

# Contents

- Summary
- 1 Introduction
  - 1.1 Fiber-to-the-Home
  - 1.2 Fiber-in-the-Home
  - 1.3 Intensive Broadband Networking for Optical Interconnect Applications
  - 1.4 The EU POF-PLUS Project
  - 1.5 The scope of this Thesis
- 2 POF Short-Range Communication System
  - 2.1 PMMA-SI POF
  - 2.2 Optoelectronic Transmitter
  - 2.3 Optoelectronic Receiver
  - 2.4 Investigation of Modulation Formats and Equalizing Algorithms
  - 2.5 Summary
- 3 Media Converter Prototype: 1Gb/s over 50+ m
  - 3.1 Design Specifications
  - 3.2 Implementation Methodology
  - 3.3 Physical Coding Sublayer PCS
  - 3.4 Physical Medium Attachment PMA
    - 3.4.1 Decision Feedback Equalization
    - 3.4.2 DSP Implementation of an Adaptive and Blind Decision Feedback Equalizer
  - 3.5 Summary
- 4 Media …
  - 4.1 Experimental Set Up
  - 4.2 Optical Power Margin, BER and PER measurements
  - 4.3 PCS Validation
  - 4.4 Convergence time of the DLMS Algorithm
  - 4.5 Media Converter Tolerance to Fiber Bendings
  - 4.6 FPGA Logic and Area Utilization
  - 4.7 Summary
- 5 Clock Recovery System
  - 5.1 Clock Synchronizers
    - 5.1.1 Feedforward Synchronizer
  - 5.2 Feedback Synchronizer
  - 5.3 Hybrid Implementation of an Error Tracking Synchronizer
    - 5.3.1 Mueller and Muller TED
    - 5.3.2 Loop Filter
    - 5.3.3 $\Delta - \Sigma$ Modulator
  - 5.4 Summary
- 6 Media Converter: DFE+CR Experimental Results
  - 6.1 Firecomms 650nm Receiver
  - 6.2 Testing the Clock Recovery System
  - 6.3 1Gbit/s Full Duplex Media Converter
- 7 Conclusions and Recommendations
- Bibliography

# List of Figures

- 1.1 Economies with the highest penetration of FTTH/Building+LAN [2]
- 1.2 Interest in futuristic web services of current FTTH users over 55 years of age [3]
- 1.3 Interest in futuristic web services of current FTTH users over 40 years of age [3]
- 1.4 Future Broadband In-Building Networks [6]
- 1.5 Attenuation Spectrum for different fibers
- 1.6 OSI/ETHERNET Layers implemented by the 1Gbit/s Media Converter
- 1.7 Envisioned Full Duplex Gigabit Ethernet Media Converter
- 2.1 SI PMMA POF (taken from [11])
- 2.2 Attenuation Spectrum of SI PMMA POF (taken from [11])
- 2.3 (a) Firecomms 650nm RC-LED (b) Eye Diagram @1.25GHz
- 2.4 650nm (a) Firecomms Receiver (b) Graviton SPD-2
- 2.5 POF Transmission Channel
- 2.6 2-PAM Eye Diagram after 50 m of PMMA-SI POF
- 2.7 Frequency Response of the POF System
- 2.8 Impulse Response of the POF System
- 2.9 DFE Architecture
- 2.10 Received signal for (a) 2-PAM transmission over 50 m (b) 4-PAM transmission over 50 m (taken from [13])
- 2.11 Experimental Set up for off line processing
- 2.12 BER as a function of the number of taps of the FF and FB stages of the DFE
- 2.13 Measured BER vs extra-attenuation at the receiver
- 3.1 Ethernet Layer Scheme of the Media Converter
- 3.2 Implementation Methodology Diagram
- 3.3 Ethernet Layers
- 3.4 PCS transmitter
- 3.5 PCS receiver
- 3.6 General Diagram of the PMA
- 3.7 General Diagram of the POF Channel
- 3.8 Linear FIR Filter
- 3.9 Block Diagram of DFE
- 3.10 PMA Top Level Implementation Diagram
- 3.11 Samples Arrangement at the ADC output
- 3.12 FSE First Level Implementation Diagram
- 3.13 FSE Fourth Level Implementation Diagram
- 3.14 FSE Third Level Implementation Diagram
- 3.15 FSE Second Level Implementation Diagram
- 3.16 FSE Implementation
- 3.17 Feedback Conventional Serial Architecture
- 3.18 Multiplier MUX based implementation
- 3.19 Feedback Conventional Serial Architecture
- 3.20 Reformulated DFE
- 3.21 4-stage look ahead pipelining of a 4-to-1 multiplexer loop
- 3.22 4-Parallel DFE Architecture
- 3.23 DFE Block Diagram
- 3.24 $j^{th}$ parallel stage of the Error Estimation Block
- 3.25 Implementation diagram of the DLMS Tap Estimation (I)
- 3.26 Implementation diagram of the DLMS Tap Estimation (II)
- 3.27 Summarized Block Diagram of the Adaptive and Blind DFE
- 4.1 Experimental Set Up for evaluating the 1Gbit/s Media Converter
- 4.2 BER curves for FF and DFE after 50m of PMMA-SI POF
- 4.3 BER curves for FF and DFE after 75m of PMMA-SI POF
- 4.4 (a) BER and (b) PER vs Received Optical Power
- 4.5 Experimental measurement of the MSE
- 4.6 Experimental Set Up for evaluating the system tolerance to Fiber Bendings
- 4.7 BER versus Number of Fiber Bendings
- 5.1 Feedforward Synchronizer Architecture
- 5.2 Feedback Synchronizer Architecture
- 5.3 Digital Feedback Synchronizer Architecture
- 5.4 Hybrid Clock Recovery Architecture
- 5.5
- 5.6
- 5.7 M&M Parallel Implementation
- 5.8 Decimator Parallel Implementation
- 5.9 PLL Diagram
- 5.10 Loop Filter Implementation as (a) an Active Filter (b) equivalent s-domain Diagram
- 5.11 S-curve of the M&M TED
- 5.12 Loop Filter Frequency response (a) before and (b) after compensating the effects of the digital arithmetic
- 5.13 Loop Filter Final Implementation Diagram
- 5.14 $\Delta - \Sigma$ modulator Top level implementation diagram
- 6.1 Back-to-Back Sensitivity Comparison between the Graviton-SPD2 and the Firecomms Receiver
- 6.2 Power Margin Comparison between the Graviton-SPD2 and the Firecomms Receiver for transmission over 50 m of PMMA-SI POF
- 6.3 Clock Recovery Experimental Set Up
- 6.4 Convergence Time of the Clock Recovery System
- 6.5 Fully Engineered 1Gbit/s Media Converter
- 6.6 Experimental Set Up for validating the Fully Engineered 1Gbit/s Media Converter
- 6.7 BER vs Received Optical Power for the Fully Engineered 1Gbit/s Media Converter

# List of Tables

- 2.1 Main Properties of PMMA-SI POF
- 2.2 Summarized Features of Firecomms RC-LED
- 2.3 Summarized Features of the Graviton SPD-2 and the Firecomms RX
- 2.4 BER measured in offline processing after 50 meters of PMMA-SI POF
- 3.1 Required Features for the Media Converter
- 4.1 Performance Results for 50m and 75m of PMMA-SI POF
- 4.2 FPGA Resources and Area Utilization
- 6.1 Holding Window and Jitter Measurements

# Chapter 1

## Introduction

Bandwidth hunger is growing rapidly. At this very moment new on-line applications are being developed, and users are finding new ways of sharing information and of getting in touch. Social networks have arisen as worldwide platforms capable, in terms of software, of handling a vast range of information, from tweets to high definition videos, from music to e-books. As new ways of sharing information emerge, new ways of accessing the web enter the market. Computers and mobile devices are only the beginning. The spectrum of web products and services such as IPTV, VoIP and on-line gaming has been extended by the possibility to share and store data, and to interact and work, on the Cloud.

The rush for bandwidth has led researchers all over the world to investigate how to achieve higher data rates, and thanks to their efforts both long-haul and short-range communication systems have developed enormously during the last few years. In this respect, important contributions were made in the last mile of the network towards the user, formally defined as the access network. Among these contributions, technologies such as Asymmetric Digital Subscriber Line (ADSL), Very high bit rate Digital Subscriber Line (VDSL) and Power Line Communications (PLC) provide communication links at speeds that range from tens of Mbps up to 100Mbps. However, as the demand for higher information throughput increases, these copper based solutions have begun to reach their limits.
It is in this scenario that long-haul optical solutions began migrating into short-range communication systems, giving rise to Fiber to the Home (FTTH) solutions.

### 1.1 Fiber-to-the-Home

Statistics released by the FTTH Council in 2010 indicated that approximately 50.2 million users worldwide were already connected by optical fiber, the large majority in the Asia Pacific region with 38.9 million, followed by North America with 7.9 million and Europe with 3.4 million users [1]. Further analysis by the same organization established that the widespread deployment of high speed connections to the doorstep of the end user, together with the increasing presence inside premises of interconnected devices supporting and sharing different types of data (e.g. high definition video, mp3), had shifted the network bottleneck to the short-range networks inside the home. Earlier this year, in support of this position, the Council presented updated statistics confirming the FTTH penetration rate reported in 2010 (see Fig. 1.1).

Figure 1.1: Economies with the highest penetration of FTTH/Building+LAN [2]

Fig. 1.1 shows the countries with the highest penetration rate of FTTH and FTTB (Fiber to the Building) plus Local Area Networks (LANs). Indirectly, this figure also reveals a potential market for short-range optical fiber systems. According to the graph, subscribers categorized as FTTB+LAN reach the fiber connection by means of LANs. Hence, the market opportunity lies in implementing these LAN connections using optical fiber. LANs can benefit strongly from the better overall performance of optical fiber, compared with its possible counterparts, in delivering information at high bit rates. In particular, optical fiber provides symmetrical access and larger bandwidth.
It does not use electricity to transmit information and is therefore immune to electromagnetic interference (EMI), so it can be deployed through, for instance, power supply ducts. On the other hand, it is precisely the intrinsic roughness of such in-building/in-home installation environments that makes it difficult to use the common and widespread glass optical fibers (GOFs), mainly because their deployment requires specialized equipment and trained personnel, which in the end implies higher costs for telecom operators (Telcos).

Figure 1.2: Interest in futuristic web services of current FTTH users over 55 years of age [3]

Alongside the above mentioned study, the FTTH Council also presented a study of how high bit rate access networks influence the future expectations of their current users. In particular, the study analyzes the interest in futuristic web services of users categorized by age group. Figures 1.2 and 1.3 summarize the results by ranking the expectations stated by the members of each group in relation to future web services. These results show that people relate to the web in different ways depending on their age. Despite the differences, there is a constant factor: a significant number of the envisioned services will require real-time data streaming. This requirement implies a high quality of service in order to guarantee a high quality of experience for the client; in other words, any future broadband in-home network must provide large bandwidth, low latency and low bit error rates (BER).

Figure 1.3: Interest in futuristic web services of current FTTH users over 40 years of age [3]

Regarding private networks within users' homes, technologies such as wireless fidelity (WiFi), power line communications (PLC) and solutions based on coaxial and UTP CAT cables have fulfilled today's requirements.
In doing so, they have demonstrated both advantages and disadvantages. WiFi, for instance, is very attractive because of its easy deployment and configuration. However, its future improvement is challenged by an already saturated spectrum, which makes the allocation of new services difficult, and by access issues related to its distribution within a building, a task that very often requires cabling between floors or to reach an isolated area. Hence, future broadband access networks have to deal with this reality and consider not only improving current technologies and creating new ones, but also integrating them properly. In this sense, and as an alternative to overcome the bandwidth limitations due to in-building cabling, novel systems based on both perfluorinated plastic optical fiber (PF-POF) and silica based multi-mode fiber (MMF) have been studied and demonstrated [4]. Recently, optical wireless has also been investigated as an alternative to WiFi, so that these systems combined may become a very promising solution for LANs in the near future. Nonetheless, there are still challenges to overcome: even if these fiber cabling solutions have been proven feasible, they still need to be deployed by expert personnel, and therefore the challenge still lies in finding a solution that allows safe and easy "do-it-yourself" installation. Under these circumstances and for these requirements, solutions based on POF are being investigated. The next sections show the future envisioned for POF as part of the Fiber-in-the-Home (FITH) concept.

### 1.2 Fiber-in-the-Home

FITH is the term given to the foreseen future broadband in-home/in-building networks, in which the service provider will ensure high bit rate transmission service inside the user's house. Fig. 1.4 illustrates the envisaged POF based backbone for future residential networking.
Furthermore, with the purpose of providing service to the vast and continuously increasing number of mobile devices, microwave and baseband network services were recently studied and demonstrated as part of the proposed FITH architecture [5].

Figure 1.4: Future Broadband In-Building Networks [6]

### 1.3 Intensive Broadband Networking for Optical Interconnect Applications

Social networking, web search engines, and working, storing and exchanging information on the Cloud have dramatically increased the amount of computing power required and, therefore, the speed at which the vast range of algorithms involved in providing these services must be executed [7]. Currently, data centers are big warehouses where thousands of computer nodes are interconnected to form clusters, which are in turn connected to operate as High Performance Computers (HPCs). The links composing these grids span from very short distances of less than 30cm for intra-node connections up to 10m for inter-rack connections. The link requirements include large bandwidth capacity (>20GHz) and support of both bursty and uninterrupted data transfers; moreover, as data centers become larger, power consumption must be addressed in order to guarantee low cost green solutions [8]. It is in this scenario that optical interconnect communication systems based on large bandwidth PF-POF and MMF (BW around 13GHz) arise as potential solutions for HPC systems in Data Centers (DC).

### 1.4 The EU POF-PLUS Project

For all the above mentioned reasons, during the last few years European telecom operators together with the European Union have been actively working and creating policies to bring broadband access to the European continent.
As part of this effort, the European Seventh Framework Programme (FP7) hosted the POF-PLUS project [9], an initiative aimed at promoting research and development of short-range optical communication solutions based on plastic optical fiber (POF) to provide wired and wireless services for in-building/in-home networks and to investigate the feasibility of optical interconnect applications.

As mentioned, the intrinsic roughness of in-building/in-home environments makes the deployment of conventional GOFs difficult. Therefore, POF-PLUS proposed 1mm polymethyl methacrylate step index (PMMA-SI) POF as an alternative. More commonly referred to as standard SI POF, this fiber is robust, ductile and tolerates extreme bending, which makes it a strong candidate for installation in rough and less accessible locations. Moreover, its large core diameter (up to 1 mm) relaxes connector requirements without compromising optical coupling, which simplifies connector design and allows the use of low cost materials for their manufacturing. As a result, do-it-yourself installation is possible and system costs are reduced, increasing the appeal of POF solutions for telecom operators.

POFs outperform GOFs in terms of robustness. However, their advantages come at the expense of considerably higher attenuation and a tighter bandwidth limitation. As seen in Fig. 1.5, both PMMA-SI POF and PF-POF present higher attenuation than GOFs; likewise, PMMA-SI POF offers the lowest bandwidth and GOFs the highest. A full account of the different types of POF, their attributes and applications is beyond the scope of the present document; more details can be found in [10].

In order to overcome the effect of these impairments, a list of objectives was defined within POF-PLUS.
All of them were aimed at the development of devices capable of transmitting and receiving data at >1Gbit/s over several tens of meters of POF for applications such as multimedia device connections, in-building/in-home networks and optical interconnects in DCs. In particular, the fundamental activities of the project were defined as:

- Design and implementation of a fully engineered real-time transceiver capable of establishing 1Gbit/s links over 50 meters of PMMA-SI POF using a Light Emitting Diode (LED).
- Design, implementation and optimization of multi-Gbit/s optoelectronic transceivers to improve linearity, bandwidth and reliability.
- Optimization of transmission techniques for multi-Gbit/s over tens of meters of non-standard novel large-core POFs using multi-core and/or multi-fiber ribbons.
- Reliable transmission of radio-over-fiber (RoF) systems over PMMA-SI POF.

Figure 1.5: Attenuation Spectrum for different fibers

### 1.5 The scope of this Thesis

As far as the POLITO-ISMB partnership was concerned, POF-PLUS aimed to implement a fully engineered 1Gbit/s Ethernet Media Converter capable of establishing data links over 50+ meters of PMMA-SI POF using an LED as transmitter. Furthermore, the resulting system had to comply with the IEEE 802.3 Gigabit Ethernet Standard, which, depending on its version, defines transmission protocols for different physical layers (or transmission media). In this particular case the main version used was 1000Base-X<sup>1</sup>, which defines operation over optical fiber at 1Gbit/s.

Fig. 1.6 shows the Ethernet layers as established by the Gigabit Ethernet Protocol (GEP) and their relation to the OSI reference model. This figure also shows the PMD, PMA and PCS sub-layers that make up the Ethernet Physical Layer implemented by the Media Converter. Finally, it includes a summarized description of their intended function within the Ethernet Media Converter for the POF system.
Figure 1.6: OSI/ETHERNET Layers implemented by the 1Gbit/s Media Converter

The scope of this thesis is the design, development and implementation of the required equalizing algorithms. In particular, this dissertation presents their design and implementation by means of digital signal processing (DSP) on field programmable gate arrays (FPGAs). More generally, the final system towards which all efforts within this thesis were directed is illustrated in Fig. 1.7, in which two Ethernet Media Converters operate between a UTP CAT 5 cable and 50m of POF, providing 1Gbit/s full duplex communication.

<sup>1</sup> 10Gbit Ethernet in its version 10Gbase-R was also used as a reference to tackle specific tasks.

Figure 1.7: Envisioned Full Duplex Gigabit Ethernet Media Converter

This thesis is organized as follows. Chapter 2 first presents a detailed description of the POF transmission channel (medium+PMD), introducing its impairments and the challenges involved in transmitting through it; this is followed by a brief account of the modulation formats and equalizing architectures that lead to the main requirements for implementing the Media Converter. Chapter 3 focuses on the design and implementation of the PMA and PCS layers. In particular, the equalizing algorithms are first formulated mathematically and then adapted or redefined in order to implement them on the FPGA. Chapter 4 presents the experimental results for the first 1Gbit/s Media Converter prototype, capable of establishing full duplex communication over 50+ m of PMMA-SI POF. Chapter 5 presents the design and development of a hybrid analog-digital clock recovery system. In more detail, its analysis and formulation, based on the theory of Phase Locked Loops (PLL), is presented.
This is followed by the description of its hardware implementation, giving particular attention first to the Timing Error Detector block, which implements the Mueller and Muller algorithm, and later to the design and hardware implementation of the Loop Filter. Chapter 6 presents the results obtained with the fully engineered 1 Gbit/s Media Converter for transmission over 50 m of PMMA-SI POF. Chapter 7 presents the conclusions derived from the above mentioned activities, together with some comments and recommendations regarding future work.

# Chapter 2

## POF Short-Range Communication System

This chapter introduces the POF system and highlights the inherent challenges of running 1Gbit/s over it. In order to set the premises on which the author's work is based, experimental and simulation results obtained by the ISMB researchers before his involvement in POF-PLUS are presented. The author's contributions to the project are detailed in the following chapters. The present discussion is organized as follows: the first part presents the main features of PMMA-SI POF, followed by a second part describing the optoelectronic components, i.e. light sources and photodetectors. Then, the main features and impairments of the complete POF transmission channel are presented, and finally the chapter ends with the results of a study of possible modulation formats and equalizing architectures for the Media Converter.

### 2.1 PMMA-SI POF

As mentioned above, it was defined within POF-PLUS that the Media Converter had to be implemented using PMMA-SI POF. Also referred to as standard SI POF, this fiber has a large core and a thin cladding (see Fig. 2.1) with typical refractive indices of 1.49 and 1.40, respectively.
These characteristics lead to a large numerical aperture (NA) of $\sim 0.5$ and an acceptance angle of $\sim 30^{\circ}$, which give standard SI POF the required handling robustness and tolerance to misalignment. These properties ultimately translate into lower costs by allowing the use of inexpensive splicing tools and connectors.

Figure 2.1: SI PMMA POF (taken from [11])

Nonetheless, the qualities of PMMA-SI POF come at the expense of low bandwidth and high attenuation. Concerning the latter, Fig. 2.2 illustrates the attenuation spectrum of PMMA-SI POF. As seen, there are three relatively low attenuation regions (or windows) in which transmission is considered feasible. These transmission windows are located within the visible spectrum, specifically at wavelengths around 520nm (green), 570nm (yellow) and 650nm (red). This is very convenient: since the Media Converter operates in the visible range with an LED (instead of a laser), the system can be checked with the naked eye without incurring any risk. This not only improves safety but also enables do-it-yourself installation and, as an overall consequence, reduces the cost even further.

Figure 2.2: Attenuation Spectrum of SI PMMA POF (taken from [11])

As shown in Table 2.1, PMMA-SI POF has a very limited bandwidth, a condition that results from its high multimodal dispersion and that usually constitutes the main limiting factor in POF systems. Moreover, the POF channel (including the optoelectronic devices) acts upon the signal as an electrical low pass filter, which makes it possible, for theoretical and analytical purposes, to approximate the frequency response of the channel (in the electrical domain) as a Gaussian low pass filter. Later on, this property will be corroborated by experimental measurements obtained by means of a Vector Network Analyzer.
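The numerical aperture and acceptance angle quoted at the beginning of this section can be checked from the refractive indices with the standard step-index relation $NA = \sqrt{n_{core}^2 - n_{cladding}^2}$. The following is a minimal sketch, not part of the original work: the function names are illustrative, launching from air is assumed, and the Gaussian low pass model assumes that the reference frequency `f3db_hz` is the point where the electrical power gain has dropped by 3 dB (no specific channel bandwidth is implied).

```python
import math

# Typical refractive indices of PMMA-SI POF quoted in the text.
N_CORE = 1.49
N_CLADDING = 1.40


def numerical_aperture(n_core, n_cladding):
    """Numerical aperture of a step-index fiber."""
    return math.sqrt(n_core ** 2 - n_cladding ** 2)


def acceptance_angle_deg(na, n_outside=1.0):
    """Half-angle of the acceptance cone in degrees (launch from air assumed)."""
    return math.degrees(math.asin(na / n_outside))


def gaussian_lowpass_gain_db(f_hz, f3db_hz):
    """Electrical gain in dB of a Gaussian low pass filter.

    Assumed model: |H(f)|^2 = 2 ** (-(f / f3db) ** 2), i.e. -3 dB at f3db.
    """
    return -10.0 * math.log10(2.0) * (f_hz / f3db_hz) ** 2


na = numerical_aperture(N_CORE, N_CLADDING)
theta = acceptance_angle_deg(na)
print(f"NA = {na:.2f}, acceptance angle = {theta:.1f} deg")

# Attenuation of the Gaussian model at multiples of its -3 dB frequency.
for ratio in (1, 2, 4):
    print(f"f = {ratio} x f3dB -> {gaussian_lowpass_gain_db(ratio, 1.0):.1f} dB")
```

With these indices the sketch gives an NA of about 0.51 and an acceptance angle of about 30.7 degrees, in line with the approximate values given in the text. The Gaussian model also shows how quickly the response rolls off: the attenuation in dB grows with the square of the frequency.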
Further information on this subject, together with techniques to measure the bandwidth of different fibers, can be found in [10].

| 1mm Standard SI-POF | Value |
|-----------------------|--------------------|
| Core Diameter | 1000 [μm] |
| Numerical Aperture | $0.48 \pm 0.02$ |
| Bandwidth | 40 [MHz]·100 [m] |
| Attenuation @ 650 [nm] | 160-180 [dB/km] |
| Attenuation @ 520 [nm] | 90 [dB/km] |

Table 2.1: Main Properties of PMMA-SI POF

Moreover, Chapter 2 of the same source states that knowledge of the bandwidth alone usually does not suffice for a definitive analysis of the total capacity of a link involving further elements, because POFs behave differently depending on the properties of the other components of the set up, e.g. the spectral width of the light source and its launching conditions. In the light of this, the next sections introduce the properties of the optoelectronic components that, once integrated with the fiber, make up the complete transmission channel.

### 2.2 Optoelectronic Transmitter

A key factor in achieving low cost, easy and safe deployment of POF technology is the use of light emitting diodes (LEDs) instead of more complex and expensive lasers. Conventional LEDs are readily available and easy to use, can be driven with CMOS technology, present good thermal stability, and are more robust and longer lived than lasers. For many years, red (650nm) LEDs have been used as optical sources in POF applications. Consequently, the POF-PLUS project included the development of a 650nm Resonant Cavity LED (RC-LED), which is a high speed variant of an LED.
Figure 2.3: (a) Firecomms 650nm RC-LED (b) Eye Diagram @1.25GHz

RC-LEDs were first proposed in 1992 [12]. Since then, companies such as Firecomms (former partner of the POF-PLUS project) have developed commercial RC-LED sources for high speed applications such as Fast Ethernet. During the last few years, Firecomms together with the Fraunhofer Institute (another former partner) developed a 650nm RC-LED capable of giving an open eye diagram up to 1.25Gbit/s (see Fig. 2.3) thanks to the optimization of the electronic driver circuitry. Moreover, it is integrated within an optolock, a Firecomms plug device specially tailored for POF applications whose purpose is to make the POF connection easier and more efficient by minimizing misalignment between the fiber and the LED. The main features of the RC-LED are summarized in Table 2.2.

| RC-LED | Value | Units |
|------------------------------|-------|-------|
| Wavelength | 650 | [nm] |
| Optical Output Power | -1.5 | [dBm] |
| Optical Modulation Amplitude | -3.3 | [dBm] |
| Extinction Ratio | 3 | [dB] |
| Falling Time | 453 | [ps] |
| Rising Time | 411 | [ps] |

Table 2.2: Summarized Features of Firecomms RC-LED

### 2.3 Optoelectronic Receiver

At the beginning of the POF-PLUS project, and as part of its objectives, the development and optimization of a 650nm optoelectronic (O/E) converter was required. To this end, Firecomms, working in collaboration with Fraunhofer, delivered after 18 months an optical receiver that integrated an A<sup>3</sup>PICs photodiode with a Firecomms optolock on a Fraunhofer driving board (see Fig. 2.4 (a)). While this receiver was being developed, the set up for developing the Media Converter was built around a Graviton SPD-2 O/E converter (see Fig. 2.4 (b)). Given its high cost and dimensions (see Table 2.3), such a device is far from being a suitable candidate for a low cost POF solution, but it made it possible to prototype the first Media Converter.
Figure 2.4: 650nm (a) Firecomms Receiver (b) Graviton SPD-2

Later in the project, and after achieving some positive results, the Graviton SPD-2 was replaced by the Firecomms Receiver. Although this new component raised new considerations for the system in terms of sensitivity, gain and, as will be shown later, power margin, it did not invalidate the premises on which the development of the Media Converter was based. Therefore, for the purposes of this dissertation, the POF system that leads to the first version of the Media Converter has the Graviton as its receiver.

| Parameter                   | Graviton SPD-2 | Firecomms RX | Units     |
|-----------------------------|----------------|--------------|-----------|
| Acceptable Core Diameter    | <1000          | <1000        | [$\mu$m]  |
| O/E Device Active Area      | $\phi 0.4$     | $\phi 0.4$   | [mm]      |
| Sensitivity                 | 25             | 24           | [dBm]     |
| Peak Sensitivity Wavelength | 760            | 660          | [nm]      |
| Wavelength Range            | 380~1000       | 410~850      | [nm]      |
| Variable Gain Amplifier     | NO             | YES          |           |
| Connector Interface         | SMA            | Optolock     |           |
| Frequency Bandwidth         | DC~1.2         | DC~1.25      | [GHz]     |
| Noise Equivalent Power      | -27.3          |              | [dBm]     |
| Supply Voltage              | DC±15          | DC 5         | [V]       |
| Supply Current              | +150/-50       | 120/110      | [mA]      |
| Physical Dimension          | 103 x 44 x 21  |              | [mm]      |
| Weight                      | 130            |              | [g]       |

Table 2.3: Summarized Features of the Graviton SPD-2 and the Firecomms RX

## 2.4 Investigation of Modulation Formats and Equalizing Algorithms

At the beginning of POF-PLUS, M-PAM modulation formats and post-equalization schemes for the implementation of the Gigabit Ethernet Media Converter were investigated. For this purpose a Matlab/Simulink set up was modeled based on either the measured frequency response of each element or, if not available, on ideal assumptions regarding its operation.
For instance, the POF was modeled by means of its experimentally measured frequency response, while the O/E transmitter was considered to be ideal, thus neglecting possible nonlinearities. There were also assumptions related to the nature of the noise sources, the bandwidth of the overall channel, etc. A full account of this investigation can be found in [13]. For the purposes of this dissertation, attention is focused on the main conclusions derived from this intensive simulation campaign, namely that 2-PAM and 4-PAM signals together with adaptive equalization at the receiver side in the form of a Decision Feedback Equalizer (DFE) are suitable solutions for implementing the Media Converter. More in detail, this section presents the off-line processing analysis carried out in order to corroborate, with the actual system, the results obtained with the above mentioned simulations.

Figure 2.5: POF Transmission Channel

Accordingly, an FPGA development board was configured to generate 2-PAM signals at 1.25Gbaud and 4-PAM signals at 0.625Gbaud, giving in both cases a total throughput<sup>1</sup> of 1.25Gbit/s. Then, the Graviton SPD-2, 50m of PMMA-SI POF and the 650nm RC-LED were put together to form the POF transmission channel shown in Fig. 2.5. At this point, the effects of transmitting through this channel were evaluated by measuring the signal at the output of the Graviton SPD-2. The resulting eye diagram for 2-PAM transmission is reported in Fig. 2.6. It shows that the signal is completely distorted and the eye is totally shut, which makes evident the need for some sort of equalization in order to recover the original sequence.
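The equivalence between the two signalling options follows from the number of bits carried per symbol. A minimal sketch of that arithmetic is given below; the function name is illustrative and not part of the original set up.

```python
import math


def bit_rate_gbps(baud_rate_gbaud: float, levels: int) -> float:
    """Bit rate of an M-PAM signal: each symbol carries log2(M) bits."""
    return baud_rate_gbaud * math.log2(levels)


# The two configurations mentioned in the text.
rate_2pam = bit_rate_gbps(1.25, 2)   # 2-PAM at 1.25 Gbaud
rate_4pam = bit_rate_gbps(0.625, 4)  # 4-PAM at 0.625 Gbaud

print(f"2-PAM: {rate_2pam} Gbit/s, 4-PAM: {rate_4pam} Gbit/s")
```

Both configurations carry the same throughput, while 4-PAM halves the symbol rate and therefore the bandwidth demanded from the channel.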
Consequently, and in order to better understand the channel impairments, the next measurements consisted of obtaining the electrical-to-electrical frequency and impulse responses of the POF transmission channel by means of a Vector Network Analyzer (VNA). The results for 25 and 50 meters of POF are shown in Figures 2.7 and 2.8.

<sup>1</sup> This value corresponds to the desired bit rate of 1Gbit/s plus the overhead due to synchronizing and Forward Error Correction (FEC) codes as defined by the IEEE 802.3 Gigabit Ethernet Standard in its optical version 1000Base-X.

Figure 2.6: 2-PAM Eye Diagram after 50 m of PMMA-SI POF

Figure 2.7: Frequency Response of the POF System

As may be noticed in Fig. 2.7, the cut-off frequency of the POF transmission channel is around 150MHz for 25 meters of fiber and 75MHz for 50 meters. These values are estimated as the limits of the optical 3dB bandwidth which, since the electrical power is proportional to the square of the optical power, corresponds to the 6dB point of the transfer functions obtained with the VNA. In either case, it is clear that some kind of compensation is needed in order to transmit at high data rates.

Figure 2.8: Impulse Response of the POF System

Considering now the impulse responses shown in Fig. 2.8, it is evident from the broadening of the impulses that the POF channel induces very strong intersymbol interference (ISI) on the signal. ISI is the result of overlap between the current symbol and previous and subsequent symbols, and is usually caused by the limited bandwidth of the transmission channel and/or by multipath propagation. In the present case, ISI is evidenced by the fact that while each transmitted symbol has a duration of $\sim 0.8$ [ns]<sup>2</sup>, the actual impulse response after the fiber spans 7ns for 25 meters of fiber, and 12ns for 50 meters.
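The two numerical relations used above (optical versus electrical decibels, and impulse duration versus symbol period) can be checked with a short sketch. The function names are illustrative; the input values are the ones quoted in the text.

```python
def optical_to_electrical_db(optical_db: float) -> float:
    """Electrical power is proportional to the square of the optical power,
    so a drop of X dB in optical power appears as 2*X dB electrically."""
    return 2.0 * optical_db


def symbols_spanned(impulse_duration_ns: float, symbol_rate_gbaud: float) -> float:
    """Number of symbol periods covered by the channel impulse response."""
    # duration / period = duration * rate (ns * Gbaud is dimensionless)
    return impulse_duration_ns * symbol_rate_gbaud


electrical_point_db = optical_to_electrical_db(3.0)  # optical 3 dB bandwidth
span_25m = symbols_spanned(7.0, 1.25)                # 7 ns response, 25 m
span_50m = symbols_spanned(12.0, 1.25)               # 12 ns response, 50 m

print(electrical_point_db, span_25m, span_50m)
```

With a symbol period of about 0.8 ns, the 12 ns response measured at 50 m overlaps on the order of fifteen symbol periods, which gives an idea of how many taps an equalizer needs to cover.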
In other words, for the prototype target distance (50m) the pulse broadens to more than 12 times its original duration, proving once again that equalization is a must for recovering the transmitted signal.

<sup>2</sup> This corresponds to the inverse of the signal frequency, 1.25GHz.

Figure 2.9: DFE Architecture

After the above quantitative characterization of the POF transmission channel, all efforts were directed at overcoming its negative effects on the transmitted signal. Therefore, the acquired data were equalized (in off-line processing) by a DFE based on Feed Forward (FF) and Feedback (FB) filters (see Fig. 2.9). More in detail, both the FF and FB filters were implemented within the Matlab/Simulink<sup>TM</sup> environment using Finite Impulse Response (FIR) filters, and their tap coefficients were adapted with a gradient-based least mean square (LMS) algorithm. A detailed description of the equalizing and adapting algorithms applied during this study, as well as their hardware implementation within the Media Converter, is given in Chapter 3. Next, the most relevant results of this experimental campaign are presented; for a full account the reader is referred to [13].

Figure 2.10: Received signal for (a) 2-PAM transmission over 50 m (b) 4-PAM transmission over 50 m (taken from [13])

Fig. 2.10 shows the samples at the output of the DFE for both 2-PAM and 4-PAM transmissions. As seen, the 4-PAM signal presents, after equalization, a non-homogeneous distribution of the errors, with a higher number of them in the external levels. This could indicate the presence of nonlinearities in the optoelectronic devices. The overall effect of this condition is made evident by the measured BERs reported in Table 2.4. In the end, given the available set up, the main conclusion is that multilevel amplitude modulation, although it demands less bandwidth, presents a worse performance than conventional 2-PAM.
Moreover, from a digital signal processing point of view 4-PAM implies, in terms of DC-balancing, equalizing hardware architectures and clock recovery, more complex implementations for both the transmitter and the receiver. All of this led to the decision to implement the Gigabit Ethernet Media Converter using 2-PAM transmission.

| Transmitted Signal | BER                 |
|--------------------|---------------------|
| 2-PAM              | $2 \cdot 10^{-4}$   |
| 4-PAM              | $1.2 \cdot 10^{-3}$ |

Table 2.4: BER measured in off-line processing after 50 meters of PMMA-SI POF

Once 2-PAM was chosen for implementing the Media Converter, the performance of the system was analyzed as a function of the DFE architecture. In particular, the system was evaluated in terms of the BER, the number of coefficients of the DFE and the optical power margin before FEC<sup>3</sup>. The set up for this study is shown in Fig. 2.11. As seen, it included: first, a Pattern Generator in which a real Ethernet traffic stream was pre-stored, so that the effects of actual data on the system could be evaluated; second, the 50 meters of PMMA-SI POF; third, a Variable Optical Attenuator (VOA) used to vary the magnitude of the received optical power; and finally, the off-line processing model for the DFE.

<sup>3</sup> Here the assumption is a FEC capable of correcting errors on a signal that presents a BER lower than $10^{-3}$.

Figure 2.11: Experimental Set up for off-line processing

The first experiment consisted of measuring the BER of the system while varying the number of coefficients of both DFE stages, i.e. FF and FB. The received optical power was measured as -9.5dBm (after the fiber) and it was kept constant throughout the test. The results are summarized by the contour graph shown in Fig. 2.12.
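The received optical power quoted above is consistent with a simple link budget built from the values in Tables 2.1 and 2.2. The sketch below is only a rough check: it assumes that fiber attenuation is the only loss, neglecting connector and coupling losses, which is a simplification not stated in the text.

```python
def received_power_dbm(tx_power_dbm: float,
                       attenuation_db_per_km: float,
                       length_m: float) -> float:
    """Received power after a fiber span, counting fiber attenuation only
    (connector and coupling losses are neglected: an assumption)."""
    return tx_power_dbm - attenuation_db_per_km * length_m / 1000.0


TX_POWER_DBM = -1.5  # RC-LED optical output power (Table 2.2)
LENGTH_M = 50.0      # target link length

# Attenuation range at 650 nm from Table 2.1: 160-180 dB/km.
best_case = received_power_dbm(TX_POWER_DBM, 160.0, LENGTH_M)
worst_case = received_power_dbm(TX_POWER_DBM, 180.0, LENGTH_M)

print(f"expected received power: {worst_case} to {best_case} dBm")
```

The lower end of the attenuation range reproduces the measured value, which suggests that, under these assumptions, the fiber itself dominates the loss of the link.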
Figure 2.12: BER as a function of the number of taps of the FF and FB stages of the DFE

The second experiment consisted of measuring the BER for a fixed number of coefficients while varying the magnitude of the received optical power. At this point, an analysis of the power margin obtained as a function of the number of tap coefficients of each stage, together with the feasibility of an eventual implementation of the DFE, led to the conclusion that 16 FF and 2 FB tap coefficients guaranteed an acceptable power margin and were at the same time realizable. In this respect, Fig. 2.13 shows a power margin before FEC of 4.5dB for a DFE with these characteristics.

Figure 2.13: Measured BER vs extra-attenuation at the receiver.

## 2.5 Summary

This Chapter introduced the main properties of the PMMA-SI POF and of the optoelectronic devices that together constitute the POF channel. More in detail, the distortion and attenuation observed on the received signal evidenced that the POF channel is strongly limited in terms of bandwidth. Moreover, it was shown that the bandwidth limitation is mostly due to the PMMA-SI POF and that, in order to retrieve the transmitted information, it is necessary to equalize the signal at the receiver. In this sense, the results of an off-line processing experimental campaign were presented. In particular, it was concluded that a system based on 2-PAM transmission over 50 meters of PMMA-SI POF, with adaptive and blind equalization at the receiver in the form of a DFE with 16 FF and 2 FB tap coefficients, guarantees 4.5 dB of optical power margin before FEC. Chapter 3 formally presents the mathematical model of both the equalizing and the adaptation algorithms that were used for the above mentioned analysis. A detailed account of how they were implemented on FPGA is also given.
# Chapter 3

# Media Converter Prototype: 1Gb/s over 50+ m

This chapter presents the design and implementation of the Gigabit Ethernet Media Converter. The discussion is organized as follows: first, the requirements for the system are summarized and the methodology followed for prototyping the system is presented. Subsequently, the Media Converter is described in terms of the Ethernet Layers as they are defined by the 802.3 Gigabit Ethernet Standard. Particular attention is given to the PMA sublayer, in which the equalizing schemes required for achieving transmission over the PMMA-SI POF are implemented. Accordingly, section 3.4.1 presents the theoretical model of the equalizing and adapting algorithms, followed by an analysis that redefines them to allow their implementation on FPGA.

## 3.1 Design Specifications

The general requirements for the Media Converter implementation are:

- To achieve complete compatibility with the IEEE 802.3 Gigabit Ethernet Standard.
- To reduce the effects of the limited POF bandwidth.
- To develop a very efficient full-custom PHY tailored for POF applications that minimizes the transmitted in-line bit rate.

More in detail, compliance with the IEEE 802.3 Gigabit Ethernet Standard means that the system must provide a throughput of 1Gbit/s, which implies testing and validating the prototype with traffic at gigabit data rates. For this reason the media conversion is performed between 50 meters of SI-PMMA POF and a link based on a UTP CAT-5 copper cable. In this sense, and according to the Ethernet layers defined by the Gigabit Ethernet Standard, the complete system results in the scheme illustrated in Fig. 3.1.

Figure 3.1: Ethernet Layer Scheme of the Media Converter

As seen, the Media Converter implements not only the physical layer required by the optical POF link, but also the one corresponding to the analog transceiver for Ethernet communication over the UTP cable.
Regarding POF bandwidth limitation and in-line bit rate minimization, the PCS and the PMA sublayers (in the POF PHY) are specially designed to address these requirements. In particular, the PMA implements the pertinent equalizing algorithms while the PCS is in charge of synchronizing and of correcting the errors still present after treating the signal. Moreover, this diagram shows that the Media Converter is mostly implemented as an embedded system within an FPGA Virtex4 XC4VS35-FF12 that is mounted together with other electronic peripheral devices on a BITSIM Ultra High-Speed Acquisition Board (UHAB). Among these peripherals there is the BMC5461 Gigabit transceiver, a commercial device that implements the Ethernet PHY layer and provides the GMII interface, thus allowing any design implemented on the FPGA to be connected directly. Throughout this document, the most relevant features of this board and its components will be presented. For further and more detailed information regarding this device, the reader is referred to [14]. In the following sections, the design and/or implementation of all the above mentioned sub-layers towards the POF will be described. Before proceeding, the reader should be aware of the required and actual characteristics of the Media Converter as summarized in Table 3.1.
| Parameter                 | Value       | Units    | Notes             |
|---------------------------|-------------|----------|-------------------|
| Throughput                | 1           | [Gbit/s] | 1000Base-X        |
| Transmission Mode         | Full Duplex |          |                   |
| FEC                       | RS(255,237) |          | Reed Solomon      |
| Coding Scheme             | 64B/65B     |          | 10G Ethernet      |
| Modulation Format         | 2-PAM NRZ   |          |                   |
| Line Baud Rate            | 1.0991      | [Gbit/s] | includes overhead |
| Target Distance           | 50          | [m]      |                   |
| Maximum BER               | $10^{-3}$   |          | Before FEC        |
| Transmitted Optical Power | -1.5        | [dBm]    | Modulated         |
| POF bandwidth             | 40 (@100m)  | [MHz]    |                   |
| POF attenuation @650nm    | 160-180     | [dB/km]  |                   |
| Received Optical Power    | -9.5        | [dBm]    | After 50m of POF  |
| Power Margin              | 4.5         | [dB]     |                   |

Table 3.1: Required Features for the Media Converter

## 3.2 Implementation Methodology

The methodology followed for implementing the Media Converter is illustrated in Fig. 3.2. As the reader can see, the first step involved designing and simulating the system using Matlab/Simulink. More in detail, the Xilinx System Generator Toolbox was used to implement the required DSP algorithms. Once these systems were validated through simulation, a translation tool generated the VHDL code (step 2) together with all the files needed for the instantiation within the BITSIM firmware (step 3). Only then was the VHDL project compiled using the ISE software from Xilinx and the FPGA configured (step 4). The process was iterative: whenever bugs or new problems were identified during experimentation with the complete set up, the design was revised at simulation level. For further information regarding the implementation of DSP systems using the Xilinx System Generator Toolbox the reader is referred to [15].
Figure 3.2: Implementation Methodology Diagram

Having described the tools and methods applied for delivering the prototype, the next section begins its description by introducing the implementation of the PCS sublayer.

## 3.3 Physical Coding Sublayer PCS

According to the IEEE 802.3 Gigabit Ethernet Standard, the PCS sub-layer defines the way in which a datastream is arranged in code words before being transmitted over a physical link. This operation may involve word synchronization and signaling, error correction algorithms and DC-balancing considerations. Recalling the structure of the Ethernet Physical Layer (see Fig. 3.3), the PCS operates between the GMII interface and the lower sublayers towards the transmission medium, i.e. PMA and PMD. As mentioned, for this project the version 1000Base-X of the Ethernet Standard was used as reference. Basically, 1000Base-X defines the same PCS/PMA architectures for different physical layers depending on the type of fiber and transmission wavelength used. In other words, it relies mainly on the features of the transmission channel (fiber+optoelectronics) to achieve error free transmission. However, the standard 1000Base-X PHY is not enough for overcoming the impairments introduced by the POF, so that for the present case it was necessary to add more computation power and complexity at a higher level.

Figure 3.3: Ethernet Layers

Figure 3.4: PCS transmitter

Therefore, in order to address the bandwidth limitation, a 64B/65B coding scheme from the 10Gbit Ethernet version 10GBASE-R was used instead of the 8B/10B coding specified by Gigabit Ethernet version 1000Base-X. This decision allowed an important reduction of the bandwidth overhead. Moreover, its implementation together with FEC encoding based on a Reed Solomon (RS) code (255,237) resulted in a serial line rate of 1.0991Gbit/s instead of the 1.25Gbit/s (without FEC) defined by the more common 1000Base-X.
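The overhead reduction can be illustrated by comparing the expansion factors of the two coding options. The sketch below accounts only for the block code and the RS(255,237) code; it is an approximation, not the exact rate calculation of the prototype. These two factors alone give roughly 1.093 Gbit/s, slightly below the 1.0991 Gbit/s quoted above; the difference is presumably due to additional framing or synchronization overhead that is not modeled here (an assumption).

```python
from fractions import Fraction

PAYLOAD_GBPS = Fraction(1)  # Gigabit Ethernet payload rate


def line_rate(payload, *expansion_factors):
    """Multiply the payload rate by each coding expansion factor."""
    rate = payload
    for factor in expansion_factors:
        rate *= factor
    return rate


# 1000Base-X: 8B/10B coding, no FEC.
rate_8b10b = line_rate(PAYLOAD_GBPS, Fraction(10, 8))

# Block code 64B/65B followed by Reed Solomon (255,237):
# 65 coded bits per 64 data bits, 255 symbols per 237 data symbols.
rate_64b65b_rs = line_rate(PAYLOAD_GBPS, Fraction(65, 64), Fraction(255, 237))

print(f"8B/10B:           {float(rate_8b10b):.4f} Gbit/s")
print(f"64B/65B + RS FEC: {float(rate_64b65b_rs):.4f} Gbit/s")
```

Even with the FEC parity included, the line rate stays well below that of 8B/10B coding, which is the point made in the text.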
Figures 3.4 and 3.5 show respectively the transmitter and receiver sides of the implemented PCS. For further details on this matter, the reader is referred to [16]. Next, the PMA is described.

Figure 3.5: PCS receiver

## 3.4 Physical Medium Attachment PMA

This section describes the design and implementation of the PMA layer. In particular, the mathematical formulation of both the received signal and the equalizing and adaptive algorithms is introduced. The description of the system begins by showing in Fig. 3.6 the general block diagram of the PMA as part of the POF PHY Layer.

Figure 3.6: General Diagram of the PMA

As seen, it is composed, in transmission, of a Non-Return-to-Zero (NRZ) modulator that acts upon the serial bits arriving from the PCS while, in reception, it contains the DFE in charge of processing the incoming distorted symbols. As mentioned in section 2.4, the equalizer is formed by a Feed Forward (FF) and a Feedback (FB) filter together with the logic to adapt their coefficients. The system is adaptive and blind, which means that the channel is not known a priori, so that the DFE is adapted to match it and, in doing so, no training sequence is used. In the following, a description of their design is given. For a more detailed analysis the reader can consult Chapter 8 of [17], in which many of the equations and considerations presented throughout this document were studied. The first step of the design process consists of obtaining an expression to model the impairments inflicted by the POF channel on the received signal. Accordingly, the analysis starts by proposing the block diagram shown in Fig. 3.7, from which, considering that the overall frequency response of the channel comprises the individual transfer functions of each of its components, an expression for the POF channel can be derived as

$$p(t) = h_{TX}(t) * h_{POF}(t) * h_{RX}(t) = \mathcal{F}^{-1}[H_{TX}(f) \cdot H_{POF}(f) \cdot H_{RX}(f)] \tag{3.1}$$

where $H_{POF}(f)$ is modeled as a linear time invariant (LTI) low pass filter, while $H_{TX}(f)$ and $H_{RX}(f)$ correspond to the theoretical transfer functions of the 2-PAM TX + RC-LED and the Graviton SPD-2, respectively.

Figure 3.7: General Diagram of the POF Channel

Once the channel is modeled, the signal at the output of the optoelectronic receiver can be expressed as

$$Y_R(t) = \sum_{n = -\infty}^{\infty} x_n p(t - nT) + v(t) \tag{3.2}$$

where $x_n$ denotes the transmitted 2-PAM symbols, $T$ is the bit duration and $v(t)$ is the inherent additive colored Gaussian noise introduced during the optoelectronic conversion. Moreover, given that the equalizer is implemented in the digital domain, the signal expressed by Eq. (3.2) is sampled so that, supposing periodic sampling times $t = kT$, it becomes

$$Y_R(kT) = x_k p(0) + \sum_{n \neq k} x_n p(kT - nT) + v(kT) \tag{3.3}$$

or, equivalently

$$Y_R(k) = x_k p_0 + \sum_{n \neq k} x_n p_{k-n} + v_k \tag{3.4}$$

The first term on the right-hand side (RHS) of Eq. (3.4) corresponds to the transmitted symbol while the second term corresponds to the ISI. As mentioned before, ISI is a form of signal distortion in which one symbol is affected by the interference of subsequent and previous symbols. It is caused by the limited frequency response and the amplitude attenuation of the transmission channel. Moreover, its presence implies synchronizing difficulties and noise margin reduction, which together produce errors at the receiver. Hence the need to eliminate it from the received signal.
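To make the decomposition of Eq. (3.4) concrete, the following sketch computes the sampled received signal for a short symbol sequence and separates the desired term $x_k p_0$ from the ISI term. The pulse samples are hypothetical values chosen for illustration, not measurements of the POF channel, and the noise term $v_k$ is set to zero to keep the example deterministic.

```python
def received_samples(symbols, pulse, cursor):
    """Noise-free version of Eq. (3.4): Y_R(k) = sum_n x_n * p_{k-n}.

    pulse[j] holds p_{j - cursor}, so pulse[cursor] is the main tap p_0.
    """
    out = []
    for k in range(len(symbols)):
        acc = 0.0
        for j, p in enumerate(pulse):
            n = k - (j - cursor)  # index of the symbol weighted by this tap
            if 0 <= n < len(symbols):
                acc += symbols[n] * p
        out.append(acc)
    return out


# Hypothetical sampled pulse: p_{-1}, p_0, p_1, p_2 (illustrative values).
pulse = [0.2, 1.0, 0.5, 0.3]
cursor = 1
symbols = [1.0, -1.0, 1.0, 1.0, -1.0]  # 2-PAM symbols x_k

y = received_samples(symbols, pulse, cursor)
desired = [x * pulse[cursor] for x in symbols]  # first term of Eq. (3.4)
isi = [yk - dk for yk, dk in zip(y, desired)]   # second term of Eq. (3.4)

for k in range(len(symbols)):
    print(f"k={k}: x_k={symbols[k]:+.1f}  Y_R={y[k]:+.2f}  ISI={isi[k]:+.2f}")
```

Even with this mild pulse the ISI term reaches a sizeable fraction of the main tap; with the much longer impulse responses measured in Chapter 2 it dominates the received sample.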
In this respect, the equalizer architecture being proposed is specially tailored for this purpose.

### 3.4.1 Decision Feedback Equalization

Decision Feedback Equalization makes use of previous decisions in attempting to estimate the current symbol. Any intersymbol interference caused by previous symbols is reconstructed and then subtracted from the current estimated symbol [18]. According to its block diagram (see Fig. 3.6), its configuration includes an FF filter for shaping the channel output signal followed by an FB filter that subtracts the ISI. Due to the presence of a decision device within the feedback loop, the DFE is inherently a nonlinear receiver. However, it can be analyzed using linear techniques by assuming that all previous decisions are correct [19]. In practice, this may not be true, and decision errors can significantly affect the overall performance of the equalizer. Nonetheless, including errors in the decision feedback section would complicate its analysis. In general, the most efficient way to quantify the effect of feedback errors is via measurement. The analysis of the DFE begins with its FF filter.

#### Feed Forward Filter

FF filters are generally implemented in the form of linear equalizers by means of FIR filters with tap coefficients $c_n$, as illustrated in Fig. 3.8. As seen, the time delay $\tau$ between taps may be chosen equal to the symbol period $T$, in which case the filter is referred to as a symbol spaced equalizer and its input corresponds to the sequence given by Eq. (3.4). On the contrary, if the time delay is chosen such that $\tau < T$, then the equalizer has fractionally spaced taps and hence it is called a fractionally spaced equalizer (FSE) [20].
The impulse response of the FIR equalizer is

$$h(t) = \sum_{n=0}^{N-1} c_n \delta(t - n\tau) \tag{3.5}$$

where $c_n$ are the $N$ equalizer coefficients. $N$ must be large enough for the equalizer to span in time as much as the ISI, so that it can attempt to compensate this spurious effect by means of the convolution between its tap coefficients and the samples of the input signal.

Figure 3.8: Linear FIR Filter

This may be mathematically expressed by recalling Eq. (3.2) and defining the equalizer output as

$$Y_S(t) = \sum_{n=0}^{N-1} c_n Y_R(t - n\tau) \tag{3.6}$$

Considering that the FSE input is sampled at times $t = kT$

$$Y_S(kT) = \sum_{n=0}^{N-1} c_n Y_R(kT - n\tau) \tag{3.7}$$

Now that we have an expression for the FSE output, we may proceed to derive an expression for adapting the FSE tap coefficients and thus match the frequency response of the FSE to the transmission channel. In order to do this, the mean square error (MSE) between the desired symbol $x_k$ and the equalizer output $Y_S(kT)$ is computed

$$\begin{aligned} MSE &= E\left[\left(Y_S(kT) - x_k\right)^2\right] \\ &= E\left[\left(\sum_{n=0}^{N-1} c_n Y_R(kT - n\tau) - x_k\right)^2\right] \\ &= \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} c_n c_m R_Y(n-m) - 2\sum_{m=0}^{N-1} c_m R_{XY}(m) + E[x_k^2] \end{aligned} \tag{3.8}$$

where

$$\begin{aligned} R_Y(n-m) &= E[Y_R(kT - n\tau)\, Y_R(kT - m\tau)] \\ R_{XY}(m) &= E[Y_R(kT - m\tau)\, x_k] \end{aligned} \tag{3.9}$$

Then, by differentiating Eq. (3.8) with respect to the tap coefficients and setting the result to zero, we obtain an expression from which the optimum set of taps can be derived

$$\sum_{n=0}^{N-1} c_n R_Y(n-m) = R_{XY}(m), \quad m = 0,1,2,\dots,N-1 \tag{3.10}$$

Equivalently, this may be expressed in matrix form

$$\mathbf{R}_{\mathbf{Y}} \cdot \mathbf{C} = \mathbf{R}_{\mathbf{XY}} \Rightarrow \mathbf{C}_{\mathbf{opt}} = \mathbf{R}_{\mathbf{Y}}^{-1} \cdot \mathbf{R}_{\mathbf{XY}} \tag{3.11}$$

Normally, if the system does not need to be blindly adapted, $\mathbf{C}_{\mathbf{opt}}$ can be estimated by transmitting a training sequence and obtaining $\mathbf{R}_{\mathbf{Y}}$ and $\mathbf{R}_{\mathbf{XY}}$ by means of the following estimators

$$\begin{aligned} \mathbf{R}_{\mathbf{Y}} = E\{\mathbf{Y}_{\mathbf{R}}\mathbf{Y}_{\mathbf{R}}^*\} &\approx \widehat{R}_Y(n-m) = \frac{1}{K} \sum_{k=1}^K Y_R(kT - n\tau)\, Y_R(kT - m\tau) \\ \mathbf{R}_{\mathbf{XY}} = E\{\mathbf{Y}_{\mathbf{R}} x_k^*\} &\approx \widehat{R}_{XY}(m) = \frac{1}{K} \sum_{k=1}^K Y_R(kT - m\tau)\, x_k \end{aligned} \tag{3.12}$$

On the contrary, when the system has to be adapted blindly, as in the present case, a stochastic gradient algorithm known as Least Mean Square (LMS) can be applied, so that the tap coefficients are optimized according to the following expression

$$\widehat{\mathbf{C}}_{k+1} = \widehat{\mathbf{C}}_k + \mu \varepsilon_k \mathbf{Y}_{\mathbf{R}_k} \tag{3.13}$$

where $\widehat{\mathbf{C}}_k$ is a vector with the set of taps at the $k^{th}$ iteration, $\mathbf{Y}_{\mathbf{R}_k}$ corresponds to the received signal at the FSE input, $\mu$ is the step-size parameter (it scales the correction) and $\varepsilon_k$ denotes the error, which is defined as the difference between the output of the FSE at the $k^{th}$ iteration and the desired symbol. Later, after completing the description of both equalizing stages, the algorithms used to compute the error signal $\varepsilon_k$ will be presented. In the meantime, the mathematical model and architecture of the Feedback Filter are presented.
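The behaviour of the LMS recursion of Eq. (3.13) can be illustrated with a small symbol-spaced ($\tau = T$) example. The sketch below makes several simplifying assumptions that differ from the system described in this chapter: it uses a known training sequence rather than blind adaptation, a hypothetical two-tap noise-free channel, and it writes the update with a minus sign and the error defined as output minus desired symbol, so that each step descends the squared error. The step size and number of taps are arbitrary illustrative choices.

```python
import random


def lms_equalize(received, desired, num_taps, mu):
    """Adapt a symbol-spaced FIR equalizer with the LMS algorithm.

    Returns the final taps and the squared error at every iteration.
    """
    taps = [0.0] * num_taps
    delay_line = [0.0] * num_taps  # most recent input sample first
    squared_errors = []
    for y, x in zip(received, desired):
        delay_line = [y] + delay_line[:-1]
        output = sum(c * v for c, v in zip(taps, delay_line))  # Eq. (3.7), tau = T
        error = output - x                                     # output minus desired
        # Stochastic gradient step on the squared error.
        taps = [c - mu * error * v for c, v in zip(taps, delay_line)]
        squared_errors.append(error * error)
    return taps, squared_errors


rng = random.Random(0)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(5000)]

# Hypothetical channel: main tap 1.0 plus one post-cursor tap of 0.4.
received = [symbols[k] + (0.4 * symbols[k - 1] if k > 0 else 0.0)
            for k in range(len(symbols))]

taps, squared_errors = lms_equalize(received, symbols, num_taps=5, mu=0.01)

mse_start = sum(squared_errors[:200]) / 200
mse_end = sum(squared_errors[-200:]) / 200
print("MSE over first 200 symbols:", mse_start)
print("MSE over last 200 symbols: ", mse_end)
print("taps:", [round(c, 3) for c in taps])
```

The taps approach the truncated inverse of the channel, and the squared error decreases as the recursion proceeds. A larger $\mu$ speeds up convergence at the cost of a noisier steady state.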
#### Feedback Filter

The linear equalizer described in the previous section is effective on channels where ISI is not severe. The severity of the ISI depends strongly on the spectral characteristics of the channel. For instance, when a channel presents a strong attenuation or even a spectral null, the equalizer tries to compensate for it by introducing a large gain in its frequency response, which leads to noise enhancement and thus degrades the equalizer performance. A solution to this problem is to implement a DFE [19] by adding a feedback filter as shown in Fig. 3.9.

Figure 3.9: Block Diagram of DFE

The feedback filter may also be implemented as an FIR filter with symbol-spaced taps. As seen, it receives the previously decided symbols and generates a signal that is subtracted from the output of the FF section. This operation is mathematically expressed as

$$\widehat{x}_k = Y_S(kT) - \sum_{n=1}^{N_b} b_n x_{k-n} \tag{3.14}$$

Substituting Eq. (3.7) in Eq. (3.14), we have

$$\widehat{x}_k = \sum_{n=0}^{N_f - 1} c_n Y_R(kT - n\tau) - \sum_{n=1}^{N_b} b_n x_{k-n} \tag{3.15}$$

where $c_n$ and $b_n$ are the tap coefficients of the feed-forward and feedback filters, respectively, $N_f$ and $N_b$ are the filter lengths, and $x_{k-n}$ are the previously decided symbols. Equivalently, Eq. (3.15) may also be expressed in matrix form

$$\widehat{\mathbf{X}}_k = \mathbf{C}^T \mathbf{Y}_R - \mathbf{B}^T \mathbf{X}_{k-n} \tag{3.16}$$

In order to simplify Eq. (3.16), it is convenient to define an augmented vector for the tap coefficients, in which the sign of $\mathbf{B}$ accounts for the subtraction in Eq. (3.16)

$$\widetilde{\mathbf{C}} = \begin{bmatrix} \mathbf{C} \\ -\mathbf{B} \end{bmatrix} \tag{3.17}$$

and also for the DFE inputs

$$\widetilde{\mathbf{Y}}_k = \begin{bmatrix} \mathbf{Y}_R \\ \mathbf{X}_{k-n} \end{bmatrix} \tag{3.18}$$

so that Eq. (3.16) becomes

$$\widehat{\mathbf{X}}_k = \widetilde{\mathbf{C}}^T \widetilde{\mathbf{Y}}_k \tag{3.19}$$

where $T$ denotes the transpose operator.
Once the DFE output has been derived including the effects of both FF and FB stages, we can redefine Eq. (3.13) as

$$\widetilde{\mathbf{C}}_{k+1} = \widetilde{\mathbf{C}}_k + \mu \varepsilon_k \widetilde{\mathbf{Y}}_k \tag{3.20}$$

where $\mu$ corresponds again to the step-size parameter (it scales the correction) while $\varepsilon_k$ denotes the error, which is defined depending on the adapting method chosen to optimize the equalizer. As mentioned before, the equalizer has to be adapted blindly, which means that the system cannot be optimized by means of a training sequence. In this sense, traditional adaptive DFEs are adapted using the Decision Directed (DD) algorithm, which defines the error as

$$\varepsilon_k = \hat{x}_k - x_k \tag{3.21}$$

where $\hat{x}_k$ corresponds to the signal just before the slicer and $x_k$ to the decided symbol at its output. However, it is known that DD presents convergence problems such as gradient attraction to undesired local minima which, according to [21], occurs because of the finite length of the equalizer filter and a poor selection of the cost function, i.e. the function that defines the criterion to be minimized, whose surface may present more than one minimum. As a consequence, it was decided to adjust the system by estimating $\varepsilon_k$ by means of the Constant Modulus Algorithm (CMA) [19], which, as reported in [22], constitutes a more robust method than DD for converging to global optima.
More in detail, the CMA takes the constant modulus of the signal as the property to be restored by the equalizer. The incoming sequence is therefore considered as a modulated signal with constant amplitude, so that any amplitude deviation at the receiver constitutes a distortion introduced by the channel. For the present case this is expressed as

$$\varepsilon_k = Y_{Sk} \cdot (|Y_{Sk}|^2 - \gamma^2) \tag{3.22}$$

where $Y_{Sk}$ corresponds to the FF output signal while $\gamma$ is the CMA dispersion constant defined as

$$\gamma = \frac{E\{Y_{Rk}^4\}}{E\{Y_{Rk}^2\}} \tag{3.23}$$

where $Y_{Rk}$ corresponds to the input signal of the FF filter. Nonetheless, since the CMA is based on high order statistics, it suffers from a slow convergence rate and high residual noise [19]. Therefore, after achieving convergence using the CMA, it was decided to add a further adaptation stage in which the system is adapted by means of the more precise DD method [23]. More in detail, during the first part of the adapting procedure the FB filter is disabled and therefore only the FF filter is adapted. Meanwhile, the MSE is monitored so that, once it reaches a sufficiently low value indicating a state of CMA convergence, the equalizer switches to decision-directed (DD) mode and also enables the FB section.

So far, this chapter has theoretically formulated the equalizing and adapting algorithms as they are normally presented in the literature [18], [20], [19], and accordingly as they are usually modeled as part of off-line processing experimental set ups. However, these models do not necessarily satisfy the requirements of an FPGA based implementation, mainly because, depending on the required throughput and on the on-board peripheral devices, the system may need to be implemented by means of parallel and pipelined hardware architectures. In this sense, the next section presents the analysis and the resulting set of redefined equations that fit the features of our chosen development platform, the BITSIM FPGA board.
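The two error definitions and the switch between them can be sketched as follows. The error expressions follow Eqs. (3.21) to (3.23) as written above; the switching threshold, the function names and the example values are hypothetical and only illustrate the two-stage procedure, not the logic actually implemented on the FPGA.

```python
def dispersion_constant(samples):
    """Sample estimate of gamma = E{y^4} / E{y^2}, as in Eq. (3.23)."""
    fourth_moment = sum(s ** 4 for s in samples) / len(samples)
    second_moment = sum(s ** 2 for s in samples) / len(samples)
    return fourth_moment / second_moment


def cma_error(ff_out, gamma):
    """CMA error of Eq. (3.22): penalizes deviation from constant modulus."""
    return ff_out * (abs(ff_out) ** 2 - gamma ** 2)


def dd_error(estimate):
    """Decision-directed error of Eq. (3.21): pre-slicer value minus decision."""
    decision = 1.0 if estimate >= 0.0 else -1.0
    return estimate - decision


def select_error(ff_out, estimate, gamma, mse_estimate, threshold):
    """Use CMA until the monitored MSE falls below a threshold, then DD.

    The threshold value is a design parameter (hypothetical here).
    """
    if mse_estimate > threshold:
        return "CMA", cma_error(ff_out, gamma)
    return "DD", dd_error(estimate)


# For an ideal 2-PAM signal with levels +/-1 the dispersion constant is 1.
gamma = dispersion_constant([1.0, -1.0, -1.0, 1.0])

mode_early, err_early = select_error(1.2, 1.2, gamma, mse_estimate=0.5, threshold=0.05)
mode_late, err_late = select_error(0.8, 0.8, gamma, mse_estimate=0.01, threshold=0.05)
print(gamma, mode_early, err_early, mode_late, err_late)
```

Note that the CMA error vanishes whenever the filter output has the expected modulus, regardless of its sign, which is why it needs no knowledge of the transmitted symbols.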
# 3.4.2 DSP Implementation of an Adaptive and Blind Decision Feedback Equalizer

In our work, the DFE is based on DSP techniques and is entirely implemented in the digital domain as an embedded system on an FPGA. As mentioned before, an Ultra High-speed Acquisition Board (UHAB) from BITSIM was selected for prototyping the Media Converter. In order to draft the possible hardware architectures to implement the DFE, it is first necessary to know the available on-board clock domains. The most relevant on-board components for defining the FPGA clocks are the analog-to-digital (ADC) and digital-to-analog (DAC) converters, because both devices, together with the FPGA clock, determine the parallelization requirements and the clock domains, and also constrain the possible design schemes for implementing the DFE. In the following, it will be explained how the DFE presented in section 3.4.1 is implemented using a highly parallel strategy.

Figure 3.10: PMA Top Level Implementation Diagram

For the present case, Fig. 3.10 shows the top-level<sup>1</sup> implementation diagram of the receiver side of the PMA as part of the Media Converter. As seen, the ADC mounted on the BITSIM board provides the FPGA with 8 parallel samples each clock cycle. This means that the ratio between the FPGA clock frequency and the sampling frequency is one-eighth. Therefore, considering that the input signal is transmitted at 1.1Gbit/s, the ADC clock frequency was fixed at 2.2GHz (two samples per bit, the limit of the Nyquist rate) and thus the FPGA clock frequency at 275MHz.

<sup>1</sup>Henceforth, the description of the system will lead us to lower hardware levels, which will be referred to by an ordinal number, i.e. top level, first level, second level and so on. This is a common practice among hardware designers.

Moreover, Fig.
3.11 shows how the samples provided by the ADC are first categorized as odd or even samples and then arranged in two groups of four samples each, so that they can be processed in parallel inside the FPGA. This categorization is performed because it facilitates implementing the FF filter. More in detail, the fact that the data is provided in two groups, each at the same rate as the incoming signal, allows the implementation of the FF filter as a Fractionally Spaced Equalizer (FSE) formed by two parallel baud-spaced filters. On the other hand, it should be noticed that this system outputs only 4 parallel data streams, which corresponds to the minimum number of samples required by the on-board DAC in order to maintain the signal throughput. In other terms, 4 data lines at 275Mbps give, after being serialized, a single line at a data rate of 1.1Gbps. Further details in this regard are given later. Next, the implementation of the FF filter in the form of an FSE is presented.

Figure 3.11: Samples Arrangement at the ADC output

#### Fractionally Spaced Equalizer

The fractionally spaced equalizer shown in Fig. 3.12 follows the above-mentioned sample categorization. As seen, it is composed of two parallel filters that compute the convolution between the tap coefficients $(c^o, c^e)$ and the samples $(Y_R^e, Y_R^o)$ provided by the ADC. More in detail, this figure shows how the ADC samples are grouped in two sets formed by M odd (blue) and M even (red) samples, where M is the generic parallelizing factor, while in the same manner the coefficients of each parallel baud-spaced filter are grouped in two sets formed by L odd and L even tap coefficients. In particular, and in correspondence with the sample sets shown in Fig.
3.11, the generic parallelizing factor M is equal to 4, while in correspondence with the off-line processing analysis presented in section 2.4, the number L of tap coefficients for each parallel filter is equal to 8, i.e. 8 odd and 8 even taps that add up to the required 16 FF taps. In the following, the mathematical model of the DFE presented in section 3.4.1 is adapted, and as a result a set of equations describing the DFE hardware architecture in terms of the above-mentioned generic factors is given.

Figure 3.12: FSE First Level Implementation Diagram

Let us begin by deriving a first expression for the $j^{th}$ row of the FSE output signal $y_S$

$$y_S[Mk - j] = y_e[Mk - j] + y_o[Mk - j] \tag{3.24}$$

where $y_e$ and $y_o$ are the outputs of the even $F^e(z)$ and odd $F^o(z)$ filters, which are defined as

$$y_e[Mk - j] = \sum_{i=0}^{L_f - 1} c_i^e Y_R^e[Mk - i - j] \tag{3.25}$$

and

$$y_o[Mk - j] = \sum_{i=0}^{L_f - 1} c_i^o Y_R^o[Mk - i - j] \tag{3.26}$$

where $L_f$ denotes the length of the filters, c the tap coefficients and $Y_R$ the samples of the incoming signal provided by the ADC. Henceforward, the analysis considers only the expressions for the even filter $F^{e}(z)$; those for the odd filter $F^{o}(z)$ are identical in form and differ only in notation. Also, the delays $D_{e}$ and $D_{o}$ introduced by the pipelined architectures are neglected for the moment; an expression to estimate them is given later. Now, considering the parallelizing factor M together with a filter length $L_{f}$, we can introduce a ratio variable such that

$$L_f = \ell M \tag{3.27}$$

where $\ell$ denotes the number of M-tap stages into which each parallel filter is divided.
Moreover, defining $\ell$ allows us to redefine the $j^{th}$ output of the even filter $F^e(z)$ as

$$y_e[Mk - j] = \sum_{\alpha=0}^{\ell-1} y_{\alpha}[Mk - j] \tag{3.28}$$

where

$$y_{\alpha}[Mk - j] = \sum_{i=0}^{M-1} c_{\alpha M+i}^{e} Y_{R}^{e}[M(k - \alpha) - i - j] \tag{3.29}$$

Since j is bounded by M, it can only take values within the set $\{0,1,...,M-1\}$, so that Eq. (3.29) can be rearranged as

$$y_{\alpha}[Mk - j] = \sum_{i=j}^{M-1} c_{\alpha M + (i-j)}^{e} Y_{R}^{e}[M(k - \alpha) - i] + \sum_{i=0}^{j-1} c_{\alpha M + (M+i-j)}^{e} Y_{R}^{e}[M(k - \alpha - 1) - i] \tag{3.30}$$

Figure 3.13: FSE Fourth Level Implementation Diagram

This last expression suggests implementing the filter by means of a systolic array, a hardware architecture that resembles the structure of a matrix in which each cell is a computation device [24], in our case a multiplier. The resulting systolic array implementation is shown in Fig. 3.13, which also corresponds to the fourth level implementation diagram of the system. As seen, the systolic array is composed of a grid of pipelined parallel multipliers that receive the samples from the ADC and the filter tap coefficients, perform the convolution [25] between them and then add the parallel results to generate the equalized output. Moreover, based on this diagram it is now possible to provide an expression for the delay introduced at this level, so that

$$D_{\alpha} = M(M+2) \tag{3.31}$$

Now, in order to show how the FSE is implemented, we will proceed upwards, one level at a time. Let us consider Eq. (3.28), which shows that each filter $F_j^e(z)$ is formed by $\ell$ parallel stages ($\alpha = 0, ..., \ell-1$) whose expressions are defined as $y_{\alpha}$ in Eq. (3.30). The resulting third level hardware implementation diagram is illustrated in Fig. 3.14.
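The rearrangement from Eq. (3.25) to Eq. (3.28) and Eq. (3.30) only re-indexes the convolution, which can be checked numerically. The sketch below is a floating-point check under assumed random taps and samples; the function names are hypothetical and the pipelining delays are ignored, as in the derivation above.

```python
import numpy as np

def direct_output(c, y, n):
    """Eq. (3.25): plain FIR sum evaluated at absolute time index n."""
    return sum(c[i] * y[n - i] for i in range(len(c)))

def staged_output(c, y, M, k, j):
    """Eq. (3.28), with every stage y_alpha evaluated as in Eq. (3.30)."""
    ell = len(c) // M                      # Eq. (3.27): L_f = ell * M
    total = 0.0
    for alpha in range(ell):
        # Taps that multiply samples of the block ending at M(k - alpha).
        for i in range(j, M):
            total += c[alpha * M + (i - j)] * y[M * (k - alpha) - i]
        # Taps that multiply samples of the preceding block.
        for i in range(j):
            total += c[alpha * M + (M + i - j)] * y[M * (k - alpha - 1) - i]
    return total

rng = np.random.default_rng(0)
M, L_f = 4, 8                              # values used in the prototype
c = rng.standard_normal(L_f)               # arbitrary test taps
y = rng.standard_normal(64)                # arbitrary test samples

k = 10
max_diff = max(
    abs(direct_output(c, y, M * k - j) - staged_output(c, y, M, k, j))
    for j in range(M)
)
print("largest deviation over the M rows:", max_diff)
```

The deviation stays at the level of floating-point rounding for all M rows, which confirms that the staged form computes the same output while exposing the block structure exploited by the systolic array.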
Again, the delay introduced by the system at this level is computed as

$$D_e = D_o = D_2 + D_{\alpha} \tag{3.32}$$

where $D_2$ denotes the delay due to the adder at the output of $F_j^e(z)$, calculated as

$$D_2 = M(int[log_2(\ell)]) \tag{3.33}$$

Figure 3.14: FSE Third Level Implementation Diagram

Finally, by recalling Eq. (3.24) it is possible to form the second level diagram by integrating the M parallel $F_j^e(z)$ filters ($j = 0, ..., M-1$), which results in the block diagram shown in Fig. 3.15. Also, the total delay of the FSE can be expressed as

$$D_{FFF} = D_e + M \tag{3.34}$$

and by substituting the previous delay expressions, Eq. (3.31), Eq. (3.32) and Eq. (3.33), we get

$$D_{FFF} = M[(int[log_2(\ell)]) + M + 3] \tag{3.35}$$

As stated before, the implementation diagrams shown up to this point also hold for the odd filter $F^o(z)$. Next, a numerical example based on parameters from the actual implementation is presented.

EXAMPLE. As the reader may recall, the off-line processing investigations presented in section 3.4.1 led to the conclusion that 16 feed-forward taps (8 even and 8 odd) were enough to properly equalize the pre-cursor ISI due to transmission through the POF channel. This means that for each stage, odd and even, $L_f$ is equal to 8. Therefore, according to Eq. (3.27) and considering that the parallelizing factor M is equal to 4, we have that

$$\ell = L_f/M = 8/4 = 2 \tag{3.36}$$

Now, substituting these values into Eq. (3.32) and Eq. (3.35) gives a partial delay $D_e = D_o = M \cdot int[log_2(2)] + M(M+2) = 4 + 24 = 28$ symbol periods and a total FSE delay $D_{FFF} = 28 + 4 = 32$ symbol periods, which results in the first level implementation diagram shown in Fig. 3.16. At this point, by considering the expressions from Eq. (3.24) to Eq.
(3.30) we derive an equation for the FSE outputs

$$y_{S}[Mk - j] = \sum_{\alpha=0}^{\ell-1} \left[ \sum_{i=j}^{M-1} \left[ c_{\alpha M + (i-j)}^{e} Y_{R}^{e}[M(k - \alpha) - i] + c_{\alpha M + (i-j)}^{o} Y_{R}^{o}[M(k - \alpha) - i] \right] + \sum_{i=0}^{j-1} \left[ c_{\alpha M + (M+i-j)}^{e} Y_{R}^{e}[M(k - \alpha - 1) - i] + c_{\alpha M + (M+i-j)}^{o} Y_{R}^{o}[M(k - \alpha - 1) - i] \right] \right] \tag{3.37}$$

where j takes values within the set $\{0,1,...,M-1\}$. Now, substituting the numerical values for M and $\ell$, we get

$$y_{S}[4k - j] = \sum_{\alpha=0}^{1} \left[ \sum_{i=j}^{3} \left[ c_{4\alpha + (i-j)}^{e} Y_{R}^{e}[4(k - \alpha) - i] + c_{4\alpha + (i-j)}^{o} Y_{R}^{o}[4(k - \alpha) - i] \right] + \sum_{i=0}^{j-1} \left[ c_{4\alpha + (4+i-j)}^{e} Y_{R}^{e}[4(k - \alpha - 1) - i] + c_{4\alpha + (4+i-j)}^{o} Y_{R}^{o}[4(k - \alpha - 1) - i] \right] \right] \tag{3.38}$$

where j takes values within the set $\{0,1,...,3\}$.

Figure 3.15: FSE Second Level Implementation Diagram

Figure 3.16: FSE Implementation

After having presented expressions for structuring the FSE and for estimating its overall delay, the FB filter implementation is described next.

#### Feedback Equalizer

As mentioned in section 3.4.1, an off-line processing study showed that 2 tap coefficients were enough for mitigating the effects of post-cursor ISI. Accordingly, the DFE was completed by implementing a 2-tap FB filter next to the FSE presented above. It is important to notice that the presence of a feedback loop may impose, depending on the filter architecture, a severe limit on the maximum achievable throughput. In order to address potential issues, parallelism and pre-computation techniques in the form of look-ahead architectures [26] were used to pipeline the DFE. Moreover, given the criticality of delays present within the feedback loop, the implementation of the FB filter by means of FIR filters was discarded.
In this sense and to illustrate to the reader why parallelism and look-ahead techniques are required, Fig. 3.17 shows a conventional serial FB architecture. The critical path in this diagram is highlighted in red; as seen, it is composed of one multiplier, one slicer, two adders and one flip-flop. Their respective delays add up to the overall delay of the loop and define the maximum symbol rate of the system. The DFE operates properly only if the symbol period is longer than the total delay of this critical path. This bound, denoted as $T_{bound}$, can be expressed as [27]

$$T_{bound} = D_{Loop}/L \tag{3.39}$$

where L represents the number of logical delay operators in the loop, e.g. flip-flops, which in this case is equal to 1, while $D_{Loop}$ denotes the overall delay in the loop and is defined as

$$D_{Loop} = T_m + 2T_a + T_s \tag{3.40}$$

where $T_m$, $T_a$ and $T_s$ denote respectively the delays introduced by the multiplier, the adders and the slicer. It should be noticed that for this analysis the delay introduced by the flip-flop is neglected [27].

Figure 3.17: Feedback Conventional Serial Architecture

At this point, taking advantage of the fact that the system operates with 2-PAM signals and therefore the possible decided symbols a[k] take values within the set $\{0,1\}$, a first and decisive improvement can be made by replacing the multipliers with 2-to-1 multiplexers<sup>2</sup>, as shown in Fig. 3.18. As a consequence, Eq. (3.40) becomes

$$D_{Loop} = T_{mux} + 2T_a + T_s \tag{3.41}$$

where $T_{mux}$ is the delay introduced by the multiplexer. In the end, we have that

$$T_{bound} = D_{Loop}/L = [T_{mux} + 2T_a + T_s]/1 = T_{mux} + 2T_a + T_s \tag{3.42}$$

where $T_{bound}$ still denotes the delay of the critical serial path.
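As an illustration of Eq. (3.39) to Eq. (3.42), the bound can be evaluated for a set of component delays. The delay values below are purely hypothetical (they are not measurements from the prototype) and serve only to show how replacing the multiplier with a multiplexer shortens the critical loop.

```python
def iteration_bound(loop_delays, num_registers):
    """Eq. (3.39): total combinational delay of the loop divided by the
    number of logical delay operators (registers) it contains."""
    return sum(loop_delays) / num_registers

# Hypothetical component delays in nanoseconds, for illustration only.
T_m, T_mux, T_a, T_s = 2.0, 0.4, 0.8, 0.3

# Serial loop with a multiplier, Eq. (3.40).
bound_multiplier = iteration_bound([T_m, T_a, T_a, T_s], 1)

# Serial loop with a 2-to-1 multiplexer, Eq. (3.41) and Eq. (3.42).
bound_mux = iteration_bound([T_mux, T_a, T_a, T_s], 1)

print("bound with multiplier  [ns]:", bound_multiplier)
print("bound with multiplexer [ns]:", bound_mux)
print("max symbol rate with multiplexer [Gbaud]:", 1.0 / bound_mux)
```

With these assumed numbers the adders dominate the loop once the multiplier is removed, which is consistent with the motivation for the pre-computation techniques introduced next.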
Figure 3.18: Multiplier MUX based implementation

The next step is to parallelize the serial FB filter, but before proceeding let us define the DFE serial output as

$$a_k = D[y_k - b_1 a_{k-1} - b_2 a_{k-2}] \tag{3.43}$$

where $D[\cdot]$ represents the decision operation performed by the slicer, $b_1$ and $b_2$ are the tap coefficients and $y_k$ corresponds to the received data.

Fig. 3.19 shows the parallel architecture for the DFE. As seen, the system is now composed of 4 multiplexers, 8 adders, 4 slicers and 1 logical delay operator which, due to the parallelization, is equivalent to 4 logical delays at symbol rate. As a consequence, the overall delay is

$$T_{bound} = D_{Loop}/L = [4T_{mux} + 8T_a + 4T_s]/4 = T_{mux} + 2T_a + T_s \tag{3.44}$$

which is equal to the result obtained with Eq. (3.42) and demonstrates that parallelizing the system does not imply a penalty in terms of delay and throughput.

Figure 3.19: Feedback Conventional Serial Architecture

<sup>2</sup>Digital implementations of adders and multipliers are known for presenting delay issues.

However, this architecture presents an important issue, namely the potential appearance of glitches at the unregistered outputs of the multiplexers. This is easily solved by pipelining their outputs, but doing so delays them, so that instead of yielding $\{a[4k], a[4k-1], a[4k-2], a[4k-3]\}$ they would yield $\{a[4k-4], a[4k-5], a[4k-6], a[4k-7]\}$, which means that the system must be able to estimate and provide the required data 4 symbol periods in advance. This problem was solved by implementing the DFE as proposed in [28], which presents an approach to DFE pipelining based on look-ahead techniques and on parallel nested multiplexer loops. Next, a 4-look-ahead technique applied to a 4-parallel 2-tap DFE is described. The 2-tap DFE shown in Fig. 3.17 can be reformulated based on multiplexer loops containing all possible pre-computed paths, as illustrated by Fig.
3.20. This reformulation with a 4-to-1 multiplexer loop is expressed by [26]

Figure 3.20: Reformulated DFE

$$a_{4k} = A_{4k}a_{4k-2}a_{4k-1} + B_{4k}\overline{a}_{4k-2}a_{4k-1} + C_{4k}a_{4k-2}\overline{a}_{4k-1} + D_{4k}\overline{a}_{4k-2}\overline{a}_{4k-1} \tag{3.45}$$

where

$$\begin{aligned} A_{4k} &= D[y_{4k} + b_1 + b_2] \\ B_{4k} &= D[y_{4k} + b_1 - b_2] \\ C_{4k} &= D[y_{4k} - b_1 + b_2] \\ D_{4k} &= D[y_{4k} - b_1 - b_2] \end{aligned} \tag{3.46}$$

In order to obtain the first look-ahead step, we first define the truth table for the reformulated DFE as

| $a_{4k-1}$ | $a_{4k-2}$ | $a_{4k}$ |
|------------|------------|----------|
| 0 | 0 | $A_{4k}$ |
| 0 | 1 | $B_{4k}$ |
| 1 | 0 | $C_{4k}$ |
| 1 | 1 | $D_{4k}$ |

(3.47)

Then, moving one step ahead we have that

| $a_{4k-2}$ | $a_{4k-3}$ | $a_{4k-1}$ |
|------------|------------|------------|
| 0 | 0 | $A_{4k-1}$ |
| 0 | 1 | $B_{4k-1}$ |
| 1 | 0 | $C_{4k-1}$ |
| 1 | 1 | $D_{4k-1}$ |

(3.48)

so that merging Eq. (3.47) and Eq. (3.48) yields the following table, in which the first four columns hold the candidate values of $a_{4k-1}$

| $A_{4k-1}$ | $B_{4k-1}$ | $C_{4k-1}$ | $D_{4k-1}$ | $a_{4k-2}$ | $a_{4k-3}$ | $a_{4k}$ |
|------------|------------|------------|------------|------------|------------|----------|
| 0 | X | X | X | 0 | 0 | $A_{4k}$ |
| 1 | X | X | X | 0 | 0 | $C_{4k}$ |
| X | 0 | X | X | 0 | 1 | $A_{4k}$ |
| X | 1 | X | X | 0 | 1 | $C_{4k}$ |
| X | X | 0 | X | 1 | 0 | $B_{4k}$ |
| X | X | 1 | X | 1 | 0 | $D_{4k}$ |
| X | X | X | 0 | 1 | 1 | $B_{4k}$ |
| X | X | X | 1 | 1 | 1 | $D_{4k}$ |

(3.49)

Simplifying this last truth table as

| $a_{4k-2}$ | $a_{4k-3}$ | $a_{4k}$ |
|------------|------------|------------|
| 0 | 0 | $f_{4k,1}$ |
| 0 | 1 | $f_{4k,2}$ |
| 1 | 0 | $f_{4k,3}$ |
| 1 | 1 | $f_{4k,4}$ |

(3.50)

where

$$\begin{aligned} f_{4k,1} &= A_{4k}\overline{A}_{4k-1} + C_{4k}A_{4k-1} \\ f_{4k,2} &= A_{4k}\overline{B}_{4k-1} + C_{4k}B_{4k-1} \\ f_{4k,3} &= B_{4k}\overline{C}_{4k-1} + D_{4k}C_{4k-1} \\ f_{4k,4} &= B_{4k}\overline{D}_{4k-1} + D_{4k}D_{4k-1} \end{aligned} \tag{3.51}$$

allows deriving an expression for the first look-ahead step

$$a_{4k} = f_{4k,1}\overline{a}_{4k-2}\overline{a}_{4k-3} + f_{4k,2}\overline{a}_{4k-2}a_{4k-3} + f_{4k,3}a_{4k-2}\overline{a}_{4k-3} + f_{4k,4}a_{4k-2}a_{4k-3} \tag{3.52}$$

By iteratively repeating this procedure, truth tables for the third and fourth look-ahead steps can be respectively obtained as

| $a_{4k-3}$ | $a_{4k-4}$ | $a_{4k}$ |
|------------|------------|------------|
| 0 | 0 | $g_{4k,1}$ |
| 0 | 1 | $g_{4k,2}$ |
| 1 | 0 | $g_{4k,3}$ |
| 1 | 1 | $g_{4k,4}$ |

(3.53)

and

| $a_{4k-4}$ | $a_{4k-5}$ | $a_{4k}$ |
|------------|------------|------------|
| 0 | 0 | $h_{4k,1}$ |
| 0 | 1 | $h_{4k,2}$ |
| 1 | 0 | $h_{4k,3}$ |
| 1 | 1 | $h_{4k,4}$ |

(3.54)

where

$$\begin{aligned} g_{4k,1} &= (A_{4k}A_{4k-1} + C_{4k}\overline{A}_{4k-1})A_{4k-2} + (B_{4k}C_{4k-1} + D_{4k}\overline{C}_{4k-1})\overline{A}_{4k-2} \\ g_{4k,2} &= (A_{4k}B_{4k-1} + C_{4k}\overline{B}_{4k-1})C_{4k-2} + (B_{4k}D_{4k-1} + D_{4k}\overline{D}_{4k-1})\overline{C}_{4k-2} \\ g_{4k,3} &= (A_{4k}A_{4k-1} + C_{4k}\overline{A}_{4k-1})B_{4k-2} + (B_{4k}C_{4k-1} + D_{4k}\overline{C}_{4k-1})\overline{B}_{4k-2} \\ g_{4k,4} &= (A_{4k}B_{4k-1} + C_{4k}\overline{B}_{4k-1})D_{4k-2} + (B_{4k}D_{4k-1} + D_{4k}\overline{D}_{4k-1})\overline{D}_{4k-2} \end{aligned} \tag{3.55}$$

and

$$\begin{aligned} h_{4k,1} &= g_{4k,1}A_{4k-1} + g_{4k,3}\overline{A}_{4k-1} \\ h_{4k,2} &= g_{4k,2}B_{4k-1} + g_{4k,4}\overline{B}_{4k-1} \\ h_{4k,3} &= g_{4k,1}C_{4k-1} + g_{4k,3}\overline{C}_{4k-1} \\ h_{4k,4} &= g_{4k,2}D_{4k-1} + g_{4k,4}\overline{D}_{4k-1} \end{aligned} \tag{3.56}$$

all of which allow deriving the following expression for the fourth look-ahead step

$$a_{4k} = h_{4k,1}a_{4k-5}a_{4k-4} + h_{4k,2}\overline{a}_{4k-5}a_{4k-4} + h_{4k,3}a_{4k-5}\overline{a}_{4k-4} + h_{4k,4}\overline{a}_{4k-5}\overline{a}_{4k-4} \tag{3.57}$$

Fig. 3.21 illustrates the implementation diagram corresponding to the model presented above for the estimation of the output a[4k] based on a 4-look-ahead technique.
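The first look-ahead step can be checked against the serial DFE with a short simulation. The sketch below is a software model under stated assumptions, not the prototype logic: it follows the convention of the truth tables (3.47) to (3.50), in which both previous decisions equal to 0 select the candidate A; it assumes that a decided bit 0 maps to the level -1 and a bit 1 to the level +1 in the feedback path, which is what makes the candidates of Eq. (3.46) match the serial expression of Eq. (3.43); and it assumes all initial decisions equal to 0. All function names are hypothetical.

```python
import random

def slicer(x):
    """Decision D[.]: bit 1 for a non-negative input, bit 0 otherwise."""
    return 1 if x >= 0.0 else 0

def level(bit):
    """Assumed mapping of a decided bit to its 2-PAM level."""
    return 1.0 if bit else -1.0

def mux(sel, in0, in1):
    """2-to-1 multiplexer."""
    return in1 if sel else in0

def candidates(yk, b1, b2):
    """Pre-computed decisions (A, B, C, D) of Eq. (3.46)."""
    return (slicer(yk + b1 + b2), slicer(yk + b1 - b2),
            slicer(yk - b1 + b2), slicer(yk - b1 - b2))

def serial_dfe(y, b1, b2):
    """Reference model: Eq. (3.43) with feedback on the mapped levels."""
    a = [0, 0]                              # assumed a[-2], a[-1]
    for yk in y:
        a.append(slicer(yk - b1 * level(a[-1]) - b2 * level(a[-2])))
    return a[2:]

def lookahead_dfe(y, b1, b2):
    """Decide a[k] from a[k-2] and a[k-3] only, as in Eq. (3.52)."""
    a = [0, 0, 0]                           # assumed a[-3], a[-2], a[-1]
    prev = (0, 0, 0, 0)                     # reproduces the assumed a[-1] = 0
    for yk in y:
        A, B, C, D = candidates(yk, b1, b2)
        Ap, Bp, Cp, Dp = prev
        f = {(0, 0): mux(Ap, A, C),         # f_1 of Eq. (3.51)
             (0, 1): mux(Bp, A, C),         # f_2
             (1, 0): mux(Cp, B, D),         # f_3
             (1, 1): mux(Dp, B, D)}         # f_4
        a.append(f[(a[-2], a[-3])])         # a[k-1] is never read
        prev = (A, B, C, D)
    return a[3:]

random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]   # arbitrary test input
b1, b2 = 0.3, -0.1                                  # arbitrary test taps
same = serial_dfe(y, b1, b2) == lookahead_dfe(y, b1, b2)
print("look-ahead output equals serial output:", same)
```

Since the look-ahead version never reads the most recent decision, its selection logic can be registered for one extra symbol period without changing the output sequence; repeating the same substitution yields the further look-ahead steps.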
As seen, the system is implemented using 15 2-to-1 multiplexers; it should also be noticed that one step of look-ahead requires one column of multiplexers. Regarding the delay of this reformulated stage, the highlighted red path denotes the inner loop that defines $T_{bound}$, which in this case is reduced to $\frac{3}{5}T_{mux}$. In general, M-level look-ahead pipelining of an L-to-1 multiplexer loop implies M-1 look-ahead pipelined stages. Moreover, the total overhead logic is (M-1)L 2-to-1 multiplexers, which together with the L-to-1 multiplexer result in a total of ML-1 multiplexers. Also, its iteration bound is expressed as [26]

$$T_{bound} = \frac{\log_2 L + 1}{M + \log_2 L - 1} T_{mux} \tag{3.58}$$

Figure 3.21: 4-stage look ahead pipelining of a 4-to-1 multiplexer loop

The overall hardware complexity of P parallel stages of an L-to-1 multiplexer loop amounts to (ML-1)P. In the present case, this means that the 4 parallel stages, each implementing a 4-to-1 multiplexer loop, require 60 multiplexers. Fig. 3.22 shows the complete diagram of the 4-Parallel DFE. It is important to notice that parallelizing the structure does not increase $T_{bound}$, because it is always defined by the inner loop (red highlighted path in Fig. 3.21) within each single stage. Nonetheless, it should be considered that even after optimizing the system so that the throughput is not limited by the DFE architecture, the achieved bound can still be worse than the expected $\frac{3}{5}T_{mux}$, mainly due to timing constraint issues related to the FPGA implementation, for instance the routing time between registered devices or within combinational logic blocks.

#### Error Estimation and Adaptation Methods

Once the DFE architecture has been completely developed by integrating both the FF and FB filters, we can proceed to describe the error estimation and the adapting algorithms. In the previous section an LMS algorithm based on the CM and DD error estimation algorithms was presented as an alternative for adapting the system.
This section presents its implementation. Let us begin by describing the hardware implementation of the CM and DD error estimation algorithms for the DFE.

Figure 3.22: 4-Parallel DFE Architecture

Fig. 3.23 shows the Error Computation Block. As detailed in this diagram, this block computes the error using the FF and the FB output samples, which due to their parallel structure require a parallel implementation of the error estimation algorithms. Now, as the reader may recall, the error computation based on the CMA was defined as

$$\varepsilon_k = Y_{Sk} \cdot (|Y_{Sk}|^2 - \gamma^2) \tag{3.59}$$

where $Y_{Sk}$ corresponds to the FF output signal while $\gamma$ is the CMA dispersion constant defined as

$$\gamma = \frac{E\{Y_{Rk}^4\}}{E\{Y_{Rk}^2\}} \tag{3.60}$$

where $Y_{Rk}$ corresponds to the sampled input signal of the FF filter.

Figure 3.23: DFE Block Diagram

According to these expressions, the CMA uses the dispersion constant as a reference value for computing the error and therefore for adapting the system. In our case, given that we are transmitting a 2-PAM signal, it is possible to use its amplitude as the adapting criterion. In other words, we can assume that the amplitude of the received samples $Y_{Rk}$ takes values within the set $\{-1,1\}$, so that according to Eq. (3.60), $\gamma$ takes its value within this same set. As a consequence, the error estimation based on the CMA is redefined as

$$\varepsilon_k = Y_{Sk} \cdot (|Y_{Sk}|^2 - V_p^2) \tag{3.61}$$

where $V_p$ corresponds to the desired peak voltage and hence takes its value within the set $\{-1,1\}$. On the other hand, the error estimation for the DD algorithm was defined as

$$\varepsilon_k = \widehat{x}_k - a_k \tag{3.62}$$

where $a_k$ is the decided symbol, which in previous sections was denoted as $x_k$, while $\hat{x}_k$ corresponds to the already equalized signal just before the slicer inside the DFE.
As the reader may notice, the fact that $\hat{x}_k$ is required implies having access to the pre-equalized but not yet decided signals from the Decision Block shown in Fig. 3.22. However, considering the potential timing constraint issues related to signal routing, together with the fact that the FB tap coefficients and the previously decided symbols $a_k$ were already available within the Error Computation block, it was easier and more convenient to simply re-estimate $\hat{x}_k$. In this regard, Fig. 3.24 shows the $j^{th}$ row of the parallel error computation block that implements the algorithms defined above.

Figure 3.24: $j^{th}$ parallel stage of the Error Estimation Block

Regarding the previous diagram, there are a few considerations that are important for the reader to notice. First, the Direct Decision block maps (using a multiplexer) the decided symbols $a_k$ into $V_p$ to compute the error. This conversion is required because the decided symbols take digital values within the set $\{0,1\}$ and therefore cannot be directly used to adapt the equalizer. Second, it should be noticed that the Constant Modulus Block receives as input the signal provided by the Pre-Equalized Symbols Estimator, which would disagree with the definition given by Eq. (3.61) if it were not for the fact that the FB coefficients, $b_1$ and $b_2$, are reset to zero during the blind stage of the adaptation process. Thus, the signal at the output of this block is actually $y_{Sk}$, the received symbol from the FF filter output. The switching from blind to decision-directed operation is made depending on the value of the MSE. In fact, the prototype has a monitoring block that, once the MSE reaches a certain low threshold, switches automatically from blind to DD operation. This block can also restart the adapting process by returning to blind adaptation when it determines that the MSE has increased beyond a high threshold.
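The behaviour of one row of the error computation block and of the MSE monitor can be sketched in software. This is only an illustrative model: the function and class names are hypothetical, the re-estimation of the pre-slicer signal is assumed to follow Eq. (3.43) with the decided bits mapped to the levels -V_p and +V_p, and the two MSE thresholds are assumed values chosen for the example.

```python
def to_level(bit, v_p=1.0):
    """Map a decided bit in {0, 1} to the level -V_p or +V_p."""
    return v_p if bit else -v_p

def error_row(y_s, a_prev1, a_prev2, b1, b2, blind, v_p=1.0):
    """Error of one parallel row from the FF output and past decisions."""
    if blind:
        b1 = b2 = 0.0                      # FB taps are held at zero while blind
    # Re-estimated signal before the slicer (assumed form, see lead-in).
    x_hat = y_s - b1 * to_level(a_prev1, v_p) - b2 * to_level(a_prev2, v_p)
    if blind:
        return x_hat * (abs(x_hat) ** 2 - v_p ** 2)       # Eq. (3.61)
    decided = 1 if x_hat >= 0.0 else 0
    return x_hat - to_level(decided, v_p)                 # Eq. (3.62)

class ModeMonitor:
    """Switches between blind (CMA) and DD operation using two thresholds."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.blind = True                  # adaptation always starts blind

    def update(self, mse):
        if self.blind and mse <= self.low:
            self.blind = False             # CMA converged: enable DD and FB
        elif not self.blind and mse >= self.high:
            self.blind = True              # convergence lost: restart blind
        return self.blind

monitor = ModeMonitor(low=0.2, high=0.4)   # assumed thresholds
mse_trace = [0.9, 0.5, 0.2, 0.3, 0.35, 0.45, 0.3, 0.15]
modes = [monitor.update(m) for m in mse_trace]
print(["blind" if m else "DD" for m in modes])
```

Between the two thresholds the monitor keeps its previous mode, so an MSE that fluctuates around a single value does not cause repeated switching.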
As a result, the system behaves as a comparator with hysteresis. Another important implication of the model presented in this section lies in the pipelining delay introduced throughout the system. In particular, the Error Estimation Block delay $D_{EC}$ is equal to 6 delay units, which is equivalent to 24 symbol periods, while the delays of the FF and FB filters, denoted as $D_{FF}$ and $D_{FB}$, amount to 9 and 4 delay units, respectively. In the end, the overall pipelining delay, which is 76 delay units equivalent to 324 symbol periods, prevents the error signal from being computed at the symbol rate and therefore makes it impossible to update the coefficients every symbol period. As a consequence, an approximate LMS algorithm must be implemented.

#### Delayed Least Mean Square Algorithm

Adapting the DFE by means of a DLMS implies estimating the optimum coefficient set using a recursive algorithm based on several error computations as well as on input and output samples from the FF and FB filters. As a result, during the data accumulation and processing the equalizer optimization is "frozen".
The following expressions define the delayed estimation of the new equalizer tap coefficients based on the DLMS algorithm [29]

$$\begin{aligned} c_{i}^{e}[(4k+1)S_{L}] &= c_{i}^{e}[4kS_{L}] - \frac{\mu}{S_{L}} \sum_{j=0}^{S_{L}-1} \varepsilon[4kS_{L} + j - D_{T}]Y_{R}^{e}[4kS_{L} + j - i - D_{T}] \\ c_{i}^{o}[(4k+1)S_{L}] &= c_{i}^{o}[4kS_{L}] - \frac{\mu}{S_{L}} \sum_{j=0}^{S_{L}-1} \varepsilon[4kS_{L} + j - D_{T}]Y_{R}^{o}[4kS_{L} + j - i - D_{T}] \\ b_{i}[(4k+1)S_{L}] &= b_{i}[4kS_{L}] - \frac{\mu}{S_{L}} \sum_{j=0}^{S_{L}-1} \varepsilon[4kS_{L} + j - D_{T}]a[4kS_{L} + j - i - D_{T}] \end{aligned} \tag{3.63}$$

where $c_i^e$ and $c_i^o$ correspond to the even and odd FFF taps respectively, $b_i$ are the FBF taps, i indexes the coefficients of each DFE stage (8 even and 8 odd for the FFF and 2 for the FBF), $S_L$ is the number of 4k periods between coefficient updates, and $D_T$ is the total delay given by the sum of the above-mentioned delays $D_{FF}$, $D_{FB}$ and $D_{EC}$ plus $D_{UP}$, which is the delay of the update process itself. Finally, $\mu$ is the step size of the LMS algorithm. The second term in the expressions of Eq. (3.63) corresponds to a gradient estimate averaged over $S_L$ symbols. In order to implement the DLMS as a pipelined system, the averaged gradient estimate must be redefined so that the DFE parallelization factor is taken into account when defining the number of $S_L$ symbols. Accordingly, we begin by rewriting its expression considering that $S_L = 4s$

$$\begin{aligned} \nabla \overline{G}[4k \cdot 4s] &= \frac{\mu}{4s} \sum_{j=0}^{4s-1} \varepsilon [16ks + j] sym[16ks + j - i] \\ &= \frac{\mu}{4s} \sum_{\alpha=0}^{s-1} f[16ks + 4\alpha] \end{aligned} \tag{3.64}$$

where f is referred to as the inner summation and is expressed by

$$f[x] = \sum_{j=0}^{3} \varepsilon[x+j]sym[x+j-i] \tag{3.65}$$

where sym denotes either $Y_R^e(k)$, $Y_R^o(k)$ or a(k). The reader should notice that for implementation purposes the delay, referred to in this case as $D_T$, was again ignored.
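The step from the first to the second line of Eq. (3.64) groups the $4s$ products into $s$ blocks of 4, matching the four samples available in parallel at each FPGA clock cycle. The sketch below checks that both forms give the same averaged gradient; the error and symbol sequences are random test data, the parameter values are arbitrary, the function names are hypothetical, and the delay $D_T$ is ignored as in the text.

```python
import numpy as np

def inner_sum(eps, sym, x, i):
    """Eq. (3.65): f[x] = sum over j = 0..3 of eps[x + j] * sym[x + j - i]."""
    return sum(eps[x + j] * sym[x + j - i] for j in range(4))

def gradient_direct(eps, sym, k, s, i, mu):
    """First line of Eq. (3.64): one sum over 4s consecutive symbols."""
    base = 16 * k * s
    total = sum(eps[base + j] * sym[base + j - i] for j in range(4 * s))
    return mu / (4 * s) * total

def gradient_blocked(eps, sym, k, s, i, mu):
    """Second line of Eq. (3.64): s inner sums of 4 parallel products."""
    base = 16 * k * s
    total = sum(inner_sum(eps, sym, base + 4 * alpha, i) for alpha in range(s))
    return mu / (4 * s) * total

rng = np.random.default_rng(1)
s, k, i, mu = 8, 1, 3, 0.01                      # arbitrary test parameters
eps = rng.standard_normal(16 * (k + 1) * s)      # test error sequence
sym = rng.choice([-1.0, 1.0], size=eps.size)     # test 2-PAM symbols

g_direct = gradient_direct(eps, sym, k, s, i, mu)
g_blocked = gradient_blocked(eps, sym, k, s, i, mu)
print("direct :", g_direct)
print("blocked:", g_blocked)
```

Both values agree up to floating-point rounding, so each tap update can be accumulated from four products per clock cycle instead of one product per symbol period.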
Figure 3.25: Implementation diagram of the DLMS Tap Estimation (I)

The implementation diagram of the inner summation of Eq. (3.65) for the $i^{th}$ tap coefficient is shown in Fig. 3.25, where the exploitation of the DFE parallel structure is evidenced by the concurrent use of the available parallel samples. It should also be noticed that the resulting delay $D_1$ is equal to 12 symbol periods<sup>3</sup>. The DLMS algorithm implementation is completed by executing the outer summation defined in Eq. (3.64), rearranged for this purpose as follows

$$\begin{aligned} \nabla \overline{G}[16ks] &= \frac{\mu}{4s} \sum_{\alpha=0}^{s-1} f[16ks + 4\alpha] \\ &= \frac{\mu}{4s} \sum_{\alpha=-1}^{s-2} f[16ks + 4\alpha] + \frac{\mu}{4s} \left(f[16ks + 4(s-1)] - f[16ks - 4]\right) \end{aligned} \tag{3.66}$$

Therefore, we can recursively estimate $\nabla \overline{G}$ as

$$\nabla \overline{G}[16ks] = \nabla \overline{G}[16ks - 4] + \frac{\mu}{4s} \left(f[16ks + 4(s-1)] - f[16ks - 4]\right) \tag{3.67}$$

The block diagram that implements this last expression is shown in Fig. 3.26. This second and last block completes the description of the DLMS algorithm implementation. The reader should notice that the system has one DLMS block for each tap coefficient, for a total of 18 tap updating units that together form the DLMS Optimizing Unit.

Figure 3.26: Implementation diagram of the DLMS Tap Estimation (II)

The overall delay $D_{up}$ introduced by the DLMS optimizing unit is

$$D_{up} = D_1 + D_2 = 12 + 4 = 16 \tag{3.68}$$

where $D_1$ and $D_2$ denote the delays corresponding to the first and second implementation blocks presented above.

<sup>3</sup>As the reader may remember, each logic delay $z^{-1}$ is equivalent to 4 symbol periods.

## 3.5 Summary

Chapter 3 presented the 1Gbit/s Media Converter implementation. In particular, the discussion started by defining the system to be implemented in correspondence with the 802.3 Gigabit Ethernet Protocol. This was followed by the description of the methodology used to prototype the system on FPGA.
Then, the formulation of the equalizing schemes and the required adapting algorithms was addressed in the first part of section 3.4.1, while its second part was devoted to the analysis that led to the adaptation of the mentioned algorithms in accordance with the features of the available hardware. The results of this study are summarized in Fig. 3.27. As seen, this figure shows the general block diagram of the adaptive and blind DFE and details the equations, derived in the present chapter, that define the operation and hardware implementation of each of the blocks that constitute the equalizer.

Figure 3.27: Summarized Block Diagram of the Adaptive and Blind DFE

Chapter 4 presents the experimental results obtained from an extensive testing and debugging campaign held to validate the operation of the first 1Gbit/s Media Converter prototype.

# Chapter 4

# Media Converter: DFE Experimental Results

This chapter presents the most relevant results of an intensive measurement campaign aimed at evaluating the Media Converter performance. More in detail, the chapter starts by presenting the experimental set up used to test and characterize the equalizing schemes; then the main results concerning the optical power margin guaranteed by the Media Converter are presented. In particular, the benefits of implementing a full DFE are evidenced. This is followed by results regarding the convergence time of the adapting algorithm, the tolerance of the system to fiber bendings and the logic resources and FPGA area utilization.

# 4.1 Experimental Set Up

The experimental set up used for validating the Media Converter is shown in Fig. 4.1. As seen, it is constituted by two BITSIM boards configured with the PCS and PMA sublayers described in sections 3.3 and 3.4 respectively.
These boards present two connection interfaces. On one side, an Ethernet port is connected with a UTP CAT 5 cable to an Agilent N2X Router Tester capable of establishing 1 Gbit/s full duplex communication. On the other side, in transmission these boards have a DAC that is connected by means of an SMA connector to a limiting amplifier which, as detailed in the diagram, squares the generated output signal, and in reception they have an ADC connected, also by means of an SMA connector, to an anti-aliasing filter with a cut-off frequency of 1 [GHz]. Finally, the communication loop is closed by the POF channel described in Chapter 2, which in this case is composed of the Graviton SPD-2 described in section 2.3 as optoelectronic receiver, 50 meters of PMMA-SI POF described in section 2.1 and the 650 [nm] Firecomms RC-LED described in section 2.2. It is important to notice that for this experimental validation the Media Converter did not yet have a Clock Recovery System; therefore an external clock at 1.1 [GHz] was supplied to both BITSIM boards.

Figure 4.1: Experimental Set Up for evaluating the 1Gbit/s Media Converter

# 4.2 Optical Power Margin, BER and PER measurements

In order to determine the optical power margin provided by the Media Converter and also to understand the implications of the DFE architecture, the system was tested by configuring the Agilent N2X Router Tester to establish a 1Gbit/s full duplex link based on Ethernet packets of random length with PRBS sequences as data loads. During the experiment, these Ethernet packets were sent to the Media Converter, where they were first encoded by the PCS sublayer and then transmitted serially through the POF channel. On the other end, the received signal was equalized and the packets were reconstituted, while at the same time the BER was measured inside the FPGA by an embedded BER Tester. Finally, the reconstituted Ethernet packets were sent back to the Router Tester, which measured the packet error rate (PER).
Additionally, in order to study the advantages of implementing a complete DFE instead of only a FF Equalizer (FFE), the system was designed so that it could operate in full mode, including both filters, or in partial mode, using only the FF filter. The plot in Fig. 4.2 compares the curves of BER versus received optical power for the Media Converter operating in full mode, in partial mode and in back to back, i.e. with a few centimeters of PMMA-SI POF. The received optical power after 50 [m] of PMMA-SI POF was measured as -9.5 dBm, which according to these BER curves means that full mode operation (complete DFE) guarantees a power margin before FEC of 7 dB, thus improving by around 0.5 dB the result obtained with partial mode operation.

Figure 4.2: BER curves for FF and DFE after 50m of PMMA-SI POF

Following this first experiment, the 50 [m] PMMA-SI POF spool was substituted by a longer one of 75 [m] and the measurement was repeated. Fig. 4.3 shows again a comparison of the resulting curves of BER versus received optical power for full mode, partial mode and back to back operation. The received optical power without any extra attenuation was measured as -13.5 dBm. Although the power margin was reduced to 2.5 dB for full mode operation and 2 dB for partial mode, the improvement of 0.5 dB was maintained.

Figure 4.3: BER curves for FF and DFE after 75m of PMMA-SI POF

The most important results from this first measurement are summarized in Table 4.1.
| POF length [m] | Received Optical Power [dBm] | Power Margin, FF [dB] | Power Margin, DFE [dB] |
|----------------|------------------------------|-----------------------|------------------------|
| 50             | -9.5                         | 6.5                   | 7                      |
| 75             | -13.5                        | 2                     | 2.5                    |

Table 4.1: Performance Results for 50m and 75m of PMMA-SI POF

### 4.3 PCS Validation

Figure 4.4: (a) BER and (b) PER vs Received Optical Power

After validating the DFE, we proceeded to evaluate the PCS performance by measuring both the BER and the Packet Error Rate (PER) versus the received optical power for transmission over 50 [m] of POF at full rate (1Gbit/s). These measurements were aimed at confirming that the system was capable of correcting errors for signals with BERs under $10^{-4}$. The resulting curve for the BER measurement is shown in Fig. 4.4 (a), while the PER curve is shown in Fig. 4.4 (b). The received optical power without extra attenuation was again measured as -9.5 dBm, so that according to the obtained curves of BER and PER, the optical power margin guaranteed by the Media Converter was measured as 6.5 dB and 6.75 dB respectively. The difference of 0.25 dB can be explained by the fact that the Ethernet packets have as part of their frames a Cyclic Redundancy Code (CRC) field that allows correcting a certain number of bit errors per byte in a packet, which then contributes to reducing the PER. Additionally, the Router Tester made it possible to measure the overall system latency as $30\,\mu$s.

#### 4.4 Convergence time of the DLMS Algorithm

As mentioned in section 3.4.2, the convergence of the DLMS algorithm based on the CM and DD algorithms is evaluated by means of the MSE: when its value is lower than a certain predefined threshold, the equalizer is considered to have been successfully adapted. For the present system, it was defined by means of off-line processing simulations that the DLMS reaches blind convergence when the MSE is equal to 0.2 [a.u.] <sup>1</sup>.
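The threshold test described above can be sketched in a few lines of Python. This is only a minimal illustration, not the FPGA logic of section 3.4.2: the function name `adaptation_modes`, the labels `"blind"` and `"dd"`, and the choice to latch the decision directed mode once the threshold is first reached are all assumptions made for the example.

```python
def adaptation_modes(mse_values, threshold=0.2):
    """Label each MSE measurement with the adaptation mode in force.

    Assumption for this sketch: the equalizer starts in blind (CM) mode and
    latches into decision directed (DD) mode the first time the measured MSE
    is at or below the threshold; it never falls back to blind mode.
    """
    mode = "blind"
    modes = []
    for mse in mse_values:
        if mode == "blind" and mse <= threshold:
            mode = "dd"  # blind convergence reached, hand over to DD adaptation
        modes.append(mode)
    return modes


if __name__ == "__main__":
    # Hypothetical MSE trace in arbitrary units, for illustration only.
    print(adaptation_modes([1.0, 0.5, 0.2, 0.3, 0.05]))
```

The latching behaviour is a design choice of the sketch: it keeps a momentary increase of the MSE after the hand-over from sending the equalizer back to the blind stage.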
Consequently, the DLMS block described also in section 3.4.2 was configured to switch from blind to decision directed adaptation when the MSE is equal to 0.2 [a.u.].

Figure 4.5: Experimental measurement of the MSE

<sup>1</sup>a.u. stands for arbitrary units

In order to evaluate the convergence of the DLMS algorithm, it was necessary to implement a block inside the FPGA in charge of measuring the MSE and storing the resulting values in an internal RAM memory. Then, after completing the test, the collected data were downloaded and analyzed. Fig. 4.5 shows the resulting curve of the MSE value versus the number of stored symbols. As seen, the two DFE adaptation stages can be recognized: the first one is the blind stage, executed while the MSE is greater than 0.2 [a.u.], and the second one is the decision directed stage, which improves upon the MSE obtained previously, achieving and maintaining a value of 0.05 [a.u.]. Regarding the convergence time of the DLMS algorithm, it should be considered that due to memory limitations it was impossible to store the MSE every clock cycle; therefore the signal was down-sampled and stored at a frequency of 100 [MHz]. Accordingly, the convergence time of the system can be estimated as

$$conv_{time} = T_S \cdot N_{symbols} = \frac{1}{100\,\text{MHz}} \cdot 1950 = 19.5\,\mu\text{s}$$ (4.1)

where $T_S$ denotes the storing period and $N_{symbols}$ denotes the number of stored symbols required for convergence.

#### 4.5 Media Converter Tolerance to Fiber Bendings

Figure 4.6: Experimental Set Up for evaluating the system tolerance to Fiber Bendings

As stated in Chapter 1, the POF system is intended for installation in rough environments. With the purpose of evaluating its strength and tolerance to physical stress, several 90° fiber bends with a radius of 14 [mm] were placed at different points of the POF spool. The set up for this measurement is illustrated in Fig. 4.6.
Figure 4.7: BER versus Number of Fiber Bendings

During this experiment it was noticed that the POF channel behaves differently depending on the side from which the fiber starts to be bent, i.e. transmitter or receiver side. As shown by Fig. 4.7, the number of bendings has a slightly stronger impact on the system when they are located towards the transmitter. In the end, it was concluded that the system tolerates up to 20 bendings.

#### 4.6 FPGA Logic and Area Utilization

| Logic      | FPGA Resources | FFE (used) | FFE (%) | DFE (used) | DFE (%) |
|------------|----------------|------------|---------|------------|---------|
| Slice      | 15360          | 12819      | 83%     | 13341      | 87%     |
| Xtreme DSP | 192            | 137        | 71%     | 137        | 71%     |
| DCM        | 8              | 4          | 50%     | 4          | 50%     |

Table 4.2: FPGA Resources and Area Utilization

Regarding the hardware implementation, Table 4.2 lists the resource utilization for both partial (FFE) and full (DFE) mode operation. In particular, the total area used is represented by the logic referred to as "Slices", while Xtreme DSP corresponds to special slices within the FPGA that contain an 18 bit x 18 bit multiplier, an adder and an accumulator. More in detail, these DSP slices are used for executing high speed pipelined operations such as the convolution required for implementing the equalizer. DCM stands for Digital Clock Manager; it is a logic core from Xilinx that is used to implement delay locked loops that regenerate inside the FPGA the clock provided externally. These data show that implementing the complete DFE requires only about 4 percentage points more area (87% versus 83% of the available slices), which is very convenient considering that it improves the optical power margin by 0.5 dB.

#### 4.7 Summary

This Chapter presented the validation results obtained with the first prototype of the 1Gbit/s Ethernet Media Converter. In particular, it was demonstrated that the system guarantees 6.5 dB of optical power margin for a POF Channel integrated with a PMMA-SI POF spool of 50 [m] and 2.5 dB of optical power margin for 75 [m].
Moreover, it was shown how implementing a complete DFE, which implies only 4% more FPGA area utilization, improves the optical power margin by 0.5 dB. Additionally, the proper operation of the PCS sublayer and the DLMS algorithm was demonstrated, the latter showing a total convergence time of 19.5 [$\mu s$] and an MSE value of 0.05 [a.u.]. Finally, it was demonstrated that the system tolerates up to twenty 90° fiber bends with a radius of 14 [mm].

As evidenced in section 4.1, the experimental set up used for validating the Media Converter relied on an external clock that was supplied directly to both BITSIM boards. However, in order to have a full Media Converter, the system had to be able to synchronize its clock at the receiver with the clock at which the received data were transmitted. Consequently, the last part of POF-PLUS was dedicated to implementing a timing recovery system. The following Chapter presents its design and development.

### Chapter 5

## Clock Recovery System

This chapter presents the design and implementation architecture of the timing recovery system for the 1Gbit/s Media Converter. In particular, it begins by introducing the concept of clock synchronizers and then proceeds to present their two main categories, i.e. FF and FB synchronizers. This is followed by a description of the architecture chosen for implementing the Clock Recovery System, in which the timing error detector, the loop filter and a DAC based on a $\Delta - \Sigma$ modulator are presented.

#### 5.1 Clock Synchronizers

As the reader may recall, in Chapter 3 the 2-PAM signal transmitted through the POF channel was defined at the receiver side as

$$Y_R(t) = \sum_{n = -\infty}^{\infty} x_n\, p(t - nT - \varepsilon T) + v(t)$$ (5.1)

where $x_n$ denotes the transmitted 2-PAM symbols, $T$ is the symbol period, $\varepsilon T$ is an unknown time delay introduced by the channel $(-1/2 < \varepsilon < 1/2)$, and $v(t)$ is the inherent additive colored Gaussian noise introduced during the optoelectronic conversion.
The digital sequence, transmitted by means of the 2-PAM signal, is recovered at the receiver by the DFE described in Chapter 3. In order to maximize noise immunity, $Y_R(t)$ must be sampled at the instants of maximum eye opening, referred to as optimum sampling instants; identifying them implies adjusting the phase of the sampling clock according to $\varepsilon T$. For this purpose the receiver must contain a clock synchronizer, which is a device that produces an estimate $\hat{\varepsilon}$ of the mentioned delay. However, because the transmitted 2-PAM signal is composed of random symbols, it does not have periodic components, and therefore an ordinary PLL cannot be used to generate a clock that is synchronous with the received sequence [30]. Bearing this in mind, we will now proceed to present two of the main types of clock synchronizers, which are categorized according to their architecture as feedforward and feedback synchronizers.

#### 5.1.1 Feedforward Synchronizer

Figure 5.1: Feedforward Synchronizer Architecture

Fig. 5.1 shows the general diagram of a feedforward synchronizer. As seen, the received signal, denoted as Rx Data, is first processed by the timing detector, which generates an instantaneous timing estimate $\varepsilon$. Then a low pass filter, denoted in the figure as averaging filter, receives $\varepsilon$ and yields an averaged timing estimate $\hat{\varepsilon}$ that is finally used to drive a reference signal generator, such as a VCO, that provides the sampling clock to the ADC [30].

#### 5.2 Feedback Synchronizer

Also referred to as error-tracking synchronizer, its general diagram is shown in Fig. 5.2.
As detailed, the received signal Rx Data and the resulting sampling clock are compared by means of the timing error detector (TED), which yields an error signal that, after being averaged by the loop filter, drives a signal generator that provides the sampling clock that goes to the ADC and also closes the feedback loop. Hence, it is evident that error-tracking synchronizers make use of the PLL concept to derive a sampling clock from the received signal [30].

Figure 5.2: Feedback Synchronizer Architecture

In addition to these two categories, classifications based on other criteria can be made. For instance, if the synchronizer uses the decided DFE output to produce a timing estimate, then it is defined as decision directed; otherwise, it is non-data aided. Moreover, it can be categorized depending on its operation domain, i.e. analog or digital, as being a continuous or discrete time system.

For the present case, it was decided to implement a non-data aided Error Tracking Synchronizer by means of a hybrid analog-digital architecture. This decision was taken after evaluating the advantages and disadvantages that an all-digital hardware implementation, such as the one shown in Fig. 5.3, may present. This solution, based on a Numerically Controlled Oscillator (NCO), a decimator and an interpolator, was discarded mainly because of the way it handles clock mismatch: if the receiver sampling clock is faster than the incoming data rate, then at some point the ADC provides an extra sample that must be discarded; on the other hand, if the receiver sampling clock is slower than the incoming data rate, the system avoids losing a sample by disabling the decimation after the interpolator [31]. However, given that for the present case it is required to recover synchronism based on one sample per symbol, such a technique is not implementable, because losing a sample would give rise to a sampling condition in which the Nyquist Criterion is not respected.
Figure 5.3: Digital Feedback Synchronizer Architecture

## 5.3 Hybrid Implementation of an Error Tracking Synchronizer

The general diagram of the hybrid architecture is shown in Fig. 5.4. As seen, the system is implemented in both the digital and the analog domains and its architecture corresponds to the previously described feedback scheme. More in detail, the resulting clock recovery system had to comply with a couple of very specific requirements that constrained its design. Specifically, it had to be able to recover synchronism by exploiting the only two samples per symbol provided by the on-board ADC, and it also had to close the loop and generate the clock using an analog Voltage Controlled Oscillator (VCO) that generates a signal in the vicinity of 1.1 [GHz]. As the reader may have noticed from this diagram, the clock provided to the ADC has a frequency of 1.1 [GHz], while the sampling rate required is twice this value. This is so because the ADC operates in Double Data Rate (DDR) mode, which means that it samples the incoming signal on both the rising and the falling edges of the clock.

Figure 5.4: Hybrid Clock Recovery Architecture

The first of these two requirements explains the presence of the Mueller and Muller (M&M) TED [32]. Categorized as a discrete-time error tracking synchronizer, it operates as the Phase Detector in an analogue Phase-Locked Loop (PLL), so that it estimates the delay difference between the received PAM signal and the sampling clock. On the other hand, the presence of the VCO and the inherent requirement of driving it with an analog control signal, together with the fact that the BITSIM FPGA had no more DACs available, explains the presence of the $\Delta - \Sigma$ modulator and the Resistor-Capacitor (RC) filter that follows it. These two devices coupled together operate as a DAC that uses as output a single CMOS digital FPGA pin.
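To make the role of the modulator and the RC filter concrete, the following minimal Python sketch models a generic first-order $\Delta - \Sigma$ modulator built from an overflowing accumulator, followed by a single-pole low-pass filter standing in for the external RC network. It is not the design implemented in the FPGA: the names `delta_sigma_bits` and `rc_filter`, the 8-bit word width and the smoothing factor `alpha` are assumptions chosen only for illustration.

```python
def delta_sigma_bits(code: int, width: int, n_cycles: int):
    """Generate a 1-bit stream whose average value is code / 2**width.

    Generic first-order modulator: the input code is accumulated every clock
    cycle and the carry out of the accumulator is the output bit.
    """
    modulus = 1 << width
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += code
        if acc >= modulus:      # accumulator overflow -> emit a '1'
            acc -= modulus
            bits.append(1)
        else:
            bits.append(0)
    return bits


def rc_filter(bits, alpha: float):
    """Single-pole low-pass filter, a discrete stand-in for the RC network."""
    v = 0.0
    out = []
    for b in bits:
        v += alpha * (b - v)    # moves a fraction alpha towards the pin level
        out.append(v)
    return out


if __name__ == "__main__":
    stream = delta_sigma_bits(code=64, width=8, n_cycles=1024)
    # Filtered level settles near 64 / 256 of full scale, with a small ripple.
    print(rc_filter(stream, alpha=0.01)[-1])
```

The sketch shows the basic trade-off of this kind of DAC: a smaller `alpha` (a larger RC time constant) reduces the ripple on the control voltage but slows down its response.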
In the following, the design and implementation of each block composing this hybrid architecture is presented.

#### 5.3.1 Mueller and Muller TED

The M&M TED is implemented according to the timing recovery methods proposed in [32]. Typically defined as a decision-directed synchronizer, its conventional implementation diagram [30] as part of a clock recovery system is shown in Fig. 5.5. As seen, this device derives the delay $\hat{\varepsilon}T$ by estimating the error $e_k$ between the equalized PAM signal $Y_S(t)$ and the decided symbols $a_k$, so that, assuming sample times $t = kT$, the error for the $k^{th}$ symbol is expressed as [30]

$$e_k = a_{k-1}Y_S(kT + \hat{\varepsilon}T) - a_kY_S((k-1)T + \hat{\varepsilon}T)$$ (5.2)

Figure 5.5: M&M Typical Architecture Implementation

As the reader may have noticed, the diagram shown in Fig. 5.5 differs from the architecture proposed in Fig. 5.4 in the way in which $e_k$ is derived: instead of estimating the delay based on a previously equalized signal and its corresponding decided symbols, the proposed architecture bases the estimation on the received odd or even samples $Y_R^{e,o}$ and their corresponding signs, which redefines the M&M expression as

$$e_k = sign_{k-1}\,Y_R^{e,o}(kT + \hat{\varepsilon}T) - sign_k\,Y_R^{e,o}((k-1)T + \hat{\varepsilon}T)$$ (5.3)

where $sign_k$ denotes the sign of the $k^{th}$ sample of $Y_R^{e,o}$. For further details regarding the M&M formulation the reader is referred to [32].

Figure 5.6: M&M Serial Block

Concerning the hardware implementation of the M&M TED, Fig. 5.6 shows its general diagram as proposed by [32] and in correspondence with Eq. (5.3). As detailed, the system requires two sign slicers, one adder and two multipliers, which are actually implemented using one multiplexer and one inverter.
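As a minimal sketch of Eq. (5.3), the Python function below computes the sign-based M&M error for two consecutive received samples. The names `sign` and `mm_ted_error` are hypothetical, and treating a zero-valued sample as positive is an assumption of the example. The code also illustrates why no true multiplier is needed: multiplying by a sign amounts to selecting either the sample or its negation.

```python
def sign(x: float) -> int:
    """Sign slicer: +1 for non-negative samples, -1 otherwise (assumed convention)."""
    return 1 if x >= 0 else -1


def mm_ted_error(y_prev: float, y_curr: float) -> float:
    """Sign-based M&M timing error for one pair of consecutive samples.

    Implements e_k = sign_{k-1} * y_k - sign_k * y_{k-1}, where each product
    is realized as a selection between a sample and its negated version.
    """
    term_a = y_curr if sign(y_prev) > 0 else -y_curr   # sign_{k-1} * y_k
    term_b = y_prev if sign(y_curr) > 0 else -y_prev   # sign_k * y_{k-1}
    return term_a - term_b


if __name__ == "__main__":
    # Equal-amplitude samples give zero error; an amplitude imbalance between
    # consecutive samples gives a non-zero error whose sign indicates its direction.
    print(mm_ted_error(1.0, 1.0), mm_ted_error(0.8, 1.0), mm_ted_error(1.0, 0.8))
```

In a real loop the individual errors are noisy, which is why they are averaged (here by the decimator and the loop filter) before being used to steer the sampling clock.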
Moreover, it should be noticed that this diagram corresponds to one M&M stage, and that the parallel structure that has been analyzed throughout this document is once again exploited by integrating 4 M&M stages, as illustrated in Fig. 5.7.

Figure 5.7: M&M Parallel Implementation

As seen, the error estimates yielded by the parallel M&M TED are decimated. Decimation consists of synthesizing a single down-sampled signal by first averaging a set of samples, a process that makes it more accurate and reliable than conventional down-sampling, in which the data rate is usually decreased by simply discarding a certain number of samples. Fig. 5.8 shows the parallel implementation of the decimator block as a cascaded integrator-comb (CIC) filter, as proposed in [33]. As a consequence, implementing the Loop Filter becomes considerably easier, because it is no longer necessary to use a parallel architecture and it is also easier to avoid FIR filters. Given the difference between the FPGA clock frequency (275 MHz) and the bandwidth of the loop filter<sup>1</sup>, FIR filters would have to be very long, and would therefore add a higher delay to the inherent delay introduced by the pipelined architectures required to implement the system.

<sup>1</sup>in the order of kHz for high precision PLLs, as in this case

Figure 5.8: Decimator Parallel Implementation

In the end, all efforts were aimed at guaranteeing the stability of the PLL by optimizing the hardware architectures and therefore reducing the overall delay as much as possible. For further details regarding the operation and implementation of the decimator block the reader is referred to [34], [33]. Next, the design and implementation of the Loop Filter is presented.
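The difference between decimation and plain down-sampling described above can be illustrated with a minimal Python sketch of a first-order, serial CIC decimator. It is not the parallel architecture of Fig. 5.8: the function name `cic_decimate`, the normalization by the decimation factor and the factor of 4 used in the example (chosen to mirror the four parallel M&M stages) are assumptions made for illustration.

```python
def cic_decimate(samples, rate: int):
    """First-order CIC decimator: integrate, keep every rate-th value, comb.

    The integrator runs at the input rate, the comb (a first difference)
    runs at the output rate, and each output is normalized by `rate` so
    that it equals the average of the last `rate` input samples.
    """
    acc = 0.0          # integrator state
    prev = 0.0         # previous down-sampled integrator value (comb delay)
    out = []
    for i, x in enumerate(samples, start=1):
        acc += x                              # integrator
        if i % rate == 0:                     # down-sampling
            out.append((acc - prev) / rate)   # comb and normalization
            prev = acc
    return out


if __name__ == "__main__":
    errors = [1, 2, 3, 4, 5, 6, 7, 8]
    print(cic_decimate(errors, 4))   # block averages of the input
    print(errors[3::4])              # plain down-sampling keeps one sample in four
```

Every input sample contributes to the decimated output, whereas plain down-sampling discards three samples out of four, which is the reason the text above calls decimation more accurate and reliable.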
#### 5.3.2 Loop Filter

Figure 5.9: PLL Diagram

After introducing the M&M TED, and in particular the decimation that is performed at its output, the digital implementation of the Loop Filter as part of the clock synchronizer can be considered, for analysis purposes, as if it were an analog PLL operating at symbol frequency. Accordingly, let us begin the analysis by considering the PLL diagram shown in Fig. 5.9 and its transfer function, given by [31]

$$H(s) = \frac{\theta_o(s)}{\theta_i(s)} = \frac{K_d K_o F(s)}{s + K_d K_o F(s)}$$ (5.4)

where $\theta_o(s)$ and $\theta_i(s)$ represent the phase of the VCO and of the incoming signal respectively, $K_d$ is the gain of the TED in [V/rad], $F(s)$ is the transfer function of the loop filter, and $K_o$ is the gain of the VCO.

For the purposes of this project, it was decided to implement a second order PLL capable of tracking the phase and frequency deviations of the incoming signal with respect to the clock generated by the VCO. Such a device is obtained by designing the loop filter in the form of an integrator. Accordingly, its transfer function can be expressed as [31]

$$F(s) = \frac{\tau_2 s + 1}{\tau_1 s}$$ (5.5)

where $\tau_1 = R_1 C$ and $\tau_2 = R_2 C$. Now, by rearranging Eq. (5.5) we obtain the following expression

$$F(s) = -\left[K_1 + \frac{K_2}{s}\right]$$ (5.6)

where $K_1 = \tau_2/\tau_1$ and $K_2 = 1/\tau_1$. Eq. (5.5) and Eq. (5.6) lead to the active filter and its equivalent s-domain implementation shown in Fig. 5.10.

Figure 5.10: Loop Filter Implementation as (a) an Active Filter (b) equivalent s-domain Diagram

Now, substituting Eq. (5.6) in Eq. (5.4), the transfer function of the PLL becomes

$$H(s) = \frac{K_d K_o (K_2 + K_1 s)}{s^2 + s K_d K_o K_1 + K_d K_o K_2}$$ (5.7)

from which the loop gain can be derived as

$$K = K_d K_o K_1$$ (5.8)

Equivalently, Eq. (5.7) can be expressed in terms of the natural frequency $\omega_n$ and damping factor $\zeta$ as [31]

$$H(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$ (5.9)

where

$$\omega_n = \sqrt{\frac{K_d K_o}{\tau_1}} = \sqrt{K_d K_o K_2}$$ (5.10)

and

$$\zeta = \frac{\tau_2 \omega_n}{2}$$ (5.11)

The set of equations Eq. (5.9), Eq. (5.10) and Eq. (5.11) can be used to design the PLL. Usually the damping factor $\zeta$ is set to 0.707, and then the rest of the variables are defined accordingly. The time response of the system depends directly on the damping factor, which therefore contributes strongly to the system stability. For further details in this respect the reader can consult [30] and [31].

Another important fact regarding the design of the Loop Filter is that the gains of both the M&M TED and the VCO must be known. As mentioned before, the system was first implemented in a simulation environment by means of MATLAB/Simulink. During this phase, the s-curve of the M&M TED was derived; it is shown in Fig. 5.11. This curve presents the estimated phase error as a function of the phase difference when the system is operating in open loop, and therefore it is possible to obtain the gain $K_d$ by estimating the slope of the curve in the vicinity of the zero crossing point. This analysis yielded a value of 0.35 [V/rad]. On the other hand, a commercial Voltage Controlled Crystal Oscillator (VCXO) SI550 from Silicon Labs was used to generate the sampling clock; according to laboratory tests this device has a gain $K_o$ of 99 [kHz/V].

#### **Digital Transformation**

In order to implement the loop filter in the FPGA it is necessary to transform it from the analogue domain $F(s)$ into the digital domain $F(z)$.
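As a minimal sketch of what such a conversion yields, the Python code below discretizes the proportional-plus-integral form $K_1 + K_2/s$ of Eq. (5.6), taken here without its sign inversion, using the standard bilinear substitution $s \rightarrow \frac{2}{T_s}\frac{1 - z^{-1}}{1 + z^{-1}}$. Under this substitution the integrator $1/s$ becomes the trapezoidal accumulator $\frac{T_s}{2}\frac{1 + z^{-1}}{1 - z^{-1}}$. The function name `make_pi_loop_filter` and the numerical values in the usage example are hypothetical; they do not reproduce the coefficients fitted in this work, and the sketch uses floating point arithmetic rather than the shift-based gains of the FPGA.

```python
def make_pi_loop_filter(k1: float, k2: float, ts: float):
    """Return a step function for F(z) = K1 + (K2*Ts/2) * (1 + z^-1) / (1 - z^-1).

    Difference equations implemented by the returned function:
        integ[n] = integ[n-1] + (K2*Ts/2) * (e[n] + e[n-1])
        out[n]   = K1 * e[n] + integ[n]
    """
    g = k2 * ts / 2.0
    state = {"integ": 0.0, "prev": 0.0}

    def step(error: float) -> float:
        state["integ"] += g * (error + state["prev"])  # trapezoidal integrator
        state["prev"] = error
        return k1 * error + state["integ"]             # proportional + integral

    return step


if __name__ == "__main__":
    # Hypothetical coefficients chosen only to make the arithmetic easy to follow.
    loop_filter = make_pi_loop_filter(k1=0.5, k2=2.0, ts=0.5)
    print([loop_filter(1.0) for _ in range(3)])
```

For a constant input error the output grows linearly, which is the integral action that allows a second order loop to track a frequency offset as well as a phase offset.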
More in detail, the bilinear transformation is applied, so that the left half of the s-plane is mapped into the interior of the unit circle of the z-plane, therefore guaranteeing that any stable system in the analogue domain is transformed into a stable digital system. The bilinear transformation is defined as [31]

$$H(z) = H(s)\Big|_{s = \frac{2}{T_s} \frac{1 - z^{-1}}{1 + z^{-1}}}$$ (5.12)

where $T_s$ is the sample period.

Figure 5.11: S-curve of the M&M TED

The Loop Filter was first implemented in simulation using the architecture shown in Fig. 5.10 (b). Specifically, the gains $K_1$ and $K_2$ were implemented using shift registers instead of the bulkier and slower multipliers. However, later during experimentation and debugging with the FPGA, it was noticed that the digital arithmetic of the system resulted in a different loop gain. As a consequence, an investigation aimed at mitigating the effect of this lack of precision was carried out, leading to the results illustrated in Fig. 5.12 (a) and (b), in which the values of $K_1$ and $K_2$ are shown together with the frequency response of the loop filter before and after being fitted to achieve the desired bandwidth of 3 [kHz]. In the end, the coefficients resulted in $K_1$ equal to 0.5569 and $K_2$ equal to 0.00001994, and they were approximated by a binary division (implemented with shift registers) as $2^{-1}$ and $2^{-13}$ respectively. The final implementation of the Loop Filter is shown in Fig. 5.13.

Figure 5.12: Loop Filter Frequency response (a) before and (b) after compensating the effects of the digital arithmetic

Figure 5.13: Loop Filter Final Implementation Diagram

#### 5.3.3 $\Delta - \Sigma$ Modulator

The $\Delta - \Sigma$ DAC was implemented based on the Xilinx application note number XAPP154 [35]. This document describes the modulator and provides a Verilog template that can be adapted to the specific requirements of an application. The top level diagram of the system is shown in Fig. 5.14.
As the reader may notice, the diagram shows a clock at 100 MHz connected to the device; however, the system was adapted to operate at 275 MHz (FPGA clock). Also, different values of resistance and capacitance were chosen to implement the external passive filter. For more details regarding this device the mentioned application note should be consulted.

Figure 5.14: $\Delta - \Sigma$ modulator Top level implementation diagram

After having described the Clock Recovery System, we will now proceed to present the results obtained with the full 1Gbit/s Media Converter.

#### 5.4 Summary

This Chapter presented the design and implementation of a Clock Recovery System for the 1Gbit/s Media Converter prototype. More in detail, the proposed system does not require pre-equalization of the received signal and it uses only one sample per symbol to synchronize the sampling clock with the data rate of the incoming signal. This is possible thanks to the adaptation of the Mueller and Muller algorithm presented in section 5.3.1. Additionally, a parallel architecture for implementing a CIC decimator is presented, and it is shown that as a consequence the loop filter can be implemented using a serial scheme. In this respect, section 5.3.2 presents the design of the loop filter based on the analysis of a regular PLL system, while its Digital Transformation subsection shows how it can be converted and implemented in the digital domain by means of the bilinear transformation. Furthermore, the results of an off-line processing analysis that made it possible to define experimentally the bandwidth of the loop are presented. Specifically, it is shown that in order to obtain the desired value for the bandwidth it was necessary to fit the coefficients of the loop filter according to the measured gain of the system. Finally, section 5.3.3 presents the design and implementation of a $\Delta - \Sigma$ modulator which, given the unavailability of further DACs on the BITSIM board, was necessary to drive the VCXO SI550.
The following Chapter presents the experimental results obtained with the fully engineered Gigabit Ethernet Media Converter.

## Chapter 6

## Media Converter: DFE+CR Experimental Results

This chapter presents the experimental results obtained with the full Media Converter including the Clock Recovery System. More in detail, the first section analyzes the implications of substituting the Graviton-SPD2 by the 650 [nm] Firecomms receiver in the definitive set up. Then, results concerning a characterization of the clock recovery system are presented. Finally, the optical power margin guaranteed by the complete and definitive POF System is given.

#### 6.1 Firecomms 650nm Receiver

In Chapter 2 it was mentioned that the Graviton-SPD2 would be replaced by a specially designed and developed Firecomms Receiver. Before integrating the Clock Recovery System into the experimental set up, the Media Converter was tested with this new receiver to evaluate its performance. Accordingly, Fig. 6.1 shows a comparison of the back to back sensitivity of these two receivers. As seen, the Firecomms receiver is less sensitive than its counterpart: it requires -19.1 dBm at a BER of $10^{-9}$, while the Graviton-SPD2 requires -21.1 dBm. As a consequence, a comparison in terms of power margin after transmission over 50 m of POF (see Fig. 6.2) shows that the Firecomms receiver guarantees 4 dB of margin, which is 2 dB lower than the level provided by the Graviton-SPD2. With this condition in mind, we can now proceed to present the results obtained with the complete system.
Figure 6.1: Back-to-Back Sensitivity Comparison between the Graviton-SPD2 and the Firecomms Receiver

Figure 6.2: Power Margin Comparison between the Graviton-SPD2 and the Firecomms Receiver for transmission over 50m of PMMA-SI POF

#### 6.2 Testing the Clock Recovery System

Figure 6.3: Clock Recovery Experimental Set Up

A series of tests were performed to characterize the Clock Recovery System. The first experiments were aimed at measuring the holding window of the system, which is the range of frequencies over which the system is able to lock the clock, and also the jitter throughout this holding window. The experimental set up is shown in Fig. 6.3. As detailed, it consisted of a PRBS generator, the optoelectronic transmitter and receiver, the Media Converter implemented inside the FPGA, a BER tester and a real time oscilloscope. The procedure of the experiment was as follows: first a certain clock frequency was chosen, then a PRBS $2^{11} - 1$ signal was transmitted and the Media Converter was reset. If the system was able to lock the clock and operate error free, the frequency was considered to be within the holding window and the jitter was measured directly from the eye diagram on the oscilloscope. Table 6.1 lists the results obtained. In particular, it should be noticed that the peak to peak jitter corresponds to about 15% of one symbol period (0.91 [ns]), while the RMS jitter is only about 2.5%.

|                      | Value Measured | Units |
|----------------------|----------------|-------|
| Holding Window Range | 320            | [kHz] |
| Jitter RMS           | 22             | [ps]  |
| Jitter Peak to Peak  | 150            | [ps]  |

Table 6.1: Holding Window and Jitter Measurements

Once the holding window was experimentally delimited, the convergence time was measured. To this end, a flag signal was generated inside the FPGA, so that when the error estimated by the M&M TED was bounded within certain values that indicated a state of convergence, the flag was enabled. The resulting curve is shown in Fig. 6.4, where it is evident that the convergence time is directly proportional to the frequency deviation. The fact that the frequencies are negative is just a matter of nomenclature, because the reference and starting scanning point of the tracking algorithm is set to the far right limit of the holding window. Moreover, these results show that the convergence time for the chosen operating frequency is 55 [ms]. It should be noticed that this time can be reduced either by moving the operating frequency towards the starting scanning point or, conversely, by starting the scan in the vicinity of the operating frequency.

Figure 6.4: Convergence Time of the Clock Recovery System

#### 6.3 1Gbit/s Full Duplex Media Converter

Figure 6.5: Fully Engineered 1Gbit/s Media Converter

The fully engineered 1Gbit/s Media Converter is shown in Fig. 6.5. As seen, it consists of the BITSIM FPGA board that implements the Media Converter, the RC-LED, the Firecomms receiver, a limiting amplifier, the clock module containing the VCXO, and a power supply. A final test to validate its operation with the Clock Recovery System and 50 m of PMMA-SI POF was performed using once again the Agilent N2X Router Tester as part of the experimental set up shown in Fig. 6.6. The Router Tester made it possible to measure the overall delay of the system as $<30\,\mu$s. Moreover, the system presented error free operation for transmission without extra attenuation at the receiver, or in other words, with a received optical power of -9.5 dBm. Finally, the curve of the BER as a function of the received optical power is shown in Fig. 6.7, making evident that the Clock Recovery System does not introduce any penalty in terms of power margin: the system still guarantees 4 dB. This is the most important result of the project.
Figure 6.6: Experimental Set Up for validating the Fully Engineered 1Gbit/s Media Converter

Figure 6.7: BER vs Received Optical Power of the Fully Engineered 1Gbit/s Media Converter

## Chapter 7

# Conclusions and Recommendations

The present dissertation discussed the design and hardware implementation of a 1Gbit/s Media Converter for Ethernet transmission over 50 [m] of PMMA-SI POF. More in detail, the implementation of this system based on DSP in FPGA was proposed and investigated in this thesis. The reason to design and implement the Media Converter as an embedded system arose from the premise, defined at the beginning of the project, that it had to be prototyped, as much as possible, in the digital domain. Moreover, as the reader may recall from Chapter 1, given the market applications for this kind of system, its robustness and adaptability to different conditions constitute inherent requirements. In this sense, the vision of the POF-PLUS project was to provide a cost-effective solution that enabled symmetrical broadband access for residential users.

In order to realize this vision, Chapter 2 defined the problem by first describing the impairments inflicted on the transmitted signal by the POF communication system and then presenting the results of an investigation aimed at defining suitable equalizing schemes for overcoming these impairments. In particular, a first equalizing architecture, composed of a FF filter followed by a DFE, was chosen and then used in off-line processing to equalize the transmitted signal. Results for 4-PAM and 2-PAM led to the definition of the latter as the modulation format for the media converter prototype. Additionally, a first study of the number of taps suggested that using 16 FF and 2 FB taps for implementing the DFE resulted in an optical power margin before FEC of 4.5 dB.
Chapter 3 presented in its first part the Ethernet layer scheme of the Media Converter, and from there it defined the basic architecture of the system. The PCS sublayer, implemented according to the 10Gigabit Ethernet standard, was described first, and the PMA top level architecture was detailed next. Then, the POF channel was analyzed, giving rise to a model that evolved into the formulation of the equalizing and adapting algorithms, whose implementation was addressed in the second part of the Chapter. More in detail, it was shown that implementing the FF filter implied the re-arrangement of the previously defined model, while for the DFE stage it was necessary to replace the more obvious FIR based solution with a look-ahead pipelined architecture, so that the system achieved the required throughput. Towards the end of the Chapter the implementation of the adaptive algorithm by means of a DLMS system was presented. It should be noticed that this constituted the most important and critical part of the process, simply because even if the equalizer architecture operated properly, it was worthless without an effective adapting system. As an analogy, the reader may consider that the equalizer is to the adapting block as the body is to the brain.

Chapter 4 presented the experimental results obtained with the first Media Converter prototype. In particular, it was demonstrated using a Router Tester that the system is compatible with the IEEE 802.3 Gigabit Ethernet Standard, that it is able to handle the required throughput and that it presented an overall delay of $<30\,\mu s$. Moreover, it was also concluded that the system was capable of adapting itself to the POF channel. Two different versions of the equalizer were also tested, i.e. FSE and FSE+DFE. It was shown that the complete system FSE+DFE outperforms the FSE by providing a larger optical power margin of 6 dB.
Further testing showed that the Media Converter tolerates up to 20 fiber bends of 90° with a radius of 14 [mm]. Regarding its hardware implementation, it was found that the complete system uses 87% of the available FPGA slices.

Chapter 5 presented the completion of the Media Converter by showing the design and implementation of the Clock Recovery System. Specifically, the M&M TED formulation and implementation were presented. This was followed by the analysis regarding the estimation of the Loop Filter and therefore the definition of the resulting loop bandwidth. In this respect, it was shown how the Loop Filter coefficients were fitted using Matlab Simulink in order to obtain the desired cut-off frequency, which led to a loop bandwidth of 4 [kHz]. Then, a $\Delta\Sigma$ modulator together with an RC filter was introduced as an alternative for implementing a DAC using a digital output pin of the FPGA, which in the end allowed the system implementation.

The experimental results obtained with the fully engineered Media Converter were shown in Chapter 6. First, the Graviton-SPD2 and the new and definitive Firecomms receiver were compared using the previous set up, i.e. without clock recovery. This analysis showed that the system operating with the new receiver presents a total optical power margin before FEC of 4 dB, which is 2 dB less than the system working with the Graviton-SPD2. This first experiment was followed by a testing campaign aimed at characterizing the Clock Recovery System. The most important results obtained were the definition of the Holding Window, which as expected was around 320 [kHz], and the convergence time of 55 [ms]. Regarding this last result, it is also important to note that it can be improved by setting the starting frequency of the scan algorithm nearer to the expected master clock frequency.
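The idea of building a DAC from a single digital pin, recalled above in the summary of Chapter 5, can be sketched with a first-order $\Delta\Sigma$ modulator followed by a one-pole RC filter. The Python model below is a behavioural sketch only: the 8-bit word length, the 3.3 V supply, the filter constant `alpha` and the function names are assumptions made for illustration and do not reproduce the parameters of the implemented system.

```python
def delta_sigma_bits(code, width, n_cycles):
    """First-order delta-sigma modulator built from an accumulator.

    The carry-out of the accumulator is the 1-bit output; its density of
    ones equals code / 2**width.
    """
    modulus = 1 << width
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += code
        if acc >= modulus:  # carry-out drives the output pin high
            acc -= modulus
            bits.append(1)
        else:
            bits.append(0)
    return bits


def rc_filter(bits, vdd=3.3, alpha=0.01):
    """Discrete one-pole model of the RC low-pass filter; returns the final voltage.

    alpha is roughly the clock period divided by the RC time constant.
    """
    v = 0.0
    for b in bits:
        v += alpha * (b * vdd - v)
    return v


# Assumed example: 8-bit input code 64, i.e. one quarter of full scale.
bits = delta_sigma_bits(code=64, width=8, n_cycles=4096)
print("density of ones:", sum(bits) / len(bits))
print("filtered output [V]:", round(rc_filter(bits), 3))
```

The filtered voltage settles close to `code / 2**width * vdd` with a small residual ripple. A longer RC time constant reduces the ripple at the cost of a slower response, a trade-off that matters when the voltage drives an oscillator inside a control loop.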
After performing these tests, the complete set up including the fully engineered Media Converter and a Router Tester was used to validate the system. As a result, it was demonstrated that the system maintained its compatibility with the IEEE 802.3 Gigabit Ethernet Standard, its capability of operating at maximum throughput and its overall delay of $<30\,\mu s$. It was also shown that the system guarantees an optical power margin of 4 dB which, given the new considerations regarding the Firecomms receiver, means that the Clock Recovery System does not imply a power penalty for the fully engineered Media Converter. This last result constitutes the most important one of the project and has been included in a journal article that is currently being written, which will also consider the results of an ongoing analysis regarding jitter transfer and tolerance to phase noise. Furthermore, the results presented in Chapters 4 and 5 led to a journal article [36], a post-deadline paper [37] and several other conference papers [38], [39], [40], [41], [42], [43], [44].

At this point, it is worth mentioning that implementing the system on FPGA was quite challenging. Very often, timing constraints related to signal routing or to the propagation through combinatorial logic resulted in a re-design of the hardware architecture and therefore of the formulated model. In this sense, particular attention was paid to the finite arithmetic of each component. This became even more critical when the loop architectures of both the DFE and the clock recovery were implemented, mainly because a slight difference in the number of bits composing a word, or even in the arrangement of its mantissa and therefore in its precision, could lead to biased signals and as a consequence to instability.
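The sensitivity to finite arithmetic described above can be illustrated with a small numerical experiment. The Python sketch below quantizes a zero-mean signal to 6 fractional bits, once by truncation and once by rounding to the nearest level, and compares the mean quantization error. The word length, the test signal and the function names are arbitrary assumptions; the sketch only shows the general mechanism by which truncation introduces a bias of about half an LSB, which an integrating loop would then accumulate.

```python
import math


def quantize(x, frac_bits, mode):
    """Quantize x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    if mode == "truncate":
        return math.floor(x * scale) / scale    # drop the extra bits
    return math.floor(x * scale + 0.5) / scale  # round to the nearest level


def mean_error(values, frac_bits, mode):
    """Average quantization error, i.e. the bias introduced by the quantizer."""
    return sum(quantize(v, frac_bits, mode) - v for v in values) / len(values)


def make_signal(n):
    """Deterministic, approximately uniform zero-mean signal in [-0.5, 0.5)."""
    return [(k * 0.6180339887498949) % 1.0 - 0.5 for k in range(n)]


values = make_signal(10000)
lsb = 1.0 / (1 << 6)
print("bias with truncation [LSB]:", mean_error(values, 6, "truncate") / lsb)
print("bias with rounding   [LSB]:", mean_error(values, 6, "round") / lsb)
```

With truncation the mean error is close to minus half an LSB, while with rounding it is close to zero. Inside a feedback loop such a constant offset acts as a spurious input, which is one way in which a seemingly minor word-length choice may bias the loop.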
The system presented in this thesis can be further improved by updating the FB filter architecture to the reduced version proposed in [28], in which the number of multiplexers in the parallel stages that make up the complete DFE is reduced by reusing some of the already generated signals. On the other hand, the clock recovery system may also be improved by modifying its architecture. As the reader may recall, this kind of recovery algorithm usually relies on pre-equalization at the receiver side before the ADC, so that the error estimation is based on a better received signal than the one used in the present prototype. Therefore, an architecture in which the FSE is configured (at reset) with a sub-optimum set of taps that allows it to perform a certain level of equalization may be considered. This modification would then allow the M&M TED to estimate the phase deviation using the FSE output. It would then be necessary to investigate the implications of this new architecture in terms of convergence time, tolerance to phase noise and jitter transfer.

DSP for short-range optical communication systems constitutes an exciting and currently expanding field. Nowadays, the improvement in terms of integration, speed and versatility of development platforms for embedded applications is driving more attention and resources to the implementation of such systems. Moreover, they are finally emerging as alternatives to more traditional wireless and copper based communication systems. Although the inherent complexity of the DSP systems required to guarantee robustness and achieve high bit rates still constitutes a challenge in terms of costs, projects such as the one presented in this dissertation are helping to foresee a future in which they could become a fairly competitive player in the mass consumer applications market.
## List of Acronyms

| Acronym | Definition |
|---------|------------|
| ADC | Analog to Digital Converter |
| ADSL | Asymmetric Digital Subscriber Line |
| AWGN | Additive White Gaussian Noise |
| BER | Bit Error Rate |
| CAT | Category |
| CIC | Cascaded Integrator-Comb |
| CMA | Constant Modulus Algorithm |
| CMOS | Complementary Metal Oxide Semiconductor |
| CRC | Cyclic Redundancy Code |
| DAC | Digital to Analog Converter |
| DC | Data Center |
| DCM | Digital Clock Manager |
| DD | Decision Directed |
| DFE | Decision Feedback Equalizer |
| DLMS | Delayed Least Mean Square |
| DSP | Digital Signal Processing |
| EMI | Electromagnetic Interference |
| FF | Feed Forward |
| FFE | Feed Forward Equalizer |
| FB | Feedback |
| FEC | Forward Error Correction |
| FITH | Fiber In The Home |
| FIR | Finite Impulse Response |
| FP7 | Seventh Framework Programme |
| FPGA | Field Programmable Gate Array |
| FTTB | Fiber To The Building |
| FTTH | Fiber To The Home |
| GMII | Gigabit Media Independent Interface |
| GOF | Glass Optical Fiber |
| IPTV | Internet Protocol Television |
| ISI | Inter-Symbol Interference |
| ISMB | Istituto Superiore Mario Boella |
| HPC | High Performance Computing |
| LAN | Local Area Network |
| LED | Light Emitting Diode |
| LMS | Least Mean Square |
| LTI | Linear Time Invariant |
| MDI | Medium Dependent Interface |
| M&M | Mueller and Muller |
| MMF | Multi Mode Fiber |
| MSE | Mean Square Error |
| NA | Numerical Aperture |
| NCO | Numerically Controlled Oscillator |
| NRZ | Non Return to Zero |
| OSI | Open Systems Interconnection |
| PAM | Pulse Amplitude Modulation |
| PCS | Physical Coding Sublayer |
| PER | Packet Error Rate |
| PF-POF | Perfluorinated Plastic Optical Fiber |
| PHY | Physical Layer Device |
| PMA | Physical Medium Attachment |
| PMD | Physical Medium Dependent |
| PLC | PowerLine Communications |
| PLL | Phase Locked Loop |
| POF-PLUS | Plastic Optical Fibre for Pervasive Low-cost Ultra-high capacity Systems |
| PMMA | Polymethyl Methacrylate |
| POF | Plastic Optical Fiber |
| PRBS | Pseudo Random Binary Sequence |
| RAM | Random Access Memory |
| RC | Resonant Cavity |
| RMS | Root Mean Square |
| RS | Reed Solomon |
| RX | Receiver |
| SI | Step Index |
| SMF | Single Mode Fiber |
| TED | Timing Error Detector |
| UHAB | Ultra High-speed Acquisition Board |
| UTP | Unshielded Twisted Pair |
| VDSL | Very High Speed Digital Subscriber Line |
| VHDL | VHSIC Hardware Description Language |
| VNA | Vector Network Analyzer |
| VOA | Variable Optical Attenuator |
| VoIP | Voice over Internet Protocol |
| VCO | Voltage Controlled Oscillator |
| VCXO | Voltage Controlled Crystal Oscillator |
| WiFi | Wireless Fidelity |

## Bibliography

- [1] B. Reboul. (2011, November) A global overview of FTTH. [Online]. Available: www.ftthcouncil.org.
- [2] (2011, November) Latest country ranking shows further momentum on all-fiber deployments. Fiber to the Home Council. [Online]. Available: www.ftthcouncil.org.
- [3] S. Ross and M. Zager, "The advantages of optical access," Fiber to the Home Council, Tech. Rep., 2011.
- [4] S. J. Lee, "Discrete multitone modulation for short-range optical communications," Ph.D. dissertation, Technische Universiteit Eindhoven, 2009.
- [5] H. Yang, "Optical techniques for broadband in-building networks," Ph.D. dissertation, Technische Universiteit Eindhoven, 2011.
- [6] R. Gaudino. (2011) POF-PLUS project public final report. [Online]. Available: http://www.ict-pof-plus.eu/
- [7] S. Scott, "Optical interconnects in future HPC systems," in *Optical Fiber Communication Conference and Exposition (OFC/NFOEC), 2011 and the National Fiber Optic Engineers Conference*, March 2011, pp. 1–3.
- [8] P. Pepeljugoski, J. Kash, F. Doany, D. Kuchta, L. Schares, C. Schow, M. Taubenblatt, B. Offrein, and A. Benner, "Towards exaflop servers and supercomputers: The roadmap for lower power and higher density optical interconnects," in *Optical Communication (ECOC), 2010 36th European Conference and Exhibition on*, Sept. 2010, pp. 1–14.
- [9] (2011, June) Plastic optical fibre for pervasive low-cost ultra-high capacity systems. POF-PLUS EU Project. [Online]. Available: http://www.ict-pof-plus.eu/
- [10] O. Ziemann, J. Krauser, P. E. Zamzow, and W. Daum, *POF Handbook: Optical Short Range Transmission Systems*, 2nd ed. Springer, 2001.
- [11] (2012, February) Principle of POF.
POF Application Center. [Online]. Available: http://www.pofac.de/en/index.php
- [12] E. F. Schubert, Y.-H. Wang, A. Y. Cho, L. Tu, and G. J. Zydzik, "Resonant cavity light emitting diode," *Applied Physics Letters*, vol. 60, no. 8, pp. 921–923, Feb. 1992.
- [13] S. Abrate, A. Nespola, S. Straullu, P. Savio, R. Gaudino, A. Antonino, C. Zerna, B. Offenbeck, and N. Weber, "Gigabit home networking with 1 mm PMMA fibers," in *Transparent Optical Networks (ICTON), 2010 12th International Conference on*, June 27–July 1, 2010, pp. 1–4.
- [14] (2012, January) UHAB ultra high-speed acquisition board datasheet. BITSIM-AB. [Online]. Available: http://www.bitsim.com/en/uhab-high-speed-board.htm
- [15] (2012, February) System Generator for DSP: Getting started guide. Xilinx. [Online]. Available: http://www.xilinx.com/support/documentation/dt\_sysgendsp\_sysgen13-4.htm
- [16] P. Savio, A. Nespola, S. Straullu, S. Abrate, and R. Gaudino, "A physical coding sublayer for gigabit Ethernet over POF," 2010.
- [17] J. Proakis and M. Salehi, *Communication Systems Engineering*, 1st ed. New Jersey: Prentice-Hall, 1994.
- [18] S. Haykin, *Adaptive Filter Theory*, 4th ed. New Jersey: Prentice Hall, 2001.
- [19] G. Miao, *Signal Processing for Digital Communications: Theory, Algorithms And Applications*. Norwood, MA, USA: Artech House, Inc., 2006.
- [20] J. G. Proakis, *Digital Communications*, 4th ed. New York: McGraw Hill, 2000.
- [21] Y. Li, K. J. R. Liu, and Z. Ding, "On the convergence of blind channel equalization," Institute For Systems Research, Tech. Rep., 1995.
- [22] R. Casas, Z. Ding, R. Kennedy, C. R. Johnson, Jr., and R. Malamut, "Blind adaptation of decision feedback equalizers based on the constant modulus algorithm," in *Signals, Systems and Computers, 1995. Conference Record of the Twenty-Ninth Asilomar Conference on*, vol. 1, Oct.–Nov. 1995, pp. 698–702.
- [23] R. Casas and R. Johnson, "On the blind adaptation of an FSE+DFE combination," in *Signal Processing Advances in Wireless Communications, 1997 First IEEE Signal Processing Workshop on*, Apr. 1997, pp. 113–116.
- [24] P. Quinton and Y. Robert, *Systolic Algorithms & Architectures*. Prentice Hall, 1991.
- [25] A. Chorevas and D. Reisis, "Efficient systolic array mapping of FIR filters used in PAM-QAM modulators," *J. VLSI Signal Process. Syst.*, vol. 35, pp. 179–186, September 2003.
- [26] K. Parhi, "Design of multigigabit multiplexer-loop-based decision feedback equalizers," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 13, no. 4, pp. 489–493, April 2005.
- [27] K. Parhi, *VLSI Digital Signal Processing System Design and Implementation*. New York: John Wiley and Sons, Inc., 1999.
- [28] D. Oh and K. Parhi, "Low complexity design of high speed parallel decision feedback equalizers," in *Application-specific Systems, Architectures and Processors, 2006. ASAP '06. International Conference on*, Sept. 2006, pp. 118–124.
- [29] N. R. Shanbhag and K. K. Parhi, *Pipelined Adaptive Digital Filters*. Norwell, MA, USA: Kluwer Academic Publishers, 1994.
- [30] H. Meyr, M. Moeneclaey, and S. Fechtel, *Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing*, 1st ed. John Wiley and Sons, Inc., 1998.
- [31] D. Stephens, *Phase-Locked Loops for Wireless Communications*, 2nd ed. Kluwer Academic Publishers, 2002.
- [32] K. Mueller and M. Muller, "Timing recovery in digital synchronous data receivers," *Communications, IEEE Transactions on*, vol. 24, no. 5, pp. 516–531, May 1976.
- [33] G. Jovanovic-Dolecek and S. Mitra, "A new two-stage sharpened comb decimator," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 52, no. 7, pp. 1414–1420, July 2005.
- [34] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," *Acoustics, Speech and Signal Processing, IEEE Transactions on*, vol. 29, no. 2, pp. 155–162, Apr. 1981.
- [35] J. Logue, "Virtex synthesizable delta-sigma DAC," Xilinx Application Note, September 1999.
- [36] A. Nespola, S. Straullu, P. Savio, D. Zeolla, J. Molina, S. Abrate, and R. Gaudino, "A new physical layer capable of record gigabit transmission over 1 mm step index polymer optical fiber," *Lightwave Technology, Journal of*, vol. 28, no. 20, pp. 2944–2950, Oct. 15, 2010.
- [37] A. Nespola, S. Straullu, P. Savio, D. Zeolla, S. Abrate, D. Cardenas, J. Ramirez, N. Campione, and R. Gaudino, "First demonstration of real-time LED-based gigabit Ethernet transmission on 50m of A4a.2 SI-POF with significant system margin," in *Optical Communication (ECOC), 2010 36th European Conference and Exhibition on*, Sept. 2010, pp. 1–3.
- [38] A. Antonino, S. Straullu, S. Abrate, A. Nespola, P. Savio, D. Zeolla, J. Molina, R. Gaudino, S. Loquai, and J. Vinogradov, "Real-time gigabit Ethernet bidirectional transmission over a single SI-POF up to 75 meters," in *Optical Fiber Communication Conference and Exposition (OFC/NFOEC), 2011 and the National Fiber Optic Engineers Conference*, March 2011, pp. 1–3.
- [39] S. Abrate, A. Nespola, S. Straullu, P. Savio, D. Zeolla, J. Ramirez, and R. Gaudino, "Fully-digital parallel adaptive decision feedback equalizer for gigabit Ethernet SI-POF links," in *POF Conference 2010, Yokohama, Japan*, October 2010.
- [40] R. Gaudino, J. Molina, D. Zeolla, P. Savio, S. Straullu, A. Nespola, S. Abrate, C. Zerna, J. Sundermeyer, A. Fiederer, N. Verwaal, B. Offenbeck, and N. Weber, "Architectures for low-cost Gbit/s POF links for home networking," in *Future Network and Mobile Summit, 2010*, June 2010, pp. 1–7.
- [41] A. Nespola, S. Straullu, P. Savio, D. Zeolla, S. Abrate, J. Ramirez, and R. Gaudino, "Towards a new gigabit Ethernet PHY for SI-POF," in *Optical Communication (ECOC), 2010 36th European Conference and Exhibition on*, Sept. 2010, pp. 1–3.
- [42] S. Abrate, A. Nespola, P. Savio, S. Straullu, D. Zeolla, R. Gaudino, and J. Ramirez, "Fully working 1Gbit/s Ethernet transmission system: final results from POF-PLUS," in *20th International Conference on Plastic Optical Fiber (ICPOF 2011), Bilbao, Spain*, September 2011.
- [43] S. Abrate, A. Nespola, P. Savio, S. Straullu, D. Zeolla, R. Gaudino, and J. Ramirez, "Duobinary modulation format for gigabit Ethernet SI-POF transmission system," in *20th International Conference on Plastic Optical Fiber (ICPOF 2011), Bilbao, Spain*, September 2011.
- [44] S. Abrate, A. Nespola, P. Savio, S. Straullu, D. Zeolla, R. Gaudino, and J. Ramirez, "Bidirectional transmission of the gigabit Ethernet signal over a single SI-POF," in *20th International Conference on Plastic Optical Fiber (ICPOF 2011), Bilbao, Spain*, September 2011.

## Acknowledgements

As with most things in life, the project from which this thesis derives was a collective effort. Therefore, there are a number of people to whom I will always be grateful for all their support and assistance.

First of all, I would like to thank Prof. Roberto Gaudino and Silvio Abrate for giving me the opportunity to be part of this challenging project and also for all the support and guidance that they provided me throughout this endeavor.

Special thanks go to the PhotonLab research staff for all their help and assistance. During my PhD I had the fortune not only to work with them but also to learn about life and companionship. I am especially grateful to Dr. Antonino Nespola for all his help, advice and guidance in my first steps in both the world of science and of independent life. I will always remember his grandfather's sayings with all their wisdom. I would also like to thank Paolo Savio and Stefano Straullu for all their support, but more importantly for their friendship and their attitude towards life and work. I will cherish our long hours in the lab working and speaking about many things, from politics to calcio. Many thanks to Dr. Enrico Torrengo, Dr. Ramon Mata and Dr. Daniel Cárdenas, who have had a great influence on my personal and professional life. Thank you for all your advice, help and encouragement.
During my PhD I had the opportunity to visit the ECO group at the TU/e in the Netherlands. I would like to thank Prof. Harm Dorren and Dr. Nicola Calabretta for giving me this opportunity. I would like to extend my gratitude to all members of the group, in particular to my office mates Sihuan Zou and Jun Luo for their help, their friendship and for the most fun office in which I have ever worked. Thanks to Roy Uden for his advice and numerous technical discussions during my visit, and many thanks to Chigo Okonkwo, Fausto Gomez, Prasanna Gamage, Vlado Menkonvski and Stefano Di Lucente for all their advice, support and hospitality.

I am most grateful to my parents, my brother and my family in general. Thanks for teaching me that the world is ours to be made, is ours to be improved, is ours to be lived, is ours to be loved. Thank you for showing me that the strength to be better and to be good to this world is within each of us and that we together as a society are the main actors of our history and our lives.

Finally, thank you Malui for all your help, understanding and, most importantly, for all your love. This would not have been possible without you by my side, without your constant encouragement and willingness to share our time together with my science related activities. I dedicate this diploma to you, to our families and to all the people (present and absent) that have walked with us the path of life.