Real-Time Implementation of Digital Signal Processing for Coherent Optical Digital Communication Systems
IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, VOL. 16, NO. 5, SEPTEMBER/OCTOBER 2010 1227
Andreas Leven, Senior Member, IEEE, Noriaki Kaneda, Senior Member, IEEE, and Stephen Corteselli
Abstract: Digital signal-processing-based coherent optical communication systems are widely viewed as the most promising next-generation long-haul transport systems. One of the biggest challenges in building these systems is the implementation of signal processors that are able to deal with signaling rates of a few tens of giga-samples per second. In this paper, we discuss implementation options and design considerations with respect to hardware realization and DSP implementation.
Index Terms: Digital signal processors, optical fiber communication, quadrature phase-shift keying.
I. INTRODUCTION
Coherent communication systems have dominated the world of wireless communication almost since its beginnings. Coherent systems, or more exactly phase-coherent systems, offer a number of benefits over noncoherent systems. Nonetheless, practical optical coherent communication systems became feasible only recently.
Coherent optical communication systems were a subject of intense research in the 1980s and early 1990s. At that time, the main motivation was the higher sensitivity that coherent receivers promised. Technical difficulties inhibited a rapid transition into commercial systems. With the advent of the erbium-doped fiber amplifier, which allowed direct-detection systems to reach comparable sensitivities, the main driving force for developing optical coherent systems disappeared.
Today's motivation to revive coherent concepts in optical communication is twofold. First, coherent receivers enable reliable data transmission with much higher spectral efficiency than conventional direct-detection systems, and second, coherent receivers can compensate for linear impairments, most notably polarization-mode dispersion (PMD), to a degree that is out of reach for conventional systems.
Manuscript received January 15, 2010; revised February 12, 2010; accepted February 28, 2010. Date of publication May 18, 2010; date of current version October 6, 2010.
A. Leven is with Alcatel-Lucent Bell Laboratories, 70435 Stuttgart, Germany (e-mail: firstname.lastname@example.org).
N. Kaneda and S. Corteselli are with Alcatel-Lucent Bell Laboratories, Murray Hill, NJ 07974 USA (e-mail: email@example.com; firstname.lastname@example.org).
Digital Object Identifier 10.1109/JSTQE.2010.2044977
Also, the technical difficulties that the first generation of coherent systems in optical communications faced have been lessened. This is caused by two developments. First of all, the symbol rate to carrier frequency ratio of modern optical communication systems approaches the ratio that is commonly used in wireless systems. For a system that transmits at a data rate of 100 Gb/s in two polarization orientations utilizing QPSK signaling, the symbol rate is 25 GBd. With a carrier frequency of roughly 200 THz, the symbol rate to carrier frequency ratio is 1.25 × 10^-4. This indicates that it is possible for optical systems to achieve similar phase noise to symbol rate ratios as in wireless systems.
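The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked example; the variable names are illustrative and not taken from the paper.

```python
# Worked arithmetic for the symbol-rate-to-carrier-frequency ratio.
bit_rate = 100e9        # total data rate in bit/s (value quoted in the text)
polarizations = 2       # two polarization orientations
bits_per_symbol = 2     # QPSK carries 2 bits per symbol
carrier_hz = 200e12     # approximate optical carrier frequency

# Each polarization tributary carries bit_rate / 2, at 2 bits per symbol.
symbol_rate = bit_rate / (polarizations * bits_per_symbol)
ratio = symbol_rate / carrier_hz

print(f"symbol rate = {symbol_rate / 1e9:.1f} GBd")
print(f"symbol rate / carrier frequency = {ratio:.3e}")
```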
Second, the performance of digital signal processing (DSP) equipment has improved dramatically over the past two decades, which makes it feasible to implement the complex signal-processing steps required to synchronize to the received signal in the digital domain. Implementations of optical coherent receivers have been demonstrated in CMOS-based application-specific ICs (ASICs) or field-programmable gate arrays (FPGAs).
Although a coherent optical communication system can utilize single- or multiple-carrier [e.g., orthogonal frequency-division multiplexing (OFDM)] transmitters and any modulation format, with QPSK being the most popular and higher order quadrature-amplitude modulation (QAM) and phase-shift keying (PSK) systems under investigation, this paper concentrates on single-carrier frequency-domain-equalized systems, which have become more popular in the wireless domain as well. The modulation format discussed here is QPSK. Phase coherence between a data signal and the reference is typically established at the receiver side. As this paper is concerned with the implementation of coherent systems, it concentrates on receiver design.
The paper is organized as follows. After a short review of the basic architecture of an optical coherent system, we discuss hardware implementation options. Then, we discuss specific challenges for the implementation of signal processing at multiple gigabit per second signaling rates. Section IV describes a selection of algorithms and their implementation as examples. Finally, some measurement results of a real-time coherent receiver are discussed.
II. OPTICAL COHERENT-SYSTEM ARCHITECTURE
Fig. 1 shows a generic block diagram of a coherent system. The transmitter on the left side of the diagram consists of a data source, digital-to-analog converters (DACs), and driver amplifiers. Coherent systems often use polarization-division multiplex. In this case, the continuous-wave signal of the transmit laser is split into two, and then modulated independently by two optical in-phase/quadrature (I/Q) modulators. The two signals are then combined in two orthogonal polarization orientations.
1077-260X/$26.00 © 2010 IEEE
Fig. 1. Generic coherent system.
In Fig. 1, the data source is simply displayed as a black box. In a real system, this black box comprises a number of complex functions. Besides the functions that are also performed in a classical system, such as data aggregation, coding, and framing, additional steps need to be performed in a transmitter for complex modulation formats, as are typically used in a coherent system. First of all, the data have to be mapped to constellation points and, in the case of multiple-carrier systems, to frequencies. Often, the data are also differentially precoded to cope with phase slips during receiver-side carrier synchronization. In a second step, the mapped data might be processed digitally; for instance, they might be predistorted to compensate for the nonlinearity of amplifiers and modulators, or they might be precompensated for deterministic fiber effects such as chromatic dispersion (CD).
The processing described earlier results in four digital data streams that subsequently need to be converted into analog signals. In the case of single-carrier QPSK signaling, each data stream carries only a single bit per symbol, and therefore does not require a DAC. This significantly reduces the complexity, and therefore the power consumption, of the transmitter. But even for multicarrier systems or modulation formats of higher order than QPSK, the performance requirements with respect to resolution and conversion speed are typically less restrictive for the transmit DAC than for the receive A/D converter (ADC). For instance, for a 16-QAM transmitter without preprocessing, only 2 bits (four levels) at a conversion speed equivalent to the symbol rate are required, while at the receiver side, typically 6–8 bits at a sampling rate of twice the symbol rate are needed. Technology and architecture choices are similar to the ones of the ADC, which will be discussed later.
The I/Q modulator most widely used today is a double-nested Mach–Zehnder modulator based on LiNbO3. However, other, more compact solutions are in development, based, for instance, on electroabsorption modulator structures.
The receiver consists of a local oscillator (LO) laser, an optical hybrid, a photoreceiver array, an ADC array, the digital signal processor (DSP), and a data sink, which typically comprises a decoder and a client interface.
The 90° optical hybrid mixes the received signal with the signal of the LO laser and a 90° phase-shifted copy of the LO laser signal. The mixing of the signal with the LO reference is performed for each polarization separately. Preferably, the output of the hybrid provides differential signals for suppression of the direct-detection terms.
Optical hybrids have been demonstrated in different designs and technology platforms. Design-wise, optical hybrids can be grouped into actively controlled devices that require a phase control to maintain the 90° phase difference and passive devices that assure a 90° phase difference by design. Active devices typically consist of two splitters, one for the received signal and one for the LO signal, and two signal combiners, one for the in-phase component and one for the quadrature component. The phase of the signal in one arm of the LO splitter needs to be adjusted by a tunable phase shifter to be 90° out of phase with respect to the signal in the second arm. The phase shift can be controlled by utilizing thermal tuning or electrooptic tuning.
Passive hybrids are designed such that the signals always interfere with a phase difference of about 90°. These can be implemented, for instance, as a Michelson interferometer or a multimode interference coupler (MMI). The advantage of passive hybrids is obvious: no control signal has to be generated and distributed to the hybrid device. Nevertheless, the phase accuracy is often not sufficient, so that a phase correction needs to be implemented in the digital signal processor.
After photodetection and linear amplification, the signals need to be converted from the analog to the digital domain. The ADC performance is still one of the bottlenecks that determine the total data rate of DSP-based optical coherent systems. ADCs with sampling rates of more than 20 GSample/s have been published. These devices have been realized in CMOS technology as well as in SiGe BiCMOS. SiGe BiCMOS devices typically offer slightly better performance at the expense of increased power consumption, while CMOS devices offer the possibility to integrate the ADC functionality and the DSP on one chip.
SiGe ADCs are typically implemented as flash ADCs, where a bank of comparators is utilized to convert the analog signal into a digital thermometer-coded signal. In a second step, this thermometer-coded signal is converted into a binary or Gray-encoded signal.
CMOS ADCs take advantage of the higher integration density available in CMOS by utilizing a time-interleaved architecture. Here, multiple ADCs are employed in parallel, with each ADC sampling at a fraction of the total sampling rate. For instance, n ADCs, each sampling at a rate of R/n and offset in time by 1/R from its neighbor, achieve a total sampling rate of R. Gain and sampling point mismatch between the individual sub-ADCs require calibration to avoid performance loss.
The digital signal-processing functionality has been demonstrated in ASICs as well as in FPGAs. An ASIC not only offers the possibility of integrating the ADCs on the same chip, but also allows for an optimized circuit design specifically tailored for the application. This results in higher speed and lower power-consumption receivers with higher functionality and complexity than a realization in an FPGA. For large volumes, ASICs also tend to be cheaper than FPGA implementations. Unfortunately, an ASIC development is very costly and requires a lot of resources in the design process. Furthermore, an ASIC cannot be reconfigured, so that the processing algorithms have to be fully developed during the design phase. Therefore, an ASIC implementation is preferable for a commercial development, while FPGAs are the platform of choice for research and prototyping.
FPGAs are ICs that are built with a set of configurable blocks that can be interconnected with a reconfigurable set of wires. The majority of the configurable blocks are so-called logic blocks, but modern FPGAs typically also offer a number of memory and multiplier blocks. FPGAs are designed to be useful in a number of applications, which in turn means that the resources available in a given FPGA are not necessarily a perfect fit for the application at hand. Resource constraints often lead to adaptation or modification of optimum signal-processing algorithms, which in turn leads to a reduction in performance (implementation penalty).
III. IMPLEMENTATION CONSIDERATIONS
Today's FPGAs offer processing speeds in the order of a few hundred megahertz. The achievable processing speed for an ASIC using the same CMOS generation is typically higher by a factor of about two to three. Nevertheless, the processing speeds available in today's technologies are about two to three orders of magnitude smaller than the data rates in optical communication systems. The maximum achievable processing speed of a digital circuit is given by the longest time a signal needs to travel between two clocked storage elements (e.g., flip-flops). The path between these two storage elements is called the critical path. There are two commonly used techniques to reduce the length of the critical path, and therefore increase processing speed: pipelining and parallel processing. Pipelining reduces the critical path by inserting additional retiming elements along the signal path in a manner that does not alter the result of the processing, but at the expense of additional latency. This only allows an increase of processing speed up to the maximum speed of a single element (a gate in an ASIC or a lookup table (LUT) in an FPGA).
To be able to process data at multiple gigabit per second, up to 100 Gb/s and beyond, a parallel processing structure has to be implemented. Unfortunately, not all algorithms can be parallelized without modifications and loss of performance. In general, all structures that can be pipelined can also be processed in parallel.
Fig. 2. Signal-processing steps.
Algorithms that are time invariant can simply be parallelized without loss of performance by instantiating the circuitry that implements the algorithm multiple times. Often, the complexity can be reduced by sharing resources between multiple instances. An example of a structure that can easily be parallelized is a finite-impulse response (FIR) filter with constant coefficients. If the filter coefficients are not constant, e.g., within an adaptive filter structure, there might not be an equivalent parallel structure, e.g., when the update of the filter coefficients is performed once per sampling period. If one is willing to compromise on update speed by accepting an update only once every clock cycle, with the clock cycle being n times the sampling period and n the parallelization factor, an equivalent structure can be implemented.
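The parallelization of a constant-coefficient FIR filter can be illustrated with a small software model. The sketch below is not the authors' hardware design; the function names and the block-loop model of one clock cycle are assumptions made for illustration. It shows that n identical filter instances, each producing one output per slow clock cycle, give the same result as the serial filter.

```python
def fir_serial(x, h):
    """Reference FIR filter: one output per sampling period."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, c in enumerate(h):
            if i - j >= 0:
                acc += c * x[i - j]
        y.append(acc)
    return y


def fir_parallel(x, h, n):
    """Block-parallel FIR filter: n outputs per (n times slower) clock cycle."""
    y = []
    for start in range(0, len(x), n):          # one iteration models one clock cycle
        lanes = min(n, len(x) - start)
        # Each lane is an identical filter instance with constant coefficients;
        # lane p computes the output for sample index start + p.
        for p in range(lanes):
            i = start + p
            y.append(sum(c * x[i - j] for j, c in enumerate(h) if i - j >= 0))
    return y


x_demo = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0, 1.0, -2.0, 0.1, 0.9]
h_demo = [0.25, 0.5, 0.25]
y_serial = fir_serial(x_demo, h_demo)
y_parallel = fir_parallel(x_demo, h_demo, n=4)
print(max(abs(a - b) for a, b in zip(y_serial, y_parallel)))
```

In hardware the inner loop would be unrolled into n physical filter instances that share the same input window, which is where resource sharing between instances becomes possible.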
IV. SIGNAL-PROCESSING ALGORITHMS AND THEIR IMPLEMENTATION
Fig. 2 shows a possible flow of signal-processing steps for a single-carrier PSK or QAM receiver. After analog-to-digital conversion, imperfections of the receiver frontend need to be corrected. Then, the accumulated CD of the channel needs to be compensated for. In a next step, the symbol timing needs to be recovered. Next, the polarization rotation of the fiber needs to be undone. This is typically done in conjunction with equalization for PMD and other impairments. Finally, the carrier phase and frequency need to be recovered before a decision on the symbol can be made.
Fig. 3. Circuit for quadrature-imbalance compensation.
The order of processing steps, as outlined in Fig. 2, is not the only one possible. Changing the order of processing typically requires a modification of the actual algorithms. In the following section, a small selection of the processing steps and the implementation of the respective algorithms will be discussed in more detail.
A. Quadrature-Imbalance Compensation
In an initial step, impairments of the optical and electrical frontend are compensated. One example is the correction of quadrature imbalance stemming from imperfect phase control in the optical hybrid. Quadrature-imbalance compensation is well known in wireless communications and has been proposed to be used in optical communications as well. If there is no amplitude and offset error of the in-phase and quadrature components of the signal, quadrature-imbalance compensation can be performed by first measuring the cross correlation between the in-phase (I') and quadrature (Q') components of the received signal, which is proportional to the sine of the phase error \varphi of the optical hybrid

\[ \langle I'(t)\, Q'(t) \rangle = \frac{1}{2} \sin(\varphi) \qquad (1) \]

where I' and Q' are assumed to be normalized. The components can then be transformed into corrected orthogonal components I and Q by

\[ \begin{pmatrix} I(t) \\ Q(t) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\tan(\varphi) & 1/\cos(\varphi) \end{pmatrix} \begin{pmatrix} I'(t) \\ Q'(t) \end{pmatrix} \qquad (2) \]
Equations (1) and (2) can be implemented in a feedforward structure. This has two major drawbacks. First, the trigonometric functions have to be implemented in an LUT, and second, the normalization of the two components has to be performed accurately, as any error in the normalization will lead to quadrature imbalance.
The circuit shown in Fig. 3 avoids these drawbacks by employing a feedback structure. The cross correlation is measured after the actual compensation, weighted with a convergence factor, and integrated. After convergence is achieved, the cross correlation is zero on average and the output of the integrator is constant and, according to (1), proportional to the sine of the phase error. This result is then multiplied with the I-tributary and added to the Q-tributary to yield the corrected output. Note that, according to (2), the result should also be divided by the cosine of the phase error to yield the correct amplitude. This step is omitted in Fig. 3, as it would require an LUT and another multiplier. If a gain control circuit is placed after the quadrature-imbalance compensation, this would be automatically compensated for.
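The feedback structure can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not a transcription of Fig. 3: the function name, the step size mu, the sign convention of the integrator, and the test signal (unit-amplitude I and Q with a uniformly random carrier phase and a 10 degree hybrid phase error) are all chosen for illustration.

```python
import math
import random


def compensate_quadrature(i_in, q_in, mu=0.001):
    """Feedback quadrature-imbalance compensation (division by cos omitted)."""
    w = 0.0                      # integrator output; settles near -sin(phase error)
    q_out = []
    for i_s, q_s in zip(i_in, q_in):
        q_c = q_s + w * i_s      # integrator output times I, added to Q
        w -= mu * i_s * q_c      # cross correlation measured after compensation
        q_out.append(q_c)
    return q_out, w


random.seed(0)
phi = math.radians(10.0)         # assumed phase error of the hybrid
theta = [random.uniform(0.0, 2.0 * math.pi) for _ in range(40000)]
i_rx = [math.cos(t) for t in theta]
q_rx = [math.sin(t + phi) for t in theta]    # Q is not orthogonal to I

q_fix, w_final = compensate_quadrature(i_rx, q_rx)

tail = 10000
corr_before = sum(a * b for a, b in zip(i_rx[-tail:], q_rx[-tail:])) / tail
corr_after = sum(a * b for a, b in zip(i_rx[-tail:], q_fix[-tail:])) / tail
print(w_final, -math.sin(phi), corr_before, corr_after)
```

No trigonometric function and no explicit normalization appear inside the loop, which is the point of the feedback form. The remaining amplitude error of 1/cos(phi) on the Q-tributary is left to a subsequent gain control, as described above.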
B. CD Compensation
It is advisable to split the equalization of the received signal into two steps. First, perform a static or slowly adaptive equalization on each polarization tributary separately, and second, perform a fast adaptive joint equalization on both polarization tributaries. The first equalizer is typically chosen to have a much longer impulse response and can be used to compensate for quasistatic effects such as CD or the frequency response of the optical frontend. The second one, having a shorter impulse response but a faster adaptation speed, is typically used for polarization tracking and for equalization of PMD as well as of residual CD not compensated for by the static equalizer.
Typically, equalizers for the data rates considered here employ digital block filters. Block filtering involves the calculation of a finite set, or block, of output values based on a finite set of input values. This can be performed in the time domain or equivalently in the frequency domain. Algorithms have been developed for block filtering to achieve identical outputs as sequential filtering, most notably the overlap-and-save method and the closely related overlap-and-add method.
Let us assume that the input data sequence is partitioned into blocks of length n and that k is the impulse response length of the desired filter function. In the case of overlap-and-save, n input samples are concatenated with k samples from the next block, and then convolved with the impulse response. The first k samples of the output of the convolution are not used, while the remaining n samples constitute the correct filter output. Therefore, this method is often also referred to as overlap-and-dump or overlap-and-scrap.
In the case of overlap-and-add, n input samples are padded with k zeros before being convolved with the impulse response of the desired filter. After convolution, k trailing samples are stored to be added to the k leading samples of the following result of the convolution. Overlap-and-add is typically slightly more efficient in terms of implementation complexity and is often chosen when the filter response is static or changes only slowly with time. For fast adaptive filters, the overlap-and-save method is preferred, because each output block is then the filtering result of exactly one impulse response function, while in the case of the overlap-and-add method, the portion that is saved from the previous result to be added to the current result might have been calculated with a different impulse response function if the filter function changed between two clock cycles.
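Both block-filtering methods can be compared against sequential filtering in a short time-domain model. The sketch below is illustrative only and rests on a few assumptions: the overlap is set to len(h) - 1 samples (the memory of the filter), the overlapping samples in overlap-and-save are carried over from the preceding block (one common way to align the blocks), and all function names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_save(x, h, n):
    k = len(h) - 1                       # number of overlapping samples
    history = [0.0] * k                  # samples shared with the neighbouring block
    out = []
    for s in range(0, len(x), n):
        cur = x[s:s + n]
        block = history + cur
        y = convolve(block, h)
        out.extend(y[k:k + len(cur)])    # the first k outputs are discarded
        history = block[len(block) - k:]
    return out


def overlap_add(x, h, n):
    k = len(h) - 1
    tail = [0.0] * k                     # trailing samples of the previous result
    out = []
    for s in range(0, len(x), n):
        cur = x[s:s + n]
        y = convolve(cur, h)             # same as padding cur with k zeros
        for i in range(k):
            y[i] += tail[i]              # add the stored tail to the leading samples
        out.extend(y[:len(cur)])
        tail = y[len(cur):]
    return out


x_blk = [float((7 * i) % 11 - 5) for i in range(37)]
h_blk = [0.5, -0.25, 0.125, 1.0]
y_ref = convolve(x_blk, h_blk)[:len(x_blk)]   # sequential filtering
y_ols = overlap_save(x_blk, h_blk, n=8)
y_ola = overlap_add(x_blk, h_blk, n=8)
print(max(abs(a - b) for a, b in zip(y_ref, y_ols)),
      max(abs(a - b) for a, b in zip(y_ref, y_ola)))
```

In overlap_add the stored tail depends on the impulse response that was active during the previous block, which is the reason given above for preferring overlap-and-save when the filter adapts quickly.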
Block filtering can be efficiently implemented in the frequency domain, especially if the impulse response length is comparable to the block length. Complexity estimations that compare frequency-domain and time-domain implementation can, for instance, be found in .
Frequency-domain filtering requires the implementation of discrete Fourier and inverse discrete Fourier transforms. The most commonly used algorithm to implement Fourier transforms in hardware is the Cooley–Tukey fast Fourier transform (FFT) algorithm. The basic idea of the Cooley–Tukey algorithm is to break up a transformation of length N into two transformations, each of length N/2 (Danielson–Lanczos lemma). This can be done recursively until one reaches a transform of trivial size (two, four, or eight, for instance). The FFT need not be divided into only two parts, as described earlier (radix-2). Radix-4 implementations, where in each step the FFT is split into four sub-FFTs, are also very common, as are mixtures of radix-2 and radix-4 (split radix), which are most suitable for hardware implementation.
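The recursive splitting of the radix-2 case can be written down compactly. The following is a plain software sketch for illustration, not a hardware architecture; the function names are hypothetical, and the recursion stops at length one rather than at a larger trivial size.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])           # transform of length n/2 (even samples)
    odd = fft_radix2(x[1::2])            # transform of length n/2 (odd samples)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_direct(x):
    """Direct O(N^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


x_fft = [complex((3 * i) % 7 - 3, (5 * i) % 4 - 1.5) for i in range(16)]
X_fast = fft_radix2(x_fft)
X_ref = dft_direct(x_fft)
print(max(abs(a - b) for a, b in zip(X_fast, X_ref)))
```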
C. Timing Recovery
The received data and the sample timing need to be synchronized so that a fixed ratio (typically two samples per symbol) is established. Timing recovery comprises two components, a timing-error detector and an interpolator. Interpolation can easily be implemented utilizing an FIR filter. An introduction to interpolator filter design can be found in .
One of the most commonly used timing-error detectors in digital communication is the Gardner timing-error detector. Unfortunately, if PMD causes differential group delays approaching half a symbol duration, the Gardner timing-error detector will fail. Extensions of the Gardner timing-error detector have been proposed to overcome this limitation. Results of hardware implementations of this or other solutions have not been published yet.
D. Polarization Tracking
Polarization tracking and PMD equalization are typically performed using a two-in two-out adaptive filter. An adaptive filter can be partitioned into three parts: the actual filter bank, an error estimator, and a device for updating the filter coefficients.
The filter itself typically has a rather short impulse response. Because it needs to follow arbitrary polarization rotation, a rather fast update of the coefficients is required. Therefore, as discussed earlier, an overlap-and-save implementation is preferable.
In a second step, the error of the signal coming from the filter bank needs to be estimated. There are a number of techniques available for error estimation, namely insertion of training symbols, decision feedback, or measuring a known property of the signal. The first two have good tracking properties but require the inclusion of carrier synchronization in the feedback loop. The last one has advantages with respect to loop delay, and therefore potentially offers faster tracking speed. A very popular measure is the constant modulus criterion (see  and for a comprehensive treatment). The constant modulus criterion penalizes deviation of the amplitude of the equalized signal from a desired fixed value. It is obvious that this criterion is optimally suited for PSK-modulated signals. This criterion can also be utilized in QAM-modulated systems, albeit with a penalty with respect to noise and convergence speed.
Fig. 4. Circuit for polarization demultiplexing and equalization.
In a third step, updated filter coefficients have to be calculated from the estimated error. Several algorithms for this are known in the literature, for instance, the Wiener–Hopf solution or the method of steepest descent. Most practical from an implementation standpoint is the least-mean-square (LMS) algorithm. The idea of the LMS algorithm is to estimate the gradient of the error by partial derivatives of the mean-squared error with respect to the filter coefficients. The gradient estimates are calculated from instantaneous measures of the error, i.e., the difference between the desired amplitude and the instantaneous signal amplitude after the adaptive filter. In each step, the filter coefficients are updated by adding a small portion proportional to the negative gradient estimate. A weighting factor is again utilized for controlling the adaptation speed and residual error of the adapted filter coefficients. An exact formulation of this algorithm and a comparison with a decision-feedback structure can, for instance, be found in .
Fig. 4 shows a simplified circuit diagram of the aforementioned algorithm. From the schematic, it appears that the filter-coefficient update is more complex than the actual filtering itself. As block processing is utilized, one can consider reducing the complexity by not updating the filter coefficients every sample, but only once per block, based on one or multiple measurements of the filter output. This compromises the adaptation speed and residual noise somewhat, but this might be tolerable depending on the desired adaptation speed.
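The interplay of filter bank, constant-modulus error estimate, and stochastic-gradient coefficient update can be sketched for the simplest case. The code below is not the circuit of Fig. 4: it assumes a single tap per filter, one sample per symbol, an update on every sample, a noise-free channel that applies a fixed polarization rotation, and hypothetical names and parameter values throughout.

```python
import cmath
import math
import random


def cma_demux(rx, ry, mu=0.02):
    """Single-tap two-in two-out equalizer adapted with the constant modulus criterion."""
    hxx, hxy, hyx, hyy = 1 + 0j, 0j, 0j, 1 + 0j     # start from the identity
    out_x, out_y = [], []
    for a, b in zip(rx, ry):
        x = hxx * a + hxy * b                        # filter bank
        y = hyx * a + hyy * b
        ex = 1.0 - abs(x) ** 2                       # deviation from unit modulus
        ey = 1.0 - abs(y) ** 2
        hxx += mu * ex * x * a.conjugate()           # step along the negative
        hxy += mu * ex * x * b.conjugate()           # instantaneous gradient
        hyx += mu * ey * y * a.conjugate()
        hyy += mu * ey * y * b.conjugate()
        out_x.append(x)
        out_y.append(y)
    return out_x, out_y


def qpsk_symbol():
    return cmath.exp(1j * (math.pi / 4 + math.pi / 2 * random.randrange(4)))


random.seed(1)
sx = [qpsk_symbol() for _ in range(4000)]
sy = [qpsk_symbol() for _ in range(4000)]
angle, delta = 0.5, 0.7                              # assumed fixed rotation of the fiber
c, s = math.cos(angle), math.sin(angle)
rx_pol = [c * u - s * cmath.exp(-1j * delta) * v for u, v in zip(sx, sy)]
ry_pol = [s * cmath.exp(1j * delta) * u + c * v for u, v in zip(sx, sy)]

eq_x, eq_y = cma_demux(rx_pol, ry_pol)
dev_in = max(abs(1.0 - abs(v)) for v in rx_pol[-500:])
dev_out = max(max(abs(1.0 - abs(v)) for v in eq_x[-500:]),
              max(abs(1.0 - abs(v)) for v in eq_y[-500:]))
print(dev_in, dev_out)
```

The criterion is blind to the carrier phase, so the equalized symbols still carry an arbitrary phase rotation that a later carrier-synchronization stage has to remove. Updating only once per block, as suggested above, would amount to accumulating the four correction terms over a block before applying them.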
Often, a coherent receiver uses oversampling to enable better equalization of the received data. Twofold oversampling is simplest in terms of implementation. This means that the adaptive equalizer is fractionally spaced. For the calculation of the error, only baud-spaced output data are utilized. Nevertheless, as fractionally spaced filter coefficients are needed, updating has to be performed on the fractional data.
Fig. 5. Experimental setup.
E. Carrier Synchronization
Carrier synchronization (i.e., frequency offset and phase-error estimation and correction) is probably the most comprehensively treated topic in the literature, as this is the minimum processing that needs to be performed in any DSP-based coherent receiver. The first published work therefore concentrated on this topic . Frequency offset correction and phase-error correction are conceptually very similar, as both involve the estimation of an error, filtering of that estimate, and correction of the data utilizing the filtered estimate. Methods for frequency estimation have been proposed in .
Phase synchronization typically comprises two steps: first, removal of the modulated information to obtain an instantaneous phase estimate, and second, filtering of the phase estimate to minimize the influence of noise.
In most communication systems, information removal is performed by employing a decision-directed scheme, where the difference between a symbol before and after decision is taken as the instantaneous estimate for the phase error. Decision-directed schemes typically utilize feedback, which poses a challenge for pipelining and block processing. In a direct parallelization of the decision-directed feedback structure, the feedback delay is multiplied by the parallelization factor, which leads to an equivalent reduction in phase noise tolerance. Look-ahead techniques can be utilized to improve performance. In the case of PSK-modulated signaling, information removal can also be performed by applying a power-law nonlinearity, as proposed in . This is a feedforward technique that can be easily implemented in a block-processing scheme. The same holds for the postestimation filtering, as a simple FIR filter can be utilized.
V. EXPERIMENTAL RESULTS
Portions of the signal-processing steps discussed here were implemented in an FPGA-based receiver. Fig. 5 shows the test bed and receiver configuration.
The transmitter comprises a commercial external cavity laser (ECL) having a line width of approximately 100 kHz and an IQ modulator driven by two 2^31 − 1 pseudorandom binary sequences (PRBS) at a symbol rate of 6.1 GBd. The ECL was chosen for convenience of experimentation only, see  for performance comparison of DFB laser and ECL. A delay-and-add polarization multiplexer is used to generate a polarization-multiplexed signal, resulting in a total data rate of 24.4 Gb/s. A polarization scrambler (Agilent 11896A) running at a speed setting of 8, an optical attenuator, and erbium-doped fiber amplifiers produce a random state of polarization and noise loading.
On the receiver side, the signal is mixed with another ECL utilizing a polarization-diverse passive optical hybrid. Four single-ended pin-transimpedance amplifier (TIA) receivers perform the optical-electrical (O/E) conversion. A set of variable-gain amplifiers and electrical low-pass filters condition the signal prior to A/D conversion. The ADCs utilized in this setup are commercially available 8-bit data converters offered by Maxtek. The signal processing is implemented in four Xilinx Virtex 5 FPGAs. Interconnections between the ADCs and the FPGAs, as well as between the FPGAs, are implemented in low-voltage differential signaling (LVDS). The devices marked in Fig. 5 as FPGA 1 and FPGA 2 are used for quadrature-imbalance correction and automatic gain control. In FPGA 3, polarization demultiplexing, carrier synchronization, and decision are implemented. Polarization demultiplexing was implemented utilizing a constant modulus algorithm, as described earlier, with a tap length of 5. Frequency estimation was performed, as described in , with an averaging window of 512 samples. Phase estimation was implemented as a modified Viterbi and Viterbi phase estimator with a filter length of 12, with the four center coefficients having a weight of 1, while all other coefficient weights were set to 0.5. FPGA 4 performs error counting and signal synchronization.
Fig. 6. BER versus optical signal-to-noise ratio with (squares) and without (triangles) quadrature-imbalance correction.
The hybrid used in this experiment shows a quadrature imbalance of approximately 2° and 5° for the two orthogonal polarization orientations, respectively. Fig. 6 shows a bit error ratio (BER) measurement versus optical signal-to-noise ratio with the quadrature-imbalance correction enabled (squares) and disabled (triangles). As the phase error of the utilized hybrid is moderate, the improvement at high bit error ratios is moderate (about 0.25 dB at a BER of 10^-3), but it increases with improved signal-to-noise ratio. Essentially, the quadrature-imbalance correction reduces the flooring behavior of this receiver.
VI. CONCLUSION
In this paper, we outlined some of the challenges of implementing real-time coherent receivers utilizing DSP techniques. We discussed implementation aspects of some of the algorithms in more detail. We demonstrated a real-time FPGA-based coherent receiver at a data rate of 24.4 Gb/s.
Although FPGA-based coherent receivers currently do not offer sufficient processing capabilities for high-performance long-haul transmission systems, these platforms enable the development of novel processing algorithms and strategies. Therefore, the advent of ASIC-based receivers does not make research on FPGA-based receivers obsolete.
Andreas Leven (S'97–M'00–SM'07) received the Ph.D. (Dr.-Ing.) degree from Karlsruhe University, Karlsruhe, Germany, in 2000.
From 1997 to 2000, he was at the Fraunhofer Institute of Applied Solid State Physics, Freiburg, Germany. In 2000, he joined Bell Laboratories, Murray Hill, NJ, where he worked on high-speed optical receivers and broadband optical digital-to-analog converters. From 2008 to 2009, he was on leave with the Optical Networking Division, Alcatel-Lucent, Nuremberg, Germany. In 2009, he joined Bell Laboratories, Alcatel-Lucent, Stuttgart, Germany, where he is currently the Head of the High-Speed Systems and Processing Department. His research interests include signal processing for high-data-rate optical communication systems and coding in optical communications.
Noriaki Kaneda (S'98–M'00–SM'10) received the Ph.D. degree in electrical engineering from the University of California, Los Angeles, in 2000.
In 2000, he was a Member of Technical Staff of the Optical Networking Group, Lucent Technologies, Holmdel, NJ. Since 2007, he has been a Member of Technical Staff in the High-Speed Electronics Research Department, Bell Laboratories, Alcatel-Lucent, Murray Hill, NJ. He was engaged in advanced optical modulation formats, including direct-detection differential phase-shift keying and digital coherent quaternary phase-shift keying for high-bit-rate optical transmission systems. His current research interests include high-speed digital signal processing in optical transmission systems, advanced modulation formats for optical transmission systems, and microwave and millimeter-wave devices and antennas.
Stephen Corteselli studied electrical engineering and mathematics at the Massachusetts Institute of Technology, Cambridge.
He was a Member of the Technical Staff in the Department of Chemistry and Electrical Engineering, MIT. He was an Engineering Consultant at Harvard Medical School, Boston, MA, and Analogic Corporation, Peabody, MA. For nine years, he was an Engineering Consultant at Bell Laboratories, where he was engaged in forward-looking work and product development. Since 1997, he has been a Member of the Bell Laboratories Technical Staff.