
Parallel implementation of all-digital timing recovery for high-speed and real-time optical coherent receivers

Xian Zhou* and Xue Chen

Key Laboratory of Information Photonics and Optical Communications (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, No.10 Xi Tu Cheng Road, Haidian, Beijing 100876, China

*[email protected]

Abstract: Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments, and are therefore a promising candidate for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. To overcome this limitation, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of the receiver, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can complete synchronization, equalization and polarization de-multiplexing simultaneously. Finally, we demonstrate that the PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal at clock frequencies in the range of hundreds of MHz.

©2011 Optical Society of America

OCIS codes: (060.1660) Coherent communications; (060.4510) Optical communications.

References and links

1. C. R. S. Fludger, T. Duthel, D. van den Borne, and C. Schulien, “Coherent equalization and POLMUX-RZ-DQPSK for robust 100-GE transmission,” J. Lightwave Technol. 26(1), 64–71 (2008).

2. K. Roberts, D. Beckett, D. Boertjes, J. Berthold, and C. Laperle, “100G and beyond with digital coherent signal processing,” IEEE Commun. Mag. 48(7), 62–69 (2010).

3. S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express 16(2), 804–817 (2008).

4. S. J. Savory, G. Gavioli, R. I. Killey, and P. Bayvel, “Electronic compensation of chromatic dispersion using a digital coherent receiver,” Opt. Express 15(5), 2120–2126 (2007).

5. N. Kaneda, and A. Leven, “Coherent polarization-division-multiplexed QPSK receiver with fractionally spaced CMA for PMD compensation,” IEEE Photon. Technol. Lett. 21(4), 203–205 (2009).

6. C. Zhang, Y. Mori, K. Igarashi, K. Katoh, and K. Kikuchi, “Ultrafast operation of digital coherent receivers using their time-division demultiplexing function,” J. Lightwave Technol. 27(3), 224–232 (2009).

7. M. S. Alfiad, D. van den Borne, S. L. Jansen, T. Wuth, M. Kuschnerov, G. Grosso, A. Napoli, and H. de Waardt, “A comparison of electrical and optical dispersion compensation for 111-Gb/s POLMUX–RZ–DQPSK,” J. Lightwave Technol. 27(16), 3590–3598 (2009).

8. P. J. Winzer, A. H. Gnauck, C. R. Doerr, M. Magarini, and L. L. Buhl, “Spectrally efficient long-haul optical networking using 112-Gb/s polarization-multiplexed 16-QAM,” J. Lightwave Technol. 28(4), 547–556 (2010).

9. A. Leven, N. Kaneda, U. V. Koc, and Y. K. Chen, “Frequency estimation in intradyne reception,” IEEE Photon. Technol. Lett. 19(6), 366–368 (2007).

10. A. J. Viterbi, and A. M. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with applications to burst digital transmission,” IEEE Trans. Inf. Theory 29(4), 543–551 (1983).

11. Z. Xian, C. Xue, Z. Hai, Z. Weiqing, and F. Yangyang, “Parallel implementation of adaptive equalization for high-speed and real-time optical coherent receivers,” in The 19th Annual Wireless and Optical Communications Conference, May 2010.

12. X. Zhou, X. Chen, W. Zhou, Y. Fan, H. Zhu, and Z. Li, “All-digital timing recovery and adaptive equalization for 112 Gbit/s POLMUX-NRZ-DQPSK optical coherent receivers,” J. Opt. Commun. 2(11), 984–990 (2010).

13. F. M. Gardner, “Interpolation in digital modems–Part I: fundamentals,” IEEE Trans. Commun. 41(3), 501–507 (1993).

14. L. Erup, F. M. Gardner, and R. A. Harris, “Interpolation in digital modems–Part II: implementation and performance,” IEEE Trans. Commun. 41(6), 998–1008 (1993).

#143401 - $15.00 USD Received 1 Mar 2011; revised 20 Apr 2011; accepted 21 Apr 2011; published 27 Apr 2011(C) 2011 OSA 9 May 2011 / Vol. 19, No. 10 / OPTICS EXPRESS 9282

15. C. W. Farrow, “A continuously variable digital delay element,” in Proc. IEEE Int. Symp. Circuits & Syst., Espoo, Finland, vol. 3, pp. 2641–2645, June 1988.

16. F. M. Gardner, “A BPSK/QPSK timing-error detector for sampled receiver,” IEEE Trans. Commun. 34(5), 423–429 (1986).

17. F. M. Gardner, Phaselock Techniques, 3rd ed. (Wiley Interscience, 2007).

18. L. G. Kazovsky, L. Curtis, W. C. Young, and N. K. Cheung, “All-fiber 90° optical hybrid for coherent communications,” Appl. Opt. 26(3), 437–439 (1987).

1. Introduction

One hundred Gbit/s Ethernet (100-GE) is widely believed to be the line rate of choice for next-generation high-capacity optical transmission systems, succeeding the current 10 and 40 Gbit/s systems. High spectral efficiency is therefore required to fit the 50 GHz channel spacing of wavelength division multiplexed (WDM) systems and to improve tolerance to polarization mode dispersion (PMD) and chromatic dispersion (CD) [1,2]. One potential candidate is polarization multiplexing (POLMUX) combined with differential quadrature phase-shift keying (DQPSK) modulation, which enables 112 Gbit/s transmission at only 28 Gsymbol/s by encoding 4 bits/symbol. Furthermore, coherent detection is employed in the receiver to preserve all the information on the complex amplitude of the transmitted optical signal, including the state of polarization. This facilitates the use of digital signal processing (DSP) techniques to recover the polarization-multiplexed data and correct the line impairments in the electrical domain [3–5]. However, the maximum symbol rate that digital coherent receivers can process in real time is strictly limited by the processing speed of DSPs and field programmable gate array (FPGA) chips [6]. Therefore, in most transmission experiments based on digital coherent receivers, the received data are stored in a computer after the analog-to-digital converter (ADC), and the DSP is executed offline [7,8]. For now, a practical way to overcome this limitation without extensive hardware development is to investigate parallel processing algorithms.

In the POLMUX-DQPSK optical coherent receiver, the DSP operations mainly comprise coarse CD compensation, timing recovery, adaptive equalization and carrier recovery. Coarse CD compensation removes the CD accumulated along the long-haul optical link, which simplifies the link design and eliminates in-line optical dispersion compensation. Coarse CD compensators usually employ frequency-domain equalization to minimize the overall DSP complexity, and this is inherently a parallel approach. Carrier recovery usually adopts the 4th-power [9] and Viterbi-and-Viterbi [10] algorithms to compensate for the frequency and phase mismatch, respectively, between the incoming signal and the local oscillator. Both also support parallel processing because they are based on block processing. Among the DSP operations, therefore, the hardware speed limitation mainly affects the implementation of timing recovery and adaptive equalization. In [11], we proposed a parallel adaptive equalization scheme that can dynamically compensate for PMD and residual CD through large-scale parallel processing. In this paper, we focus on parallel timing recovery. Building on our previous work [12], we present a novel parallel digital timing recovery loop (PDTRL). Furthermore, to achieve high dispersion tolerance and a wide compensation range for the sampling clock offset (SCO), we propose a parallel joint scheme (PJS) that embeds a parallel adaptive equalizer in the PDTRL. Finally, the feasibility and effectiveness of the parallel schemes are demonstrated by simulation with 112-Gbit/s POLMUX-NRZ-DQPSK signals.

2. Parallel digital timing recovery loop (PDTRL)

Figure 1 depicts the proposed PDTRL, which is composed of m parallel units (PUs). Each PU completes the timing adjustment of the two samples of one symbol per parallel operation. In this case, therefore, the hardware clock frequency is reduced from $f_s$ to $f_s/(2m)$, where $f_s$ is the sampling frequency of the ADC.


Fig. 1. Block diagram of PDTRL. $Z^{-1}$: unit delay; l: the index of the parallel operation, l = 1,2,3…; m: the number of PUs.

From the perspective of a single PU, the operation is similar to that of a serial timing recovery loop. The interpolators perform timing adjustments to remove the timing error of the incoming digital signal according to control parameters provided by a controller. The interpolated samples are then passed to a timing error detector (TED), and the estimated timing error is filtered by a loop filter (LF) that is shared by all the PUs. The output of the LF is provided to every controller for computing the control parameters. To keep the PUs working independently of one another, several problems must be solved in the parallel operation. They are discussed below together with the working principle of each module.

2.1 ADC and register

The analogue baseband signal x(t) is digitized by the ADC, which typically samples at 2 samples/symbol. However, the sampling frequency cannot be perfectly accurate because of the non-ideal characteristics of oscillators. The largest SCO considered in this paper is 1000 ppm, that is, $\mathrm{SCO} = (f_s - 2f)/(2f) \in [-1000\,\mathrm{ppm}, 1000\,\mathrm{ppm}]$, where f is the symbol rate.

After the ADC, the asynchronous samples $x(nT_s)$ ($T_s$ is the sampling period of the ADC, $T_s = 1/f_s$) are stored in a register. The register updates 2m samples per parallel operation on a first-in-first-out (FIFO) basis and counts the stored samples, so that each interpolator can easily extract its basic samples according to the basepoint index $\eta$.

2.2 Interpolator

Each PU includes two interpolators, which complete the timing adjustment of the two samples of one symbol per parallel operation. As a tradeoff between performance and computational complexity, the Lagrange cubic (LC) interpolator [13,14] is again employed in the PDTRL. The kth synchronous sample $X(kT_i)$ at the output of the interpolator is given by

$$
\begin{aligned}
X(kT_i) ={}& x[(\eta_k+2)T_s]\left(\tfrac{1}{6}\mu_k^3-\tfrac{1}{6}\mu_k\right) + x[(\eta_k+1)T_s]\left(-\tfrac{1}{2}\mu_k^3+\tfrac{1}{2}\mu_k^2+\mu_k\right)\\
&+ x[\eta_k T_s]\left(\tfrac{1}{2}\mu_k^3-\mu_k^2-\tfrac{1}{2}\mu_k+1\right) + x[(\eta_k-1)T_s]\left(-\tfrac{1}{6}\mu_k^3+\tfrac{1}{2}\mu_k^2-\tfrac{1}{3}\mu_k\right),
\end{aligned} \tag{1}
$$

where $T_i$ denotes the synchronized sampling interval ($T_i = T/2$, T is the symbol period), $x(nT_s)$ denotes the nth received asynchronous sample, $\eta_k = \mathrm{int}[kT_i/T_s]$ (int[x] is the integer part of x) is the basepoint index of $X(kT_i)$, which identifies the correct set of signal samples, and $\mu_k = kT_i/T_s - \eta_k$ is the fractional interval of $X(kT_i)$, which identifies the correct set of filter coefficients.

For ease of hardware realization, formula (1) can be implemented with the Farrow structure [15]. $X(kT_i)$ is then computed from

$$
\begin{aligned}
X(kT_i) = \Big\{\Big[&\big(\tfrac{1}{6}x[(\eta_k+2)T_s]-\tfrac{1}{2}x[(\eta_k+1)T_s]+\tfrac{1}{2}x[\eta_k T_s]-\tfrac{1}{6}x[(\eta_k-1)T_s]\big)\mu_k\\
&+\big(\tfrac{1}{2}x[(\eta_k+1)T_s]-x[\eta_k T_s]+\tfrac{1}{2}x[(\eta_k-1)T_s]\big)\Big]\mu_k\\
&+\big(-\tfrac{1}{6}x[(\eta_k+2)T_s]+x[(\eta_k+1)T_s]-\tfrac{1}{2}x[\eta_k T_s]-\tfrac{1}{3}x[(\eta_k-1)T_s]\big)\Big\}\mu_k + x[\eta_k T_s].
\end{aligned} \tag{2}
$$

The Farrow structure of LC interpolator illustrated in Fig. 2 computes the synchronous sample directly by the fractional interval rather than the filter coefficients.

Fig. 2. Farrow structure of LC interpolator.
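The equivalence of the direct and Farrow forms can be checked numerically. The following is a minimal Python sketch, assuming the standard four-point cubic Lagrange coefficients of [14] with basic samples at offsets +2, +1, 0 and −1 from the basepoint; the function names and the argument names `base` and `mu` (basepoint index and fractional interval) are illustrative, not taken from the paper.

```python
def lagrange_cubic_direct(x, base, mu):
    """Direct form: weight the four basic samples with cubic Lagrange coefficients."""
    c_p2 = mu**3 / 6 - mu / 6                    # weight of x[base + 2]
    c_p1 = -mu**3 / 2 + mu**2 / 2 + mu           # weight of x[base + 1]
    c_0 = mu**3 / 2 - mu**2 - mu / 2 + 1         # weight of x[base]
    c_m1 = -mu**3 / 6 + mu**2 / 2 - mu / 3       # weight of x[base - 1]
    return (c_p2 * x[base + 2] + c_p1 * x[base + 1]
            + c_0 * x[base] + c_m1 * x[base - 1])


def lagrange_cubic_farrow(x, base, mu):
    """Farrow form: fixed-coefficient branches combined by Horner's rule in mu."""
    xp2, xp1, x0, xm1 = x[base + 2], x[base + 1], x[base], x[base - 1]
    v3 = xp2 / 6 - xp1 / 2 + x0 / 2 - xm1 / 6    # coefficient of mu^3
    v2 = xp1 / 2 - x0 + xm1 / 2                  # coefficient of mu^2
    v1 = -xp2 / 6 + xp1 - x0 / 2 - xm1 / 3       # coefficient of mu
    return ((v3 * mu + v2) * mu + v1) * mu + x0


# A cubic test signal is reproduced exactly by a four-point cubic interpolator.
samples = [float(n**3 - 2 * n) for n in range(8)]
direct = lagrange_cubic_direct(samples, 3, 0.25)
farrow = lagrange_cubic_farrow(samples, 3, 0.25)
print(direct, farrow)  # both approximate the signal value at t = 3.25
```

In the Farrow form the only run-time variable multiplier is `mu`, which is why the structure needs no coefficient look-up or recomputation when the fractional interval changes.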

In practice, the sampling frequency offset between the transmitter and receiver clocks is far smaller than the sampling rate, so the timing errors of two adjacent samples can be considered equal. To reduce computational complexity, the two interpolators in a PU share the control parameters provided by one controller. For example, the mth PU computes $X_{l,2m-1}$ and $X_{l,2m}$ in Fig. 1. $\mu_{l,m}$ is provided to the (2m-1)th and the (2m)th interpolators as the fractional interval of $X_{l,2m-1}$ and $X_{l,2m}$, and $\eta_{l,m}$ is used to identify their basic sample sets, $[x(\eta_{l,m}-2), x(\eta_{l,m}-1), x(\eta_{l,m}), x(\eta_{l,m}+1)]$ and $[x(\eta_{l,m}-1), x(\eta_{l,m}), x(\eta_{l,m}+1), x(\eta_{l,m}+2)]$, respectively.

Since the input asynchronous sample sequence x is known, the interpolator needs only two pieces of control information, the basepoint index $\eta$ and the fractional interval $\mu$, to perform its computation. The task of the remaining modules is therefore to calculate $\eta$ and $\mu$.

2.3. Timing error detector (TED)

Each PU includes a timing error detector, which provides the magnitude and direction of the required timing adjustment. For its low complexity and low sampling-rate requirement, the Gardner algorithm [16] is employed in the TED. It operates on three adjacent timing-adjusted samples spaced by half the symbol period,

$$\varepsilon_{l,k} = X_{l,2k-1}\,[X_{l,2k} - X_{l,2(k-1)}], \tag{3}$$

where $\varepsilon_{l,k}$ is the estimated timing error of the $[(l-1)m+k]$th symbol, $X_{l,2(k-1)}$, $X_{l,2k-1}$ and $X_{l,2k}$ denote three adjacent timing-adjusted samples, and $X_{l,2k-1}$ and $X_{l,2k}$ are the two samples of the $[(l-1)m+k]$th symbol. Based on formula (3), when the clock is recovered, the second sample of each symbol is adjusted to its optimum detection point.

Note that the TED depends on an output sample of the adjacent PU to complete the timing error computation. Between two parallel operations, therefore, the sample $X_{l,2m}$ of the last PU must be stored and used to compute the timing error of the first symbol in the next parallel operation (see Fig. 1).

For (D)QPSK signals, the timing adjustments of the real and imaginary parts of a symbol must be identical. Therefore, the estimated timing error of a symbol is the sum of the timing errors of its real and imaginary parts,

$$\varepsilon_{l,k} = XI_{l,2k-1}\,[XI_{l,2k} - XI_{l,2(k-1)}] + XQ_{l,2k-1}\,[XQ_{l,2k} - XQ_{l,2(k-1)}], \tag{4}$$

where XI and XQ denote the real and imaginary parts of the signal, respectively. After the TED, the LF and the controller work as in Fig. 1. However, the outputs of a controller must be provided to four interpolators in a PU: two interpolators perform the timing adjustment of the real part of the (D)QPSK signal and the other two that of the imaginary part.
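As an illustration of the detector and of the sample that must be carried from one parallel operation to the next, here is a minimal Python sketch. It assumes each parallel operation delivers its 2m timing-adjusted samples as a list of complex numbers, ordered so that every symbol contributes a mid-symbol sample followed by a decision sample; the function names and the `carry` argument are hypothetical.

```python
def gardner_error(prev_strobe, mid, strobe):
    """Gardner timing error of one complex symbol: sum of the real-part and
    imaginary-part errors, each formed as mid * (strobe - previous strobe)."""
    return (mid.real * (strobe.real - prev_strobe.real)
            + mid.imag * (strobe.imag - prev_strobe.imag))


def parallel_ted(block, carry):
    """Timing errors of one parallel operation.

    block: 2m interpolated samples of this operation (two per PU).
    carry: last sample of the previous operation, needed by the first PU.
    Returns the m errors and the sample to carry into the next operation.
    """
    m = len(block) // 2
    # Prepend the stored sample so every PU can read its neighbour's output.
    extended = [carry] + list(block)
    errors = [gardner_error(extended[2 * k], extended[2 * k + 1], extended[2 * k + 2])
              for k in range(m)]
    return errors, block[-1]


block = [0 + 0j, 1 + 1j, 0.5 + 0j, -1 + 1j]
errors, carry = parallel_ted(block, carry=-1 - 1j)
print(errors, carry)
```

Every entry of `errors` depends only on the input samples, not on another entry, so the m detectors can run side by side; the only state crossing operation boundaries is the single stored sample.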

2.4. Loop filter (LF)

The LF filters the phase noise of the TED output and extracts the stable component to control a numerically controlled oscillator (NCO). To track the timing phase and frequency offset between the transmitter and receiver clocks effectively, the proportional-plus-integral (P + I) LF is usually adopted in the serial timing recovery loop [17]. However, the (P + I) LF cannot be used independently in each PU, because the integral element of the LF is a memory module; that is, the nth output of the integrator depends on its (n-1)th output.

Fig. 3. The loop filter’s structure (a) conventional (P + I) LF, (b) accumulated (P + I) LF.

Therefore, all the PUs have to share one (P + I) LF that runs only once per parallel operation period in the PDTRL. In this situation, only one estimated timing error can be selected for processing by the (P + I) LF in each parallel operation. To keep the information up to date, we use the last timing error $\varepsilon_{l,m}$ to compute $W_l$, which is defined as the output of the LF (see Fig. 3(a)). However, this approach greatly reduces the tracking speed of the LF as the number of PUs increases, which significantly degrades the synchronization performance. Consequently, the conventional (P + I) LF cannot support large-scale parallel processing.

Because tracking of the timing frequency offset, or of changes in the timing phase error, depends mainly on the integral element of the LF, we use the integral element to accumulate all the estimated timing errors in order to maintain high tracking speed and accuracy. The difference equations of the accumulated (P + I) LF (see Fig. 3(b)) are as follows.

1) Proportional element:

$$P_l = k_1\,\varepsilon_{l,m}, \tag{5}$$

2) Integral element:

$$I_l = k_2\varepsilon_{l,1} + k_2\varepsilon_{l,2} + \cdots + k_2\varepsilon_{l,m} + I_{l-1} = k_2\,(\varepsilon_{l,1} + \varepsilon_{l,2} + \cdots + \varepsilon_{l,m}) + I_{l-1}, \tag{6}$$

where $k_1$ is the gain of the proportional element, $k_2$ is the gain of the integral element and $\varepsilon$ is the output of the TED. The output of the LF is

$$W_l = P_l + I_l. \tag{7}$$
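A minimal Python sketch of the accumulated (P + I) loop filter follows: the proportional term uses only the last timing error of a parallel operation, while the integrator accumulates all m errors. The class name, the gain values in the usage lines and the initial integrator value of 1.0 (chosen here because the control word is expected to settle near Ts/Ti, which is close to 1) are assumptions for illustration only.

```python
class AccumulatedPILoopFilter:
    """Shared loop filter that is updated once per parallel operation."""

    def __init__(self, k1, k2, integrator=1.0):
        self.k1 = k1                  # gain of the proportional element
        self.k2 = k2                  # gain of the integral element
        self.integrator = integrator  # integrator state kept between operations

    def update(self, errors):
        """errors: the m TED outputs of one parallel operation, in PU order."""
        proportional = self.k1 * errors[-1]        # last error only
        self.integrator += self.k2 * sum(errors)   # all m errors accumulated
        return proportional + self.integrator      # control word for the NCOs


lf = AccumulatedPILoopFilter(k1=0.5, k2=0.25)
w = lf.update([0.2, -0.1, 0.4])
print(w)
```

Because the integrator receives the sum of all m errors, its state after one parallel operation matches what a serial integrator would hold after m updates with the same errors; only the proportional path is updated less often.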

2.5. Controller

In the controller, the two control parameters, the basepoint index $\eta$ and the fractional interval $\mu$, are calculated from the output of the LF and provided to the interpolators. More specifically, the calculation can be divided into two parts: computing the NCO value and determining the control parameters.

A. Computing NCO’s value

The NCO is a special kind of integrator [13], which can be expressed as

$$N(k) = [N(k-1) - W(k-1)]\ \text{mod-1}, \tag{8}$$

where N(k) is the kth output sample of the NCO, W(k) is the kth output sample of the LF, which serves as the control word of the NCO and is nearly $T_s/T_i$ under loop equilibrium, and the notation mod-1 means that $N \in [0,1)$.

According to formula (8), the computation of N(k) depends on N(k-1). This dependence cannot be accommodated in the parallel loop, because parallel processing requires that the same modules in different PUs run independently. Therefore, the NCO calculation method has to be changed in the PDTRL. In a parallel operation, all the NCOs have the same input and know their own position index. As shown in Table 1, these two pieces of information suffice to compute the NCO values accurately and independently.

Table 1. Two Calculation Methods of NCO

| NCO’s value | Correlative calculation method | Independent calculation method |
|---|---|---|
| $N_{l,1}$ | $(N_{l-1,m} - W_{l-1})$ mod-1 | $(N_{l-1,m} - W_{l-1})$ mod-1 |
| $N_{l,2}$ | $(N_{l,1} - W_l)$ mod-1 | $(N_{l-1,m} - W_{l-1} - W_l)$ mod-1 |
| $N_{l,m}$ | $(N_{l,m-1} - W_l)$ mod-1 | $(N_{l-1,m} - W_{l-1} - (m-1)W_l)$ mod-1 |

Here, the mth NCO value and the LF output sample of the previous parallel operation ($N_{l-1,m}$ and $W_{l-1}$) have to be preserved as the initial values of the NCOs in the current parallel operation. The preservation and assignment can be carried out while the other modules are working, so the real-time operation is not affected.
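The two columns of Table 1 can be compared with a short Python sketch. It assumes a decrementing NCO with wrap-around into [0, 1), implemented with Python's floating-point `%` operator; the function and argument names are illustrative.

```python
def nco_correlative(n_prev_last, w_prev, w_cur, m):
    """Serial-style recursion: each NCO value depends on the previous one."""
    values = [(n_prev_last - w_prev) % 1.0]
    for _ in range(1, m):
        values.append((values[-1] - w_cur) % 1.0)
    return values


def nco_independent(n_prev_last, w_prev, w_cur, m):
    """Parallel-friendly form: the kth value needs only the stored N and W of the
    previous operation, the current W and the PU's own position index k."""
    return [(n_prev_last - w_prev - (k - 1) * w_cur) % 1.0
            for k in range(1, m + 1)]


serial = nco_correlative(0.6, 1.001, 1.002, 8)
parallel = nco_independent(0.6, 1.001, 1.002, 8)
print(serial)
print(parallel)
```

The two lists agree up to floating-point rounding, since taking the remainder after every step or once at the end gives the same result. In a fixed-point hardware register the wrap-around would come for free from discarding the overflow bits.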

B. Determining the two control parameters $\eta$ and $\mu$

The NCO values change and recycle between 0 and 1 according to W. The changes of the NCO can be used to indicate the time at which a new interpolant is to be computed, as shown in Fig. 4.


Fig. 4. NCO relation.

The instant at which the NCO register content decreases through zero corresponds to a synchronous sampling instant, so the fractional interval can be calculated from the NCO register content at the recycling instant [13]. From the similar triangles in Fig. 4, it can be seen that

$$\frac{\mu_k T_s}{N(k)} = \frac{(1-\mu_k)\,T_s}{W(k) - N(k)}, \tag{9}$$

which can be solved for $\mu_k$ as

$$\mu_k = \frac{N(k)}{W(k) - N(k) + N(k)} = \frac{N(k)}{W(k)}. \tag{10}$$

An estimate of $\mu_k$ can thus be obtained by dividing the two values N and W, which are available from the NCO.


Fig. 5. The relationship among the NCO values, the basepoint indexes and the sampling times. (a) Ts = Ti, (b) Ts<Ti, W = Ts/Ti = 0.83 and (c) Ts>Ti, W = Ts/Ti = 1.17. For clarity of illustration, the values of Ts/Ti in Fig. 5(b), 5(c) are far from realistic values.

Next, the other control parameter, the basepoint index $\eta$, needs to be calculated in order to identify the correct set of basic samples for a new interpolant. The determination of $\eta$ is closely associated with the frequency offset between the transmitter clock and the receiver clock. Figure 5 illustrates the relation among the NCO values, the basepoint indexes and the input and output sampling times under three frequency offset conditions. For the sake of discussion, the control word W is assumed to be constant (W = Ts/Ti), ignoring the early non-equilibrium stage of the loop, and N(1) = 0.6.

When $T_s = T_i$ (see Fig. 5(a)), there is only a fixed sampling phase offset (SPO), so the fractional intervals of the interpolants stay the same and the basepoint indexes increase steadily ($\eta_k = \eta_{k-1} + 1$). The numbers of input and output samples do not differ. In contrast, when $T_s \neq T_i$, the numbers of input and output samples inevitably diverge after a sufficiently long time. In this case, we rely on $\eta$ to make the interpolators perform the correct adjustment. As can be seen from Fig. 5(b), 5(c), $\eta_k$ then does not always equal $\eta_{k-1} + 1$, as it does when $T_s = T_i$. When a sample should be removed, $\eta_k$ is $\eta_{k-1} + 2$ (see Fig. 5(b)); when a sample should be inserted, $\eta_k$ is $\eta_{k-1}$ (see Fig. 5(c)). Here, we can use the values of N and W to judge the proper time to insert or remove a sample. Figure 6(a) shows the complete pseudocode of the controller, which includes the procedures for calculating the NCO value and determining the two control parameters in the serial digital timing recovery loop (SDTRL).


(a)
    N(k) = (N(k-1) - W(k-1)) mod-1;
    if N(k-1) - W(k-1) < -1
        η_k = η_{k-1};
        μ_k = N(k) / W(k);
    elseif N(k) - W(k) > 0
        η_k = η_{k-1} + 2;
        N(k) = (N(k) - W(k)) mod-1;
        μ_k = N(k) / W(k);
    else
        η_k = η_{k-1} + 1;
        μ_k = N(k) / W(k);
    end

(b)
    N'_{l,k} = N_{l-1,m} - W_{l-1} - (k-1) W_l;
    N_{l,k} = (N'_{l,k}) mod-1;
    Δη_{l,k} = k - (N_{l,k} - N'_{l,k});
    if N_{l,k} - W_l > 0 or Δη_{l,k} = 1
        N_{l,k} = (N_{l,k} - W_l) mod-1;
        η_{l,k} = η_{l-1,m} + 2k + 1;
        μ_{l,k} = N_{l,k} / W_l;
    else
        η_{l,k} = η_{l-1,m} + 2k + Δη_{l,k};
        μ_{l,k} = N_{l,k} / W_l;
    end

Fig. 6. Matlab pseudocode of controller for (a) the SDTRL, (b) the PDTRLs.
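The serial controller can be exercised numerically with the following minimal Python sketch. It is one reading of the procedure described for the SDTRL, not a transcription of the authors' code: the two threshold tests (a double wrap of the NCO signals an inserted sample, a missing wrap signals a removed sample) are assumptions inferred from the behaviour described for Fig. 5, and the function and variable names are hypothetical.

```python
def serial_controller_step(n_prev, w_prev, w_cur, base_prev):
    """One update of the serial controller.

    Returns the new NCO value, basepoint index and fractional interval.
    """
    raw = n_prev - w_prev
    n = raw % 1.0                  # decrementing NCO with wrap into [0, 1)
    if raw < -1.0:
        # NCO wrapped twice within one input sample: insert (reuse the basepoint).
        base = base_prev
    elif n - w_cur > 0.0:
        # NCO would not wrap on the next input sample: remove (skip one sample).
        base = base_prev + 2
        n = (n - w_cur) % 1.0
    else:
        base = base_prev + 1       # regular case: advance by one input sample
    mu = n / w_cur                 # fractional interval from the NCO content
    return n, base, mu


# Exaggerated offset as in the Ts < Ti illustration: W = 0.83, N(1) = 0.6.
n, base = 0.6, 0
for step in range(1, 8):
    n, base, mu = serial_controller_step(n, 0.83, 0.83, base)
    print(step, base, round(mu, 3))
```

With W = 0.83 the basepoint occasionally advances by two, so that on average it moves by about 1/0.83 input samples per interpolant; with W = 1 it advances by exactly one per step and the fractional interval stays constant.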

Unlike in the SDTRL, the basepoint index of each PU has to be obtained independently in the PDTRL. Here, the determination of $\eta_{l,k}$ $(0 < k \le m)$ requires knowing whether any $\eta_{l,i}$ $(0 < i < k)$ has been adjusted by the insertion or removal of a sample. We therefore express $\eta_{l,k}$ as

$$\eta_{l,k} = \eta_{l-1,m} + 2k + \Delta\eta_{l,k}, \tag{11}$$

where $\eta_{l-1,m}$ is the mth basepoint index of the $(l-1)$th parallel operation and $\Delta\eta_{l,k}$ is the adjustment of $\eta_{l,k}$, which is obtained as the position index k minus the number of recycles of the kth NCO content,

$$\Delta\eta_{l,k} = k - (N_{l,k} - N'_{l,k}), \tag{12}$$

where $N'_{l,k} = N_{l-1,m} - W_{l-1} - (k-1)W_l$ is the NCO value before the mod-1 operation. When the ith $(i \le k)$ PU has inserted a sample, or the kth PU needs to insert a sample, $\Delta\eta_{l,k} = -1$; when the ith $(i<k)$ PU has removed a sample, $\Delta\eta_{l,k} = 1$; when no PU before the kth has adjusted a sample, $\Delta\eta_{l,k} = 0$. However, there is an ambiguity: $\Delta\eta_{l,k}$ still equals zero when the removal of a sample occurs in the kth PU itself. An auxiliary condition is therefore needed to decide whether the present PU has to remove a sample. Furthermore, $N_{l,k}$ needs to be updated if the removal is performed in the ith $(i \le k)$ PU. The detailed controller procedure for the PDTRL is described in Fig. 6(b).
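The independent computation of the control parameters can be sketched in Python as follows. This is a hedged reading of the procedure described above, not the authors' implementation: the sign conventions (adjustment −1 after an insertion, +1 after a removal) and the auxiliary removal test are assumptions chosen so that the parallel result is consistent with a serial NCO recursion, and all names are hypothetical.

```python
def parallel_controller(n_prev_last, w_prev, w_cur, base_prev_last, m):
    """Control parameters of all m PUs for one parallel operation.

    n_prev_last, w_prev: last NCO value and LF output of the previous operation.
    w_cur: LF output of the current operation (shared by all PUs).
    base_prev_last: last basepoint index of the previous operation.
    Returns a list of (nco, basepoint, mu) tuples, one per PU.
    """
    results = []
    for k in range(1, m + 1):      # iterations are independent of each other
        n_raw = n_prev_last - w_prev - (k - 1) * w_cur  # NCO before wrap-around
        n = n_raw % 1.0
        wraps = round(n - n_raw)   # number of times the NCO has recycled
        delta = k - wraps          # 0: no adjustment, -1: insertion, +1: removal
        if n - w_cur > 0.0 or delta == 1:
            # Removal in this PU (auxiliary test) or in an earlier PU.
            n = (n - w_cur) % 1.0
            base = base_prev_last + 2 * k + 1
        else:
            base = base_prev_last + 2 * k + delta
        results.append((n, base, n / w_cur))
    return results


for n, base, mu in parallel_controller(0.6, 1.0, 1.0, 10, 4):
    print(base, round(mu, 3))
```

Each loop iteration reads only the shared inputs and its own index k, so in hardware the m iterations could be evaluated concurrently. The sketch assumes at most one inserted or removed sample per parallel operation, which holds when the clock offset is small relative to 1/m.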

3. Parallel joint scheme of digital timing recovery and adaptive equalization

In sampled receivers, timing recovery usually has to be performed at the front end of the digital signal processing, because the subsequent algorithms, such as adaptive equalization and carrier frequency estimation, depend strongly on synchronous samples. A certain dispersion tolerance is therefore desirable in timing recovery algorithms.


Fig. 7. Block diagram of parallel timing recovery and equalization.

However, the Gardner algorithm used in the TED has a low tolerance to chromatic dispersion (CD), which causes timing recovery to fail when the residual CD is larger than about 300 ps/nm in the 112-Gbit/s POLMUX-NRZ-DQPSK system. In our previous work [12], a joint scheme of timing recovery and adaptive equalization was proposed, which not only increases the CD tolerance effectively but also achieves synchronization, equalization and polarization de-multiplexing simultaneously without any extra computational cost. This joint method is therefore also employed in this paper. The parallel butterfly-structured adaptive equalizer proposed in [11] is embedded in the PDTRL between the interpolators and the timing error detectors, as shown in Fig. 7. The data type conversion combines the output samples of the interpolators into complex signals according to the correspondence between the real and imaginary parts of the signals.

4. Simulations and discussions

4.1 Simulation system model

VPItransmissionMaker 8.3 is used in our simulations to build the transmitter, fiber link and front end of the optical coherent receiver of the 112-Gbit/s PM-NRZ-DQPSK system. The subsequent DSP blocks are then carried out in MATLAB. The simulation setup is shown in Fig. 8.


Fig. 8. (a) Simulation setup of the 112-Gbit/s POLMUX-NRZ-DQPSK system, (b),(c) schematics of the DSP.

In the transmitter, two 28-Gb/s pseudo-random bit sequences (PRBS) with a length of $2^{15}-1$ drive a parallel Mach-Zehnder (MZ) modulator to generate a DQPSK signal in a single state of polarization (V). The other state of polarization (H) is modulated in a similar way. The two states of polarization are then combined by a polarization beam combiner (PBC) to generate a POLMUX-DQPSK signal. The fiber link is composed of single-mode fiber (SMF), erbium-doped fiber amplifiers (EDFAs) and an optical filter. Here, the optical signal-to-noise ratio (OSNR) can be adjusted by changing the noise figure of the EDFA. At the receiver, the signal is divided into two arbitrary states of polarization (X and Y) by a polarization beam splitter (PBS). Both states of polarization are then mapped from the optical field into four electrical signals by passive quadrature hybrids with balanced detectors [18]. Next, the electrical signals are digitized by ADCs with 8-bit resolution and stored for DSP in MATLAB. The sample rate can be adjusted to generate asynchronous samples.

The schematics of the DSP used to test the performance of the PDTRL and the PJS are shown in Fig. 8(b) and 8(c), respectively.

4.2 Simulation results and analysis

A. Parallel digital timing recovery loop results

First of all, the performance of the PDTRL is investigated under the condition of a fixed SPO, expressed as $\Delta T$. In this case, the OSNR is 16.5 dB and $\Delta T$ is 0.6Ts (Ts = T/2, SCO = 0 ppm); other impairments are not considered. The noise bandwidth BL of the PDTRL is set to 0.003. The detailed parameter setting method of the digital timing recovery loop can be found in [12]. Figures 9(a) and 9(b) show the control word (the output of the LF) and the fractional interval (the output of the controller) for the X-polarization, respectively. Although the LF of the PDTRL runs only once per parallel operation period, the control words converge rapidly to approximately 1 when the number of PUs, m, equals 32, 64 and 128, because all the estimated timing errors are accumulated in the integral element of the LF. The convergence speed of the PDTRL is nearly the same as that of the SDTRL (see Fig. 9(a)), and the fractional intervals stabilize near 0.6 (see Fig. 9(b)). This demonstrates that the PDTRL correctly estimates and compensates the fixed SPO. In the PDTRL, however, some residual timing phase errors are introduced by the extended update period of the proportional element of the LF and accumulate over the period of a parallel operation, which causes jitter in the fractional intervals (see Fig. 9(b)). Nevertheless, this small phase error is not reflected in the size of the fluctuations of the control word and does not result in a marked increase in the number of bit errors. In the above situation, the bit error rates (BERs) remain around 3.2E-4.

Fig. 9. (a) The control word of LF for X-polarization and (b) the fractional interval of controller for X-polarization.

In practical coherent optical transmission systems, there is always some frequency mismatch between the transmitter and receiver clocks, namely the SCO mentioned above, rather than a fixed SPO. It can be regarded as a time-varying SPO. Therefore, we further investigate the performance of the PDTRL at SCOs of 100 ppm, 500 ppm and 1000 ppm. The other conditions and parameter settings are the same as in the previous simulations. For each measurement point, the result is calculated over 5 sets of simulated data to enhance the statistical reliability.

Fig. 10. (a) The required number of symbols to achieve synchronization and (b) the BER performances vs. the number of PUs for various SCO.

Figure 10(a) shows the number of symbols required to achieve synchronization in the PDTRL (judged by whether the mean square error of the timing error (T_MSE) has converged) with m varied from 1 to 160, corresponding to hardware clock frequencies decreasing from 28 GHz to 175 MHz, which current DSP chips can readily handle. It can be seen that the locking speed decreases as the SCO increases, and that the higher the SCO, the more the locking speed is affected by an increase in m. These results are obtained with the same PDTRL parameter setting. As in the SDTRL, synchronization can be accelerated by increasing BL [12]. After loop equilibrium (locked synchronization), the BERs are measured as shown in Fig. 10(b). It can be seen that the PDTRL supports large-scale parallel processing with only a small BER penalty, and that the BER shows no trend of deterioration as the SCO increases. The good BER performance of the PDTRL can be attributed to two factors: the SPO changes slowly even when the SCO increases to 1000 ppm, because the sampling rate is far greater than the sampling frequency offset, and the accumulated (P + I) LF in the PDTRL guarantees fast tracking of the SPO change.
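The quoted clock frequencies follow from simple arithmetic, sketched below in Python. The sketch assumes a nominal symbol rate of 28 Gsymbol/s, 2 samples per symbol, and that each PU handles two samples per parallel operation, so that one hardware clock cycle consumes 2m samples; the function name is illustrative.

```python
def hardware_clock_hz(symbol_rate_hz, num_pus, samples_per_symbol=2):
    """Clock rate needed when num_pus units each take two samples per cycle."""
    sampling_rate_hz = samples_per_symbol * symbol_rate_hz
    return sampling_rate_hz / (2 * num_pus)


for m in (1, 32, 64, 128, 160):
    print(m, hardware_clock_hz(28e9, m) / 1e6, "MHz")
```

Under these assumptions m = 1 gives 28 GHz and m = 160 gives 175 MHz, matching the range stated above.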

Moreover, a certain carrier phase tolerance is desirable for the PDTRL in optical coherent receivers, because timing recovery usually operates prior to carrier recovery. To investigate the PDTRL performance under strong carrier phase impairments, we use the PDTRL with 160 PUs to compensate an SCO of 1000 ppm under the conditions of 16.5 dB OSNR, 1 MHz transmitter and LO laser linewidths and a 3 GHz frequency offset between the two lasers.

Fig. 11. (a) The fractional interval of controller, (b) constellation diagrams after timing recovery using PDTRL with 160 PUs.

In Fig. 11(a), 11(b), the cyclical change of the fractional interval every 500 symbols and the ring-shaped constellation indicate that, despite the severe carrier frequency and phase offsets, the PDTRL successfully recovers synchronous samples. This high tolerance of the PDTRL results from the carrier phase insensitivity of the Gardner timing error detection algorithm.
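The 500-symbol period is consistent with a short back-of-the-envelope calculation. The sketch below assumes that the SCO is the relative deviation of the sampling frequency from twice the symbol rate, that the loop tracks this offset exactly, and that the fractional interval $\mu$ is measured in units of the sampling period $T_s$; it is a plausibility check rather than a derivation from the paper.

```latex
% Relative slip of the interpolation instant per interpolant (two per symbol):
\frac{T_i}{T_s} = \frac{f_s}{2f} = 1 + \mathrm{SCO}
\quad\Longrightarrow\quad
|\Delta\mu| \approx \mathrm{SCO} = 10^{-3} \text{ per interpolant}
            = 2\times10^{-3} \text{ per symbol}.
% The fractional interval spans [0,1), so one full cycle takes
\frac{1}{2\times10^{-3}} = 500 \text{ symbols}.
```

At 100 ppm or 500 ppm the same reasoning would give cycles of 5000 and 1000 symbols, respectively.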

B. Parallel joint scheme results

In the joint scheme, timing recovery and equalization must cooperate and remain compatible with each other. Therefore, the parameters of the joint scheme are adjusted in two steps. Initially, the noise bandwidth BL of the timing recovery is set to 0.004 and the convergence parameter u [11] is set to 0.003. These large values are used to accelerate synchronization and dispersion compensation. When T_MSE and the mean square error of the equalizer (E_MSE) of both polarizations reach the steady state, we reset BL = 0.0015 to improve the precision of synchronization and equalization. In addition, the dynamic CD compensation range of the joint scheme is determined by the number of FIR taps in the equalizer. The largest residual CD considered in this paper is 800 ps/nm, for which no more than 21 taps are required in each filter [12]. In the following simulations, the number of taps is set to 17.

Fig. 12. BER vs. the number of PUs for 16.5 dB of OSNR, 1000 ppm of SCO, (a) 10 ps of PMD plus various residual CD, (b) 800 ps/nm of CD plus various PMD.

To investigate the performance of the PJS, SCO, CD and PMD are added to the signal. Figure 12 shows the BER performance as a function of the number of PUs for 16.5 dB of OSNR, 1000 ppm of SCO and different amounts of CD and PMD. As expected, there is no notable BER penalty associated with increasing CD and PMD, which demonstrates that the PJS overcomes the low CD tolerance of the TED and achieves synchronization, equalization and polarization de-multiplexing simultaneously. Furthermore, because the speed and accuracy of synchronization and equalization are guaranteed by the accumulated (P + I) LF and the parallel constant modulus algorithm [11], respectively, the BER does not deteriorate significantly as the number of PUs increases. In the above cases, the number of symbols required for the PJS to reach a stable state (judged by whether T_MSE and E_MSE of both polarizations have converged) is no more than 14000, corresponding to 0.5 microseconds.

Fig. 13. BER vs. OSNR using the SJS and the PJS with 96 PUs respectively for 1000 ppm of SCO, 800 ps/nm of CD and 10 ps of PMD.

In Fig. 13, we compare the results obtained with the serial joint scheme (SJS) and the PJS with 96 PUs. Here, SCO = 1000 ppm, CD = 800 ps/nm, PMD = 10 ps and the OSNR is varied from 14 dB to 18 dB. It can be seen from Fig. 13 that the BER performance of the PJS is almost equivalent to that of the SJS. However, its required hardware processing rate is only 0.52% of that of the SJS, which corresponds to a hardware frequency of 293 MHz. Furthermore, the performance of the PJS depends strongly on the OSNR. The BER stays within the FEC limit of 3E-3 at an OSNR of 16.5 dB or above.

5. Conclusion

Parallel processing algorithms are critical for reducing the required hardware processing rate in high-speed optical coherent receivers. In this paper, we have proposed a novel PDTRL based on our previous work, and a PJS that embeds a parallel adaptive equalizer in the proposed parallel timing recovery loop to improve the dispersion tolerance of the receiver. Detailed simulations reveal that the PDTRL and PJS can successfully process 112 Gbit/s POLMUX-NRZ-DQPSK signals with multiple PUs without significant performance degradation. Their implementation allows the hardware clock frequency to be in the range of hundreds of MHz, which current DSP chips can readily handle.

Acknowledgment

This study is supported by the National Natural Science Foundation of China (No. 61072053) and the National High Technology Research and Development Program of China (No. 2009AA01Z221).

