

Reversible circuit synthesis using a cycle-based approach

Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi, Zahra Sasanian

Quantum Design Automation Lab

Department of Computer Engineering and Information Technology

Amirkabir University of Technology

Tehran, Iran

{msaeedi, szamani, msedighi, sasanian}@aut.ac.ir

Abstract

Reversible logic has applications in various research areas including signal processing, cryptography and quantum computation. In this paper, direct NCT-based synthesis of a given k-cycle in a cycle-based synthesis scenario is examined. To this end, a set of seven building blocks is proposed that reveals the potential of direct synthesis of a given permutation to reduce both quantum cost and average runtime. To synthesize a given large cycle, we propose a decomposition algorithm to extract the suggested building blocks from the input specification. Then, a synthesis method is introduced which uses the building blocks and the decomposition algorithm. Finally, a hybrid synthesis framework is suggested which uses the proposed cycle-based synthesis method in conjunction with one of the recent NCT-based synthesis approaches which is based on Reed-Muller (RM) spectra.

The time complexity and the effectiveness of the proposed synthesis approach are analyzed in detail. Our analyses show that the proposed hybrid framework leads to a better quantum cost in the worst-case scenario compared to the previously presented methods. The proposed framework always converges and typically synthesizes a given specification very fast compared to the available synthesis algorithms. Besides, the quantum costs of benchmark functions are improved about 20% on average (55% in the best case).

1 Introduction

Reversible computing deals with any computational process that is time-invertible, meaning that the process can also be computed backward through time. A necessary condition for reversibility is that the transition function applied to map inputs onto outputs works as a one-to-one function to have a unique output assignment for each input pattern. Generally, conventional logic gates other than NOT are not reversible, as their inputs cannot be determined from the related outputs uniquely.

arXiv:1004.4320v2 [quant-ph] 27 Dec 2010

One of the motivations for research on reversible computing is that it offers a potential way to improve the energy efficiency of computers beyond the fundamental Landauer limit introduced in 1961 [1]. Landauer proved that using conventional irreversible logic gates leads to at least $kT \ln 2$ energy dissipation per irreversible bit operation, regardless of the underlying circuit, where $k$ is Boltzmann's constant and $T$ is the temperature of the environment. In 1973, Bennett stated that to avoid power dissipation in a circuit, the circuit must be built from reversible gates [2]. This has made reversible computing an attractive option for low-power design [3], [4]. Additionally, the field of reversible computing has received considerable attention in quantum computing as each quantum gate is reversible in nature [5].

Among various open research problems related to the field of reversible computing, reversible logic synthesis, defined as the ability to generate an efficient circuit from a given arbitrary-size specification, is considered a stepping stone towards the realization of useful reversible hardware. As a result, synthesis methods for reversible circuits have received significant attention recently (for examples see [6], [7] and [8]). As loops and fanout are not allowed in reversible circuits, and each gate must have the same number of inputs and outputs with unique input/output assignments in the transition function, mature irreversible synthesis algorithms cannot be directly applied to reversible circuits.

To synthesize a given reversible specification, the authors of [9] proposed a synthesis algorithm based on NOT, CNOT and Toffoli gates which represents a given permutation as a product of pairs of disjoint transpositions (2-cycles) and synthesizes each pair in turn. A general permutation must be decomposed into a set of 2-cycles to be synthesizable using their approach. In this paper, a k-cycle-based synthesis method is proposed and analyzed in detail. We show that direct synthesis of large cycles in a cycle-based synthesis scenario can lead to a significant reduction in quantum cost. To achieve this, several building blocks (BBs) and synthesis algorithms are proposed for use in the k-cycle-based synthesis method. In addition, a decomposition algorithm for the synthesis of a general large cycle considering the suggested building blocks is introduced and analyzed. Based on the characterization of the proposed synthesis method, a hybrid synthesis framework, which uses the cycle-based synthesis approach in conjunction with one of the recent methods [6], is also presented. Furthermore, the average-case and worst-case quantum costs of the proposed synthesis framework are evaluated experimentally and analyzed in detail.

The main contributions of this paper are as follows.

• The analysis of cycle-based synthesis approach and its usefulness in synthesizing reversible functions with different characterizations,

• A k-cycle-based synthesis method with guaranteed convergence,

• A hybrid synthesis framework based on the proposed k-cycle-based synthesis method together with the method of [6],

• The improved quantum cost in the worst-case scenario compared to the previously presented methods,

• Better average quantum costs for available benchmark functions in the NCT library,

• Improved average runtime compared to the present synthesis algorithms with favorable synthesis costs.

The rest of this paper is organized as follows: In Section 2, basic concepts are introduced. The proposed cycle-based synthesis method is presented in Section 3, where the building blocks and their synthesis algorithms are proposed in Subsection 3.1, the decomposition algorithm and the k-cycle-based synthesis method are explained in Subsection 3.2, and the worst-case analysis of the proposed cycle-based approach is discussed in Subsection 3.3. Experimental results and the hybrid synthesis framework are presented in Section 4 and finally, Section 5 concludes the paper.

2 Preliminaries

Let $A$ be a set and define $f : A \to A$ as a one-to-one and onto transition function. The function $f$ is called a permutation function, as applying $f$ to $A$ leads to a set with the same elements as $A$, possibly in a different order. If $A = \{1, 2, 3, \cdots, m\}$, then for each element $a_i \in A$ there is an element $a_j \in A$ such that $f(a_i) = a_j$. In addition, a k-cycle (a cycle of length $k$) is denoted as $(a_1, a_2, \cdots, a_k)$, which means that $f(a_1) = a_2$, $f(a_2) = a_3$, ..., and $f(a_k) = a_1$. A given k-cycle $(a_1, a_2, \cdots, a_k)$ can be written in many different ways, such as $(a_2, a_3, \cdots, a_k, a_1)$. A cycle of length 2 is called a transposition.

Cycles $c_1$ and $c_2$ are called disjoint if they have no common members, i.e., $\forall a_i \in c_1$, $a_i \notin c_2$ and vice versa. Any permutation can be written uniquely, except for the order, as a product of disjoint cycles. If two cycles $c_1$ and $c_2$ are disjoint, they commute, i.e., $c_1 c_2 = c_2 c_1$. In addition, a cycle may be written as a product of transpositions in different ways, using different numbers of transpositions. A cycle (or a permutation) is called even if it can be written as an even number of transpositions; an odd cycle is defined similarly. Although there may be many ways to decompose a given cycle into a set of transpositions, the parity of the number of transpositions used stays the same, i.e., either all resulting decompositions use an even number of transpositions or all use an odd number. A k-cycle is odd (even) if k is even (odd).
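As a small illustration of these definitions, the following Python sketch extracts the disjoint cycles of a permutation and computes its parity. The function names and the dictionary representation are assumptions made for this example only; they are not taken from the paper.

```python
def disjoint_cycles(perm):
    """Return the nontrivial disjoint cycles of a permutation given as a dict x -> f(x)."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:          # fixed points are not reported
            cycles.append(tuple(cycle))
    return cycles


def is_even(perm):
    """A k-cycle equals k - 1 transpositions, so the parities of (k - 1) add up."""
    return sum(len(c) - 1 for c in disjoint_cycles(perm)) % 2 == 0


# f on A = {1, ..., 6}: 1 -> 3 -> 5 -> 1 and 2 <-> 4, with 6 fixed.
f = {1: 3, 2: 4, 3: 5, 4: 2, 5: 1, 6: 6}
print(disjoint_cycles(f))  # [(1, 3, 5), (2, 4)]
print(is_even(f))          # False: an even 3-cycle times an odd transposition
```

The parity rule used in `is_even` is the statement above that a k-cycle is odd when k is even and even when k is odd.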

Figure 1: Construction of k Toffoli gates with common controls

An n-input, n-output, fully specified Boolean function is reversible if it maps each input pattern to a unique output pattern. In this paper, n is particularly used to refer to the number of inputs/outputs in a circuit. A gate is called reversible if it realizes a reversible function. A generalized Toffoli gate $C^m\mathrm{NOT}(x_1, x_2, \cdots, x_{m+1})$ passes the first $m$ lines unchanged. These lines are referred to as control lines. This gate flips the $(m+1)$th line if and only if the control lines are all one. Therefore, the generalized Toffoli gate works as follows: $x_i^{(out)} = x_i$ for $i < m + 1$, and $x_{m+1}^{(out)} = x_1 x_2 \cdots x_m \oplus x_{m+1}$. For $m = 0$ and $m = 1$, the gates are called NOT (N) and CNOT (C), respectively. For $m = 2$, the gate is called $C^2$NOT or Toffoli (T). These three gates compose the universal NCT library and are used in quantum computation frequently [5]. Outputs that are not required in the function specification are considered as garbage or auxiliary bits. The number of elementary gates required for simulating a given gate is called quantum cost.
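The gate definition above can be sketched as a function on integer bit patterns. This is an illustrative sketch only: the name `apply_cmnot` and the argument layout are assumptions, and bit 0 is taken to be the least significant bit, as in the rest of the paper.

```python
def apply_cmnot(x, controls, target):
    """Apply a generalized Toffoli gate to the bit pattern x (an int).

    controls: iterable of control bit positions (empty for NOT).
    target:   bit position flipped when every control bit of x is 1.
    """
    if all((x >> c) & 1 for c in controls):
        x ^= 1 << target
    return x


# NOT (m = 0), CNOT (m = 1) and Toffoli (m = 2) on 3-bit patterns.
print(apply_cmnot(0b000, [], 0))      # 1: NOT always flips bit 0
print(apply_cmnot(0b010, [1], 0))     # 3: control bit 1 is set
print(apply_cmnot(0b011, [0, 1], 2))  # 7: both controls are set
print(apply_cmnot(0b001, [0, 1], 2))  # 1: bit 1 is 0, nothing happens

# Each gate is its own inverse, so it permutes the 2^n input patterns.
assert all(apply_cmnot(apply_cmnot(x, [0, 1], 2), [0, 1], 2) == x for x in range(8))
```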

It has been shown that for $n \ge 5$ and $m \in \{3, 4, \cdots, \lceil n/2 \rceil\}$, a $C^m$NOT gate can be simulated by a linear-size circuit which contains $12m - 22$ elementary gates. In addition, for $n \ge 7$, a $C^{n-2}$NOT gate can be simulated by $24n - 88$ elementary gates with no auxiliary bits [7]. On the other hand, a $C^{n-1}$NOT gate can be simulated with an exponential cost of $2^n - 3$ if no garbage line is available [10]. To avoid the exponential size and the need for a large number of elementary gates, several researchers used an extra garbage line for an efficient simulation of the $C^{n-1}$NOT gate (e.g., [6]). Generally, the number of available bits is very restricted in today's reversible and quantum implementations [11]. Therefore, for two circuits with equal linear costs, the one without a garbage line is preferred. The implementation of $k$ Toffoli gates with common controls can be done by $2k + 3$ elementary gates as illustrated in Fig. 1 [12]. Note that a Toffoli gate has the cost of 5 whereas NOT and CNOT gates have unit costs.
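The cost figures quoted above can be collected in a small lookup helper. This is only a sketch: the function names are hypothetical, the upper limit of the linear range is assumed to be ceil(n/2), and combinations of m and n for which the text quotes no bound raise an error rather than guess.

```python
def cmnot_cost(m, n):
    """Quantum cost of a C^m NOT gate on an n-line circuit, per the bounds quoted above."""
    if m in (0, 1):
        return 1                           # NOT and CNOT have unit cost
    if m == 2:
        return 5                           # Toffoli
    if n >= 5 and 3 <= m <= -(-n // 2):    # -(-n // 2) is ceil(n / 2)
        return 12 * m - 22
    if n >= 7 and m == n - 2:
        return 24 * n - 88                 # no auxiliary bits
    if m == n - 1:
        return 2 ** n - 3                  # no garbage line available
    raise ValueError("no bound quoted for this (m, n)")


def shared_control_toffolis_cost(k):
    """Cost of k Toffoli gates that share their control lines (Fig. 1)."""
    return 2 * k + 3


print(cmnot_cost(3, 7))                  # 14
print(cmnot_cost(5, 7))                  # 80
print(cmnot_cost(6, 7))                  # 125
print(shared_control_toffolis_cost(3))   # 9
```

For a single Toffoli gate the shared-control formula gives 2 + 3 = 5, which agrees with the unit Toffoli cost.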

The authors of [9] proposed an NCT-based synthesis method which applies NOT, Toffoli, CNOT and Toffoli gates in order (the T|C|T|N synthesis method) to synthesize a given permutation. For the last Toffoli part, the authors proposed a synthesis algorithm that maps distinct $a$, $b$, $c$ and $d$ ($a, b, c, d \neq 0, 2^i$, so that each has at least two ones in its binary representation) to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$ using a circuit called $\pi$ with at most $5n - 2$ Toffoli gates. Then, the permutation $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$ is implemented by a circuit, $\kappa_0$, using $8(n - 5)$ Toffoli gates and finally, the reversed $\pi$ circuit, i.e., $\pi^{-1}$, is applied to transform $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$ into $a$, $b$, $c$ and $d$, respectively. Therefore, the $\pi \kappa_0 \pi^{-1}$ circuit implements the permutation $(a, b)(c, d)$ where $a, b, c, d \neq 0, 2^i$ by at most $18n - 44$ Toffoli gates.

In contrast, a given k-cycle $f = (x_0, x_1, x_2, \cdots, x_k)$ is decomposed into a set of transpositions in [9] by using the decomposition pattern $f = (x_0, x_1)(x_{k-1}, x_k)(x_0, x_2, x_3, \cdots, x_{k-1})$, recursively. Subsequently, each pair of the transpositions is implemented using the $\pi \kappa_0 \pi^{-1}$ circuit. This approach leads to at most $n$ NOT gates, $n^2$ CNOT gates and $3(2^n + n + 1)(3n - 7)$ Toffoli gates [9]. An extension of [9] was suggested in [13] which produced better quantum cost by applying the unit-cost NOT and CNOT gates instead of using Toffoli gates with cost 5 in many situations.
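One step of the decomposition pattern of [9] can be checked numerically. The sketch below assumes that a product of cycles is applied from left to right (the convention under which the identity holds; the text does not state it), and it covers cycles with at least four elements. All names are illustrative.

```python
def cycle_to_perm(cycle):
    """Map of a cycle (x0, x1, ..., xk): x0 -> x1 -> ... -> xk -> x0."""
    return {x: cycle[(i + 1) % len(cycle)] for i, x in enumerate(cycle)}


def apply_product(cycles, x):
    """Apply the cycles to x one after another, leftmost factor first."""
    for c in cycles:
        x = cycle_to_perm(c).get(x, x)
    return x


def pattern_step(cycle):
    """(x0, ..., xk) -> (x0, x1)(x_{k-1}, x_k)(x0, x2, ..., x_{k-1})."""
    x = list(cycle)
    assert len(x) >= 4, "the pattern needs x1 and x_{k-1} to be different"
    return [(x[0], x[1]), (x[-2], x[-1]), tuple([x[0]] + x[2:-1])]


f = (10, 11, 12, 13, 14, 15)          # x0, ..., x5
factors = pattern_step(f)
print(factors)                         # [(10, 11), (14, 15), (10, 12, 13, 14)]

# The product of the three factors acts exactly like the original cycle.
assert all(apply_product(factors, v) == cycle_to_perm(f)[v] for v in f)
```

Each step peels off one pair of disjoint transpositions and leaves a cycle that is two elements shorter, which is what makes the recursive use of the pattern terminate.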

In this paper, the $\pi \kappa_0 \pi^{-1}$ circuit is improved by a k-cycle-based synthesis method. For the rest of this paper, we use the same notations as [9] for the $\kappa_0$, $\pi$, and $\pi^{-1}$ circuits. In all figures, the $(n-1)$th bit represents the most significant bit (MSB) and is shown as the top line in the circuit representations. Similarly, the 0th bit represents the least significant bit (LSB) and is shown as the bottom line in the circuit representations.

3 k-Cycle-Based Synthesis Method

3.1 Building Blocks

In this subsection, direct synthesis algorithms for seven suggested building blocks (i.e., a pair of 2-cycles, a single 3-cycle, a pair of 3-cycles, a single 5-cycle, a pair of 5-cycles, a single 2-cycle (4-cycle) followed by a single 4-cycle (2-cycle), and a pair of 4-cycles) are introduced and evaluated. Consider a given 5-cycle $f = (a_1, a_2, a_3, a_4, a_5)$ defined in a 7-bit circuit. Assume that $a_1$, $a_2$, $a_3$, $a_4$, and $a_5$ are neither 0 nor $2^i$, so that each has at least two ones in its binary representation. Applying the decomposition method of [9] leads to the transpositions $(a_1, a_2)(a_3, a_4)(a_1, a_3)(a_1, a_5)$, which could be implemented by at most $3 \times (18n - 44) = 54n - 132$ Toffoli gates with cost $270n - 660$. However, we will show that a direct 5-cycle implementation of $f$ reduces the total quantum cost to at most $60n - 130$ (Theorem 3.6).

The proposed synthesis method treats the zero and $2^i$ terms differently from the remaining terms. The first group is handled in a pre-processing stage similar to the method presented in [9]. For an arbitrary k-cycle $(a_1, a_2, \cdots, a_k)$ in the second group, it can be assumed that $a_1, a_2, \cdots, a_k \neq 0, 2^i$ and that the terms are distinct ($a_1 \neq a_2 \neq \cdots \neq a_k$). Throughout this paper, the binary representation is used where CNOT and Toffoli control bits are shown in bold face and the rightmost bit is numbered as the 0th (least significant) bit. In order to use the decomposition algorithm proposed in [7], we assume that $n \ge 7$.

Lemma 3.1 The $\kappa_{0(2,2)}$ circuit (Fig. 2) creates a pair of 2-cycles $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$ by $24n - 88$ elementary gates.

Proof Lemma 20 of [9] proves the correspondence between the $\kappa_{0(2,2)}$ circuit and the above cycles. As for the cost, it can be obtained by applying the results of [14].

Figure 2: The $\kappa_{0(2,2)}$ circuit

Figure 3: The circuit of Theorem 3.1

Figure 4: Synthesis of an arbitrary pair of 2-cycles (a, b) (c, d)

According to Lemma 3.1, the $\kappa_{0(2,2)}$ circuit implements the particular pair of 2-cycles $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$. In order to implement an arbitrary pair $(a, b)(c, d)$, the circuit is divided into five parts as follows. First, the terms $a$, $b$, $c$ and $d$ are changed to 4, 1, 2 and $2^{n-1} + 3$, respectively. Note that the first three terms have only one 1 in their binary representations. As shown in the following theorem, this characterization is used during the synthesis of a pair of 2-cycles. Second, a circuit is applied to change 4, 1, 2 and $2^{n-1} + 3$ to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$ (i.e., the terms used in the $\kappa_{0(2,2)}$ circuit), correspondingly. Afterward, the $\kappa_{0(2,2)}$ circuit is used, which changes $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$ to $2^n - 3$, $2^n - 4$, $2^n - 1$ and $2^n - 2$, respectively. Applying the second and the first sub-circuits in the reverse order puts unwanted terms (i.e., all terms except $a$, $b$, $c$ and $d$) back to their original locations and implements the given pair of 2-cycles $(a, b)(c, d)$. Fig. 4 demonstrates the complete synthesis scenario. Theorem 3.1 discusses the synthesis of an arbitrary pair of 2-cycles in more detail. The synthesis procedures for other cycles are similar to the one explained here, as shown later.
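The five-part scheme is a conjugation: map the terms to fixed positions, apply the fixed pair of 2-cycles, and map back. The sketch below shows this at the level of plain permutations only. It is not the gate-level construction of the paper: `make_pi` builds an arbitrary bijection that stands in for the CNOT/Toffoli sub-circuits, and a toy size n = 4 is used even though the paper assumes n ≥ 7.

```python
def make_pi(sources, targets, size):
    """Some bijection on range(size) sending sources[i] to targets[i]; the rest is arbitrary."""
    pi = dict(zip(sources, targets))
    free_in = [x for x in range(size) if x not in sources]
    free_out = [y for y in range(size) if y not in targets]
    pi.update(zip(free_in, free_out))
    return pi


def conjugate(pi, kappa):
    """Permutation realised by applying pi, then kappa, then pi reversed."""
    pi_inv = {y: x for x, y in pi.items()}
    return {x: pi_inv[kappa[pi[x]]] for x in pi}


n = 4
size = 2 ** n
fixed = (size - 4, size - 3, size - 2, size - 1)   # 12, 13, 14, 15

# kappa swaps 12 <-> 13 and 14 <-> 15 and leaves everything else alone.
kappa = {x: x for x in range(size)}
kappa.update({fixed[0]: fixed[1], fixed[1]: fixed[0],
              fixed[2]: fixed[3], fixed[3]: fixed[2]})

a, b, c, d = 5, 3, 9, 6                            # each has two ones in binary
result = conjugate(make_pi((a, b, c, d), fixed, size), kappa)
moved = sorted((x, y) for x, y in result.items() if x != y)
print(moved)   # [(3, 5), (5, 3), (6, 9), (9, 6)]
```

Only a, b, c and d move, and every other term returns to its original location, which is the role the reversed sub-circuits play in the text.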

Theorem 3.1 (Syn$_{2,2}$ method): An arbitrary pair of 2-cycles $(a, b)(c, d)$ can be simulated by at most $34n - 64$ elementary gates.

Proof Since $a$, $b$, $c$ and $d$ are neither 0 nor $2^i$, they have at least two ones in their binary representations. Assume that the $c_1$th bit of $a$ is 1. One can use at most one CNOT gate whose control is on $c_1$ to set the 2nd bit of $a$ to 1. Subsequently, by using at most $n - 1$ CNOT gates whose controls are on the 2nd bit, the other bits can be set to 0, converting $a$ to 4 (i.e., 0···0100). Assume that after applying these gates, $b$, $c$ and $d$ are changed to $b'$, $c'$, and $d'$, respectively. Since $b'$ should have at least one 1, namely at position $c_2$ ($c_2 \neq 2$), $b'$ can be converted to 1 (i.e., 0···01) by at most $n$ CNOT gates using a similar approach. Then, $c'$ and $d'$ may be changed to new numbers $c''$ and $d''$, respectively, without changing 4.

Subsequently, $c''$ can be converted to 2 (i.e., 0···010) by at most one Toffoli gate and $n - 1$ CNOT gates with no effect on 4 and 1. Finally, the last term can be converted to $2^{n-1} + 3$ (i.e., 10···011) by at most one Toffoli gate and $n - 1$ CNOT gates, again with no effect on the previous terms. Therefore, at most $4n + 8$ elementary gates are required to transform $a$, $b$, $c$ and $d$ into 4, 1, 2 and $2^{n-1} + 3$, respectively. Now, the circuit shown in Fig. 3 should be applied to change 4, 1, 2 and $2^{n-1} + 3$ to the terms used in the $\kappa_{0(2,2)}$ circuit (Lemma 3.1). Considering the applied gates (at most $5n + 12$ elementary gates), the terms $a$, $b$, $c$ and $d$ are changed to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$, respectively (i.e., the $\pi_{2,2}$ circuit). Now, by using the $\kappa_{0(2,2)}$ circuit with the cost of $24n - 88$, the pair of 2-cycles $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$ is implemented. Applying the $\pi_{2,2}^{-1}$ circuit changes $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$ to $a$, $b$, $c$, and $d$, respectively. In addition, the circuit $\pi_{2,2}^{-1}$ puts other unwanted terms back to their original locations. Therefore, by at most $34n - 64$ elementary gates, the pair of 2-cycles $(a, b)(c, d)$ can be implemented.
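The gate count in the proof can be written out step by step. In the worked sum below, $|\cdot|$ denotes the quantum cost of a sub-circuit; this notation is introduced here for convenience and is not used in the paper. A Toffoli gate counts as 5 and a CNOT gate as 1, so "one Toffoli and $n - 1$ CNOT gates" contributes $n + 4$.

```latex
\begin{align*}
\underbrace{n}_{a \to 4}
  + \underbrace{n}_{b' \to 1}
  + \underbrace{(n + 4)}_{c'' \to 2}
  + \underbrace{(n + 4)}_{d'' \to 2^{n-1} + 3} &= 4n + 8, \\
|\pi_{2,2}| &\le 5n + 12 \quad \text{(including the circuit of Fig. 3)}, \\
|\pi_{2,2}| + |\kappa_{0(2,2)}| + |\pi_{2,2}^{-1}|
  &\le 2(5n + 12) + (24n - 88) = 34n - 64.
\end{align*}
```

For the smallest admissible size, $n = 7$, the bound evaluates to $34 \cdot 7 - 64 = 174$ elementary gates.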

Example 3.1 Assume that the pair of 2-cycles (5, 3) (9, 67) should be implemented in a circuit over 7 bits (i.e., n = 7). According to the proof of Theorem 3.1, the term 5 should be transformed to 4 by a CNOT gate which has no effect on other terms. Similarly, 3 is transformed to 1 by a CNOT gate which changes the term 9 to 11 and 67 to 65. Then, 11 is transformed to 2 by two CNOT gates with no effect on other terms. Finally, 65 is transformed to 67 by a CNOT gate. See the first sub-circuit in Fig. 5 for more details. Now, the circuit shown in Fig. 3 should be applied followed by the $\kappa_{0(2,2)}$ circuit. Finally, as illustrated in the last two sub-circuits of Fig. 5, the above gates (except the $\kappa_{0(2,2)}$ circuit) should be used in the reverse order to construct the complete circuit. In Fig. 5, the results of applying all gates on the term 67 are also represented by gray squares where only values 1 are shown for the sake of simplicity. As can be seen, applying all gates changes 67 to 9.
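The first sub-circuit of the example can be replayed in a few lines of Python. The control/target positions below are one hypothetical choice that reproduces the intermediate values stated in the text; the gates actually drawn in Fig. 5 may differ.

```python
def cnot(x, control, target):
    """Flip bit `target` of x when bit `control` of x is 1 (bit 0 is the LSB)."""
    return x ^ (1 << target) if (x >> control) & 1 else x


# (control, target) pairs; an assumed gate choice consistent with Example 3.1.
gates = [
    (2, 0),          # 5 -> 4, no effect on 3, 9, 67
    (0, 1),          # 3 -> 1, and also 9 -> 11, 67 -> 65
    (3, 0), (1, 3),  # 11 -> 2 in two steps, no effect on 4, 1, 65
    (6, 1),          # 65 -> 67
]


def run(x):
    for control, target in gates:
        x = cnot(x, control, target)
    return x


print([run(x) for x in (5, 3, 9, 67)])   # [4, 1, 2, 67]
```

With n = 7 the last target value is $2^{n-1} + 3 = 67$, so the five CNOT gates bring a, b, c and d to 4, 1, 2 and $2^{n-1} + 3$ as required by the first step of Theorem 3.1.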

Figure 5: The circuit of Example 3.1

Figure 6: The $\kappa_{0(3)}$ circuit

Figure 7: The circuit of Theorem 3.2

Lemma 3.2 The $\kappa_{0(3)}$ circuit (Fig. 6) creates the 3-cycle $(2^n - 2^{k-1} - 1,\ 2^n - 1,\ 2^{n-1} - 1)$ by $24n - 88$ elementary gates where $k = \lceil n/2 \rceil$.

Proof As shown in Fig. 6, the gates $C^m\mathrm{NOT}(n-1, n-2, \cdots, k, k-1)$, $C^k\mathrm{NOT}(0, 1, 2, \cdots, k-1, n-1)$, $C^m\mathrm{NOT}(n-1, n-2, \cdots, k, k-1)$, $C^k\mathrm{NOT}(0, 1, 2, \cdots, k-1, n-1)$ are applied consecutively in the $\kappa_{0(3)}$ circuit, where $m = n - k$. After applying the first $C^m$NOT gate, the locations of $2^k$ minterms (denoted as $\Sigma_1 = \{2^n - 2^k, 2^n - 2^k + 1, \cdots, 2^n - 1\}$) are changed. Particularly, $2^n - 2^{k-1} - 1$ (i.e., 1···101···1 where the 0 is at the $(k-1)$th position) $\in \Sigma_1$ is changed to $2^n - 1$ (i.e., 1···1) $\in \Sigma_1$. By applying the $C^k$NOT, the locations of $2^m$ minterms (denoted as $\Sigma_2 = \{0 \times 2^k + 2^k - 1,\ 1 \times 2^k + 2^k - 1,\ \cdots,\ (2^m - 1) \times 2^k + 2^k - 1 = 2^n - 1\}$) are changed ($2^n - 1 \in \Sigma_1 \cap \Sigma_2$). Among them, $2^n - 1$ is exchanged with $2^{n-1} - 1$ (i.e., 01···1) $\in \Sigma_2$. Applying the third $C^m$NOT gate puts all $\Sigma_1$ minterms at their right locations except $2^n - 2^{k-1} - 1$ and also changes $2^n - 1$ to $2^n - 2^{k-1} - 1$. Finally, the last $C^k$NOT gate corrects the locations of all $\Sigma_2$ members except $2^{n-1} - 1$ and $2^n - 1$. Considering all the exchanges, $2^n - 2^{k-1} - 1$ is changed to $2^n - 1$, $2^n - 1$ is changed to $2^{n-1} - 1$, and $2^{n-1} - 1$ is changed to $2^n - 2^{k-1} - 1$.

For the second part of the lemma, note that the first and the third gates shown in Fig. 6 can be implemented by $2 \times (12 \times (n - \lceil n/2 \rceil) - 22)$ elementary gates. Similarly, the second and the fourth gates can be implemented by $2 \times (12 \times \lceil n/2 \rceil - 22)$ gates. Therefore, $\kappa_{0(3)}$ is implemented by cost $24n - 88$.
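The minterm bookkeeping of Lemma 3.2 can be checked by brute force for a concrete size. The sketch below simulates the four gates on all $2^n$ inputs for n = 7 and lists the cycles that are actually moved. The helper names are illustrative, and the gates are described by their control and target positions as given in the proof.

```python
def apply_gate(x, controls, target):
    """Generalized Toffoli: flip bit `target` when all control bits of x are 1."""
    if all((x >> c) & 1 for c in controls):
        x ^= 1 << target
    return x


def moved_cycles(gates, n):
    """Nontrivial cycles of the permutation realised by applying `gates` in order."""
    image = {}
    for x in range(2 ** n):
        y = x
        for controls, target in gates:
            y = apply_gate(y, controls, target)
        image[x] = y
    seen, cycles = set(), []
    for start in range(2 ** n):
        if start in seen or image[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = image[x]
        cycles.append(cycle)
    return cycles


n = 7
k = -(-n // 2)                     # ceil(n / 2) = 4
g_m = (range(k, n), k - 1)         # C^m NOT: controls n-1..k, target k-1
g_k = (range(0, k), n - 1)         # C^k NOT: controls 0..k-1, target n-1
print(moved_cycles([g_m, g_k, g_m, g_k], n))   # [[63, 119, 127]]
```

For n = 7 the lemma's cycle is $(2^n - 2^{k-1} - 1, 2^n - 1, 2^{n-1} - 1) = (119, 127, 63)$, which is the same cycle as the printed one written from a different starting point.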

Theorem 3.2 (Syn$_3$ method): An arbitrary 3-cycle $(a, b, c)$ requires at most $32n - 82$ elementary gates to be implemented.

Proof Since $a$, $b$, and $c$ are neither 0 nor $2^i$, they have at least two ones in their binary representations. One can use at most $n$ CNOT gates to transform $a$ to $2^{n-1}$ (i.e., 10···0). After applying these gates, assume that $b$ and $c$ are changed to $b'$ and $c'$, respectively. By using a similar approach, $c'$ can be converted to $2^{n-2}$ (i.e., 010···0) by $n$ CNOT gates that may change $b'$ to a new number $b''$ without changing $2^{n-1}$. Finally, converting $b''$ to $2^{n-1} + 2^{k-1}$ (i.e., 10···010···0 where the second 1 is at the $(k-1)$th position) can be done by one Toffoli and $n - 1$ CNOT gates with no effect on the previous $2^{n-1}$ and $2^{n-2}$ terms. Therefore, by at most $3n + 4$ elementary gates, $a$, $b$ and $c$ are transformed into $2^{n-1}$, $2^{n-1} + 2^{k-1}$ and $2^{n-2}$, respectively. Now, the circuit shown in Fig. 7 should be applied to change these terms to the terms used in the $\kappa_{0(3)}$ circuit.

Considering the applied gates (at most $4n + 3$ elementary gates), the terms $a$, $b$, and $c$ are changed to $2^n - 2^{k-1} - 1$, $2^n - 1$, and $2^{n-1} - 1$, respectively (i.e., the $\pi_3$ circuit). By using the $\kappa_{0(3)}$ circuit with cost $24n - 88$, the 3-cycle $(2^n - 2^{k-1} - 1, 2^n - 1, 2^{n-1} - 1)$ is implemented. Applying the $\pi_3^{-1}$ circuit changes $2^n - 2^{k-1} - 1$, $2^n - 1$, and $2^{n-1} - 1$ to $a$, $b$, and $c$, respectively. Therefore, by at most $32n - 82$ elementary gates, the 3-cycle $(a, b, c)$ can be implemented. It is worth noting that a single 3-cycle can be a BB by itself because it is even. As will be shown later, the same is true for a single 5-cycle.

Figure 8: The $\kappa_{0(3,3)}$ circuit, $k = \lceil n/2 \rceil$

Figure 9: The circuit of Theorem 3.3

Lemma 3.3 The $\kappa_{0(3,3)}$ circuit (Fig. 8) implements the pair of 3-cycles $(2^n - 2^{k-1} - 1, 2^n - 1, 2^{n-1} - 1)(2^n - 2^{k-1} - 2, 2^n - 2, 2^{n-1} - 2)$ by $24n - 112$ elementary gates where $k = \lceil n/2 \rceil$.

Proof It can be verified that the $\kappa_{0(3,3)}$ circuit differs from the $\kappa_{0(3)}$ circuit in its least significant bit (i.e., the 0th bit), which leads to two 3-cycles. The first and the third gates need $12n - 44$ elementary gates. The second and the fourth gates need $12n - 68$ elementary gates. Therefore, $\kappa_{0(3,3)}$ can be implemented by the cost of $24n - 112$ gates.

Theorem 3.3 (Syn$_{3,3}$ method): The implementation of an arbitrary pair of 3-cycles $(a, b, c)(d, e, f)$ requires at most $38n - 46$ elementary gates.

Proof Use at most $6n + 16$ elementary gates to convert $a$ to $2^{n-1}$ (i.e., 10···0), $b$ to $2^{k-1}$ (i.e., 0···010···0 where the 1 is at the $(k-1)$th position), $c$ to 1 (i.e., 0···01), $d$ to 2 (i.e., 0···010), $e$ to $2^{n-2}$ (i.e., 010···0), and $f$ to $2^{n-2} + 6$ (i.e., 010···0110), sequentially. Therefore, the terms $a$, $b$, $c$, $d$, $e$, and $f$ are changed to $2^{n-1}$, $2^{k-1}$, 1, 2, $2^{n-2}$, and $2^{n-2} + 6$, respectively. Note that the terms $a$ and $b$ can be implemented by only CNOT gates. For each of the other terms, at most one Toffoli and $n - 1$ CNOT gates should be applied. Now, apply the circuit shown in Fig. 9. After applying at most $7n + 33$ elementary gates, $a$, $b$, $c$, $d$, $e$, and $f$ are transformed into $2^n - 2^{k-1} - 2$, $2^n - 2$, $2^{n-1} - 2$, $2^n - 2^{k-1} - 1$, $2^n - 1$, and $2^{n-1} - 1$, respectively (i.e., the $\pi_{3,3}$ circuit). By applying $\kappa_{0(3,3)}$ and the reversed $\pi_{3,3}$ circuit, $38n - 46$ elementary gates are used and $(a, b, c)(d, e, f)$ is implemented.

Figure 10: The $\kappa_{0(4,2)}$ circuit, $k = \lceil n/2 \rceil$

Figure 11: The circuit of Theorem 3.4

Lemma 3.4 The $\kappa_{0(4,2)}$ circuit (Fig. 10-a) implements the pair $(2^n - 4, 2^n - 1, 2^n - 3, 2^n - 2)(2^{n-1} - 2, 2^{n-1} - 1)$ by $36n - 180$ elementary gates.

Proof The first $C^{n-2}\mathrm{NOT}(n-1, n-2, \cdots, 2, 1)$ gate shown in Fig. 10-a changes $2^n - 4$, $2^n - 3$, $2^n - 2$, and $2^n - 1$ to $2^n - 2$, $2^n - 1$, $2^n - 4$, and $2^n - 3$, respectively. The second $C^{n-2}\mathrm{NOT}(n-2, \cdots, 2, 1, 0)$ changes $2^n - 2$, $2^n - 1$, $2^{n-1} - 2$ and $2^{n-1} - 1$ to $2^n - 1$, $2^n - 2$, $2^{n-1} - 1$ and $2^{n-1} - 2$, respectively. Considering the gates sequentially leads to the implementation of $\kappa_{0(4,2)}$. The circuit in Fig. 10-b can be obtained by applying Lemma 7.3 of [10] to each $C^{n-2}$NOT gate of Fig. 10-a and canceling the resulting redundant gates. The total number of $36n - 180$ elementary gates can be obtained by summing the costs of the gates in Fig. 10-b.
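A direct simulation of the two gates of Fig. 10-a, as described in the proof, confirms the stated pair of cycles for a concrete size. This is a sketch for n = 7 with illustrative helper names; it checks the permutation only, not the gate cost.

```python
def apply_gate(x, controls, target):
    """Generalized Toffoli: flip bit `target` when all control bits of x are 1."""
    if all((x >> c) & 1 for c in controls):
        x ^= 1 << target
    return x


def run(gates, x):
    for controls, target in gates:
        x = apply_gate(x, controls, target)
    return x


def cycle_from(gates, start):
    """Follow `start` through repeated application of the circuit until it returns."""
    cycle, x = [start], run(gates, start)
    while x != start:
        cycle.append(x)
        x = run(gates, x)
    return cycle


n = 7
gates = [
    (range(2, n), 1),        # C^{n-2}NOT(n-1, ..., 2; target 1)
    (range(1, n - 1), 0),    # C^{n-2}NOT(n-2, ..., 1; target 0)
]
print(cycle_from(gates, 2 ** n - 4))         # [124, 127, 125, 126]
print(cycle_from(gates, 2 ** (n - 1) - 2))   # [62, 63]

# No other input pattern is moved by the circuit.
others = [x for x in range(2 ** n) if x not in (124, 125, 126, 127, 62, 63)]
assert all(run(gates, x) == x for x in others)
```

The printed cycles are $(2^n - 4, 2^n - 1, 2^n - 3, 2^n - 2)$ and $(2^{n-1} - 2, 2^{n-1} - 1)$ for n = 7, in the order given in Lemma 3.4.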

Theorem 3.4 (Syn$_{4,2}$ method): An arbitrary pair $(a, b, c, d)(e, f)$ can be implemented by at most $50n - 122$ elementary gates.

Proof Use at most $6n + 16$ elementary gates to convert $a$ to 4 (i.e., 0···0100), $c$ to 1 (i.e., 0···01), $d$ to 2 (i.e., 0···010), $e$ to $2^{n-2}$ (i.e., 010···0), $f$ to $2^{n-3}$ (i.e., 0010···0), and $b$ to $2^{n-1} + 3$ (i.e., 10···011), sequentially. Note that the terms $a$ and $c$ can be implemented by only CNOT gates. For each of the other terms, at most one Toffoli and $n - 1$ CNOT gates should be applied. Now, apply the circuit shown in Fig. 11. After applying at most $7n + 29$ elementary gates, the terms $a$, $b$, $c$, $d$, $e$, and $f$ are changed to $2^n - 4$, $2^n - 1$, $2^n - 3$, $2^n - 2$, $2^{n-1} - 2$, and $2^{n-1} - 1$, respectively (the $\pi_{4,2}$ circuit). Then, apply the $\kappa_{0(4,2)}$ circuit and the reversed $\pi_{4,2}$ circuit (i.e., $\pi_{4,2}^{-1}$) to complete the implementation of $(a, b, c, d)(e, f)$ by at most $50n - 122$ elementary gates.

Figure 12: The $\kappa_{0(4,4)}$ circuit, $k = \lceil n/2 \rceil$

Figure 13: The circuit of Theorem 3.5

Lemma 3.5 The $\kappa_{0(4,4)}$ circuit (Fig. 12-a) implements $(2^n - 8, 2^n - 2, 2^n - 6, 2^n - 4)(2^n - 7, 2^n - 1, 2^n - 5, 2^n - 3)$ by cost $36n - 228$.

Proof Consider Fig. 12-a. The first $C^{n-3}\mathrm{NOT}(n-1, n-2, \cdots, 3, 2)$ gate changes $2^n - 8$, $2^n - 7$, $2^n - 6$ and $2^n - 5$ to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$, respectively. The second $C^{n-2}\mathrm{NOT}(n-1, n-2, \cdots, 2, 1)$ gate changes $2^n - 4$, $2^n - 3$, $2^n - 2$, and $2^n - 1$ to $2^n - 2$, $2^n - 1$, $2^n - 4$, and $2^n - 3$, respectively. Considering the gates sequentially leads to the implementation of the cycle. Applying Lemma 7.3 of [10] to each gate shown in Fig. 12-a and canceling the resulting redundant gates transforms Fig. 12-a into Fig. 12-b. The total number of $36n - 228$ elementary gates can be obtained by summing the costs of the gates shown in Fig. 12-b.

Theorem 3.5 (Syn$_{4,4}$ method): An arbitrary pair $(a, b, c, d)(e, f, g, h)$ can be implemented by at most $56n - 126$ elementary gates.

Proof Use at most $9n + 22$ elementary gates to sequentially convert $a$ to 8 (i.e., 0···01000), $c$ to 2 (i.e., 0···010), $d$ to 4 (i.e., 0···0100), $e$ to 1 (i.e., 0···01), $f$ to $2^{n-2}$ (i.e., 010···0), $g$ to $2^{n-1}$ (i.e., 10···0), $h$ to $2^{n-3}$ (i.e., 0010···0) and $b$ to 14 (i.e., 0···01110). Note that $a$ and $c$ can be transformed to 8 and 2 by only CNOT gates, respectively. In addition, for each of the terms $d$, $e$, $f$, $g$, and $h$, at most one Toffoli and $n - 1$ CNOT gates should be used. For the last term $b$, at most two Toffoli gates should be used to set the 2nd and 3rd bits to 1. Then, at most $n - 2$ Toffoli gates should be applied to set the 1st bit to 1 and the $i$th bit to 0 where $0 \le i \le n - 1$, $i \neq 1, 2, 3$. The $n - 2$ Toffoli gates can be implemented by cost $2(n - 2) + 3$ (see Fig. 1) since all Toffoli gates use the same control lines (i.e., the 2nd and 3rd bits). Note that for $n \ge 8$, the term $b$ can also be implemented by at most one Toffoli and $n - 1$ CNOT gates. Now, apply the circuit shown in Fig. 13. After applying at most $10n + 51$ elementary gates, $a$, $b$, $c$, $d$, $e$, $f$, $g$, and $h$ are changed to $2^n - 8$, $2^n - 2$, $2^n - 6$, $2^n - 4$, $2^n - 7$, $2^n - 1$, $2^n - 5$, and $2^n - 3$, respectively ($\pi_{4,4}$). Then, apply $\kappa_{0(4,4)}$ and $\pi_{4,4}^{-1}$ to complete the implementation of $(a, b, c, d)(e, f, g, h)$ by at most $56n - 126$ elementary gates.

Figure 14: The $\kappa_{0(5)}$ circuit

Figure 15: The circuit of Theorem 3.6

Lemma 3.6 The κ0(5) circuit (Fig. 14) implements the 5-cycle (2n−2−1, 2n−1,2n − 2n−2 − 1, 2n−1 − 1, 2n − 2n−3 − 1) with cost 48n− 166.

Proof As illustrated in Fig. 14, four gates T(n−1, n−2, n−3), $C^{n-2}$NOT(0, ···, n−3, n−1), T(n−1, n−2, n−3), $C^{n-2}$NOT(0, ···, n−1, n−2) are applied sequentially. After applying the first Toffoli gate, the locations of $2^{n-2}$ minterms (i.e., $\Sigma_1 = \{2^n - 2^{n-2}, 2^n - 2^{n-2} + 1, \cdots, 2^n - 1\}$) are changed. Mainly, $2^n - 2^{n-3} - 1$ (i.e., 1101...1) $\in \Sigma_1$ is changed to $2^n - 1$ ($\in \Sigma_1$). After the second gate ($C^{n-2}$NOT), the locations of 4 minterms (denoted as $\Sigma_2 = \{2^{n-2} - 1, 2^{n-1} - 1, 2^n - 2^{n-2} - 1, 2^n - 1\}$) are changed (where $2^n - 1 \in \Sigma_1 \cap \Sigma_2$). Among them, $2^n - 1$ is changed to $2^{n-1} - 1 \in \Sigma_2$, and $2^{n-1} - 1$ is changed to $2^n - 1$. Applying the third gate (Toffoli) puts all $\Sigma_1$ minterms at their right locations except $2^n - 2^{n-3} - 1$. In addition, it changes $2^n - 1$ to $2^n - 2^{n-3} - 1$. Finally, the last $C^{n-2}$NOT gate changes the locations of four minterms as $2^{n-1} - 1$ to $2^{n-2} - 1$, $2^n - 1$ to $2^n - 2^{n-2} - 1$, $2^n - 2^{n-2} - 1$ to $2^n - 1$, and $2^{n-2} - 1$ to $2^{n-1} - 1$. Considering all minterm exchanges, it can be verified that the 5-cycle κ0(5) is implemented by the circuit of Fig. 14. The total number of $48n - 166$ elementary gates can be obtained by a summation of the costs of gates in Fig. 14.
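The minterm bookkeeping in this proof can be checked by brute force for small n. The following is a minimal sketch, not the authors' tool: the helper names are hypothetical, and the control set of the last gate (bits 0 to n−3, target n−2) is an assumption inferred from the four minterm exchanges listed in the proof, since Fig. 14 itself is not reproduced here.

```python
def controlled_not(x, controls, target):
    """Flip bit `target` of x when every bit listed in `controls` is 1."""
    if all((x >> c) & 1 for c in controls):
        return x ^ (1 << target)
    return x


def kappa0_5(x, n):
    """Apply the four gates described in the proof of Lemma 3.6 to the n-bit value x."""
    low = list(range(n - 2))                       # bits 0 .. n-3
    x = controlled_not(x, [n - 1, n - 2], n - 3)   # T(n-1, n-2, n-3)
    x = controlled_not(x, low, n - 1)              # C^{n-2}NOT with target n-1
    x = controlled_not(x, [n - 1, n - 2], n - 3)   # T(n-1, n-2, n-3)
    x = controlled_not(x, low, n - 2)              # C^{n-2}NOT with target n-2 (assumed controls)
    return x


def lemma_cycle(n):
    """The 5-cycle stated in Lemma 3.6, as a list of integers."""
    return [2**(n - 2) - 1, 2**n - 1, 2**n - 2**(n - 2) - 1,
            2**(n - 1) - 1, 2**n - 2**(n - 3) - 1]


def implements_lemma_cycle(n):
    """True if the circuit moves each cycle element to its successor and fixes everything else."""
    cyc = lemma_cycle(n)
    successor = dict(zip(cyc, cyc[1:] + cyc[:1]))
    return all(kappa0_5(x, n) == successor.get(x, x) for x in range(2**n))
```

Under these assumptions, `implements_lemma_cycle(n)` can be evaluated for any small n to confirm that exactly five minterms move and all others stay in place.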

Theorem 3.6 (Syn5 method): An arbitrary 5-cycle (a, b, c, d, e) can be implemented by at most $60n - 130$ elementary gates.

Proof Use at most $5n + 12$ elementary gates to convert a to $2^{n-3}$ (i.e., 0010···0), d to $2^{n-2}$ (i.e., 010···0), c to $2^{n-1}$ (i.e., 10···0), e to $2^{n-4}$ (i.e., 00010···0) and b to $2^{n-1} + 2^{n-2} + 2^{n-3} + 1$ (i.e., 1110···01), sequentially. Note that a and d can be transformed to $2^{n-3}$ and $2^{n-2}$ by only CNOT gates, respectively. For each of the other terms at most one Toffoli and $n - 1$ CNOT gates should be used. Then, apply the circuit shown in Fig. 15. After using the applied gates (at most $6n + 18$ elementary gates), the terms a, b, c, d, and e are changed to $2^{n-2} - 1$, $2^n - 1$, $2^n - 2^{n-2} - 1$, $2^{n-1} - 1$, and $2^n - 2^{n-3} - 1$, respectively ($\pi_5$). Therefore, by applying the κ0(5) circuit and the $\pi_5^{-1}$ circuit, the 5-cycle (a, b, c, d, e) is implemented by at most $60n - 130$ elementary gates.

Figure 16: The κ0(5,5) circuit, $k = \lceil (n - 1)/2 \rceil$

Figure 17: The circuit of Theorem 3.7

Lemma 3.7 The κ0(5,5) circuit (Fig. 16) implements the pair of 5-cycles ($2^{n-2} - 2$, $2^n - 2$, $2^n - 2^{n-2} - 2$, $2^{n-1} - 2$, $2^n - 2^{n-3} - 2$) ($2^{n-2} - 1$, $2^n - 1$, $2^n - 2^{n-2} - 1$, $2^{n-1} - 1$, $2^n - 2^{n-3} - 1$) with cost $36n - 206$.

Proof It can be verified that the κ0(5,5) circuit shown in Fig. 16-a differs from the κ0(5) circuit in its least significant bit (i.e., the 0th bit), which results in two 5-cycles. Applying Lemma 7.3 of [10] to each gate shown in Fig. 16-a and canceling the resulting redundant gates transforms Fig. 16-a into Fig. 16-b. The total number of $36n - 206$ elementary gates can be obtained by a summation of the costs of gates shown in Fig. 16-b.

Theorem 3.7 (Syn5,5 method): An arbitrary pair of 5-cycles (a, b, c, d, e) (f, g, h, i, j) can be implemented by at most $64n - 54$ elementary gates.


Table 1: Maximum cost comparison for the proposed BBs

                       Our Approach                                   [13]
BB      Length   κ0         π, π^-1    Total      Cost/Length    Total
(2,2)   4        24n-88     5n+12      34n-64     8.5n-16        34n-64
(3)     3        24n-88     4n+3       32n-82     10.7n-27.3     68n-128
(3,3)   6        24n-112    7n+33      38n-46     6.3n-15.3      68n-128
(2,4)   6        36n-180    7n+29      50n-122    8.3n-20.3      68n-128
(4,4)   8        36n-228    10n+51     56n-126    7n-15.7        102n-192
(5)     5        48n-166    6n+18      60n-130    12n-26         102n-192
(5,5)   10       36n-206    14n+76     64n-54     6.4n-5.4       136n-256

Proof Apply at most $13n + 47$ elementary gates to convert a to 2 (i.e., 0···010), d to 4 (i.e., 0···0100), e to $2^{n-4}$ (i.e., 00010···0), f to $2^{n-3}$ (i.e., 0010···0), h to $2^{n-1}$ (i.e., 10···0), i to $2^{n-2}$ (i.e., 010···0), j to 1 (i.e., 0···01), b to $2^{n-1} + 4$ (i.e., 10···0100), c to $2^{n-1} + 2$ (i.e., 10···010) and g to $2^{n-1} + 2^{n-2} + 2^{n-3}$ (i.e., 1110···0) sequentially. Note that a and d can be transformed to 2 and 4 by only CNOT gates, respectively. In addition, for each of the other terms e, f, h, i, and j at most one Toffoli gate and $n - 1$ CNOT gates should be used. For the last three terms b, c, and g at most two Toffoli gates should be used to set the control bits to 1. Then, at most $n - 2$ Toffoli gates should be applied for each term. For $n \ge 10$, the terms b, c, and g can also be implemented by at most one Toffoli gate and $n - 1$ CNOT gates, which leads to $10n + 32$ elementary gates. Now, apply the circuit shown in Fig. 17. By using at most $14n + 76$ elementary gates, the terms a, b, c, d, e, f, g, h, i, and j are changed to $2^{n-2} - 2$, $2^n - 2$, $2^n - 2^{n-2} - 2$, $2^{n-1} - 2$, $2^n - 2^{n-3} - 2$, $2^{n-2} - 1$, $2^n - 1$, $2^n - 2^{n-2} - 1$, $2^{n-1} - 1$, and $2^n - 2^{n-3} - 1$, respectively (the $\pi_{(5,5)}$ circuit). Then, apply the κ0(5,5) and the $\pi_{(5,5)}^{-1}$ circuits to implement the cycles (a, b, c, d, e) (f, g, h, i, j) by at most $64n - 54$ elementary gates.

So far, direct implementations of the selected building blocks have been studied. Table 1 shows a summary of the achieved results for direct implementations of the selected building blocks. In this table, the maximum number of elementary gates of our direct synthesis method and the 2-cycle-based method [13] for the set of proposed building blocks are compared. As demonstrated in this table, the direct k-cycle-based implementation has a significant potential to reduce the cost. However, as the direct implementation of a general k-cycle could be very hard, in this paper a decomposition algorithm is also proposed to be used in conjunction with the selected set of building blocks.
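The Total column of Table 1 follows the pattern used in the proofs of Theorems 3.5 to 3.7: one π circuit, the κ0 circuit, and one π^-1 circuit of the same cost as π. A short sketch of that arithmetic is given below; the dictionary layout and function names are illustrative assumptions, while the coefficients are copied from Table 1.

```python
# Each cost is stored as (slope, offset), meaning slope * n + offset elementary gates.
TABLE_1 = {
    "(2,2)": {"kappa0": (24, -88),  "pi": (5, 12),  "ref13": (34, -64)},
    "(3)":   {"kappa0": (24, -88),  "pi": (4, 3),   "ref13": (68, -128)},
    "(3,3)": {"kappa0": (24, -112), "pi": (7, 33),  "ref13": (68, -128)},
    "(2,4)": {"kappa0": (36, -180), "pi": (7, 29),  "ref13": (68, -128)},
    "(4,4)": {"kappa0": (36, -228), "pi": (10, 51), "ref13": (102, -192)},
    "(5)":   {"kappa0": (48, -166), "pi": (6, 18),  "ref13": (102, -192)},
    "(5,5)": {"kappa0": (36, -206), "pi": (14, 76), "ref13": (136, -256)},
}


def total_cost(kappa0, pi):
    """Cost of pi, then kappa0, then pi^-1 (pi and pi^-1 have the same cost)."""
    return (kappa0[0] + 2 * pi[0], kappa0[1] + 2 * pi[1])


def evaluate(cost, n):
    """Number of elementary gates for an n-line circuit."""
    return cost[0] * n + cost[1]


if __name__ == "__main__":
    n = 10  # example circuit width, chosen only for illustration
    for name, row in TABLE_1.items():
        ours = total_cost(row["kappa0"], row["pi"])
        print(name, evaluate(ours, n), evaluate(row["ref13"], n))
```

For the (4,4) block, for instance, `total_cost((36, -228), (10, 51))` gives `(56, -126)`, the bound of Theorem 3.5.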

3.2 Decomposition Method

In the rest of this paper, 2-, 3-, 4- and 5-cycles are called elementary cycles. For an arbitrary single permutation P, we would like to decompose it into a set of elementary cycles like $c_1, c_2, ..., c_k$ such that applying P would be identical to applying $c_1, c_2, ..., c_k$ sequentially; and $c_1, c_2, ..., c_k$ as well as P would belong to a single permutation group.

To describe the decomposition method, the following notations are used: P as an input permutation, m as the maximum cycle length available in P, $C_k$ as a cycle of length k, $C_{k,i(k)}$ as the set of i(k) cycles each of which is of length k, $C_k^j$ ($j \le i(k)$) as the jth cycle of the cycle set $C_{k,i(k)}$, N(k) as the number of disjoint 5-cycles in a given k-cycle, L(k) as the length of a given k-cycle after detaching N(k) disjoint 5-cycles, and E(k) as the length of a given k-cycle after detaching all of the available disjoint/non-disjoint 5-cycles in the given k-cycle.

Any permutation P can be written uniquely, except for the order, as a product of disjoint cycles. Without loss of generality, we assume that $P = C_{m,i(m)} C_{m-1,i(m-1)} \cdots C_{3,i(3)} C_{2,i(2)}$ where $\forall k \in (2, \cdots, m)$: $i(k) \ge 0$. For each $C_{k,i(k)}$ ($k > 5$) in P, $C_{k,i(k)}$ is decomposed into a set of cycles of lengths 5, 4, 3, and 2, sequentially. In addition, for any two cycles $C_{k,i(k)}$ and $C_{j,i(j)}$ ($k > j$), $C_{k,i(k)}$ is processed first. Consider a given k-cycle (1, 2, 3, 4, ···, k) ($k > 5$). It is possible to decompose it into two cycles (1, 2, 3, 4, 5) (6, 7, ···, k, 1) of length 5 and $(k - 4)$, respectively. Repeating the process leads to $N(k) = \lfloor k/5 \rfloor$ disjoint 5-cycles and a cycle of length $L(k) = N(k) + (k \bmod 5)$ with some non-disjoint members. This process is called the 5-cycle extraction method in the rest of the paper.

Since $C_{k,i(k)}$ $\forall k \in (2, \cdots, m)$ contains i(k) cycles of length k, one can write $C_{k,i(k)} = C_k^1 C_k^2 \cdots C_k^{i(k)}$. For each $C_k$ and by using the 5-cycle extraction method, $C_k = C_{5,1} C_{k-4,1} = C_{5,2} C_{k-8,1} = ... = C_{5,N(k)} C_{L(k),1}$. Repeating this process for L(k), L(L(k)), etc. leads to $C_k = C_{5,N(k)} C_{5,N(L(k))} C_{5,N(L(L(k)))} \cdots C_{5,N(L(L\cdots(k)))} C_{E(k),1}$. Note that E(k) is smaller than 5. Since there are i(k) cycles of length k, $C_{k,i(k)} = C_{5,N(k) \times i(k)} C_{5,N(L(k)) \times i(k)} C_{5,N(L(L(k))) \times i(k)} \cdots C_{5,N(L(L\cdots(k))) \times i(k)} C_{E(k),i(k)}$.
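The length bookkeeping behind N(k), L(k) and E(k) can be sketched in a few lines. This is an illustrative helper with a hypothetical name, not code from the paper; it assumes that a returned value of 1 for E(k) means no residual cycle is left.

```python
def five_cycle_counts(k):
    """Return (total number of extracted 5-cycles, E(k)) for a cycle of length k."""
    extracted = 0
    while k >= 5:
        n_k = k // 5         # N(k): disjoint 5-cycles detached in this round
        extracted += n_k
        k = n_k + k % 5      # L(k): length of the cycle that remains
    return extracted, k      # the remaining length is E(k), which is below 5
```

For a 16-cycle this yields three 5-cycles and a residual 4-cycle, and for a 6-cycle one 5-cycle and a residual 2-cycle.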

It can be verified that the resulting elementary cycle of a k-cycle ($k > 5$) has no common members with other cycles. In addition, all disjoint/non-disjoint 5-cycles (detached from a k-cycle) are disjoint over other cycles. Therefore, the input permutation P can be written as (1). See Example 3.2 for more details.

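\[
P = \left(C_{5,N(k)\times i(k)}\right)\Big|_{k=5}^{m}
    \left(C_{5,N(L(k))\times i(k)}\right)\Big|_{k=5}^{m}
    \cdots
    \left(C_{5,N(L(L(\cdots(k))))\times i(k)}\right)\Big|_{k=5}^{m}
    C_{4,i'(4)}\, C_{3,i'(3)}\, C_{2,i'(2)}
\tag{1}
\]
where:
\[
i'(4) = i(4) + \sum_{k=5}^{m} i(k)\big|_{E(k)==4}, \quad
i'(3) = i(3) + \sum_{k=5}^{m} i(k)\big|_{E(k)==3}, \quad
i'(2) = i(2) + \sum_{k=5}^{m} i(k)\big|_{E(k)==2}
\]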

Example 3.2 Consider P = (3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21) (22, 23, 24, 25, 26, 27) (28, 29) (30, 31) written as $C_{16,i(16)} C_{6,i(6)} C_{2,i(2)}$. It can be verified that m = 16, i(16) = 1, i(6) = 1, i(2) = 2, and i(k) = 0 for $k \in (3, 4, 5, 7, 8, \cdots, 15)$. We have:


• k = 16

– $N(16) = \lfloor 16/5 \rfloor = 3$, $L(16) = 3 + 1 = 4$

– $N(L(16)) = N(4) = 0$, $L(L(16)) = 4 = E(16)$

• k = 6

– $N(6) = 1$, $L(6) = 2 = E(6)$

• k = 2

– $N(2) = 0$, $L(2) = 2 = E(2)$

Therefore $P = (C_{5,3} C_{5,1}) (C_{4,1} C_{2,3})$ = (3, 5, 6, 7, 9) (10, 11, 12, 13, 14) (15, 17, 18, 19, 20) (22, 23, 24, 25, 26) (21, 3, 10, 15) (22, 27) (28, 29) (30, 31).
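The decomposition of Example 3.2 can be reproduced with a small sketch of the 5-cycle extraction method. The function names are hypothetical and the code is not taken from the authors' C++ implementation; it assumes cycles are applied one after another from left to right, which is the convention under which (1, 2, 3, 4, 5) (6, 7, ···, k, 1) equals (1, 2, ···, k).

```python
def extract_elementary_cycles(cycle):
    """Split one cycle (a sequence of distinct elements) into cycles of length at most 5."""
    parts = []
    rest = list(cycle)
    while len(rest) > 5:
        parts.append(rest[:5])         # detach the 5-cycle (1, 2, 3, 4, 5)
        rest = rest[5:] + rest[:1]     # keep (6, 7, ..., k, 1)
    if len(rest) > 1:
        parts.append(rest)             # residual elementary cycle
    return parts


def apply_cycles(cycles, x):
    """Map x through the given cycles one after another, left to right."""
    for cyc in cycles:
        if x in cyc:
            x = cyc[(cyc.index(x) + 1) % len(cyc)]
    return x


if __name__ == "__main__":
    c16 = [3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21]
    for part in extract_elementary_cycles(c16):
        print(tuple(part))
```

Applied to the 16-cycle of Example 3.2, the sketch returns the three disjoint 5-cycles followed by (21, 3, 10, 15); applied to (22, 23, 24, 25, 26, 27) it returns (22, 23, 24, 25, 26) and (27, 22), which is the same 2-cycle as (22, 27).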

Considering the 5-cycle extraction method, the extraction time complexity of each k-cycle can be written as $O(k) + O(L(k)) + O(L(L(k))) + \cdots + O(E(k)) \le O(\eta k)$ where $\eta$ is an integer smaller than k. Therefore, each given k-cycle is processed with the time complexity of O(k). On the other hand, as there are $i(k) \ge 0$ cycles of length k, the total time complexity of the decomposition method is $O(m) \times i(m) + O(m-1) \times i(m-1) + \cdots + O(2) \times i(2)$ where $O(i(k)) = O(2^n/k)$, $k < 2^n$ for $k \in (2 \cdots m)$. Therefore, we have $O(m) \times i(m) + O(m-1) \times i(m-1) + \cdots + O(2) \times i(2) = O(m^2) = O(2^{2n})$ as $m < 2^n$. It is important to note that the decomposition algorithm of [13] works with the same $O(2^{2n})$ time complexity. After the decomposition stage, the resulting elementary cycles should be implemented by using the proposed synthesis algorithms. Note that the total number of extracted 5-cycles is $O(k) + O(L(k)) + O(L(L(k))) + \cdots$, which is equal to O(k). Considering all k-cycles ($k \ge 5$), the total number of 5-cycles is $O(2^{2n})$ as explained above. In addition, as each k-cycle ($k \ge 5$) could produce at most one elementary cycle with length 2, 3 or 4, the total number of elementary cycles is at most $\sum_{k=2\cdots m} i(k) = O(2^n)$. Therefore, the total number of elementary cycles is $O(2^{2n})$, which leads to the time complexity of $O(2^{2n}) \times O(\mathrm{SynthesisAlgorithm})$. It can be verified that the proposed synthesis algorithms for the elementary cycles are of O(n). As a result, the total time complexity of the proposed approach is $O(2^{2n} \times n)$, the same as [13].

To count the maximum number of elementary cycles in the proposed method, note that the numbers of 5-cycle pairs, 3-cycle pairs and 4-cycle pairs resulting from the decomposition algorithm are $Num_{5,5} = \lfloor \frac{1}{2} \sum_{k=5\cdots m} (N(k) + N(L(k)) + \cdots) \rfloor$, $Num_{3,3} = \lfloor \frac{1}{2} i'(3) \rfloor$, and $Num_{4,4} = \lfloor \frac{1}{2} i'(4) \rfloor$, respectively. On the other hand, at most one single 5-cycle, one single 3-cycle and one 4-cycle followed by a 2-cycle are produced, i.e., $Num_5 = \mathrm{mod}(\sum_{k=5\cdots m} (N(k) + N(L(k)) + \cdots), 2)$, $Num_3 = \mathrm{mod}(i'(3), 2)$ and $Num_{4,2} = \mathrm{mod}(i'(4), 2)$. Finally, the number of 2-cycle pairs is $Num_{2,2} = \lfloor \frac{1}{2} (i'(2) - Num_{4,2}) \rfloor$. Altogether, the maximum number of elementary gates resulting from the proposed k-cycle-based synthesis method can be expressed by (2). See the following examples for more details.

\[
\begin{aligned}
& Num_{5,5} \times (64n - 54) + Num_5 \times (60n - 130) + Num_{3,3} \times (38n - 46) \\
& + Num_3 \times (32n - 82) + Num_{4,4} \times (56n - 126) + Num_{4,2} \times (50n - 122) \\
& + Num_{2,2} \times (34n - 64)
\end{aligned}
\tag{2}
\]

Step 1  Fix 0 and $2^i$ terms using a pre-process stage as done in [9].

Step 2  if n < 7
          1- Decompose the input permutation into a set of 2-cycles.
          2- Apply Syn2,2 to synthesize all 2-cycles
        else
          1- Decompose the input permutation into a set of 5, 4, 3, and 2 cycles
          2- Synthesize all disjoint 5-cycle pairs (Syn5,5)
          3- Synthesize single 5-cycles (Syn5)
          4- Synthesize all disjoint 3-cycle pairs (Syn3,3)
          5- Synthesize single 3-cycles (Syn3)
          6- Synthesize all disjoint 4-cycle pairs (Syn4,4)
          7- Synthesize all disjoint 4-cycle and 2-cycle pairs (Syn4,2)
          8- Synthesize all disjoint 2-cycle pairs (Syn2,2)

Figure 18: The k-cycle-based synthesis method

Example 3.3 Again, reconsider the permutation of Example 3.2, $P = C_{16,1} C_{6,1} C_{2,2}$ where $Num_{5,5} = \lfloor \frac{1}{2} (N(16) + N(6)) \rfloor = 2$, $Num_5 = 0$, $Num_{3,3} = 0$, $Num_3 = 0$, $Num_{4,4} = 0$, $Num_{4,2} = 1$, and $Num_{2,2} = \lfloor \frac{1}{2} (3 - 1) \rfloor = 1$. At most $2 \times (64n - 54) + (50n - 122) + (34n - 64) = 212n - 294$ elementary gates are produced using our k-cycle-based synthesis method.
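The bound (2) is a weighted sum of the per-block costs, so it can be evaluated mechanically once the numbers of elementary cycles are known. The sketch below uses hypothetical function names and takes the cycle counts after decomposition as inputs (the total number of 5-cycles, i'(4), i'(3) and i'(2)); the cost coefficients are those of Table 1.

```python
# (slope, offset) of the bound slope * n + offset for each building block, from Table 1.
BLOCK_COST = {
    "5,5": (64, -54), "5": (60, -130), "3,3": (38, -46), "3": (32, -82),
    "4,4": (56, -126), "4,2": (50, -122), "2,2": (34, -64),
}


def building_block_counts(n5, n4, n3, n2):
    """Pair up elementary cycles as described in the text; inputs are cycle counts."""
    num42 = n4 % 2
    return {
        "5,5": n5 // 2, "5": n5 % 2,
        "3,3": n3 // 2, "3": n3 % 2,
        "4,4": n4 // 2, "4,2": num42,
        "2,2": (n2 - num42) // 2,
    }


def max_gate_bound(n5, n4, n3, n2):
    """Return (slope, offset) of the bound in (2)."""
    counts = building_block_counts(n5, n4, n3, n2)
    slope = sum(counts[b] * BLOCK_COST[b][0] for b in BLOCK_COST)
    offset = sum(counts[b] * BLOCK_COST[b][1] for b in BLOCK_COST)
    return slope, offset
```

For Example 3.3 there are four 5-cycles, one 4-cycle, no 3-cycles and three 2-cycles, and `max_gate_bound(4, 1, 0, 3)` returns `(212, -294)`, i.e. 212n − 294.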

Example 3.4 Let P = (3, 5, 6, 7, 9, 10, 11, 12, 13, 14) (15, 17, 18, 19, 20, 21) (22, 23, 24) (25, 26, 27) (28, 29, 30) ($P = C_{10,1} C_{6,1} C_{3,3}$). After decomposition, we have $P = C_{5,3} C_{3,3} C_{2,2}$. After applying the proposed method, $Num_{5,5} = 1$, $Num_5 = 1$, $Num_{3,3} = 1$, $Num_3 = 1$, $Num_{4,4} = 0$, $Num_{4,2} = 0$, $Num_{2,2} = 1$ and at most $(64n - 54) + (60n - 130) + (38n - 46) + (32n - 82) + (34n - 64) = 228n - 376$ elementary gates are produced.

As stated at the beginning of Section 3, the zero and $2^i$ terms are fixed by applying a few Toffoli and CNOT gates as done in [9]. In addition, for small n (i.e., n < 7), the decomposition algorithm is modified to produce only 2-cycles where each cycle pair is synthesized by the Syn2,2 method. The complete k-cycle-based synthesis method is shown in Fig. 18. For $n \ge 7$, a given permutation is recursively decomposed into a set of elementary cycles, each of which is synthesized by the synthesis algorithm listed in parentheses as discussed.


Theorem 3.8 The proposed k-cycle-based synthesis method always converges.

Proof According to the proofs of Theorem 3.1 to Theorem 3.7, the suggested building blocks (i.e., a pair of 2-cycles, single 3-cycle, a pair of 3-cycles, single 5-cycle, a pair of 5-cycles, a single 2-cycle (4-cycle) followed by a single 4-cycle (2-cycle), and a pair of 4-cycles) can always be synthesized for any arbitrary values of cycle elements for $n \ge 7$ as long as each cycle element is neither 0 nor $2^i$. In addition, by using the proposed decomposition algorithm, a given large cycle can always be decomposed into a set of elementary cycles. For small n (i.e., n < 7), the decomposition algorithm produces only 2-cycles where each pair can always be synthesized by the Syn2,2 method. Considering the pre-process stage for the zero and $2^i$ terms and the synthesis scenarios for n < 7 and $n \ge 7$ as explained above leads to the theorem.

3.3 Worst Case Analysis

To analyze the total number of elementary gates resulting from the proposed k-cycle-based synthesis method in the worst case, assume that the maximum of m members ($a_1, a_2, \cdots, a_m$) of a given permutation P are moved. As each $a_k$, $k \in (2, \cdots, m)$ is neither 0 nor $2^i$, m is equal to $2^n - n - 1$ for an even n and equal to $2^n - n - 2$ for an odd n.

Theorem 3.9 The maximum number of elementary gates in the proposed cycle-based synthesis method is calculated by $8.5n2^n + o(2^n)$.

Proof In order to place each row at its right position, several reversible gates should be applied in the proposed method. The worst-case cost occurs for the maximum number of changed rows (i.e., $m = o(2^n)$). The synthesis costs listed in Table 1 (i.e., Cost/Length) indicate that the cost of correcting a single row is $8.5n - 16$ for a pair of 2-cycles, $10.7n - 27.3$ for a single 3-cycle, $6.3n - 15.3$ for a pair of 3-cycles, $8.3n - 20.3$ for a single 2-cycle followed by a single 4-cycle, $7n - 15.7$ for a pair of 4-cycles, $12n - 26$ for a single 5-cycle and $6.4n - 5.4$ for a pair of 5-cycles.

For a decomposition with $2^n$ changed rows, there are at most one single 5-cycle and one single 3-cycle. Considering the cost of $12n - 26$ for correcting a single row in a single 5-cycle, $10.7n - 27.3$ in a single 3-cycle and $8.5n - 16$ in a pair of 2-cycles, it can be verified that the worst-case cost for a decomposition with $2^n$ changed rows is $8.5n2^n + o(2^n)$.

Theorem 3.9 shows a lower upper bound for the k-cycle-based synthesis method compared to the best reported upper bound of $11n2^n + o(n2^n)$ for the synthesis algorithm proposed in [6]. Given the fact that the $8.5n2^n$ term is dominant over the $o(2^n)$ term, the former will be used in the remainder of this subsection for cost analysis.

Reversible logic has application in quantum computing [10], [5]. Most quantum algorithms presume that interaction between arbitrary qubits is possible with no extra cost. However, some restrictions exist in real quantum technologies [15]. For example in a Linear Nearest Neighbor (LNN) architecture, only adjacent qubits may interact. The implementation complexity with limited interaction depends on the relative target and control positions. It can be modeled by using a sequence of SWAP gates to move controls and targets close to each other to construct appropriate gates. Theorem 3.10 examines the proposed method for LNN architecture.

Theorem 3.10 The maximum number of elementary gates in the proposed k-cycle-based synthesis method for LNN architecture is equal to $51n^2 2^n$.

Proof To prove the theorem, the number of SWAP operations required to perform a 2-qubit gate g with control c and target t has to be found. We assume c > t. It can be verified that $(c - t - 1)$ SWAP operations are required to bring the control adjacent to the target, one gate is required to perform g, and the same sequence of $(c - t - 1)$ SWAP operations is required to return the value of the ith ($t < i \le c$) qubit to its initial value. Considering a cost of 3 for each SWAP operation leads to $6 \times (c - t - 1) + 1$. The case of $c \le t$ can be readily deduced by following the same approach.

The theorem can be proven by using Theorem 3.9 and plugging in the cost found above.
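The per-gate overhead derived in the proof can be written as a one-line function. The sketch below uses hypothetical names; the step from the per-gate cost to the stated bound is an assumption on our part, namely that each of the $8.5n2^n$ gates of Theorem 3.9 is charged at most 6n on an n-qubit chain.

```python
def lnn_gate_cost(control, target):
    """Elementary-gate cost of one 2-qubit gate on a linear nearest-neighbor chain."""
    gap = abs(control - target) - 1   # SWAPs needed to make the two qubits adjacent
    return 6 * gap + 1                # gap SWAPs there and back at cost 3 each, plus the gate


def lnn_worst_case(n):
    """The bound of Theorem 3.10, read as 8.5 * n * 2^n gates times a per-gate factor of 6n."""
    return 51 * n**2 * 2**n
```

For qubits indexed 0 to n−1 the largest per-gate cost is `lnn_gate_cost(n - 1, 0)`, which equals 6n − 11 and therefore stays below the factor 6n assumed above.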

4 Experimental Results

The proposed k-cycle-based synthesis method and the 2-cycle-based algorithm presented in [13] were implemented in C++ and all of the experiments were done on an Intel Pentium IV 2.2GHz computer with 2GB memory. In addition, we used one of the most recent and efficient NCT-based synthesis tools proposed in [6] for our comparisons. This method uses Reed-Muller (RM) spectra in an iterative synthesis procedure (RM-based method). In all experiments, the post-processing algorithm proposed in [13] was applied to simplify circuits produced by our synthesis method and the algorithm of [13]. In this method, optimal circuits for all 40320 3-input reversible functions and a large set of 4-input circuits were generated and stored in a compact data-structure. As a result, applying the post-processing algorithm of [13] leads to optimal results for all 3- and some 4-input specifications. The synthesis algorithm of [6] was applied in “synthesized/ resynthesized using 3 methods” mode for circuits with n < 15 and in “synth/resynth with MMD (15+ variables)” mode for $n \ge 15$. In addition, the synthesis algorithm, the template matching method, and the random and exhaustive driver algorithms were applied sequentially to synthesize each function with a time limit of 12 hours as done in [6]. Bidirectional and quantum cost reduction modes were also applied.

To evaluate the proposed synthesis method, the completely specified reversible benchmarks from [16] were examined. In addition, the best documented synthesis costs available at [16], resulting from applying different NCT-based synthesis tools, were used for our comparisons. In some cases, the synthesis results in NCT library for some benchmarks have not been reported yet (these functions are N-th prime functions over more than 7 bits, hamming coding functions (hwb) over more than 11 bits¹ and permanent functions). In those cases, we applied the synthesis method of [6], which works efficiently in terms of quantum cost, with a time limit of 12 hours. If it failed to synthesize a function in the given time limit (for hwb functions over more than 11 bits and N-th prime functions over more than 10 bits, the algorithm failed), the method of [13] was applied. All synthesis algorithms were compared in terms of the quantum cost as done in [16]. Our actual circuits are available from [18].

The results of the proposed k-cycle-based synthesis method (Pure k-cycle) and the best synthesized circuits resulting from the previous NCT-based synthesis algorithms (Best Results) are shown in Table 2. A comparison of the synthesis costs of the proposed k-cycle-based method and the best reported ones reveals that the cycle-based approach behaves differently in terms of the quantum cost for different benchmarks (for examples see the results of hwb11 and cycle10_2). In the rest of this section, by analyzing the characteristics of different benchmarks, a hybrid synthesis framework is proposed which uses the cycle-based method in conjunction with the method of [6] to synthesize a given function. As shown later, the proposed hybrid framework can improve the average quantum costs efficiently.

To evaluate the behavior of the k-cycle-based synthesis method, a Distance metric is defined as (3) for each reversible function f where $0 \le \mathrm{Distance}(f) \le 1$.

\[
\mathrm{Distance}(f) = \sum_{i=0}^{2^n - 1} |f(i) - i| \,/\, 2^{2n-1}
\tag{3}
\]

For a given function f, Distance(f) models the distribution of output code words compared with the identity function. Fig. 19 shows the distributions of output code words for three benchmarks. As illustrated in this figure, ham7 (Distance(f) = 0.38) and cycle10_2 (Distance(f) = 0.001) are more similar to the identity function (f(i) = i, Distance(f) = 0) compared with hwb10 (Distance(f) = 0.62). The distributions of output code words for other functions are reported in Table 2 (i.e., Dist.).
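The Distance metric of (3) is straightforward to compute from a truth table. In the minimal sketch below, the function name and the list-based representation of f (so that `f[i]` is the output code word for input i) are illustrative assumptions.

```python
def distance(f, n):
    """Distance metric of (3) for an n-bit reversible function given as a list of outputs."""
    size = 2**n
    if sorted(f) != list(range(size)):
        raise ValueError("f must be a permutation of 0 .. 2^n - 1")
    return sum(abs(f[i] - i) for i in range(size)) / 2**(2 * n - 1)


if __name__ == "__main__":
    n = 3
    identity = list(range(2**n))
    reversal = [2**n - 1 - i for i in range(2**n)]
    print(distance(identity, n), distance(reversal, n))
```

The identity function gives 0, and the reversal $f(i) = 2^n - 1 - i$ gives 1, which matches the stated range of the metric.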

Based on the characterization of a reversible function, we divided benchmarks into three categories as shown in Table 2 (Cat.). Category 1 includes small functions with fewer than seven inputs. Category 2 and category 3 include large functions with $n \ge 7$ but with different distribution levels. In other words, for each function in category 2 (3), Distance(f) is greater (less) than 0.5. By applying a hybrid synthesis framework, functions in different categories are handled differently as shown in Fig. 20.

For functions in category 1, we applied the cycle-based synthesis method first. Then, the random driver procedure introduced in [6] was applied. Since category 1 includes small functions, applying the random driver method for optimizing the results has no runtime overhead. Hence, combining different heuristics (i.e., cycle-based approach and random driver procedure) to achieve better cost is reasonable. On the other hand, for large functions in category 2 with considerable differences from the identity function (Distance ≥ 0.5), only the cycle-based synthesis method was applied. According to [6], for some functions in this category (e.g., hwb11) the method of [6] needs several hours to synthesize the function. Similarly, in [19], the authors stated that their synthesis algorithm cannot synthesize hwb circuits with over five variables by NCT library (with 4GB RAM and finite runtime). Memory/runtime limitations will be even more challenging for hwb functions with more variables. As can be seen in Table 2, both average cost and runtime were improved for functions in category 2.

¹ For hwb functions, polynomial size reversible circuits in NCTF library (NCT library plus the Fredkin gate [17]) with [log(n)] + 1 garbage bits and O(n log2(n)) gates exist [16].

Table 2: The comparison costs of the proposed synthesis framework. Time values are in seconds.

Cat.  Benchmark         n   Dist.   Best      Pure      Hybrid    Hybrid  Hybrid  Cost
      function                      results   k-cycle   cost      time    method  impr. (%)
                                    cost      cost
1     3_17              3   0.18    12        12        12        4       kC+R    0
1     4_49              4   0.37    32        116       32        5       kC+R    0
1     ham3              3   0.06    7         7         7         4       kC+R    0
1     hwb4              4   0.36    23        60        24        30      kC+R    -4
1     hwb5              5   0.44    104       196       91        32      kC+R    13
1     hwb6              6   0.49    140       526       107       44      kC+R    24
1     mod5adder         6   0.07    77        853       79        20      kC+R    -3
1     nth_prime3_inc    3   0.13    6         6         6         3       kC+R    0
1     nth_prime4_inc    4   0.47    58        190       51        20      kC+R    12
1     nth_prime5_inc    5   0.34    91        363       97        27      kC+R    -7
1     nth_prime6_inc    6   0.61    667       1314      701       37      kC+R    -5
1     permanent2x2      6   0.02    47        227       49        20      kC+R    -4
      average                                                                     2
2     hwb7              7   0.54    2611      2630      2630      111     kC      -1
2     hwb8              8   0.58    7013      6940      6940      56      kC      1
2     hwb9              9   0.60    22502     16173     16173     44      kC      28
2     hwb10             10  0.62    59191     35618     35618     50      kC      40
2     hwb11             11  0.63    136756    90745     90745     60      kC      34
2     hwb12             12  0.64    334218    198928    198928    122     kC      40
2     hwb13             13  0.66    935322    436305    436305    481     kC      53
2     hwb14             14  0.65    1818773   994340    994340    994     kC      45
2     hwb15             15  0.66    4119568   1999194   1999194   1503    kC      51
2     hwb16             16  0.66    8910859   4730024   4730024   4312    kC      47
2     nth_prime7_inc    7   0.59    2695      3172      3172      41      kC      -18
2     nth_prime8_inc    8   0.62    9409      7618      7618      56      kC      19
2     nth_prime9_inc    9   0.55    20888     17975     17975     60      kC      14
2     nth_prime10_inc   10  0.64    48435     40299     40299     64      kC      17
2     nth_prime11_inc   11  0.62    197606    95431     95431     89      kC      52
2     nth_prime12_inc   12  0.61    452301    208227    208227    190     kC      54
2     nth_prime13_inc   13  0.6     1016567   474660    474660    420     kC      53
2     nth_prime14_inc   14  0.62    2254198   1018661   1018661   1101    kC      55
2     nth_prime15_inc   15  0.63    4948477   2271370   2271370   2812    kC      54
2     nth_prime16_inc   16  0.64    10786095  4823320   4823320   4018    kC      55
2     nth_prime17_inc   17  0.61    22144391  10592640  10592640  9231    kC      52
      average                                                                     35
3     ham7              7   0.38    49        2117      49        ∗       RM      0
3     ham15             15  0.31    214       140343    214       ∗       RM      0
3     mod1024adder      20  0.66    1575      110222    1575      ∗       RM      0
3     cycle10_2         12  0.001   1206      93086     1206      ∗       RM      0
3     cycle17_3         20  ≈ 0     6057      523891    6057      ∗       RM      0
3     permanent3x3      12  ≈ 0     1884      89777     1884      ∗       RM      0
      average                                                                     0

∗ A time limit of 12 hours was considered in applying the method of [6].

Figure 19: The distributions of output code words for three benchmark functions

Figure 20: The hybrid synthesis framework

On the other hand, for functions in category 3, which have some similarities to the identity function (Distance < 0.5), the RM-based method is used in the proposed hybrid framework. A reversible function with large Distance can have a regular distribution at its output side (e.g., $f(i) = 2^n - 1 - i$ where Distance(f) = 1). Hence, the number of patterns (NoP) in the distribution of output code words was also used in the proposed hybrid framework. A regular output distribution leads to a small NoP. Fig. 21 shows output patterns for the ham7 function (NoP = 12). A function with an appropriate number of patterns (NoP < Th) at its output code words is similar to the identity function to some extent. Hence, such a function was synthesized by using the RM-based method too. For example, mod1024adder with Distance = 0.66 and NoP = 1000 was synthesized by applying the RM-based method. We set $Th = 0.005 \times 2^n$ in our experiments.

Figure 21: Output patterns for ham7 function with NoP = 12
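The category rules described in the text can be summarized as a small selection function. This is a sketch of our reading of the prose only: Fig. 20 is not reproduced here, the function name is hypothetical, and the NoP value is taken as an input because its computation is not specified in this section.

```python
def choose_method(n, dist, nop, th_factor=0.005):
    """Pick a synthesis flow from n, Distance(f) and the number of output patterns (NoP)."""
    if n < 7:
        return "kC+R"                       # category 1: cycle-based, then random driver
    threshold = th_factor * 2**n            # Th = 0.005 * 2^n, as set in the experiments
    if dist < 0.5 or nop < threshold:
        return "RM"                         # category 3: close to identity or regular output
    return "kC"                             # category 2: cycle-based method only
```

With the values quoted above, mod1024adder (n = 20, Distance = 0.66, NoP = 1000) falls below the threshold of about 5243 patterns and is routed to the RM-based method, in line with Table 2.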

The results of the hybrid synthesis framework are shown in Table 2, where k-cycle-based, random driver and RM-based methods are denoted by kC, R, and RM, respectively. Runtime results (in seconds) for the hybrid framework are reported in Table 2 too. According to the experimental results, the RM-based method works very fast for functions in category 3 compared with category 2. Therefore, the proposed hybrid synthesis framework outperforms the best results in terms of quantum cost and runtime on average. Our synthesis tool can potentially synthesize functions with any number of variables. However, as the number of variables and the number of synthesized gates grow, the runtime and memory usage grow accordingly (for hwb functions with $n \ge 20$, peak memory usage was more than 2GB).

Since both cycle-based and RM-based methods [6] always result in a synthesized circuit, the proposed framework always converges. Moreover, as a generic reversible function f with large n and Distance(f) ≥ 0.5 without regular patterns at its output side needs many more gates in the proposed hybrid framework compared with other functions, the worst-case cost of the hybrid framework is identical to the worst-case cost of the cycle-based method (i.e., $8.5n2^n + o(2^n)$).

5 Conclusion and future directions

In this paper, a k-cycle-based synthesis method for reversible functions was proposed and analyzed in detail. To this end, a set of synthesis algorithms was proposed to synthesize cycles of length less than 6 (i.e., elementary cycles). In addition, a decomposition algorithm was introduced to decompose a large cycle into a set of elementary cycles. Next, the decomposition algorithm and the proposed synthesis algorithms were used to synthesize all permutations. By evaluating different benchmark functions, the behavior of the cycle-based synthesis method was analyzed and a hybrid synthesis framework was introduced which uses the proposed cycle-based synthesis method in conjunction with one of the recent synthesis methods.

Our worst-case analysis revealed that the proposed hybrid synthesis framework leads to a lower upper bound compared to the present synthesis algorithms. The hybrid framework always converges and it leads to better average runtime. The experiments for average-case costs revealed that the proposed framework produces circuits with lower costs for benchmark functions.

A natural next step is the synthesis of cycles with length greater than 5, which could improve the average-case cost of the k-cycle-based synthesis method and thereby the results of the hybrid framework too. In addition, a synthesis approach for incompletely specified functions based on the one proposed here could be considered as future research.

Acknowledgment

We would like to acknowledge Dmitri Maslov from University of Waterloo for providing an executable version of his recent synthesis tool.

References

[1] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5:183–191, July 1961.

[2] C. Bennett. Logical reversibility of computation. IBM Journal of Research and Development, 17(6):525–532, November 1973.

[3] V. V. Zhirnov, R. K. Cavin, J. A. Hutchby, and G. I. Bourianoff. Limits to binary logic switch scaling - a gedanken model. Proceedings of the IEEE, 91(11):1934–1939, 2003.

[4] G. Schrom. Ultra-Low-Power CMOS Technology. PhD thesis, Technischen Universitat Wien, June 1998.

[5] M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, New York, 2000.

[6] D. Maslov, G. W. Dueck, and D. M. Miller. Techniques for the synthesis of reversible toffoli networks. ACM Trans. Des. Autom. Electron. Syst., 12(4):42, 2007.

[7] D. Maslov, G. W. Dueck, D. M. Miller, and C. Negrevergne. Quantum circuit simplification and level compaction. IEEE Trans. on CAD, 27(3):436–444, March 2008.

[8] P. Gupta, A. Agrawal, and N. K. Jha. An algorithm for synthesis of reversible logic circuits. IEEE Trans. on CAD, 25(11):2317–2330, 2006.

[9] V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes. Synthesis of reversible logic circuits. IEEE Trans. on CAD, 22(6):710–722, June 2003.

[10] A. Barenco, C. Bennett, R. Cleve, D. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. Smolin, and H. Weinfurter. Elementary gates for quantum computation. APS Physical Review A, 52:3457–3467, 1995.

[11] C. Negrevergne, T. S. Mahesh, C. A. Ryan, M. Ditty, F. Cyr-Racine, W. Power, N. Boulant, T. Havel, D. G. Cory, and R. Laflamme. Benchmarking quantum control methods on a 12-qubit system. Physical Review Letters, 96(17), 2006.

[12] D. Maslov, G. W. Dueck, and D. M. Miller. Toffoli network synthesis with templates. IEEE Trans. on CAD, 24(6):807–817, 2005.

[13] Aditya K. Prasad, Vivek V. Shende, Igor L. Markov, John P. Hayes, and Ketan N. Patel. Data structures and algorithms for simplifying reversible circuits. J. Emerg. Technol. Comput. Syst., 2(4):277–293, 2006.

[14] D. Maslov, C. Young, D. M. Miller, and G. W. Dueck. Quantum circuit simplification using templates. In DATE ’05: Proceedings of the conference on Design, Automation and Test in Europe, pages 1208–1213, Washington, DC, USA, 2005. IEEE Computer Society.

[15] V. V. Shende, S. S. Bullock, and I. L. Markov. Synthesis of quantum-logic circuits. IEEE Trans. on CAD, 25(6):1000–1010, June 2006.

[16] D. Maslov, G. Dueck, and N. Scott. Reversible logic synthesis benchmarks page. http://www.cs.uvic.ca/~dmaslov/, November 2009.

[17] E. F. Fredkin and T. Toffoli. Conservative logic. International Journal of Theoretical Physics, 21(3/4):219–253, 1982.

[18] M. Saeedi, M. Saheb Zamani, and M. Sedighi. Reversible logic synthesis benchmarks. http://ceit.aut.ac.ir/QDA/benchmarks, March 2010.

[19] James Donald and Niraj K. Jha. Reversible logic synthesis with fredkin and peres gates. J. Emerg. Technol. Comput. Syst., 4(1):1–19, 2008.

25