page 1

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 406

Efficient Reduction of Area and Delay inNull Convention Logic Design Paradigm1. IntroductionWith the fabrication process of integrated circuits developing in the deep sub micron, more function modules can be integrated into asmall chip with higher clock rates. Even though higher clock rates increase logic speed, they also increase power consumption andcause a challenging clock distribution issue. Due to the difficult clock management of the whole system chip, synchronous circuitdesign faces many challenges such as clock skew, clock jitter, and power consumption [1], [2]. Therefore, asynchronous clocklesscircuit designs have been proposed as a solution to these clock issues. Asynchronous circuits have merits of low power, anti-interference, high robustness, and module reusability due to its clockless nature [3]. Thus, the asynchronous circuits are very suitablefor systems-onchip (SOC) designs which are sensitive to power consumption and performance. NULL Convention Logic (NCL) isone of the promising candidates for asynchronous circuit design paradigms.NCL circuit will work correctly regardless of when circuit inputs become available [4]; therefore, NCL circuits are said to be correct-by-construction. In other words, timing analysis is not required for correct operation. NCL circuits utilize the dual-rail or quad-raillogic to achieve delay-insensitivity. A dual-rail signal, D is comprised of two mutually exclusive wires, D0 and D1, which mightassume any value from the set {Data0; Data1; Null; Invalid}, as shown in Table I. The Data0 state corresponds to Boolean logic 0, theData1 state corresponds to Boolean logic 1, and the Null state corresponds to the empty set, meaning that the value of D is not yetavailable. NCL circuits are comprised of 27 fundamental gates [5], which constitute the set of all the function of four or a fewvariables [5]. Since each rail of an NCL signal is considered as a separate variable or value, a four variable function is not the same asa function of four literals, which would normally consist of eight variables or values, assuming dual-rail signals [6].DD0

D1

Data010Data101Null00Invalid11Table I: Dual Rail Encoding

ISSN 2278 – 0211 (Online)J. NerenjanaPG Scholar, I.F.E.T College of Engineering, Villupuram, Tamil Nadu, IndiaM. AnanthakumarAssistant Professor, I.F.E.T College of Engineering, Villupuram, Tamil Nadu, IndiaR. MalarAssociate Professor, I.F.E.T College of Engineering, Villupuram, Tamil Nadu, IndiaY. Emilet KethsiyalPG Scholar, I.F.E.T College of Engineering, Villupuram, Tamil Nadu, IndiaAbstract:Synchronous circuit designs face many challenges such as clock skew, clock jitter and power consumption. Asynchronousclockless circuit design is a solution to these clock issues. Asynchronous circuits have merits of low power, anti-interference, highrobustness, and module reusability due to its clockless nature. NULL Convention Logic (NCL) is one of the promising candidatesfor asynchronous circuit design paradigms. NCL circuits are said to be correct-by-construction and delay insensitive circuits. Inthis paper, a new methodology for mapping multi-rail logic expressions to NULL convention logic (NCL) gate library is proposedto reduce area and delay of NCL circuits when compared to the existing algorithm.Key words: synchronous, asynchronous, NCL circuits, mapping

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 407Figure 1: THmn Threshold GateNCL uses threshold gates with hysteresis for its basic elements [5]. The primitive type of threshold gate, shown in Fig. 1, is the THmngate, where 1<= m <= n. Each THmn gate has n inputs, where at least m of n inputs must be asserted before the output will becomeasserted. In a THmn gate, each of the n inputs is connected to the rounded portion of the gates, and the gate’s threshold value, m, iswritten inside of the gate [8]. Another type of threshold gate is referred to as a weighted threshold gate, expressed asTHmnWw1w2:::wR, where n is the number of inputs, m is the gate’s threshold, and w1w2:::wR are the weights of input1, input2,...,inputR, respectively. For example, TH34 has 4 inputs and threshold of 3. When any three inputs go high, its output will be asserted tohigh. Only when all inputs are low, the output will be reset to low. For all other input patterns, the output will remain unchanged, andthey keep the previous state. A weighted gate TH34W2 has 4 inputs are labeled A, B, C, and D, shown in Fig. 2. The weight of inputA, W(A), is, therefore, 2. Since the gate’s threshold, m, is 3, this implies that in order for the output to be asserted, either inputs B, C,and D must all be asserted, or input A must be asserted along with any other input, B, C, or D.Figure 2: Th34w2 Threshold Gate: Z = AB+AC+AD+BCD2. Original Mapping AlgorithmAfter several synthesis and optimization steps in [9], each combinational module in the initial synchronous RTL design is finallyexpressed as dual-rail SOP expressions describing its outputs in terms of its inputs. Then, each SOP expression is separatelymapped to NCL gates using the grouping algorithm in [9], as shown in Fig. 3. The algorithm starts with a multi-rail SOP expression as

input and then groups the product terms. Taking F1 = A0B1C1 + A0B1D1 + A1B1 + B1C0 + B1D0 + A1C1 as an example, the firststep in the original grouping algorithm is to remove all four-variable terms from the initial SOP expression and moved to the set offinal groups, because according to the NCL gate library, these terms cannot be grouped with any other terms to make larger groupsand are always realized using TH44 gates. Since F1 has no four-variable term, no action is needed. Next, the distinct variables in theSOP expression are counted; if the number is less than 4, previously grouped terms can be potentially combined with the SOPexpression to make 4-feasible group.Table 2: Library of 27 Standard NCL Gates

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 408F1 contains 7 distinct variables, so the algorithm proceeds to the next decision box, where it is determined whether the SOPexpression is empty or not. The next step is to create a k-feasibility table (Table III). Under each parent term in the k-feasibility table,the other terms that can be combined with the parent term such that their combination includes four or fewer distinct variables arelisted. Now, starting from the larger parent terms and proceeding to the smaller ones, one parent term is picked at a time, and all thepossible k-feasible groups are found by combining the parent term with the other terms listed under it. Note that in the originalgrouping method, when the terms are combined the largest possible groups are formed, and the temporary small groups are discarded.In order to reduce runtime, largest group is selected, and the rest are removed. Also, the selected group is added to a “search list” toavoid finding the same group for the other parent terms. It will be shown that this approach of selecting the largest group canpotentially result in non-optimal grouping; the new grouping algorithm addresses this limitation.The original grouping algorithm then proceeds with choosing the largest non-redundant groups from the set of largest groups foundunder each parent term, and moving them to the set of final groups. A group is considered non-redundant if it does not share acommon term with any other group. Shared terms are not allowed in the set of final groups since they introduce gate orphans andtherefore impact the delay insensitivity of NCL circuits. This issue arises because, when several gates share a common term, there is apossibility that all their outputs transition during the same DATA phase, but in that case only the fastest transition is acknowledged bythe output and the other slower transitions are not acknowledged. In the original grouping method, non-redundant groups are found asfollows: starting from the first group in the list, a group is selected and compared with the other groups; if it shares a common termwith any other group, then the larger group is selected and the smaller group is discarded. This procedure is continued until all thesmaller redundant groups are removed. The remaining groups, which are large and non-redundant, are moved to the set of finalgroups. The outputs of the current set of final groups are added as single-variable terms to the current SOP expression at this step. Inthe previous example, two groups already exist in the set of final groups. We will call the output of the first group T1 and the output of

the second group T2. After adding T1 and T2 to the SOP expression, the new expression becomes B1D0+ T1 + T2. At the end of thethird iteration, B1D0+ T1 + T2 is selected as a final group and is removed from the SOP expression.A0B1C1

A0B1D1

A1B1

B1C0

B1D0

A1C1

A0B1D1

A0B1C1

B1C0

A1B1

A1B1

A1B1

A1B1

A1B1

B1D0

B1D0

B1C0

B1C0

B1C0

B1C0

A1C1

A1C1

A1C1

B1D0

B1D0

B1D0

A1C1

Table 3: K-Feasibility TableA0B1C1

A0B1D1

A1B1

A0B1C1+A0B1D1

A0B1D1+ A1B1

A1B1+B1C0+ B1D0

A0B1C1+A1B1+ A1C1

A0B1D1+ B1C0

A1B1+B1C0+ A1C1

A0B1C1+B1C0

A0B1D1+ B1D0

A1B1+ B1D0+ A1C1

A0B1C1+B1D0

Table 4: K-Feasible Groups

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 409Figure 3: Original Mapping AlgorithmTable 5: Final K-Feasible Groups after First IterationA0B1C1

A0B1D1

A1B1

A0B1C1+A1B1+

A1C1

A0B1D1+A1B1

A1B1+B1C0+B1D0

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 410Figure 4: NCL ImplementationTable 5: Final Groups3. Proposed Mapping AlgorithmOriginal mapping algorithm targets only area reduction. And it requires several iterations to complete. It is referred as Multi-passgrouping and inefficient grouping. It is computationally expensive and could be alleviated by engaging in less computation in the mainloop. The original grouping method is also not efficient in terms of finding the largest non-redundant groups. Finally, the approach ofadding the final groups as single variable terms to the initial SOP expression to make larger groups is not always helpful. In fact, thisapproach can potentially result in a larger circuit or increase the logic levels unnecessarily and consequently make the final circuitslower. The main idea of the new grouping algorithm is to sort the list of NCL gates based on a cost function prior to grouping. At thetime of grouping, the gates occupying a higher position in the sorted list are considered more important and will have a higher priorityin grouping. Targets both area and delay.In summary, the algorithm begins with moving all the four variable terms to the set of final groups, just as in the original groupingmethod. Next, all k-feasible groups are found by using a k-feasibility table and combining all parent terms with the other terms, asdiscussed in the original grouping method. The k-feasible groups are then arranged into priority levels according to Table. Next,starting from the highest nonempty priority level, the non-redundant groups in each priority level are found, their terms are removedfrom all the other groups, and the priority levels are updated accordingly. This procedure continues until all priority levels becomeempty. For this example, the original grouping method and the proposed one produce the same grouping result, but this is not alwaysthe case as explained in the next section.The proposed grouping method always preserves delay insensitivity of the initial dual-rail function. In order to prove it, one mustprove that the proposed grouping method preserves input-completeness and observability. Several advantages of the proposedgrouping algorithm, such as the ability to use different cost functions, the ability to map to a subset of NCL gates, and one-passgrouping, were pointed out. With the proposed grouping method, the non-redundant groups under a priority level are selectedefficiently.FINAL GROUPST1 = A0B1C1+A1B1+ A1C1

T2 = A0B1D1+ B1C0

F = B1D0+T1+T2

Page 6

www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 411Figure 5: Proposed Mapping Algorithm

Page 7www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 412Table 6: Priority TableTable 7: K-Feasible GroupsA0B1C1

A0B1D1

A1B1

B1C0

B1D0

A1C1

A0B1C1+ A0B1D1

A0B1D1+A1B1

A1B1+B1C0+ B1D0

B1C0+ B1D0

B1D0

A1C1

A0B1C1+A1B1+A1C1

A0B1D1+B1C0

A1B1+B1C0+ A1C1

B1C0

A0B1C1+B1C0

A0B1D1+B1D0

A1B1+B1D0+ A1C1

A0B1C1+B1D0

A0B1D1

A1B1+B1C0

A0B1C1

A1B1+ B1D0

A1B1+A1C1

A1B1

Page 8www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 413Table 8: K-Feasible Groups with Priority LevelsTable 9: Updating Priority LevelsFigure 6: NCL ImplementationTable 10: Final Groups8 (Highest

priority)101112141619A0B1C1+A1B1+A1C1

A1B1+B1C0+B1D0

A0B1C1+A0B1D1

A0B1C1+B1C0

A1B1+B1C0

A0B1C1

A1B1

A1B1+B1C0+A1C1

A0B1C1+B1D0

A1B1+B1D0

A0B1D1

B1C0

A1B1+ B1D0+A1C1

A0B1D1+ A1B1

A1B1+A1C1

B1D0

A0B1D1+ B1C0

B1C0+B1D0

A1C1

A0B1D1+ B1D0

12141619A0B1D1+ B1C0

B1C0+B1D0

A0B1D1

B1C0

A0B1D1+ B1D0

B1D0

FINAL GROUPST1 = A0B1C1+A1B1+ A1C1

T2 = A0B1D1+ B1C0

F = B1D0+T1+T2

Page 9www.ijird.comFebruary, 2014Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 4143. Experimental ResultsRelated GroupsAreaDelay Run time1

F = A0B1 + A0C0 + A0D1 + B1C0 + A0D0 + B0C1D1OriginalX = A0B1 + A0C0 + A0D1 + B1C0 (TH44w322)Y = B0C1D1 (TH33)F = A0D0 + X + Y (TH24w22)27166013.5Proposed X = A0B1 + A0C0 + A0D0 + B1C0 (TH44w322)Y = A0D1 + B0C1D1 (TH54w32)F = X + Y (TH12)25443413.82F = B1C0D0 + C1D0 + A0B1 + A0C1 + A0D1 + B0C1OriginalX = A0B1 + A0C1 + C1D0 (THand0)Y = A0D1 + B0C1 (THxor0)Z = B1C0D0 + X (TH34w3)F = Y + Z (TH12)35073413.9Proposed X = A0B1 + A0C1 + A0D1 (TH44w3)Y = B1C0D0 + C1D0(TH54w32)Z = B0C1 + X + Y (TH24w22)29370429.73F = A0B1C0 + A0 B0C1 + A1B1C0 + A1B0C1 + B1C0D1OriginalX = A0B1C0 + A1B1C0 (TH54w22)Y = A0B0C1 + A1B0C1 (TH54w22)Z = B1C0D1 + X (TH34w3)F = Y + Z (TH12)3687395.6Proposed X = A0B1C0 + A1B1C0 (TH54w22)Y = A0B0C1 + A1B0C1 (TH54w22)Z = B1C0D1 (TH33)F = X + Y + Z (TH13)35244711.4Table 11: Comparison of Original and Proposed Grouping MethodsAfter processing a sum of product (SOP) expression using the original and proposed mapping algorithm, final groups are obtained asdiscussed earlier. They are implemented using Modelsim 6.4 c and Xilinx ISE 13.2 software. Codes are written in verilog HDLlanguage as they are user-friendly.Figure 7: Proposed Method 1Figure 8: Proposed Method 2

Page 10www.ijird.comFebruary, 2014

Vol 3 Issue 2INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENTPage 415Figure 9: Proposed Method 34. ConclusionA New mapping algorithm to reduce both area and delay of NULL convention logic circuits is proposed in this paper, while originalmapping algorithm takes only area as a cost function. Constraints of original mapping algorithm are met by the proposed algorithm. Incontrast to the original mapping algorithm, the new mapping algorithm can map a multi-rail expression to a restricted subset of NCLgates and can target any cost function.5. References1. E. Yahya and L. Fesquet,”Asynchronous Design: A Promising Paradigm for Electronic Circuits and Systems,” The 16thIEEE International Conference on Electronics, Circuits and Systems, p.339 (2009).2. A.J. Martin and M. Nystrom,”Asynchronous Techniques for System-on-Chip Design,” Proceedings of the IEEE, 6, p.1089(2006).3. A. Kondratyev, L. Sorensen, and A. Streich, Testing of Asynchronous Designs by Inappropriate Means: SynchronousApproach, Proc. IEEE Symp. Asynchronous Circuits and Systems (ASYNC 02), 2002, pp. 171-180, doi:10.1109/ASYNC.2002.1000307.4. Scott Smith, Jia Di.,”Designing Asynchronous Circuits using NULL Convention Logic,” Morgan & Claypool, Aug 15, 2009,ISBN: 9781598299816.5. Gerald E. Sobelman and Karl M. Fant,”CMOS Circuit Design of Threshold Gates with Hysteresis,” IEEE InternationalSymposium on Circuits and Systems (II), 1998, pp. 61-65.6. Sankar, R.; Kadiyala, V.; Bonam, R.; Kumar, S.; Mohan, S.; Kacani, F.; Al-Assadi, W.K.; Smith, S.C.,”Implementation ofStatic and Semi- Static Versions of a Bit-Wise Pipelined Dual-Rail NCL 2S Complement Multiplier,” Region 5TechnicalConference, 2007 IEEE , vol., no., pp.228,233, 20-22 April 2007 doi: 10.1109/TPSD.2007.4380386.7. Ho Joon Lee and Yong-Bin Kim,”Low Power Null Convention Logic Circuit Design Based on DCVSL”, Department ofElectrical and Computer Engineering.8. Yancey, S.; Smith, S.C.,”A differential design for C-elements and NCL gates,” Circuits and Systems (MWSCAS), 2010 53rdIEEE International Midwest Symposium on, vol., no., pp.632, 635, 1-4 Aug. 2010 doi: 10.1109/MWSCAS.2010.5548905.9. B. Bhaskaran, “Automated synthesis and cycle reduction optimization for asynchronous NULL convention circuits usingindustry-standard CAD tools,” Ph.D. dissertation, Dept. Comp. Eng., Univ. Missouri - Rolla, Rolla, 2007.10. I. David, R. Ginosar, and M. Yoeli, “An efficient implementation of Boolean functions as self-timed circuits,” IEEETrans. Comput., vol. 41, no. 1, pp. 2–11, Jan. 1992.11. S. K. Bandapati, S. C. Smith, and M. Choi, “Design and characterization of convention self-timed multipliers,” IEEE Des.Test Comput., vol. 20, no. 6, pp. 26–36, Nov.–Dec. 2003.12. R. Reese, S. C. Smith, and M. A. Thornton, “Uncle—an RTL approach to asynchronous design,” in Proc. 18th IEEE Int.Symp. Adv. Res. Asynchron. Circuits Syst., May 2012, pp. 65–72

page 1

Documents