
Synthesis of Asynchronous QDI Circuits using Synchronous Coding Specifications

Rong Zhou*, Kwen-Siong Chong, Bah-Hwee Gwee, Joseph S. Chang, and Weng-Geng Ho
Division of Circuits and Systems, EEE
Nanyang Technological University, Singapore
*[email protected]

978-1-4799-3432-4/14/$31.00 ©2014 IEEE

Abstract—We propose a synthesis method for asynchronous quasi-delay-insensitive (QDI) circuits. The proposed synthesis has three notable features. First, the synthesized circuits abide by the QDI protocol; hence they are inherently timing-robust and are desirable for applications with a high variation-space and a wide operation-space (including defense/space applications). Second, the design is specified in Verilog HDL, with coding that is the same as or similar to the standard coding for synchronous circuits; hence no special or ad-hoc design/coding rules are required. Third, the proposed synthesis accepts various QDI library cells, making it possible to explore the full merit of different library cells. To the best of our knowledge, no reported synthesis method incorporates all of these features; reported methods incorporate only some of them. At this juncture, the proposed synthesis accepts three basic clauses: the complete ‘if-else’ clause, the incomplete ‘if-else’ clause, and the ‘case’ clause. These clauses are virtually sufficient to describe complex systems. The synthesis stages involve analyzing QDI pipelines, generating the corresponding single-rail combinational circuits, converting the single-rail circuits to dual-rail netlists, and embedding customized controllers. To demonstrate the validity and practicality of the proposed synthesis, an 8-bit 8-tap asynchronous QDI Finite Impulse Response (FIR) filter is synthesized, implemented to the layout stage, and evaluated using SPICE models; it features a power dissipation of 3.7 mW, 39,181 transistors, and a delay of 200 ns per operation.

Keywords—Asynchronous circuit; QDI; synthesis; synchronous coding; Verilog

I. INTRODUCTION

Rapid advancement of technologies enables numerous emerging applications, such as wireless sensor networks [1]. These applications are preferably fabricated in advanced nano-scaled processes. As feature sizes scale down, digital circuits are becoming more vulnerable to process, voltage and temperature (PVT) variations. Consequently, the prevailing design approach, the synchronous approach, faces increasing challenges in managing its timing issues (e.g. clock skew and setup/hold time violations). To cope with increasingly large PVT variations, a number of techniques have been proposed, including setting the clock frequency lower than required and imposing strict operating conditions. However, these techniques can only alleviate PVT variations to some extent, giving conditionally robust operation, because a complete profile of PVT variations is intractable.

The asynchronous approach is an alternative that addresses these timing issues and is hence more functionally robust [2], [3]. Other potential advantages of asynchronous circuits over their synchronous counterparts include the absence of clock skew, less electromagnetic interference, and potentially high speed due to average-delay operation. Among the different asynchronous design approaches, QDI is arguably the most acceptable in terms of robustness and realization. Despite these potential advantages, the asynchronous approach is still not widely accepted. The lack of appropriate asynchronous electronic design automation (EDA) tools is clearly one of the key reasons.

Although several asynchronous EDA tools/synthesis methods have been reported, they have various limitations. First, some tools target low-overhead but arguably less robust circuits. For example, De-synchronization [4] is applicable to bundled-data asynchronous circuits, where the delay-matching components are critical and must accommodate the worst-case delay of the datapath plus a safety timing margin. Second, some tools are somewhat incompatible with the well-developed synchronous design flow because they adopt unfamiliar design languages and/or coding styles. For example, Balsa uses Communicating Sequential Processes (CSP) [5], which requires a new set of circuit descriptions and design rules. Another language is Communicating Hardware Processes (CHP) [6], which is also esoteric. Although some synthesis tools [7], [8] adopt Verilog, they require special/ad-hoc coding rules. Third, some synthesis tools incur a high cost in terms of circuit overheads and synthesis time. For example, Petrify [9], a controller synthesis tool adopted in many asynchronous EDA tools, suffers to some degree from the state-space explosion problem. Fourth, some synthesis tools are tied to fixed customized library cells, such as Null Convention Logic (NCL) library cells [10].

In this paper, we propose a synthesis of asynchronous QDI circuits that addresses some of the abovementioned issues. First, the proposed synthesis targets QDI circuits (to accommodate PVT variations and for robust operation). Second, for general acceptance, the proposed synthesis adopts Verilog HDL and synchronous coding descriptions/styles. Finally, the proposed synthesis is applicable to many different asynchronous cell realizations, hence providing more design flexibility and room for optimization. At this juncture, the proposed synthesis supports three basic clauses (the complete ‘if-else’ clause, the incomplete ‘if-else’ clause, and the ‘case’ clause), which are virtually sufficient to describe complex systems. The corresponding synthesis templates and combinational circuits are also developed and generated. To show the validity and practicality of the proposed synthesis, an 8-bit 8-tap asynchronous QDI Finite Impulse Response (FIR) filter (one of the critical building blocks for our targeted wireless sensor network [1]) is synthesized, implemented to the layout stage, and evaluated using SPICE models.

This paper is organized as follows. Section II describes the proposed synthesis, and Section III provides the implementation and evaluation of an asynchronous QDI FIR filter synthesized by the proposed synthesis. Finally, conclusions are drawn in Section IV.

II. PROPOSED DESIGN SYNTHESIS

This section presents the proposed synthesis. Specifically, we first present the QDI pipeline structure; thereafter, we present the proposed design synthesis.

A. QDI Pipeline Structure

The asynchronous QDI pipeline structure is shown in Fig. 1. It comprises Register_i, Controller_i, QDI Combinational Circuit_i, Register Completion Detection (RCD_i) and Data Completion Detection (DCD_i). In particular, RCD_i and DCD_i ensure ‘input completeness’ and avoid ‘gate orphans’ [10], thereby ensuring the QDI attribute of the pipeline. The data path (registers and QDI combinational circuits) adopts dual-rail data coding [3]. The operation sequence is as follows. Req_i first controls Register_i (basically comprising Muller C-elements) to latch the dual-rail input data; the data then passes through Combinational Circuit_i and is latched by the next register, Register_{i+1}. During this process, RCD_i and DCD_i check the validity/nullity of all the internal wires. Thereafter, Controller_i generates the next control signal Req_i on the basis of the signal Done_i and the acknowledge signal Ack_{i+1}. Conceptually, although the pipeline depicted in Fig. 1 is similar to that of the NCL-X design methodology [10], the proposed synthesis has several differences/novelties: it is applicable not only to NCL microcells but also to other microcells, it integrates seamlessly with the synchronous design flow, and it leverages commercial EDA tools. Note that, for an easy view of a complete operation, we describe a pipeline stage here as a register followed by a combinational circuit, whereas in synthesis we view a pipeline stage as a combinational circuit followed by a register, for consistency with the synchronous design fashion.
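To make the two building blocks named above more concrete, the following is a minimal behavioural sketch of a Muller C-element and of completion detection over a 2-bit dual-rail word. It assumes the usual dual-rail encoding, in which (t, f) = (0, 0) is null, (1, 0) is logic 1 and (0, 1) is logic 0. The module names, port names, reset behaviour and 2-bit width are illustrative assumptions, not the authors' cells.

```verilog
// Behavioural Muller C-element (illustrative model, not a library cell):
// the output follows the inputs when they agree and holds otherwise.
module c_element (
    input  wire a,
    input  wire b,
    input  wire nrst,   // assumed active-low reset that forces the output low
    output reg  q
);
    always @(a or b or nrst) begin
        if (!nrst)        q <= 1'b0;
        else if (a & b)   q <= 1'b1;
        else if (~a & ~b) q <= 1'b0;
        // inputs disagree: q keeps its previous value (state holding)
    end
endmodule

// Completion detection for a 2-bit dual-rail word (assumed width).
// done rises only when both bits are valid and falls only when both are null.
module completion_detect2 (
    input  wire [1:0] d_t,   // "true" rails
    input  wire [1:0] d_f,   // "false" rails
    input  wire       nrst,
    output wire       done
);
    wire v0 = d_t[0] | d_f[0];   // bit 0 carries valid data
    wire v1 = d_t[1] | d_f[1];   // bit 1 carries valid data

    c_element u_c (.a(v0), .b(v1), .nrst(nrst), .q(done));
endmodule
```

Because the C-element holds its output while the two bits disagree, `done` cannot change until every bit has completed the same phase, which is the property the RCD and DCD blocks rely on.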

B. Delineation of the Proposed Synthesis

We adopt Verilog HDL as the high-level design description language. With Verilog HDL, we can leverage the standard (synchronous-based) description style and many commercial synchronous circuit design tools in our design flow, which reduces the training effort needed to design QDI circuits. Currently, we support a subset of synthesizable Verilog clauses, including ‘if’, ‘else’ and ‘case’, which are potentially sufficient to describe complex systems. These clauses are synthesized in a coarse-grain pipeline fashion; the associated combinational circuits are synthesized using Synopsys Design Compiler and thereafter converted to a dual-rail netlist.
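As an illustration of what converting a single-rail gate to dual-rail can look like, the sketch below expands a 2-input AND into true and false rails using De Morgan's law. This is one common expansion, shown under the assumption of the usual (t, f) encoding; it is not necessarily the cell mapping the proposed flow emits, and all names are hypothetical.

```verilog
// Dual-rail expansion of z = a & b (illustrative; names are hypothetical).
module dr_and2 (
    input  wire a_t, a_f,
    input  wire b_t, b_f,
    output wire z_t, z_f
);
    assign z_t = a_t & b_t;   // z is 1 only when both inputs are 1
    assign z_f = a_f | b_f;   // z is 0 as soon as either input is 0
endmodule

// Small testbench that alternates null and valid phases.
module dr_and2_tb;
    reg  a_t, a_f, b_t, b_f;
    wire z_t, z_f;

    dr_and2 dut (.a_t(a_t), .a_f(a_f), .b_t(b_t), .b_f(b_f),
                 .z_t(z_t), .z_f(z_f));

    task apply(input a, input b);
        begin
            {a_t, a_f, b_t, b_f} = 4'b0000;          // null (spacer) phase
            #1;
            {a_t, a_f, b_t, b_f} = {a, ~a, b, ~b};   // valid phase
            #1;
            $display("a=%b b=%b -> z_t=%b z_f=%b", a, b, z_t, z_f);
        end
    endtask

    initial begin
        apply(0, 0);
        apply(0, 1);
        apply(1, 0);
        apply(1, 1);
        $finish;
    end
endmodule
```

Note that `z_f` can assert while only one input is valid, so this gate is not input-complete by itself. That is the kind of situation in which separate completion detection on the inputs becomes necessary.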

1) Synthesis of Complete ‘if-else’ Clause

The complete ‘if-else’ clause is synthesized as a single asynchronous pipeline. For an easy view of the circuit structure, we prefer a coding style that separates the sequential circuits from the combinational circuits (see the source code in Example 1). Any other coding style (such as mixing sequential and combinational circuits together in the same always blocks) is automatically transformed to the preferred coding style in Example 1. During synthesis, we map the sequential circuits to asynchronous Registers with handshake circuits, while the combinational circuits are synthesized and converted to a dual-rail netlist. The synthesized pipeline structure is depicted in Fig. 2.
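For readers who want a compilable instance of the preferred style, the following sketch fills the placeholders of the template with concrete choices. The module name, the select inputs `sel_add` and `sel_sub`, the 8-bit width and the reset constant are all assumptions made for illustration.

```verilog
// Concrete instance of the separated coding style (all names hypothetical).
module complete_if_else (
    input  wire       CLK,
    input  wire       NRST,
    input  wire       sel_add,
    input  wire       sel_sub,
    input  wire [7:0] x,
    input  wire [7:0] y,
    output reg  [7:0] A
);
    reg [7:0] A_temp;

    // sequential circuit: only registers the combinational result
    always @(posedge CLK or negedge NRST)
        if (!NRST) A <= 8'd0;
        else       A <= A_temp;

    // combinational circuit: every branch assigns A_temp (complete if-else)
    always @(*)
        if (sel_add)      A_temp = x + y;
        else if (sel_sub) A_temp = x - y;
        else              A_temp = x;
endmodule
```

Since `A_temp` is assigned on every path, the combinational block holds no state, so all storage sits in the register that the flow maps to an asynchronous Register.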

2) Synthesis of Incomplete ‘if-else’ Clause

The incomplete ‘if-else’ clause is viewed as state-holding or feedback functionality (see the original source code in Example 2). As in the synthesis of the complete ‘if-else’ clause, we generate separate code for the sequential and combinational circuits; the difference is that we complete the ‘if-else’ branch by adding the omitted assignment (e.g. A_temp = A_feedback in Example 2). The generated combinational circuits are then similar to those of the complete ‘if-else’ clause, but the pipelines differ because we add a feedback path to maintain the previous data. The circuit structure after synthesis is depicted in Fig. 3.
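The transformation can be sketched on a concrete case. Below, a register that is updated only under `load` or `acc` (an incomplete ‘if-else’) is written in its processed form, with the omitted branch completed by the fed-back value. The module name, the control inputs and the 8-bit width are hypothetical, and the explicit `A_feedback` wire merely models the added feedback path.

```verilog
// Processed form of an incomplete if-else (all names hypothetical).
// Original intent: if (load) A <= x; else if (acc) A <= A + x; (no final else)
module incomplete_if_else (
    input  wire       CLK,
    input  wire       NRST,
    input  wire       load,
    input  wire       acc,
    input  wire [7:0] x,
    output reg  [7:0] A
);
    reg  [7:0] A_temp;
    wire [7:0] A_feedback = A;   // feedback path carrying the stored value

    // generated sequential circuit
    always @(posedge CLK or negedge NRST)
        if (!NRST) A <= 8'd0;
        else       A <= A_temp;

    // generated combinational circuit: the omitted branch is completed
    always @(*)
        if (load)     A_temp = x;
        else if (acc) A_temp = A_feedback + x;
        else          A_temp = A_feedback;   // hold the previous data
endmodule
```

In a synchronous flow the final `else` would simply be inferred as "keep the old value"; making it explicit gives the dual-rail combinational circuit a valid result to produce on every operation.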

Fig. 1 QDI pipeline structure

Example 1:
// sequential circuit
always @(posedge CLK or negedge NRST)
  if (!NRST) A <= Constant;
  else A <= A_temp;
// combinational circuit
always @(A1 or A2 or …)
  if (Condition1) A_temp = f(A1, A2, ...);
  else if (Condition2) A_temp = f(B1, B2, ...);
  else A_temp = f(C1, C2, ...);


Fig. 2 Synthesized pipeline of complete ‘if-else’ clause with conditional operations embedded inside the combinational circuit


3) Synthesis of ‘Case’ Clause

We recommend employing ‘case’ to model state machines, such as the source code in Example 3. For the preceding synthesis of the complete and incomplete ‘if-else’ clauses, we have implicitly assumed that all the primary input variables are updated to useful/meaningful data for every operation. If this is also true for the ‘case’ clause, the code in Example 3 will be synthesized to pipelines similar to the one depicted in Fig. 3. However, in a ‘case’ clause some variables may be required only conditionally (e.g. Condition1 in Example 3 is explicitly required only in Branch2). This is not a problem in synchronous circuits because the variables/signals are always either logic 1 or logic 0. It is problematic in an asynchronous dual-rail circuit system if the variable needs to be triggered manually, because the variables/signals in dual-rail asynchronous circuits alternate between valid and empty/null. In this case, we synthesize the Verilog code in Example 3 to the pipelines in Fig. 4, where the guard controller passes the primary input when ‘guard’ is true and otherwise supplies a default value to the combinational logic. The implementation of the guard controller and the source code of the ‘generated guard combinational circuit’ are depicted in Fig. 5.
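The guarding behaviour described above can be approximated at the data level as follows. This is only a behavioural sketch under stated assumptions: the guard is taken to be a dual-rail signal, the default value is assumed to be logic 0, and the handshake signals are omitted. It is not the guard controller of Fig. 5, and all names are hypothetical.

```verilog
// Data-level sketch of guarding a conditionally required dual-rail input.
// Assumptions: dual-rail guard, default value = logic 0, no handshaking.
module guard_sketch (
    input  wire in_t, in_f,         // primary input, e.g. Condition1
    input  wire guard_t, guard_f,   // guard: (1,0) = true, (0,1) = false
    output wire out_t, out_f        // value seen by the combinational logic
);
    // guard true : pass the primary input through
    // guard false: supply the default value (0,1) without waiting for the input
    // guard null : output stays null, preserving the valid/null alternation
    assign out_t = guard_t & in_t;
    assign out_f = (guard_t & in_f) | guard_f;
endmodule
```

With the guard false, the downstream logic receives a valid codeword even though the primary input was never triggered, which is the situation the paragraph identifies as problematic for plain dual-rail circuits.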

Example 2:
// original source code
always @(posedge CLK or negedge NRST)
  if (!NRST) A <= Constant;
  else if (Condition1) A <= f(A1, A2, …);
  else if (Condition2) A <= f(B1, B2, …);

// processed code
// generated sequential circuit
always @(posedge CLK or negedge NRST)
  if (!NRST) A <= Constant;
  else A <= A_temp;
// generated combinational circuit
always @(A_feedback or A1 or A2 or …)
  if (Condition1) A_temp = f(A1, A2, …);
  else if (Condition2) A_temp = f(B1, B2, …);
  else A_temp = A_feedback;

Fig. 3 Synthesized pipeline of incomplete ‘if-else’ clause with conditional operations embedded inside the combinational circuit

Example 3:
// sequential circuit
always @(posedge CLK or negedge NRST)
  if (!NRST) state <= Constant;
  else state <= next_state;
// combinational circuit
always @(state or Condition1)
  case (state)
    Branch1: next_state = f(state);
    Branch2: if (Condition1) next_state = f1(state);
             else next_state = f2(state);
    Branch3: next_state = f3(state);
    …
    default: next_state = fn(state);
  endcase

Fig. 4 Synthesized pipelines of one fashion of ‘case’ clause

Fig. 5 (a) implementation of guard controller, (b) implementation of default value cell, (c) source code of the generated guard combinational circuit

III. EXPERIMENT AND RESULT

In this section, we validate the proposed synthesis by synthesizing an 8-tap 8-bit asynchronous QDI FIR filter, one of the critical building blocks for our targeted wireless sensor network [1], from a synchronous specification. For completeness, the circuits are implemented in a 130 nm CMOS process and simulated in Synopsys NanoSim using SPICE models at the nominal VDD = 1.2 V. The transistor sizings of all library cells are 560 nm/120 nm for PMOS and 280 nm/120 nm for NMOS, except for the weak keepers, which are 160 nm/500 nm for PMOS and 160 nm/1000 nm for NMOS.

Fig. 6 depicts the block diagram of the synthesized QDI FIR filter, and the layout view is shown in Fig. 7. The synthesized FIR filter operates as follows. The State Machine first generates the write address and the write-enable signal on SRAM_Addr and R/W, respectively, to write in the input Data_In. It then generates the SRAM_Addr, ROM_Addr and R/W signals to read out the data and the coefficient, which are held by Latch 3 and Latch 2, respectively, and then fed to the 8-bit Multiplier. The output of the 8-bit Multiplier is held in Latch 4 and then accumulated by the 16-bit Adder.

The performance of the synthesized QDI FIR filter is evaluated and the result is tabulated in Table I. The filter comprises 39,181 transistors and dissipates 3.7 mW at a sample rate of 5 MHz (200 ns per operation). The breakdown shows that the combinational circuits and the handshake circuits both account for a large portion of the overall overhead, which suggests room for optimization (e.g. microcell-interleaving optimization for the combinational circuits [11], de-initialized pipelines for handshake circuit optimization, and the Dynamic Voltage Scaling technique for overall power dissipation optimization [1]).
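The reported figures can be related by simple arithmetic. The derived values below (energy per operation and the absolute power and transistor shares) are not stated in the paper; they follow from the reported totals and the percentages in Table I, and they assume that the 3.7 mW is the average power over one 200 ns operation.

```latex
\begin{align*}
T_{\text{op}} &= \frac{1}{f_s} = \frac{1}{5\,\text{MHz}} = 200\,\text{ns} \\
E_{\text{op}} &= P \cdot T_{\text{op}} = 3.7\,\text{mW} \times 200\,\text{ns} = 0.74\,\text{nJ} \\
P_{\text{handshake}} &\approx 0.445 \times 3.7\,\text{mW} \approx 1.65\,\text{mW} \\
P_{\text{comb}} &\approx 0.555 \times 3.7\,\text{mW} \approx 2.05\,\text{mW} \\
TC_{\text{handshake}} &\approx 0.345 \times 39{,}181 \approx 13{,}517 \\
TC_{\text{comb}} &\approx 0.655 \times 39{,}181 \approx 25{,}664
\end{align*}
```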

IV. CONCLUSIONS

In this paper, we proposed a synthesis of asynchronous QDI circuits from synchronous coding specifications. The complete ‘if-else’, the incomplete ‘if-else’, and the ‘case’ clauses (virtually sufficient to describe complex systems) were investigated, and the corresponding synthesis templates were presented, including the pipelines, the generated combinational circuits and the guard controller. To validate the proposed synthesis, an 8-tap 8-bit FIR filter was synthesized and evaluated; it comprised 39,181 transistors and dissipated 3.7 mW at a sample rate of 5 MHz.

ACKNOWLEDGEMENT

This research work was supported by the Agency for Science, Technology and Research, Singapore, under SERC 2013 Public Sector Research Funding, Grant No: SERC1321202098. The authors thank A*STAR for the kind support in funding this research.

REFERENCES

[1] L. Tong, et al., “An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks,” IEEE JSSC, vol. 48, no. 2, pp. 573-586, 2013.

[2] K. S. Chong, et al., “Energy-Efficient Synchronous-Logic and Asynchronous-Logic FFT/IFFT Processors,” IEEE JSSC, vol. 42, no. 9, pp. 2034-2045, 2007.

[3] J. Sparso and S. Furber, Principles of Asynchronous Circuit Design: A Systems Perspective. Kluwer Academic Publishers, 2001.

[4] J. Cortadella, et al., “Desynchronization: Synthesis of asynchronous circuits from synchronous specifications,” IEEE Trans. CAD-ICS, vol. 25, pp. 1904-1921, 2006.

[5] C. A. R. Hoare, Communicating Sequential Processes. Englewood Cliffs, NJ: Prentice-Hall, 1985.

[6] A. J. Martin, “Synthesis of Asynchronous VLSI Circuits,” Technical Report, California Institute of Technology, 1991.

[7] I. Blunno and L. Lavagno, “Automated synthesis of micro-pipelines from behavioral Verilog HDL,” in Proc. Int. Symp. ASYNC, 2000, pp. 84-92.

[8] L. Chong-Fatt, et al., “Modeling and Synthesis of Asynchronous Pipelines,” IEEE Trans. VLSI, vol. 19, pp. 682-695, 2011.

[9] J. Cortadella, et al., “Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers,” IEICE Transactions on Information and Systems, vol. 3, pp. 315-325, 1997.

[10] A. Kondratyev and K. Lwin, “Design of Asynchronous Circuits Using Synchronous CAD Tools,” IEEE Design Test Comput., vol. 19, no. 4, pp. 107-117, Jul.-Aug. 2002.

[11] R. Zhou, et al., “A Low Overhead Quasi-Delay-Insensitive (QDI) Asynchronous Data Path Synthesis Based on Microcell-Interleaving Genetic Algorithm (MIGA),” IEEE Trans. CAD-ICS, in press.

Fig. 6 Block diagram of synthesized 8-bit 8-tap asynchronous QDI FIR filter (blocks: State Machine, 8×8-bit Data SRAM, 8×8-bit Coefficient ROM, 8-bit Multiplier, 16-bit Adder, with RCD/DCD completion detection and C-elements)

Table I Overhead (power dissipation (P, mW), transistor-count (TC) and delay (t, ns)) breakdown for the synthesized Async QDI FIR Filter @ 1.2V, 130nm CMOS

      Handshake Circuits    Combinational Circuits    Total
P     44.5%                 55.5%                     100% (3.7)
TC    34.5%                 65.5%                     100% (39,181)
t     200 ns per operation

Fig. 7 The layout view of 8-tap Async QDI FIR Filter (labelled regions: Multiplier, Latches, State Machine, ROM, SRAM, Adder)