ringtree - a vlsi architecture for fast image generation and processing (1988)


RINGTREE: A VLSI ARCHITECTURE FOR FAST IMAGE GENERATION AND PROCESSING

K. S. Eo, S. S. Kim and C. M. Kyung

Department of Electrical Engineering, KAIST
P.O. Box 150, Cheongryang, Seoul 131, Korea

ABSTRACT

This paper describes a new hardware architecture called Ringtree for 2-D geometry generation and processing such as image processing (noise suppression, notch elimination and contour extraction), graphics processing (polygon filling and multi-windowing with nonrectangular windows) and VLSI layout verification (design rule checking). Ringtree consists of Ring memory, which is a special rotating frame buffer; EPT (Edge Painting Tree), which is a polygon rasterizing hardware; and LPA (Linear Processor Array). LPA executes a set of basic operations applicable to bit map data manipulation, while EPT generates the bit map data from a set of scanline commands received from the host processor. VLSI implementation issues are also discussed for a practical display screen size of 1024x1024 pixels, utilizing the concept of Super Buffer[4,5] for realizing the whole system using identical VLSI chips.

I. INTRODUCTION

Owing to the recent drastic advances in VLSI process and design techniques, tremendous efforts are now being made toward the development of various hardware accelerators for diverse applications such as image processing, computer graphics, and layout verification in VLSI design, exploiting hardware parallelism [1-5].

In dealing with 2-dimensional image generation and processing, we note that there are basically two kinds of processing algorithms. One is what we call global processing, which considers a very wide area of the pixel plane in calculating the transform result of one pixel value, while the other is what we call local processing, which considers only the neighboring or near-range pixels for the updating of each pixel value. Global processing is generally very complicated and less repetitive compared to local processing and is, therefore, suitable for software implementation. On the other hand, the individual complexity of local processing is very small, while the number of repetitions is huge, i.e., as many as the number of pixels in each frame for generating one frame of data. Local image processing, including noise filtering, contour extraction and geometrical layout verification, can therefore be sped up drastically by utilizing hardware parallelism. Actually, many hardware architectures for 2-dimensional image processing are mainly targeted at implementing the above-mentioned local processing algorithms rather than the global ones; the cytocomputer[1], bit map processor[2] and wire routing machine[3] are typical examples.

In the cytocomputer[1], bit map data representing the 2-D geometry space is scanned one pixel at a time in row-major order and processed by a pipelined processor chain where each unit processor executes one of the basic operations into which the desired instruction is decomposed. Each processor in the pipeline stage itself consists of a subarray processor implemented as combinational logic circuitry and a shift register to perform the neighborhood operations in the vertical direction as well as in the horizontal direction. In this architecture, however, the complexity of each processor becomes excessive because each processor has to contain a shift register array whose length is proportional to (length of two rows of the bit map plane) x (number of layers). Moreover, the structure of the unit processor and the number of stages in the processor pipeline are tied to a specific bit map size and a specific set of operations. Also, its time complexity is O(n^2) for n x n bit map planes due to the serial processing with no hardware parallelism.

The bit map processor in [2] is an array processor for two-dimensional bit map data where the cells, called SAM cells, are linked in a two-dimensional mesh-connected network. Another example of this scheme is an interconnected array of microcomputers to solve the wire routing problem described in [3]. These schemes need minimal external data flow because bit data in each cell are directly accessible from its neighbors, resulting in high speed performance. However, the complexity of this scheme is somewhat excessive even with the present VLSI technology since the size of the bit map data could easily be more than 1000 x 1000 in VLSI layout.

As a trade-off in terms of hardware complexity and performance between the cytocomputer[1] and the bit map processor[2], we proposed a hardware architecture called MultiRing[6], which requires O(n) processors for n x n pixel planes and was proven to be capable of performing DRC (Design Rule Check) for VLSI layout verification. Another point of major concern in designing an image processing or graphics system is the method of receiving the image/graphics data from the host processor and storing them in such smart frame buffers as MultiRing[6] for optional local 2-dimensional image processing. To reduce the bandwidth requirement on the channel between the host CPU and the image/graphics processing engine, another hardware called Edge Painting Machine[7] was proposed, which is used for converting scanline commands into a stream of pixel data to be stored in frame buffer memory.

In this paper, we propose a new hardware architecture called Ringtree performing various 2-D image processing, image generation with scanline command processing and multi-windowing for graphics applications. Ringtree is composed of three parts, EPT[7], Ring memory, which is a kind of smart frame buffer[6], and LPA, as shown in Fig. 1. Ring memory is a special frame buffer where the pixel data rotate in their respective rows, while undergoing necessary transformations as they pass through two vertically long gates. One gate, LPA (Gate 1), is responsible for performing most of the local 2-D image processing operations. The other gate (Gate 2) is used for the purpose of either transmitting the scanline image generated at the EPT to the frame buffer (Ring memory) or masking the rotating pixel data with the mask data available at the leaves of the EPT. Ringtree has two input ports, i.e., the root processor port of the EPT receiving the scanline commands and the serial input port at Gate 2 receiving the pixel data in serial fashion. Detailed descriptions of EPT and of Ring memory and LPA are given in sections II and III, respectively. The VLSI implementation issues of Ringtree are discussed in section IV. Finally, three major applications for Ringtree, i.e., image processing, graphics processing and VLSI layout verification, are explained in section V with the simulation results.

ISCAS'88, p. 801. CH2458-8/88/0000-0801 $1.00 © 1988 IEEE

II. EPT (EDGE PAINTING TREE)

As shown in Fig. 2, EPT consists of a pipelined binary tree of specialized processors and an image buffer whose cells are connected one-to-one to the leaves of the tree. The image buffer is only one scanline long and temporarily stores the pixel data produced at the leaves of the tree. To explain the operation of EPT, we assume a scanline command (XL, XR) enters the root. We also assume that there are P pixels per scanline and there are P/2 first-level nodes (leaves), P/4 second-level nodes, ... and one n-th (n = log2 P) level node (root). XL and XR are then represented as n-bit binary numbers, respectively.

The basic operation of the root processor is to compare the MSBs of XL and XR. When the MSB of XL is '0' and the MSB of XR is '1', the painting region is divided into two subregions, i.e., one region in the left subtree extending from XL to the maximum coordinate of the left subtree, and the other region, to be painted in the right subtree, extending from the minimum coordinate of the right subtree to XR. If the MSBs of XL and XR are both '0', the painting region is confined to the left subtree. If the MSBs of XL and XR are both '1', the painting region is confined to the right subtree. The data is transmitted only to the relevant subtree(s) with the MSB of the current data stripped off. Therefore, the XL and XR data in the processors just below the root are (n-1) bits long, and the subsequent data movement is performed according to the MSBs of the current XL and XR. Similar operations occur in each node processor of the binary tree.

Fig. 3 describes the (XL, XR) data to be propagated to the left and right children of the current node according to the MSBs of the (XL, XR) data of the current node. Scanline command processing at each level of the binary tree occurs in a pipelined fashion, such that while the second MSBs are compared in the next lower level (children of the root), the MSBs of XL and XR in the next scanline command are used in the root node for determining the redirection of that command. Since the current MSBs of XL and XR in the present level of the tree are used only for data redirection, and are stripped off when XL and XR are propagated down to the next level, there are no more bits to propagate at the leaf node level, where the pixel data are written onto the corresponding positions of the image buffer according to the (XL, XR) values at the leaves.
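The MSB-driven redirection described above can be sketched in software. The following Python model is only an illustration under stated assumptions, not the hardware design: the function name `ept_paint` and its arguments are hypothetical, the pipelining is ignored, the leaves are modeled one per pixel (reached when no coordinate bits remain), the `base` argument exists only so the simulation knows where a subtree sits (in hardware this is implied by the wiring), and a command with MSB(XL) = 1 and MSB(XR) = 0, which the text does not discuss, is treated as painting nothing.

```python
def ept_paint(xl, xr, n):
    """Paint the span [xl, xr] on a scanline of 2**n pixels by MSB redirection."""
    image_buffer = [0] * (1 << n)

    def node(xl, xr, bits, base):
        # bits: coordinate bits still attached to the command at this node
        # base: leftmost pixel below this node (simulation bookkeeping only)
        if bits == 0:
            image_buffer[base] = 1          # leaf: write the pixel
            return
        half = 1 << (bits - 1)
        msb_l, msb_r = xl >> (bits - 1), xr >> (bits - 1)
        rl, rr = xl & (half - 1), xr & (half - 1)   # R(XL), R(XR): MSB dropped
        if msb_l == 0 and msb_r == 0:
            node(rl, rr, bits - 1, base)             # confined to left subtree
        elif msb_l == 1 and msb_r == 1:
            node(rl, rr, bits - 1, base + half)      # confined to right subtree
        elif msb_l == 0 and msb_r == 1:
            node(rl, half - 1, bits - 1, base)       # XL .. max of left subtree
            node(0, rr, bits - 1, base + half)       # min of right subtree .. XR
        # msb_l == 1 and msb_r == 0: assumed to be an empty span

    node(xl, xr, n, 0)
    return image_buffer


print(ept_paint(3, 10, 4))
```

Each node only inspects one bit of XL and one bit of XR, which is the property the one-bit comparator argument in the text relies on.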

While the function of EPT is the same as that of the raster graphics engine in the Super Buffer[4], the hardware complexity of EPT is significantly reduced owing to the use of one-bit comparators rather than the log2 P-bit comparators in [4]. (P is the number of pixels per scanline.)

III. RING MEMORY AND LPA

As was shown in Fig. 1, Ring memory is a special frame buffer which is a recirculating two-dimensional memory plane. The Ring memory can be implemented as multiple rows of dynamic shift registers. The pixel data stored in Ring memory circle around in a column-synchronized fashion and undergo necessary transformations as they pass through LPA (Linear Processor Array).

Fig. 4 shows a block diagram of the unit processor in LPA, which consists of three switches (nx1-SW, nx2-SW, 2x1-SW) and three submodules (Boolean module, Geometry module and Load-back stage), where the number of bit map planes is n. nx1-SW is a simple switch to select one plane out of the n input planes in the Ring memory. We let C^t denote the selected pixel data in the current unit processor at time point t (t = -, 0, + signifies past, present and future, respectively). The output of nx1-SW is stored in a 3-stage shift register as C^-, C^0 and C^+, constructing a 3x3 window with those from the upper unit processor (U^-, U^0, U^+) and those from the lower unit processor (L^-, L^0, L^+), as shown in Fig. 5. This 3x3 window becomes the data for the geometry operations. On the other hand, C_1, C_2, C_3, ... and C_n are also fed to nx2-SW, where two (B_1, B_2) of them are selected as the inputs to the Boolean module, which executes various Boolean operations such as AND, OR, NOT and Copy between planes. The outputs of the Geometry and Boolean modules are fed to 2x1-SW, whose output is P^0. P^0 is fed to the output stage either to replace or to be ORed with one of C_1, C_2, C_3, ... and C_n. In the output stage, C_1, C_2, C_3, ... and C_n are delayed by four clocks to be synchronized with P^0.
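How the 3x3 window arises from one 3-stage shift register per row can be sketched as follows. This is a simplified software model under assumptions that the text does not state: the function name `lpa_windows` is hypothetical, only one selected plane is modeled, columns wrap around horizontally because the rows recirculate, rows above the top and below the bottom processor are read as 0, and the oldest register stage is taken as the "past" sample.

```python
from collections import deque


def lpa_windows(plane):
    """Return {(row, col): (U, C, L)} with each of U, C, L a (-, 0, +) triple."""
    rows, cols = len(plane), len(plane[0])
    # One 3-stage shift register per unit processor (one processor per row).
    regs = [deque([0, 0, 0], maxlen=3) for _ in range(rows)]
    zero = (0, 0, 0)
    windows = {}
    # Columns arrive in rotation order; one extra column before and after
    # lets every column sit in the middle stage exactly once.
    sequence = [cols - 1] + list(range(cols)) + [0]
    for k, col in enumerate(sequence):
        for r in range(rows):
            regs[r].append(plane[r][col])   # clock all rows in lockstep
        if k >= 2:
            center = sequence[k - 1]        # column now in the middle stage
            for r in range(rows):
                upper = tuple(regs[r - 1]) if r > 0 else zero
                lower = tuple(regs[r + 1]) if r < rows - 1 else zero
                windows[(r, center)] = (upper, tuple(regs[r]), lower)
    return windows


plane = [[0, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 1, 0, 1]]
print(lpa_windows(plane)[(1, 1)])
```

Because each processor only needs the register contents of its two vertical neighbors, the storage per processor stays at three bits per selected plane, independent of the row length.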

The basic instructions of LPA consist of 4 Boolean instructions and 16 geometry instructions.

1) Boolean instructions
The Boolean instructions that LPA performs are pixelwise AND, OR, NOT, and Copy operations.

2) Geometrical instructions
There are three geometrical instructions, 'Shrink' (SHR), 'Expand' (EXP) and 'Move' (MOV). When the EXP (SHR) instruction is executed for each geometrical primitive, its widths in the X and Y directions are enlarged (shrunk) by 1 unit length.

3) Detection instructions
Detection instructions are responsible for searching for a specific pattern in the given 3x3 window, and consist of interior detection (INDTC) and exterior detection (EXDTC). These are described using Boolean expressions as follows.

$$\mathrm{INDTC} = C^{0} \cdot \big( (\tilde{U}^{0} \cdot \tilde{L}^{0}) + (\tilde{C}^{-} \cdot \tilde{C}^{+}) + (\tilde{L}^{-} \cdot \tilde{U}^{+} \cdot U^{-} \cdot L^{+}) + (\tilde{U}^{-} \cdot \tilde{L}^{+} \cdot L^{-} \cdot U^{+}) \big) \qquad \text{eq.(1)}$$

$$\mathrm{EXDTC} = \tilde{C}^{0} \cdot \big( (U^{0} \cdot L^{0}) + (C^{-} \cdot C^{+}) + (L^{-} \cdot U^{+} \cdot \tilde{U}^{-} \cdot \tilde{L}^{+}) + (U^{-} \cdot L^{+} \cdot \tilde{L}^{-} \cdot \tilde{U}^{+}) \big) \qquad \text{eq.(2)}$$

where ~ (tilde) denotes complementing.

INDTC is used to check for the width rule in the 3x3 window shown in Fig. 5. A prerequisite for INDTC to be '1' is that C^0, the center of the 3x3 window, should be 1, which is the first term of eq.(1). A 2λ width rule error in the vertical as well as the horizontal direction is reflected in the first and second terms within the outermost parenthesis (OR terms) in eq.(1). The third and fourth OR terms in eq.(1) check for the width rule error in the two diagonal directions. The exterior detection instruction, EXDTC, can be explained in a similar fashion as INDTC except that all the Boolean values are complemented.
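The window-based instructions can be illustrated with a small software model. The sketch below is not the authors' implementation and rests on several assumptions: INDTC requires the center pixel to be set together with a vertical, horizontal or diagonal one-pixel-wide pattern; each diagonal term also requires the opposite diagonal pair to be set, which is one reading of the equation; EXDTC is INDTC applied to the complemented window; SHR and EXP act on the full 3x3 neighborhood; and pixels outside the plane read as 0. All function names are hypothetical.

```python
def inv(b):
    return 1 - b


def indtc(w):
    """Interior detection on a window ((U-,U0,U+), (C-,C0,C+), (L-,L0,L+))."""
    (um, u0, up), (cm, c0, cp), (lm, l0, lp) = w
    vertical = inv(u0) & inv(l0)
    horizontal = inv(cm) & inv(cp)
    diag_a = inv(lm) & inv(up) & um & lp    # assumed form of the third term
    diag_b = inv(um) & inv(lp) & lm & up    # assumed form of the fourth term
    return c0 & (vertical | horizontal | diag_a | diag_b)


def exdtc(w):
    """Exterior detection: INDTC with every Boolean value complemented."""
    return indtc(tuple(tuple(inv(b) for b in row) for row in w))


def shrink(w):
    """Assumed SHR: keep a pixel only if its whole 3x3 window is set."""
    return int(all(b for row in w for b in row))


def expand(w):
    """Assumed EXP: set a pixel if any pixel of its 3x3 window is set."""
    return int(any(b for row in w for b in row))


def apply_op(plane, op):
    """Apply a window operation to every pixel; outside pixels read as 0."""
    rows, cols = len(plane), len(plane[0])

    def px(r, c):
        return plane[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [[op(tuple(tuple(px(r + dr, c + dc) for dc in (-1, 0, 1))
                      for dr in (-1, 0, 1)))
             for c in range(cols)] for r in range(rows)]


# A one-pixel-wide vertical line is flagged along its whole length.
line = [[1 if c == 2 else 0 for c in range(5)] for r in range(5)]
print(apply_op(line, indtc))
```

Applying `shrink` and then `expand` to a plane removes fragments narrower than the window while restoring larger shapes, which corresponds to the noise suppression sequence described in Section V.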

IV. VLSI IMPLEMENTATION

To implement the aforementioned Ringtree in VLSI technology for a 1024x1024 pixel graphic display system application, the whole Ring memory frame buffer is first divided into 64 slices, each of which consists of 1024 (pixels/row) x 16 (rows) = 16 K pixels. Accordingly, the number of processors in LPA in each Ringtree chip is reduced to 16. Subdivision of the whole binary tree (EPT), which is 10 levels deep with 1024 leaves, into 64 identical units is less straightforward. A small EPT 4 levels deep and 16 leaves wide is incorporated into each chip, while the upper six levels are realized with a linear chain called EPC (Edge Painting Chain), as shown in Fig. 6. A similar idea was implemented in a binary pipelined multiplier in [5]. EPC is constructed of a cascade of processing elements (PEs) and address registers. Each PE receives two n-bit data, XL and XR, from its upper level PE and sends two (n-1)-bit data, XL' and XR', to its lower level PE, according to the values of the MSBs of XL and XR, as well as the corresponding bit value of the chip address, which is unique to each Ringtree chip and thus supplied and stored in the PROM resident in each chip.
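The slicing arithmetic and the role of the chip address can be illustrated with a short model. The PE rule below is an assumption inferred from the tree rule of Section II, since the text does not spell it out: at each of the six chain stages the chip address bit selects which child of the corresponding tree node this chip follows, and a command that does not reach that child is dropped. The names `epc_local_span` and the constants are hypothetical.

```python
SCREEN = 1024                                   # pixels per side
CHIPS = 64                                      # identical Ringtree chips
ROWS_PER_CHIP = SCREEN // CHIPS                 # 16 rows, so 16 LPA processors
TREE_LEVELS = SCREEN.bit_length() - 1           # 10 levels for 1024 leaves
LOCAL_LEVELS = ROWS_PER_CHIP.bit_length() - 1   # 4 levels in the on-chip EPT
CHAIN_LEVELS = TREE_LEVELS - LOCAL_LEVELS       # 6 levels realized by the EPC


def epc_local_span(xl, xr, chip_addr):
    """Return the 4-bit (XL', XR') handed to the on-chip EPT, or None."""
    bits = TREE_LEVELS
    for level in range(CHAIN_LEVELS):
        addr_bit = (chip_addr >> (CHAIN_LEVELS - 1 - level)) & 1
        top = bits - 1
        msb_l, msb_r = (xl >> top) & 1, (xr >> top) & 1
        mask = (1 << top) - 1
        rl, rr = xl & mask, xr & mask            # MSB stripped off
        if msb_l == msb_r == addr_bit:
            xl, xr = rl, rr                      # span lies in this chip's half
        elif (msb_l, msb_r) == (0, 1):
            # Span straddles both halves: keep the part on this chip's side.
            xl, xr = (rl, mask) if addr_bit == 0 else (0, rr)
        else:
            return None                          # span misses this chip's half
        bits -= 1
    return xl, xr


print(ROWS_PER_CHIP * SCREEN)          # pixels per slice
print(epc_local_span(100, 300, 6))     # chip 6 holds coordinates 96..111
```

Under this assumed rule every chip runs the same logic and differs only in its stored address, which is what allows the system to be built from identical chips.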

V. VARIOUS APPLICATIONS OF RINGTREE

1) Image processing

Noise suppression: Noise is defined as a small image fragment whose width in either the vertical or horizontal dimension is less than a specified number of pixels. Noise can be suppressed through two steps of basic instructions, that is, shrinking the original image by the amount of the specified noise width and expanding the shrunk image by the same amount.

Notch elimination: A notch is noise in the bit-complement plane of the original image plane. Therefore, notches can be eliminated by applying the two steps used for noise suppression in reverse order.

Contour extraction: The contour is obtained by ANDing the original image with the negated image of the image obtained by shrinking the original image by one pixel width.

2) Graphics processing

Most recent graphics workstations require fast and powerful multi-window functions. On top of the image generation capability of EPT, Ringtree is capable of performing such functions as pattern filling, PIXBLT (PIXel BLock Transfer) and hardwired multi-window functions including nonrectangular, nonconvex windows.

3) Layout verification

Another application of Ringtree is DRC (Design Rule Check) in VLSI layout. Compared with other hardware architectures, Ringtree is very flexible with respect to variations in design rules, since the number of basic operations into which the required instruction is decomposed does not affect the hardware architecture. The design rules described in this paper were taken from [8].

The application programs running on Ringtree for DRC in IC layout are presented in a simplified form. Layers 2 and 3 in Ring memory are assumed to be reserved layers that are prohibited from being used as input layers when Ringtree is used for DRC applications.

Width rule checking: Since the interior detection operation INDTC of LPA checks only for the 2λ width rule (where a 1λ width is illegal, while the minimal 2λ width is legal), successive shrinking and interior detection instructions are required for reporting errors for all patterns having widths less than nλ.

Spacing rule checking: Since the intra-layer spacing rule check is a problem which is complementary to the width rule check, it can be understood in a very similar way. The inter-layer spacing rule check needs extra processes such as the zero inter-layer spacing check and the elimination of the bays in the two individual layers whose widths are smaller than nλ. The bay elimination prevents checking of the intra-layer spacing errors in the two layers.

Extension and enclosure rule checking: The extension rule check reports an insufficient amount of extension of the patterns in one layer over the patterns in another layer, for example, polysilicon over diffusion, depletion implant over polysilicon, etc. The extension rule check algorithm consists of a 1λ expansion of all the patterns in layer B in four directions and subtraction of layer A from layer B, followed by an (n+1)λ width rule check for layer B. The extension rule check algorithm can also be used to check for enclosure rule errors, where layer A is regarded as the contact cut layer.

Simulation results of DRC for an example shown in Fig. 7(a) are illustrated in Fig. 7(b). This example consists of rectilinear patterns in layers A and B. Four kinds of design rules, that is, width, spacing, extension, and enclosure rules, were tested successfully.

VI. CONCLUSION

In this paper, we proposed a hardware architecture called Ringtree for image processing, graphics processing and layout verification in VLSI. The proposed processor consists of three parts: Ring memory, which is a special memory; EPT, which is a rasterization processor with a binary tree structure; and LPA, which executes the modification of bit map data stored in Ring memory. Various application examples of Ringtree were explained with the results of software simulation written in the C language. VLSI implementation issues of Ringtree for 1024 x 1024 pixel graphics applications were also studied using a slicing structure, where each slice consists of horizontal strips of Ring memory, a small EPT and the EPC.

REFERENCES

[1] R. A. Rutenbar, T. N. Mudge and D. E. Atkins, "A Class of Cellular Architectures to Support Physical Design Automation," IEEE Transactions on CAD of IC's and Systems, Vol. CAD-3, No. 4, pp. 264-278, October 1984.
[2] T. Blank, M. Stefik and W. vanCleemput, "A Parallel Bit Map Processor Architecture for DA Algorithms," Proc. 18th Design Automation Conference, pp. 837-845, 1981.
[3] R. Nair, S. J. Hong, S. Liles and R. Villani, "Global Wiring on a Wire Routing Machine," Proc. 19th Design Automation Conference, pp. 224-231, 1982.
[4] N. Gharachorloo and C. Pottle, "SUPER BUFFER: A Systolic VLSI Graphics Engine for Real Time Raster Image Generation," 1985 Chapel Hill Conference on Very Large Scale Integration, Computer Science Press, Inc., pp. 285-305.
[5] J. Poulton, H. Fuchs, J. D. Austin, J. G. Eyles, J. Heinecke, C. H. Hsieh, J. Goldfeather, J. P. Hultquist and S. Spach, "PIXEL PLANES: Building a VLSI-Based Graphic System," 1985 Chapel Hill Conference on Very Large Scale Integration, Computer Science Press, Inc., pp. 35-60.
[6] K. S. Eo and C. M. Kyung, "A Two-Dimensional Geometry Processor for DRC Applications," Proc. 1987 IEEE Region 10 Conference, pp. 266-270, Aug. 1987, Seoul, Korea.
[7] S. S. Kim, K. S. Eo and C. M. Kyung, "Edge Painting Machine: A Hardware for Image Rasterization," Proc. 1987 IEEE Region 10 Conference, pp. 115-119, Aug. 1987, Seoul, Korea.
[8] C. Mead and L. Conway, "Introduction to VLSI Systems," Addison Wesley, 1980, Chapter 2.

[Figure 1 labels: Edge Painting Tree; Gate 2; Ring Memory; Gate 1 (Linear Processor Array); pixel data in; pixel data out (to Display)]

Figure 1. Block diagram of Ringtree.


[Figure 2 labels: scanline commands (X-left, X-right); root]

Figure 2. Block diagram of Edge Painting Tree.

Figure 3. Rule for determining the XL and XR data for the left and right child from the MSBs of the XL and XR data of their parent, where R(N) denotes the (n-1)-bit number made by dropping the MSB of the n-bit binary number N.

Figure 4. Block diagram of the unit processor in the Linear Processor Array.

Figure 5. The 3 x 3 window of bit map data used as input of LPA.

[Figure 6 labels: Video signal; Processor 0, Processor 1, ..., Processor 63]

Figure 6. Implementation scheme of Ringtree using 64 identical VLSI chips.

Figure 7. Simulation results of Ringtree for DRC applications.
(a) Two input layers A and B.
(b) Simulation results of Ringtree for the given input layers. Locations of various DRC errors are shown with their identification numbers explained below.
1: width error for layer A.
2: spacing error for layer A.
3: spacing error between layers A and B.
4: extension error of layer B from A.
5, 6: width errors in the diagonal direction of layer A.
7: enclosure error of layer A from B.
