
A Case Study On Approximate FPGA Design With an Open-Source Image Processing Platform

1st Yunxiang Zhang, Engineering Department

University of Houston Clear Lake, 2700 Bay Area Blvd, Houston, TX, USA

2nd Xiaokun Yang*, Engineering Department

University of Houston Clear Lake, 2700 Bay Area Blvd, Houston, TX, USA

3rd Lei Wu, Computer Science Department

Auburn University at Montgomery, 7430 East Drive, Montgomery, AL, USA

4th Jean H. Andrian, Department of Electrical & Computer Engineering

Florida International University, 10555 West Flagler St., Miami, FL, USA

Abstract—This paper presents a case study of approximate design on a Field-Programmable Gate Array (FPGA), combining a color-to-grayscale converter with an open image/video processing platform written in the Verilog hardware description language (HDL). First, by combining two approximate adders and two approximate multipliers with the exact designs, nine different implementations of the color-to-grayscale converter are offered. Second, the image processing platform is used to demonstrate the proposed work on a Nexys-4 FPGA: it captures color images through a low-cost OV7670 camera and displays the grayscale results on a VGA-interfaced monitor. Experimental results show the differences between the approximate designs, providing a range of design options for different quality constraints.

Index Terms—approximate design, Field-Programmable Gate Array (FPGA), hardware description language (HDL), image/video processing platform

I. INTRODUCTION

To date, applications of image processing and computer vision are growing rapidly, posing challenges in computation speed and power efficiency for traditional software-based frameworks. Owing to the programmability and parallelism of pure hardware design, Field-Programmable Gate Array (FPGA) implementations are becoming widely used as accelerators in cloud data centers [1]–[3]. Furthermore, approximate design on FPGAs offers great potential to further reduce computational cost and energy consumption within different quality bounds on the results [4], [5].

In this context, the authors of prior work [8] proposed four approximate addition models and evaluated the quality of results on a histogram equalization algorithm. That idea was simulated in Matlab, so the hardware performance was not estimated. Subsequently, in [7], twelve approximate designs of adders and multipliers were implemented on an FPGA. The experimental results show that the design with the minimum slice-energy cost, which combines the approximate#2 adder with the approximate#3 multiplier, achieved a 25.17% slice-energy saving compared with the exact design, at the cost of a 5.69% error for the multiplier and 2.85% for the adder.

Additionally, [6] presented an open image/video processing platform that interfaces an OV7670 camera with a VGA-interfaced monitor. This platform is designed in the Verilog hardware description language (HDL) and implemented on a Xilinx Nexys-4 FPGA board. Compared with existing open-source implementations of the Camera-FPGA-VGA data path [9]–[11], that work uses the least FPGA resources (753 LUTs and 277 registers) to the best of our knowledge.

Combining the prior works [6] and [7], in this paper we present and demonstrate a thesis study with an application of color-to-grayscale conversion on the Camera-FPGA-VGA platform. With a low-cost OV7670 camera as the image input to the Nexys-4 FPGA, the platform simultaneously displays the color and grayscale images on a VGA-interfaced monitor. Specifically, the contributions of this paper are as follows:

• We present nine approximate designs of the color-to-grayscale converter, built from two approximate multipliers, two approximate adders, and the exact adder and multiplier designs.

• We demonstrate the color-to-grayscale converter on a scalable image/video processing platform on the Nexys-4 FPGA. The grayscale images from the different implementations differ only slightly, offering multiple design options for different quality requirements.

• We evaluate the performance in terms of slice count and power consumption on the Nexys-4 FPGA. The experimental results show that slice count and power dissipation decrease as the degree of approximation increases.

The organization of this paper is as follows. Section II presents the design architecture and the approximate designs of adders and multipliers. Section III discusses the implementation of the proposed system. In Section IV, the experimental results in terms of hardware cost, power consumption, and FPGA prototype are shown. Finally, Section V presents the concluding remarks and future work on our target architecture.

Fig. 1. Design Architecture of the Image Processing Platform with Three Approximate Design Algorithms

II. PROPOSED WORK

In this section, the design architecture of the image/video processing platform is introduced, followed by the approximate designs of multipliers and adders.

A. Design Architecture

Fig. 1 shows the design architecture of the image processing platform, which captures images through a low-cost camera, processes them, and displays the results on a VGA-interfaced monitor.

This platform is composed of an OV7670 camera, a Nexys-4 FPGA board, and a monitor with a VGA port. The FPGA design contains three interfaces written in Verilog HDL: the Camera Controller, the Image Capture module, and the VGA Master. The Camera Controller is essentially an I2C master that configures the functional registers of the OV7670 camera. Once configured, the camera captures images and sends them pixel by pixel through the ‘VSYNC-HREF-DATA’ interface [15].

The Image Capture module receives the data and stores it in four memory blocks, named the ‘Frame Buffer’; each block can store one 320 × 240 image. The VGA output module then reads the data out of the buffers and sends it to a VGA-interfaced monitor. The ‘Frame Buffer’ is created with the intellectual property (IP) offered by Xilinx Vivado 2017.4, and another IP, a phase-locked loop (PLL), is instantiated to generate two clocks: the 25 MHz clock drives the buffers and the VGA Master, and the 50 MHz clock drives the Camera Controller.

As shown in Fig. 2(a), the 640 × 480 display window can be split into four regions of 320 × 240 each. ‘Region0’ displays the original data from ‘FrameBuffer0’, while ‘Region1’, ‘Region2’, and ‘Region3’ display the results of approximate#1, approximate#2, and approximate#3, respectively, abbreviated as ‘AP1’, ‘AP2’, and ‘AP3’ in Fig. 1.

Fig. 2. An Example of Four-Region Display

Fig. 3. Multi-bit Multiplier Design

In the example of Fig. 2(b), the color image is displayed in ‘Region0’. After conversion to grayscale with three different algorithms, the results are shown in ‘Region1’, ‘Region2’, and ‘Region3’.
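The region layout implied by Fig. 2(a) and by the VGA timing description in Section III-C (Region0 top-left, Region1 top-right, Region2 bottom-left, Region3 bottom-right) can be sketched as a coordinate mapping. This is a minimal Python model, not the authors' Verilog; the row-major address inside each frame buffer is an assumption, and the names `locate` and `buffer_address` are hypothetical.

```python
# Minimal model of the four-region display (assumed layout: 0 = top-left,
# 1 = top-right, 2 = bottom-left, 3 = bottom-right).
SCREEN_W, SCREEN_H = 640, 480
REGION_W, REGION_H = 320, 240


def locate(x: int, y: int) -> tuple[int, int, int]:
    """Map a screen pixel (x, y) to (region, local_x, local_y)."""
    if not (0 <= x < SCREEN_W and 0 <= y < SCREEN_H):
        raise ValueError("pixel outside the 640 x 480 display window")
    region = (y // REGION_H) * 2 + (x // REGION_W)
    return region, x % REGION_W, y % REGION_H


def buffer_address(local_x: int, local_y: int) -> int:
    """Row-major address inside one 320 x 240 frame buffer (hypothetical layout)."""
    return local_y * REGION_W + local_x


# One buffer holds 320 * 240 RGB444 pixels of 12 bits each.
BUFFER_BITS = REGION_W * REGION_H * 12

print(locate(0, 0))              # (0, 0, 0)
print(locate(639, 0))            # (1, 319, 0)
print(locate(400, 300))          # (3, 80, 60)
print(buffer_address(319, 239))  # 76799
print(BUFFER_BITS)               # 921600
```

The last line shows why four buffers are needed for the four regions: each 320 × 240 RGB444 frame occupies 921,600 bits of on-chip memory.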

B. Approximate Design on Multipliers

This section details the design of the approximate multipliers. First, we present two approximations of the 2 × 2-bit multiplier. In general, any W × W-bit multiplication can be rebuilt from four W/2 × W/2-bit multipliers, as shown in Fig. 3. Specifically, the multi-bit inputs M1 and M2 can be rewritten as (M1H M1L) and (M2H M2L), where M1H and M2H are the most significant W/2 bits and M1L and M2L are the least significant W/2 bits. The W × W-bit product is then the sum of the four W/2 × W/2-bit partial products, each shifted to its weight:

$$M_1 \times M_2 = (M_{1H} \times M_{2H}) \cdot 2^{W} + (M_{1H} \times M_{2L} + M_{1L} \times M_{2H}) \cdot 2^{W/2} + M_{1L} \times M_{2L}$$
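The decomposition of Fig. 3 can be checked with a short behavioral model. The sketch below is an illustrative Python version, assuming W is a power of two and that the four partial products are recombined with exact shifts and additions; `mul_recursive` and `mul2x2_exact` are hypothetical names.

```python
def mul2x2_exact(a: int, b: int) -> int:
    """Exact 2 x 2-bit base block."""
    return a * b


def mul_recursive(m1: int, m2: int, width: int, base=mul2x2_exact) -> int:
    """W x W-bit product built from four W/2 x W/2-bit sub-products."""
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two >= 2")
    if width == 2:
        return base(m1, m2)
    half = width // 2
    mask = (1 << half) - 1
    m1h, m1l = m1 >> half, m1 & mask  # most / least significant W/2 bits
    m2h, m2l = m2 >> half, m2 & mask
    hh = mul_recursive(m1h, m2h, half, base)
    hl = mul_recursive(m1h, m2l, half, base)
    lh = mul_recursive(m1l, m2h, half, base)
    ll = mul_recursive(m1l, m2l, half, base)
    # Shift each partial product to its weight and add.
    return (hh << width) + ((hl + lh) << half) + ll


print(mul_recursive(0xB7, 0x5C, 8), 0xB7 * 0x5C)  # prints the same value twice
```

Passing an approximate 2 × 2 function as `base` would propagate its error into every 2 × 2 block of the larger multiplier, which is why the 2 × 2 cell is the natural place to approximate.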

Therefore, the approximate 2 × 2 multiplier is created as the fundamental block of multi-bit multipliers. Before discussing the approximate design, Fig. 4 shows the exact 2 × 2 multiplier derived with a K-map. Its Boolean expressions can be written as

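$$\text{Mul}_{\text{out}}[0]^{(\text{EX})} = A[0]B[0] \tag{1a}$$

$$\text{Mul}_{\text{out}}[1]^{(\text{EX})} = A[1]'A[0]B[1] + A[0]B[1]B[0]' + A[1]B[1]'B[0] + A[1]A[0]'B[0] \tag{1b}$$

$$\text{Mul}_{\text{out}}[2]^{(\text{EX})} = A[1]B[1]B[0]' + A[1]A[0]'B[1] \tag{1c}$$

$$\text{Mul}_{\text{out}}[3]^{(\text{EX})} = A[1]A[0]B[1]B[0] \tag{1d}$$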

where Mulout[3], Mulout[2], Mulout[1], and Mulout[0] represent the result bits from MSB to LSB, and A and B are the 2-bit inputs. A[1] and B[1] denote the MSBs, A[0] and B[0] denote the LSBs, and a prime (′) denotes the logical complement.
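A quick way to confirm that Eqs. (1a)-(1d) describe an exact multiplier is to evaluate them on all 16 input pairs. The following Python sketch transcribes the equations bit by bit; the helper `inv` is a hypothetical stand-in for an inverter.

```python
from itertools import product


def inv(v: int) -> int:
    """Inverter on a single bit (ignored in the gate counts of the text)."""
    return 1 - v


def mul2x2_exact(a: int, b: int) -> int:
    """Exact 2 x 2 multiplier written directly from Eqs. (1a)-(1d)."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    out0 = a0 & b0                                                  # (1a)
    out1 = ((inv(a1) & a0 & b1) | (a0 & b1 & inv(b0))
            | (a1 & inv(b1) & b0) | (a1 & inv(a0) & b0))            # (1b)
    out2 = (a1 & b1 & inv(b0)) | (a1 & inv(a0) & b1)                # (1c)
    out3 = a1 & a0 & b1 & b0                                        # (1d)
    return (out3 << 3) | (out2 << 2) | (out1 << 1) | out0


# Exhaustive check against integer multiplication.
for a, b in product(range(4), repeat=2):
    assert mul2x2_exact(a, b) == a * b, (a, b)
print("Eqs. (1a)-(1d) match A x B for all 16 inputs")
```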

Fig. 4. K-map of 2 × 2 Exact Multiplier

Fig. 5. K-map of 2 × 2 approximate multiplier

The first approximation of the 2 × 2 multiplier is designed by changing the value of Mulout[1] in the K-map cell A = 11 (binary), B = 11 (binary) from ‘0’ to ‘1’, as shown in Fig. 5(a). As a result, the Boolean expression of Mulout[1] can be simplified as

$$\text{Mul}_{\text{out}}[1]^{(\text{AP1})} = A[1]B[0] + B[1]A[0] \tag{2}$$

Assume that inverters are ignored in the gate count. Compared with Eq. (1b), which requires eight AND gates and three OR gates, the first approximation of Mulout[1] needs only two AND gates and one OR gate, reducing the gate count from eleven to three, a 72.7% gate reduction.

Using the same methodology, Mulout[2] can be simplified by changing its value in the cell A = 11 (binary), B = 11 (binary) from ‘0’ to ‘1’, as shown in Fig. 5(b), so that it can be rewritten as

$$\text{Mul}_{\text{out}}[2]^{(\text{AP2})} = A[1]B[1] \tag{3}$$

Similarly, instead of four AND gates and one OR gate, the approximation of Mulout[2] uses just one AND gate, an 80% gate reduction.
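The effect of Eqs. (2) and (3) can be seen by comparing each variant with the exact product over all 16 input pairs. This Python sketch assumes each approximation changes only its own output bit (the text does not say whether approximate#2 also keeps the Mulout[1] change); `mul2x2` and its `variant` argument are hypothetical names.

```python
from itertools import product


def mul2x2(a: int, b: int, variant: str = "EX") -> int:
    """2 x 2 multiplier; "AP1" applies Eq. (2) to bit 1, "AP2" applies Eq. (3) to bit 2."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    out0 = a0 & b0
    out1 = (a1 & b0) ^ (a0 & b1)       # exact, equivalent to Eq. (1b)
    out2 = a1 & b1 & (1 - (a0 & b0))   # exact, equivalent to Eq. (1c)
    out3 = a1 & a0 & b1 & b0
    if variant == "AP1":
        out1 = (a1 & b0) | (a0 & b1)   # Eq. (2): OR replaces XOR
    elif variant == "AP2":
        out2 = a1 & b1                 # Eq. (3)
    elif variant != "EX":
        raise ValueError(f"unknown variant {variant!r}")
    return (out3 << 3) | (out2 << 2) | (out1 << 1) | out0


for variant in ("AP1", "AP2"):
    wrong = [(a, b, mul2x2(a, b, variant))
             for a, b in product(range(4), repeat=2) if mul2x2(a, b, variant) != a * b]
    print(variant, wrong)
# AP1 [(3, 3, 11)]
# AP2 [(3, 3, 13)]

# Mulout[1] gate count (2-input gates, inverters ignored): 8 AND + 3 OR -> 2 AND + 1 OR
print(f"{(11 - 3) / 11:.1%}")  # 72.7%
```

Both approximations err only at 3 × 3, where the exact product 9 becomes 11 (AP1) or 13 (AP2), which is the single K-map cell changed in Fig. 5.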

C. Approximate Design on Adders

This section discusses the approximate adders. The exact single-bit adder with inputs a, b, and c can be written as

$$C_{out} = ab + bc + ac \tag{4a}$$

$$\text{Sum} = a'bc' + ab'c' + a'b'c + abc \tag{4b}$$

Using the same methodology as for the approximate multiplier, the approximate adders can be written as

$$C_{out}^{(\text{AP1})} = a + bc \tag{5a}$$

$$\text{Sum}^{(\text{AP1})} = a'bc' + a'b'c + abc \tag{5b}$$

for the first approximation and

$$C_{out}^{(\text{AP2})} = a \tag{6a}$$

$$\text{Sum}^{(\text{AP2})} = a'c + bc + a'c \tag{6b}$$

for the second approximation [8]. Compared with the exact adder, and counting both AND and OR gates, the gate count is reduced by 35.7% for the first approximation and by 68.8% for the second.
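The error behavior of the three adder cells follows directly from Eqs. (4)-(6). The Python sketch below enumerates all eight input combinations and lists where each cell's value 2·Cout + Sum differs from a + b + c. Eq. (6b) is transcribed exactly as printed, so its repeated a′c term has no effect; `full_adder` and `error_cases` are hypothetical names.

```python
from itertools import product


def inv(v: int) -> int:
    return 1 - v


def full_adder(a: int, b: int, c: int, variant: str = "EX") -> tuple[int, int]:
    """Return (cout, sum) of a 1-bit adder cell per Eqs. (4)-(6)."""
    if variant == "EX":
        cout = (a & b) | (b & c) | (a & c)                                # (4a)
        s = ((inv(a) & b & inv(c)) | (a & inv(b) & inv(c))
             | (inv(a) & inv(b) & c) | (a & b & c))                       # (4b)
    elif variant == "AP1":
        cout = a | (b & c)                                                # (5a)
        s = (inv(a) & b & inv(c)) | (inv(a) & inv(b) & c) | (a & b & c)   # (5b)
    elif variant == "AP2":
        cout = a                                                          # (6a)
        s = (inv(a) & c) | (b & c) | (inv(a) & c)                         # (6b) as printed
    else:
        raise ValueError(f"unknown variant {variant!r}")
    return cout, s


def error_cases(variant: str) -> list[tuple[int, int, int]]:
    """Inputs (a, b, c) where 2*cout + sum differs from the true a + b + c."""
    bad = []
    for a, b, c in product((0, 1), repeat=3):
        cout, s = full_adder(a, b, c, variant)
        if 2 * cout + s != a + b + c:
            bad.append((a, b, c))
    return bad


for v in ("EX", "AP1", "AP2"):
    print(v, error_cases(v))
# EX []
# AP1 [(1, 0, 0)]
# AP2 [(0, 1, 0), (0, 1, 1), (1, 0, 0)]
```

In this model the first approximation errs in one of eight input cases and the second in three, consistent with the larger gate saving of the second approximation.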

D. Approximate Design on Color to Grayscale Converter

Integrating the adder and multiplier designs yields the implementation of the color-to-grayscale converter. The conversion is a weighted sum of 29.89% of the red component, 58.7% of the green component, and 11.4% of the blue component:

$$\text{Grayscale} = 0.2989 \times r + 0.587 \times g + 0.114 \times b \tag{7}$$

Implemented directly, this function would require three floating-point multipliers and two adders. Although floating-point computation provides more accurate results, its resource cost is higher than that of fixed-point calculation. One fixed-point version can thus be written as

$$\text{Grayscale} = (76 \times r + 150 \times g + 30 \times b) \gg 8 \tag{8}$$

obtained by multiplying the floating-point weights in Eq. (7) by $2^8$ and approximating the products with the nearby integers 76, 150, and 30, which sum to $2^8 = 256$. Finally, the weighted sum is right-shifted by eight bits, that is, divided by $2^8$.
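Eq. (8) can be compared with Eq. (7) numerically. This Python sketch evaluates both on an illustrative grid of 8-bit channel values and on a 4-bit (RGB444) white pixel; because the integer weights sum to 256, a full-scale white pixel stays full scale after the shift.

```python
from itertools import product

WEIGHTS = (76, 150, 30)  # integer weights of Eq. (8)


def gray_float(r: float, g: float, b: float) -> float:
    return 0.2989 * r + 0.587 * g + 0.114 * b  # Eq. (7)


def gray_fixed(r: int, g: int, b: int) -> int:
    return (WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b) >> 8  # Eq. (8)


print(sum(WEIGHTS))                                       # 256
print(gray_fixed(15, 15, 15), gray_fixed(255, 255, 255))  # 15 255

# Largest deviation from Eq. (7) on a coarse 8-bit grid (illustrative only).
worst = max(abs(gray_float(r, g, b) - gray_fixed(r, g, b))
            for r, g, b in product(range(0, 256, 15), repeat=3))
print(f"worst-case |error| on the sample grid: {worst:.2f}")
```

The deviation stems from two sources: the rounded weights and the truncating right shift, which together stay below two gray levels for 8-bit inputs.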

After simplification, Eq. (8) consists of three fixed-point multipliers and two adders. Replacing the exact multiplier and adder with approximate designs yields many different implementations of the converter.

III. IMPLEMENTATION

This section first introduces the design details of the image processing platform, including the Camera Controller, the Image Capture module, and the VGA Master. Then, as a case study of approximate design, the color-to-grayscale converter is implemented to demonstrate the results on the image processing platform.


Fig. 6. Design of Camera Controller

Fig. 7. Timing Diagram of I2C

A. Camera Controller

Fig. 6 depicts the design of the Camera Controller, which contains two main blocks: the OV7670 Register block and the I2C Sender. The main function of the controller is to configure the OV7670 registers through an I2C interface. All register settings are initialized in the OV7670 Register block. The configuration follows the open-source work in [9] and sets specific register values such as ‘HSTART’, ‘HSTOP’, ‘VSTART’, ‘VSTOP’, ‘HREF’, ‘VREF’, ‘QVGA’, etc. [15].

Through the I2C interface, the configuration is sent according to the timing diagram in Fig. 7. Each register configuration consists of five stages: the start stage, in which the data line (SIOD) falls while the clock line (SIOC) is high; the ID, address, and data stages; and the stop stage, in which SIOD rises while SIOC is high. The ID specifies a write command as hexadecimal ‘0x42’ and a read command as hexadecimal ‘0x43’.
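The register write described above can be illustrated as a symbolic bit sequence. This Python sketch rests on assumptions: it adds the ninth "don't care" bit that the OmniVision SCCB convention places after each byte (not described in the text above), sends each byte MSB first, and uses register address 0x12 and value 0x80 purely as hypothetical placeholders.

```python
def sccb_write_sequence(reg_addr: int, value: int, write_id: int = 0x42) -> list[str]:
    """Symbolic sequence for one register write: START, ID, address, data, STOP.

    Each 8-bit phase is sent MSB first; "X" marks the assumed ninth don't-care bit.
    """
    for byte in (write_id, reg_addr, value):
        if not 0 <= byte <= 0xFF:
            raise ValueError("each phase carries one byte")
    seq = ["START"]  # SIOD falls while SIOC is high
    for byte in (write_id, reg_addr, value):
        seq += [str((byte >> i) & 1) for i in range(7, -1, -1)]
        seq.append("X")
    seq.append("STOP")  # SIOD rises while SIOC is high
    return seq


frame = sccb_write_sequence(0x12, 0x80)  # hypothetical register and value
print(len(frame))           # 29
print("".join(frame[1:9]))  # 01000010 (write ID 0x42)
```

Replacing `write_id` with 0x43 would give the read ID mentioned in the text; a real read also needs a separate two-phase transaction, which this sketch does not model.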

B. Image Capture

The design of the Image Capture module is shown in Fig. 8. It includes three main blocks: a latch that holds the 8-bit input, a control module that switches the hold activity, and a converter that transforms two 8-bit inputs into one 12-bit output. The OV7670 camera delivers data in RGB565 format, with a 5-bit red, 6-bit green, and 5-bit blue component. The data stored in the buffers must be in RGB444 format, with 4-bit red, green, and blue components. Thus the main function of the Image Capture module is to receive images from the camera, convert each pixel from RGB565 to RGB444, and store the images in the buffers pixel by pixel.

The timing diagram is shown in Fig. 9. Each RGB444 output takes two clock cycles: one cycle to hold the first 8-bit byte, and another to convert the two combined bytes into a 12-bit output. In other words, it takes two cycles to receive one 16-bit RGB565 color pixel, and in the second cycle the output is a 12-bit RGB444 color pixel.

Fig. 8. Design of Image Capture

Fig. 9. Timing Diagram of Image Capture
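The two-cycle RGB565-to-RGB444 conversion can be sketched as follows. The text does not state how the extra bits are dropped, so this Python sketch assumes the first byte is the high byte and that each channel keeps its four most significant bits; `capture_pixel` and `capture_stream` are hypothetical names.

```python
def capture_pixel(first_byte: int, second_byte: int) -> int:
    """Combine two RGB565 bytes into one 12-bit RGB444 pixel.

    Assumes the first byte is the high byte and each channel keeps its top 4 bits.
    """
    pixel565 = ((first_byte & 0xFF) << 8) | (second_byte & 0xFF)  # cycle 1 holds byte 1
    r5 = (pixel565 >> 11) & 0x1F
    g6 = (pixel565 >> 5) & 0x3F
    b5 = pixel565 & 0x1F
    return ((r5 >> 1) << 8) | ((g6 >> 2) << 4) | (b5 >> 1)       # cycle 2 outputs RGB444


def capture_stream(data: list[int]) -> list[int]:
    """Consume the camera byte stream two bytes (two cycles) per output pixel."""
    if len(data) % 2:
        raise ValueError("RGB565 pixels need an even number of bytes")
    return [capture_pixel(data[i], data[i + 1]) for i in range(0, len(data), 2)]


# Pure red, pure green, pure blue, and white in RGB565.
print([hex(p) for p in capture_stream([0xF8, 0x00, 0x07, 0xE0, 0x00, 0x1F, 0xFF, 0xFF])])
# ['0xf00', '0xf0', '0xf', '0xfff']
```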

C. VGA Master

The VGA Master displays the original color image in ‘Region0’ and the results of the different approximate designs in ‘Region1’, ‘Region2’, and ‘Region3’. Using the two control signals, horizontal sync (Hsync) and vertical sync (Vsync), it splits the 640 × 480 window into four regions.

As shown in Fig. 10, displaying each horizontal line of an image requires three time slots: the back porch (BP), the front porch (FP), and the display time (DIS). Between two consecutive lines, a sync pulse of width PW is inserted. Therefore, to separate ‘Region0’ and ‘Region1’ in Fig. 2 (or ‘Region2’ and ‘Region3’), the first half of the display time feeds in the pixels of ‘Region0’ (or ‘Region2’), and the second half feeds in the pixels of ‘Region1’ (or ‘Region3’).

Similarly, four time slots are defined in the vertical direction: the back porch, the front porch, the display time, and the pulse width. For the 640 × 480 window, the first half of the display lines feeds in the pixel lines of ‘Region0’ (or ‘Region1’), and the second half feeds in the pixel lines of ‘Region2’ (or ‘Region3’). In this way, up to four 320 × 240 images can be displayed in a 640 × 480 window.

Fig. 10. Timing Diagram of VGA

TABLE I
NEXYS-4 VGA TIMING

| Symbol | Vertical sync: Time | Vertical sync: Clocks | Vertical sync: Lines | Horiz. sync: Time | Horiz. sync: Clocks |
|---|---|---|---|---|---|
| T_s | 16.7 ms | 416,800 | 521 | 32 µs | 800 |
| T_disp | 15.36 ms | 384,000 | 480 | 25.6 µs | 640 |
| T_pw | 64 µs | 1,600 | 2 | 3.84 µs | 96 |
| T_fp | 320 µs | 8,000 | 10 | 640 ns | 16 |
| T_bp | 928 µs | 23,200 | 29 | 1.92 µs | 48 |

The time of each slot is given in the second and fifth columns of Table I [12]. Using a 25 MHz clock, the time slots are realized by two counters: one counts the lines listed in the fourth column, and the other counts the pixel clocks listed in the sixth column. Summing all vertical time slots gives T_fp + T_pw + T_disp + T_bp = 16.7 ms, so the image refresh rate is 1/16.7 ms ≈ 60 Hz.
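The numbers in Table I can be tied together with two counters in a short model. The Python sketch below recomputes the frame period and refresh rate from the 25 MHz clock and derives sync and display-enable flags; it assumes each counter starts at the first visible pixel or line, followed by the front porch, pulse, and back porch, and it does not model sync polarity. The function name `timing` is hypothetical.

```python
# Table I values: clocks per line (horizontal) and lines per frame (vertical).
H_DISP, H_FP, H_PW, H_BP = 640, 16, 96, 48
V_DISP, V_FP, V_PW, V_BP = 480, 10, 2, 29
PIXEL_CLOCK_HZ = 25_000_000

H_TOTAL = H_DISP + H_FP + H_PW + H_BP   # 800 clocks per line
V_TOTAL = V_DISP + V_FP + V_PW + V_BP   # 521 lines per frame
CLOCKS_PER_FRAME = H_TOTAL * V_TOTAL    # 416,800 clocks
FRAME_TIME_S = CLOCKS_PER_FRAME / PIXEL_CLOCK_HZ
REFRESH_HZ = 1 / FRAME_TIME_S


def timing(h_count: int, v_count: int) -> tuple[bool, bool, bool]:
    """Return (in_hsync_pulse, in_vsync_pulse, display_enable) for counter values."""
    in_h_pulse = H_DISP + H_FP <= h_count < H_DISP + H_FP + H_PW
    in_v_pulse = V_DISP + V_FP <= v_count < V_DISP + V_FP + V_PW
    display = h_count < H_DISP and v_count < V_DISP
    return in_h_pulse, in_v_pulse, display


print(H_TOTAL, V_TOTAL, CLOCKS_PER_FRAME)                   # 800 521 416800
print(f"{FRAME_TIME_S * 1e3:.3f} ms, {REFRESH_HZ:.2f} Hz")  # 16.672 ms, 59.98 Hz
```

The exact refresh rate of this model is 59.98 Hz, which the text rounds to 60 Hz.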

D. Approximate Design

In [7], the power consumption of each approximate adder and multiplier was estimated. The results show that the higher the degree of approximation, the lower the power dissipation.

By integrating two approximate adders and two approximate multipliers with the exact designs, this work offers nine different implementations of the color-to-grayscale converter. As shown in Fig. 11, there are three multiplier options: the exact design (EX), approximate#1 (AP1), and approximate#2 (AP2). Likewise, three adder options are provided: EX, AP1, and AP2.
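To make the nine combinations concrete, the Python sketch below builds a behavioral model of Eq. (8) from the 2 × 2 multiplier cells and 1-bit adder cells of Section II. The bit widths (4-bit RGB444 channels, 8-bit weights, 8 × 8 multipliers built recursively from 2 × 2 blocks with exact recombination, 16-bit ripple-carry adders) are assumptions, not details reported in this paper, so the printed error figures describe this model rather than the FPGA designs. All function names are hypothetical.

```python
from itertools import product

KINDS = ("EX", "AP1", "AP2")


def mul2x2(a: int, b: int, kind: str) -> int:
    a1, a0, b1, b0 = (a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1
    o0 = a0 & b0
    o1 = (a1 & b0) | (a0 & b1) if kind == "AP1" else (a1 & b0) ^ (a0 & b1)  # Eq. (2) / (1b)
    o2 = a1 & b1 if kind == "AP2" else a1 & b1 & (1 - (a0 & b0))           # Eq. (3) / (1c)
    o3 = a1 & a0 & b1 & b0                                                   # Eq. (1d)
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0


def mul(x: int, y: int, width: int, kind: str) -> int:
    """W x W multiplier from four W/2 x W/2 blocks; recombination is exact."""
    if width == 2:
        return mul2x2(x, y, kind)
    half = width // 2
    mask = (1 << half) - 1
    xh, xl, yh, yl = x >> half, x & mask, y >> half, y & mask
    return ((mul(xh, yh, half, kind) << width)
            + ((mul(xh, yl, half, kind) + mul(xl, yh, half, kind)) << half)
            + mul(xl, yl, half, kind))


def full_adder(a: int, b: int, c: int, kind: str) -> tuple[int, int]:
    """(cout, sum) per Eqs. (4)-(6); the repeated a'c term of Eq. (6b) is idempotent."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    if kind == "EX":
        return (a & b) | (b & c) | (a & c), a ^ b ^ c
    if kind == "AP1":
        return a | (b & c), (na & b & nc) | (na & nb & c) | (a & b & c)
    return a, (na & c) | (b & c)


def add(x: int, y: int, width: int, kind: str) -> int:
    """Ripple-carry adder built from 1-bit cells of the chosen kind."""
    carry, total = 0, 0
    for i in range(width):
        carry, s = full_adder((x >> i) & 1, (y >> i) & 1, carry, kind)
        total |= s << i
    return total | (carry << width)


def grayscale(r: int, g: int, b: int, mul_kind: str, add_kind: str) -> int:
    pr, pg, pb = (mul(p, w, 8, mul_kind) for p, w in ((r, 76), (g, 150), (b, 30)))
    return add(add(pr, pg, 16, add_kind), pb, 16, add_kind) >> 8


def mean_abs_error(mul_kind: str, add_kind: str) -> float:
    pixels = list(product(range(16), repeat=3))
    err = sum(abs(grayscale(r, g, b, mul_kind, add_kind) - ((76 * r + 150 * g + 30 * b) >> 8))
              for r, g, b in pixels)
    return err / len(pixels)


for mk, ak in product(KINDS, repeat=2):
    print(f"{mk}-{ak}: mean |error| = {mean_abs_error(mk, ak):.3f} gray levels")
```

The labels follow the paper's naming (multiplier first, then adder), so the nine printed rows line up with the nine rows of Tables II and III.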

Finally, we combine the approximate converter designs with the image processing platform to demonstrate an application on the FPGA.

IV. EXPERIMENTAL RESULTS

In this section, the color-to-grayscale converter combined with the image processing platform is synthesized using Xilinx Vivado 2017.4, targeting the Nexys-4 FPGA. The performance is evaluated in terms of slice count and power consumption. Finally, the original images and the grayscale results of the different designs are demonstrated on a Nexys-4 FPGA.

Fig. 11. Designs of Color to Grayscale Converter

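TABLE II
RESOURCE COST ON FPGA

| Mul-Add | LUT | Register |
|---|---|---|
| EX-EX | 940 | 265 |
| EX-AP1 | 934 | 265 |
| EX-AP2 | 930 | 265 |
| AP1-EX | 939 | 337 |
| AP1-AP1 | 937 | 337 |
| AP1-AP2 | 933 | 337 |
| AP2-EX | 845 | 181 |
| AP2-AP1 | 840 | 181 |
| AP2-AP2 | 837 | 181 |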

A. Resource Cost

As shown in Table II, the resource cost is mainly determined by the slice counts of look-up tables (LUTs) and registers. For simplicity, the first column names each design by its multiplier approximation followed by its adder approximation. For example, ‘EX-EX’ denotes the converter built from the exact multiplier and adder, and ‘EX-AP1’ denotes the converter built from the exact multiplier and the approximate#1 adder.

Generally speaking, the slice count decreases as the approximation level increases, although the designs with approximate#1 multipliers use more registers (337) than those with exact multipliers (265). For example, the design with approximate#2 multipliers and approximate#2 adders, shown in the tenth row of the table, uses 837 LUTs and 181 registers. By contrast, the design with exact multipliers and adders, shown in the second row, uses 940 LUTs and 265 registers, which is 1.12× and 1.46× the respective counts of the AP2-AP2 design.
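The ratios quoted above can be recomputed directly from Table II with a few lines of Python:

```python
TABLE_II = {  # design: (LUTs, registers), copied from Table II
    "EX-EX": (940, 265), "EX-AP1": (934, 265), "EX-AP2": (930, 265),
    "AP1-EX": (939, 337), "AP1-AP1": (937, 337), "AP1-AP2": (933, 337),
    "AP2-EX": (845, 181), "AP2-AP1": (840, 181), "AP2-AP2": (837, 181),
}

ex_lut, ex_reg = TABLE_II["EX-EX"]
ap_lut, ap_reg = TABLE_II["AP2-AP2"]
print(f"LUT ratio {ex_lut / ap_lut:.2f}x, register ratio {ex_reg / ap_reg:.2f}x")
# LUT ratio 1.12x, register ratio 1.46x
print(min(TABLE_II, key=lambda d: TABLE_II[d][0]))  # AP2-AP2
```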

The slice counts of adjacent rows are similar because the synthesis results cover the entire system, including not only the approximate converter but also the image processing platform. The differences between the individual approximate designs were presented in prior work [7]; this paper focuses on demonstrating the approximate design combined with the open platform.

B. Power Consumption

Table III shows the power consumption of the different designs. The total power (TP) in the second column is the sum of the static power (SP) and the dynamic power (DP). In an FPGA design, the dynamic power is mainly composed of the power dissipation of the clock, signals, logic, BRAM, PLL, and I/Os, shown in the third through eighth columns. In other words, the number of slices and I/Os used by the design, together with the toggle rate of each node of the network on the FPGA, determines the dynamic power consumption [13], [14].

TABLE III
POWER CONSUMPTION ON FPGA

| Mul-Add | TP (mW) | DP: Clk (mW) | DP: Sig (mW) | DP: Logic (mW) | DP: RAM (mW) | DP: PLL (mW) | DP: I/O (mW) | SP (mW) |
|---|---|---|---|---|---|---|---|---|
| EX-EX | 218 | 2 | 4 | 1 | 8 | 97 | 4 | 102 |
| EX-AP1 | 217 | 2 | 4 | 1 | 8 | 97 | 4 | 101 |
| EX-AP2 | 215 | 2 | 4 | 1 | 7 | 97 | 4 | 100 |
| AP1-EX | 220 | 2 | 4 | 1 | 9 | 97 | 4 | 103 |
| AP1-AP1 | 218 | 2 | 4 | 1 | 9 | 97 | 4 | 101 |
| AP1-AP2 | 218 | 2 | 4 | 1 | 9 | 97 | 3 | 102 |
| AP2-EX | 203 | 1 | 1 | 1 | 1 | 97 | 4 | 98 |
| AP2-AP1 | 203 | 1 | 1 | 1 | 1 | 97 | 4 | 98 |
| AP2-AP2 | 203 | 1 | 1 | 1 | 1 | 97 | 4 | 98 |

Fig. 12. FPGA Approximate Demonstration

As shown in Table III, the power follows a trend similar to that of the resource cost: the higher the level of approximation, the larger the power savings, although the designs with approximate#1 multipliers consume about the same as, or slightly more than, the exact design (up to 220 mW). For example, the total power consumption of the combination of approximate#2 multipliers with approximate#2 adders, shown in the tenth row, is 93.12% of that of the exact design shown in the second row.
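Table III can be checked the same way: in this Python sketch every row satisfies TP = DP + SP, and the 93.12% figure follows from the EX-EX and AP2-AP2 rows.

```python
# Table III, in mW: TP, dynamic components (Clk, Sig, Logic, RAM, PLL, I/O), SP
TABLE_III = {
    "EX-EX":   (218, (2, 4, 1, 8, 97, 4), 102),
    "EX-AP1":  (217, (2, 4, 1, 8, 97, 4), 101),
    "EX-AP2":  (215, (2, 4, 1, 7, 97, 4), 100),
    "AP1-EX":  (220, (2, 4, 1, 9, 97, 4), 103),
    "AP1-AP1": (218, (2, 4, 1, 9, 97, 4), 101),
    "AP1-AP2": (218, (2, 4, 1, 9, 97, 3), 102),
    "AP2-EX":  (203, (1, 1, 1, 1, 97, 4), 98),
    "AP2-AP1": (203, (1, 1, 1, 1, 97, 4), 98),
    "AP2-AP2": (203, (1, 1, 1, 1, 97, 4), 98),
}

for design, (tp, dp, sp) in TABLE_III.items():
    assert tp == sum(dp) + sp, design  # TP = DP + SP holds for every row

ratio = TABLE_III["AP2-AP2"][0] / TABLE_III["EX-EX"][0]
print(f"AP2-AP2 uses {ratio:.2%} of the EX-EX total power")  # 93.12%
```

The PLL alone accounts for 97 mW in every row, which illustrates why the platform dominates the total and the differences between adjacent rows stay small.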

Again, the differences in power between adjacent rows are minor because the image processing platform contributes a large proportion of the total power consumption.

C. FPGA Prototype

Finally, the application is demonstrated on a Xilinx Nexys-4 FPGA, as shown in Fig. 12. ‘Region0’ displays the original color image, and ‘Region1’, ‘Region2’, and ‘Region3’ show the results of different approximate designs. Three monitors are needed to display all nine grayscale images at the same time.

The grayscale images from the different designs differ only slightly. Depending on the quality requirements of a specific application, our work thus provides a wide range of design options.

As a case study, this demonstration is based on a simple converter built from three multipliers and two adders. In a larger system, the proposed approach has great potential to reduce slice count and power dissipation by using approximate submodules such as adders and multipliers.

V. CONCLUSION AND FUTURE WORK

This paper comes from a thesis study combining two prior works: an open image/video processing platform on the Nexys-4 FPGA, and an approximate design library including two approximate adders and two approximate multipliers. By demonstrating color-to-grayscale conversion on an FPGA, it provides nine design options corresponding to different quality bounds.

Future work will explore the tradeoff between quality and energy cost and specify the quality constraints for different implementations of the approximate design.

REFERENCES

[1] S. Che, J. Li, J. W. Sheaffer, et al., “Accelerating Compute-Intensive Applications with GPUs and FPGAs,” 2008 Symposium on Application Specific Processors, Anaheim, CA, pp. 101-107, 2008.

[2] A. M. Caulfield, et al., “A cloud-scale acceleration architecture,” 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, pp. 1-13, 2016.

[3] D. Firestone, A. Putnam, S. Mundkur, et al., “Azure Accelerated Networking: SmartNICs in the Public Cloud,” Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2018), pp. 51-64, 2018.

[4] W. Shi, et al., “Edge Computing: Vision and Challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637-646, Oct. 2016.

[5] X. Yang, et al., “A Vision of Fog Systems with Integrating FPGAs and BLE Mesh Network,” Journal of Communications (JC), vol. 14, no. 3, pp. 210-215, March 2019.

[6] X. Yang, Y. Zhang, L. Wu, “A Scalable Image/Video Processing Platform with Open Source Design and Verification Environment,” 20th International Symposium on Quality Electronic Design (ISQED 2019), in press, March 2019.

[7] Y. Zhang, X. Yang, L. Wu, et al., “Exploring Slice-Energy Saving on An Video Processing FPGA Platform with Approximate Computing,” Intl. Conf. on Algorithms, Computing and Systems (ICACS 2018), pp. 138-143, July 2018.

[8] Y. Zhang, X. Yang, L. Wu, et al., “Hierarchical Synthesis of Approximate Multiplier Design for Field-Programmable Gate Arrays (FPGA)-CSRmesh System,” Intl. Journal of Comput. Applications (IJCA), vol. 180, no. 17, pp. 1-7, Feb. 2018.

[9] M. Field, “Zedboard OV7670,” http://hamsterworks.co.nz/mediawiki/index.php/Zedboard_OV7670

[10] C. Ababei, et al., “Open source digital camera on field programmable gate arrays,” Intl. Journal of Handheld Computing Research (IJHCR), vol. 7, no. 4, pp. 30-40, 2016.

[11] C. Ababei, et al., “Open source digital camera on field programmable gate arrays,” IEEE Intl. Conf. on Electro Information Technology (EIT), Grand Forks, ND, May 2016.

[12] “Nexys 4 FPGA Board Reference Manual,” Rev. B, Digilent, Sunnyvale, CA, USA, April 2016.

[13] X. Yang, et al., “A Novel Bus Transfer Mode: Block Transfer and A Performance Evaluation Methodology,” Elsevier Integration, the VLSI Journal, vol. 53, pp. 23-33, Jan. 2016.

[14] X. Yang and J. Andrian, “A High Performance On-Chip Bus (MSBUS) Design and Verification,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 7, pp. 1350-1354, Sept. 2015.

[15] OmniVision, “OV7670/OV7171 CMOS VGA (640x480) CAMERA CHIP with OmniPixel Technology,” Version 1.01, July 2005.