

29th January 2010

COMP20592: Laboratory Project Manual

integrated Image Manipulation Platform

School of Computer Science
University of Manchester

January 2010

[Cover figure: the iIMP hardware components - camera (PAL) feeding the Video Decoder (VDEC1), which outputs YCbCr to the FPGA; SRAM and screen (RGB) attached to the FPGA; ARM Microcontroller Board and PIC (driving the motors) connected via I2C]


Index

Chapter 1  Introduction
Chapter 2  Lab Organisation
Chapter 3  Image Processing Task
Chapter 4  Video Capture Tasks
Chapter 5  I2C Master Communication Task


COMP20592: Second Year CSE Project
integrated Image Manipulation Platform - iIMP

Chapter 1: Introduction

Project Aims
• To learn about the process of working on a large project by 'doing'
• To learn about how to integrate a system of many component parts together
• To design an integrated Image Manipulation Platform which can capture an image, process it so as to compute a camera position to (for example) track object movement or to centre the largest object.
• To strengthen practical hardware and software skills by giving experience of system design, communication protocols, operating systems, high level programming, simulation, testing, debugging and hardware implementation.
• To have exposure to industry standard CAD tools.
• To enhance transferable skills such as team working, time management and presentation abilities.

Learning Outcomes
• Intellectual Skills - developing a systematic approach to testing and debugging of both hardware and software.
• Knowledge and Understanding - understanding the specification, analysis, design, implementation and testing of embedded systems through the practical application of relevant theory and techniques.
• Practical Skills - familiarity with the use of microcontroller development and debugging tools, enhancing ability to write elegant, maintainable, efficient and significant amounts of C code, enhancing ability to specify a significant hardware system in the Verilog hardware description language, use of Cadence CAD tools for simulation and debugging hardware models, experience of the I2C communication protocol, experience of integrating system components.
• Transferable Skills and Personal Qualities - the development of basic project management and team-working skills, development of report writing and oral presentation skills.

Overview of the integrated Image Manipulation Platform
While most of the technical background can be imparted by lectures, when it comes to design there is no substitute for experience. So, as Albert Camus said, "You cannot create experience, you must undergo it". To enable students to be exposed to such an experience, a group project on the integrated Image Manipulation Platform shown in Figure 1.1 will be undertaken.

The envisaged operation of the hardware system in Figure 1.1 is now described. The analogue camera output passes to the Video Decoder Board which continuously outputs frames as digital information. These images are converted to the format needed by the screen and continuously written into the SRAM memory. Three frames are used by the system: one that's currently being written to, one that is being displayed on the screen, and the other is the next frame to be displayed. When the software on the ARM is ready to process a frame, it reads a frame from the SRAM via the FPGA (Field Programmable Gate Array); this frame is the frame currently being displayed. The image is then processed, e.g. to determine which parts of the image are background and which are objects.


The image processing software on the ARM determines some position that it requires the camera to be at in order to achieve some particular aim, e.g. to centre the camera on the largest object in the frame. This pixel position is then communicated to the PIC via the FPGA across an I2C bus. The PIC will then convert this into horizontal and vertical angular movements which are used to drive the motors so as to reposition the camera. The ARM, via the FPGA, continually polls a status register in the PIC across the I2C bus to detect when the motors have stopped moving. The cycle of activity can then repeat with the ARM software reading and processing the currently displayed frame.
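To make this cycle concrete, the following C-style sketch outlines the control loop the ARM software might follow. It is illustrative only: read_frame() is the provided framestore access function described in chapter 3, but process_frame(), i2c_write_position() and i2c_read_status() are hypothetical placeholders for code the group itself will write as part of the image processing and I2C master tasks.

    /* Illustrative sketch of the envisaged ARM control cycle (not provided code) */
    ushort local_frame[240][320];
    int x, y;

    while (1)
    {
        read_frame(local_frame);             /* read the currently displayed frame        */
        process_frame(local_frame, &x, &y);  /* image processing: compute camera position */
        i2c_write_position(x, y);            /* send the required position to the PIC     */
        while (i2c_read_status() != MOTORS_STOPPED)
            ;                                /* poll the PIC status register via I2C      */
    }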

The hardware components shown in Figure 1.1 are provided, plus some of the synthesisable Verilog required for the configurable hardware in the FPGA. No software is provided to process the image, communicate with the PIC or drive the motors. The group task is to implement the hardware and software required in order to operate the iIMP as described above. This can be partitioned into the following six distinct tasks, for which an overview is now briefly given:

[Figure shows: camera (PAL) feeding the Video Decoder (VDEC1), which outputs YCbCr to the FPGA; SRAM and monitor (RGB) attached to the FPGA; ARM Microcontroller Board and PIC (driving the motors) connected via the I2C bus. (Version 2)]

Figure 1.1: Hardware Components For iIMP Group Project


1. Image Processing
The aim of this task is to read an image held in the framestore, analyse the image so as to distinguish the image background from objects and then locate the largest object, which enables a camera position to be computed to centre the largest object. This is a software task with the algorithms coded in C. Further details of this task can be found in chapter 3.

Video Capture
Frame data from the Video Decoder board is captured in the framestore (the SRAM). This has to be streamed via the FPGA. The hardware required in the FPGA is written in synthesisable behavioural Verilog. The work required for image capture is large and is therefore divided into two tasks.

2. Video Grabbing
The aim of this task is to accept video data (576 x 720 pixels) as would be output from the Video Decoder board, synchronise it to the FPGA clock, word align it, and extract the pixels required for the screen display (480 x 640 pixels). The resulting image, which is encoded using a colour space format known as YCbCr (describing the intensity and colour difference of pixels), has to be converted to the RGB (red, green, blue) format required by the screen. Colour conversion is a two-stage process with the table look-up part included in the Video Grabbing task. The outputs from the look-ups are then passed to the Image Storing task.

3. Image Storing
The aim of this task is to complete the conversion of the image to RGB format, generate write requests and frame addresses to store the RGB image in the framestore, and translate the frame address into an actual memory address. Further details of the video grabbing and image storing tasks are in chapter 4.

Communication
This is concerned with the I2C communication required between the ARM, PIC and Video Decoder board; the ARM acts as a master device while the PIC and Video Decoder board are slaves. The Video Decoder board has an I2C interface, as does the PIC. However, the ARM does not have an appropriate I2C interface, so the ARM interface is placed in the FPGA and an open source I2C master in synthesisable Verilog is provided to you. This task has two identifiable components: software for the master processor and software for the slave processor.

4. Master I2C Communication
The aim of this task is to design, implement and test software in the C programming language which will run on the ARM, enabling the I2C master interface to initialise the Video Decoder board and to send write and read requests to the PIC according to the agreed communication protocol. Further details on this task are given in chapter 5.

5. Slave I2C Communication
The aim of this task is to write software in the C programming language which will enable the PIC to read write requests sent by the master and to respond to read requests from the master for status information according to the agreed communication protocol. Further details of this task will be given out by the School of EEE.

6. Motor Control
This comprises software written in the C programming language to drive the motors so that the requested position originating from the ARM translates into the required motor and hence camera movement. The x-axis is controlled by a stepper motor while the y-axis is driven by a brushed DC motor. Further details of this task will be given out by the School of EEE.



System Integration
Following the completion of the individual tasks, these will be combined to form a complete, operational system that can be demonstrated. More details on how to run the integrated system will be given in chapter 6, which will be handed out at a later date.


Chapter 2: Lab Organisation

Length:
The lab is worth 20 credits and so equates to 200 hours of effort per person over semester 2. Of this, 101 hours are currently scheduled. Thus preparation outside this time for presentations, report writing and practical work is necessary and expected. The scheduled hours allocated are as follows:

Talks: 14   Seminars: 2   Demonstrations: 3   Lab: 82   Private Study: 99   Total: 200

The seminars and demonstrations will be assessed. In addition, during the lab time, regular design reviews will be undertaken with a member of the academic staff. The talks are designed to both set the context of the project and to give all students an understanding of the problems involved in the tasks being undertaken. All students are expected to attend these.

Location:
In Computer Science, scheduled lab time is in the LF9 Engineering Lab on the lower first floor (Tootill Lab 0). Outside the scheduled hours you are also welcome to use this lab provided there are machines available. The scheduled lab hours are 13:00 to 17:00 on Mondays, and 13:00 to 15:00 on Tuesdays and Fridays. In addition to the lab, EEE has scheduled a lecture slot at 10:00 on Tuesdays up at North Campus.

The Project:
The project is to use the provided hardware platforms to implement a system for capturing an image, processing it and then using this information to move the camera before the sequence repeats. There are a number of individual tasks involved in realising this aim and these then need to be combined to provide the overall system. Thus this is a group project, with groups corresponding to their tutorial grouping in Computer Science.

Laboratory Notebooks:
You are expected and strongly recommended to use these for recording the group's and your own part of the project as it progresses.

Equipment Security:
In Computer Science, the Engineering Lab has open access. Students are therefore responsible for their group's iIMP equipment and this must be secured away when no member of the group is working on it. A shelf in a locked cupboard will be allocated to each group for this purpose.

Schedule:
This is an incremental project, so an extension system is not operated. Nevertheless, there's a lot of work in this project and so you will need to be disciplined in adhering to the scheduled deadlines shown if you hope to complete the project and not let other members of your group down. The proposed week by week schedule and assessment milestones are shown in Table 2.1.


Note: G refers to a group assessment mark while I is an individual assessment mark.

Table 2.1: COMP20592 Schedule

Week 1:  Talks: Intro to system (1), Video systems & video capture (3), I2C communications protocol (1), Motor operation (1), Requirements analysis and system design (2). Practical: read lab manual, partition tasks among group.
Week 2:  Talks: I2C slave operation (1), Image processing (2), software and hardware development environments (2). Practical: high level design of task.
Week 3:  Talk: Motor control (1). Practical: high level design of task. Seminar: high level design of task. [I seminar 10%]
Week 4:  Practical: low level implementation and testing. Report: on high level design task. [I report 15%]
Week 5:  Practical: low level implementation and testing.
Week 6:  Practical: low level implementation and testing. Demonstration of progress to date. [I demo 15%]
Week 7:  Practical: low level implementation and testing.
Week 8:  Practical: low level implementation and testing.
Week 9:  Demonstration: completion of task including its testing. Report: on low level design. Practical: system integration. [I demo 15%, I report 20%]
Week 10: Practical: system integration.
Week 11: Practical: system integration.
Week 12: System demonstration. Group seminar. Group report: overall system design, testing, evaluation. Project debrief. [G demo 7.5%, G seminar 7.5%, G report 10%]

Talks:
All students are expected to attend all the talks so that you each have an awareness and understanding of the system tasks. Most talks are in weeks 1 and 2, usually during the scheduled lab time. The schedule is shown in Table 2.2.


A talk on System Integration is also likely to be required and this will be scheduled at an appropriate time later in the semester.

Assessment:
There is no examination at the end of the course. Instead, assessment will be continuous throughout the project and based on a mixture of laboratory demonstrations, presentations and written reports. Most of the marks awarded will be an individual mark, while the integration carries a group mark.

Details of the assessed work and the criteria for assessing the various project components are given below:

SEMINARS
a) Seminar Week 3: Individual mark worth 10% of final project mark.

This talk on the high level design of your task should be a general talk which does not assume specialist knowledge of the audience. It should cover the task problem, background and a general method of solution. Talks will be 10 minutes, followed by a 5 minute question session.

Table 2.2: Talks Schedule

date           time   topic                                   building   room  lecturer
Mon Feb 1st    13:00  introduction to project                 Kilburn    LF17  lemb
Mon Feb 1st    14:00  video systems & image capture           Kilburn    LF17  jdg
Tues Feb 2nd   10:00  motor operation                         Sackville  B16   daf
Tues Feb 2nd   13:00  video systems & image capture           Kilburn    LF17  jdg
Tues Feb 2nd   14:00  video systems & image capture           Kilburn    LF17  jdg
Thu Feb 4th    14:00  requirements analysis & system design   Renold     C16   png
Thu Feb 4th    15:00  requirements analysis & system design   Renold     C16   png
Fri Feb 5th    13:00  I2C communication protocol              Kilburn    LF17  lemb
Mon Feb 8th    13:00  image processing                        Kilburn    LF17  lemb
Mon Feb 8th    14:00  image processing                        Kilburn    LF17  lemb
Tues Feb 9th   10:00  I2C slave operation                     Sackville  B16   png
Tues Feb 9th   13:00  development environments                Kilburn    LF17  jdg
Tues Feb 9th   14:00  development environments                Kilburn    LF17  jdg
Tues Feb 16th  10:00  motor control                           Renold     E5    daf


b) Seminar Week 12: Group mark worth 7.5% of final project mark.
This should summarise the group's work done on the project, describe the current state and discuss the lessons learnt! Groups will be allocated a 30 minute slot which includes 10 minutes for questions.

Seminars will be judged on the following factors and weighted as shown:
Organization & content (clarity, relevance, flow, balance)    40%
Oral communication (clarity, loudness, presence)              20%
Visuals (clarity, tidiness, suitability)                      20%
Technical impression (response to questions, understanding)   20%

on a scale where:
90% Outstanding first class    40% Bottom 3rd
80% Good first class           30% Bottom Pass
70% Borderline First           20% Fail
60% Bottom 2-1                 10% Bad Fail
50% Bottom 2-2                  0% Not even that

REPORTS
a) Report Week 4: Individual mark worth 15% of final project mark.
This describes the high level design of your task and should comprise an introduction to the task, your understanding of what you need to do and a description of the stages involved in completing the task, with timescales and some thoughts on testing and evaluating the work.

b) Report Week 9: Individual mark worth 20% of final project mark.
This should concentrate on the implementation work that you've done on your task and the testing that has been performed.

c) Report Week 12: Group mark worth 10% of final project mark.
This briefly summarises the final state of the individual project components but concentrates on the integration work performed and the testing of the system that has been undertaken. The report should also assess any failings in the final system and reflect on the lessons learnt from the work undertaken.

Reports should comprise the following sections, which are weighted as shown:
Introduction             15%
Design                   20%
Implementation           20%
Testing/Results          10%
Evaluation               10%
Reflection/Conclusions   10%
Report Presentation      15%

The scale used is the same as that given for the seminars.

DEMONSTRATIONS
a) Demo Week 6: Individual mark worth 15% of the final project mark.
This work-in-progress demonstration acts as a check that you are making satisfactory progress with your task.

b) Demo Week 9: Individual mark worth 15% of final project mark.
This demo is for you to communicate the achievements of implementing and testing your task.


c) Demo Week 12: Group mark worth 7.5% of final project mark.
This demo is for the group to communicate their achievements in implementing the system by integrating its individual components.

Individual demos will be 15 minutes, with 10 minutes for the demonstration and 5 minutes for questions. The group demo will be 30 minutes, with 15 minutes for the demonstration and 15 minutes for questions. The marks awarded will be based on the impression of the technical content of the work, and on the ability to organize and communicate that content in a comprehensible way.

Demonstrations will be judged on the following factors and weighted as shown:
Clearness of the Context    20%
Appropriate Content         20%
Demo Worked Well            20%
Lot of Work Done            20%
High Technical Difficulty   20%

Again the scale used is that given for the seminars.


Chapter 3: Image Processing Task

3.1 Task Summary
This task comprises writing software in C which runs under a software emulation environment where the framestore holding the image is attached to a virtual screen. This enables the effect of each operation below to be checked for correctness, by writing the image processed by the ARM back to the framestore after each operation so that the effect is displayed.

A summary of the stages in this task is:
• read a full size image from the framestore and scale it to one quarter size
• process the (scaled) image so as to convert it to a black and white image by performing operations such as averaging and thresholding
• prepare the black and white image for object detection by clipping and noise removal
• locate each object within the image by constructing a dynamically adjustable bounding box and measure the object area
• select the largest object and compute the camera position to place the centre of this object in the centre of the image
• the developed software should be tested out on a variety of images

3.2 Introduction
This chapter describes possible approaches that you might adopt in processing colour images held in the frame store. The essential task is to locate the position of object(s) within an image. This information can then be used to perform some specific operation on the object(s), for example computing a vertical/horizontal camera position so that the largest/smallest object appears at the centre of the camera image. A more advanced application might compute camera positions so that a (slowly) moving object always appeared in the centre of the camera image.

It is surprisingly difficult to locate an object within an image because you need to decide what is object and what is background. Often, particularly at the edges of an object, edges tend to merge into the background rather than being very distinct. In addition, the perimeter of an object may appear somewhat jagged. To this imperfect picture, there is varying noise added to or subtracted from the genuine image (often due to differing light reflections). For these reasons, the student assigned to the image processing task needs to be flexible in their approach and be prepared to experiment.

3.3 Neighbourhood Set
Various operations, such as averaging or pixel noise removal, can be performed using a relatively simple set of operations which just concern the pixel and its eight neighbour pixels to determine the pixel's new value.


The neighbourhood set for the pixel on row/line j at position/column i is shown in Figure 3.1.

Such operations can be considered local as they can be computed in a single pass through the array (frame). An excellent description of useful operations that can be performed on an image, with examples of their effects, can be found within the Hypermedia Image Processing Reference (HIPR) website developed by the Department of Artificial Intelligence at the University of Edinburgh. It can be found at: http://homepages.inf.ed.ac.uk/rbf/HIPR2/index.htm. You are strongly advised to look at this material and in particular the digital filter and thresholding operations. Don't be afraid to experiment with modifying the algorithms given to get the effects you need!

3.4 Pixel Representation
A pixel occupies two bytes of storage describing the red, green and blue components of the pixel, as shown in Figure 3.2. The red, green and blue components each have a 5 bit representation, leaving one unused don't-care bit in the 16 bits. This complicates the processing as the image processing operations need to be applied to all three colour components. With a black and white image it is much easier to differentiate between objects and background. Black is when the red, blue and green components are all 0, making the pixel content 0x0000. With black representing the image background, white represents the objects. White is obtained when the red, blue and green components are a maximum (all '1's) and so can be represented by 0x7FFF (or 0xFFFF).

So at some point in the image processing procedure, you will find it convenient to convert your coloured image to a black and white representation for the final processing stages.
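Since every processing step has to pick the 5-bit components out of this packed format and reassemble them afterwards, it may be worth defining the bit manipulation once. The following macros are a sketch based on the layout of Figure 3.2; the names are our own and not part of the provided library.

    /* Sketch: extract and repack the 5-bit colour components of Figure 3.2 */
    #define RED(p)        (((p) >> 10) & 0x1F)              /* bits 14-10       */
    #define GREEN(p)      (((p) >> 5) & 0x1F)               /* bits 9-5         */
    #define BLUE(p)       ((p) & 0x1F)                      /* bits 4-0         */
    #define PIXEL(r,g,b)  (((r) << 10) | ((g) << 5) | (b))  /* bit 15 left as 0 */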

3.5 Bounding Boxes

[j-1][i-1] [j-1][i] [j-1][i+1]

[j][i-1] [j][i] [j][i+1]

[j+1][i-1] [j+1][i] [j+1][i+1]

Figure 3.1: A Pixel and its Neighbourhood

bit 15: x (don't care) | bits 14-10: red | bits 9-5: green | bits 4-0: blue

Figure 3.2: Pixel Organisation Across Two Bytes


Figure 3.3 illustrates an image with a single object where you want to locate the object and compute a row and column shift so that the object is in the centre of the image. The most sensible approach is to construct a bounding box around the object which marks the start and end, vertically and horizontally, of pixels which are not background pixels. The area can then be computed from summing the object pixels within the bounding box. Assuming that the centre of the object is the mid-point of the start and end measurements, the required shifts dx and dy to centre the object can then be computed as:

dx = width / 2 - (end_x + start_x) / 2
dy = height / 2 - (end_y + start_y) / 2

where end_x, start_x, end_y and start_y mark the columns and rows of the bounding box, and width and height are the image width and height (in pixels).
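For example, on a 160 by 120 quarter-size image (width = 160, height = 120), an object whose bounding box spans columns 40 to 60 and rows 20 to 80 gives dx = 160/2 - (60 + 40)/2 = 30 and dy = 120/2 - (80 + 20)/2 = 10, i.e. a shift of 30 columns and 10 rows is needed to centre the object.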

Extending this to multiple objects is illustrated in Figure 3.4. Now it is necessary to construct a bounding box around each object, compute the area of each, select the bounding box of the largest object and then compute the dx and dy shift required for this object.

Figure 3.3: Example of Bounding Box for Single Object

Figure 3.4: Example of Multiple Object Detection


You will note that in Figure 3.4 the bounding box for each object is distinct. If bounding boxes overlap then the objects will be detected as a single object, as the objects can't be separated. This will lead to incorrect processing of the image. Therefore it is recommended that an object's bounding box be separated by at least two pixels in all directions from its nearest neighbour object's bounding box.

3.6 Displaying Images
The image displayed on the virtual screen or hardware monitor relates to the contents held in a particular area of framestore memory. The screen/monitor display is 480 lines by 640 pixels. However, because the actual image stored in framestore memory is reduced to 240 lines by 320 pixels, each image pixel is duplicated both horizontally and vertically for display purposes so as to occupy the whole screen. Note that framestore memory requirements have been reduced at the price of decreased image resolution.

Since a pixel occupies two bytes of storage, an image of 240 lines by 320 pixels requires 150K bytes of memory. To process the image held in RAM memory, the image/frame needs to be transferred into local memory. Two environments are used in the image processing task: a software emulation environment and the iIMP hardware.

In the emulation environment, the local memory is a two-dimensional array on your workstation, while on the actual hardware it is the local memory of the ARM on the ARM processor board. To display a processed image in local memory, it has to be written back to the RAM forming the framestore memory. C functions to read a frame (from framestore memory into local memory) and to write a frame (from the local memory to the framestore memory) are provided to assist with this transfer of 150K bytes of image data. You will first develop, test and debug your image processing software in the emulation environment; when functionally correct, the software should transfer and run in an identical way on the ARM processor board (although the compilation process to get object code will be different). A simple view of the development environment and hardware platform is shown in Figure 3.5.

To aid with the processing of the image at various stages, and to have the ability to view the effect of the processing operations, it is helpful to further scale the image by a factor of 4. This enables the image to be reduced to 120 lines by 160 pixels, i.e. 37.5K bytes of data, and since this only occupies a quarter of the screen, another three stages of the processing can be displayed in the other three quadrants. Using an array frame[240][320] to hold the four quadrants, pixel [j][i], where 0 ≤ j ≤ 119 and 0 ≤ i ≤ 159, in the four quadrants is accessed as shown in Figure 3.6.

Workstation/ARM processor with local memory  <->  RAM (Framestore)  ->  virtual screen/monitor

Figure 3.5: Image Processing Development


Remember that to display the array you need to write the frame array (i.e. all 4 quadrants) in local memory back to the framestore memory.

frame[j][i]          frame[j][i+160]
frame[j+120][i]      frame[j+120][i+160]

Figure 3.6: Partitioning the Frame Array so as to Display 4 Images

Assuming an original image of 240 lines by 320 pixels, scaling of the image into the top left quadrant can be performed with the following C code, which also illustrates the use of the read_frame and write_frame functions:

    #define SCALE 2                 /* defines scale factor of 2 */

    ushort local_frame[240][320];   /* local store declaration */
    uint x, y;

    read_frame(local_frame);        /* reads from framestore into local memory */
    for (y = 0; y < 240; y = y + SCALE)
        for (x = 0; x < 320; x = x + SCALE)
        {
            local_frame[y>>1][x>>1] = local_frame[y][x];   /* >>1 divides by SCALE */
        }
    write_frame(local_frame);       /* writes local memory to framestore */

3.7 Processing an Image in Local Memory
This section details some of the processing steps you could perform on an image in a framestore. It is by no means the only approach you could adopt.

As previously stated, a full image is 240 lines by 320 pixels and occupies 150K bytes of a processor's local memory. As the ARM's local memory is only 159K bytes, you need to write concise C programs and be economical in your use of variables, otherwise you may run out of space. Furthermore, the lack of space prevents the use of the usual C library of mathematical functions, so the use of multiplication and division should be avoided and addition/subtraction and shifting used instead.

3.7.1 Mean filter
Having read the image from the framestore memory into the local memory and scaled it down so as to occupy the top left quadrant, applying a simple low pass filter operation will smooth out spikes that appear on the edges. A mean filter takes each pixel in the reduced image and uses it and the neighbourhood set depicted in Figure 3.1 to compute an average intensity for the nine pixels. This of course needs to be done for each colour component, to obtain a new blue, green and red component intensity for the pixel.


This can then be written to another quadrant to see the effect on the image. Remember that division should be performed as shifting and/or subtraction. So for example, the red component of the smoothed image might be found using:

    /* simplistic algorithm for red component - ignores pixels round the edge */
    for (j = 1; j < 119; j++)
        for (i = 1; i < 159; i++)
        {
            intensityr = 0;
            for (dy = -1; dy < 2; dy++)
                for (dx = -1; dx < 2; dx++)
                    intensityr += (local_frame[j+dy][i+dx]>>10) & 0x001F;
            intensityr = intensityr>>3;   /* divide by 8 - easier to do than divide by 9 */
            if (intensityr > 0x001F) intensityr = 0x001F;
            :                             /* similarly for intensityg and intensityb */
            local_frame[j][i+160] = intensityr<<10 | intensityg<<5 | intensityb;
        }

3.7.2 Threshold
After filtering, it is probably useful to threshold the image in order to convert it to a black and white image. Assuming a white background for your object(s), this has a high intensity for all three colour components, suggesting the use of thresholding of the form:

    if (intensityr < threshr || intensityg < threshg || intensityb < threshb)
        local_frame[j+120][i] = 0x7FFF;   /* object: white */
    else
        local_frame[j+120][i] = 0;        /* background: black */

to make the background black with objects indicated by solid white. The choice of threshold values for the red, green and blue components is critical to the 'sensible' conversion of your object(s) to a black and white image. Furthermore, the image is likely to be quite sensitive to the threshold value. So for example, if light conditions on the image are altered, the threshold values may well require adjustment. With maximum intensity values of 31, threshold values around 16 for each colour are probably a good starting point.

3.7.3 Noise Removal
You will probably have some boundary noise surrounding the black and white image of interest and this can be removed by suitably clipping the picture. You will probably need to experiment in order to remove those edge parts of the image you don't want. In addition, removing pixels which are mainly surrounded by black pixels will remove many pixels which have been generated by noise.


Again, experimentation is required but the general form is:

    /* y1 and y2 are the row clipping values; x1 and x2 are the column clipping values */
    for (j = 1; j < 119; j++)
        for (i = 1; i < 159; i++)
        {
            if (j<y1 || j>y2 || i<x1 || i>x2)
                local_frame[j+120][i+160] = 0;   /* outside clip region: black */
            else
            {
                sum = 0;
                for (dy = -1; dy < 2; dy++)
                    for (dx = -1; dx < 2; dx++)
                        if (local_frame[j+dy+120][i+dx] == 0x7FFF) sum += 1;
                if (sum >= n) local_frame[j+120][i+160] = 0x7FFF;   /* white */
                else local_frame[j+120][i+160] = 0;                 /* black */
                /* gets rid of pixels not surrounded by at least n white neighbours */
            }
        }

A value for n of 4 or 5 is a good starting point.

3.7.4 Object Location and Area Computation
This is the hardest stage and is iterative rather than the single pass operations used hitherto. The approach described proceeds on the basis that a distinct rectangular bounding box can be placed around the object that does not intersect with any other bounding box. The other assumption is that the object to be detected is a single solid colour, i.e. all pixels of the object are white. The algorithm begins with a small bounding box, say 4 by 4 pixels, which moves sequentially through the array until white pixels are detected within the box. At this point, it computes the start and finish column (xmin and xmax) and start and finish row (ymin and ymax) of white pixels in the box and then adjusts the bounding box size to be ymin-1 to ymax+1, xmin-1 to xmax+1. The process then repeats, locating the start and finish columns and rows in this new bounding box to compute a new bounding box etc. The process continues with the bounding box continually and gradually adjusting to the white pixels within it until the bounding box does not change over two iterations.

At this point, the bounding box is defined around the object and, using the bounding box location defined by its start and finish column and row values, the area can be computed by summing the white pixels within it. If this area is greater than the currently saved largest object then the bounding box for the latest detected object overwrites the currently saved object to become the currently saved largest object. Regardless of whether or not the latest detected object is the largest so far in the image, it is removed from the image using its bounding box column and row values to make all pixels within it black; this prevents the object from being detected again.

Having removed the detected object, a search for the next object in the image resumes with a 4 by 4 bounding box. In this way, all objects in the image should be detected and the largest area object and its location identified, enabling the pixel movement to centre the object in the image to be computed.


In order to see if this algorithm is working correctly, it is suggested that at the start of the object location and area computation stage you copy the image to another quadrant. You can then locate objects on the black and white image, remove them as you find them, identify the largest object present and compute the shift needed to centre this object. You can then apply this shift to the copied image, writing the shifted image back to another quadrant to check its correctness. When running on the hardware of course, it will be necessary to specify the camera position that is required rather than the pixel shift.

When you are satisfied that your algorithms are correctly identifying and shifting the largest object, the x and y position of the camera has to be computed in the final system. This will require that the hardware is calibrated to convert from pixel movements to camera positions; this can only be undertaken during system integration. More details about how to approach the calibration will be given out later. However, it may be helpful to know that the values to be passed to the PIC comprise an x and y camera position using a scale of 0.1 degree, e.g. a y value of +600 sent by the ARM specifies an upwards camera position of 60 degrees from the central horizontal position; a negative y indicates a downwards position. A positive x value indicates a clockwise horizontal camera position while a negative x is counterclockwise. The computed x and y camera position needs to be placed in variables and will be transferred via the I2C Master to the PIC by the student undertaking the I2C master communication task. It may also be helpful to know that a pixel movement of 5 translates to (approximately) 1 degree of camera movement. At the PIC end, the camera position requests are translated into motor movements (with, on the stepper side, 2/3 steps per degree of camera movement) to move the camera so as to centre the image.
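As an illustration of the units involved, converting a pixel shift into the 0.1-degree position values might be sketched as below. The 5-pixels-per-degree figure is only approximate and the real conversion comes from the calibration performed during integration; camera_x and camera_y here are hypothetical variables holding the current camera position in tenths of a degree.

    /* Sketch: 5 pixels ~ 1 degree, so 1 pixel ~ 2 tenths of a degree.
       The multiply by 2 is done with a shift, per the no-multiply rule. */
    x_position = camera_x + (dx << 1);   /* positive x: clockwise                    */
    y_position = camera_y + (dy << 1);   /* positive y: upwards, e.g. +600 = 60 deg  */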

3.7.5 Alternative Image Processing Approaches
A possible alternative approach to the filtering and thresholding operations described would be to compute an (approximate) average intensity across the image for each colour component; assuming that more area is covered by the background than by objects, this yields a background intensity. The image is then re-scanned with pixels close to the background being set to black and the others to white.

You may wish to consider object edge detection as an additional step after filtering and prior to thresholding. This uses a weighted neighbourhood set to compute a colour gradient at each point so as to accentuate intensity changes. You can find details of different edge detectors on the HIPR website. A Sobel edge detector is often used because it only requires multiplying pixel values by ±1 or ±2, which can be easily done by addition/subtraction and/or shifting. The Sobel weights commonly used on the neighbourhood set are shown in Figure 3.7.

These compute a vertical and horizontal gradient component for each pixel, with the final pixel gradient taken as:

G[j][i] = |Gx[j][i]| + |Gy[j][i]|

Gx:              Gy:
-1  0  +1        +1  +2  +1
-2  0  +2         0   0   0
-1  0  +1        -1  -2  -1

Figure 3.7: Weighted Neighbourhood Set Using Sobel Edge Detection



Thresholding to obtain a black and white image following edge detection will yield an image with the edges highlighted, but the inner parts of an object exhibit no gradient change and thus will appear black. In this case, locating an object will require finding an edge pixel and then following the outside boundary pixel by pixel to find the maximum and minimum row and column boundary for the object. Again, comparison with the largest saved object and removal from the image is required as before.
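For a single intensity component v[][], the Sobel gradient of Figure 3.7 can be written with only adds, subtracts and shifts, in line with the no-multiply rule. The fragment below is a sketch to adapt (v and g stand for whatever component and gradient arrays you use), not provided code:

    /* Sobel gradient at [j][i]: the *2 weights become shifts, |.| becomes a test */
    gx = (v[j-1][i+1] + (v[j][i+1] << 1) + v[j+1][i+1])
       - (v[j-1][i-1] + (v[j][i-1] << 1) + v[j+1][i-1]);
    gy = (v[j-1][i-1] + (v[j-1][i] << 1) + v[j-1][i+1])
       - (v[j+1][i-1] + (v[j+1][i] << 1) + v[j+1][i+1]);
    if (gx < 0) gx = -gx;
    if (gy < 0) gy = -gy;
    g[j][i] = gx + gy;                    /* G = |Gx| + |Gy| */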

3.8 Development
3.8.1 C Program
The initial development of the C program to process an image is in a software emulation environment which opens up the Komodo simulator and debugger and also starts the virtual screen. The C program is compiled into ARM assembly code which will run on Komodo, while the screen enables images to be loaded and the results of changes to the image, resulting from running the code, to be viewed.

In addition to the read_frame(local_frame) and write_frame(local_frame) functions provided, the other C functions provided are:

    freeze_frame();
    unfreeze_frame();
    write_LEDS((ushort)0x<four hex digits>);  // e.g. write_LEDS((ushort)0x0004) lights lamp 2.
                                              // One-hot decode on bits 5:0, drives 6 LEDs

The frame read from the framestore into local store is the frame that is being displayed on the screen. On the hardware platform, the frame displayed will change as frames are continually read from the camera and stored. Thus, on the hardware platform, freeze_frame() prevents the frame being displayed from moving on. However, in the emulation, since only one frame is displayed, freeze_frame() is used to invoke the display screen to request that an image is loaded. On the hardware platform, unfreeze_frame() has the effect of releasing the display frame allocation so that the area in store holding the display frame is free to change. In the emulation, this command results in no action, although you are advised to include it in your code so that it is not forgotten when the code transfers to the hardware platform!

The write_LEDS((ushort)0x<four hex digits>) function is a useful monitoring facility allowing a quick visual check on the progress of the program. In emulation, the LEDs referred to are the six "lights" bottom right on the display screen, with the least significant light rightmost. On the hardware platform, the LEDs refer to the six 'traffic light' LEDs numbered 1 (least significant) to 6 (most significant) mounted on the processor board.

The provided C functions are in a library called gcc_support/emul.h, which should be included at the start of your C program, as shown below:

    #include <stdio.h>
    #include "emul.h"

Note that no other include file must be used. This is because the ARM on the processor board doesn't have the underlying components needed to support such functions, i.e. you can't use functions like printf(), malloc(), free() etc. Furthermore, if you need to multiply or divide, you should use shift left/right with addition/subtraction to achieve this.
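For example, a multiplication or division by a constant can be decomposed into shifts and adds:

    y = (x << 3) + x;      /* y = x * 9  (x*8 + x)                    */
    y = (x << 2) + x;      /* y = x * 5  (x*4 + x)                    */
    y = x >> 3;            /* y = x / 8  (as used in the mean filter) */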


An example of the possible use of the provided functions in your C program is given below:

    #include <stdio.h>
    #include "emul.h"

    #define SCALE 2
    #define WHITE 0x7FFF
    #define BLACK 0x0000

    int main(argc, argv)
    int argc;
    char *argv[];
    {
        ushort local_frame[240][320];
        :
        unfreeze_frame();               /* Await new frame */
        while (TRUE)
        {
            freeze_frame();
            write_LEDS((ushort)0x0001);
            read_frame(local_frame);    /* Copy image into array */
            write_LEDS((ushort)0x0002);
            /* scale frame and process image as required */
            write_frame(local_frame);
            unfreeze_frame();
        }
        return(0);
    }

3.8.2 Compilation into Executable Code
Having written your C program, you now need to compile it into object code and then create a .elf file which can be run on Komodo. To compile, in a shell window perform the following operations:

1. From your COMP20592 directory, create a new directory for your work and attach to it by typing:
   mkdir <dir name>
   cd <dir name>

2. Set a path which points to the ARM development tools by typing:
   PATH=$PATH:/home/cadtools5/gnuarm-3.4.3/bin

3. Type:
   arm-elf-gcc -c $COMP20592/gcc-support/init.s -o init.o
   to create an object file init.o from the ARM assembler code file init.s, which is required to initialise the ARM state and its memory space.


4. Compile the library of image processing functions (read_frame etc.) emul.s, written in ARM assembler, into an object file emul.o by typing:
   arm-elf-gcc -c $COMP20592/gcc_support/emul.s -o emul.o
   -o specifies that the output of this command is emul.o
   -c specifies compile but don't link

5. To compile your C program <name of prog>.c into object code, type:
   arm-elf-gcc -I $COMP20592/gcc_support -c -O3 <name of prog>.c -o <name of prog>.o
   -I specifies where to search for other include files, while -O3 specifies an optimisation level

6. The different object code files now have to be linked by typing:
   arm-elf-ld -T $COMP20592/gcc-support/lscr init.o emul.o <name of prog>.o -o <name of prog>
   This produces a .elf file which Komodo can run, and the simulation environment can now be opened. -T says use a link script (lscr). The link script defines where the various pieces of object code are to be placed in memory. Note that all the object files need to be listed.

3.8.3 Emulation Environment
1. In a terminal window, start Komodo and the screen by typing:
   new_start_komodo -v 20592 &
   -v opens up the screen as well as Komodo.

2. In the Komodo window, use the browse button for Load to select your .elf file. (You'll probably have to double click on the directory button in the Select Source File window to display all the files in your COMP20592 directory.) Press OK to load it into the Load line of the Komodo window.

3. Press Load in the Komodo window to load the elf code into the emulator.

4. In the Komodo window, press Reset and Run.

5. The program stops running when it reaches freeze_frame(), with the screen requesting that an image be loaded. On the screen, select Load Image top left. In the Load Image window which comes up, select a bitmap file (extension .bmp) and press Open to bring it up on the screen. You may find multi-circle.bmp in the COMP20592 directory a useful image to try out.

6. The program then executes and, if structured as above, will return to the start of the while(TRUE) statement and freeze, requesting that an image be loaded etc.

7. Exit from the emulation environment by pressing Quit on the screen.

You should follow the guidelines above until your program executes successfully under emulation. On the hardware platform, the procedure to run your program on the ARM board is more involved, as features of the board need to be taken into account. Instructions on how to do this, and how to approach integrating your C code with that for the I2C master communication, will be handed out later.


Chapter 4: Video Capture Task

4.1 Tasks Summary
Video Capture takes the digital YCbCr output image from the Video Decoder board and forms a request to memory to store the image data in the RGB format required by the iIMP. This requires that hardware be designed to perform this, and the students assigned to this will do their design as synthesisable behavioural Verilog at the RTL (Register Transfer Level). This model will be simulated under the Cadence CAD environment. The interfaces for each stage of the video capture processing are defined, with test benches provided to test each stage. An incremental approach to testing must be adopted, with students using the provided test benches at each stage to verify correct behaviour before proceeding to test the next stage.

As previously stated, Video Capture is a large task and consequently has been divided in two, with Image Storing following on from Video Grabbing. A summary of the stages for each task is given below:

Video Grabbing
• capture the bytes sent by the Video Decoder board and assemble them into 32-bit packets
• synchronise the 32-bit packets to the receiving clock (of the FPGA)
• extract the YCbCr image and the frame control signals
• compute the table look-up contents for the colour conversion to RGB and perform the look-up; the look-up table outputs pass to the Image Storing task
• test the units using the supplied test benches
• combine the units with the Image Storing task units and test the Video Capture System with the supplied test bench

Image Storing
• complete the conversion to an RGB format by combining the table look-up outputs supplied by the Video Grabbing task
• buffer up pixel data and compute a frame address to form a write request to memory
• translate the frame address into a real memory address depending on the frame currently being written to
• test the units using the supplied test benches
• combine the units with the Video Grabbing task units and test the Video Capture System with the supplied test bench

4.2 Background Information
4.2.1 Video Decoder Board
In order to convert the analogue video data coming from the camera, the system requires the use of a Video Decoder board that will convert this data into a digital form. The Video Decoder board used in the iIMP is the VDEC1 board produced by Digilent. This board is capable of accepting many different analogue signals and converting them into the BT.656 format required by the iIMP system. The board is configurable through an I2C interface.

4.2.2 YCbCr Colour Space
YCbCr is a colour space in which the brightness and colour are encoded separately. The Y component represents the brightness information, Cb is the blue component and Cr is the red component. Cg is the green component, which can be calculated from the Cb and Cr values. The Y component is normally stored at a higher accuracy due to the human visual system being more responsive to it.


The YCbCr format from the Video Decoder board is as shown in Figure 4.1.

During the active part of the image, 32 bits are used to hold the brightness and colour information of two pixels. The Cr and Cb values give the average red and blue components for the two colours, while each pixel has its own Y brightness value. The Video Decoder board sends this a byte at a time, starting with Cb.

4.2.3 RGB (Red Green Blue) Colour Space
RGB is a colour space in which a colour is represented by the additive combination of the three primary colours: red, green and blue. In the iIMP, 5 bits are used for each colour, so that a pixel requires two bytes of storage (leaving bit 15 at the most significant end spare), and the RGB values are each between 0 and 63.

4.2.4 BT.656 Format
BT.656 is a digital video protocol that defines the video data as interlaced data in the YCbCr colour format. This means that an image frame is sent as two fields, with the odd lines sent in the first field and the even lines in the second field. The Video Decoder board uses this standard to output its video data. The format the Video Decoder board uses is called 625/50, which means that there are 625 lines per frame and that the field rate is 50Hz. After removing the blanking information, the frame size becomes 720 pixels by 576 lines. For more details of the BT.656 standard than given in this section, see www.intersil.com/data/an/an9728.pdf.
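(As a check on these numbers: each active line carries 720 pixels at an average of 2 bytes per pixel in the Cb Y Cr Y pattern, i.e. 1440 bytes of line data, which is the figure shown in Figure 4.3.)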

The frame that is sent by the Video Decoder board is shown in Figure 4.2. The shaded areas represent blanking information. The vertical numbers show the number of lines that each section is, together with the state of the vertical blanking bit (V) during the section. The horizontal numbers represent how many bytes long each section is, together with the state of the horizontal blanking bit (H).

At the start of each line of the image, the Video Decoder board sends a Start of Active Video (SAV) control packet, and at the end of each image line an End of Active Video (EAV) control packet. These 32-bit control packets include a horizontal blanking bit (H), a vertical blanking bit (V) and a field (F) bit. The field bit indicates which field the VDEC board is currently sending: F is 0 during field 1, where the odd lines are being sent, and 1 during field 2, where the even lines are sent.

bits 31-24: Y1 | bits 23-16: Cr | bits 15-8: Y2 | bits 7-0: Cb

Figure 4.1: YCbCr Format


The 32-bit control packet is sent in the format 0xδδ0000FF, where 0xFF is sent as the first byand 0xδδ represents the control byte. The format of the control byte is shown in the followtable:

Table 4.1: Format of the Control Byte

Bit  Name                  Description
7    1                     always 1
6    Field                 0 = Field 1; 1 = Field 2
5    Vertical Blanking     1 on entering vertical blanking; 0 when leaving vertical blanking
4    Horizontal Blanking   1 on entering horizontal blanking; 0 when leaving horizontal blanking
3    Parity bit 3          V ⊕ H
2    Parity bit 2          F ⊕ H
1    Parity bit 1          F ⊕ V
0    Parity bit 0          F ⊕ V ⊕ H
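As an aid to understanding Table 4.1 (and to checking received packets against it), the control byte can be modelled in C; this is an illustrative software model, not part of the provided Verilog:

    /* Build the control byte of Table 4.1 from the F, V and H bits (each 0 or 1) */
    unsigned char control_byte(int f, int v, int h)
    {
        unsigned char b = 0x80;       /* bit 7: always 1              */
        b |= f << 6;                  /* bit 6: field                 */
        b |= v << 5;                  /* bit 5: vertical blanking     */
        b |= h << 4;                  /* bit 4: horizontal blanking   */
        b |= (v ^ h) << 3;            /* bit 3: parity V xor H        */
        b |= (f ^ h) << 2;            /* bit 2: parity F xor H        */
        b |= (f ^ v) << 1;            /* bit 1: parity F xor V        */
        b |= (f ^ v ^ h);             /* bit 0: parity F xor V xor H  */
        return b;
    }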

[The figure shows the 625-line frame as two interlaced fields (F=0 for field 1, F=1 for field 2), with vertical blanking sections (V=1, of 22, 1 and 25 lines) separating the two active sections of 288 lines each (V=0); each line consists of an EAV code, horizontal blanking (H=1), an SAV code and 1440 bytes of data (H=0).]

Figure 4.2: Frame Composition Sent by Video Decoder Board


When in blanking, the Video Decoder board outputs the word 0x10801080. Thus, when not in the vertical blanking region, a line and its horizontal blanking are as shown in Figure 4.3.

The vertical blanking signal, V, sent in the control packets should be zero at this time. V goes to '0' on the first SAV (Start of Active Video) control word received in a field and returns to '1' on the final EAV (End of Active Video) control word for a field, as shown in Figure 4.4.

4.2.5 Colour Space Conversion
As the system displays its output on a virtual screen or VDU monitor, the system needs to store the image data in the RGB colour space format. This means that the data from the Video Decoder board will have to be converted from YCbCr into RGB in order for the image to be displayed on the monitor. This conversion is done in hardware.

The following formulae are used to do the conversion from the YCbCr values produced bcamera to the RGB format required by the monitor.

[Outside vertical blanking, each line comprises a 4-byte EAV code (FF 00 00 δδ) marking the end of the previous line's data, 280 bytes of horizontal blanking (the repeating pattern 80 10 80 10, H=1), a 4-byte SAV code (FF 00 00 δδ), then 1440 bytes of line data as Cb Y Cr Y pairs, up to the EAV code starting the next line.]

Figure 4.3: Organisation of Line Data

[V is high during the vertical blanking period, goes low at the SAV of line 1 (start of field) and remains low through lines 1 to 288, returning high at the EAV of line 288 (end of field); H marks the horizontal blanking within each line.]

Figure 4.4: Timing of Vertical Blanking Signal


The following formulae are used to do the conversion from the YCbCr values produced by the camera to the RGB format required by the monitor:

  R = 0.272(Y - 16) + 0.386(Cr - 128)
  G = 0.272(Y - 16) - 0.094(Cb - 128) - 0.193(Cr - 128)
  B = 0.272(Y - 16) + 0.482(Cb - 128)

To do these calculations using multipliers and adders would require 5 multipliers and 7 adders, which is too much logic. If the multiplications are pre-computed and placed in look-up tables then it is only necessary to combine the look-up table results by adding and shifting. Assuming look-up tables for the inputs Y, Cb and Cr, this results in the following combining operations being done:

  R = y_table[Y] + (cr_table[Cr] << 1)
  G = y_table[Y] - (b0_table[Cb] + cr_table[Cr])
  B = y_table[Y] + b1_table[Cb]

As one of the Cr multipliers is almost twice the other, only one table is used to give one value which is then shifted 1 bit left to get the other value. Note that two look-up tables are required for the Cb value.

An explanation of how the equations above have been arrived at now follows. There are many equations given on the web and in books to convert from YCbCr to RGB, and the equations actually needed depend on the value range of both the input YCbCr stream and the output RGB stream. The equations used are based on the Julien equations given at: http://www.fourcc.org/fccyvrgb.php

However, the Julien equations

  R_normalised = Y_normalised + 1.403 Cr_normalised
  G_normalised = Y_normalised - 0.344 Cb_normalised - 0.714 Cr_normalised
  B_normalised = Y_normalised + 1.77 Cb_normalised

relate to producing normalised RGB output (i.e. outputs for R, G and B between 0 and 1) from normalised YCbCr inputs, where a normalised Y input is in the range 0 to 1 and the normalised range for Cb and Cr is -0.5 to +0.5. Unfortunately, the camera output is denormalised with Y, Cb and Cr having values from 0 to 255. Furthermore, the R, G and B values required by the monitor are denormalised, each eventually occupying 5 bits in memory; the conversion below first produces 6-bit values in the range 0 to 63, the bottom bit being a rounding bit that the vid_add stage later discards (see section 4.3.5). Thus the normalised equations above need to be modified so as to first normalise the denormalised camera output. This enables conversion to a normalised RGB format, and finally the normalised RGB can be converted to denormalised RGB for the monitor by multiplying by 64. The transformations required are shown in Figure 4.5.

The denormalised values for Y, Cb and Cr produced by the camera are in the range 0 to 255. However, the range for Y is 16 to 235 and that for Cb and Cr is 16 to 239, which for simplicity is treated as the same range as Y, i.e. 16 to 235. The numbers outside the range 16 to 235 carry other information for systems which use this format. So to normalise the denormalised Y, Cb, Cr values from the camera, the following transformations are used:

  Y_normalised  = (Y - 16)/235
  Cb_normalised = (Cb - 128)/235
  Cr_normalised = (Cr - 128)/235

Figure 4.5: Converting YCbCr to RGB
[Figure: denormalised YCbCr from the camera → normalise YCbCr → convert to normalised RGB → denormalise RGB.]


Substituting for Y_normalised, Cb_normalised and Cr_normalised into the Julien equations gives normalised R, G and B values of

  R_normalised = (Y - 16)/235 + 1.403(Cr - 128)/235
  G_normalised = (Y - 16)/235 - 0.344(Cb - 128)/235 - 0.714(Cr - 128)/235
  B_normalised = (Y - 16)/235 + 1.77(Cb - 128)/235

Finally, having got the normalised R, G and B values, multiplying these by 64 gives denormalised R, G and B values in the range 0 to 63.

  R = 64((Y - 16)/235 + 1.403(Cr - 128)/235) = 0.272(Y - 16) + 0.386(Cr - 128)
  G = 64((Y - 16)/235 - 0.344(Cb - 128)/235 - 0.714(Cr - 128)/235)
    = 0.272(Y - 16) - 0.094(Cb - 128) - 0.193(Cr - 128)
  B = 64((Y - 16)/235 + 1.77(Cb - 128)/235) = 0.272(Y - 16) + 0.482(Cb - 128)

4.2.6 Storage Format.
The data for the system will be stored in the frame store in the RGB colour space at 15-bit colour depth at 320 by 240 resolution. The format of each pixel is XRRR RRGG GGGB BBBB as shown in Figure 4.6. Five bits are allocated to each colour (red, green, blue) leaving bit 15 unused (X). Thus black is 0x0000 and white is 0x7FFF (or 0xFFFF).

The video data is stored at a lower resolution and colour depth because of the limitations of the bandwidth and space available in the frame store. There are various other ways to store the data, e.g. a higher resolution with a lower colour depth such as 640 by 480 by 8 bits, but after experimentation, 320 by 240 by 15 bits was found to give the best image quality. The reduction in lines is most easily achieved by treating each field as a separate frame and this is adopted in the iIMP system. Furthermore it eases the processing required. Mapping the 15 bits for each pixel onto two bytes, each 320 by 240 image occupies 150Kbytes.

Figure 4.6: Pixel Organisation Across Two Bytes
[Figure: bit 15 = X (don't care), bits 14-10 = red, bits 9-5 = green, bits 4-0 = blue.]

4.3 Video Capture System
4.3.1 Introduction
The Video Capture system is the unit that takes the data from the Video Decoder board, converts it into the RGB format that the iIMP system uses and then requests for this to be stored in the memory. It consists of a five stage pipeline. The first stage takes the 8-bit data sent by the Video Decoder board and assembles it into a 32-bit word; this stage operates with the Video Decoder board clock. The second unit synchronises the data onto the FPGA's clock, and extracts the control and colour information, passing the colour information to the next stage. The third stage performs look-ups based on the value of the current pixel. The fourth stage takes the result from the previous stage and performs the necessary calculations to convert the data into the red green blue (RGB) format required by the screen. The final unit then takes this RGB data, packages it and sends it off to the memory for storing along with the relevant memory address.

The address to memory is the word address within a frame that is assumed to start at word 0. Thus while the data and the Request signal for a memory access go directly to the memory Arbiter to compete against all other memory requests for a memory access, the address from the Video Capture system requires translation to the real address in memory.


This translation is performed by a Frame Allocator which holds the current memory area used by the Video Capture system and the VDU Controller (and ARM) and determines the new memory area that these units will use when their current frame completes. The design of the Frame Allocator is described in section 4.4. Figure 4.7 is a diagram of the Video Capture system:

Figure 4.7: The Components of the Video Capture System
[Figure: a pipeline of vid_sync (clocked at 27MHz by the VDEC clock) → vid_decoder → vid_lookup → vid_add → vid_address, the later stages clocked at 25MHz by the FPGA clock. data(7:0) enters vid_sync; req/ack and data(31:0) pass to vid_decoder; YCbCr and control signals pass to vid_lookup; lookup results and control signals pass to vid_add; RGB and control signals pass to vid_address; req/ack, address and data(31:0) leave for the memory.]

The vid_sync, vid_decoder and vid_lookup stages form the Video Grabbing task, while the vid_add, vid_address and Frame Allocator comprise the stages of the Image Storing task.

As is usual in a pipeline, a new output of each stage is clocked into the stage's output register on each positive clock edge and a request signal forwarded to the next stage if an action is required. The vid_sync to vid_decoder transfer also uses an acknowledge signal as this data transfer involves moving from one clock region to another.

The operations that each of these units needs to perform are now described in the following sections:

4.3.2 vid_sync


This unit is responsible for taking the data from the Video Decoder board and assembling it into 32-bit words. To do this it should buffer up the bytes until it receives the fourth. As it receives the fourth byte it should copy the 32-bit word to its output registers and raise a request to the vid_decoder unit. When the vid_sync unit receives an acknowledge from the vid_decoder unit, it should remove its request (i.e. the request line returns to '0').

When the system starts up, the unit does not know which byte it is currently receiving, so until a control packet is detected, received bytes are always treated as being byte 0. A control packet has its first byte = 0xFF while the second and third are 0x00. When 0xFF is received, it should be placed in byte 0 and if the following two bytes are 0x00 then they should be placed in bytes 1 and 2, with the assembling of 32 bits continuing from here.

The interface for the vid_sync is:

Table 4.2: vid_sync Interface

  Name      Width  Direction  Description
  v_clk     1      I          27MHz clock from VDEC board
  v_datain  8      I          data from VDEC board
  v_word    32     O          assembled 32 bits
  v_req     1      O          request to vid_decoder stage
  v_ack     1      I          acknowledge from vid_decoder stage
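As a starting point only, a minimal behavioural sketch of the byte assembly described above follows. It uses the Table 4.2 names, but the internal registers and the simplified realignment (which does not confirm the two 0x00 bytes that should follow 0xFF in a control packet) are assumptions, not the required implementation:

  // Minimal sketch of vid_sync: assembles four bytes into a 32-bit word,
  // with byte 0 in the least significant position so a control packet
  // assembles as 0xdd0000FF.
  module vid_sync(
      input             v_clk,     // 27MHz clock from VDEC board
      input       [7:0] v_datain,  // data from VDEC board
      output reg [31:0] v_word,    // assembled 32 bits
      output reg        v_req,     // request to vid_decoder stage
      input             v_ack      // acknowledge from vid_decoder stage
  );
    reg [23:0] buffer;             // bytes 0..2 of the word being assembled
    reg  [1:0] count = 2'd0;       // which byte position is expected next

    always @(posedge v_clk) begin
      if (v_ack) v_req <= 1'b0;    // remove the request once acknowledged

      if (v_datain == 8'hFF) begin // start of a control packet: realign
        buffer[7:0] <= v_datain;   // 0xFF is always treated as byte 0
        count       <= 2'd1;
      end else begin
        case (count)
          2'd0: buffer[7:0]   <= v_datain;
          2'd1: buffer[15:8]  <= v_datain;
          2'd2: buffer[23:16] <= v_datain;
          2'd3: begin              // fourth byte completes the 32-bit word
            v_word <= {v_datain, buffer};
            v_req  <= 1'b1;
          end
        endcase
        count <= count + 2'd1;     // 2-bit counter wraps 3 -> 0
      end
    end
  endmodule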

4.3.3 vid_decoder
The vid_sync unit is working from the Video Decoder board's clock while all other units in the Video Capture use the FPGA clock. So, in crossing from the vid_sync unit to the vid_decoder stage, the Video Capture system moves from one clock domain (area) to another. Thus when the vid_decoder unit receives a request from the vid_sync stage, it should capture the data and synchronise it onto the FPGA's clock. Furthermore, when it has captured the data, it should raise its acknowledge signal to the vid_sync stage to indicate that it has captured the data. The procedure necessary to effect the synchronisation and capture, as well as produce an acknowledge, is described below.

4.3.3.1 Handshaking
The transfer of data from the vid_sync unit to the vid_decoder unit using a Request and an Acknowledge signal is called handshaking. It is a protocol often used to transfer information safely between units operating asynchronously to each other or between units where the transfer time is unpredictable. When one unit has assembled all the data to send to the other, it raises its output Request line. The sending unit maintains the data constant while the receiving unit captures this data. After the receiving unit has captured this data, it raises its Acknowledge line and, on detecting this, the sending unit lowers its Request line and knows it is free to change its output signals. On detecting that the Request line from the sending unit has been lowered, the receiving unit lowers its Acknowledge signal, signifying that the receiving unit is ready to accept further data from the sender. This handshake sequence is shown in Figure 4.8.

4.3.3.2 Synchronisation
As the Request from the vid_sync unit can arrive at any time relative to the FPGA clock, this may cause metastability at the vid_decoder unit which needs to be allowed time to resolve. Metastability is caused by data changing so close to the clock edge that a flip flop's set up and hold times are not adhered to. This results in the flip flop output entering a halfway state as it seeks to determine whether its output should be one or zero. Normally, given time, the flip flop output resolves to a valid '0' or '1' state, the time being dependent on how close the data change is to the clock edge. Note that if the flip flop goes metastable and then resolves, the final state is unpredictable. In principle, if the clock and data change simultaneously, the flip flop can remain indefinitely in the metastable state, although the probability of this is small since system noise will usually tip the flip flop output one way or the other.

The standard method for the synchronisation and the handshake logic you need to implement in Verilog are shown in Figure 4.9.

The request signal from vid_sync is captured on the +ve edge of the FPGA clock and FF1's output (req_d) goes high. A clock period is allowed for any metastable half state on req_d to resolve so that on the next +ve clock edge, FF1's o/p passes to FF2. Provided FF1's output is high at this time, FF2's output goes high, allowing the data from vid_sync to be captured and an acknowledge signal to vid_sync to be raised. The acknowledge signal is then used to (asynchronously) reset FF1. If metastability occurs when clocking FF1 and FF1's output resolves to a '0', then the FF2 output may be delayed going high by one bit period, thus delaying the capture and acknowledgement by one bit period. Typical timing for the capture of data using the synchronisation logic shown in Figure 4.9 is shown in Figure 4.10; as shown in Figure 4.10, the incoming request is not coincident with the clock edge (which is the usual case), so metastability does not occur.

Figure 4.8: Principle of Handshaking
[Figure: the sender's output signals become valid, then Request rises; Acknowledge rises once the receiver has captured the data; Request then falls, followed by Acknowledge.]

Figure 4.9: Design for Handshake Logic
[Figure: the request from vid_sync feeds FF1, whose output req_d (which could go metastable) feeds FF2; both flip-flops are clocked by the FPGA clock. FF2's output causes the data to be captured and the acknowledge to be sent to vid_sync; the acknowledge also resets FF1.]
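A hedged Verilog sketch of the Figure 4.9 structure follows; the module boundary and signal names are assumptions, and a real clock-domain crossing needs more care than is shown here:

  // Sketch of the Figure 4.9 synchroniser. FF1 catches the asynchronous
  // request and may go metastable; FF2 samples it a clock later, by which
  // time the level has (almost certainly) resolved.
  module sync_handshake(
      input             clk,         // 25MHz FPGA clock
      input             request,     // request from vid_sync (27MHz domain)
      input      [31:0] data_in,     // held stable by vid_sync while req high
      output reg [31:0] data_out,
      output reg        acknowledge  // to vid_sync; also resets FF1
  );
    reg req_d;                       // FF1 output: could go metastable

    always @(posedge clk or posedge acknowledge)  // FF1, async reset by ack
      if (acknowledge) req_d <= 1'b0;
      else             req_d <= request;

    always @(posedge clk) begin                   // FF2 and the data capture
      acknowledge <= req_d;
      if (req_d) data_out <= data_in;
    end
  endmodule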


The deadline for servicing the Request from the vid_sync unit is around 144ns (4 clock cycles at 27MHz), which leaves a maximum of three FPGA clock cycles to synchronise the data; normally only two FPGA clocks will be required, but metastability may require another clock period.

4.3.3.3 vid_decoder Operation
When the vid_decoder unit receives a control packet, it should interpret the control byte and produce the vertical sync (Vsync), horizontal sync (Hsync), vertical blanking and field signals. The vertical and horizontal sync signals should be set for one clock period when the horizontal or vertical blanking bits initially go to '1', as shown in Figure 4.11. The Hsync signal marks the end of a line while Vsync marks the end of a field. Note that the SAV and EAV control packets are generated for all lines including blank lines, so Hsync should be generated both during the image and the blank lines, as shown in Figure 4.11c). The vertical blanking should be set for all blank lines as shown in Figure 4.11b). This signal going to '0' marks the start of a field and is helpful in the correct formation of the frame address performed in a later unit of the Video Capture system. The field signal should be set at the start of each new field and remain constant throughout the field. The field signal is not used in the current system but you may find it useful for debugging. It is recommended that the state of these control signals is transmitted down the pipeline on every positive clock edge.

Having located the control packet, the vid_decoder unit should then filter out all the blanking information using the control signals, passing on only the pixel data. The unit should extract the three colour components for each pixel to form 3 outputs for the pixel. As there are two Y values per word, these should be averaged to form one 8-bit output. Doing this will shrink the width of the frame, as we are averaging two pixels, to 360 pixels horizontally, which forms part of the size reduction needed. Having computed one pixel's colour, the vid_decoder unit should place the 3 components on its register outputs at a positive FPGA clock edge and raise a request to the next unit, which is the vid_lookup. Since a unit is guaranteed to only produce data at most once every alternate clock period, you can opt to have the Request Out signal from this and the vid_lookup and vid_add units one clock period wide. Note that active data is only output when present, while the control signals are output on every clock.
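Purely as an illustration of the two mechanisms just described, the sketch below shows edge detection for the sync pulses and the averaging of the two Y samples; h_bit and v_bit are assumed to be the blanking bits decoded from the most recent control byte, and none of these names are mandated:

  // Illustrative only: one-clock sync pulses and Y averaging.
  module decode_fragments(
      input         clk,
      input         h_bit, v_bit,  // blanking bits from the control byte
      input  [31:0] word,          // 32-bit data word (Figure 4.1 layout)
      output reg    hsync, vsync,
      output  [7:0] y_avg
  );
    reg h_prev, v_prev;
    always @(posedge clk) begin
      hsync  <= h_bit & ~h_prev;   // one-clock pulse when H first goes to '1'
      vsync  <= v_bit & ~v_prev;   // one-clock pulse when V first goes to '1'
      h_prev <= h_bit;
      v_prev <= v_bit;
    end

    // Average the two Y samples in a word, halving the width to 360 pixels.
    wire [8:0] sum = {1'b0, word[15:8]} + {1'b0, word[31:24]};
    assign y_avg = sum[8:1];
  endmodule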

Figure 4.10: Typical Timing in Capturing Data from vid_sync
[Figure: waveforms of req, req_d and ack against the FPGA clk for a request that does not go metastable.]


Figure 4.11: Timing for the Hsync and Vsync Signals
[Figure: a) Hsync pulses for one FPGA clock period when the horizontal blanking signal H goes to '1' on receipt of the EAV word for a line; H returns to '0' at the following SAV word. b) Vsync pulses for one clock period when the vertical blanking signal V goes to '1' at the EAV word at the end of a field; V returns to '0' at the SAV word for the start of the next field. c) Hsync continues to pulse at each EAV received while the vertical blanking signal is high.]

Acknowledge Signals
There is no need for an acknowledge back from the vid_lookup unit. This is because the time for the vid_lookup to operate is constant and therefore predictable. The same is true for the time through the vid_add and vid_address units. So these too don't need to return an acknowledge to the unit sending them a request. This also means that if the Request signal from the vid_decoder is one clock period long then a delayed version of this can be used as the Request signal from the vid_lookup and vid_add units. The Request signal from the vid_address unit goes to the memory Arbiter and, since the Request competes with those from other units, the time the Request needs to be held is not predictable; hence an acknowledge from the memory Arbiter to the vid_address unit is required.

Thus the interface of the vid_decoder unit is:

Table 4.3: vid_decoder Interface

  Name        Width  Direction  Description
  clk         1      I          25MHz FPGA clock
  data_in     32     I          data from vid_sync unit
  req_in      1      I          request from vid_sync unit
  sync_ack    1      O          acknowledge to vid_sync unit
  y_out       8      O          pixel's Y component
  cr_out      8      O          pixel's Cr component
  cb_out      8      O          pixel's Cb component
  lookup_req  1      O          request to vid_lookup for look-up
  field       1      O          field signal
  hsync       1      O          horizontal sync signal
  vsync       1      O          vertical sync signal
  vblank      1      O          vertical blanking signal

4.3.4 vid_lookup
In this stage of the pipeline, table look-ups are performed to get the results of the multiplications of the colour components. This unit forms the first unit of the colour conversion. There are four look-up tables each with 256 9-bit entries, comprising 8 integer bits and 1 fractional bit. You need to compute the table contents for each input combination (e.g. with a C program) using the previously given equations.

  y_table[Y]   = 0.272(Y - 16)
  cr_table[Cr] = 0.193(Cr - 128)
  b0_table[Cb] = 0.094(Cb - 128)
  b1_table[Cb] = 0.482(Cb - 128)

The computed values should produce a binary or hex ASCII file. A hex file requires three digits per line, of which the most significant value will be 0 or 1. There needs to be an exact correspondence between the number of bits specified per line and the number of lines in your file and the size of the ROM, i.e. your file should comprise 256 lines specifying 9 binary bits per line.
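The manual suggests a C program for this; as an alternative sketch that runs in the same simulator, a throwaway Verilog generator for one table might look as follows. The file name, the truncation performed by $rtoi and the two's complement encoding of negative entries are all assumptions you must settle for yourself:

  // Throwaway generator, run once in the simulator, for the Cr table.
  module gen_cr_table;
    integer   f, cr;
    reg [8:0] entry;
    initial begin
      f = $fopen("cr_table.hex", "w");   // file name is an assumption
      for (cr = 0; cr < 256; cr = cr + 1) begin
        // entry = 0.193(Cr - 128), scaled by 2 for the 1 fractional bit;
        // $rtoi truncates towards zero - you may prefer rounding
        entry = $rtoi(0.193 * (cr - 128) * 2.0);
        $fdisplay(f, "%h", entry);       // 3 hex digits per 9-bit line
      end
      $fclose(f);
    end
  endmodule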


To create the ROM contents from your file, you will need to declare a two dimensional array in Verilog, e.g. reg [8:0] y_table[0:255]; The Verilog directive $readmemh("<file name>", <name of ROM>); causes the specified ROM to be loaded from the named hex file. If you want to load a binary file, use the directive $readmemb rather than $readmemh.

Due to the design of the FPGA, the outputs of the ROMs can only go to the vid_lookup output register, i.e. the look-up table results cannot go through some combinatorial logic before reaching the output register or be sent to another destination as well as the output register. This is the reason that the look-up and the combining of the look-up table results form separate stages.

On receiving a request from the vid_decoder unit, the vid_lookup unit should perform the four look-ups based on the three components of the input pixel (Cb forms the input to two look-up tables). When the look-ups are complete, the data from the look-up tables is loaded into the output register on the positive edge of the FPGA clock and a request is raised to the vid_add unit. The vid_lookup unit must also delay the control signals (field, hsync, vsync and vblank) by the same amount as the data. This has to be done so the control signals do not overtake the data and cause errors further down the pipeline.

The interface of the vid_lookup stage is:

Table 4.4: vid_lookup Interface

  Name        Width  Direction  Description
  clk         1      I          25MHz FPGA clock
  req_in      1      I          request from the vid_decoder unit
  y_in        8      I          pixel's Y component
  cr_in       8      I          pixel's Cr component
  cb_in       8      I          pixel's Cb component
  y_result    9      O          result of the Y multiplication from the look-up table
  cr_result   9      O          result of the Cr multiplication from the look-up table
  b0_result   9      O          result of the first Cb multiplication from the look-up table
  b1_result   9      O          result of the second Cb multiplication from the look-up table
  add_req     1      O          request to vid_add stage
  field_in    1      I          field signal
  hsync_in    1      I          horizontal sync signal
  vsync_in    1      I          vertical sync signal
  vblank_in   1      I          vertical blanking signal
  field_out   1      O          delayed field signal
  hsync_out   1      O          delayed horizontal sync signal
  vsync_out   1      O          delayed vertical sync signal
  vblank_out  1      O          delayed vertical blanking signal
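A sketch of how the inside of the vid_lookup cell might be arranged is given below, assuming the Table 4.4 names; the hex file names are assumptions and this is not the only valid arrangement:

  // Fragment inside the vid_lookup cell: four ROMs loaded from hex files,
  // results registered straight into the output registers as the FPGA
  // requires, with the control signals and request delayed by the same
  // single cycle as the data.
  reg [8:0] y_table  [0:255];
  reg [8:0] cr_table [0:255];
  reg [8:0] b0_table [0:255];
  reg [8:0] b1_table [0:255];
  initial begin
    $readmemh("y_table.hex",  y_table);
    $readmemh("cr_table.hex", cr_table);
    $readmemh("b0_table.hex", b0_table);
    $readmemh("b1_table.hex", b1_table);
  end

  always @(posedge clk) begin
    y_result   <= y_table[y_in];
    cr_result  <= cr_table[cr_in];
    b0_result  <= b0_table[cb_in];   // Cb feeds two tables
    b1_result  <= b1_table[cb_in];
    add_req    <= req_in;            // request follows the data
    field_out  <= field_in;          // control signals delayed to match
    hsync_out  <= hsync_in;
    vsync_out  <= vsync_in;
    vblank_out <= vblank_in;
  end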


4.3.5 vid_add
This unit forms the last stage of the colour space conversion. In the vid_add stage the results from the look-up tables are combined in order to find the RGB values for the pixel. When the unit receives a request from the vid_lookup unit, this unit should perform the computations to form the RGB results. The following are the operations that are needed to form the results:

  R = y_result + (cr_result << 1)
  G = y_result - (b0_result + cr_result)
  B = y_result + b1_result

You can assume that if addition or subtraction is specified then, when the RTL description is compiled into logic, the adder/subtractor logic will be created. Having completed the colour conversion to RGB, the vid_add unit should take the 9-bit results and perform any necessary clipping before forwarding the data to the vid_add outputs. Taking bit 8 as the ms bit and bit 0 as the ls bit: if bit 8 is set, this means that the result is negative and the result should be clipped to 0. If bits 6 or 7 are set, this means that the result has overflowed and the result should be clipped to the maximum value of 0x1F. Otherwise the result comprises bits 5 to 1. Bit 0 is used to reduce rounding errors and is 'thrown away'. Clipping is required as there are some YCbCr values that do not exist in the RGB space.

When the clipping has been performed, the R, G and B values are loaded into an output register on the positive edge of the FPGA clock and a request to the vid_address unit should be raised. The vid_add unit must also delay the control signals (field, Hsync, Vsync and vblank) by the same amount as the data. Again, this has to be done so the control signals do not overtake the data and cause errors further down the pipeline.
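A sketch of the combine-and-clip step follows, assuming the Table 4.5 names and that the arithmetic wraps within 9 bits as the clipping rules above imply:

  // Clip a 9-bit combining result to a 5-bit colour component.
  function [4:0] clip;
    input [8:0] v;
    begin
      if (v[8])             clip = 5'h00;  // bit 8 set: negative, clip to 0
      else if (v[7] | v[6]) clip = 5'h1F;  // bits 7 or 6 set: overflow
      else                  clip = v[5:1]; // keep bits 5..1; bit 0 discarded
    end
  endfunction

  // Combining operations; shifts and adds wrap in 9 bits.
  wire [8:0] r_raw = y_result + (cr_result << 1);
  wire [8:0] g_raw = y_result - (b0_result + cr_result);
  wire [8:0] b_raw = y_result + b1_result;
  // On the +ve clock edge: r_out <= clip(r_raw); g_out <= clip(g_raw); etc.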

The interface for the vid_add unit is:

Table 4.5: vid_add Interface

  Name        Width  Direction  Description
  clk         1      I          25MHz FPGA clock
  req_in      1      I          request from the vid_lookup unit
  y_result    9      I          result of the Y multiplication from the look-up table
  cr_result   9      I          result of the Cr multiplication from the look-up table
  b0_result   9      I          result of the first Cb multiplication from the look-up table
  b1_result   9      I          result of the second Cb multiplication from the look-up table
  r_out       5      O          pixel's R (red) component
  g_out       5      O          pixel's G (green) component
  b_out       5      O          pixel's B (blue) component
  addr_req    1      O          request to vid_address unit
  field_in    1      I          field signal
  hsync_in    1      I          horizontal sync signal
  vsync_in    1      I          vertical sync signal
  vblank_in   1      I          vertical blanking signal
  field_out   1      O          delayed field signal
  hsync_out   1      O          delayed horizontal sync signal
  vsync_out   1      O          delayed vertical sync signal
  vblank_out  1      O          delayed vertical blanking signal


4.3.6 vid_address
This is the final unit of the Video Capture system. In this unit, the pixels' data is packaged up and sent off to memory along with the correct address within the frame. Upon receiving a request from the vid_add unit, the vid_address unit should take the pixel data and arrange it into 16 bits in the format XRRR RRGG GGGB BBBB, where X is the most significant bit and is unused. (The X bit could be used to hold information that might be useful in debugging or expanding the system, e.g. the field bit.) If you are not using the X bit, it is probably easiest to set it to '0'. Having got the pixel into the correct format, the vid_address unit should then buffer up pixels until it has received two pixels. At that point the vid_address unit should output the two pixels' data and the address (on a +ve FPGA clock edge), and raise a request to the memory Arbiter. Thirty two bits rather than sixteen bits are written at a time to reduce bandwidth requirements to the 32-bit wide RAM.
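For example, the packing and pairing might be sketched as below using the Table 4.6 names; which half-word holds the first pixel of a pair is an assumption made here, and Register B of Figure 4.12 appears as data_out_r:

  // Fragment: pack one pixel as X RRRRR GGGGG BBBBB (X set to '0') and
  // pair two pixels per 32-bit word.
  wire [15:0] pixel = {1'b0, r_in, g_in, b_in};

  reg [15:0] reg_a;         // Register A of Figure 4.12
  reg        have_one;      // a first pixel is waiting in reg_a
  reg [31:0] data_out_r;    // Register B: the word for the Arbiter

  always @(posedge clk)
    if (req_in) begin
      if (!have_one) reg_a      <= pixel;           // first of the pair
      else           data_out_r <= {pixel, reg_a};  // second completes it
      have_one <= ~have_one;
    end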

Since the timing through the Arbiter is not predictable, as it depends on other requests in the iIMP, extra buffering needs to be provided in the vid_address unit for data coming from vid_add. With regard to the extra buffering required, the timing is such that the vid_address unit will only have to buffer a single pixel, as shown in Figure 4.12. The first pixel of the pair to arrive is placed in Register A. When the second of the pair arrives, it is transferred with that from Register A into Register B, as the timing shows that Register B is guaranteed to be free at this time. The timing is discussed in section 4.9 on the memory Arbiter.



The unpredictable timing through the Arbiter also necessitates the use of handshaking to transfer the write request to memory and to signal the memory's acceptance of the request with a mem_ack acknowledge signal. Usually the sender (vid_address) would be free to change its output signals once the acknowledge signal is received. However, that is not the case here. mem_ack is one clock period long and, as the FPGA clock frequency is 25MHz, it is high for 40ns. However, the memory access time is 55ns (more if wire delays are taken into account) which occupies two FPGA clock cycles. When the request for a memory access from vid_address is granted by the Arbiter, mem_ack is returned on the first of the two memory clock cycles, but the data and address must remain constant during the second clock cycle so as to hold the data and address constant to the memory and so complete the writing to memory. Having received the mem_ack signal, the mem_req signal can be removed at the start of the second clock if there is no further request waiting to be output. Leaving mem_req high at the start of the second cycle indicates a further memory request from the unit and will be interpreted as a new request by the Arbiter. However, for the vid_address unit, a further request will not be ready, so mem_req should return to '0' at the start of the second clock cycle. The timing for the vid_address unit is illustrated in Figure 4.13.

Figure 4.12: Buffering at the Output of vid_address
[Figure: 16-bit pixel data from vid_add is placed in Register A; Registers A and B then hold the two 16-bit pixels that form the 32-bit word sent to the Arbiter.]


At the end of the second cycle, the frame address should be incremented. Following the receipt by the vid_address unit of the last pixel data for a frame, the Vsync signal goes high and, instead of incrementing, the address should be reset to 0 for the start of the next frame. The unit should assume that there is only one frame in memory and that it starts at word address 0x0 and ends at word address 0x95FF. The Frame Allocation unit which receives the address will then add the necessary offset to the address so that the frame ends up in the correct area of memory. An end_of_frame signal (formed from Vsync) informs the frame allocator that a new offset address is required. The pixel data goes directly to the Arbiter.

Image Clipping.
The image format that is coming from the Video Decoder board is 720 pixels by 576 lines and the image format needed by the system is 320 pixels by 240 lines. So far, operations in previous units have reduced the width of the image by two by averaging the two pixels in each word and, to halve the height of the image, each field is treated as a separate frame. This results in the frame size being 360 pixels by 288 lines, and doubles the frame rate. To reduce the 360 pixels by 288 lines to the desired 320 pixels by 240 lines, the frame should be clipped by 20 pixels on each side, and by 24 lines at the top and bottom, to leave a central area of 320 pixels by 240 lines. To do this, it is recommended that you keep track of which line and which pixel the unit is currently receiving by counting pixels and resetting the pixel count to zero on Hsync and the line count to zero on Vsync. In this way, the pixels to be displayed on the screen/monitor are identified and a request to write to memory should only be raised for the pixels that lie within the central 320 by 240. Note that because of the image clipping, the end-of-frame signal to the Frame Allocator will be sent some considerable time after the last piece of pixel data for the frame.
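A sketch of the recommended counting follows; the counter widths and window bounds are straightforward, but the exact alignment of the counts with the blank lines is glossed over here and is for you to resolve:

  // Fragment: track position within the 360 x 288 field and flag the
  // central 320 x 240 window. req_in is the per-pixel request from vid_add.
  reg [8:0] pixel_cnt;   // 0..359 across a line
  reg [8:0] line_cnt;    // 0..287 down a field

  always @(posedge clk) begin
    if (vsync_in)      line_cnt  <= 9'd0;
    else if (hsync_in) line_cnt  <= line_cnt + 9'd1;
    if (hsync_in)      pixel_cnt <= 9'd0;
    else if (req_in)   pixel_cnt <= pixel_cnt + 9'd1;
  end

  // Clip 20 pixels each side and 24 lines top and bottom: 360-40 = 320
  // and 288-48 = 240. Only in-window pixels raise memory write requests.
  wire in_window = (pixel_cnt >= 9'd20) && (pixel_cnt < 9'd340) &&
                   (line_cnt  >= 9'd24) && (line_cnt  < 9'd264);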

The interface of the vid_address unit is:

Table 4.6: vid_address Interface

  Name          Width  Direction  Description
  clk           1      I          25MHz FPGA clock
  req_in        1      I          request from the vid_add unit
  r_in          5      I          pixel's R (red) component
  g_in          5      I          pixel's G (green) component
  b_in          5      I          pixel's B (blue) component
  field_in      1      I          field signal
  hsync_in      1      I          horizontal sync signal
  vsync_in      1      I          vertical sync signal
  vblank_in     1      I          vertical blanking signal
  mem_req       1      O          request to memory arbiter
  mem_ack       1      I          acknowledge from memory arbiter
  data_out      32     O          packaged pixel data to memory
  addr_out      18     O          frame address sent to frame allocator
  end_vg_frame  1      O          Vsync signal sent to frame allocator

Figure 4.13: Timing at the Output of the vid_address Unit
[Figure: waveforms of the clock, mem_req, mem_ack and data/address across the two clock cycles of a memory write.]


4.4 Frame Allocator
The Frame Allocator adds the correct offset to the frame address generated by the vid_address unit. How this is done, and how the correct address offset is added to requests for memory accesses from the other units in the iIMP system, are described in this section.
4.4.1 Frame Allocator Operation
As well as requests from the Video Capture system, other units request memory accesses, namely the ARM processor and the VDU Controller. The ARM processor requests are concerned with processing the image to detect objects, while the VDU Controller continuously reads pixels in an image from the memory and, on the provided hardware, sends them to the monitor for display. The VDU Controller reads the image sequentially along each row line by line, starting in the top left corner and finishing at the bottom right. It reads 32 bits from memory at a time comprising two pixels. The frame the VDU Controller accesses is also the frame that is accessed by the ARM, while the Video Capture system is writing into another frame. All the units generate word addresses within the frame relative to 0, and the memory requests pass to the Frame Allocator which translates the addresses generated by the VDU Controller, Video Capture System and ARM requests into their actual memory addresses according to the frame that these units are currently working in.

The Frame Allocator is synchronised to the FPGA clock and three frames are used: a VDU Controller frame (which is also the frame the ARM uses), a Video Capture frame (called the camera frame) and a spare frame. The camera frame 'follows' the VDU Controller frame as explained later. The VDU continuously reads from memory and displays the current frame while the Video Capture continuously writes the camera frames to memory, and normally this will be the next frame read and displayed by the VDU Controller.


A frame occupies 150K bytes (320 pixels by 240 lines at 2 bytes per pixel) or 37.5K words and the three frames used start at word addresses 0x0, 0x12C00 and 0x25800. Since the Frame Allocator needs to know when to apply a new offset to the VDU/ARM address and the camera address, the Allocator needs to know when the VDU and camera have completed their frames. Thus these units signal when a frame read/write is complete. In addition, the ARM is able to issue a Freeze signal so the Frame Allocator will maintain the VDU in the same display frame. Thus the Frame Allocator interface is:

Table 4.7: Frame Allocator Interface

  Name              Width  Direction  Description
  clk               1      I          25MHz FPGA clock
  reset             1      I          reset signal
  camera_addr       18     I          word address from vid_address unit
  camera_EOF        1      I          end_vg_frame from vid_address unit
  VDU_addr          18     I          word address from VDU Controller
  VDU_EOF           1      I          end of VDU frame
  processor_addr    18     I          word address from ARM
  freeze            1      I          signal from ARM to freeze VDU frame
  camera_addr_1     18     O          memory address for vid_address unit
  VDU_addr_1        18     O          memory address for VDU Controller
  processor_addr_1  18     O          memory address for ARM

The output addresses are word addresses which join the data and request as inputs to the (memory) Arbiter (see later).

4.4.2 Frame Allocation Rules
The Frame Allocator adds an appropriate offset to each incoming address and holds the offsets as a VDU_offset (= ARM_offset), camera_offset and spare_offset. When initialised or reset, the camera frame starts at word address 0x0 while the VDU (and ARM) use the frame starting at word 0x12C00 and the spare frame starts at word 0x25800. Thereafter, the offset allocations performed by the Frame Allocator should obey the following rules:
1. The ARM offset = VDU Controller offset.
2. If the freeze signal is active ('1') then the VDU offset remains as is and the camera offset operates as described in Rule 4.
3. If the VDU and camera frames finish simultaneously (highly unlikely) and the freeze signal is inactive, their offsets just swap. The flag is reset if set.
Rules 4, 5 and 6 apply if the VDU and camera frames do not finish simultaneously.
4. If the camera finishes its frame, the spare and camera offsets swap, and a flag is set. Effectively, the camera frame moves to the spare frame with the camera frame just written being the spare.


5. If the freeze signal is inactive and the VDU Controller finishes its frame but the camera frame has not set its flag, then there is no frame available for the VDU Controller to move into and the VDU offset should remain the same so as to display the same frame again.

6. If the freeze signal is inactive and the VDU Controller finishes its frame and the camera frame has set its flag, then the new VDU offset becomes the spare offset (since that's the one the camera has finished writing to). The spare offset becomes the old VDU offset and the flag is reset.

Figure 4.14 illustrates rules 2, 4 and 5, and Figure 4.15 illustrates rules 3 and 6.

Figure 4.14: Frame Allocation Offsets for Rules 2, 4 and 5
[Figure: a) when the camera completes a frame, the spare and camera offsets swap and the flag is set; b) when the VDU frame ends but the flag is not set or freeze = '1', no offsets change and the same frame is displayed again.]

Figure 4.15: Frame Allocation Offsets for Rules 3 and 6
[Figure: c) when the VDU frame ends with the flag set and freeze = '0', the VDU offset becomes the old spare offset, the spare offset becomes the old VDU offset and the flag is reset; d) when the camera and VDU frames finish simultaneously, the camera and VDU offsets swap.]
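One possible Verilog shape for these rules is sketched below, assuming the Table 4.7 names and that the offsets and the flag are simple registers; it is a sketch of the rules as stated, not the only correct structure:

  // Fragment for the frame_alloc cell: offset swapping per the rules.
  reg [17:0] vdu_offset, camera_offset, spare_offset;
  reg        flag;                    // set when the camera completes a frame

  always @(posedge clk) begin
    if (reset) begin
      camera_offset <= 18'h00000;     // camera frame starts at word 0x0
      vdu_offset    <= 18'h12C00;     // VDU (and ARM) frame
      spare_offset  <= 18'h25800;     // spare frame
      flag          <= 1'b0;
    end else if (camera_EOF && VDU_EOF && !freeze) begin
      camera_offset <= vdu_offset;    // rule 3: offsets just swap
      vdu_offset    <= camera_offset;
      flag          <= 1'b0;          // flag reset if set
    end else if (camera_EOF) begin
      camera_offset <= spare_offset;  // rules 2 and 4: camera/spare swap
      spare_offset  <= camera_offset;
      flag          <= 1'b1;
    end else if (VDU_EOF && !freeze && flag) begin
      vdu_offset    <= spare_offset;  // rule 6: VDU takes the finished frame
      spare_offset  <= vdu_offset;
      flag          <= 1'b0;
    end                               // rule 5: otherwise nothing changes
  end

  // Translation of the frame-relative addresses (rule 1: ARM = VDU offset)
  assign camera_addr_1    = camera_addr    + camera_offset;
  assign VDU_addr_1       = VDU_addr       + vdu_offset;
  assign processor_addr_1 = processor_addr + vdu_offset;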


You need to compose Verilog code to implement the rules given, or you are welcome to devise your own rules and implement these in Verilog.

4.5 Developing the Verilog for the Video Capture System and Frame Allocator Unit
Before developing the Verilog for the different units of the Video Capture system and the Frame Allocator, it is suggested that you will find it helpful to draw some timing diagrams for the input and output signals of each unit. The names of the cells you will develop correspond to the names in the text, namely: vid_sync, vid_decode, vid_lookup, vid_add, vid_address. The Frame Allocator code should be developed in a cell called frame_alloc. Apart from the cell files defining the inputs and outputs as listed in the tables in this chapter, the files are empty, so you will have to add your behavioural Verilog code for the cell.

Before using Cadence for the first time, you need to create the COMP20592 Cadence directory by typing mk_cadence 20592. Thereafter you just open up Cadence by typing start_cadence 20592. To insert your behavioural Verilog code for a unit, start Cadence (start_cadence 20592) to bring up an icds window. Select Tools -> Library Manager to bring up a Library Manager window. Select COMP20592 for the Library; <name of unit e.g. vid_decoder> for the Cell; functional for the View. Then in the Library Manager window, select File -> Open to bring up a verilog.v window containing the Verilog code for the unit, just containing input and output signals. Type in this window to add your behavioural code for the unit and then save it using File -> Save from the window's toolbar.

Parsing (Syntactic Analysis)
It is first necessary to parse the design to see that it satisfies the Verilog syntax. Start Cadence (start_cadence 20592). This brings up the icds window. Choose File -> Open. This brings up an Open File window. Select Tools -> Library Manager to bring up a Library Manager window.



Select COMP20592 for the Library; <name of unit e.g. vid_decoder> for the Cell; functional for the View. Then in the Library Manager window, select File -> Open to bring up a verilog.v window containing the Verilog code for the unit. You need to perform at least one edit on it and then save it. Now, in the verilog.v window, select File -> Exit. The edit operation on the file causes all the Verilog code to be checked for syntax errors on Exit. If you have errors, an HDL Parser Error/Warnings window comes up telling you that parsing of the Verilog file failed. A failed design check indicates syntax errors and, by clicking Yes in the HDL Parser Error/Warnings window, you can inspect the error report to gain some indication of where the error is, and the verilog.v window will reopen. Correct any errors in the top level description, save it and then exit to re-check for syntax errors. Repeat this until the code correctly passes the checks.

Printing
You can print out your code from the verilog.v window toolbar using File -> Print. This brings up a Printer window. Enter lpr -Plfprt and click on Print. Dismiss the verilog.v file using File -> Exit. If the HDL Parser Errors/Warnings window comes up, click No.

Having added your Verilog code for a unit, you can test it by running the testbench as described for each unit in the next section.

4.6 Video Capture and Frame Allocator Testing
It is assumed that each cell of the Video Capture system and the Frame Allocator will be developed, tested and modified until correct. It is then assumed that students will assemble the whole Video Capture system to test this for correct behaviour. To assist students with the development required in the Video Capture task, a test has been written for each cell and a test bench schematic is provided consisting of the test and the cell under test with appropriate connections between them. The test Verilog to exercise the cell is in a cell called <name of cell>_test. Note that Verilog system tasks in the test cells start with $.

The test bench schematic is in the cell called <name of cell>_testbench. For example, the test file for the frame_alloc cell is in frame_alloc_test and the cell's testbench is in frame_alloc_testbench. These test benches assume that your unit interfaces conform to those defined in this chapter. Furthermore, where control signals are specified as needing to be delayed by the same amount as the data, the test bench tests to see that the signals are delayed by the correct amount. You should not modify the test files or the test benches unless instructed to do so. In addition to the tests for the individual cells, a test is provided for the entire Video Capture. The cell vid_capture contains the cells vid_sync, vid_decode, vid_lookup, vid_add and vid_address; its test file is called vid_capture_test and its testbench vid_capture_testbench.

4.7 Test Benches
4.7.1 Running the Test Bench Simulations
The instructions for running each test bench are the same, but you should also refer to the sections below for any specific information concerning each test.
1. Start Cadence (start_cadence 20592) bringing up an icds window.
2. Select Tools -> Library Manager to bring up a Library Manager window.
3. Select COMP20592 for the Library; <name of test bench> for the Cell; schematic for the View.
4. Then in the Library Manager window, select File -> Open to bring up a schematic in a Virtuoso Schematic window.
5. In the Virtuoso Schematic window, select Xilinx -> Simulation to bring up a Virtuoso Verilog Environment window.


6. In the Virtuoso Verilog Environment window, enter COMP20592 for Library, <name of test bench> for Cell, schematic for View.
7. In the Virtuoso Verilog Environment window, initialise the simulator by clicking on the top left icon (of the running man).
8. When the icds window shows that initialisation is complete, in the Virtuoso Verilog Environment window click on the second icon down (of three separate ticks) to generate a netlist.
9. When the icds window indicates this is complete, click on the third icon down (box with input and output pulse) to launch the SimVision simulator. Eventually, this opens up two windows: a Design Browser 1 - SimVision window containing the design and a Console - SimVision window which is a command window.
10. If you want to view signals on the Waveform Viewer, in the Design Browser 1 - SimVision window, select the fifth icon from the right. This brings up a Waveform 1 - SimVision window. In the Design Browser 1 - SimVision window, in the left hand Design Browser window, use the left mouse button to select a particular block (to expand, press its + button). The signals for the highlighted box are displayed in the Name window to the right of the Design Browser box window. If a box is highlighted, all its signals can be sent to the Waveform Viewer by pressing the Waveform Viewer icon (fifth from the right) in the Design Browser 1 - SimVision window. Similarly, if you want just to send a selection of signals from those in the Name list, use the left mouse button to highlight the ones you want and send them to the Waveform Viewer using the fifth icon from the right in the Design Browser 1 - SimVision window.
11. Normally the time the simulation runs for is specified within the test bench. However, if you want to specify that the simulation runs for a set time, then in the Design Browser 1 - SimVision window, select the (small) down arrow button to the right of the Play button to bring up a box to its right. Fill in the specified time as a number, without commas and units, e.g. 10000 (default unit is nsecs), and press Enter.
12. Run the simulation by selecting the Play icon (leftmost icon of a solid white right arrowhead in the simulation toolbar) in the Design Browser 1 - SimVision window. A report file from running the simulation is in ~/Cadence/COMP20592/<name of cell>_testbench_run1/simout.tmp. This file also contains any output from the $display command in the test cell.

4.7.2 vid_sync_testbench
This test bench file tests the vid_sync cell by providing it with clock information and data plus control bytes. The control packets arrive at different positions to make sure that the unit can synchronise properly no matter when the first control byte is received. The input bytes in vid_sync_data.txt are input to the test cell and passed to the vid_sync cell. Assembled words from the vid_sync cell are returned to the test cell and written out to a file vid_sync_output.txt, see Figure 4.16. This should then be compared with the file vid_sync_ref.txt which contains the expected output. When Play is pressed, the simulator will run with a default stopping point of 11,000ns.

Figure 4.16: vid_sync Test Bench
[Figure: the test part of the test bench feeds byte data to the vid_sync DUT; the assembled results return to the test, which writes them under test control to output.txt.]


4.7.3 vid_decoder_testbench
vid_decoder_testbench operates in a similar manner to the vid_sync test bench shown in Figure 4.16. The vid_decoder_test cell gets its stimulus input from the vid_decoder_data.txt file. This comprises 32-bit assembled words in the BT.656 format. These, together with a request_in, are passed to the vid_decoder cell and its output results are written to file vid_decoder_output.txt. The information written to the output text file needs to be compared with the vid_decoder_ref.txt file, with any differences indicating an error.

4.7.4 vid_lookup_testbench
The test bench for vid_lookup enables a visual check to see that the contents of the lookup tables are sensible. This is achieved by opening up a (virtual) screen when 'Play' is pressed. The test cell then generates YCbCr input to the lookup unit and, every time a request to the following unit is generated, a PLI (Programming Language Interface) routine called 'vid_lookup_check' is invoked. This takes the lookup outputs, does the additions and then displays the resulting colour on the screen. If operating correctly, you should see wide vertical strips which from left to right are red, green, blue and white. The test cell also opens up an output file vid_lookup_output.txt which stores the control outputs (hsync, vsync, vblank, field and add_req) as the simulation runs; this is useful for debugging if your screen output is not as described.

4.7.5 vid_add_testbench
The vid_add test bench checks that the computations performed on the lookup table results produce an appropriate image. The test bench works in a similar way to the vid_lookup test bench. It too starts a virtual screen enabling a visual check to be performed on the unit's results. The test cell generates table lookup inputs to the vid_add unit and, every time a request is generated to the following unit, the vid_add_check PLI is invoked so that the colour generated by the vid_add unit is displayed; there are no additions this time in the PLI, of course, because the vid_add unit does these. The output displayed by running the test bench should be exactly the same as that for the vid_lookup test, i.e. wide vertical strips of red, green, blue and white. The test cell also opens up an output file vid_add_output.txt which stores the control outputs generated during the test.

4.7.6 vid_address_testbench


The vid_address test bench provides a visual check on the correct operation of the unit with regard to the addresses it generates within the frame and the clipping performed. The test cell reads a full size image comprising a rectangular blue core completely surrounded by a one pixel wide boundary of green, which in turn is surrounded by a multi-pixel wide red boundary, see Figure 4.17. If the vid_address cell is working correctly, the clipped image remaining will be just the blue core with its one pixel wide green boundary.

When 'Play' is pressed to run the simulation on the test bench, a virtual screen opens up displaying the original image. Pressing 'Blank' on the screen toolbar resets the screen to all black and then pressing 'Play' again causes the clipped image to be displayed.

4.7.7 frame_alloc_testbench
This test bench uses a fixed camera address, a fixed processor address and three offsets to check that the correct offset is being added, corresponding to the rules outlined for changing the offsets under different conditions. When play is pressed, the test runs with each test checking a particular frame allocation rule. The test halts if a fault is encountered and a fault code is written to the results file frame_alloc_output.txt, while if all tests pass, a 'tests pass' message is written to the output file. Rule 1 (ARM offset = VDU offset) is checked in all 32 tests. Table 4.8 shows the rule a fault code has infringed.

Table 4.8: Frame Allocation Test Faults

  Fault Code                                    Rule Infringement
  0                                             initialisation offsets incorrect
  26, 27                                        rule 2
  23                                            rule 3
  2, 5, 8, 11, 12, 15, 18, 21, 25, 28, 29, 30   rule 4
  1, 4, 7, 10, 14, 17, 20, 24                   rule 5
  3, 6, 9, 13, 16, 19, 22, 31                   rule 6

4.7.8 vid_capture_testbench
This tests all units in the Video Capture system, i.e. vid_sync, vid_decoder, vid_lookup, vid_add and vid_address. The test supplies camera data to the system and then displays the frame that is written into memory on a virtual screen. When the 'Play' button is pressed, the full size image displayed should be a red and blue checker board pattern with two yellow horizontal lines one pixel in (i.e. along the second and next to last line) and two yellow vertical lines one pixel in (i.e. along the second and next to last column). The blue and red blocks are 16 by 16 pixels and the camera image commences with a 16 by 16 red block in the top left. However, clipping should cause the top left block to be a 12 horizontal pixels by 8 vertical pixels red block in the top left of the clipped image.


Figure 4.17: Screen Output for vid_address Test Bench
[Figure: a) initial image: a blue core inside a one-pixel green boundary inside a wide red border; b) after clipping: just the blue core with its green boundary.]


4.8 Exiting and Stopping
4.8.1 Exit from Simulation
To exit from the Waveform Viewer, select File -> Exit SimVision in the Waveform 1 - SimVision window. To exit from the simulator, in the Design Browser 1 - SimVision window select File -> Exit SimVision. This removes both the Design Browser 1 - SimVision window and the Console - SimVision window. To exit from the simulator environment, in the Virtuoso Verilog Environment window select Commands -> Close. Note that the created netlist does not need to be recreated if further simulations are performed unless there are changes to the Verilog code. In the Virtuoso Schematic window, select Window -> Close. Finally, exit from Cadence by clicking on File in the icds window and then selecting Exit. In the Exit icds? window this brings up, click yes.

4.8.2 Stopping the Simulation
If for any reason the simulation fails to complete within a few seconds and the time is galloping on uncontrollably, you need to stop the simulation! In the Design Browser 1 - SimVision window, the button next to the play button (having two parallel vertical lines) is a stop button and will halt the simulation. Alternatively, use Simulation -> Stop.

4.8.3 Exit from Cadence
To exit from Cadence, click on File in the icds window and then select Exit. In the Exit icds? window this brings up, click yes.

4.8 Transferring the Video Capture and Frame Allocator Design to the iIMP Hardware
When you are convinced that the Verilog model of your Video Capture system and the Frame Allocator hardware are correct, then it is ready for transfer to the hardware. Please refer to the chapter on integrating the system (which will be handed out later) for this.

4.9 Arbiter
4.9.1 Priority
The Arbiter receives requests for memory accesses from the ARM, VDU Controller and the Video Capture system. Addresses from these components come via the Frame Allocator, while the data and control information come directly from the components, as illustrated in Figure 4.18.


The Verilog for the VDU Controller is written for you and this regularly sends 32-bit read requests to memory for two pixels from its frame. The pixels read are displayed on the monitor that the VDU Controller is connected to. 32-bit write requests from the Video Capture system have already been described. Requests from the ARM processor relate to the image processing performed by the ARM and are also sent as 32-bit read and write requests; when the system is fully integrated, it is envisaged that the ARM will only make read requests. The Arbiter determines which unit will get the next memory access. You are provided with Verilog code for the Arbiter, which implements a simple priority scheme.

As the memory cycle occupies two FPGA clock cycles, arbitration is only performed on every alternate clock cycle. At the start of an arbitrating clock cycle, the Arbiter looks at all the memory access requests present at the positive clock edge and selects the highest priority request present for being granted a memory access which commences in the next clock cycle. If the monitor is not supplied at the rate it requires then the image will flicker, and for this reason the VDU Controller is given top priority. Requests from the Video Capture System are given the next highest priority as it produces requests at a slower rate than the VDU Controller. ARM requests are given bottom priority as it does not matter when these are serviced. Thus the ARM can have any of the memory accesses not required by the VDU Controller or the Video Capture System. Note that both the VDU Controller and the Video Capture system only make memory access requests when displaying or writing the active image area, so neither unit is continuously active.
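The provided Arbiter source is the reference; purely to illustrate a fixed-priority selection of this kind (all names assumed):

  // Fixed-priority grant, evaluated on arbitrating cycles only:
  // VDU > Video Capture > ARM, per the scheme described above.
  reg [2:0] grant;                     // one-hot: {arm, camera, vdu}
  always @(posedge clk)
    if (arbitrate)                     // high on every alternate cycle
      casez ({arm_req, camera_req, vdu_req})
        3'b??1:  grant <= 3'b001;      // VDU has top priority
        3'b?10:  grant <= 3'b010;      // then the Video Capture system
        3'b100:  grant <= 3'b100;      // ARM takes what is left
        default: grant <= 3'b000;      // no request pending
      endcase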

[Figure 4.18: Hardware Connected to the Frame Allocator and Arbiter. The ARM emulator (jimulator) & debugger (Komodo), the VDU Controller and the Video Capture system (fed by the digital image) present frame addresses to the Frame Allocator, which passes the corresponding real address to the Arbiter; data travels directly between each of these units and the memory. The monitor is driven by the VDU Controller. The Arbiter, VDU Controller, Video Capture and Frame Allocator sit within the Verilog model/FPGA.]


4.10.2 Timing
As arbitration occurs every alternate cycle (corresponding to a rate of 12.5MHz) and memory accesses occupy two cycles, arbitration occurs concurrently with the completion of the second memory cycle. That is, while the memory is completing its current access, the Arbiter is determining which unit will be granted the memory access which will start in the next cycle. When active, the VDU Controller will request and obtain every alternate memory cycle, i.e. one 32-bit request every 160ns (or a rate of 6.25MHz). Since pixels are duplicated when sent to the screen, so that the 240 lines by 320 pixels in memory appear as the required 480 lines by 640 pixels on the screen, the memory request rate of 2 pixels per 160ns matches the required monitor display rate of 4 pixels per 160ns. During the active part of the image, the Video Capture system produces a 32-bit write request from the vid_address unit on average every 296.3ns (corresponding to a frequency of 27/8 MHz, allowing for the buffering of two pixels in vid_address). This indicates that the Video Capture system can be expected to require between 1 in 3 and 1 in 4 memory accesses. Typical timing for the Arbiter and memory accesses if all three components simultaneously request a memory access is illustrated in Figure 4.19.

Note that if requests are present, arbitration during the second clock cycle of a memory access allows the memory to be kept constantly busy.

[Figure 4.19: Typical Arbiter and Memory Access Timing. Against positive clock edges 40ns apart, the diagram shows the VDU, Video Capture and ARM request lines, the arbitrate slots on alternate cycles, and the resulting two-cycle memory accesses in the order: VDU, video capture, VDU, ARM, VDU, video capture.]


Chapter 5: I2C Master Communication Task

5.1 Task Summary
This task involves setting up the required communication between the I2C Master, which is an open-source synthesisable Verilog model, and the Video Decoder board and the PIC microprocessor which act as I2C slaves. The software to drive the master will be written in C and run on the ARM. A test environment is provided enabling a check of whether the Master registers are being loaded up correctly and an evaluation of whether the master is adhering to the I2C protocol in communicating with the video decoder board and a dummy slave PIC. A summary of the software required in this task follows:
• write a library of functions to:

  - initialise the I2C Master
  - write a message to a slave device across the I2C lines
  - read a message from a slave across the I2C lines
  - write a byte to a register in the FPGA
  - read a byte from a register in the FPGA

• test the initialisation function
• test the write a message function by writing supplied set-up data to a sequence of addresses on the Video Decoder board
• test the write a message function by writing 4 bytes of data to a PIC specifying a camera x and y position
• test the read a message function by repeatedly reading a byte from the PIC

5.2 Introduction
I2C is a shared bus system having two lines: SCL, which is a clock line, and SDA, which is a data line. (A ground wire also normally accompanies the I2C clock and data lines.) In the iIMP system, there is a single master and two slaves as shown in Figure 5.1.

A device has a tri-state driver onto the SDA and SCL lines and can receive the line signals. However, the lines can only be driven low by a device, since a high turns off the device's output driver. Thus if no device is outputting a '0' onto a line, the line is not driven and floats. So pull-up resistors are used to take the lines high to the supply voltage when not being driven.

[Figure 5.1: I2C Bus in iIMP. The I2C Master (ARM board), an I2C Slave (VDEC Board) and an I2C Slave (PIC) all connect to the shared SCL and SDA lines; each line is tied to the supply voltage Vp through a pull-up resistor R.]


These resistors are included in the interface between the I2C Master and the model of the slave, and are included in the interface board connecting the master and slave devices, so this has been done for you both when modelling and in the hardware implementation.

Two speeds are standard on the I2C clock line. These are 100Kbps (normal) and 400Kbps (fast). The I2C clock line speed is achieved by dividing down the Master clock speed by writing a scaling factor to registers within the Master. The Master clock will operate at the speed of the FPGA, i.e. 25MHz. Since you can put any scaling factor in the registers, you can choose to run at any lower speed, but you should aim to use 100Kbps on both the Video Decoder board and the PIC.

5.3 Open Source I2C Master
The I2C Master used is open source Verilog code (which is Wishbone bus compatible) written by Richard Herveille. The Master contains seven 8-bit registers which are written to in order to initiate transfers across the I2C lines at the desired rate, and are read in order to detect the completion of a transfer request or information from an I2C slave. If you are interested in the full details, the Master Specification can be found at:
http://www.latticesemi.com/documents/doc21615x46.pdf
A helpful tutorial on the operation of a single master I2C bus can also be found at:
http://www.best-microcontroller-projects.com/i2c-tutorial.html

However, since it is only necessary for you to understand the register organisation in order to be able to transmit and receive along the I2C data lines, only details of the Master registers are given. These are:

Table 5.1: I2C Master Registers

  FPGA Address   Name     Access   Description
  0x30000000     PRERlo   RW       clock divider - least significant byte
  0x30000002     PRERhi   RW       clock divider - most significant byte
  0x30000004     CTR      RW       control register
  0x30000006     TXR      W        transmit register
  0x30000006     RXR      R        receive register
  0x30000008     CR       W        command register
  0x30000008     SR       R        status register

Although the registers are only 8 bits wide, only 16-bit addresses can be generated by the ARM to the FPGA; thus the Master registers have even addresses. Furthermore, registers on the FPGA are in the ARM address space starting at address 0x30000000. Thus the addresses given in Table 5.1 are the system addresses.

Register 4 (CTR): You can only read/write bits 6 and 7 in the Control Register. Bit 7 is an Enable Master bit, with '1' denoting the Master is enabled and '0' disabled. Bit 6 is an Enable Master Interrupt bit, with '1' indicating that interrupts are enabled and '0' disabled.
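For reference in the code sketches that follow, the register map and the CTR bits can be captured as C constants. The macro names below are invented for illustration; the addresses and bit positions come from Table 5.1 and the CTR description:

    /* Illustrative names for the Master registers (system addresses from
       Table 5.1) and the CTR bits. In the lab code these registers are
       reached through the swi__ access functions of Section 5.7.1, since
       the FPGA address space is only accessible in supervisor mode. */
    #define I2C_PRERLO 0x30000000  /* clock divider - least significant byte */
    #define I2C_PRERHI 0x30000002  /* clock divider - most significant byte  */
    #define I2C_CTR    0x30000004  /* control register                       */
    #define I2C_TXR    0x30000006  /* transmit register (write only)         */
    #define I2C_RXR    0x30000006  /* receive register (read only)           */
    #define I2C_CR     0x30000008  /* command register (write only)          */
    #define I2C_SR     0x30000008  /* status register (read only)            */

    #define CTR_EN     0x80        /* bit 7: Enable Master                   */
    #define CTR_IEN    0x40        /* bit 6: Enable Master Interrupt         */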




Registers 0 and 2 (PRER): Apart from the clock divider registers, which are initialised to 0xFFFF, all other registers are initialised to zero. The two clock divider registers combine to give a 16-bit quantity specifying the scaling factor required between the FPGA clock and the I2C clock. Internally, the Master uses a clock at 5 times the SCL rate, so the clock divider is set to this ratio minus 1 using the formula:

    clock divider regs = (FPGA clock frequency / (5 * SCL clock frequency)) - 1

to give a 16-bit integer. These should be the first register writes when communicating with a slave, but the value of the clock divider registers can only be written if the Enable Master bit (bit 7) in the Control Register is '0'.

Register 6 (TXR): The Transmit Register holds the byte of data to be written bit by bit to a slave, starting with the most significant bit. The transfer commences with the sending of the slave address, and this address is unique. This informs the slave that the byte(s) following are for it. If the slave contains registers then the next byte written will (usually) be a register address within the slave. This is then followed by one or more bytes of data until the slave receives a STOP bit. The slave address occupies bits 7 to 1, with bit 0 indicating whether data will be written to the slave ('0') or data will be read from the slave ('1'). The 7-bit slave address chosen for the PIC is 0100000, while the slave address used for the Video Decoder board is 0100001.
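The byte placed in TXR to start a transfer is therefore the 7-bit slave address shifted up one place with the read/write flag in bit 0. A small illustration (macro names invented):

    /* The two slave addresses and the address-byte layout: 7-bit address
       in bits 7-1, read/write flag in bit 0. */
    #define PIC_ADDR   0x20                        /* 0100000 binary */
    #define VDEC_ADDR  0x21                        /* 0100001 binary */
    #define ADDR_BYTE(addr, rd) (((addr) << 1) | (rd))
    /* e.g. ADDR_BYTE(PIC_ADDR, 0) == 0x40: a write to the PIC */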

Register 6 (RXR): The Receive Register holds the data byte requested from the addressed slave unit.

Register 8 (CR): The Command Register is a write-only register, and writing to the Command Register initiates actions in the I2C Master. Bits 1 and 2 are reserved, and the format of the remaining bits is as follows:

Table 5.2: Command Register in the I2C Master

  Bit   Description
  7     START - generate start or repeated start bit
  6     STOP - generate stop bit
  5     RD - read from slave; current byte transfer is a read from slave
  4     WR - write to slave; current byte transfer is a write to slave
  3     ACK - acknowledges end of a byte transfer (see below)
  0     IACK - Interrupt Acknowledge. When set, it clears the interrupt flag in the Status Register

All values are cleared automatically by the Master. After every byte that the Master writes to the Slave, the Slave sends an acknowledge bit (ACK = '1') back to the Master. After every byte that the Master reads from the Slave, the Master acknowledges the receipt of the byte with either ACK, i.e. '1', if more bytes are to be read, or NACK, i.e. '0', if no more bytes are to be read. In the communication defined with the PIC, only one byte is read from the PIC, so the Master always generates a NACK.




Register 8 (SR): The status of a transfer is indicated by the Status Register, which is a read-only register. Bits 2 to 4 are reserved. The remaining bits have the following meaning:

Table 5.3: Status Register

  Bit   Description
  7     RxACK - goes to '0' when acknowledge received from slave
  6     I2C bus busy - goes to '1' on START and resets to '0' on STOP
  5     Arbitration Lost - Master drives data line high but it is held low, or STOP detected when not set
  1     TIP - Transfer In Progress = '1' when transferring data; = '0' when data and clock lines idle
  0     IF - Interrupt Flag. If set to '1', the processor will be interrupted if the IEN bit is also set. IF is set at the end of a byte transfer or when arbitration is lost.

Interrupts are not used in the current system. Instead, polling is used (as described later) to determine the completion of a read or write request.

The normal sequence of events for a write and read transfer is shown in Figure 5.2, with data sent by the slave shaded. All other bits are sent by the Master.

To set up the 2-byte write transfer shown in Figure 5.2a):
1. Write the slave address + the write bit to the Transmit Register.
2. Write the START bit and the WR bit to the Command Register.
3. Poll the RxACK bit in the Status Register until it clears.
4. Write the data byte to be sent to the slave to the Transmit Register.
5. Write the WR bit to the Command Register.
6. Poll the RxACK bit in the Status Register until it clears.
7. Write the next data byte to be sent to the Slave to the Transmit Register.
8. Write the STOP bit and the WR bit to the Command Register.
9. Poll the RxACK bit in the Status Register until it clears. The transfer is then complete.
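These nine steps translate directly into register writes and status polls. The sketch below reuses the illustrative macros from earlier, adds Command and Status Register bits taken from Tables 5.2 and 5.3, and assumes swi__read_i2c() returns the byte read:

    /* Command Register bits (Table 5.2) and the Status Register bits
       polled below (Table 5.3); names invented for illustration. */
    #define CR_STA   0x80   /* bit 7: generate START          */
    #define CR_STO   0x40   /* bit 6: generate STOP           */
    #define CR_RD    0x20   /* bit 5: read from slave         */
    #define CR_WR    0x10   /* bit 4: write to slave          */
    #define CR_ACK   0x08   /* bit 3: acknowledge control     */
    #define SR_RXACK 0x80   /* bit 7: acknowledge from slave  */
    #define SR_TIP   0x02   /* bit 1: transfer in progress    */

    /* Sketch of the 2-byte write of Figure 5.2a); the numbered comments
       refer to the steps in the list above. */
    void i2c_write2(unsigned char slave, unsigned char b0, unsigned char b1)
    {
        swi__write_i2c(I2C_TXR, ADDR_BYTE(slave, 0));  /* 1: address + write bit */
        swi__write_i2c(I2C_CR, CR_STA | CR_WR);        /* 2: START + WR          */
        while (swi__read_i2c(I2C_SR) & SR_RXACK) ;     /* 3: wait for slave ACK  */

        swi__write_i2c(I2C_TXR, b0);                   /* 4: first data byte     */
        swi__write_i2c(I2C_CR, CR_WR);                 /* 5: WR                  */
        while (swi__read_i2c(I2C_SR) & SR_RXACK) ;     /* 6: wait for slave ACK  */

        swi__write_i2c(I2C_TXR, b1);                   /* 7: second data byte    */
        swi__write_i2c(I2C_CR, CR_STO | CR_WR);        /* 8: STOP + WR           */
        while (swi__read_i2c(I2C_SR) & SR_RXACK) ;     /* 9: transfer complete   */
    }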


[Figure 5.2: Read & Write Sequence on I2C Bus.
  a) Write Transfer of 2 Bytes: START | slave address | W | ACK | data | ACK | data | ACK | STOP
  b) Read Transfer of 1 Byte: START | slave address | R | ACK | data | NACK | STOP]


Note: if the slave unit has many registers addressable via the I2C interface (e.g. those in the Video Decoder board) then the byte(s) following the slave address is usually a register address, with the data to be written to that register following.

To perform a read transfer (Figure 5.2b shows the single-byte case; the sequence below reads two bytes):
1. Write the slave address + the read bit to the Transmit Register.
2. Write the START bit and the WR bit to the Command Register.
3. Poll the RxACK bit in the Status Register until it clears.
4. Write to the RD and ACK bits in the Command Register.
5. Poll the TIP bit in the Status Register until it clears.
6. Read the Receive Register.
7. Write to the RD, ACK and STOP bits in the Command Register.
8. Poll the TIP bit in the Status Register until it clears.
9. Read the Receive Register.
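For the PIC only one byte is read, so the sequence collapses to steps 1 to 3 followed by steps 7 to 9. A minimal sketch, reusing the illustrative macros above:

    /* Sketch of the single-byte read used for the PIC (steps 1-3 then
       7-9 of the list above, matching Figure 5.2b). */
    unsigned char i2c_read1(unsigned char slave)
    {
        swi__write_i2c(I2C_TXR, ADDR_BYTE(slave, 1));    /* address + read bit   */
        swi__write_i2c(I2C_CR, CR_STA | CR_WR);          /* START + WR           */
        while (swi__read_i2c(I2C_SR) & SR_RXACK) ;       /* wait for slave ACK   */

        swi__write_i2c(I2C_CR, CR_RD | CR_ACK | CR_STO); /* read byte, then STOP */
        while (swi__read_i2c(I2C_SR) & SR_TIP) ;         /* wait for the byte    */

        return swi__read_i2c(I2C_RXR);                   /* collect the byte     */
    }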

5.4 Writing to the PIC
As a result of the image processing performed by the ARM, the ARM should compute some horizontal and vertical position that the camera should be at, and these should be placed at some location agreed between the student(s) assigned to the I2C Communication and image processing tasks.

The camera centre is taken as the zero point in the horizontal and vertical axes. A positive x position is to the right of the centre, negative to the left. A positive y position is up and a negative y position down. The x and y camera positions are sent as sixteen-bit signed (two's complement) numbers and thus each requires two byte transfers across the I2C lines. This is preceded by sending the PIC slave address. Thus when writing to the PIC, the I2C Master sends 1 byte of PIC address + write followed by 4 bytes specifying the desired x position in 10*degrees (2 bytes) and the y position in 10*degrees (2 bytes). For example, a y value of -400 specifies a downwards camera position 40 degrees from the horizontal. For the x and y position, the ms half is sent first followed by the ls half. The transfer (which needs to be at 100Kbps) is as shown in Figure 5.3.

The I2C bus is idle when SDA and SCL are high. Taking SDA low with SCL high signifies the START of a transfer. The START signal (at any time) always causes a slave device to reset its internal logic. Following a START signal, bits on the SDA line only change state when SCL is low. Following the receipt of the last ACK bit from the slave, the Master maintains SDA low and releases the SCL line, allowing it to go high. Releasing the SDA line then allows SDA to go high, and with the SCL line high this signifies a STOP condition. SDA and SCL are both now high so the lines are idle. Note that apart from ACK signals on SDA which are sent by the PIC, all the clock pulses on SCL and all other data signals shown on Figure 5.3 are sent by the Master in this transfer. However, during the transfer, the PIC I2C Slave may decide that it is unable to keep up with the rate at which the Master is sending the data. In this case, the Slave can pause the transmission by driving the clock line low. This is called clock stretching, and the continuing low state of the line is detected by the Master when it releases the clock line. When the PIC finally releases the clock line, SCL floats high as neither the Master nor Slave now drive it low, and the transmission will resume.

Provided you have correctly written to the Master registers, the pulses shown on the SDA and SCL lines should automatically be generated by the Master.
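Packing the position message is straightforward. A sketch, assuming the i2c_write() function specified in Section 5.7.1 and the illustrative PIC_ADDR constant from earlier:

    /* Sketch of sending a camera position to the PIC: four data bytes,
       ms half first for each 16-bit signed value in 10*degrees. */
    void pic_set_position(short x, short y)   /* e.g. y = -400: 40 degrees down */
    {
        unsigned char msg[4];

        msg[0] = (x >> 8) & 0xFF;   /* x, ms half first */
        msg[1] = x & 0xFF;          /* x, ls half       */
        msg[2] = (y >> 8) & 0xFF;   /* y, ms half       */
        msg[3] = y & 0xFF;          /* y, ls half       */

        i2c_write(PIC_ADDR, 4, msg);  /* the Master prepends PIC address + write */
    }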

[Figure 5.3: Writing to the PIC via the I2C Bus. The SCL and SDA waveforms show, in time order: Start, the PIC address bits 0100000, Wr, ack, x15 to x8, ack, x7 to x0, ack, y15 to y8, ack, y7 to y0, ack, Stop.]

5.5 Reading from the PIC
The PIC I2C Slave has a single status register. Once the desired x and y position has been written to the PIC, the PIC Status register should be polled (i.e. continuously read) by the Master until the motor movement is complete.


The format of the PIC Status register is shown in Figure 5.4.

Figure 5.4: PIC Status Register

  D7      D6      D5       D4      D3       D2      D1                   D0
  spare   spare   y high   y low   x high   x low   y motor in motion    x motor in motion

The low and high bits for x and y are soft limits on the motor movement, indicating the motor is at its (software) extremity. If an x or y value is sent which would cause the camera position to exceed the software limits set in the PIC for the camera position, then the camera should be positioned at its software limit and the appropriate bit set in the PIC status register. The limit bit should remain on as long as the camera is at this limit position.

From the viewpoint of the communication with the ARM board, bits D0 and D1 are important. At least one of D0 and D1 must be set to '1' following a write transmission. Moreover, both D0 and D1 can be '1' simultaneously, indicating both motors in motion in parallel. Thus a zero in bits D0 and D1 indicates the motors are no longer in motion and are at the position requested.

Polling is effected by sending repeated requests from the Master to the PIC Slave until bits D0 and D1 in the PIC's Status Register are zero. To read this register, the Master sends 1 byte: the PIC address + a read bit. In the next byte the Master receives the data in the PIC's Status Register. Again 100Kbps is used for the transfer rate. The format of the transfer is shown in Figure 5.5.

The ACK signal and D0 to D7 are driven onto the SDA line by the PIC Slave, while the clock signal SCL and all other data on SDA, including NACK, is driven by the Master. Again the PIC Slave can clock stretch to pause transmission if required.

At the Master end, having requested the reading of the PIC Status register, the ARM polls the Master's Status Register to determine if the data byte from the PIC has been received. When it has arrived in the Master, the ARM will then read the Master's Receive Register, test the two least significant bits (D0 and D1) and will send another request to read the PIC's Status Register if either D0 or D1 is non-zero.

[Figure 5.5: Reading from the PIC via the I2C Bus. The SCL and SDA waveforms show, in time order: Start, the PIC address bits 0100000, Rd, ack, D7 to D0, nack, Stop.]
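The polling loop at the ARM end can then be as simple as the sketch below, which reuses the illustrative i2c_read1() from Section 5.3:

    /* Sketch of polling the PIC: keep re-reading the Status register
       until the two motor-in-motion bits (D1 and D0) are both zero. */
    void pic_wait_until_stopped(void)
    {
        while (i2c_read1(PIC_ADDR) & 0x03)   /* D1: y motor, D0: x motor */
            ;                                /* still moving: read again */
    }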



5.6 Writing to the Video Decoder Board
On starting up the system, the Video board needs to be initialised so that it converts the camera data to the digital output format expected. This involves writing the Video Decoder slave address, followed by the register address to be written, followed by the register data. The required sequence of addresses and data is given below, with all address and data values in hex.

Address 0x00 = 0x04
Address 0x15 = 0x00
Address 0x17 = 0x41
Address 0x3A = 0x16
Address 0x50 = 0x04
Address 0x0E = 0x80
Address 0x50 = 0x20
Address 0x52 = 0x18
Address 0x58 = 0xED
Address 0x77 = 0xC5
Address 0x7C = 0x93
Address 0x7D = 0x00
Address 0xD0 = 0x48
Address 0xD5 = 0xA0
Address 0xD7 = 0xEA
Address 0xE4 = 0x3E
Address 0xE9 = 0x3E
Address 0xEA = 0x0F
Address 0x0E = 0x00

In normal operation, once the Video Decoder is set up there should be no further need to write to it, and all subsequent communication should be between the PIC and the ARM.
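A table-driven initialisation keeps this sequence manageable. A sketch, assuming the i2c_write() function of Section 5.7.1 (each transfer is the 2-byte write of Figure 5.2a) and the illustrative VDEC_ADDR constant:

    /* Sketch of the Video Decoder initialisation: each entry is written
       as a 2-byte transfer (register address, then data) to VDEC_ADDR. */
    static unsigned char vdec_setup[][2] = {
        {0x00, 0x04}, {0x15, 0x00}, {0x17, 0x41}, {0x3A, 0x16},
        {0x50, 0x04}, {0x0E, 0x80}, {0x50, 0x20}, {0x52, 0x18},
        {0x58, 0xED}, {0x77, 0xC5}, {0x7C, 0x93}, {0x7D, 0x00},
        {0xD0, 0x48}, {0xD5, 0xA0}, {0xD7, 0xEA}, {0xE4, 0x3E},
        {0xE9, 0x3E}, {0xEA, 0x0F}, {0x0E, 0x00},
    };

    void vdec_init(void)
    {
        int i;
        for (i = 0; i < 19; i++)              /* 19 register writes in all */
            i2c_write(VDEC_ADDR, 2, vdec_setup[i]);
    }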

5.7 Development

5.7.1 C Programs
You need to develop C programs to drive the open source synthesisable I2C Master given to you. One program needs to contain a set of functions to drive the I2C Master. You should place your library of functions in a C program named i2c.c. It is suggested that you will need five functions in this library (prototypes are sketched after this list):
1. i2c_init() which initialises the Master by setting the clock speed and Master enable
2. i2c_write(slave address, number of bytes, data pointer) which writes a message to the slave
3. i2c_read(slave address, number of bytes, data pointer) which reads a message from the slave
4. swi__write_i2c(FPGA reg address, data byte) which writes a byte of data to a FPGA register
5. swi__read_i2c(FPGA reg address, data byte) which reads a byte of data from a FPGA register
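One possible set of prototypes for i2c.c is sketched below; the exact signatures are yours to choose:

    /* Possible prototypes for i2c.c. The manual gives swi__read_i2c a
       data parameter; a value-returning form is assumed in the sketches
       in this chapter. */
    void i2c_init(void);
    void i2c_write(unsigned char slave, int nbytes, unsigned char *data);
    void i2c_read(unsigned char slave, int nbytes, unsigned char *data);
    void swi__write_i2c(unsigned int reg, unsigned char data);
    unsigned char swi__read_i2c(unsigned int reg);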

The first three functions need to use your swi__<name> functions in order to communicate with the Master registers which reside in the FPGA. Writing to or reading from the FPGA requires that the ARM code executing be in privileged supervisor mode, since FPGA registers are in a protected space. However, a C program only ever generates ARM code operating in user mode. Thus it is necessary to replace the user-mode branch and link ARM instructions generated by the compiler with supervisor mode SWI (SoftWare Interrupt) instructions which allow the desired access to the FPGA registers. The points at which such replacement is required are recognised by the function naming convention swi__<name>. Extra stages of processing after the initial compilation into ARM object code look for these swi__ calls and replace the branch and link with a SWI instruction which directs the code to the SWI handler and, on return from the SWI instruction, places the code back into user mode.

The other C program should be a test program which uses i2c_init, i2c_write and i2c_read to check the operation of the Master. You should not use swi__ calls in this program. The test program needs to initialise the Master, then set up the registers in the Video Decoder board as described, and finally send write transfers to and receive status information from the PIC.
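Such a test program might be structured as in the sketch below, reusing the illustrative helpers from earlier sections (the position values are arbitrary):

    /* Sketch of test.c: drives the Master only through the library
       functions, with no direct swi__ calls. */
    int main(void)
    {
        unsigned char status;

        i2c_init();                   /* 100Kbps, Master enabled           */
        vdec_init();                  /* set-up sequence of Section 5.6    */

        pic_set_position(150, -400);  /* e.g. x = +15.0, y = -40.0 degrees */
        do {
            i2c_read(PIC_ADDR, 1, &status);  /* poll the PIC Status register   */
        } while (status & 0x03);             /* until both motors have stopped */

        return 0;
    }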

5.7.2 Compilation into Executable Code
The following assumes that you have two C programs: a library of functions including swi__ functions called i2c.c, and a test program called test.c which contains no swi__ functions. These have to be compiled into object code, with the library requiring extra processing because of its swi__ functions. All C programs containing SWIs have to be dealt with in the same pass and compiled into a single object code file. In addition, the ARM has hardwired vectors defining fixed addresses in memory from where the different types of interrupts (including SWIs) start executing instructions. This information is also required as the I2C code uses SWIs. It is kept in an ARM assembler code file (init.s); the conversion to object code can be done in a single step as the ARM code can specify the SWIs correctly in its code. Finally, all the object code files need to be linked together to provide an executable .elf file that can run on Komodo and produce information on the emulation screen.

In a shell window, perform the following operations in the sequence given:
1. From your COMP20592 directory, create a new directory for your work and attach to it by typing:
   mkdir <dir name>
   cd <dir name>
2. Set a path which points to the ARM development tools by typing:
   PATH=$PATH:/home/cadtools5/gnuarm-3.4.3/bin
3. Produce an object file init.o from init.s, initialising the ARM, by typing:
   arm-elf-gcc -c gcc_support/init.s -o init.o
4. C programs containing swi__ functions require a 3-stage process to obtain object code. For the i2c.c library of functions, type:
   arm-elf-gcc -S <list of all C code files containing SWIs separated by a space> -o i2c.s
   $COMP20592/gcc_support/swi_pp i2c.s
   arm-elf-gcc -c -O3 pp_i2c.s -o i2c.o
   The first command generates an ARM assembly code file i2c.s from all the C programs; the -S stops it being assembled. The second modifies the ARM assembler code to include SWI instructions, producing a file pp_i2c.s (pp denotes post processing). The third produces object code i2c.o from the modified ARM code file.
5. Any C program not containing SWIs, e.g. test.c, can now be directly compiled into object code by typing:
   arm-elf-gcc -I $COMP20592/gcc_support -c test.c -o test.o
   This step needs to be repeated for each C code file not containing swi__ functions.
6. The different object code files now have to be linked by typing:
   arm-elf-ld -T $COMP20592/gcc_support/lscr init.o i2c.o test.o -o <name of program>.elf
   lscr refers to a link script for allocating memory to object files. The output of this command is a .elf file which Komodo can run; the simulation environment can now be opened.


5.7.3 Emulation Environment
A software emulation environment is provided so that you can develop your C code for the I2C Master prior to integrating it into the system hardware.
1. In a shell window, start Komodo by typing:

   new_start_komodo -c 20592 &
   This opens up Komodo, a virtual screen for displaying an image (with LEDs at the bottom corresponding to the board LEDs) and an emulation screen. The console window is shown in Figure 5.6.

The emulation screen displays the I2C operation of the three components attached to the I2C bus with regard to the transmissions that the Master sends. The console areas record the information that the units send or receive from the time of Reset (top left button). This information is usually in black text. However, red indicates a warning or an error such as an incorrect value; comments are in blue.

2. Press the Reset button on the emulation screen.
3. In the Komodo window, use the browse button for Load to select your .elf file. (You'll probably have to double click on the directory button in the Select Source File window to display all the files in your COMP20592 directory). Press OK to load it into the Load line of the Komodo window.
4. Press Load in the Komodo window to load the ARM.
5. In the Komodo window, press Reset and Run and observe the sequences displayed on the emulator screen, checking for correctness.

Figure 5.6: Emulation Screen for the I2C Master Task
