
1

The SHARC

Super Harvard Architecture Computer

2

The SHARC

• Developed by Analog Devices

• Optimized for demanding DSP and imaging applications.

• 32-bit floating point, with 40-bit extended floating-point capability.

• Large on-chip memory.

• Ideal for scalable multi-processing applications.

3

Harvard Architecture

• Program memory can store data.

• Able to simultaneously read or write data at one location and get instructions from another place in memory.

• 2 buses:

1 Data memory bus.

2 Program bus.

• Either two separate memories or a single dual-port memory.

4

Super Harvard Architecture

• Many processors employ the Harvard Architecture by having two separate memories or caches integrated into the processor chip.

• The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data. This is what makes it SUPER!!!

5

DSP

• Digital Signal Processor.

• Requires high-speed, low-overhead data movement and rapid computation.

• Usually has a small on-board ROM, RAM and single cycle multiply.

• Designed to run single line, serial in, serial out, signal processing applications very fast.

6

DSP Computations

• The inner product of two vectors is a common computation for determining energy or correlation.

• The following C code is an example: for (n=0; n<length; n++) result+= x[n] * y[n];

• The processor with the lowest instruction time for this loop will have the best performance.
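
The one-line loop above can be wrapped into a complete function. This is a minimal sketch in plain C; the function name inner_product and the use of float are assumptions for illustration, not something the slides specify.

```c
#include <stddef.h>

/* Inner product of two vectors x and y, each holding `length` samples.
   Each pass through the loop is one multiply and one accumulate. */
float inner_product(const float *x, const float *y, size_t length)
{
    float result = 0.0f;   /* the running total must start at zero */

    for (size_t n = 0; n < length; n++) {
        result += x[n] * y[n];
    }
    return result;
}
```

Calling it with the same vector for both arguments gives the energy of that vector; calling it with two different vectors gives their correlation at zero lag.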

7

SHARC DSP

• The SHARC incorporates features aimed at optimizing such loops.

• High-Speed Floating Point Capability

• Extended Floating Point

• These features are DSP specific.

• As a result, performance may be less than optimal when the SHARC is applied to a non-DSP application.

8

Floating Point and Extended Floating Point

• The SHARC supports floating-point, extended floating-point and non-floating-point (fixed-point) data.

• No additional clock cycles for floating point computations.

• Data is automatically truncated and zero-padded when moved between 32-bit memory and internal registers.

• Not accurate enough for scientific algorithms, but gives an excellent signal-to-noise ratio.
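
The truncation and zero padding between 32-bit memory and 40-bit registers can be sketched in C. This is only a model under an assumed layout: the 40-bit word is held in the low 40 bits of a uint64_t, and the extra 8 bits are taken to be the least significant mantissa bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: a 40-bit extended word is the 32-bit word plus
   8 extra low-order mantissa bits. */
#define EXT_EXTRA_BITS 8
#define EXT_MASK_40    0xFFFFFFFFFFULL

/* 32-bit memory -> 40-bit register: pad the low 8 bits with zeros. */
uint64_t widen_to_40(uint32_t word32)
{
    return (uint64_t)word32 << EXT_EXTRA_BITS;
}

/* 40-bit register -> 32-bit memory: drop (truncate) the low 8 bits. */
uint32_t narrow_to_32(uint64_t word40)
{
    return (uint32_t)((word40 & EXT_MASK_40) >> EXT_EXTRA_BITS);
}
```

In this model a value that goes from memory to a register and back is unchanged, while a value computed in extended precision loses its 8 extra bits when stored to 32-bit memory.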

9

SHARC’s Internal Memory

• Makes SHARC unique.

• Size

– Allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory.

– Memory size is significantly larger than in most other high-speed computational devices.

• Dual-block, dual-port

– Optimizes the Harvard Architecture by allowing instructions to be fetched while data memory accesses are performed.

10

Multiply and Accumulate Instructions on the SHARC

• Like most DSPs, the SHARC is able to compute a product and add the product to a running total in a single clock cycle.

• The SHARC’s "super instruction" can multiply and accumulate while also adding, subtracting, or averaging data in two other registers.

• These instructions give the SHARC its 120 megaflop rating.
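
The work done by such a combined instruction can be modeled in plain C. This is a sketch of the arithmetic only: in C the three operations run one after another, whereas the slides describe the SHARC doing them together. The struct and function names are hypothetical, and averaging is left out for brevity.

```c
/* Results of one modeled "super instruction" step. */
typedef struct {
    float acc;    /* running total after the multiply-accumulate */
    float sum;    /* a + b, from two other registers */
    float diff;   /* a - b, from the same two registers */
} mac_step_result;

/* Multiply x by y and add to the running total, while also
   adding and subtracting two unrelated values a and b. */
mac_step_result mac_with_add_sub(float acc, float x, float y,
                                 float a, float b)
{
    mac_step_result r;
    r.acc  = acc + x * y;
    r.sum  = a + b;
    r.diff = a - b;
    return r;
}
```

A sum and a difference of the same two values, alongside a multiply, is the pattern that appears in an FFT butterfly.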

11

Zero Overhead Looping on the SHARC

• A single instruction outside the loop performs loop set-up, informing the SHARC that a loop is approaching.

• The instruction also includes the iteration count and termination condition.

• This causes the pipeline to remain full during loop execution and also allows the termination condition to be tested in parallel.

12

DAGs on the SHARC

• Data Address Generators are integer computation units that manage the indexing of registers.

• Allows the SHARC to fetch a value and update the index value.

• If the updated value exceeds a limit, the DAG adjusts the index so that it wraps.

• This occurs in the same clock cycle as the read or write.
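
The fetch, update and wrap steps above can be sketched in C. This is a software model under assumptions: the circ_buf struct and both function names are hypothetical, and the modulo operation stands in for the wrap that the DAG is described as doing in hardware.

```c
#include <stddef.h>

/* Software stand-in for one DAG index with a wrap limit. */
typedef struct {
    float *base;     /* start of the buffer */
    size_t length;   /* number of entries (the wrap limit) */
    size_t index;    /* current position, always < length */
} circ_buf;

/* Fetch the value at the current index, then advance the index
   by `step`, wrapping back to the start when it passes the end. */
float circ_read_post_modify(circ_buf *cb, size_t step)
{
    float value = cb->base[cb->index];
    cb->index = (cb->index + step) % cb->length;
    return value;
}

/* Store a sample at the current index, then advance by one entry. */
void circ_write_post_modify(circ_buf *cb, float sample)
{
    cb->base[cb->index] = sample;
    cb->index = (cb->index + 1) % cb->length;
}
```

On a general-purpose processor the read and the index update are separate steps; the point of the DAG is that they happen in the same clock cycle.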

13

DAG Capabilities

• Circular Buffering

– Rather than actually moving data in and out of a vector, circular buffers are used.

– By updating the index modulo the buffer length, the oldest entry can be conveniently replaced by the newest entry.

• Bit Reverse Addressing

– The bit pattern of a vector index is reversed.

– Done automatically by the SHARC.

– Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
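
Bit reversal of an index can be written out in C to show what the hardware is doing. This is a minimal sketch; the function name bit_reverse is an assumption, and the loop does in software what the slides say the SHARC does automatically.

```c
#include <stdint.h>

/* Reverse the low `bits` bits of `index` (bits must be 1..32).
   For an 8-point FFT, bits = 3. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t reversed = 0;

    for (unsigned i = 0; i < bits; i++) {
        /* Move the lowest remaining bit of index onto the
           low end of reversed. */
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```

For an 8-point FFT, indices 0 to 7 map to 0, 4, 2, 6, 1, 5, 3, 7.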

14

SHARC DSP

• What makes the SHARC unique?

– It also has some features not directly related to optimizing numeric computations: pipelining and handling branches.

• Why has this not emerged sooner?

– Technology has only recently become available to make it economical to integrate general single computing devices.

15

SHARC’s Pipeline

• 3 stages:

1 Instruction Fetch

2 Decode

3 Execution

• Takes three clock cycles for an instruction to propagate through the pipeline.

• The processor execution speed is one instruction per clock cycle even though each instruction requires three clock cycles.
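
The relationship between the three-cycle latency and the one-instruction-per-cycle rate can be checked with a small calculation. This is a sketch that assumes an ideal pipeline with no stalls or branches; the function name is hypothetical.

```c
/* Clock cycles needed to complete n instructions in a pipeline with
   `stages` stages (stages >= 1), assuming one instruction enters
   per cycle and nothing stalls. */
unsigned long pipeline_cycles(unsigned long n, unsigned stages)
{
    if (n == 0) {
        return 0;
    }
    /* The first instruction takes `stages` cycles; each one after
       it completes one cycle later. */
    return n + (stages - 1);
}
```

With 3 stages, one instruction takes 3 cycles but 100 instructions take 102, which is why the rate approaches one instruction per clock cycle.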

16

SHARC’s Handling Branches: Delayed Branching

• When a branch instruction is encountered the two instructions which have been loaded and decoded are executed before the branch.

• This keeps the pipeline full and avoids junking those two instructions and reloading the pipeline.

• Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions is significant.

17

SHARC’s Handling Branches: Non-delayed Branching

• Traditional branching.

• If the code cannot be reordered to use delayed branching, non-delayed branching saves space.

• Uses only one word of storage.

• However, it takes three cycles because the pipeline must be reloaded.
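
The trade-off between the two branch types in a short loop can be estimated with a rough cycle count. This is a sketch under stated assumptions: the three-cycle cost of a non-delayed branch comes from the slide above, while the one-cycle cost of a delayed branch assumes both following slots hold useful instructions that are already counted in the loop body. The names are hypothetical.

```c
#define NONDELAYED_BRANCH_CYCLES 3   /* from the slide: pipeline reloaded */
#define DELAYED_BRANCH_CYCLES    1   /* assumption: both slots do useful work */

/* Estimated cycles for a loop that runs `iterations` times, with
   `body_instructions` one-cycle instructions plus one branch per pass.
   Pass delayed = 1 for delayed branching, 0 for non-delayed. */
unsigned long loop_cycles(unsigned long iterations,
                          unsigned body_instructions, int delayed)
{
    unsigned branch = delayed ? DELAYED_BRANCH_CYCLES
                              : NONDELAYED_BRANCH_CYCLES;
    return iterations * (body_instructions + branch);
}
```

Under these assumptions a two-instruction loop run 10 times costs 50 cycles with non-delayed branches and 30 with delayed branches, and the gap shrinks as the loop body grows.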

18

Multi-processing

• SHARC is uniquely equipped for multi-processing.

• Link ports provide very powerful multi-processing capabilities.

• Two main program models depending on the application.

• Adapts well to different multi-processing architectures.

19

Multi-processing: SHARC Links

• SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec.

• Links designed for point-to-point connections.

• Data can be transmitted in either direction but not both simultaneously.

20

Multi-processing Program Model: MIMD

• Multiple instruction, multiple data.

• Good for applications that require multiple instruction threads to execute concurrently.

• Processors operate individually.

• Each processor executes different code.

• Typically used for image reconstruction and multi-channel DSP.

21

Multi-processing Program Model: SIMD

• Single instruction, multiple data.

• Works best when all processors execute identical instruction sequences.

• Does not require overhead for inter-processor synchronization.

• Typically used for synthetic aperture radar and automatic target recognition.

22

Multi-processing Architectures: Cluster Design

• Groups of up to 6 in a cluster.

• Most common way of joining multiple SHARCs.

• All processors, global I/O and global memory connected to a common “Cluster bus.”

• Each SHARC can “drive” the bus.

23

Multi-processing Architectures: Mesh Design

• All SHARCs are joined by their link ports and connected to a common bus.

• In SIMD mode, a single master SHARC drives the bus.

• In MIMD mode, the mesh architecture cannot function if the data is larger than the available on-chip memory.

• Scales advantageously over a wider range of applications.

24

How optimal is the SHARC for non-DSP Applications?

• It is obviously geared for DSP applications.

• While it may fare better than other processors, it is still behind those designed specifically for non-DSP applications.

25

Sources

• www.alacron.com/news/tp_mimd_simd.htm

• www.analog.com

• www.cs.seas.gwu.edu/~cs339/cs339-lecture2.pdf

• www.ixthos.aa.psiweb.com/technical/notes_articles/articles
