
Page 1: Lecture 6

Mekelle Institute of Technology
Embedded Systems (CSE507)

Department of Electronics and Communication Engineering/Computer Science and Engineering

Lecture – 6
Hardware Acceleration and Embedded Networks

Page 2: Lecture 6

Hardware Acceleration

● Hardware acceleration is the use of computer hardware to perform some function faster than is possible in software running on the general-purpose CPU. Examples of hardware acceleration include acceleration functionality in graphics processing units (GPUs) and instructions for complex operations in CPUs. (from Wikipedia)

Page 3: Lecture 6

Hardware Acceleration (continued)

● Normally, processors are sequential, and instructions are executed one by one. Various techniques are used to improve performance; hardware acceleration is one of them. The main difference between hardware and software is concurrency, which allows hardware to be much faster than software. Hardware accelerators are designed for computationally intensive software code. Depending on granularity, hardware acceleration can vary from a small functional unit to a large functional block.

Page 4: Lecture 6

Hardware Acceleration (continued)

● The hardware that performs the acceleration, when in a separate unit from the CPU, is referred to as a hardware accelerator, or often more specifically as a graphics accelerator, floating-point accelerator, etc. Those terms, however, are older and have been replaced with less descriptive terms like video card or graphics card.

Page 5: Lecture 6

Hardware Acceleration (continued)

● Many hardware accelerators are built on top of field-programmable gate array chips.

Page 6: Lecture 6

CPUs and accelerators

● Accelerated Systems
  • Use additional computational unit dedicated to some functions
    – hardwired logic
    – extra CPU
    – coprocessor
  • Applications
    – graphics and multimedia (streaming data)
    – encryption and compression
    – communication devices (signal processing)
    – supercomputing, numerical computations

Page 7: Lecture 6

Why Accelerators?

● Better cost/performance.
  • Custom logic may be able to perform an operation faster than a CPU of equivalent cost.
  • CPU cost is a non-linear function of performance.
● Better real-time performance.
  • Put time-critical functions on less-loaded processing elements.

Page 8: Lecture 6

Role of Performance Estimation

● First, determine that the system really needs to be accelerated.
  • How much faster is the accelerator on the core function (speedup)?
  • How much data transfer overhead is there?
  • How much overhead is there for synchronization with the CPU?
● Performance estimation must be done on all levels of abstraction.
  • Simulation-based methods (only average case; need to find test patterns that stress the system)
  • Analytic methods (limited accuracy, but quick; used mainly at the system level)

Page 9: Lecture 6

Accelerator Execution Time

● Total accelerator execution time = accelerator computation time + input/output time.
● Input/output time (bus transactions) includes:
  – flushing register/cache values to main memory;
  – time required for the CPU to set up the transaction;
  – overhead of data transfers by bus packets, handshaking, etc.
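Written out with assumed notation (t_in and t_out for the input and output transfer times above, t_x for the accelerator's own execution time; the symbols are not from the slides), the decomposition above is:

\[ t_{\text{accel}} = t_{\text{in}} + t_{x} + t_{\text{out}} \]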

Page 10: Lecture 6

Accelerator Gain

● For simplification, let us consider:
  • an application that consists of one task only, repeated n times;
  • a task that can be executed completely on the accelerator.
● But in general:
  – not all tasks can be executed on the accelerator;
  – the CPU also has other obligations;
  – possibilities:
    • single-threaded/blocking: the CPU waits for the accelerator;
    • multithreaded/non-blocking: the CPU continues to execute along with the accelerator; the CPU must have useful work to do, and the software must support multi-threading.
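For the simplified single-threaded case above, the gain from moving the task to the accelerator can be written as (notation assumed, not taken from the slides):

\[ S = n\,\bigl(t_{\text{CPU}} - t_{\text{accel}}\bigr) \]

where t_CPU is the time to execute one instance of the task on the CPU and t_accel = t_in + t_x + t_out is the total accelerator time from the previous slide. The accelerator only pays off when t_accel < t_CPU, i.e. when the computation saved outweighs the data transfer and synchronization overhead.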

Page 11: Lecture 6

Accelerator/CPU Interface Issues

● Synchronization
  • via interrupts
  • via special data and control registers at the accelerator
● Data transfer to main memory
  • assisted by DMA or special logic within the accelerator
  • caching problems, as the CPU works on the cache while the accelerator works on main memory (declare the data area as non-cacheable, invalidate the cache after the transfer, use a write-through cache, …)
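As a rough illustration of register-based synchronization with a non-cacheable buffer, the C sketch below assumes a hypothetical memory-mapped accelerator with START and DONE bits; the base address, register layout, and bit assignments are invented for the example and do not correspond to any particular device.

```c
#include <stdint.h>

/* Hypothetical memory-mapped accelerator registers (addresses and layout invented). */
#define ACCEL_BASE  0x40000000u
#define ACCEL_CTRL  (*(volatile uint32_t *)(ACCEL_BASE + 0x00)) /* bit 0: START */
#define ACCEL_STAT  (*(volatile uint32_t *)(ACCEL_BASE + 0x04)) /* bit 0: DONE  */
#define ACCEL_SRC   (*(volatile uint32_t *)(ACCEL_BASE + 0x08)) /* input buffer address   */
#define ACCEL_DST   (*(volatile uint32_t *)(ACCEL_BASE + 0x0C)) /* output buffer address  */
#define ACCEL_LEN   (*(volatile uint32_t *)(ACCEL_BASE + 0x10)) /* transfer length, bytes */

/* In a real system these buffers would be placed in a non-cacheable region
 * (via the linker script or MMU setup) so that the CPU and the accelerator
 * see a consistent view of memory; here they are ordinary arrays. */
static volatile uint8_t in_buf[1024];
static volatile uint8_t out_buf[1024];

/* Single-threaded (blocking) use of the accelerator: set up the transfer,
 * start it, then poll the status register until the DONE bit is set. */
static void accel_run_blocking(uint32_t len)
{
    ACCEL_SRC  = (uint32_t)(uintptr_t)in_buf;
    ACCEL_DST  = (uint32_t)(uintptr_t)out_buf;
    ACCEL_LEN  = len;
    ACCEL_CTRL = 1u;                  /* set START */

    while ((ACCEL_STAT & 1u) == 0u)   /* spin until DONE */
        ;
}
```

In the multithreaded/non-blocking variant, the accelerator would raise an interrupt instead, and the CPU would do other useful work rather than spinning.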

Page 12: Lecture 6

System Design Issues

● Hardware/software co-design
  • meeting system-level objectives by exploiting the synergy of hardware and software through their concurrent design;
  • joint design of hardware and software architectures;
  • co-design problems have a different flavor according to the application domain, implementation technology, and design methodology.
● Design a heterogeneous multiprocessor architecture
  • communication (bus, network, …)
  • memory architecture
  • interfaces and I/O
  • processing elements (CPU, application-specific integrated circuit (ASIC), FPGA (field-programmable gate array))
● Program the system

Page 13: Lecture 6


Networking for Embedded Systems

● Why we use networks.
● Network abstractions.
● Example networks.

Page 14: Lecture 6


Network elements

[Figure: a distributed computing platform: several PEs connected to a network by communication links. PEs may be CPUs or ASICs.]

Page 15: Lecture 6


Networks in embedded systems

[Figure: a sensor PE performs initial processing and sends data over the network to other PEs for more processing and for driving an actuator.]

Page 16: Lecture 6


Why distributed?

● Higher performance at lower cost.
● Physically distributed activities: time constants may not allow transmission to a central site.
● Improved debugging: use one CPU in the network to debug the others.
● May buy subsystems that already have embedded processors.

Page 17: Lecture 6


Hardware architectures

● Many different types of networks:
  • topology;
  • scheduling of communication;
  • routing.

Page 18: Lecture 6


Point-to-point networks

● One source, one or more destinations, no data switching (serial port):

[Figure: PE 1 connected to PE 2 by link 1, and PE 2 connected to PE 3 by link 2.]

Page 19: Lecture 6


Bus networks

● Common physical connection:

[Figure: PE 1, PE 2, PE 3, and PE 4 attached to a common bus. Packet format: header | address | data | ECC.]
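As a small illustration of the header | address | data | ECC layout, the C struct below is a made-up packet definition; the field widths and the 8-byte payload are assumptions for the example, not part of any particular bus standard.

```c
#include <stdint.h>

/* Illustrative bus packet matching the header | address | data | ECC layout
 * above; field widths and payload size are assumed for the example. */
typedef struct {
    uint8_t  header;   /* packet type / control bits */
    uint16_t address;  /* destination address on the bus */
    uint8_t  data[8];  /* payload */
    uint8_t  ecc;      /* error-detecting/correcting code over the fields above */
} bus_packet_t;
```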

Page 20: Lecture 6


Bus arbitration

● Fixed: same order every time.
● Fair: every PE has the same access over long periods.
  • Round-robin: rotate top priority among PEs.

[Figure: two rounds of requests from A, B, and C. Fixed arbitration grants them in the same order (A, B, C) each round, while round-robin rotates the top priority so a different PE is served first in the next round.]
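A minimal software sketch of the round-robin idea, assuming a simple model in which req[i] is nonzero when PE i requests the bus (the function and variable names are invented for the example):

```c
#define N_PE 3

static int next_top = 0;  /* PE that currently holds top priority */

/* Grant the bus to the requesting PE closest (in rotation order) to the
 * current top-priority PE, then rotate priority past the winner.
 * Returns the granted PE index, or -1 if nothing is requested. */
int rr_arbitrate(const int req[N_PE])
{
    for (int i = 0; i < N_PE; i++) {
        int candidate = (next_top + i) % N_PE;
        if (req[candidate]) {
            next_top = (candidate + 1) % N_PE;
            return candidate;
        }
    }
    return -1;
}
```

Over many rounds, every PE that keeps requesting is granted the bus equally often, which is the fairness property the slide refers to.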

Page 21: Lecture 6


Multi-stage networks

● Use several stages of switching elements.
● Often blocking.
● Often smaller than crossbar.

Page 22: Lecture 6


I2C bus

● Designed for low-cost, medium data rate applications.

● Characteristics:
  • serial;
  • multiple-master;
  • fixed-priority arbitration.
● Many microcontrollers come with a built-in I2C controller.
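To make the "serial" part concrete, here is a bit-banged I2C master write sketch. The SDA_*/SCL_* macros stand in for open-drain GPIO operations and are defined as no-op placeholders (names invented for this sketch); on a microcontroller with a built-in I2C controller this would instead be a few register writes to that peripheral.

```c
#include <stdint.h>

/* Placeholder pin operations: on real hardware these would drive and read
 * open-drain GPIO lines tied to SDA and SCL. */
#define SDA_HIGH() ((void)0)
#define SDA_LOW()  ((void)0)
#define SDA_READ() (0)          /* 0 = line pulled low (slave ACK) */
#define SCL_HIGH() ((void)0)
#define SCL_LOW()  ((void)0)
#define DELAY()    ((void)0)    /* bit-rate timing */

static void i2c_start(void)     /* START: SDA falls while SCL is high */
{
    SDA_HIGH(); SCL_HIGH(); DELAY();
    SDA_LOW();  DELAY();
    SCL_LOW();  DELAY();
}

static void i2c_stop(void)      /* STOP: SDA rises while SCL is high */
{
    SDA_LOW();  DELAY();
    SCL_HIGH(); DELAY();
    SDA_HIGH(); DELAY();
}

static int i2c_write_byte(uint8_t b)   /* returns 1 if the slave ACKed */
{
    for (int i = 7; i >= 0; i--) {     /* MSB first; data changes while SCL is low */
        if (b & (1u << i)) SDA_HIGH(); else SDA_LOW();
        DELAY(); SCL_HIGH(); DELAY(); SCL_LOW(); DELAY();
    }
    SDA_HIGH();                        /* release SDA for the ACK bit */
    DELAY(); SCL_HIGH(); DELAY();
    int ack = (SDA_READ() == 0);       /* slave pulls SDA low to acknowledge */
    SCL_LOW(); DELAY();
    return ack;
}

/* Write one byte to a register of a 7-bit-addressed slave device. */
int i2c_reg_write(uint8_t addr7, uint8_t reg, uint8_t value)
{
    int ok;
    i2c_start();
    ok  = i2c_write_byte((uint8_t)(addr7 << 1)); /* address + write bit (0) */
    ok &= i2c_write_byte(reg);
    ok &= i2c_write_byte(value);
    i2c_stop();
    return ok;
}
```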

Page 23: Lecture 6


The CAN Bus

● Originally designed for automotive electronics.
● Now used for other applications as well.
● Bit-serial transmission at 500 kb/s over twisted pair, up to 40 m.
● Synchronous: nodes synchronize themselves by listening to the bit transitions on the bus.
● Arbitration uses Carrier Sense Multiple Access with Arbitration on Message Priority (CSMA/AMP).
● For error handling, a special error frame and an overload frame are used, as well as acknowledgements.
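A small sketch of the bitwise-arbitration idea behind CSMA/AMP, assuming CAN's convention that a dominant bit (0) overrides a recessive bit (1) on the wired-AND bus, and simplifying each pending message to just its 11-bit identifier (the function and variable names are invented for the example):

```c
#include <stdint.h>
#include <stdio.h>

#define ID_BITS   11
#define MAX_NODES 8

/* Simplified CSMA/AMP arbitration: each contending node shifts out its
 * 11-bit identifier MSB first; the bus is a wired-AND, so 0 (dominant)
 * overrides 1 (recessive). A node that sends recessive but sees dominant
 * has lost and drops out. Returns the index of the winning node. */
int arbitrate(const uint16_t ids[], int n_nodes)
{
    int active[MAX_NODES];
    for (int i = 0; i < n_nodes; i++) active[i] = 1;

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {
        int bus = 1;                                  /* recessive by default */
        for (int i = 0; i < n_nodes; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) == 0u)
                bus = 0;                              /* someone drives dominant */
        for (int i = 0; i < n_nodes; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) == 1u && bus == 0)
                active[i] = 0;                        /* lost arbitration */
    }
    for (int i = 0; i < n_nodes; i++)
        if (active[i]) return i;
    return -1;                                        /* nobody was contending */
}

int main(void)
{
    uint16_t ids[] = { 0x2A5, 0x123, 0x1F0 };         /* example identifiers */
    printf("winner: node %d\n", arbitrate(ids, 3));   /* node 1: lowest id 0x123 */
    return 0;
}
```

The message with the lowest identifier, i.e. the highest priority, therefore wins without any bits on the bus being corrupted, which is why this arbitration scheme is non-destructive.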