multi layer ahb protocol.doc

Abstract

The multilayer advanced high-performance bus (ML-AHB) busmatrix employs slave-side

arbitration. Slave-side arbitration is different from master-side arbitration in terms of request and

grant signals since, in the former, the master merely starts a burst transaction and waits for the

slave response to proceed to the next transfer. Therefore, in the former, the unit of arbitration can

be a transaction or a transfer. However, the ML-AHB busmatrix of ARM offers only transfer-

based fixed-priority and round-robin arbitration schemes.

In this project, we design a flexible arbiter for the ML-AHB busmatrix to support three

priority policies—fixed priority, round robin, and dynamic priority—and three data multiplexing

modes—transfer, transaction, and desired transfer length. In total, there are nine possible

arbitration schemes. The proposed arbiter, which is self-motivated (SM), selects one of the nine

possible arbitration schemes based upon the priority-level notifications and the desired transfer

length from the masters so that arbitration leads to the maximum performance.

The arbiter is built on the SM arbitration scheme for the ML-AHB busmatrix. It must support all three priority policies (fixed priority, round robin, and dynamic priority) and all three approaches to data multiplexing (transfer, transaction, and desired transfer length), and it must select among the resulting nine schemes during runtime from the priority-level notifications and the desired transfer lengths supplied by the masters. The design is simulated using ModelSim and synthesized using Xilinx tools.

CHAPTER 1

INTRODUCTION

1.1 Introduction

The Advanced Microcontroller Bus Architecture (AMBA) was introduced by ARM Ltd

in 1996 and is widely used as the on-chip bus in System-on-a-chip (SoC) designs. AMBA is a

registered trademark of ARM Ltd. The first AMBA buses were Advanced System Bus (ASB)

and Advanced Peripheral Bus (APB). In its second version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), a single clock-edge protocol.

Figure 1.1 ARM SoC Block Diagram

In 2003, ARM introduced the third generation, AMBA 3, which includes AXI for even higher-performance interconnects and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution. These protocols are today the de facto standard for 32-bit embedded processors because they are well documented and can be used without royalties. Some manufacturers use AMBA buses for non-ARM designs; for example, Infineon uses an AMBA bus for the ADM5120 SoC, which is based on the MIPS architecture.

The important aspect of a SoC is not only which components or blocks it houses, but also how they are interconnected. AMBA is a solution for the blocks to interface with each other.

Since its inception, the scope of AMBA has gone far beyond microcontroller devices, and is now widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones.

1.2 Advanced High performance Bus (AHB) Protocol:

The Advanced Microcontroller Bus Architecture (AMBA) specification defines an on-chip communications standard for designing high-performance embedded microcontrollers.

Three distinct buses are defined within the AMBA specification:

• The Advanced High-Performance Bus (AHB)

• The Advanced System Bus (ASB)

• The Advanced Peripheral Bus (APB)

A test methodology is included with the AMBA specification which provides an infrastructure for

modular test and diagnostic access.

1.2.1 Advanced High Performance Bus (AHB):

The AMBA AHB is for high-performance, high clock frequency system modules.

The AHB acts as the high-performance system backbone bus. AHB supports the efficient

connection of processors, on-chip memories and off-chip external memory interfaces with low-

power peripheral macro cell functions. AHB is also specified to ensure ease of use in an efficient

design flow using synthesis and automated test techniques.

1.2.2 Advanced System Bus (ASB):

The AMBA ASB is for high-performance system modules.

AMBA ASB is an alternative system bus suitable for use where the high-performance features of

AHB are not required. ASB also supports the efficient connection of processors, on-chip

memories and off-chip external memory interfaces with low-power peripheral macro cell

functions.

1.2.3 Advanced Peripheral Bus (APB):

The AMBA APB is for low-power peripherals. AMBA APB is optimized for minimal power

consumption and reduced interface complexity to support peripheral functions. APB can be used

in conjunction with either version of the system bus.

1.3 Objectives of the AMBA Specification:

The AMBA specification has been derived to satisfy four key requirements:

• To facilitate the right-first-time development of embedded microcontroller products with one or more CPUs or signal processors.

• To be technology-independent and ensure that highly reusable peripheral and system macro cells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.

• To encourage modular system design to improve processor independence, providing a development road-map for advanced cached CPU cores and the development of peripheral libraries.

• To minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.

1.4 A Typical AMBA-based Microcontroller:

An AMBA-based microcontroller typically consists of a high-performance system

backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on

which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This

bus provides a high-bandwidth interface between the elements that are involved in the majority

of transfers. Also located on the high-performance bus is a bridge to the lower bandwidth APB,

where most of the peripheral devices in the system are located (see Figure 1.2).

Figure 1.2 A typical AMBA AHB-based System

AMBA APB provides the basic peripheral macro cell communications infrastructure as a secondary bus from the higher bandwidth pipelined main system bus. Such peripherals typically:

• Have interfaces which are memory-mapped registers

• Have no high-bandwidth interfaces

• Are accessed under programmed control

The external memory interface is application-specific and may only have a narrow data path, but

may also support a test access mode which allows the internal AMBA AHB, ASB and APB

modules to be tested in isolation with system-independent test sets.

1.5 Multi-layer AHB

The on-chip bus plays a key role in system-on-a-chip (SoC) design by enabling the efficient integration of heterogeneous system components such as CPUs, DSPs, application-specific cores, memories, and custom logic. As design complexity has grown, SoC designs have come to require a high-bandwidth system bus that can perform multiple operations in parallel. To address the bandwidth problem, several high-performance on-chip buses have been proposed, such as the multilayer AHB (ML-AHB) busmatrix from ARM, the PLB crossbar switch from IBM, and CONMAX from Silicore. Among them, the ML-AHB busmatrix has been widely used in many SoC designs, both because the simplicity of ARM's AMBA bus attracts many IP designers and because its architecture is well suited to low-power embedded systems.

The ML-AHB busmatrix is an interconnection scheme based on the AMBA AHB

protocol, which enables parallel access paths between multiple masters and slaves in a system.

This is achieved by using a more complex interconnection matrix and gives the benefit of both

increased overall bus bandwidth and a more flexible system structure. In particular, the ML-

AHB busmatrix uses slave-side arbitration. Slave-side arbitration is different from master-side

arbitration in terms of request and grant signals since, in the former, the master merely starts a

burst transaction and waits for the slave response to proceed to the next transfer. Therefore, the

unit of arbitration can be a transaction or a transfer. The transaction-based arbiter multiplexes

the data transfer based on the burst transaction, and the transfer-based arbiter switches the data

transfer based on a single transfer. However, the ML-AHB busmatrix of ARM presents only

transfer-based arbitration schemes, i.e., transfer based fixed-priority and round-robin arbitration

schemes. This limitation on the arbitration scheme may lead to degradation of the system

performance because the arbitration scheme is usually dependent on the application

requirements; recent applications are likewise becoming more complex and diverse. By

implementing an efficient arbitration scheme, the system performance can be tuned to better suit

applications. Several arbitration schemes have been studied for high-performance on-chip buses, such as table-lookup-based crossbar arbitration, two-level time-division multiplexing (TDM) scheduling, the token-ring mechanism, the dynamic bus distribution algorithm, and LOTTERYBUS. However, these approaches employ master-side arbitration. Because master-side arbitration uses a centralized arbiter, they can control only the priority policy and are limited in how they handle transfer-based arbitration. In contrast, slave-side arbitration can handle both transfer-based and transaction-based arbitration schemes.

In this project, we propose a flexible arbiter based on the self-motivated (SM) arbitration scheme for the ML-AHB busmatrix. The SM arbitration scheme has the following advantages: 1) it can adjust the processed data unit; 2) it can change the priority policy during runtime; and 3) it is easy to tune to the characteristics of the target application. Hence, the arbiter handles not only the transfer-based fixed-priority, round-robin, and dynamic-priority arbitration schemes but also their transaction-based and desired-transfer-length-based counterparts. The SM arbiter selects one of these nine arbitration schemes based on the priority-level notifications and the desired transfer length from the masters so that the arbitration leads to the maximum performance.
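The nine schemes are the cross product of the three priority policies and the three data multiplexing modes. A minimal Python sketch of that enumeration follows; the names `PriorityPolicy`, `MuxMode`, and `all_schemes` are illustrative only and are not taken from the arbiter design.

```python
from enum import Enum
from itertools import product


class PriorityPolicy(Enum):
    FIXED = "fixed-priority"
    ROUND_ROBIN = "round-robin"
    DYNAMIC = "dynamic-priority"


class MuxMode(Enum):
    TRANSFER = "transfer"
    TRANSACTION = "transaction"
    DESIRED_LENGTH = "desired-transfer-length"


def all_schemes():
    """Return every (policy, mode) pair the SM arbiter can choose from."""
    return list(product(PriorityPolicy, MuxMode))


if __name__ == "__main__":
    for policy, mode in all_schemes():
        print(f"{mode.value}-based {policy.value} arbitration")
```

How the arbiter picks one pair at run time depends on the priority-level notifications and desired transfer lengths described above; this sketch only lists the choices.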

Multi-layer AHB is an interconnection scheme, based on the AHB protocol, which enables

parallel access paths between multiple masters and slaves in a system. This is achieved by using

a more complex interconnection matrix.

Key advantages are:

• You can develop multi-master systems with an increased available bus bandwidth.

• You can construct complex multi-master systems that have a flexible architecture. This

removes the requirement to fix design decisions about the allocation of system resources to

particular masters at the hardware design stage.

• You can use standard AHB master and slave modules without requiring modification.

• Each AHB layer can be very simple because it only has one master, so no arbitration or master-

to-slave muxing is required. These layers can use the AHB-Lite protocol, meaning that they do

not have to support request and grant, or retry and split transactions.

• The arbitration effectively becomes point arbitration at each peripheral and is only necessary

when more than one master wants to access the same slave simultaneously.

• The only hardware you have to add to the standard AHB transport infrastructure is the

multiplexor block to connect the multiple masters to the peripherals.

• Because the multi-layer architecture is based on the existing AHB protocol, you can reuse

previously-designed masters and slaves without modification.
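The point about arbitration being needed only when several masters address the same slave can be illustrated with a small Python sketch. It assumes a hypothetical per-cycle snapshot in which each master either targets one slave or is idle; the function names are illustrative.

```python
from collections import defaultdict


def group_requests_by_slave(targets):
    """Map each slave to the sorted list of masters addressing it this cycle.

    `targets` maps a master id to the slave id it wants, or None when idle.
    """
    by_slave = defaultdict(list)
    for master, slave in targets.items():
        if slave is not None:
            by_slave[slave].append(master)
    return {slave: sorted(masters) for slave, masters in by_slave.items()}


def slaves_needing_arbitration(targets):
    """Return the slaves addressed by more than one master at once."""
    grouped = group_requests_by_slave(targets)
    return sorted(s for s, masters in grouped.items() if len(masters) > 1)


if __name__ == "__main__":
    # Hypothetical cycle: masters 0 and 2 both want slave 1; master 1 wants slave 0.
    requests = {0: 1, 1: 0, 2: 1, 3: None}
    print(group_requests_by_slave(requests))
    print(slaves_needing_arbitration(requests))
```

In this snapshot only slave 1 sees contention, so only its slave-side arbiter has a decision to make; the access to slave 0 proceeds in parallel.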

Figure 1.3 shows a block diagram of the basic multi-layer concept.

Figure 1.3 Basic multi-layer concept

1.6 APPLICATIONS

AMBA AHB is technology-independent and can be used in a wide range of applications:

• ARM controllers are designed according to the AMBA specifications.

• Current designs demand high performance and speed, which AMBA AHB delivers, making it a strong choice compared with other on-chip bus architectures.

• It minimizes the silicon infrastructure needed to support on-chip and off-chip communication.

• Embedded projects built around ARM processors or microcontrollers commonly use AMBA AHB as the common bus throughout the project.

CHAPTER 2

LITERATURE SURVEY

2.1 Introduction to Advanced High-performance Bus AHB

The Advanced High-performance Bus (AHB) is a high-performance bus in the Advanced Microcontroller Bus Architecture (AMBA) family, intended for high clock frequency system modules. The AHB acts as the high-performance system backbone bus. It supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macro cell functions. AHB is also specified to ensure ease of use in an efficient design flow using automated test techniques. AHB is technology-independent, ensuring that highly reusable peripheral and system macro cells can be migrated across a diverse range of IC processes and are appropriate for full-custom, standard cell and gate array technologies.

2.2 Features

AMBA Advanced High-performance Bus (AHB) supports the following features:

• High performance

• Burst transfers

• Split transactions

• Single edge clock operation

• SEQ, NONSEQ, BUSY, and IDLE transfer types

• Programmable number of idle cycles

• Large data bus widths: 32, 64, 128 and 256 bits wide

• Address decoding with configurable memory map

2.3 Merits

AHB is one of the most commonly used bus protocols. Its main advantages from the designer’s point of view are listed below.

• AHB offers a fairly low-cost (in area), low-power (based on I/O) bus with a moderate amount of complexity, and it can achieve higher frequencies than others because the protocol separates the address and data phases.

• AHB can combine this higher frequency with separate data buses, which can be defined as 128-bit and above, to achieve the bandwidth required for high-performance bus applications.

• AHB can access other protocols through a suitable bridge, so it supports bridged configurations for data transfer.

• AHB allows slaves with significant latency to respond to a read with an HRESP of “SPLIT”. The slave then requests the bus on behalf of the master when the read data is available. This enables better bus utilization.

• AHB offers burst capability by defining bursts of specified length, both incrementing and wrapping. Although AHB requires an address phase for each beat of data, the slave can still use the burst information to make the proper request on the other side. This helps to mask the latency of the slave.

• AHB is defined with a choice of several bus widths, from 8-bit to 1024-bit. The most common implementation has been 32-bit, but higher bandwidth requirements may be satisfied by using 64-bit or 128-bit buses.

• AHB uses the HRESP signals driven by the slaves to indicate when an error has occurred.

• A large selection of AHB verification IP is available from several different suppliers. The solutions offered support several different languages and run in a choice of environments.

• Access to the target device is controlled through a MUX, thereby admitting bus access to one bus master at a time.

• AHB masters, slaves and arbiters support early burst termination. A burst can be terminated early either because the arbiter removes HGRANT from a master part way through the burst or because a slave returns a non-OKAY response to any beat of the burst. Note, however, that a master cannot itself decide to terminate a defined-length burst unless prompted to do so by the arbiter or by slave responses.

• Any slave that does not use SPLIT responses can be connected directly to an AHB master. If the slave does use SPLIT responses, a simplified version of the arbiter is also required.

These strengths explain the wide use of the protocol.

2.4 Demerits

Although AHB is a commonly used bus protocol, it has some demerits, which are listed below.

• AHB cannot achieve full data bus utilization and bandwidth if some slaves have a relatively high latency.

• AHB defines transfer sizes of 1, 2, 4, 8, and 16 bytes. Because byte enables are not defined, there are cases where multiple transfers must be made inside a single quadword.

• AHB defines timing parameters for many of the relationships between signals on the bus. However, these are not associated with requirements relative to a clock cycle. Therefore, SoC developers must integrate AHB cores and run chip-level static timing analysis to judge how compatible AHB masters and slaves are with one another.

• Power-based SoCs cover a wide range of applications, and there is a correspondingly wide range of address map requirements. Having the address decodes for all AHB slaves reside within the interconnect means having to support the most complex split address ranges, even for the simplest of slaves.

These weaknesses can be tolerated in view of the protocol's advantages.

2.5 Block Diagram

The block diagram of the Advanced High-Performance Bus protocol is shown in Figure 2.1.

The block diagram comprises four components:

• Arbiter

• Master

• Slave

• Decoder

2.5.1 Arbiter

The arbitration mechanism is used to ensure that only one master has access to the bus at

any one time. The arbiter performs this function by observing a number of different requests to

use the bus and deciding which is currently the highest priority master requesting the bus.

Figure 2.1 AMBA – AHB block diagram

2.5.2 Master

A bus master is able to initiate read and write operations by providing address and control information. Only one bus master can use the bus at any one time. An AHB bus master

has the most complex bus interface in an AMBA system. Typically an AMBA system designer

would use predesigned bus masters and therefore would not need to be concerned with the detail

of the bus master interface. No provision is made within the AHB specification for a bus master

to cancel a transfer once it has commenced.

2.5.3 Slave

After a master has started a transfer, the slave then determines how the transfer should

progress. Whenever a slave is accessed it must provide a response which indicates the status of

the transfer. The HREADY signal is used to extend the transfer and this works in combination with the response signal HRESP, which provides the status of the transfer.

The slave can complete the transfer in a number of ways. It can:

• Complete the transfer immediately

• Signal an error to indicate that the transfer has failed

• Delay the completion of the transfer, but allow the master and slave to back off the bus, leaving it available for other transfers.
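The slave's options can be sketched in Python as a classification of one sampled HREADY and HRESP pair. The HRESP encodings are those of the AMBA 2 AHB specification; the function name and the returned labels are illustrative. The sketch is simplified: it ignores the two-cycle timing that the specification requires for non-OKAY responses.

```python
from enum import IntEnum


class HResp(IntEnum):
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


def classify_response(hready, hresp):
    """Describe what a sampled (HREADY, HRESP) pair means for the master."""
    if hresp == HResp.OKAY:
        # HREADY LOW with OKAY is a wait state; HREADY HIGH completes the transfer.
        return "done" if hready else "wait"
    if hresp == HResp.ERROR:
        return "failed"
    # RETRY and SPLIT both ask the master to back off and try again later.
    return "back off"


if __name__ == "__main__":
    for hready, hresp in [(1, HResp.OKAY), (0, HResp.OKAY), (1, HResp.SPLIT)]:
        print(hready, hresp.name, classify_response(hready, hresp))
```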

2.5.4 Decoder

The AHB decoder is used to decode the address of each transfer and provide a select

signal for the slave that is involved in the transfer. A central address decoder is used to provide a

select signal ‘HSELx’ for each slave on the bus. The select signal is a combinatorial decode of

the high-order address signals. A slave must only sample the address and control signals and HSELx when HREADY is HIGH, indicating that the current transfer is completing.

2.6 Working of AHB

The AMBA AHB bus protocol is designed with a central multiplexor interconnection

scheme.

Using this scheme, all bus masters drive out the address and control signals indicating the transfer they wish to perform, and the arbiter determines which master has its address and control signals routed to all of the slaves. Before this, a master that needs to perform an operation first asserts its request signal to the arbiter, and the arbiter responds with the grant signal that allows the master to proceed. Similarly, a decoder selects the slave that is to be active during the operation, based on the address given by the master. The central decoder is also required to control the read data and response signal multiplexor, which selects the appropriate signals from the slave that is involved in the transfer. Together these elements allow read and write operations to proceed smoothly.

The block diagram for this description is shown in Figure 2.1.

2.6 Specification

The following points should be considered when reading the AMBA specification:

• Technology independence

• Electrical characteristics

• Timing specification

2.6.1 Technology Independence

AMBA is a technology-independent on-chip protocol. The specification only details the

bus protocol at the clock cycle level.

2.6.2 Electrical Characteristics

No information regarding the electrical characteristics is supplied within the AMBA

specification as this will be entirely dependent on the manufacturing process technology that is

selected for the design.

2.6.3 Timing Specification

The AMBA protocol defines the behavior of various signals at the cycle level. The exact

timing requirements will depend on the process technology used and the frequency of operation.

Because the exact timing requirements are not defined by the AMBA protocol, the system

integrator is given maximum flexibility in allocating the signal timing budget amongst the

various modules on the bus.

2.7 AMBA Signals

All AMBA signals are named such that the first letter of the name indicates which bus the

signal is associated with. A lower case n in the signal name indicates that the signal is active

LOW, otherwise signal names are always all upper case.

Test signals have a prefix T regardless of the bus type.

2.7.1 AHB Signal Prefixes

‘H’ indicates an AHB signal.

For example, HREADY is the signal used to indicate that the data portion of an AHB

transfer can complete. It is active HIGH.

2.7.2 AMBA AHB Signal List:

All signals are prefixed with the letter H, ensuring that the AHB signals are differentiated

from other similarly named signals in a system design. The signals involved in designing the

AMBA AHB are listed in the Table 2.1 which also gives the specification of each signal.

Table 2.1 AMBA AHB signal specification

S.No. | NAME   | WIDTH | DRIVER       | FUNCTION

1     | HCLK   | 1     | Clock source | This clock times all bus transfers at the rising edge of HCLK.

2     | HADDR  | 32    | Master       | The system address bus, 32 bits wide.

3     | HTRANS | 2     | Master       | Indicates the type of the current transfer.

4     | HWRITE | 1     | Master       | When HIGH this signal indicates a write transfer and when LOW a read transfer.

5     | HSIZE  | 3     | Master       | Indicates the size of the transfer.

6     | HBURST | 3     | Master       | Indicates if the transfer forms part of a burst.

7     | HWDATA | 8     | Master       | The write data bus is used to transfer data from the master to the bus slaves during write operations.

8     | HSELx  | 1     | Decoder      | Each AHB slave has its own slave select signal and this signal indicates that the current transfer is intended for the selected slave.

9     | HRDATA | 8     | Slave        | The read data bus is used to transfer data from bus slaves to the bus master during read operations.

10    | HREADY | 1     | Slave        | When HIGH the HREADY signal indicates that a transfer has finished on the bus. This signal may be driven LOW to extend a transfer.

11    | HRESP  | 2     | Slave        | The transfer response provides additional information on the status of a transfer.

The table also gives the function of each signal and the source that drives it. The bus operates synchronously, so signals change with respect to the rising edge of the clock.

2.8 Overview of AMBA AHB Operation

Before an AMBA AHB transfer can commence the bus master must be granted access to

the bus. This process is started by the master asserting a request signal to the arbiter. Then the

arbiter indicates when the master will be granted use of the bus.

A granted bus master starts an AMBA AHB transfer by driving the address and control

signals. These signals provide information on the address, direction and width of the transfer, as

well as an indication if the transfer forms part of a burst. Two different forms of burst transfers

are allowed.

• Incrementing bursts, which do not wrap at address boundaries

• Wrapping bursts, which wrap at particular address boundaries
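The difference between the two burst forms shows up in the beat addresses. The Python sketch below assumes that a wrapping burst wraps at a boundary equal to the number of beats multiplied by the transfer size, as in the AHB specification; the function and parameter names are illustrative.

```python
def burst_addresses(start, beats, size_bytes, wrapping):
    """List the address of every beat in an AHB-style burst.

    start      -- address of the first beat (must be aligned to size_bytes)
    beats      -- number of transfers in the burst (for example 4, 8 or 16)
    size_bytes -- bytes moved per beat
    wrapping   -- True for a wrapping burst, False for an incrementing one
    """
    if start % size_bytes:
        raise ValueError("start address must be aligned to the transfer size")
    boundary = beats * size_bytes       # a wrapping burst stays inside this block
    base = start - (start % boundary)   # lowest address of the wrap block
    addresses = []
    for i in range(beats):
        offset = i * size_bytes
        if wrapping:
            addresses.append(base + (start - base + offset) % boundary)
        else:
            addresses.append(start + offset)
    return addresses


if __name__ == "__main__":
    # Four-beat bursts of 4-byte transfers starting at 0x38.
    print([hex(a) for a in burst_addresses(0x38, 4, 4, wrapping=True)])
    print([hex(a) for a in burst_addresses(0x38, 4, 4, wrapping=False)])
```

For the wrapping case the addresses are 0x38, 0x3C, 0x30, 0x34: the burst wraps at the 16-byte boundary instead of crossing it.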

A write data bus is used to move data from the master to a slave, while a read data bus is

used to move data from a slave to the master.

Every transfer consists of:

• An address and control cycle

• One or more cycles for the data.

The address cannot be extended and therefore all slaves must sample the address during

this time. The data, however, can be extended using the HREADY signal. When LOW this

signal causes wait states to be inserted into the transfer and allows extra time for the slave to

provide or sample data.

During a transfer the slave shows the status using the response signals, HRESP[1:0]. The OKAY response is used to indicate that the transfer is progressing normally, and when HREADY goes HIGH this shows the transfer has completed successfully.
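The effect of wait states on a single transfer can be sketched in Python. The model assumes one isolated transfer with no overlap between transfers: one address cycle followed by a data phase in which HREADY is LOW for each wait state and HIGH in the final cycle. The function names are illustrative.

```python
def hready_trace(wait_states):
    """HREADY value in each data-phase cycle: LOW per wait state, then HIGH."""
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    return [0] * wait_states + [1]


def transfer_cycles(wait_states):
    """Total cycles for one isolated transfer: address cycle plus data cycles."""
    address_cycles = 1  # the address phase cannot be extended
    return address_cycles + len(hready_trace(wait_states))


if __name__ == "__main__":
    for waits in (0, 1, 2):
        print(waits, hready_trace(waits), transfer_cycles(waits))
```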

2.8.1 Address Decoding

A central address decoder is used to provide a select signal, HSELx, for each slave on the

bus. The select signal is a combinatorial decode of the high-order address signals, and simple

address decoding schemes are encouraged to avoid complex decode logic and to ensure high-

speed operation.

Figure 2.2 Decoder and Slave select signals

A slave must only sample the address and control signals and HSELx when HREADY is

HIGH, indicating that the current transfer is completing. Under certain circumstances it is

possible that HSELx will be asserted when HREADY is LOW, but the selected slave will have

changed by the time the current transfer completes.

In the case where a system design does not contain a completely filled memory map an

additional default slave should be implemented to provide a response when any of the

nonexistent address locations are accessed. Typically the default slave functionality will be

implemented as part of the central address decoder.
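A simple decode of the high-order address bits, including the default slave, might look like the following Python sketch. The memory map is hypothetical: the top two bits of a 32-bit address are assumed to select one of up to four equal regions, and any region without a slave falls through to the default slave.

```python
ADDR_BITS = 32
SELECT_BITS = 2  # assumption: the top two address bits pick the region


def decode(haddr, num_slaves):
    """Return (hsel, default_selected) for one address.

    hsel is a list with one flag per slave; at most one flag is True.
    default_selected is True when the address maps to no real slave.
    """
    region = (haddr >> (ADDR_BITS - SELECT_BITS)) & ((1 << SELECT_BITS) - 1)
    hsel = [region == i for i in range(num_slaves)]
    default_selected = region >= num_slaves
    return hsel, default_selected


if __name__ == "__main__":
    # Three slaves occupy regions 0 to 2; region 3 is unmapped.
    for addr in (0x0000_1000, 0x4000_0000, 0xC000_0000):
        print(hex(addr), decode(addr, num_slaves=3))
```

Using only a few high-order bits keeps the decode logic shallow, which is the reason simple schemes are encouraged.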

2.8.2 Slave Transfer Responses

After a master has started a transfer, the slave then determines how the transfer should

progress. No provision is made within the AHB specification for a bus master to cancel a transfer

once it has commenced.

Whenever a slave is accessed it must provide a response which indicates the status of the

transfer. The HREADY signal is used to extend the transfer and this works in combination with

the response signals, HRESP[1:0], which provide the status of the transfer. In the simplest case, the slave completes the transfer immediately.

2.8.3 AHB Decoder:

The decoder in an AMBA system is used to perform a centralized address decoding

function, which improves the portability of peripherals, by making them independent of the

system memory map.

Figure 2.3 AHB Decoder Interface Diagram

2.8.4 Arbitration

The arbitration mechanism is used to ensure that only one master has access to the bus at

any one time. The arbiter performs this function by observing a number of different requests to

use the bus and deciding which is currently the highest priority master requesting the bus. The

arbiter also receives requests from slaves that wish to complete SPLIT transfers.

Any slaves which are not capable of performing SPLIT transfers do not need to be aware

of the arbitration process, except that they need to observe the fact that a burst of transfers may

not complete if the ownership of the bus is changed.

Interface Diagram

Figure 2.4 AHB Arbiter Interface Diagram

The role of the arbiter in an AMBA system is to control which master has access to the

bus. Every bus master has a REQUEST/GRANT interface to the arbiter and the arbiter uses a

prioritization scheme to decide which bus master is currently the highest priority master

requesting the bus.

The detail of the priority scheme is not specified and is defined for each application. It is

acceptable for the arbiter to use other signals, either AMBA or non-AMBA, to influence the

priority scheme that is in use.
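Since the priority scheme is left to each application, the two policies most often mentioned in this report, fixed priority and round robin, can be sketched in Python as follows. The sketch assumes that master 0 has the highest fixed priority and that requests are given as one flag per master; both choices are illustrative.

```python
def fixed_priority_grant(requests):
    """Grant the lowest-numbered requesting master (index 0 is highest priority)."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None  # nobody is requesting


def round_robin_grant(requests, last_granted):
    """Grant the next requesting master after last_granted, wrapping around."""
    count = len(requests)
    for step in range(1, count + 1):
        candidate = (last_granted + step) % count
        if requests[candidate]:
            return candidate
    return None  # nobody is requesting


if __name__ == "__main__":
    requests = [True, True, False, True]
    print(fixed_priority_grant(requests))
    print(round_robin_grant(requests, last_granted=1))
```

With the same request pattern, fixed priority always returns master 0, while round robin moves on to master 3 after master 1 has been served.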

Signal Description

A brief description of each of the arbitration signals is given below.

HBUSREQx

The bus request signal is used by a bus master to request access to the bus. Each bus master has its own HBUSREQx signal to the arbiter and there can be up to 16 separate bus masters in any system.

HGRANTx

The grant signal is generated by the arbiter and indicates that the appropriate master is

currently the highest priority master requesting the bus, taking into account locked transfers and

SPLIT transfers.

A master gains ownership of the address bus when HGRANTx is HIGH and HREADY is

HIGH at the rising edge of HCLK.

HMASTER

The arbiter indicates which master is currently granted the bus using the HMASTER[3:0]

signals and this can be used to control the central address and control multiplexer.

2.8.5 Requesting Bus Access

A bus master uses the HBUSREQx signal to request access to the bus and may request the bus during any cycle. The arbiter samples the request on the rising edge of the clock and then uses an internal priority algorithm to decide which master will be the next to gain access to the bus.

Normally the arbiter will only grant a different bus master when a burst is completing.

However, if required, the arbiter can terminate a burst early to allow a higher priority master

access to the bus.

When a master is granted the bus and is performing a fixed length burst it is not

necessary to continue to request the bus in order to complete the burst. The arbiter observes the

progress of the burst and uses the HBURST[2:0] signals to determine how many transfers are

required by the master. If the master wishes to perform a second burst after the one that is

currently in progress then it should re-assert the request signal during the burst. If a master loses

access to the bus in the middle of a burst then it must re-assert the HBUSREQx request line to regain

access to the bus.
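The way an arbiter can derive the length of a fixed-length burst from HBURST[2:0] is sketched below in Python. The encodings follow the AMBA 2 AHB specification; the helper name `transfers_remaining` is illustrative.

```python
# Number of transfers for each HBURST[2:0] encoding; None means undefined length.
HBURST_BEATS = {
    0b000: 1,     # SINGLE
    0b001: None,  # INCR (undefined length)
    0b010: 4,     # WRAP4
    0b011: 4,     # INCR4
    0b100: 8,     # WRAP8
    0b101: 8,     # INCR8
    0b110: 16,    # WRAP16
    0b111: 16,    # INCR16
}


def transfers_remaining(hburst, transfers_done):
    """Transfers still to come in the burst, or None if the length is undefined."""
    total = HBURST_BEATS[hburst]
    if total is None:
        return None
    return max(total - transfers_done, 0)


if __name__ == "__main__":
    print(transfers_remaining(0b011, 1))  # INCR4 after one transfer
    print(transfers_remaining(0b001, 5))  # INCR has no defined length
```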

For undefined length bursts the master should continue to assert the request until it has

started the last transfer. The arbiter cannot predict when to change the arbitration at the end of an

undefined length burst. It is possible that a master can be granted the bus when it is not

requesting it. This may occur when no masters are requesting the bus and the arbiter grants

access to a default master. Therefore, it is important that if a master does not require access to the

bus it drives the transfer type HTRANS to indicate an IDLE transfer.

2.8.6 Granting Bus Access

The arbiter indicates which bus master is currently the highest priority master requesting the bus by asserting the appropriate HGRANTx signal. When the current transfer completes, as indicated by HREADY HIGH, the master is granted the bus and the arbiter changes the HMASTER[3:0] signals to indicate the bus master number.
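The hand-over rule can be sketched in Python as a one-step update of the bus owner. The sketch assumes one HGRANT flag per master, sampled together with HREADY at a rising clock edge; the function name is illustrative.

```python
def next_hmaster(hmaster, hgrant, hready):
    """Return the bus owner after one rising clock edge.

    hmaster -- number of the master that currently owns the bus
    hgrant  -- list with one HGRANT flag per master
    hready  -- True when the current transfer is completing
    """
    if hready and any(hgrant):
        # Ownership changes only when the current transfer completes.
        return hgrant.index(True)
    return hmaster


if __name__ == "__main__":
    grants = [False, True, False]
    print(next_hmaster(0, grants, hready=False))  # transfer still in progress
    print(next_hmaster(0, grants, hready=True))   # master 1 takes over
```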

The arbiter changes the HGRANTx signals when the penultimate (one before last)

address has been sampled. The new HGRANTx information will then be sampled at the same

point as the last address of the burst is sampled.

Figure 2.5 Bus Master Grant Signals

Because a central multiplexer is used, each master can drive out the address of the

transfer it wishes to perform immediately and it does not need to wait until it is granted the bus.

The HGRANTx signal is only used by the master to determine when it owns the bus and hence

when it should consider that the address has been sampled by the appropriate slave. A delayed

version of the HMASTER bus is used to control the write data multiplexer.

2.8.7 Default Bus Master

Every system must include a default bus master which is granted the bus if all other

masters are unable to use the bus. When granted, the default bus master must only perform IDLE

transfers. If no masters are requesting the bus then the arbiter may either grant the default master

or alternatively it may grant the master that would benefit the most from having low access

latency to the bus.

Granting the default master access to the bus also provides a useful mechanism for

ensuring that no new transfers are started on the bus and is a useful step to perform prior to

entering a low-power mode of operation.

2.8.8 AHB Data Bus Width

One way to improve bus bandwidth without increasing the frequency of operation is to

make the data path of the on-chip bus wider. Both the increased layers of metal and the use of

large on-chip memory blocks (such as Embedded DRAM) are driving factors which encourage

the use of wider on-chip buses.

Specifying a fixed width of bus will mean that in many cases the width of the bus is not

optimal for the application. Therefore an approach has been adopted which allows flexibility of

the width of bus, but still ensures that modules are highly portable between designs.

The protocol allows the AHB data bus to be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide. However, a minimum bus width of 32 bits is recommended, and a maximum of 256 bits is expected to be adequate for almost all applications.

For both read and write transfers the receiving module must select the data from the correct byte

lane on the bus. Replication of data across all byte lanes is not required.
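
As a minimal sketch of byte-lane selection, the function below computes which byte lanes carry data for an aligned transfer on a narrower-than-bus access. It assumes a little-endian system, in which byte lane n corresponds to data bits [8n+7:8n]; the function name and parameters are hypothetical.

```python
from typing import List


def active_byte_lanes(addr: int, size_bytes: int, bus_bytes: int = 4) -> List[int]:
    """Return the byte lanes used by a transfer (little-endian assumption).

    addr       -- byte address of the transfer
    size_bytes -- transfer size in bytes (power of two, at most bus_bytes)
    bus_bytes  -- data bus width in bytes (for example 4 for a 32-bit bus)
    """
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("transfer size must be a power of two")
    if size_bytes > bus_bytes:
        raise ValueError("transfer is wider than the data bus")
    if addr % size_bytes:
        raise ValueError("transfer address must be aligned to its size")
    first_lane = addr % bus_bytes
    return list(range(first_lane, first_lane + size_bytes))
```

For example, a halfword transfer to address 0x1002 on a 32-bit bus uses lanes 2 and 3, so the receiving module reads data bits [31:16].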

2.8.9 AHB Bus Master

An AHB bus master has the most complex bus interface in an AMBA system. Typically

an AMBA system designer would use pre-designed bus masters and therefore would not need to

be concerned with the detail of the bus master interface.

The arbiter signals and the transfer signals of a bus master are shown in the AHB bus master interface diagram in Figure 2.6.

Figure 2.6 AHB Bus Master Interface Diagram

2.8.10 AHB Bus Slave

An AHB bus slave responds to transfers initiated by bus masters within the system. The

slave uses a HSELx select signal from the decoder to determine when it should respond to a bus

transfer. All other signals required for the transfer, such as the address and control information,

will be generated by the bus master.

Figure 2.7 AHB Bus Slave Interface

The signals involved in the slave and the decoder are shown in the AHB bus slave interface diagram in Figure 2.7.

2.9 Multi-Layer Advanced High-performance Bus Protocol (ML-AHB)

In the simplest implementation of a multi-layer system, each master has its own AHB Layer and

is connected to the slave devices by an interconnect matrix, as shown in Figure 2.8.

Figure 2.8 Multi-layer interconnect topology

Within the interconnect matrix:

1. Every layer has a Decode stage that determines which slave is required for a transfer.

2. A mux routes the transfer from the appropriate layer to the required slave. If two layers require

access to the same slave at the same time, the arbitration within the interconnect matrix must

determine which layer has highest priority. The layer that is not given access is waited using

HREADY until it is given access to the required slave. When a layer is waited an Input Stage is

used to store a copy of the pipelined address and control information until the access to the

shared slave is given. Each slave port has its own arbitration and a number of different schemes

can be used.

For example:

• Input layers can be serviced in round-robin manner, changing every transfer or every

burst

• The arbitration can use a fixed priority scheme where certain high priority layers are

always given access in preference to lower priority layers.

The number of input/output ports on the interconnect matrix is completely flexible and can be

adapted to suit the system requirements.
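
The behaviour of the interconnect matrix described above can be sketched with a small model in which every slave port arbitrates independently and layers that lose arbitration are waited. This is an illustrative sketch only: it assumes a fixed-priority order shared by all slave ports, and the function and argument names are hypothetical.

```python
from typing import Dict, List, Tuple


def arbitrate_slave_ports(
    requests: Dict[int, int], priority: List[int]
) -> Tuple[Dict[int, int], List[int]]:
    """Model one arbitration round of the interconnect matrix.

    requests -- maps each requesting layer to the slave chosen by its decode stage
    priority -- layers ordered from highest to lowest priority (assumed fixed)

    Returns (granted, waited): granted maps slave -> layer, and waited lists
    the layers held off with HREADY until their slave becomes free.
    """
    granted: Dict[int, int] = {}
    waited: List[int] = []
    for layer in priority:
        if layer not in requests:
            continue  # this layer has no transfer pending
        slave = requests[layer]
        if slave in granted:
            waited.append(layer)  # slave already taken by a higher-priority layer
        else:
            granted[slave] = layer
    return granted, waited
```

Layers that target different slaves are all granted in the same round, which is the source of the parallelism in a multi-layer system.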

2.10 Bus configurations

As the number of masters and slaves in a system increases, the complexity of the interconnect

matrix can become significant. You can use the following techniques to optimize the system

architecture:

• Figure 2.9 shows how you can make slaves local, or private, to a particular layer. This reduces the complexity of the interconnect matrix. Use this technique when it is acceptable for a slave to be accessed only by masters on the same layer.

Figure 2.9 Local slaves

• Figure 2.10 shows how you can make multiple slaves appear as a single slave to the

interconnect matrix. This is useful to combine a number of low-bandwidth slaves. You can also

use it where a set of slaves are normally accessed by just one master, such as a DMA controller,

and the interconnect matrix is used only to give access to other masters under special

circumstances, such as debugging the system.

Figure 2.10 Multiple slaves on one slave port

Figure 2.11 Multiple masters on one layer

• In the simplest implementation of a multi-layer system, each master has its own layer. Figure

2.11 shows how you can also build a system where multiple masters share a layer. This is well

suited to combine masters that have low-bandwidth requirements or masters, such as the Test

Interface Controller (TIC), that have particular characteristics.

• Each layer can be a complete AHB subsystem, with the interconnect matrix being used to

enable communication between the two subsystems. The example in Figure 2.12 shows only a

single slave is shared. Typically this is an on-chip memory slave that is used as a buffer area

between the two subsystems.

Figure 2.12 Separate AHB subsystems

2.11 Multi-port slaves

In a multi-layer AHB system certain slaves, such as an SDRAM controller, are able to

operate more efficiently by processing transfers from different layers in parallel. Figure 2.13

shows how you can do this by designing the slave with multiple AHB slave ports.

Figure 2.13 Multi-port slaves

2.12 Summary

A literature survey of AHB, covering its merits and demerits, was carried out, and the signal flow diagram was identified.

The specifications of the signals shown in the signal flow diagram were identified, and their operation was explained with the help of block diagrams.

An overview of AMBA AHB operation, including all the components involved in AHB, was presented.

The ML-AHB bus configurations and the multilayer AHB bus matrix interconnection scheme were studied.


CHAPTER 3

Design of Self-Motivated Arbitration Scheme for the Multilayer AHB Bus matrix

The ML-AHB bus matrix of ARM consists of the input stage, the decoder, and the output stage, which includes an arbiter. Fig. 3.1 shows the overall structure of the ML-AHB bus matrix of ARM.

Fig. 3.1. Overall structure of the ML-AHB bus matrix of ARM.

The input stage is responsible for holding the address and control information when a transfer to a slave cannot commence immediately. The decoder determines which slave a transfer is destined for. The output stage selects which of the various master input ports is routed to the slave. Each output stage has an arbiter, which determines which input stage currently has the highest priority and therefore performs the next transfer to the slave. The ML-AHB bus matrix employs slave-side arbitration, in which the arbiters are located in front of each slave port, as shown in Fig. 3.1; the master simply starts a transaction and waits for the slave response to proceed to the next transfer. Therefore, the unit of arbitration can be a transaction or a transfer. However, the ML-AHB bus matrix of ARM furnishes only transfer-based arbitration schemes, specifically transfer-based fixed-priority and round-robin arbitration schemes. The transfer-based fixed-priority (round-robin) arbiter multiplexes the data transfer based on a single transfer in a fixed-priority (round-robin) fashion.

SM ARBITRATION SCHEME FOR THE ML-AHB BUSMATRIX

To implement an SM arbitration scheme, we assume that the masters can change their priority level and can issue the desired transfer length to the arbiters. This assumption should be valid because the system developer generally knows the features of the target applications. For example, some masters in embedded systems must complete their jobs within given timing constraints so that the system-level timing constraints are satisfied. The computation time of each master is predictable, but the data transfer time is not easy to foresee because the on-chip bus is usually shared by several masters. Previous works addressed this issue by minimizing the latencies of several latency-critical masters, but a side effect of these methods is that they can increase the latencies of other masters, which may then violate their timing constraints. Unlike existing works, our scheme can keep the latency of each master close to its given constraint by adjusting the priority level and transfer length of the masters.

Fig. 3.2 shows an example. In this example, the service latencies (latency-limit times) of M1, M2, and M3 are 4, 8, and 2 cycles (T14, T8, and T10), respectively. The requests of the three masters are all initiated at T0, and M3 is the most latency-sensitive master. Fig. 3.2(a) shows an arbitration scheme that does not use latency constraints for arbitration. Because the masters are selected in ascending order, M2 and M3 violate their latency constraints; only M1 meets its constraint. Fig. 3.2(b) shows the scheduling of a typical latency-minimizing arbiter. It minimizes the latency of the most latency-sensitive master, namely, M3, causing M2 to violate its constraint. Neither of these two arbitration schemes can meet the latency constraints of all three masters. In the SM arbitration shown in Fig. 3.2(c), however, all masters use the bus with no violations by configuring the priority levels (transfer lengths) of M1, M2, and M3 as the lowest, highest, and intermediate priorities (4, 8, and 2), respectively.

Fig. 3.2. Arbitration scheme examples in an embedded system. (a)

Arbitration scheme with no consideration of the latency constraint. (b)

Arbitration scheme minimizing latency. (c) SM arbitration scheme.

The 32-b address bus of the masters is used to inform the arbiters of the priority level and the desired transfer length of the masters. Fig. 3.3 shows the decoding information for our address bus.

Fig. 3.3. Decoding information of the 32-b address bus.

In Fig. 3.3, S_Number indicates the target slave number, P_Level is the priority level of a master, T_Length denotes the desired transfer length of a master, and Offset_Add specifies the internal address of the target slave. S_Number and P_Level each consist of 3 b because the maximum number of master-slave sets is 8 x 8. T_Length is composed of 4 b because the maximum burst length is 16. Although 7 b of the 32-b address bus are used for P_Level and T_Length to notify the arbiters of the priority level and the desired transfer length of a master, we consider the remaining 22 b adequate to express the internal address of a slave because the range of Offset_Add is from 0 to 2^22 - 1. Under the aforementioned assumption, the priority level and transfer length can then be changed by the SM demand of each master.
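
The field widths given above (3 b, 3 b, 4 b, and 22 b) can be illustrated with a short packing and unpacking sketch. The bit positions are an assumption, since the text states only the widths: the sketch places S_Number in bits [31:29], P_Level in [28:26], T_Length in [25:22], and Offset_Add in [21:0]. The function names are hypothetical.

```python
from typing import NamedTuple


class DecodedAddress(NamedTuple):
    s_number: int    # target slave number, 3 b
    p_level: int     # priority level of the master, 3 b
    t_length: int    # desired transfer length, 4 b
    offset_add: int  # internal address of the target slave, 22 b


def decode_haddr(haddr: int) -> DecodedAddress:
    """Split a 32-b address into its four fields (assumed bit positions)."""
    haddr &= 0xFFFFFFFF
    return DecodedAddress(
        s_number=(haddr >> 29) & 0x7,
        p_level=(haddr >> 26) & 0x7,
        t_length=(haddr >> 22) & 0xF,
        offset_add=haddr & 0x3FFFFF,
    )


def encode_haddr(s_number: int, p_level: int, t_length: int, offset_add: int) -> int:
    """Pack the four fields into a 32-b address (inverse of decode_haddr)."""
    return (
        ((s_number & 0x7) << 29)
        | ((p_level & 0x7) << 26)
        | ((t_length & 0xF) << 22)
        | (offset_add & 0x3FFFFF)
    )
```

A master that wants to raise its priority or change its transfer length only has to change the corresponding address bits; the offset field still spans 0 to 2^22 - 1.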

Fig. 3.4. Internal structure of our arbiter.

Fig. 3.4 shows the internal structure of our arbiter based upon the SM arbitration scheme.

In Fig. 3.4, the NoPort signal means that no master is to be selected and that the address and control signals to the shared slave must be driven to an inactive state, while Master No. indicates the currently selected master number generated by the controller for the SM arbitration scheme. Our arbiter consists of an RR block, a P block, two multiplexers, a counter, a controller, and two flip-flops. MUX_1 and MUX_2 are used to select the arbitration scheme and the desired transfer length of a master, respectively. The counter counts the transfer length, and the two flip-flops are inserted to keep the arbitration logic off the critical path. The RR block performs round-robin arbitration, and the P block performs priority-based arbitration.

A controller compares the priority levels of the requesting masters. If

the masters have equal priorities, the controller selects the round-robin

arbitration scheme (RR block); in other cases, it chooses the priority

arbitration scheme (P block). The controller also makes the final decision on

the master for the next transfer based on the transfer length of the selected

master.

The control process consists of the following three steps.

1) If HMASTLOCK is asserted, the same master remains selected.

2) If HMASTLOCK is not asserted and there is no currently selected master, the following hold.

   a) If no master is requesting access, the NoPort signal is asserted.

   b) Otherwise, a new master for the next transfer is initially selected. If the masters have equal priorities, the round-robin arbitration scheme is selected; otherwise, the priority arbitration scheme is chosen. In addition, the counter is updated based on the transfer length of the selected master.

3) If none of the previous statements applies, the following hold.

   a) If the counter has expired, the following hold.

      i) If there are no requesting masters, the NoPort signal is updated based on the HSEL signal of the currently selected master. If the HSEL signal is "1," the same master remains selected, and the NoPort signal is deasserted. Otherwise, the NoPort signal is asserted.

      ii) Otherwise, a master for the next transfer is selected based on the priority levels of the requesting masters, and the counter is updated.

   b) If the counter has not expired and the HSEL signal of the current master is "1," the same master remains selected, and the counter is decreased.

   c) If the currently selected master completes a transaction before the counter has expired, the following hold.

      i) If there are no requesting masters, the NoPort signal is asserted.

      ii) Otherwise, a master for the next transfer is chosen based on the priority levels of the requesting masters, and the counter is updated.
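
The three control steps can be sketched as a small behavioural model, given below. It is not the RTL of the arbiter, and several points are assumptions made for illustration: a larger P_Level means a higher priority, ties in the priority scheme go to the lowest master number, the counter is loaded with T_Length and counts as expired at zero, a low HSEL before expiry is taken to mean that the current master has completed its transaction, and the selected master is cleared whenever NoPort is asserted. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Request = Tuple[int, int]  # (P_Level, T_Length) of a requesting master


@dataclass
class ArbiterState:
    current: Optional[int] = None  # currently selected master, if any
    counter: int = 0               # remaining transfers for that master
    no_port: bool = True           # NoPort: no master selected


def pick_master(requests: Dict[int, Request], current: Optional[int],
                n_masters: int) -> int:
    """Round-robin if all priorities are equal, otherwise highest priority."""
    levels = {p_level for p_level, _ in requests.values()}
    if len(levels) == 1:
        start = -1 if current is None else current
        for step in range(1, n_masters + 1):
            candidate = (start + step) % n_masters
            if candidate in requests:
                return candidate
    top = max(levels)  # assumption: larger value means higher priority
    return min(m for m, (p_level, _) in requests.items() if p_level == top)


def step(state: ArbiterState, requests: Dict[int, Request], hmastlock: bool,
         hsel_current: bool, n_masters: int = 8) -> ArbiterState:
    """Advance the controller by one arbitration decision."""
    if hmastlock:                                  # step 1
        return state

    def reselect() -> ArbiterState:
        if not requests:                           # nobody requesting: NoPort
            return ArbiterState(None, 0, True)
        master = pick_master(requests, state.current, n_masters)
        return ArbiterState(master, requests[master][1], False)

    if state.current is None:                      # step 2
        return reselect()
    if state.counter == 0:                         # step 3a: counter expired
        if not requests and hsel_current:
            return ArbiterState(state.current, 0, False)
        return reselect()
    if hsel_current:                               # step 3b: keep master
        return ArbiterState(state.current, state.counter - 1, False)
    return reselect()                              # step 3c: finished early
```

Calling `step` once per arbitration decision with the current requests reproduces the iteration of the steps; with equal priority levels the model degenerates to round-robin arbitration.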

The SM arbitration scheme is achieved through iteration of the

aforementioned steps. Combining the priority level and the desired transfer

length of the masters allows our arbiter to handle the transfer-based fixed-

priority, round-robin, and dynamic-priority arbitration schemes (abbreviated

as the FT, RT, and DT arbitration schemes, respectively), as well as the

transaction-based fixed-priority, round-robin, and dynamic-priority arbitration

schemes (abbreviated as the FR, RR, and DR arbitration schemes,

respectively). Moreover, our arbiter can also deal with the desired-transfer-

length-based fixed-priority, round-robin, and dynamic-priority arbitration

schemes (abbreviated as the FL, RL, and DL arbitration schemes,

respectively).

The transfer- or transaction-based arbiter switches the data transfer based

upon a single transfer (burst transaction), and the desired-transfer-length-

based arbiter multiplexes the data transfer based on the transfer length assigned by

the masters.
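
The nine schemes are simply the combinations of the three priority policies with the three data multiplexing modes. The enumeration below reproduces the abbreviations used above; the dictionary names are hypothetical.

```python
# First letter: priority policy. Second letter: data multiplexing mode.
POLICIES = {"F": "fixed priority", "R": "round-robin", "D": "dynamic priority"}
MODES = {"T": "transfer", "R": "transaction", "L": "desired transfer length"}

# Build the nine scheme abbreviations, grouped by multiplexing mode.
SCHEMES = {
    policy + mode: (POLICIES[policy], MODES[mode])
    for mode in MODES
    for policy in POLICIES
}

for name, (policy, mode) in SCHEMES.items():
    print(f"{name}: {mode}-based {policy} arbitration")
```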

Fig. 3.5 shows the internal process of the RR block. Initially, we create the up- and down-mask vectors (Up_Mask and Dn_Mask) based on the number of the currently selected master.

Fig. 3.5. Internal process of the RR block.

We then generate the up- and down-masked vectors through a bitwise AND operation between each mask vector and the requested master vector. After generating the up- and down-masked vectors, we examine whether each masked vector is zero. If the up-masked vector is zero, the down-masked vector is passed as the input parameter of the round-robin function; if it is not zero, the up-masked vector is passed instead. A master for the next transfer is chosen by the round-robin function, and the current master is updated after 1 clock cycle. The RR block operates by repeating the arbitration procedure shown in Fig. 3.5.
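
A minimal sketch of this mask-based procedure is given below. It assumes that Up_Mask covers the masters numbered above the current one, that Dn_Mask covers the current master and those below it, and that the round-robin function returns the lowest-numbered master in the vector it receives. These details, and the name `rr_select`, are assumptions rather than details taken from Fig. 3.5.

```python
from typing import Optional


def rr_select(req_vector: int, current: int, n_masters: int = 8) -> Optional[int]:
    """Pick the next master in round-robin order.

    req_vector -- bit m is 1 when master m is requesting
    current    -- number of the currently selected master
    """
    full = (1 << n_masters) - 1
    dn_mask = (1 << (current + 1)) - 1  # masters 0..current
    up_mask = full & ~dn_mask           # masters current+1..n_masters-1

    up_masked = req_vector & up_mask
    dn_masked = req_vector & dn_mask

    # Prefer requesters above the current master; wrap around otherwise.
    chosen = up_masked if up_masked else dn_masked
    if chosen == 0:
        return None  # nobody is requesting
    # Index of the lowest set bit.
    return (chosen & -chosen).bit_length() - 1
```

With masters 0 and 2 requesting (`req_vector = 0b0101`), the selection alternates between the two masters as `current` is updated.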

Fig. 3.6 shows the internal procedure of the P block. First, we create the highest priority vector (V) through the round-robin function of Fig. 3.6. The priority-level vectors and the highest priority vector (V) are then passed as the input parameters of the priority function. The master with the highest priority is chosen by the priority function, and the current master is updated after 1 clock cycle.
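
One plausible reading of the P block is sketched below. The text does not define the highest priority vector (V) precisely, so the sketch assumes that V marks the requesting masters that share the highest priority level, that a larger P_Level means a higher priority, and that ties within V are broken in round-robin order relative to the current master. The function names are hypothetical.

```python
from typing import List, Optional


def highest_priority_vector(req_vector: int, p_levels: List[int]) -> int:
    """Return V: bit m is 1 if master m requests at the highest priority level."""
    requesting = [m for m in range(len(p_levels)) if (req_vector >> m) & 1]
    if not requesting:
        return 0
    top = max(p_levels[m] for m in requesting)  # assumption: larger is higher
    v = 0
    for m in requesting:
        if p_levels[m] == top:
            v |= 1 << m
    return v


def p_select(req_vector: int, p_levels: List[int], current: int) -> Optional[int]:
    """Choose the requesting master with the highest priority level."""
    v = highest_priority_vector(req_vector, p_levels)
    if v == 0:
        return None  # nobody is requesting
    # Tie-break: prefer masters numbered above the current one, then wrap.
    above = v & ~((1 << (current + 1)) - 1)
    chosen = above if above else v
    return (chosen & -chosen).bit_length() - 1
```

When only one master holds the highest level, the tie-break has no effect and that master is selected directly.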

Fig. 3.6. Internal procedure of the P block.

FUTURE SCOPE

Many other on-chip bus architectures exist alongside the AMBA ML-AHB, but its high performance, speed, and reliability are leading to increasingly wide use.

The AMBA AHB architecture is at present confined to ARM, but the advantages of the self-motivated ML-AHB are leading many other companies to move towards implementing this architecture.

At present, the self-motivated ML-AHB transfers up to 1024 bits per clock period, but this rate can be increased further as technology improves.

The configurations of the SM arbitration scheme that give the maximum performance need to be found automatically at run time.

In the future, the self-motivated arbitration scheme may also be applicable to AMBA AXI.

CONCLUSION

In this project, a flexible arbiter based on the SM arbitration scheme was proposed for the ML-AHB bus matrix. This arbiter supports three priority policies (fixed priority, round-robin, and dynamic priority) and three approaches to data multiplexing (transfer, transaction, and desired transfer length); in other words, there are nine possible arbitration schemes. In addition, the proposed SM arbiter selects one of the nine possible arbitration schemes based on the priority-level notifications and the desired transfer length from the masters so that the arbitration leads to the maximum performance.

The proposed SM arbitration scheme occupies more area than the other arbitration schemes for the ML-AHB, but it gives better performance when it selects the input stage and output stage in a self-motivated manner. We therefore expect that the SM arbitration scheme is best applied to an application-specific system, because it is easy to tune the arbitration scheme according to the features of the target system.

