Transcript
Page 1: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

1

Computer System Structures

cz:Struktury počítačových systémů

Lecturer: Richard Šusta

ČVUT-FEL in Prague, CR – subject A0B35SPS

Version: 1.0

Microprocessors

2

Page 2: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

2

Definition: Microprocessor

A Microprocessor is a Single VLSI Chip that has a CPU and may also have some Other Units (for example, caches, floating point processing arithmetic unit, pipelining, and super-scaling units)

Source Microprocessor Family

Motorola 68HCxxx

Intel 80x86 & i860

Sun SPARC

IBM PowerPC

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 3

Existovalo a dodnes existuje mnoho typů od různých výrobců

http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum_contents/Microprocessor_Evolution_Poster.jpg

Page 3: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

3

From 8 bit to 32 bit CISC processors

8 bit µ-processor 8008 8080 8085 Z80

Year 1972 1974 1976 1976

Instructions 66 111 113 158

MIPS 0.06 0.29 0.37 0.58

Minimum system (IO circuits) 20-40 7 3 3

16 and 32 bit

µ-processor 8086/88 80286 80386 80486 Pentium

64bit example: Intel

I5-2500K Quad core

Year 1978/79 1982 1985-91 1989-94 1993-98 2011

Instructions processor

+ numeric cooprocessor 133 ~175 ~195 ~200 ~225 ~500

MIPS 0.33-0.75 0.9-2.6 2.5-11 16-70 100-300 ~80000

Memory Physical / Virtual 1 MB 16 MB/1 GB 4 GB / 64TB 32 GB / 256 TB

Definition: Microcontroller

A microcontroller is a single-chip VLSI unit which, though having limited computational capabilities possesses enhanced input-output capabilities and a number of on-chip functional units

Source Microcontroller Family

Motorola 68HC11xx, HC12xx, HC16xx

Intel 80x86, 8051, 80251

Microchip PIC 16F84, 16F876, PIC18

ARM ARM9, ARM7

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 6

Page 4: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

4

Microcontroller

Microcontroller

(uC)

Outp

ut in

terfaces

Input in

tefaces

In device1

In device2

In device3

Out device1

Out device2

Auxiliary

units

7

Example: “Original” 8051 Microcontroller

Oscillator

and timing

4096 Bytes

Program

Memory

128 Bytes

Data

Memory

Two 16 Bit

Timer/Event

Counters

8051

CPU

64 K Byte Bus

Expansion

Control

Programmable

I/O

Programmable

Serial Port Full

Duplex UART

Synchronous Shifter

Internal data bus

External interrupts

subsystem interrupts

Control Parallel ports

Address Data Bus

I/O pins Serial Input

Serial Output

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 8

Page 5: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

5

SPS 9

Hardcore/Softcore Processors

Hard-core Processors

A Fabricated Integrated Circuit that may or

may not be Embedded into Additional Logic

Soft-core Processors

A Processor Described in a HDL that is

implemented in Reconfigurable Logic (ie, in

an FPGA), e.g NIOS, MicroBlaze, OpenRISC

SPS 10

Soft-core Processors Logic Elements

192

200

204

390

510

1170

1324

1800

1928

1984

2536

2600

3500

5000

6000

37000

60000

100 1000 10000 100000

Xilinx PicoBlaze

LatticeMico8

PacoBlaze

Nios II/e

DSPuva16

Nios II/s

Xilinx MicroBlaze

Nios II/f

OpenFire

LatticeMico32

aeMB

ARM Cortex-M1

LEON3

LEON2

OpenRISC 1200

S1 Core

Ultra SPARC S1 Core/f

Su

n M

icro

syste

ms

Op

en

so

urc

e

Page 6: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

6

Copyright © 2005 Altera Corporation

Nios II: RISC soft-core processor

Copyright © 2005 Altera Corporation

Nios II Versions

Nios II Processor Comes In Three ISA (Instruction Set Architecture) Compatible Versions

Software Code is Binary Compatible

No Changes Required When CPU is Changed

FAST: Optimized for Speed

STANDARD: Balanced for Speed and

Size

ECONOMY: Optimized for Size

12

Page 7: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

7

Copyright © 2005 Altera Corporation

Nios II: Hard Numbers

Nios II/ f Nios II /s Nios II /e

Stratix II 200 DMIPS @

175MHz

1180 LEs

90 DMIPS @

175MHz

800 LEs

28 DMIPS @

190MHz

400 LEs

Cyclone 100 DMIPS @

125MHz

1800 LEs

62 DMIPS @

125MHz

1200 LEs

20 DMIPS @

140MHz

550 LEs

DMIPS = Dhrystone benchmark MIPS (million instructions per second)

Dhrystone is without float point operations

(Note: Whetstone [cz brousek] has float points)

Comparing with Pentium (1996) 224 DMIPS @ 200MHz

cca 196 DMIPS@175MHz - cca 140 DMIPS@125 MHz

13

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Processor Core Variations Nios II /f

Fast Nios II /s Standard

Nios II /e Economy

Pipeline 6 Stage 5 Stage None

H/W Multiplier &

Barrel Shifter 1 Cycle 3 Cycle

Emulated

In Software

Branch Prediction Dynamic Static None

Instruction Cache Configurable Configurable None

Data Cache Configurable None None

Logic Requirements

(Typical LEs)

1800 w/o MMU

3200 w/ MMU 1200 600

Custom

Instructions Up to 256

(MMU - Memory Management Unit)

14

Page 8: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

8

SPS 15 15

Mutlicycle CPU

Instr.

decode/reg.

fetch/branch

addr.

ALU

operation

Read

memory

data

Instr.

fetch/

adv. PC

load

store

Reg Branch

Jump

Start

State 0 1

2 3

4 5

6

7 8 9

Compute

memory

addr.

Write

jump addr.

to PC

Write

register

Write

memory

data

Write

register

Write PC

on branch

condition

SPS 16

Different Execution Time of Instructions

instruction

memory

data

memory registers registers ALU

P C

Compute Not used

Not used

Not used

P C

Jump Not used

Not used

Not used

Not used

P C

Branch Not used

P C

Branch Not used

Not used

Not used

P C

Load

Not

P C

Store Not used

Fastest

Slowest

Page 9: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

9

SPS 17

Single-cycle versus multicycle instruction execution.

Clock

Clock

Instr 2 Instr 1 Instr 3 Instr 4

3 cycles 3 cycles 4 cycles 5 cycles

Time saved

Instr 1 Instr 4 Instr 3 Instr 2

Time needed

Time needed

Time allotted

Time allotted

SPS 18

Internal Registers in CPUs

Processor

AC MAR

ALU

Control

Unit

MDR

PC IR

Memory Input/Output

Data Bus

Address Bus

PC Program counter

AC Accumulator

Internal registers

IR Instruction register

MAR Memory address register

MDR Memory data register

Page 10: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

10

SPS 19

Cycles of Instructions

Fetch

Instruction

Decode

Opcode

Execute

Operation

Fetch

Operands

Store

Result IR MAR MDR

SPS 20

Pipelining

Řízení procesoru s pipelining (zřetězeným)

zpracováním instrukcí není jednoduché.

Autoři procesoru MIPS, podobného NIOSu, uvádějí

ve své knize (Paterson, D., Henessy, J.: Computer

Organization and Design, Elsevier), že jimi

navržená jednotka pro pipelining úspěšně otestována

nejméně 100 lidmi na 18 různých pracovištích

obsahovala chybu ztracenou v několika tisících

řádkách HDL kódu, na kterou se přišlo až delším

používání. Více bude v předmětu A0B36APO

Architektura počítačů

Page 11: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

11

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Minimum Nios II Processor

Program Controller

& Address

Generation

clock

reset

Status & Control

Registers

Instruction Master Port

Data Master Port

General Purpose Registers

Nios II Processor Core

= Configurable

= Fixed

Arithmetic Logic Unit

21

Outside NIOS

Image source: Tron

Page 12: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

12

23

Simple NIOS System on DE2

32-Bit

Nios

Processor

Switch PIO

LED PIO

7-Segment

LED PIO

PIO-32

User-Defined

Interface UART Timer

Address (32)

Read

Write

Data In (32)

Data Out (32)

Nios Processor

Avalo

n B

us

Copyright © 2005 Altera Corporation

Nios II/e on Basic Computer for DE2

24

Page 13: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

13

NIOS DE2 Basic Computer

Start-address End-address Element Size

0x00000000 0x007FFFFF SDRAM 8 MB Ram

--- 120 MB

0x08000000 0x0807FFFF ROM (SRAM) 512 kB read-only

15,5 MB

0x09000000 0x09001FFF On-chip-RAM 8 kB

111,92 MB

0x10000000 0x10000003 LEDR 4 bytes-write-only

0x10000010 0x10000013 LEDG 4 bytes-write-only

0x10000020 0x10000023 HEX3_HEX0 4 bytes-write-only

0x10000030 0x10000033 HEX7_HEX4 4 bytes-write-only

0x10000040 0x10000043 SW 4 bytes-read-only

0x10000050 0x1000005F KEY_BASE 4 bytes data

+12 bytes control

.. other I/O

128 MB

16 MB

112 MB

25

6 M

B

25

Copyright © 2005 Altera Corporation

Memory Map of Basic Computer SDRAM - synchronous dynamic RAM, i.e. synchronized

with the system bus, on the DE2 board

- 1M x 16 bits x 4 banks

- addresses 0x00000000 to 0x007FFFFF.

SRAM - static RAM (SRAM) chip, on the DE2 board

- 256K x 16 bits

- addresses space 0x08000000 to 0x0807FFFF.

On-Chip Memory

8-Kbyte memory inside the Cyclone II FPGA chip

- 2K x 32 bits

- addresses 0x09000000 to 0x09001FFF.

26

Page 14: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

14

Copyright © 2005 Altera Corporation

Nios II/s: DE2 Media Computer

27

28

NIOS DE2 Media Computer

Start-address End-address Element Size

0x00000000 0x007FFFFF SDRAM 8 MB synchron. DRAM

--- 120 MB

0x08000000 0x0807FFFF SRAM 512 kB VGA pixel buffer

15,5 MB

0x09000000 0x09001FFF On-chip-RAM 8 kB VGA char. buffer

111,92 MB

0x10000000 0x10000003 LEDR 4 bytes-write-only

0x10000010 0x10000013 LEDG 4 bytes-write-only

0x10000020 0x10000023 HEX3_HEX0 4 bytes-write-only

0x10000030 0x10000033 HEX7_HEX4 4 bytes-write-only

0x10000040 0x10000043 SW 4 bytes-read-only

0x10000050 0x1000005F KEY_BASE 4 bytes data

+12 bytes control

.. other I/O

128 MB

16 MB

112 MB

25

6 M

B

28

Page 15: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

15

I/O DE2 Basic & Media Computer

0x10000000 LEDR_BASE

0x10000010 LEDG_BASE,

0x10000020 HEX3_HEX0_BASE

0x10000030 HEX7_HEX4_BASE

0x10000040 SW_BASE

0x10000050 KEY_BASE

...others I/O...

Literatura on DVD: DE2_Media_Computer.pdf 29

Switch and Push Button

SW_BASE

KEY_BASE

30

Page 16: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

16

LED

LEDR_BASE

LEDG_BASE

31

HEX

HEX3_HEX0_BASE

HEX7_HEX4_BASE

32

Page 17: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

17

Wiki: Memory Mapped IO Memory-mapped I/O (MMIO) and port I/O (also called isolated I/O or port-mapped

I/O abbreviated PMIO) are two complementary methods of performing input/output between the CPU and peripheral devices in a computer. An alternative approach, not discussed here, is using dedicated I/O processors — commonly known as channels on mainframe computers — that execute their own instructions.

Memory-mapped I/O (not to be confused with memory-mapped file I/O) uses the same address bus to address both memory and I/O devices ‒ the memory and registers of the I/O devices are mapped to (associated with) address values. So when an address is accessed by the CPU, it may refer to a portion of physical RAM, but it can also refer to memory of the I/O device. Thus, the CPU instructions used to access the memory can also be used for accessing devices. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. To accommodate the I/O devices, areas of the addresses used by the CPU must be reserved for I/O and must not be available for normal physical memory. The reservation might be temporary — the Commodore 64 could bank switch between its I/O devices and regular memory — or permanent.

Port-mapped I/O often uses a special class of CPU instructions specifically for performing I/O. This is found on Intel microprocessors, with the IN and OUT instructions. These instructions can read and write one to four bytes (outb, outw, outl) to an I/O device. I/O devices have a separate address space from general memory, either accomplished by an extra "I/O" pin on the CPU's physical interface, or an entire bus dedicated to I/O. Because the address space for I/O is isolated from that for main memory, this is sometimes referred to as isolated I/O.

[ http://en.wikipedia.org/wiki/Memory-mapped_I/O ]

33

Inside NIOS

FPGA

Avalo

n S

wit

ch

Fab

ric

UART

GPIO

Timer

SPI

SDRAM

Controller

On-Chip

ROM

On-Chip

RAM

Nios II

CPU

Debug Cac

he

Page 18: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

18

NIOS Simplified System

.text

.data

35

NIOS: Common Register Usage

Normal usage Name Reg

0x0000_0000 zero r0

Assembler Temporary at r1

r7

r6

r5

r4

r3

r2

r14

r15

r14

r12

r11

r10

r9

r8

r23

r22

r21

r20

r19

r18

r17

r16

Normal usage Name Reg

Break Temporary bt r25

Exception Temporary et r24

Frame Pointer fp r28

Stack Pointer sp r27

Global Pointer gp r26

Break Return Address ba r30

Return Address ra r31

Exception Return Address ea r29

36

Page 19: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

19

NIOS Assembler

is GNU assembler http://tigcc.ticalc.org/doc/gnuasm.html

http://sources.redhat.com/binutils/docs-2.12/as.info/

by Altera monitor – lieratura on DVD • Doporuceno_Altera_Monitor_Program_Tutorial.pdf

• n2cpu_nii5v1.pdf – instruction set reference

37

General Processor Instruction Formats

Three-Address Instructions

ADD o1, o2, o3 o1 ← o2 + o3

Two-Address Instructions

MOV o1, o2 o1 ← o2

One-Address Instructions

NEXTPC o1 o1 ← PC + 4

Zero-Address Instructions

NOP r1 ← r1 add r0 (NIOS)

R-Type - operands oi are registers only

I-Type - instruction contains an immediate value

C-Type - control instruction

38

Page 20: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

20

NIOS Addressing Modes

(b) Immediate addressing Instruction contains the

operand

Op -20 movi r2, -20

(a) Register direct addressing Register contains the operand

Op R1 mov R2,R1

R2 R1

Operand R2

(c) Displacement (or offset)

addressing Address of operand =

register + constant

stw R2, byte_offset(R1)

stw R2, 100(R1) : R2 → M[R1+100]

Operand

Memory M

Op R2

Address R1

100

Op +

39

RISC vs CISC - Pentium Addressing Modes

[Source: Intel co.] LA=linear address ~ EA

Only for self-study

40

Page 21: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

21

Load/Store

ldw rB, im-const(rA) : rB ← MEM[rA + im-const]

ldw (load 32-bit word from memory)

stw (store 32-bit word into memory)

stw rB, im-const (rA) : rB → MEM[rA + im-const]

41

Example - demo1.s /*.include "nios_macros.s"*/

.equ LEDR_BASE, 0x10000000

.equ SW_BASE, 0x10000040

.text/* executable code follows */

.global_start

_start:

/* initialize base addresses of parallel ports */

movia r10, SW_BASE/* SW slider switch base address */

movia r12, LEDR_BASE/* red LED base address */

LOOP:

ldwio r8, 0(r10)

stwio r8, 0(r12)

br LOOP

.end

42

Pseudo-instruction

movia r12, 0x10000040

=>

orhi r12, r0, 0x1000

addi r12, r0, 0x0040

Page 22: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

22

Some GNU assembler directives

.include insert another file

.text this directive tells the assembler to place the following statements at the end of the code section

.global this directive makes the symbol visible to the program loading instructions into memory.

.data this directive tells the assembler to place the following statements at the end of the data section.

.equ this directive set the value of a symbol.

.word this directive is used to set the content of memory locations to the specified value.

.end Marks the end of current assembly program.

43

Data Types in Instructions

NIOS II registers hold 32-bit (4-byte) words.

Byte = 8 bits - example: ldb / ldbu + ldbio / ldbuio

- load byte extend signed/unsigned

Halfword = 2 bytes example: - ldh / ldhu + ldhio / ldhuio

- load halfword extend signed/unsigned

Doubleword = 8 bytes - user defined instructions only

Other common data sizes include byte and halfword.

Word = 4 bytes - example: - ldw + ldwio - load word

u-unsigned extension io - bypass cashe 44

Page 23: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

23

Unsigned/Signed Extension

• • • X

X • • • • • •

• • •

m

m k

signed

• • • X

Xu' • • • • • •

• • •

m k

0

unsigned

45

ALU Instructions

xori

ori

andi

subi

addi

nor

xor

or

and

sub

add

R-format

xorhi XOR

NOR

orhi OR

andhi AND

subtract

add

I-format operation

pseudo-instruction addi rB ← rA + se(number)

Logic instructions AND, OR, XOR, NOR

do not use se = sign-extension

muli mul multiply mulxuu,mulxsu

mulxss High 32bits

of 64bits High 32bits of

64bit result

Lower 32bits

of 64bit result

46

Page 24: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

24

Increasing Performance

Image source: Tron

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

3 Ways to Increase Performance

Custom Instructions

FPGA

Nios II

Custom Instructions

Accelerate CPU

processing

performance with

application-

specific hardware

Add external

co-processing

hardware to

accelerate data

functions

FPGA

Nios II

Hardware Accelerator

Hardware Accelerators

External CPU or

DSP

Multi-Processor System

PC

Ie

FPGA

Add more

processors

(internal &/or

external) to increase

processing power

Nios II

Nios II

Nios II

48

Page 25: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

25

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Accelerating Software in FPGA

Accelerator 0

500

1,000

1,500

2,000

2,500

Itera

tio

ns

/Seco

nd

Software Only

Custom Instruction

27 Times

Faster

530 Times

Faster

Program Memory

Nios II

Data Memory

Arbiter

Data Memory

Arbiter

DM

A

Accelerator

DM

A

Control

Custom Instruction • Accelerator running 64Kb CRC at 100 MHz

(CRC can be easily calculated on a Linear

Feedback Shift Register, see the next lecture)

C - Out << >>

&

Custom Logic

+ -

A

B

Nios II Embedded Processor

49

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Altera C2H – the SW Accelerator Solution

Generates and integrates a custom hardware

accelerator from an ANSI C function.

Program Memory

CPU

Data Memory

Arbiter

Data Memory

Arbiter

C2H

Accelerator

50

Page 26: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

26

Expander created from modified VGAFrame

51

VCCCLOCK_27 INPUT

VCCSW[17..0] INPUT

TD_RESETOUTPUT

VGA_R[9..0]OUTPUT

VGA_G[9..0]OUTPUT

VGA_B[9..0]OUTPUT

VGA_CLKOUTPUT

VGA_BLANKOUTPUT

VGA_HSOUTPUT

VGA_VSOUTPUT

VGA_SYNCOUTPUT

LEDR[17]OUTPUT

LEDR[16]OUTPUT

LEDR[15..12]OUTPUT

LEDR[9..0]OUTPUT

LEDR[11..10]OUTPUT

VC

C

Cyclone II

inclk0 frequency: 27.000 MHz

Operation Mode: Normal

Clk Ratio Ph (dg) DC (%)

c0 1007/1080 0.00 50.00

inclk0 c0

locked

altpll0

inst

CLK_25MHz175

ACLRN

yrow[9..0]

xcolumn[9..0]

VGA_CLK

VGA_BLANK

VGA_HS

VGA_VS

VGA_SYNC

VGAsynchro

inst4

yrow[9..0]

xcolumn[9..0]

VGA_CLK_in

VGA_BLANK_in

VGA_HS_in

VGA_VS_in

VGA_SYNC_in

ACLRN_in

datain[17..0]

VGA_R[9..0]

VGA_G[9..0]

VGA_B[9..0]

VGA_CLK

VGA_BLANK

VGA_HS

VGA_VS

VGA_SYNC

tb_yrow[9..0]

tb_xcolumn[9..0]

VGANiosRectangle

inst1

VC

C

GND

GND

VC

C

Freq25MHz175

datain(17) - enable write; datain(15 downto 12) - memory index

(0-x0, 1-y0, 2-xwidth, 3-yheight; 4 to 6 - RGB color of background, 8 to 10 - RGB color of the rectangle)

show bits of datain regions

VGANiosRectangle

52

yrow[9..0]

xcolumn[9..0]

VGA_CLK

ACLRN

datain[17..0]

VGA_R[9..0]

VGA_G[9..0]

VGA_B[9..0]

store:process(VGA_CLK, ACLRN)

frame:process(VGA_CLK)

subtype datain_t is unsigned(9 downto 0);

type memory_t is array(0 to 15) of datain_t;

signal memory : memory_t;;

x0,y0

xwidth

yh

eig

ht

0 1 2 3 4 5 6 8 9 10

x0 y0 xw yh R G B Rb Gb Bb

paramaters of picture

Page 27: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

27

Top entity of Basic Computer

53

DE2 Basic Computer Schema Detail

CLOCK_50

CLOCK_27

KEY[3..0]

SW[17..0]

UART_RXD LEDG[8..0]

LEDR[17..0]

HEX0[6..0]

HEX1[6..0]

HEX2[6..0]

HEX3[6..0]

HEX4[6..0]

HEX5[6..0]

HEX6[6..0]

HEX7[6..0]

SRAM_ADDR[17..0]

SRAM_CE_N

SRAM_WE_N

SRAM_OE_N

SRAM_UB_N

SRAM_LB_N

UART_TXD

DRAM_ADDR[11..0]

DRAM_BA_1

DRAM_BA_0

DRAM_CAS_N

DRAM_RAS_N

DRAM_CLK

DRAM_CKE

DRAM_CS_N

DRAM_WE_N

DRAM_UDQM

DRAM_LDQM

GPIO_0[35..0]

GPIO_1[35..0]

SRAM_DQ[15..0]

DRAM_DQ[15..0]

DE2_Basic_Computer

inst

VCCCLOCK_50 INPUT

VCCCLOCK_27 INPUT

VCCKEY[3..0] INPUT

VCCSW[17..0] INPUT

VCCUART_RXD INPUT LEDG[8..0]OUTPUT

LEDR[17..0]OUTPUT

HEX0[6..0]OUTPUT

HEX1[6..0]OUTPUT

HEX2[6..0]OUTPUT

HEX3[6..0]OUTPUT

HEX4[6..0]OUTPUT

HEX5[6..0]OUTPUT

HEX6[6..0]OUTPUT

HEX7[6..0]OUTPUT

SRAM_ADDR[17..0]OUTPUT

SRAM_CE_NOUTPUT

SRAM_WE_NOUTPUT

SRAM_OE_NOUTPUT

SRAM_UB_NOUTPUT

SRAM_LB_NOUTPUT

UART_TXDOUTPUT

DRAM_ADDR[11..0]OUTPUT

DRAM_BA_1OUTPUT

DRAM_BA_0OUTPUT

DRAM_CAS_NOUTPUT

DRAM_RAS_NOUTPUT

DRAM_CLKOUTPUT

DRAM_CKEOUTPUT

DRAM_CS_NOUTPUT

DRAM_WE_NOUTPUT

DRAM_UDQMOUTPUT

DRAM_LDQMOUTPUT

VCCGPIO_0[35..0]BIDIR

VCCGPIO_1[35..0]BIDIR

VCCSRAM_DQ[15..0]BIDIR

VCCDRAM_DQ[15..0]BIDIR

54

Page 28: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

28

Connected Accelerator

55

VCCCLOCK_50 INPUT

VCCCLOCK_27 INPUT

VCCKEY[3..0] INPUT

VCCSW[17..0] INPUT

VCCUART_RXD INPUT LEDG[8..0]OUTPUT

LEDR[17..0]OUTPUT

HEX0[6..0]OUTPUT

HEX1[6..0]OUTPUT

HEX2[6..0]OUTPUT

HEX3[6..0]OUTPUT

HEX4[6..0]OUTPUT

HEX5[6..0]OUTPUT

HEX6[6..0]OUTPUT

HEX7[6..0]OUTPUT

SRAM_ADDR[17..0]OUTPUT

SRAM_CE_NOUTPUT

SRAM_WE_NOUTPUT

SRAM_OE_NOUTPUT

SRAM_UB_NOUTPUT

SRAM_LB_NOUTPUT

UART_TXDOUTPUT

DRAM_ADDR[11..0]OUTPUT

DRAM_BA_1OUTPUT

DRAM_BA_0OUTPUT

DRAM_CAS_NOUTPUT

DRAM_RAS_NOUTPUT

DRAM_CLKOUTPUT

DRAM_CKEOUTPUT

DRAM_CS_NOUTPUT

DRAM_WE_NOUTPUT

DRAM_UDQMOUTPUT

DRAM_LDQMOUTPUT

TD_RESETOUTPUT

VGA_R[9..0]OUTPUT

VGA_G[9..0]OUTPUT

VGA_B[9..0]OUTPUT

VGA_CLKOUTPUT

VGA_BLANKOUTPUT

VGA_HSOUTPUT

VGA_VSOUTPUT

VGA_SYNCOUTPUT

VCCGPIO_0[35..0]BIDIR

VCCGPIO_1[35..0]BIDIR

VCCSRAM_DQ[15..0]BIDIR

VCCDRAM_DQ[15..0]BIDIR

CLOCK_50

CLOCK_27

KEY[3..0]

SW[17..0]

UART_RXD LEDG[8..0]

LEDR[17..0]

HEX0[6..0]

HEX1[6..0]

HEX2[6..0]

HEX3[6..0]

HEX4[6..0]

HEX5[6..0]

HEX6[6..0]

HEX7[6..0]

SRAM_ADDR[17..0]

SRAM_CE_N

SRAM_WE_N

SRAM_OE_N

SRAM_UB_N

SRAM_LB_N

UART_TXD

DRAM_ADDR[11..0]

DRAM_BA_1

DRAM_BA_0

DRAM_CAS_N

DRAM_RAS_N

DRAM_CLK

DRAM_CKE

DRAM_CS_N

DRAM_WE_N

DRAM_UDQM

DRAM_LDQM

GPIO_0[35..0]

GPIO_1[35..0]

SRAM_DQ[15..0]

DRAM_DQ[15..0]

DE2_Basic_Computer

inst

VC

C

Cyclone II

inclk0 frequency: 27.000 MHz

Operation Mode: Normal

Clk Ratio Ph (dg) DC (%)

c0 1007/1080 0.00 50.00

inclk0 c0

locked

altpll0

inst2

CLK_25MHz175

ACLRN

yrow[9..0]

xcolumn[9..0]

VGA_CLK

VGA_BLANK

VGA_HS

VGA_VS

VGA_SYNC

VGAsynchro

inst4

yrow[9..0]

xcolumn[9..0]

VGA_CLK_in

VGA_BLANK_in

VGA_HS_in

VGA_VS_in

VGA_SYNC_in

ACLRN_in

datain[17..0]

VGA_R[9..0]

VGA_G[9..0]

VGA_B[9..0]

VGA_CLK

VGA_BLANK

VGA_HS

VGA_VS

VGA_SYNC

tb_yrow[9..0]

tb_xcolumn[9..0]

VGANiosRectangle

inst1

Freq25MHz175

E(x)ternal

Influence Q: – How to transfer

signals depended on different clocks?

A: If you look at the picture you will see the

key to this solution: You just need second hands for automatic

shaking:-)

56 [Image: 9gag.com]

Page 29: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

29

57

Synchronize Signal with 2nd Clock

Signal-

clk1

source

Hand

Snake

communication channel

data

RDY new data are ready

Clk1 Clk2 2 unsynchronized clocks

Send without implicit hand-shaking

(cz: nepotvrzovaný přenos dat)

It is similar to UDP on networks

(UDP - User Datagram Protocol)

58

Handshaking

Clk1

data

Signal-

clk1

source

register

a signal synchronized

by the 1st clock

RDY

ACK

- acknowledge

read by hand-shaking

(cz: kladně potvrzovaný přenos) like TCP = Transmission Control Protocol

Han

dsh

ake

FS

M

clk=5kHz

SLOAD

Data

Clk2

Page 30: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

30

59

Handshake with stable data

Hand

shake

RDY

ACK

data

ACK 5

6

7

Signal-

clk1

source

Register

1

5 5

RDY is synchronized with clk1, but not with clk2

1

4 RDY

request

3

1 1

clk1

data

2

4

Clk1

request

clk2 SLOAD

FSM for Handshaking Moore

Machine ACK RDY

Clock1

S0

S1 RDY

Clk1-signal ='1'

Clk1-signal ='0' and ACK='0'

S2 ACK='1'

Clk1-signal='0'

ACK='0'

Clk1-signal ='1' or ACK='1'

Page 31: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

31

61

Handshaking for

unstable data

clk1

data

Signal-

clk1

source

Shift

register RDY

init1

ACK

- acknowledge H

an

dsh

ake

FS

M

clk2

SLOAD

62

Handshake with unstable data

Hand

shake RDY

ACK

init0 init1

ACK

clk1

5 6

7

Signal-

clk1

source

Shift

1

5 5

RDY is synchronized with clk1, but not with clk2

1

4 RDY

request

3

1 1

init1

clk1

init0

2

4

KEY request clk2

SLOAD

Page 32: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

32

SPS 63

rdy request

ack

data0

clock

aclrn data1

Handshake for unstable data (1/4)

signal stReg

signal

stNext Sta

teR

egis

ter

pro

cess

Nex

tSta

te

pro

cess

data1 <=dataReg;

signal

dataReg signal loadData

ARCHITECTURE rtl of handshake is

Input ports

of ENTITY

Output

ports of

ENTITY

ENTITY handshake is

SPS 64

Handshake for unstable data (2/4)

LIBRARY ieee; USE ieee.std_logic_1164.all;

ENTITY handshake is

PORT( ack, request : IN STD_LOGIC;

data0 : in std_logic_vector(15 downto 0);

clock, aclrn : IN STD_LOGIC;

rdy : OUT STD_LOGIC;

data1 : out std_logic_vector(15 downto 0) );

end handshake;

ARCHITECTURE rtl of handshake is

type state is (s0, s1, s2);

signal stReg, stNext : state;

signal dataReg:std_logic_vector(15 downto 0);

signal loadData:std_logic;

begin StateRegister: PROCESS(clock, aclrn) BEGIN...END PROCESS;

NextState: PROCESS(stReg, ack, request) BEGIN...END PROCESS;

data1<=dataReg;

end rtl;

L:Libraries

EP: entity-ports

AD: architecture

declarations

ACS: architecture concurrent

statements There are 3

concurrent statements:

StateRegister:process,

NextState:process, and

<=

Page 33: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

33

SPS 65

Handshake for unstable data (3/4)

StateRegister: PROCESS(clock, aclrn) begin

if (aclrn = '0') then stReg <= s0; dataReg<=(others=>'0'); elsif rising_edge(clock) then

stReg <= stNext; if loadData='1' then dataReg<=data0; end if; end if; end PROCESS;

PS: process

-synchronous part

PC: process-clock test

PH: process-header with sensitivity list

SPS 66

Handshake for unstable data (4/4)

NextState: PROCESS(stReg, ack, request)

begin

rdy<='0'; loadData <= '0' ;

case stReg is

when s0 => if request = '1' then stNext <= s1;loadData <= '1' ;

else stNext <= s0; end if;

when s1 => rdy<='1';

if ack = '1' then stNext <= s2;

else stNext <= s1; end if;

when s2 => if request = '0' and ack = '0' then stNext <= s0;

else stNext <= s2; end if;

when others => report "Undefined state";

end case;

end PROCESS;

PD:=

process-

dataflow

part

PH: process-header with

sensitivity list

Page 34: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

34

SPS 67

ack

clk

NextState:OUT0

request

reset

s0

s1

0

1

0

D Q

PRE

ENA

CLR

loadData

dataReg[15..0]

ack

clock

aclrn

rdy

data1[15..0]

stReg

NextState

data0[15..0]

request

RTL

s1 s2stReg~1

s0

SPS 68

RDY

Clk2

Asynchronous

clear

active-low

D

CLK

Q

CLRN

D

CLK

Q

CLRN

D

CLK

Q

CLRN

EN

DATA

synchronní

z clk2

Příjem signálu handshake DFFE-registry

DFF DFF

ACK

Data-

CLK1

SLOAD

Synchronizátor není nutný,

ale velmi doporučený

Obvod

potvrzení

Page 35: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

35

NIOS gcc

programming in low-level C language

69

70

optional

parts

optional

parts

Basic Steps of C Compiler

code.C

NIOS

processor Linker

NIOS gcc C-compiler passes 1 to M

C preprocessor modifies source code by substitutions

metacode

code.tmp

Compiler passes M to N: Optimization of metacode

relative object

module

Loader

system

libraries

string.h stdio.h code.h

70

Page 36: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

36

71

C primitive types

1) In many implementations, it is not a standard C datatype, but only common

custom for user's "#define" macro definitions, see next slides

Size Java C C alternative Range

1 boolean any integer, true if !=0 BOOL(1 0 to !=0

8 byte char(2 signed char –128 to +127

8 unsigned char BYTE(1 0 to 255

16 short int signed short –32768 to

+32767

16 unsigned short 0 to + 65535

32 int int signed int -2^31 to 2^31-1

32 unsigned int DWORD(1 0 to 2^32-1

64 long long long int -2^63 to 2^63-1

64 unsigned long LWORD(1 0 to 2^64-1

2) Default is signed but it is best to specify.

Sign Extension Example in C

short int x = 15213;

int ix = (int) x;

short int y = -15213;

int iy = (int) y;

Decimal Hex Binaryx 15213 3B 6D 00111011 01101101

ix 15213 00 00 C4 92 00000000 00000000 00111011 01101101

y -15213 C4 93 11000100 10010011

iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011

72

Page 37: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

37

73

NIOS C #define / const

#define TENDEF 10 //substitution rule

read only access

const int TEN2 = TEN; /* Error: initializer element is not constant

Note: OK in C++, but C variables must not initialize constants * /

const ~ static final in Java

const int TEN=10;

const int TEN3 = TENDEF; // OK

/* after preprocessor run:

const int TEN3 = 10;

*/

74

Definition of BYTE and BOOL

// by substitution rule no ; and no type check

#define BYTE unsigned char

#define BOOL int

// by introducing new type, ending ; is required

typedef unsigned char BYTE;

typedef int BOOL;

C language has no strict type checking #define ~ typedef,

but typedef is usually better integrated into compiler.

Page 38: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

38

75

Bitwise Logic Instructions

AND

OR

XOR

left shift

right shift

NOT

&

|

^

<<

>>

~

n = n | 0xFF00;

Examples:

In C, operator << usually behaves as logical for unsigned

operand and as arithmetic shift for signed operands.

n = n & 0xF0;

n = n >> 4

n = ~0xFF;

n = 0xFF << 4;

n = n ^ 0x80;

NIOS Shift Instructions

sll rC, rA, rB rC ← rA << (rB[4..0])

slli rC, rA, sh rC ← rA << (sh)

---

roli

srai

srli

slli

ror

rol

sra

srl

sll

Register Immediate

right

left

right

right

left

dir

rotate

rotate

arithmetic

logical

logical

shift

barrel shifter

barrel shifter

signed / 2

C >> ; unsigned/2

C << ; *2

note

Immediate

76

Page 39: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

39

77

NIOS and C

1. ORI r4, r2, 0x30

2. ANDI r4, r2, 0x0F

3. XORI r4, r2, 0b10000000

4. NOR r4, r2, r0 /* r4←not r2 */

5. SLLI r4, r2, 2

6. SRLI r4, r2, 3

7. SRAI r4, r2, 4

8. Rotations ROR and ROL has no direct C equivalents.

1. x4 = x2 | 0x30 ;

2. x4 = x2 & 0x0F ;

3. x4 = x2 ^ 0b10000000;

4. x4 = ~x2

5. x4 = x2<<2;

6. x4 = (unsigned int) x2>>3

7. x4 = (signed int) x2>>3;

int x4, x2=-10;

Example - demo2.s

/*.include "nios_macros.s"*/

.equ LEDR_BASE, 0x10000000

.equ SW_BASE, 0x10000040

.text/* executable code follows */

.global_start

_start:

/* initialize base addresses of parallel ports */

movia r10, SW_BASE/* SW slider switch base address */

movia r12, LEDR_BASE/* red LED base address */

LOOP:

ldwio r8, 0(r10)

slli r8, r8, 1 stwio r8, 0(r12)

br LOOP

.end

78

Page 40: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

40

79

volatile keyword

int main(void)

{ // no assumption about value

volatile int x4, x2= -10;

x4 = x2<<2;

x4 = (unsigned int) x2>>3;

x4 = (signed int) x2>>3;

}

int main(void)

{ int x4, x2= -10; //0xfffffff8

x4 = x2<<2;

x4 = (unsigned int) x2>>3;

// x4 = 0x1ffffffe

x4 = (signed int) x2>>3;

// x4=0xfffffffe

}

volatile

winged animal

from plural

of Latin volatilis

= winged

optimized

code

79

Volatile

When the keyword volatile is specified for a variable,

it gives a hint to the compiler not to do certain

optimizations as this variable can change from other

parts of the program unexpectedly.

What is meant here, is that the compiler should not

reuse the value already loaded in a register, but

access the memory again as the value in register is

not guaranteed to be the same as the value stored in

memory.

All addresses of IO devices should be volatile.

[Source: C-reference]

80

Page 41: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

41

81

Pointer Operators

& (address operator)

Returns the address of its operand

Example int y = 5;

int *yPtr;

yPtr = &y; // yPtr gets address of y

yPtr “points to” y

yPtr

y

5

yptr

500000 600000

y

600000 5

address of y

is value of

yptr

82

Pointer Operators

& (address operator)

Returns the address of its operand

* dereference address

Get operand stored in address location

* and & are inverses

(though not always applicable)

Cancel each other out

*&myVar == myVar

and

&*yPtr == yPtr

Page 42: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

42

83

Size of Pointer in C-kod

int * ptri; char * ptrc; double * ptrd;

-

+

ptri

ptri+1

ptrc

ptrc+1

ptrd

ptrd+1

*ptrx ≡ ptrx[0]

*(ptrx+1) ≡ ptrx[1]

*(ptrx+n) ≡ ptrx[n]

*(ptrx-n) ≡ ptrx[-n]

nr1 = sizeof (double);

nr2 = sizeof (double*);

nr1 != nr2

84

Edge Detector – Full Source Code .equ IOBASE, 0x10000000

.equ LEDG, 0x10

.equ SW, 0x40

.text

.global _start

_start:

movia r8, IOBASE

mov r18, zero

LOOP:

ldwio r15, SW(r8)

nor r9, zero, r17 /* r9:=not r17*/

and r9, r9, r16

and r9, r9, r15

cmpne r9, r9, zero

xor r18, r18, r9

stwio r18, LEDR(r8)

mov r17, r16

mov r16, r15

br LOOP

.end

Memory Address

= C pointer

type * pointer;

Dereferencing

Pointer

* (pointer+ix)

≡ pointer[ix]

Page 43: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

43

85

Char [ ]

C-kod:

char text[] = ”Hello World!”;

/* text = &text[0] */

Assembler:

.data

text: .asciz ”Hello World!”

’l’

’l’

’H’

’a’

’W’

’o’

’o’

’ ’

’d’

’r’

’l’

’!’

text:

text[4]

0

1

2

3

4

5

6

7

8

9

10

11

12

text[7]

0x00

86

int [ ]

C-kod:

int vect[] = {1, 10, 0xA};

/* vect = &(vect[0]) */

Assembler:

.data

.align 2

vect: .word 1, 10, 0xA

#little endian byte ordering

0x00

0x00

0x01

0x00

0x00

0x00

0x0A

0x00

0x0A

0x00

0x00

vect:

*(vect+1) ≡ vect[1]

0

1

2

3

4

5

6

7

8

9

10

0x00

*(vect+2) ≡ vect[2]

Page 44: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

44

87

Pointer and const

int x;

int * lpio = 0x10000000;

*lpio = 1; x=*lpio; lpio++;

const int * lpCio = 0x10000000;

*lpCio = 1; x=*lpCio; lpCio++;

int * const lpioC = 0x10000000;

*lpioC = 1; x=*lpioC; lpioC++;

const int * const lpCioC = 0x10000000;

*lpCioC = 1; x=*lpCioC; lpCioC++;

VGARectangle

DE2 Computer Extender

Page 45: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

45

C-program

//#define DEBUG volatile int * red_LED_ptr = (int *) 0x10000000;

volatile int * SW_switch_ptr = (int *) 0x10000040;

void WrData(int address, int data); // WrData to extender

int Count(int * counter, int * updown, int max); // increment/decrement counter on updown bits

int main(void) {

int x0=0, y0=0, xwidth=1, yheight=1;WrData(0, x0);

WrData(1, y0); WrData(2, xwidth); WrData(3, yheight); // initilize extender

while(1) {

int SW_value = *(SW_switch_ptr); // read the SW slider switch values

if(Count(&x0, &SW_value, 640)) WrData(0, x0);

if(Count(&y0, &SW_value, 480)) WrData(1, y0);

if(Count(&xwidth, &SW_value, 640)) WrData(2, xwidth);

if(Count(&yheight, &SW_value, 480)) WrData(3, yheight);

int delay_count = 500000; while(delay_count) --delay_count; // delay loop

}

}

89

Functions // Write on Address of extender data

void WrData(int address, int data) {

int send = (data & 0x3FF) | ((address&3)<<14); // address 15. and 14. bit + data

*red_LED_ptr=send; *red_LED_ptr=(send | 0x20000); // toggle 17. bit

int delay=10; while(delay>0) delay--;

*red_LED_ptr=send; // reset toggle

#ifdef DEBUG

printf("%d:%d\t",address, data);

#endif

}

// increment/decrement counter on updown bits

int Count(int * counter, int * updown, int max)

{ int ud=*updown, cntr=*counter, cntr0=cntr;

if((ud & 1) != 0 && cntr>0) cntr--;

if((ud & 2) != 0 && cntr<max) cntr++;

*updown=ud>>2; *counter=cntr;

return cntr!=cntr0; // move to next value

}

90

Page 46: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

46

Edge Detection in C

Optimal / Non-optimal code

This parts demonstrates links between

assembler and C program

Page 47: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

47

NIOS: Common Register Usage in C Normal usage Name Reg

0x0000_0000 zero r0

Assembler Temporary at r1

Register Arguments (Fourth 32 bits) r7

Register Arguments (Third 32 bits) r6

Register Arguments (Second 32 bits) r5

Register Arguments (First 32 bits) r4

Return Value (most significant 32 bits) r3

Return Value (least-significant 32 bits) r2

r14

r15

r14

r12

r11

r10

General-Purpose Registers r9

Caller-Saved r8

r23

r22

r21

r20

r19

r18

General-Purpose Registers

r17

Callee-Saved r16

Normal usage Name Reg

Break Temporary bt r25

Exception Temporary et r24

Frame Pointer fp r28

Stack Pointer sp r27

Global Pointer gp r26

Break Return Address ba r30

Return Address ra r31

Exception Return Address

ea r29

93

94

VHDL 32 bit edge detector

library ieee; use ieee.std_logic_1164.all;

entity re32d2 is port(data : in std_logic_vector(31 downto 0);

clk : in std_logic; q : out std_logic);

end entity;

architecture rtl of re32d2 is

signal r15, r16, r17: std_logic_vector(31 downto 0); --the shift register

signal r18: std_logic:='0'; --output flip-flop

begin

process (clk)

begin

if (rising_edge(clk)) then

if((r15 and r16 and (not r17))/=X"00000000") then r18<=not r18;

end if;

r17<=r16; r16<=r15; r15<=data; --shift

end if;

end process;

q<=r18; --write output

end rtl;

Page 48: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

48

NIOS SW Edge Detector

1. ldwio r15,SW(r8)

2. nor r9, zero, r17

3. and r9, r9, r16

4. and r9, r9, r15 9. mov r16, r15

8. mov r17, r16

7. stwio r18, LEDR(r8)

6. xor r18, r18, r9

5. cmpne r9, r9, zero

r9

VCC sw[17..0] INPUT

VCC CLOCK_50 INPUT

LEDG[0] OUTPUT

DFF data[31..0]

clock

aclr

q[31..0]

lpm_tff0

inst3

DFF data[31..0]

clock aclr

q[31..0]

lpm_tff0

inst4

DFF data[31..0]

clock

aclr

q[31..0]

lpm_tff0

inst5

NOT

inst10

32

inst7

CLRN

D PRN

Q

DFF

inst6

VC

C

VC

C

Even_Divisor 5000 Unsigned Integer

Parameter Value Type

CLK q

FreqDivByEven

inst15

32

32

32

32

lpm

_a

nd

0

inst1

4

GN

D

GND

q[31..0]

data[31..18]

data[31..0]

data[17..0]

F=10kHz

XOR

inst1

lpm_or32to1

r16 r17

r18

r15 1

2

3 4

5

6

7

8 9

r9

95

96

Low-level C-code ~ Assembler

const int IXLEDG=0x10/sizeof(int);

const int IXSW =0x40/sizeof(int);

int main(void)

{

int reg17, reg16, reg15;

volatile int * const IOBASE= 0x10000000;

int reg18=0;

while(1)

{ reg15 = IOBASE[IXSW];

if(reg15 & reg16 & ~reg17)

reg18= reg18^1;

// store result

IOBASE[IXLEDG] = reg18;

reg17=reg16; // shift

reg16=reg15;

}

}

.equ IOBASE, 0x10000000

.equ LEDG, 0x10

.equ SW, 0x40

.text

.global _start

_start:

movia r8, IOBASE

mov r18, zero

LOOP:

ldwio r15, SW(r8)

nor r9, zero, r17 /* r9:=notr17*/

and r9, r9, r16

and r9, r9, r15

cmpne r9, r9, zero

xor r18, r18, r9

stwio r18, LEDR(r8)

mov r17, r16

mov r16, r15

br LOOP

.end

It corresponds

to our

assembler

code

96

Page 49: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2014pr11_NIOS.pdf · 8 bit µ-processor 8008 8080 8085 Z80 ... Fetch Instruction Decode Opcode

4.12.2014

49

97

Higher level C-code

int main(void)

{ volatile int * lpLEDG = (int *) 0x10000010; // red LED address

volatile int * lpSW= (int *) 0x10000040; // SW switch address

int SW_mem[3], SW_value, ledVal=0;

while(1)

{

SW_value = lpSW[0]; // *lpSW read the SW slider switch values

if(SW_mem[0] & SW_mem[1] & ~SW_mem[2])

ledVal= ledVal^1; // light up the green LEDs

lpLEDG[0] = ledVal;

int i; for(i=2; i>0; i--) SW_mem[i]=SW_mem[i-1];

SW_mem[0]=SW_value; // *SW_mem=SW_value

}

}

High level C-code is compiled into ~28 NIOS

instructions, the low-level into 15 NIOS instructions


Top Related