4.12.2014
1
Computer System Structures
cz:Struktury počítačových systémů
Lecturer: Richard Šusta
ČVUT-FEL in Prague, CR – subject A0B35SPS
Version: 1.0
Microprocessors
2
4.12.2014
2
Definition: Microprocessor
A Microprocessor is a Single VLSI Chip that has a CPU and may also have some Other Units (for example, caches, floating point processing arithmetic unit, pipelining, and super-scaling units)
Source Microprocessor Family
Motorola 68HCxxx
Intel 80x86 & i860
Sun SPARC
IBM PowerPC
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 3
Existovalo a dodnes existuje mnoho typů od různých výrobců
http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum_contents/Microprocessor_Evolution_Poster.jpg
4.12.2014
3
From 8 bit to 32 bit CISC processors
8 bit µ-processor 8008 8080 8085 Z80
Year 1972 1974 1976 1976
Instructions 66 111 113 158
MIPS 0.06 0.29 0.37 0.58
Minimum system (IO circuits) 20-40 7 3 3
16 and 32 bit
µ-processor 8086/88 80286 80386 80486 Pentium
64bit example: Intel
I5-2500K Quad core
Year 1978/79 1982 1985-91 1989-94 1993-98 2011
Instructions processor
+ numeric cooprocessor 133 ~175 ~195 ~200 ~225 ~500
MIPS 0.33-0.75 0.9-2.6 2.5-11 16-70 100-300 ~80000
Memory Physical / Virtual 1 MB 16 MB/1 GB 4 GB / 64TB 32 GB / 256 TB
Definition: Microcontroller
A microcontroller is a single-chip VLSI unit which, though having limited computational capabilities possesses enhanced input-output capabilities and a number of on-chip functional units
Source Microcontroller Family
Motorola 68HC11xx, HC12xx, HC16xx
Intel 80x86, 8051, 80251
Microchip PIC 16F84, 16F876, PIC18
ARM ARM9, ARM7
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 6
4.12.2014
4
Microcontroller
Microcontroller
(uC)
Outp
ut in
terfaces
Input in
tefaces
In device1
In device2
In device3
Out device1
Out device2
Auxiliary
units
7
Example: “Original” 8051 Microcontroller
Oscillator
and timing
4096 Bytes
Program
Memory
128 Bytes
Data
Memory
Two 16 Bit
Timer/Event
Counters
8051
CPU
64 K Byte Bus
Expansion
Control
Programmable
I/O
Programmable
Serial Port Full
Duplex UART
Synchronous Shifter
Internal data bus
External interrupts
subsystem interrupts
Control Parallel ports
Address Data Bus
I/O pins Serial Input
Serial Output
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 8
4.12.2014
5
SPS 9
Hardcore/Softcore Processors
Hard-core Processors
A Fabricated Integrated Circuit that may or
may not be Embedded into Additional Logic
Soft-core Processors
A Processor Described in a HDL that is
implemented in Reconfigurable Logic (ie, in
an FPGA), e.g NIOS, MicroBlaze, OpenRISC
SPS 10
Soft-core Processors Logic Elements
192
200
204
390
510
1170
1324
1800
1928
1984
2536
2600
3500
5000
6000
37000
60000
100 1000 10000 100000
Xilinx PicoBlaze
LatticeMico8
PacoBlaze
Nios II/e
DSPuva16
Nios II/s
Xilinx MicroBlaze
Nios II/f
OpenFire
LatticeMico32
aeMB
ARM Cortex-M1
LEON3
LEON2
OpenRISC 1200
S1 Core
Ultra SPARC S1 Core/f
Su
n M
icro
syste
ms
Op
en
so
urc
e
4.12.2014
6
Copyright © 2005 Altera Corporation
Nios II: RISC soft-core processor
Copyright © 2005 Altera Corporation
Nios II Versions
Nios II Processor Comes In Three ISA (Instruction Set Architecture) Compatible Versions
Software Code is Binary Compatible
No Changes Required When CPU is Changed
FAST: Optimized for Speed
STANDARD: Balanced for Speed and
Size
ECONOMY: Optimized for Size
12
4.12.2014
7
Copyright © 2005 Altera Corporation
Nios II: Hard Numbers
Nios II/ f Nios II /s Nios II /e
Stratix II 200 DMIPS @
175MHz
1180 LEs
90 DMIPS @
175MHz
800 LEs
28 DMIPS @
190MHz
400 LEs
Cyclone 100 DMIPS @
125MHz
1800 LEs
62 DMIPS @
125MHz
1200 LEs
20 DMIPS @
140MHz
550 LEs
DMIPS = Dhrystone benchmark MIPS (million instructions per second)
Dhrystone is without float point operations
(Note: Whetstone [cz brousek] has float points)
Comparing with Pentium (1996) 224 DMIPS @ 200MHz
cca 196 DMIPS@175MHz - cca 140 DMIPS@125 MHz
13
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Processor Core Variations Nios II /f
Fast Nios II /s Standard
Nios II /e Economy
Pipeline 6 Stage 5 Stage None
H/W Multiplier &
Barrel Shifter 1 Cycle 3 Cycle
Emulated
In Software
Branch Prediction Dynamic Static None
Instruction Cache Configurable Configurable None
Data Cache Configurable None None
Logic Requirements
(Typical LEs)
1800 w/o MMU
3200 w/ MMU 1200 600
Custom
Instructions Up to 256
(MMU - Memory Management Unit)
14
4.12.2014
8
SPS 15 15
Mutlicycle CPU
Instr.
decode/reg.
fetch/branch
addr.
ALU
operation
Read
memory
data
Instr.
fetch/
adv. PC
load
store
Reg Branch
Jump
Start
State 0 1
2 3
4 5
6
7 8 9
Compute
memory
addr.
Write
jump addr.
to PC
Write
register
Write
memory
data
Write
register
Write PC
on branch
condition
SPS 16
Different Execution Time of Instructions
instruction
memory
data
memory registers registers ALU
P C
Compute Not used
Not used
Not used
P C
Jump Not used
Not used
Not used
Not used
P C
Branch Not used
P C
Branch Not used
Not used
Not used
P C
Load
Not
P C
Store Not used
Fastest
Slowest
4.12.2014
9
SPS 17
Single-cycle versus multicycle instruction execution.
Clock
Clock
Instr 2 Instr 1 Instr 3 Instr 4
3 cycles 3 cycles 4 cycles 5 cycles
Time saved
Instr 1 Instr 4 Instr 3 Instr 2
Time needed
Time needed
Time allotted
Time allotted
SPS 18
Internal Registers in CPUs
Processor
AC MAR
ALU
Control
Unit
MDR
PC IR
Memory Input/Output
Data Bus
Address Bus
PC Program counter
AC Accumulator
Internal registers
IR Instruction register
MAR Memory address register
MDR Memory data register
4.12.2014
10
SPS 19
Cycles of Instructions
Fetch
Instruction
Decode
Opcode
Execute
Operation
Fetch
Operands
Store
Result IR MAR MDR
SPS 20
Pipelining
Řízení procesoru s pipelining (zřetězeným)
zpracováním instrukcí není jednoduché.
Autoři procesoru MIPS, podobného NIOSu, uvádějí
ve své knize (Paterson, D., Henessy, J.: Computer
Organization and Design, Elsevier), že jimi
navržená jednotka pro pipelining úspěšně otestována
nejméně 100 lidmi na 18 různých pracovištích
obsahovala chybu ztracenou v několika tisících
řádkách HDL kódu, na kterou se přišlo až delším
používání. Více bude v předmětu A0B36APO
Architektura počítačů
4.12.2014
11
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Minimum Nios II Processor
Program Controller
& Address
Generation
clock
reset
Status & Control
Registers
Instruction Master Port
Data Master Port
General Purpose Registers
Nios II Processor Core
= Configurable
= Fixed
Arithmetic Logic Unit
21
Outside NIOS
Image source: Tron
4.12.2014
12
23
Simple NIOS System on DE2
32-Bit
Nios
Processor
Switch PIO
LED PIO
7-Segment
LED PIO
PIO-32
User-Defined
Interface UART Timer
Address (32)
Read
Write
Data In (32)
Data Out (32)
Nios Processor
Avalo
n B
us
Copyright © 2005 Altera Corporation
Nios II/e on Basic Computer for DE2
24
4.12.2014
13
NIOS DE2 Basic Computer
Start-address End-address Element Size
0x00000000 0x007FFFFF SDRAM 8 MB Ram
--- 120 MB
0x08000000 0x0807FFFF ROM (SRAM) 512 kB read-only
15,5 MB
0x09000000 0x09001FFF On-chip-RAM 8 kB
111,92 MB
0x10000000 0x10000003 LEDR 4 bytes-write-only
0x10000010 0x10000013 LEDG 4 bytes-write-only
0x10000020 0x10000023 HEX3_HEX0 4 bytes-write-only
0x10000030 0x10000033 HEX7_HEX4 4 bytes-write-only
0x10000040 0x10000043 SW 4 bytes-read-only
0x10000050 0x1000005F KEY_BASE 4 bytes data
+12 bytes control
.. other I/O
128 MB
16 MB
112 MB
25
6 M
B
25
Copyright © 2005 Altera Corporation
Memory Map of Basic Computer SDRAM - synchronous dynamic RAM, i.e. synchronized
with the system bus, on the DE2 board
- 1M x 16 bits x 4 banks
- addresses 0x00000000 to 0x007FFFFF.
SRAM - static RAM (SRAM) chip, on the DE2 board
- 256K x 16 bits
- addresses space 0x08000000 to 0x0807FFFF.
On-Chip Memory
8-Kbyte memory inside the Cyclone II FPGA chip
- 2K x 32 bits
- addresses 0x09000000 to 0x09001FFF.
26
4.12.2014
14
Copyright © 2005 Altera Corporation
Nios II/s: DE2 Media Computer
27
28
NIOS DE2 Media Computer
Start-address End-address Element Size
0x00000000 0x007FFFFF SDRAM 8 MB synchron. DRAM
--- 120 MB
0x08000000 0x0807FFFF SRAM 512 kB VGA pixel buffer
15,5 MB
0x09000000 0x09001FFF On-chip-RAM 8 kB VGA char. buffer
111,92 MB
0x10000000 0x10000003 LEDR 4 bytes-write-only
0x10000010 0x10000013 LEDG 4 bytes-write-only
0x10000020 0x10000023 HEX3_HEX0 4 bytes-write-only
0x10000030 0x10000033 HEX7_HEX4 4 bytes-write-only
0x10000040 0x10000043 SW 4 bytes-read-only
0x10000050 0x1000005F KEY_BASE 4 bytes data
+12 bytes control
.. other I/O
128 MB
16 MB
112 MB
25
6 M
B
28
4.12.2014
15
I/O DE2 Basic & Media Computer
0x10000000 LEDR_BASE
0x10000010 LEDG_BASE,
0x10000020 HEX3_HEX0_BASE
0x10000030 HEX7_HEX4_BASE
0x10000040 SW_BASE
0x10000050 KEY_BASE
...others I/O...
Literatura on DVD: DE2_Media_Computer.pdf 29
Switch and Push Button
SW_BASE
KEY_BASE
30
4.12.2014
16
LED
LEDR_BASE
LEDG_BASE
31
HEX
HEX3_HEX0_BASE
HEX7_HEX4_BASE
32
4.12.2014
17
Wiki: Memory Mapped IO Memory-mapped I/O (MMIO) and port I/O (also called isolated I/O or port-mapped
I/O abbreviated PMIO) are two complementary methods of performing input/output between the CPU and peripheral devices in a computer. An alternative approach, not discussed here, is using dedicated I/O processors — commonly known as channels on mainframe computers — that execute their own instructions.
Memory-mapped I/O (not to be confused with memory-mapped file I/O) uses the same address bus to address both memory and I/O devices ‒ the memory and registers of the I/O devices are mapped to (associated with) address values. So when an address is accessed by the CPU, it may refer to a portion of physical RAM, but it can also refer to memory of the I/O device. Thus, the CPU instructions used to access the memory can also be used for accessing devices. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. To accommodate the I/O devices, areas of the addresses used by the CPU must be reserved for I/O and must not be available for normal physical memory. The reservation might be temporary — the Commodore 64 could bank switch between its I/O devices and regular memory — or permanent.
Port-mapped I/O often uses a special class of CPU instructions specifically for performing I/O. This is found on Intel microprocessors, with the IN and OUT instructions. These instructions can read and write one to four bytes (outb, outw, outl) to an I/O device. I/O devices have a separate address space from general memory, either accomplished by an extra "I/O" pin on the CPU's physical interface, or an entire bus dedicated to I/O. Because the address space for I/O is isolated from that for main memory, this is sometimes referred to as isolated I/O.
[ http://en.wikipedia.org/wiki/Memory-mapped_I/O ]
33
Inside NIOS
FPGA
Avalo
n S
wit
ch
Fab
ric
UART
GPIO
Timer
SPI
SDRAM
Controller
On-Chip
ROM
On-Chip
RAM
Nios II
CPU
Debug Cac
he
4.12.2014
18
NIOS Simplified System
.text
.data
35
NIOS: Common Register Usage
Normal usage Name Reg
0x0000_0000 zero r0
Assembler Temporary at r1
r7
r6
r5
r4
r3
r2
r14
r15
r14
r12
r11
r10
r9
r8
r23
r22
r21
r20
r19
r18
r17
r16
Normal usage Name Reg
Break Temporary bt r25
Exception Temporary et r24
Frame Pointer fp r28
Stack Pointer sp r27
Global Pointer gp r26
Break Return Address ba r30
Return Address ra r31
Exception Return Address ea r29
36
4.12.2014
19
NIOS Assembler
is GNU assembler http://tigcc.ticalc.org/doc/gnuasm.html
http://sources.redhat.com/binutils/docs-2.12/as.info/
by Altera monitor – lieratura on DVD • Doporuceno_Altera_Monitor_Program_Tutorial.pdf
• n2cpu_nii5v1.pdf – instruction set reference
37
General Processor Instruction Formats
Three-Address Instructions
ADD o1, o2, o3 o1 ← o2 + o3
Two-Address Instructions
MOV o1, o2 o1 ← o2
One-Address Instructions
NEXTPC o1 o1 ← PC + 4
Zero-Address Instructions
NOP r1 ← r1 add r0 (NIOS)
R-Type - operands oi are registers only
I-Type - instruction contains an immediate value
C-Type - control instruction
38
4.12.2014
20
NIOS Addressing Modes
(b) Immediate addressing Instruction contains the
operand
Op -20 movi r2, -20
(a) Register direct addressing Register contains the operand
Op R1 mov R2,R1
R2 R1
Operand R2
(c) Displacement (or offset)
addressing Address of operand =
register + constant
stw R2, byte_offset(R1)
stw R2, 100(R1) : R2 → M[R1+100]
Operand
Memory M
Op R2
Address R1
100
Op +
39
RISC vs CISC - Pentium Addressing Modes
[Source: Intel co.] LA=linear address ~ EA
Only for self-study
40
4.12.2014
21
Load/Store
ldw rB, im-const(rA) : rB ← MEM[rA + im-const]
ldw (load 32-bit word from memory)
stw (store 32-bit word into memory)
stw rB, im-const (rA) : rB → MEM[rA + im-const]
41
Example - demo1.s /*.include "nios_macros.s"*/
.equ LEDR_BASE, 0x10000000
.equ SW_BASE, 0x10000040
.text/* executable code follows */
.global_start
_start:
/* initialize base addresses of parallel ports */
movia r10, SW_BASE/* SW slider switch base address */
movia r12, LEDR_BASE/* red LED base address */
LOOP:
ldwio r8, 0(r10)
stwio r8, 0(r12)
br LOOP
.end
42
Pseudo-instruction
movia r12, 0x10000040
=>
orhi r12, r0, 0x1000
addi r12, r0, 0x0040
4.12.2014
22
Some GNU assembler directives
.include insert another file
.text this directive tells the assembler to place the following statements at the end of the code section
.global this directive makes the symbol visible to the program loading instructions into memory.
.data this directive tells the assembler to place the following statements at the end of the data section.
.equ this directive set the value of a symbol.
.word this directive is used to set the content of memory locations to the specified value.
.end Marks the end of current assembly program.
43
Data Types in Instructions
NIOS II registers hold 32-bit (4-byte) words.
Byte = 8 bits - example: ldb / ldbu + ldbio / ldbuio
- load byte extend signed/unsigned
Halfword = 2 bytes example: - ldh / ldhu + ldhio / ldhuio
- load halfword extend signed/unsigned
Doubleword = 8 bytes - user defined instructions only
Other common data sizes include byte and halfword.
Word = 4 bytes - example: - ldw + ldwio - load word
u-unsigned extension io - bypass cashe 44
4.12.2014
23
Unsigned/Signed Extension
• • • X
X • • • • • •
• • •
m
m k
signed
• • • X
Xu' • • • • • •
• • •
m k
0
unsigned
45
ALU Instructions
xori
ori
andi
subi
addi
nor
xor
or
and
sub
add
R-format
xorhi XOR
NOR
orhi OR
andhi AND
subtract
add
I-format operation
pseudo-instruction addi rB ← rA + se(number)
Logic instructions AND, OR, XOR, NOR
do not use se = sign-extension
muli mul multiply mulxuu,mulxsu
mulxss High 32bits
of 64bits High 32bits of
64bit result
Lower 32bits
of 64bit result
46
4.12.2014
24
Increasing Performance
Image source: Tron
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
3 Ways to Increase Performance
Custom Instructions
FPGA
Nios II
Custom Instructions
Accelerate CPU
processing
performance with
application-
specific hardware
Add external
co-processing
hardware to
accelerate data
functions
FPGA
Nios II
Hardware Accelerator
Hardware Accelerators
External CPU or
DSP
Multi-Processor System
PC
Ie
FPGA
Add more
processors
(internal &/or
external) to increase
processing power
Nios II
Nios II
Nios II
48
4.12.2014
25
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Accelerating Software in FPGA
Accelerator 0
500
1,000
1,500
2,000
2,500
Itera
tio
ns
/Seco
nd
Software Only
Custom Instruction
27 Times
Faster
530 Times
Faster
Program Memory
Nios II
Data Memory
Arbiter
Data Memory
Arbiter
DM
A
Accelerator
DM
A
Control
Custom Instruction • Accelerator running 64Kb CRC at 100 MHz
(CRC can be easily calculated on a Linear
Feedback Shift Register, see the next lecture)
C - Out << >>
&
Custom Logic
+ -
A
B
Nios II Embedded Processor
49
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Altera C2H – the SW Accelerator Solution
Generates and integrates a custom hardware
accelerator from an ANSI C function.
Program Memory
CPU
Data Memory
Arbiter
Data Memory
Arbiter
C2H
Accelerator
50
4.12.2014
26
Expander created from modified VGAFrame
51
VCCCLOCK_27 INPUT
VCCSW[17..0] INPUT
TD_RESETOUTPUT
VGA_R[9..0]OUTPUT
VGA_G[9..0]OUTPUT
VGA_B[9..0]OUTPUT
VGA_CLKOUTPUT
VGA_BLANKOUTPUT
VGA_HSOUTPUT
VGA_VSOUTPUT
VGA_SYNCOUTPUT
LEDR[17]OUTPUT
LEDR[16]OUTPUT
LEDR[15..12]OUTPUT
LEDR[9..0]OUTPUT
LEDR[11..10]OUTPUT
VC
C
Cyclone II
inclk0 frequency: 27.000 MHz
Operation Mode: Normal
Clk Ratio Ph (dg) DC (%)
c0 1007/1080 0.00 50.00
inclk0 c0
locked
altpll0
inst
CLK_25MHz175
ACLRN
yrow[9..0]
xcolumn[9..0]
VGA_CLK
VGA_BLANK
VGA_HS
VGA_VS
VGA_SYNC
VGAsynchro
inst4
yrow[9..0]
xcolumn[9..0]
VGA_CLK_in
VGA_BLANK_in
VGA_HS_in
VGA_VS_in
VGA_SYNC_in
ACLRN_in
datain[17..0]
VGA_R[9..0]
VGA_G[9..0]
VGA_B[9..0]
VGA_CLK
VGA_BLANK
VGA_HS
VGA_VS
VGA_SYNC
tb_yrow[9..0]
tb_xcolumn[9..0]
VGANiosRectangle
inst1
VC
C
GND
GND
VC
C
Freq25MHz175
datain(17) - enable write; datain(15 downto 12) - memory index
(0-x0, 1-y0, 2-xwidth, 3-yheight; 4 to 6 - RGB color of background, 8 to 10 - RGB color of the rectangle)
show bits of datain regions
VGANiosRectangle
52
yrow[9..0]
xcolumn[9..0]
VGA_CLK
ACLRN
datain[17..0]
VGA_R[9..0]
VGA_G[9..0]
VGA_B[9..0]
store:process(VGA_CLK, ACLRN)
frame:process(VGA_CLK)
subtype datain_t is unsigned(9 downto 0);
type memory_t is array(0 to 15) of datain_t;
signal memory : memory_t;;
x0,y0
xwidth
yh
eig
ht
0 1 2 3 4 5 6 8 9 10
x0 y0 xw yh R G B Rb Gb Bb
paramaters of picture
4.12.2014
27
Top entity of Basic Computer
53
DE2 Basic Computer Schema Detail
CLOCK_50
CLOCK_27
KEY[3..0]
SW[17..0]
UART_RXD LEDG[8..0]
LEDR[17..0]
HEX0[6..0]
HEX1[6..0]
HEX2[6..0]
HEX3[6..0]
HEX4[6..0]
HEX5[6..0]
HEX6[6..0]
HEX7[6..0]
SRAM_ADDR[17..0]
SRAM_CE_N
SRAM_WE_N
SRAM_OE_N
SRAM_UB_N
SRAM_LB_N
UART_TXD
DRAM_ADDR[11..0]
DRAM_BA_1
DRAM_BA_0
DRAM_CAS_N
DRAM_RAS_N
DRAM_CLK
DRAM_CKE
DRAM_CS_N
DRAM_WE_N
DRAM_UDQM
DRAM_LDQM
GPIO_0[35..0]
GPIO_1[35..0]
SRAM_DQ[15..0]
DRAM_DQ[15..0]
DE2_Basic_Computer
inst
VCCCLOCK_50 INPUT
VCCCLOCK_27 INPUT
VCCKEY[3..0] INPUT
VCCSW[17..0] INPUT
VCCUART_RXD INPUT LEDG[8..0]OUTPUT
LEDR[17..0]OUTPUT
HEX0[6..0]OUTPUT
HEX1[6..0]OUTPUT
HEX2[6..0]OUTPUT
HEX3[6..0]OUTPUT
HEX4[6..0]OUTPUT
HEX5[6..0]OUTPUT
HEX6[6..0]OUTPUT
HEX7[6..0]OUTPUT
SRAM_ADDR[17..0]OUTPUT
SRAM_CE_NOUTPUT
SRAM_WE_NOUTPUT
SRAM_OE_NOUTPUT
SRAM_UB_NOUTPUT
SRAM_LB_NOUTPUT
UART_TXDOUTPUT
DRAM_ADDR[11..0]OUTPUT
DRAM_BA_1OUTPUT
DRAM_BA_0OUTPUT
DRAM_CAS_NOUTPUT
DRAM_RAS_NOUTPUT
DRAM_CLKOUTPUT
DRAM_CKEOUTPUT
DRAM_CS_NOUTPUT
DRAM_WE_NOUTPUT
DRAM_UDQMOUTPUT
DRAM_LDQMOUTPUT
VCCGPIO_0[35..0]BIDIR
VCCGPIO_1[35..0]BIDIR
VCCSRAM_DQ[15..0]BIDIR
VCCDRAM_DQ[15..0]BIDIR
54
4.12.2014
28
Connected Accelerator
55
VCCCLOCK_50 INPUT
VCCCLOCK_27 INPUT
VCCKEY[3..0] INPUT
VCCSW[17..0] INPUT
VCCUART_RXD INPUT LEDG[8..0]OUTPUT
LEDR[17..0]OUTPUT
HEX0[6..0]OUTPUT
HEX1[6..0]OUTPUT
HEX2[6..0]OUTPUT
HEX3[6..0]OUTPUT
HEX4[6..0]OUTPUT
HEX5[6..0]OUTPUT
HEX6[6..0]OUTPUT
HEX7[6..0]OUTPUT
SRAM_ADDR[17..0]OUTPUT
SRAM_CE_NOUTPUT
SRAM_WE_NOUTPUT
SRAM_OE_NOUTPUT
SRAM_UB_NOUTPUT
SRAM_LB_NOUTPUT
UART_TXDOUTPUT
DRAM_ADDR[11..0]OUTPUT
DRAM_BA_1OUTPUT
DRAM_BA_0OUTPUT
DRAM_CAS_NOUTPUT
DRAM_RAS_NOUTPUT
DRAM_CLKOUTPUT
DRAM_CKEOUTPUT
DRAM_CS_NOUTPUT
DRAM_WE_NOUTPUT
DRAM_UDQMOUTPUT
DRAM_LDQMOUTPUT
TD_RESETOUTPUT
VGA_R[9..0]OUTPUT
VGA_G[9..0]OUTPUT
VGA_B[9..0]OUTPUT
VGA_CLKOUTPUT
VGA_BLANKOUTPUT
VGA_HSOUTPUT
VGA_VSOUTPUT
VGA_SYNCOUTPUT
VCCGPIO_0[35..0]BIDIR
VCCGPIO_1[35..0]BIDIR
VCCSRAM_DQ[15..0]BIDIR
VCCDRAM_DQ[15..0]BIDIR
CLOCK_50
CLOCK_27
KEY[3..0]
SW[17..0]
UART_RXD LEDG[8..0]
LEDR[17..0]
HEX0[6..0]
HEX1[6..0]
HEX2[6..0]
HEX3[6..0]
HEX4[6..0]
HEX5[6..0]
HEX6[6..0]
HEX7[6..0]
SRAM_ADDR[17..0]
SRAM_CE_N
SRAM_WE_N
SRAM_OE_N
SRAM_UB_N
SRAM_LB_N
UART_TXD
DRAM_ADDR[11..0]
DRAM_BA_1
DRAM_BA_0
DRAM_CAS_N
DRAM_RAS_N
DRAM_CLK
DRAM_CKE
DRAM_CS_N
DRAM_WE_N
DRAM_UDQM
DRAM_LDQM
GPIO_0[35..0]
GPIO_1[35..0]
SRAM_DQ[15..0]
DRAM_DQ[15..0]
DE2_Basic_Computer
inst
VC
C
Cyclone II
inclk0 frequency: 27.000 MHz
Operation Mode: Normal
Clk Ratio Ph (dg) DC (%)
c0 1007/1080 0.00 50.00
inclk0 c0
locked
altpll0
inst2
CLK_25MHz175
ACLRN
yrow[9..0]
xcolumn[9..0]
VGA_CLK
VGA_BLANK
VGA_HS
VGA_VS
VGA_SYNC
VGAsynchro
inst4
yrow[9..0]
xcolumn[9..0]
VGA_CLK_in
VGA_BLANK_in
VGA_HS_in
VGA_VS_in
VGA_SYNC_in
ACLRN_in
datain[17..0]
VGA_R[9..0]
VGA_G[9..0]
VGA_B[9..0]
VGA_CLK
VGA_BLANK
VGA_HS
VGA_VS
VGA_SYNC
tb_yrow[9..0]
tb_xcolumn[9..0]
VGANiosRectangle
inst1
Freq25MHz175
E(x)ternal
Influence Q: – How to transfer
signals depended on different clocks?
A: If you look at the picture you will see the
key to this solution: You just need second hands for automatic
shaking:-)
56 [Image: 9gag.com]
4.12.2014
29
57
Synchronize Signal with 2nd Clock
Signal-
clk1
source
Hand
Snake
communication channel
data
RDY new data are ready
Clk1 Clk2 2 unsynchronized clocks
Send without implicit hand-shaking
(cz: nepotvrzovaný přenos dat)
It is similar to UDP on networks
(UDP - User Datagram Protocol)
58
Handshaking
Clk1
data
Signal-
clk1
source
register
a signal synchronized
by the 1st clock
RDY
ACK
- acknowledge
read by hand-shaking
(cz: kladně potvrzovaný přenos) like TCP = Transmission Control Protocol
Han
dsh
ake
FS
M
clk=5kHz
SLOAD
Data
Clk2
4.12.2014
30
59
Handshake with stable data
Hand
shake
RDY
ACK
data
ACK 5
6
7
Signal-
clk1
source
Register
1
5 5
RDY is synchronized with clk1, but not with clk2
1
4 RDY
request
3
1 1
clk1
data
2
4
Clk1
request
clk2 SLOAD
FSM for Handshaking Moore
Machine ACK RDY
Clock1
S0
S1 RDY
Clk1-signal ='1'
Clk1-signal ='0' and ACK='0'
S2 ACK='1'
Clk1-signal='0'
ACK='0'
Clk1-signal ='1' or ACK='1'
4.12.2014
31
61
Handshaking for
unstable data
clk1
data
Signal-
clk1
source
Shift
register RDY
init1
ACK
- acknowledge H
an
dsh
ake
FS
M
clk2
SLOAD
62
Handshake with unstable data
Hand
shake RDY
ACK
init0 init1
ACK
clk1
5 6
7
Signal-
clk1
source
Shift
1
5 5
RDY is synchronized with clk1, but not with clk2
1
4 RDY
request
3
1 1
init1
clk1
init0
2
4
KEY request clk2
SLOAD
4.12.2014
32
SPS 63
rdy request
ack
data0
clock
aclrn data1
Handshake for unstable data (1/4)
signal stReg
signal
stNext Sta
teR
egis
ter
pro
cess
Nex
tSta
te
pro
cess
data1 <=dataReg;
signal
dataReg signal loadData
ARCHITECTURE rtl of handshake is
Input ports
of ENTITY
Output
ports of
ENTITY
ENTITY handshake is
SPS 64
Handshake for unstable data (2/4)
LIBRARY ieee; USE ieee.std_logic_1164.all;
ENTITY handshake is
PORT( ack, request : IN STD_LOGIC;
data0 : in std_logic_vector(15 downto 0);
clock, aclrn : IN STD_LOGIC;
rdy : OUT STD_LOGIC;
data1 : out std_logic_vector(15 downto 0) );
end handshake;
ARCHITECTURE rtl of handshake is
type state is (s0, s1, s2);
signal stReg, stNext : state;
signal dataReg:std_logic_vector(15 downto 0);
signal loadData:std_logic;
begin StateRegister: PROCESS(clock, aclrn) BEGIN...END PROCESS;
NextState: PROCESS(stReg, ack, request) BEGIN...END PROCESS;
data1<=dataReg;
end rtl;
L:Libraries
EP: entity-ports
AD: architecture
declarations
ACS: architecture concurrent
statements There are 3
concurrent statements:
StateRegister:process,
NextState:process, and
<=
4.12.2014
33
SPS 65
Handshake for unstable data (3/4)
StateRegister: PROCESS(clock, aclrn) begin
if (aclrn = '0') then stReg <= s0; dataReg<=(others=>'0'); elsif rising_edge(clock) then
stReg <= stNext; if loadData='1' then dataReg<=data0; end if; end if; end PROCESS;
PS: process
-synchronous part
PC: process-clock test
PH: process-header with sensitivity list
SPS 66
Handshake for unstable data (4/4)
NextState: PROCESS(stReg, ack, request)
begin
rdy<='0'; loadData <= '0' ;
case stReg is
when s0 => if request = '1' then stNext <= s1;loadData <= '1' ;
else stNext <= s0; end if;
when s1 => rdy<='1';
if ack = '1' then stNext <= s2;
else stNext <= s1; end if;
when s2 => if request = '0' and ack = '0' then stNext <= s0;
else stNext <= s2; end if;
when others => report "Undefined state";
end case;
end PROCESS;
PD:=
process-
dataflow
part
PH: process-header with
sensitivity list
4.12.2014
34
SPS 67
ack
clk
NextState:OUT0
request
reset
s0
s1
0
1
0
D Q
PRE
ENA
CLR
loadData
dataReg[15..0]
ack
clock
aclrn
rdy
data1[15..0]
stReg
NextState
data0[15..0]
request
RTL
s1 s2stReg~1
s0
SPS 68
RDY
Clk2
Asynchronous
clear
active-low
D
CLK
Q
CLRN
D
CLK
Q
CLRN
D
CLK
Q
CLRN
EN
DATA
synchronní
z clk2
Příjem signálu handshake DFFE-registry
DFF DFF
ACK
Data-
CLK1
SLOAD
Synchronizátor není nutný,
ale velmi doporučený
Obvod
potvrzení
4.12.2014
35
NIOS gcc
programming in low-level C language
69
70
optional
parts
optional
parts
Basic Steps of C Compiler
code.C
NIOS
processor Linker
NIOS gcc C-compiler passes 1 to M
C preprocessor modifies source code by substitutions
metacode
code.tmp
Compiler passes M to N: Optimization of metacode
relative object
module
Loader
system
libraries
string.h stdio.h code.h
70
4.12.2014
36
71
C primitive types
1) In many implementations, it is not a standard C datatype, but only common
custom for user's "#define" macro definitions, see next slides
Size Java C C alternative Range
1 boolean any integer, true if !=0 BOOL(1 0 to !=0
8 byte char(2 signed char –128 to +127
8 unsigned char BYTE(1 0 to 255
16 short int signed short –32768 to
+32767
16 unsigned short 0 to + 65535
32 int int signed int -2^31 to 2^31-1
32 unsigned int DWORD(1 0 to 2^32-1
64 long long long int -2^63 to 2^63-1
64 unsigned long LWORD(1 0 to 2^64-1
2) Default is signed but it is best to specify.
Sign Extension Example in C
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;
Decimal Hex Binaryx 15213 3B 6D 00111011 01101101
ix 15213 00 00 C4 92 00000000 00000000 00111011 01101101
y -15213 C4 93 11000100 10010011
iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011
72
4.12.2014
37
73
NIOS C #define / const
#define TENDEF 10 //substitution rule
read only access
const int TEN2 = TEN; /* Error: initializer element is not constant
Note: OK in C++, but C variables must not initialize constants * /
const ~ static final in Java
const int TEN=10;
const int TEN3 = TENDEF; // OK
/* after preprocessor run:
const int TEN3 = 10;
*/
74
Definition of BYTE and BOOL
// by substitution rule no ; and no type check
#define BYTE unsigned char
#define BOOL int
// by introducing new type, ending ; is required
typedef unsigned char BYTE;
typedef int BOOL;
C language has no strict type checking #define ~ typedef,
but typedef is usually better integrated into compiler.
4.12.2014
38
75
Bitwise Logic Instructions
AND
OR
XOR
left shift
right shift
NOT
&
|
^
<<
>>
~
n = n | 0xFF00;
Examples:
In C, operator << usually behaves as logical for unsigned
operand and as arithmetic shift for signed operands.
n = n & 0xF0;
n = n >> 4
n = ~0xFF;
n = 0xFF << 4;
n = n ^ 0x80;
NIOS Shift Instructions
sll rC, rA, rB rC ← rA << (rB[4..0])
slli rC, rA, sh rC ← rA << (sh)
---
roli
srai
srli
slli
ror
rol
sra
srl
sll
Register Immediate
right
left
right
right
left
dir
rotate
rotate
arithmetic
logical
logical
shift
barrel shifter
barrel shifter
signed / 2
C >> ; unsigned/2
C << ; *2
note
Immediate
76
4.12.2014
39
77
NIOS and C
1. ORI r4, r2, 0x30
2. ANDI r4, r2, 0x0F
3. XORI r4, r2, 0b10000000
4. NOR r4, r2, r0 /* r4←not r2 */
5. SLLI r4, r2, 2
6. SRLI r4, r2, 3
7. SRAI r4, r2, 4
8. Rotations ROR and ROL has no direct C equivalents.
1. x4 = x2 | 0x30 ;
2. x4 = x2 & 0x0F ;
3. x4 = x2 ^ 0b10000000;
4. x4 = ~x2
5. x4 = x2<<2;
6. x4 = (unsigned int) x2>>3
7. x4 = (signed int) x2>>3;
int x4, x2=-10;
Example - demo2.s
/*.include "nios_macros.s"*/
.equ LEDR_BASE, 0x10000000
.equ SW_BASE, 0x10000040
.text/* executable code follows */
.global_start
_start:
/* initialize base addresses of parallel ports */
movia r10, SW_BASE/* SW slider switch base address */
movia r12, LEDR_BASE/* red LED base address */
LOOP:
ldwio r8, 0(r10)
slli r8, r8, 1 stwio r8, 0(r12)
br LOOP
.end
78
4.12.2014
40
79
volatile keyword
int main(void)
{ // no assumption about value
volatile int x4, x2= -10;
x4 = x2<<2;
x4 = (unsigned int) x2>>3;
x4 = (signed int) x2>>3;
}
int main(void)
{ int x4, x2= -10; //0xfffffff8
x4 = x2<<2;
x4 = (unsigned int) x2>>3;
// x4 = 0x1ffffffe
x4 = (signed int) x2>>3;
// x4=0xfffffffe
}
volatile
winged animal
from plural
of Latin volatilis
= winged
optimized
code
79
Volatile
When the keyword volatile is specified for a variable,
it gives a hint to the compiler not to do certain
optimizations as this variable can change from other
parts of the program unexpectedly.
What is meant here, is that the compiler should not
reuse the value already loaded in a register, but
access the memory again as the value in register is
not guaranteed to be the same as the value stored in
memory.
All addresses of IO devices should be volatile.
[Source: C-reference]
80
4.12.2014
41
81
Pointer Operators
& (address operator)
Returns the address of its operand
Example int y = 5;
int *yPtr;
yPtr = &y; // yPtr gets address of y
yPtr “points to” y
yPtr
y
5
yptr
500000 600000
y
600000 5
address of y
is value of
yptr
82
Pointer Operators
& (address operator)
Returns the address of its operand
* dereference address
Get operand stored in address location
* and & are inverses
(though not always applicable)
Cancel each other out
*&myVar == myVar
and
&*yPtr == yPtr
4.12.2014
42
83
Size of Pointer in C-kod
int * ptri; char * ptrc; double * ptrd;
-
+
ptri
ptri+1
ptrc
ptrc+1
ptrd
ptrd+1
*ptrx ≡ ptrx[0]
*(ptrx+1) ≡ ptrx[1]
*(ptrx+n) ≡ ptrx[n]
*(ptrx-n) ≡ ptrx[-n]
nr1 = sizeof (double);
nr2 = sizeof (double*);
nr1 != nr2
84
Edge Detector – Full Source Code .equ IOBASE, 0x10000000
.equ LEDG, 0x10
.equ SW, 0x40
.text
.global _start
_start:
movia r8, IOBASE
mov r18, zero
LOOP:
ldwio r15, SW(r8)
nor r9, zero, r17 /* r9:=not r17*/
and r9, r9, r16
and r9, r9, r15
cmpne r9, r9, zero
xor r18, r18, r9
stwio r18, LEDR(r8)
mov r17, r16
mov r16, r15
br LOOP
.end
Memory Address
= C pointer
type * pointer;
Dereferencing
Pointer
* (pointer+ix)
≡ pointer[ix]
4.12.2014
43
85
Char [ ]
C-kod:
char text[] = ”Hello World!”;
/* text = &text[0] */
Assembler:
.data
text: .asciz ”Hello World!”
’l’
’l’
’H’
’a’
’W’
’o’
’o’
’ ’
’d’
’r’
’l’
’!’
text:
text[4]
0
1
2
3
4
5
6
7
8
9
10
11
12
text[7]
0x00
86
int [ ]
C-kod:
int vect[] = {1, 10, 0xA};
/* vect = &(vect[0]) */
Assembler:
.data
.align 2
vect: .word 1, 10, 0xA
#little endian byte ordering
0x00
0x00
0x01
0x00
0x00
0x00
0x0A
0x00
0x0A
0x00
0x00
vect:
*(vect+1) ≡ vect[1]
0
1
2
3
4
5
6
7
8
9
10
0x00
*(vect+2) ≡ vect[2]
4.12.2014
44
87
Pointer and const
int x;
int * lpio = 0x10000000;
*lpio = 1; x=*lpio; lpio++;
const int * lpCio = 0x10000000;
*lpCio = 1; x=*lpCio; lpCio++;
int * const lpioC = 0x10000000;
*lpioC = 1; x=*lpioC; lpioC++;
const int * const lpCioC = 0x10000000;
*lpCioC = 1; x=*lpCioC; lpCioC++;
VGARectangle
DE2 Computer Extender
4.12.2014
45
C-program
//#define DEBUG volatile int * red_LED_ptr = (int *) 0x10000000;
volatile int * SW_switch_ptr = (int *) 0x10000040;
void WrData(int address, int data); // WrData to extender
int Count(int * counter, int * updown, int max); // increment/decrement counter on updown bits
int main(void) {
int x0=0, y0=0, xwidth=1, yheight=1;WrData(0, x0);
WrData(1, y0); WrData(2, xwidth); WrData(3, yheight); // initilize extender
while(1) {
int SW_value = *(SW_switch_ptr); // read the SW slider switch values
if(Count(&x0, &SW_value, 640)) WrData(0, x0);
if(Count(&y0, &SW_value, 480)) WrData(1, y0);
if(Count(&xwidth, &SW_value, 640)) WrData(2, xwidth);
if(Count(&yheight, &SW_value, 480)) WrData(3, yheight);
int delay_count = 500000; while(delay_count) --delay_count; // delay loop
}
}
89
Functions // Write on Address of extender data
void WrData(int address, int data) {
int send = (data & 0x3FF) | ((address&3)<<14); // address 15. and 14. bit + data
*red_LED_ptr=send; *red_LED_ptr=(send | 0x20000); // toggle 17. bit
int delay=10; while(delay>0) delay--;
*red_LED_ptr=send; // reset toggle
#ifdef DEBUG
printf("%d:%d\t",address, data);
#endif
}
// increment/decrement counter on updown bits
int Count(int * counter, int * updown, int max)
{ int ud=*updown, cntr=*counter, cntr0=cntr;
if((ud & 1) != 0 && cntr>0) cntr--;
if((ud & 2) != 0 && cntr<max) cntr++;
*updown=ud>>2; *counter=cntr;
return cntr!=cntr0; // move to next value
}
90
4.12.2014
46
Edge Detection in C
Optimal / Non-optimal code
This parts demonstrates links between
assembler and C program
4.12.2014
47
NIOS: Common Register Usage in C Normal usage Name Reg
0x0000_0000 zero r0
Assembler Temporary at r1
Register Arguments (Fourth 32 bits) r7
Register Arguments (Third 32 bits) r6
Register Arguments (Second 32 bits) r5
Register Arguments (First 32 bits) r4
Return Value (most significant 32 bits) r3
Return Value (least-significant 32 bits) r2
r14
r15
r14
r12
r11
r10
General-Purpose Registers r9
Caller-Saved r8
r23
r22
r21
r20
r19
r18
General-Purpose Registers
r17
Callee-Saved r16
Normal usage Name Reg
Break Temporary bt r25
Exception Temporary et r24
Frame Pointer fp r28
Stack Pointer sp r27
Global Pointer gp r26
Break Return Address ba r30
Return Address ra r31
Exception Return Address
ea r29
93
94
VHDL 32 bit edge detector
library ieee; use ieee.std_logic_1164.all;
entity re32d2 is port(data : in std_logic_vector(31 downto 0);
clk : in std_logic; q : out std_logic);
end entity;
architecture rtl of re32d2 is
signal r15, r16, r17: std_logic_vector(31 downto 0); --the shift register
signal r18: std_logic:='0'; --output flip-flop
begin
process (clk)
begin
if (rising_edge(clk)) then
if((r15 and r16 and (not r17))/=X"00000000") then r18<=not r18;
end if;
r17<=r16; r16<=r15; r15<=data; --shift
end if;
end process;
q<=r18; --write output
end rtl;
4.12.2014
48
NIOS SW Edge Detector
1. ldwio r15,SW(r8)
2. nor r9, zero, r17
3. and r9, r9, r16
4. and r9, r9, r15 9. mov r16, r15
8. mov r17, r16
7. stwio r18, LEDR(r8)
6. xor r18, r18, r9
5. cmpne r9, r9, zero
r9
VCC sw[17..0] INPUT
VCC CLOCK_50 INPUT
LEDG[0] OUTPUT
DFF data[31..0]
clock
aclr
q[31..0]
lpm_tff0
inst3
DFF data[31..0]
clock aclr
q[31..0]
lpm_tff0
inst4
DFF data[31..0]
clock
aclr
q[31..0]
lpm_tff0
inst5
NOT
inst10
32
inst7
CLRN
D PRN
Q
DFF
inst6
VC
C
VC
C
Even_Divisor 5000 Unsigned Integer
Parameter Value Type
CLK q
FreqDivByEven
inst15
32
32
32
32
lpm
_a
nd
0
inst1
4
GN
D
GND
q[31..0]
data[31..18]
data[31..0]
data[17..0]
F=10kHz
XOR
inst1
lpm_or32to1
r16 r17
r18
r15 1
2
3 4
5
6
7
8 9
r9
95
96
Low-level C-code ~ Assembler
const int IXLEDG=0x10/sizeof(int);
const int IXSW =0x40/sizeof(int);
int main(void)
{
int reg17, reg16, reg15;
volatile int * const IOBASE= 0x10000000;
int reg18=0;
while(1)
{ reg15 = IOBASE[IXSW];
if(reg15 & reg16 & ~reg17)
reg18= reg18^1;
// store result
IOBASE[IXLEDG] = reg18;
reg17=reg16; // shift
reg16=reg15;
}
}
.equ IOBASE, 0x10000000
.equ LEDG, 0x10
.equ SW, 0x40
.text
.global _start
_start:
movia r8, IOBASE
mov r18, zero
LOOP:
ldwio r15, SW(r8)
nor r9, zero, r17 /* r9:=notr17*/
and r9, r9, r16
and r9, r9, r15
cmpne r9, r9, zero
xor r18, r18, r9
stwio r18, LEDR(r8)
mov r17, r16
mov r16, r15
br LOOP
.end
It corresponds
to our
assembler
code
96
4.12.2014
49
97
Higher level C-code
int main(void)
{ volatile int * lpLEDG = (int *) 0x10000010; // red LED address
volatile int * lpSW= (int *) 0x10000040; // SW switch address
int SW_mem[3], SW_value, ledVal=0;
while(1)
{
SW_value = lpSW[0]; // *lpSW read the SW slider switch values
if(SW_mem[0] & SW_mem[1] & ~SW_mem[2])
ledVal= ledVal^1; // light up the green LEDs
lpLEDG[0] = ledVal;
int i; for(i=2; i>0; i--) SW_mem[i]=SW_mem[i-1];
SW_mem[0]=SW_value; // *SW_mem=SW_value
}
}
High level C-code is compiled into ~28 NIOS
instructions, the low-level into 15 NIOS instructions