Computer Architecture Overview
ICS332: Operating Systems

Henri Casanova ([email protected])

Spring 2018

Outline: History, Von Neumann Model, Fetch-Decode-Execute Cycle, Speeding Things Up, Conclusion


1946 — ENIAC

Electronic Numerical Integrator And Computer, aka “Giant Brain”

First electronic general-purpose computer

Before that, “computers” were humans, who could use non-programmable mechanical and later electrical computation tools

Could be reprogrammed (Stored-Program Computer instead of Fixed-Program Computer)

Main sponsor: University of Pennsylvania / Ballistic Research Laboratory ($487k, equivalent to about $7M in 2016)

Designers: Mauchly and Eckert

First operators (i.e., programmers): The 6 “ENIAC Girls” (McNulty, Jennings, Snyder, Wescoff, Bilas, and Lichterman)






1946 — ENIAC (Features)

1,000 times faster than the (specialized) electro-mechanical equivalent

2,400 times faster than a (specialized) human being (30 seconds instead of 20 hours)

100 kHz / 5 kIPS (now: 4GHz / 5,000 MIPS)

1,000 bits of RAM (i.e., 0.12 KiB)

150 kW (now: 200W)

17,468 vacuum tubes (failure prone, power hungry)

8 × 3 × 100 ft; 27 metric tons (60,000 pounds)






1946 — ENIAC (Pictures)






Von Neumann

ENIAC design frozen in 1943; Eckert and Mauchly work on a new design: the EDVAC

1944: Von Neumann (1903-1957) joins Eckert and Mauchly, writes a memo formalizing their ideas

This became the Von Neumann Architecture Model:

A Central Processing Unit performs operations and controls the sequence of operations
A Memory Unit contains code and data
Some kind of Input and Output mechanisms (I/O)





Von Neumann Model

Amazingly, it is still possible to think of the computer this way at a conceptual level (model from ∼70 years ago!):

CPU ⇐⇒ Memory
⇕ I/O

Today a computer looks more like this: the CPU, Disk Controller, USB Controller, and Graphics Adapter are all connected to the Memory through a shared Memory Bus






Von Neumann Model: Origins

1847: Boolean algebra – Truth value (true / false), Boolean logic, Bit (binary digit)

1937: Shannon’s MS Thesis – Any logical, numerical relationship can be built using Boolean algebra

Therefore, any “information” can be represented in binary form, and therefore we can build computers that only understand binary

Building computers this way is technologically convenient:

0 Volt: False (0)
∼5 Volt: True (1)






The Von Neumann Architecture

CPU ⇐⇒ Memory
⇕ I/O






Memory Unit

Called Memory or RAM (Random Access Memory) for short

I will say “memory” or “RAM” interchangeably

The basic unit of memory is the byte (or octet, or octad, or octade)

1 Byte = 8 bits, e.g., “0110 1011”






Memory Unit

The memory contains numerical “information” / “data” / “content”

Content: 3, 1, 4, 1, 25, 9, 2, 167, -5, ...






Memory Unit

The “data” are represented in memory in binary as bytes

Content      (Human)
0000 0011    3
0000 0001    1
0000 0100    4
0000 0001    1
0001 1001    25
0000 1001    9
0000 0010    2
1010 0111    167
1111 1011    -5
...          ...
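The table reads 1010 0111 as 167 but 1111 1011 as -5: the first byte is interpreted as an unsigned number, the second as a signed (two's-complement) number. The following is a minimal sketch of that idea in C; the helper names `as_unsigned` and `as_signed` are made up for this illustration and are not part of the lecture.

```c
#include <stdint.h>

/* Interpret one byte as an unsigned value (0..255). */
unsigned as_unsigned(uint8_t byte) {
    return byte;
}

/* Interpret the same 8 bits as a signed two's-complement value (-128..127). */
int as_signed(uint8_t byte) {
    return byte < 128 ? (int)byte : (int)byte - 256;
}
```

For example, `as_unsigned(0xA7)` gives 167 (1010 0111), while `as_signed(0xFB)` gives -5 (1111 1011). The bits do not say which reading is intended; the program has to know.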






Memory Unit

To be used, the data need to be located precisely in memory: addresses

Address    Content      (Human)
0          0000 0011    3
1          0000 0001    1
2          0000 0100    4
3          0000 0001    1
4          0001 1001    25
5          0000 1001    9
6          0000 0010    2
7          1010 0111    167
8          1111 1011    -5
...        ...          ...






Memory Unit

... but because computers only understand binary, the addresses are binary too:

Address      Content      (Human)
0000 0000    0000 0011    3
0000 0001    0000 0001    1
0000 0010    0000 0100    4
0000 0011    0000 0001    1
0000 0100    0001 1001    25
0000 0101    0000 1001    9
0000 0110    0000 0010    2
0000 0111    1010 0111    167
0000 1000    1111 1011    -5
...          ...          ...






Memory Unit

Each byte in memory is labeled by a unique address

We talk of a byte-addressable memory

All addresses on a computer have the same number of bits (e.g., 16-bit addresses)

The CPU has instructions like “Read the byte at address X and give me its value” and “Write this value into the byte at address Y”

The Memory Unit (Bus + RAM) has the hardware to make these instructions happen
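The two memory instructions quoted above can be sketched in C as operations on an array of bytes. This is only a model under stated assumptions: the 16-bit address width is the slide's example, and the names `memory`, `read_byte`, and `write_byte` are hypothetical.

```c
#include <stdint.h>

/* 16-bit addresses can name 2^16 = 65,536 distinct bytes. */
#define MEMORY_SIZE 65536u

static uint8_t memory[MEMORY_SIZE];   /* one byte per address */

/* "Read the byte at address X and give me its value" */
uint8_t read_byte(uint16_t address) {
    return memory[address];
}

/* "Write this value into the byte at address Y" */
void write_byte(uint16_t address, uint8_t value) {
    memory[address] = value;
}
```

Because the address type is exactly 16 bits wide, every possible address is a valid index into the array, which is what "byte-addressable" means here.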






Conceptual View of Memory (16-bit addresses example)

Address                Content
0000 0000 0000 0000    0000 0011
0000 0000 0000 0001    0000 0001
0000 0000 0000 0010    0000 0100
0000 0000 0000 0011    0000 0001
0000 0000 0000 0100    0000 0101
0000 0000 0000 0101    0000 1001
0000 0000 0000 0110    0000 0010
0000 0000 0000 0111    0000 0110
0000 0000 0000 1000    0000 0101
...                    ...
1111 1111 1111 1111    0010 0101

At address 0000 0000 0000 0011 the content is 0000 0001

(The contents of uninitialized memory are random)







Let’s consider a memory with 8-bit addresses and the initial state shown below.

We can write a program that does “At address 1000 0000, store the address of the first ’9’ (0000 1001) in memory”

Before:

Address      Content
0000 0000    0000 0011
0000 0001    0000 0001
0000 0010    0000 0100
0000 0011    0000 0001
0000 0100    0000 0101
0000 0101    0000 1001
0000 0110    0000 0010
0000 0111    0000 0110
0000 1000    0000 0101
...          ...
1000 0000    0110 0101
1000 0001    1001 0111

=⇒ After:

Address      Content
0000 0000    0000 0011
0000 0001    0000 0001
0000 0010    0000 0100
0000 0011    0000 0001
0000 0100    0000 0101
0000 0101    0000 1001
0000 0110    0000 0010
0000 0111    0000 0110
0000 1000    0000 0101
...          ...
1000 0000    0000 0101
1000 0001    1001 0111
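The program described above could look roughly like the following C sketch. The function name `store_address_of_first` is hypothetical, and scanning only the addresses below 1000 0000 is an assumption made here so that the result cell itself is never searched.

```c
#include <stdint.h>

#define RESULT_ADDRESS 0x80   /* 1000 0000 in binary */

/* Scan addresses 0 .. RESULT_ADDRESS-1 for the first byte equal to `wanted`
   and store that byte's address at RESULT_ADDRESS.
   Returns 1 if a match was found, 0 otherwise (memory is then unchanged). */
int store_address_of_first(uint8_t memory[256], uint8_t wanted) {
    for (unsigned address = 0; address < RESULT_ADDRESS; address++) {
        if (memory[address] == wanted) {
            memory[RESULT_ADDRESS] = (uint8_t)address;
            return 1;
        }
    }
    return 0;
}
```

With the initial state from the table, calling `store_address_of_first(memory, 9)` writes 0000 0101 at address 1000 0000. Reading `memory[memory[0x80]]` then follows that stored address back to the 9.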






Indirection

An address is just information

In the previous slide we’ve done indirection

The content at a memory location is the address of another memory location: we call this a pointer/reference

At that other memory location is some content that we care about:

which in our case is the value ’9’
but which could be yet another address

It’s the job of the programmer to know what memory content means (the CPU has no idea), which is a source of bugs

Very well-known difficulty when writing assembly (ICS312/ICS331)

High-level programming languages help, but in C you can do whatever:

e.g., on a typical 64-bit architecture (LP64, as on Linux and macOS) a C pointer is simply a 64-bit number, the same size as an unsigned long

unsigned long x = 42;

int *ptr = (int *)x; // bogus pointer!






Hello World! (Well... not really)

Let’s consider the following pseudo-code:

Step 1) Set the content of variable A to the content at address 1000 0000
Step 2) Set the content of variable B to the content at address 1000 0001
Step 3) Add A and B together and store the result in A
Step 4) Set the content at address 1000 0010 to the contents of A
Step 5) Go back to Step 1

or in assembly (pseudo-)instructions:

// MIPS-like (ICS 331)

S1: LOAD A, (1000 0000)

S2: LOAD B, (1000 0001)

S3: ADD A, B

S4: STORE A, (1000 0010)

S5: JMP S1

// x86-like (ICS 312)

S1: MOV AL, [1000 0000]

S2: MOV BL, [1000 0001]

S3: ADD AL, BL

S4: MOV [1000 0010], AL

S5: JMP S1






Binary Instruction Encoding

Instructions are encoded in binary, based on the specification of the microprocessor your computer uses

Here are some x86 instruction encodings:

Instruction         Encoding (in hex)    Size
ADD EAX, 1          83C001               3 bytes
ADD EAX, -1         83C0FF               3 bytes
ADD EAX, -100000    056079FEFF           5 bytes
ADD EAX, EBX        01D8                 2 bytes

Some instructions are shorter than others, which impacts the size of the executable

An assembler transforms assembly code into binary code, so programmers typically don’t know the binary code for instructions
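As an illustration of what an assembler does for one row of the table, here is a minimal C sketch that produces the 5-byte form of ADD EAX with a 32-bit immediate: opcode byte 05 followed by the immediate, least-significant byte first. The function name `encode_add_eax_imm32` is hypothetical, and the shorter 3-byte form used for small immediates is deliberately not handled.

```c
#include <stdint.h>

/* Encode "ADD EAX, imm32": opcode byte 0x05, then the 32-bit immediate
   stored least-significant byte first (little-endian). */
void encode_add_eax_imm32(int32_t imm, uint8_t out[5]) {
    uint32_t bits = (uint32_t)imm;   /* two's-complement bit pattern */
    out[0] = 0x05;
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(bits >> (8 * i));
    }
}
```

Calling it with -100000 yields the bytes 05 60 79 FE FF, matching the table: -100000 is FFFE7960 in 32-bit two's complement, and its bytes appear in reverse order.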






The program is stored in RAM along with data

Address      Content (hex)    Meaning
0000 0000    83               ADD EAX, 1
0000 0001    C0
0000 0010    01
0000 0011    01               ADD EAX, EBX
0000 0100    D8
0000 0101    05               ADD EAX, -100000
0000 0110    60
0000 0111    79
0000 1000    FE
0000 1001    FF
...          ...
1000 0000    05               Some data
1000 0001    4F               Some data
1000 0010    2C               Some data
1000 0011    00               Some data

Once a program is loaded in memory, its address space contains both code and data

The CPU can’t tell the difference, only the programmer can

This is conveniently hidden from the programmer, unless you write assembly

It’s the CPU’s job to understand that 83C001 means ADD EAX, 1






Memory Unit: Conclusions

The memory is basically an indexed array of bytes

The memory contents have various useful meanings:

integers, character codes, floating-point numbers, ... but also higher-level abstractions: RGB values, coordinates in space-time, images...
addresses (pointers)
instructions (i.e., executable code) understood by a CPU






Central Processing Unit

The CPU reads data from memory into registers, writes data from registers to memory, and computes

The component that performs the computational operations is called the ALU (Arithmetic and Logic Unit)

It can perform what you expect (+, -, /, *, OR, AND, XOR, ...)

Operands and results of operations must all be in registers

Unfortunately, there are very few registers

e.g., Intel i7: 8 × 32-bit registers in 32-bit mode; 16 × 64-bit registers in 64-bit mode (and 16 floating-point registers of 128 or 256 bits)

This is a pain when writing assembly by hand

But the compiler does all that work for us when we use high-level languages






Central Processing Unit

The CPU also controls the execution of the program’s instructions

The Control Unit is the component in charge of controlling the program execution, and it uses dedicated registers:

Program Counter: Contains the address of the next instruction that should be executed; it is incremented after each instruction but can be set to whatever address when there is a change in control flow
Current Instruction: The binary code of the instruction which is currently being executed
Other registers: Stack Pointer, Frame Pointer, ...

The Control Unit decodes the instructions (i.e., interprets their bits) and makes them happen

This is a main topic of a Computer Architecture course





The Fetch-Decode-Execute Cycle

The Control Unit fetches the next program instruction from memory using the program counter

The instruction is decoded and signals are sent to hardware components (memory controller, ALU, I/O controller)

The instruction is executed:

Values are fetched from memory and put in the registers
Computation is performed by the ALU and results are stored in registers
Register values are pushed back to memory
Program state is modified (Program Counter, Stack Pointer, ...)

Repeat

Computers implement many variations on this cycle, with tons of bells and whistles to make it as fast as possible

But one can still program with the above model in mind (although certainly without fully understanding performance issues)






Fetch-Decode-Execute

Let’s consider a simplistic hypothetical Von Neumann architecture

Memory contains 256 bytes (256 × 1 byte)

CPU has 2 “data” registers (A and B), 2 “control” registers (Program Counter and Current Instruction)

CPU instructions encoded on 1 byte (8 bits): 3-bit “opcode” (operation code) and 5-bit operand:

Opcode 000: Load to register A from memory
Opcode 001: Load to register B from memory
Opcode 010: Add B to A; store the result in A
Opcode 011: Store the value of A to memory
Opcode 100: Jump
Opcode 111: Halt (program terminates)

We will assume that the values the program loads into A and B are initially 5 and 151






Sample Execution Decoding

From the previous slide, our instructions are as follows:

Opcode 000: Load to register A from memory
Opcode 001: Load to register B from memory
Opcode 010: Add B to A; store the result in A
Opcode 011: Store the value of A to memory
Opcode 100: Jump
Opcode 111: Halt (program terminates)

So, for instance, here are meanings of example instructions:

00010111: Load the byte in RAM at address 00010111 into register A (“LOAD A, (10111)” in MIPS-like assembly)
010?????: A = A + B (we don’t care what the 5 trailing bits are because this instruction takes no operand)
10000011: Jump to the instruction at address 00000011 and execute it
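Decoding in this toy architecture amounts to splitting the instruction byte into its top 3 bits and its low 5 bits. The following C sketch shows one way to do it; the `Instruction` type and `decode` function are names invented for this illustration.

```c
#include <stdint.h>

typedef struct {
    uint8_t opcode;    /* top 3 bits of the instruction byte */
    uint8_t operand;   /* low 5 bits of the instruction byte */
} Instruction;

/* Split an 8-bit instruction into its 3-bit opcode and 5-bit operand. */
Instruction decode(uint8_t byte) {
    Instruction ins;
    ins.opcode = byte >> 5;
    ins.operand = byte & 0x1F;
    return ins;
}
```

For the examples above, `decode(0x17)` (00010111) gives opcode 000 with operand 10111, and `decode(0x83)` (10000011) gives opcode 100 with operand 00011.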




(Initialization)-Fetch-Decode-Execute

CPU registers: A = undefined, B = undefined

Control Unit registers: PC = undefined, CI = undefined

Memory:

Address      Content      Meaning
0000 0100    000 10000    LOAD A, (10000)
0000 0101    001 10001    LOAD B, (10001)
0000 0110    010 00000    ADD A, B
0000 0111    011 10001    STORE A, (10001)
0000 1000    100 00100    JMP (0 0100)
...          ...          ...
0001 0000    0000 0101    5d
0001 0001    1001 0111    151d
...          ...          ...

The Program (its Code and its Data) is loaded into memory (Guess who does that?)




The Program Counter is set to the address of the first instruction of the program (Guess who does that?)

State: A = undefined, B = undefined, PC = 0000 0100, CI = undefined (memory unchanged)




Fetch: A request is put on the Address Bus to retrieve the value in memory at address PC = 0000 0100

State: A = undefined, B = undefined, PC = 0000 0100, CI = undefined

Fetch: The Memory Unit puts the requested data on the Data Bus and the CPU puts it into the CI register

State: A = undefined, B = undefined, PC = 0000 0100, CI = 0001 0000

Fetch: The PC register value is incremented: its new value is the address of the next instruction to execute

State: A = undefined, B = undefined, PC = 0000 0101, CI = 0001 0000

Decode: 0001 0000 means “000 = LOAD A from address 000(10000)”

Execute: The value of the memory at address 0001 0000 is requested (using the Address Bus)

Execute: The content at address 0001 0000, that is, 0000 0101, is put on the Data Bus and written to register A

State: A = 0000 0101, B = undefined, PC = 0000 0101, CI = 0001 0000 (memory unchanged)






Fetch-Decode-Execute-(Repeat)

Repeat!




Fetch (note that the value of PC is incremented)

State: A = 0000 0101, B = undefined, PC = 0000 0110, CI = 0011 0001

Decode: 0011 0001 means “001 = LOAD B from address 000(10001)”

Execute: The value read at address 0001 0001, that is, 1001 0111, is written to register B

State: A = 0000 0101, B = 1001 0111, PC = 0000 0110, CI = 0011 0001 (memory unchanged)




Fetch

State: A = 0000 0101, B = 1001 0111, PC = 0000 0111, CI = 0100 0000

Decode: 0100 0000 means “010 = ADD A, B (the operand is ignored)”

Execute: A ← A + B

State: A = 1001 1100 (5 + 151 = 156), B = 1001 0111, PC = 0000 0111, CI = 0100 0000 (memory unchanged)




Fetch

State: A = 1001 1100, B = 1001 0111, PC = 0000 1000, CI = 0111 0001

(Let’s skip the Decode part)

Execute: The value of A is stored in memory, so address 0001 0001 now contains 1001 1100 (156d)

State: A = 1001 1100, B = 1001 0111, PC = 0000 1000, CI = 0111 0001




Fetch

State: A = 1001 1100, B = 1001 0111, PC = 0000 1001, CI = 1000 0100 (address 0001 0001 still contains 1001 1100, i.e., 156d)

Execute: The JMP instruction modifies the value of a control register (PC)

State: A = 1001 1100, B = 1001 0111, PC = 0000 0100, CI = 1000 0100

The next instruction to execute will be LOAD A, (10000)

And like that we have implemented an infinite loop...






Fetch-Decode-Execute Practice

It’s a pretty good idea to review these slides: go back to the first slide (initialization) and see if you can go through the fetch-decode-execute cycle yourself

We’ll have a simple homework assignment along these lines

But just in case, let’s do one together right now...






In-class activity (just to make sure we’re all on board)

CPU registers: A = undefined, B = 0000 0110

Control Unit registers: PC = 0000 0001, CI = undefined

Opcode    Meaning                                  5-bit operand
000       Load to register A from memory           address
001       Load to register B from memory           address
010       Add B to A; store the result in A        ignored
011       Store the value of A to memory           address
100       Jump                                     address
111       Halt                                     ignored

Memory:

Address      Content
0000 0000    001 10010
0000 0001    000 10011
0000 0010    010 00000
0000 0011    011 10111
0000 0100    001 10111
0000 0101    111 00000
...          ...
0001 0010    0000 0110
0001 0011    1000 0111

What is the decimal value of register B when the program terminates?






In-class activity solution

CPU registers: A = 1000 1101, B = 1000 1101

Control Unit registers: PC = 0000 0110, CI = 1110 0000

Opcode    Meaning                                  5-bit operand
000       Load to register A from memory           address
001       Load to register B from memory           address
010       Add B to A; store the result in A        ignored
011       Store the value of A to memory           address
100       Jump                                     address
111       Halt                                     ignored

Memory:

Address      Content      Effect
0000 0000    001 10010
0000 0001    000 10011    A ← 135d
0000 0010    010 00000    A ← 135d + 6d = 141d
0000 0011    011 10111    (00010111) ← 141d
0000 0100    001 10111    B ← (00010111) = 141d
0000 0101    111 00000    Halt
...          ...
0001 0010    0000 0110    6d
0001 0011    1000 0111    135d

Answer: the decimal value of B is 141
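The whole toy machine fits in a few lines of C, which makes it easy to check a hand trace such as the one above. This is a sketch under stated assumptions, not the course's reference implementation: the `Machine` type, the `run` function, and the `max_cycles` safety limit (needed because the earlier example program loops forever) are all invented here, and the two opcodes the slides leave undefined (101 and 110) simply stop the machine.

```c
#include <stdint.h>

enum { LOAD_A = 0, LOAD_B = 1, ADD = 2, STORE_A = 3, JUMP = 4, HALT = 7 };

typedef struct {
    uint8_t a, b;          /* data registers */
    uint8_t pc, ci;        /* Program Counter, Current Instruction */
    uint8_t memory[256];
} Machine;

/* Run fetch-decode-execute cycles until Halt or until max_cycles is reached.
   Returns the number of cycles performed. */
unsigned run(Machine *m, unsigned max_cycles) {
    unsigned cycles = 0;
    while (cycles < max_cycles) {
        m->ci = m->memory[m->pc];        /* fetch the instruction at PC */
        m->pc++;                         /* PC now names the next instruction */
        uint8_t opcode = m->ci >> 5;     /* decode: top 3 bits */
        uint8_t operand = m->ci & 0x1F;  /* decode: low 5 bits */
        cycles++;
        switch (opcode) {                /* execute */
        case LOAD_A:  m->a = m->memory[operand]; break;
        case LOAD_B:  m->b = m->memory[operand]; break;
        case ADD:     m->a = (uint8_t)(m->a + m->b); break;
        case STORE_A: m->memory[operand] = m->a; break;
        case JUMP:    m->pc = operand; break;
        case HALT:    return cycles;
        default:      return cycles;     /* 101 and 110 are not defined in the slides */
        }
    }
    return cycles;
}
```

Loading the in-class program with B = 0000 0110 and PC = 0000 0001 and then calling `run` stops after five cycles with A and B both equal to 141, PC = 0000 0110, and CI = 1110 0000, which matches the solution slide.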






There is more to Fetch-Decode-Execute

This was a simplified view of the way things work

Control and data paths are implemented by several hardware components

There is usually more than one ALU

There are caches between the CPU and the memory

There are even multiple CPUs

The cycle is pipelined: Fetch instruction i + 1 while instruction i is being executed

Decades of computer architecture research have gone into improving speed, often leading to high hardware complexity (doing smart things in hardware requires more logic gates and wires, which increases CPU cost)

But, conceptually, it is still Fetch-Decode-Execute.






I/O

Let’s leave this topic for (much) later...

Let’s just assume that there is an I/O Controller and that the CPU can talk to it to make I/O happen (reads and writes)

After all there is a Memory Controller and at the conceptual level they are not so different





The RAM is slow

A big speed issue: the memory is slow

Accessing a register is very fast

e.g., a 4GHz CPU can update a register in 0.25 nanosecond (1 cycle)

Accessing the memory takes about 10 ns

The memory is ∼40 times slower than the CPU

What does the CPU do while it’s waiting for the memory to give it data?

NOTHING!! (yes, this is a problem)

This is the famous “Von-Neumann Bottleneck”

Many techniques have been developed to address this
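The numbers on this slide follow from simple arithmetic, sketched below in C. The function names `cycle_time_ns` and `cycles_per_access` are hypothetical, and the model ignores everything except the clock rate and the memory latency quoted above.

```c
/* Duration of one clock cycle in nanoseconds for a clock rate given in GHz. */
double cycle_time_ns(double clock_ghz) {
    return 1.0 / clock_ghz;
}

/* Number of clock cycles that elapse during one memory access. */
double cycles_per_access(double latency_ns, double clock_ghz) {
    return latency_ns * clock_ghz;
}
```

With the slide's figures, `cycle_time_ns(4.0)` is 0.25 ns and `cycles_per_access(10.0, 4.0)` is 40, so a CPU that waits for memory on every access wastes about 40 cycles each time.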






Several levels of RAM

We would like a gigantic and fast memory

Could we just build the memory as gazillions of registers?

No!!! Cost/physics make it impossible

Instead, we play a trick to provide the illusion of a fast memory

This trick is called the memory hierarchy




The Memory Hierarchy

From fast and small to slow and large:

Level              Typical size      Access time    Managed by
(CPU) Registers    Few 100s Bytes    < 1 ns         Compiler
Cache              kB to MB          1 ns           Hardware
Memory             GB                10 ns          OS
I/O Devices        TB                1+ ms          OS

The Memory is reached through the Memory Bus and the I/O Devices through the I/O Bus






The Memory Hierarchy in a Nutshell

When a program accesses a byte in memory:

The hardware checks whether the byte is in cache, and if so, it just gets it
Otherwise, the byte value is brought from the (slow) memory into the (fast) cache
The values around the byte are also brought into the cache

Analogy:

To write a paper you need a reference book from the library
You go to the library and find the book on a shelf, noticing that the books around it are on the same topic! You can...

Leave the book at the library and go to the library each time you need one reference
Take only the one book... but if it makes a reference to another book on the same topic you’ll have to go back to the library
Or take the one book and the books around it and put them on your desk... and if THE reference makes a reference maybe you’ll have the referred book right there
In this last option your desk is a “cache for the library”






Why does it work?

TEMPORAL LOCALITY

A program tends to reference addresses it has already referenced

e.g., Counters

The first access is expensive: Fetching the value takes many cycles

Each subsequent access is cheap: The value is in cache

The “I need that same book again” analogy






Why does it work?

SPATIAL LOCALITY

A program tends to reference addresses next to addresses it has already referenced

e.g., When manipulating arrays (i.e., contiguous bytes in memory)

The access to element i is expensive: Fetching the value takes many cycles

Accesses to elements i + 1, i + 2, ... are cheap: The values are in cache!

The “I need a book on that same shelf” analogy
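Both kinds of locality can be observed with a very small cache model. The C sketch below simulates a direct-mapped cache that only counts hits and misses; the line size of 16 bytes, the 8 lines, and the names `Cache` and `cache_access` are assumptions chosen for the illustration, not properties of any real CPU.

```c
#include <stdint.h>

#define LINE_SIZE 16   /* bytes brought in together on a miss (assumed) */
#define NUM_LINES 8    /* number of cache lines (assumed) */

typedef struct {
    int valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits, misses;
} Cache;

/* Record one access to `address`; returns 1 on a hit, 0 on a miss. */
int cache_access(Cache *c, uint32_t address) {
    uint32_t block = address / LINE_SIZE;   /* which LINE_SIZE-byte chunk of memory */
    uint32_t index = block % NUM_LINES;     /* the one line this chunk may occupy */
    uint32_t tag = block / NUM_LINES;       /* identifies the chunk held in that line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[index] = 1;                    /* miss: bring in the whole chunk */
    c->tag[index] = tag;
    c->misses++;
    return 0;
}
```

Accessing addresses 0 through 63 in order, starting from an empty cache, produces 4 misses (one per 16-byte chunk) and 60 hits: that is spatial locality. Accessing address 0 again afterwards is a hit: that is temporal locality.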






The Memory Hierarchy: Memory Caches

In reality there is more than one level of cache (L1, L2, L3)

Trade-offs between size, speed, and cost

L1 (the closest/fastest to the CPU) is actually split into Data Cache and Instructions Cache

Chunks of data are brought from (far-away) memory and are copied and kept around in (nearby) caches

The same data exist in multiple levels of memory at once, which leads to interesting issues/problems we might discuss (see ICS 432)

Cache Hit: When a data item is found in cache (e.g., we would talk of a “L2 cache hit”)

Cache Miss: When a data item is not found in cache (e.g., we would talk of a “L1 cache miss”)

We’ll use this hit/miss terminology for several OS concepts...






Direct Memory Access (DMA)

Often, one has to copy large chunks of data to/from RAM from/to some peripheral device (graphics card, network card, sound card, disk)

In the pure Von-Neumann model, the CPU has to be involved for each copy operation

The problem is that memory copies take a long time (even with caches), and the CPU spends its life twiddling its thumbs while the copies are taking place

It would be better to have copies occur independently so that the CPU can do something useful while the memory copy is taking place

This is called Direct Memory Access (DMA)






Direct Memory Access (DMA)

DMA is used on all modern computers

e.g., the Intel i7 has an on-chip DMA controller

How DMA works (without getting into details):

The CPU simply tells the DMA controller to initiate a RAM copy
When the copy is complete the DMA controller tells the CPU “it’s done” by generating an interrupt (more on interrupts very soon)
In the meantime, the CPU was free to do whatever






DMA is not free

To perform data transfers the DMA controller uses the memory bus

In the meantime, the code executed by the CPU likely also uses the memory bus

Therefore, they can interfere with each other

There are several ways in which this interference can be managed (give priority to DMA, to CPU, weight usage, ...)

See a Computer Architecture course

In general, using DMA leads to much better performance anyway and (good) software should use it as often as possible






Current Architectures

Current architectures are much more complex than what we just described

Because manufacturers cannot increase clock rate further (power/heat issues), our current CPUs are multi-core

Multiple “low” clock rate CPUs on a single chip

This is a great solution to a problem, but most users/programmers would rather have a single 100 GHz core than 50 cores at 2 GHz

We’ll talk about multi-core architectures later in the semester






Example of a real-life system

Picture obtained with lstopo

(sudo apt-get install hwloc)





Conclusion

If you want to know more:

Take ICS312 / ICS331
Take Computer Architecture (EE 461, ICS431)
Read Computer Organization and Design, by Patterson and Hennessy

We will have a quiz on these lecture notes next week

