instructions: language of the computerleduc/slides2ga3/2ga3slides2.pdf · 2020-01-06 · chapter 2...

COMPUTER ORGANIZATION AND DESIGN: The Hardware/Software Interface

ARM Edition

Chapter 2
Instructions: Language of the Computer

Modified and extended by R.J. Leduc - 2016


§2.1 Introduction

Instruction Set
The instruction set of a computer is the repertoire of instructions that it can perform
Different computers have different instruction sets
  But with many aspects in common
Early computers had very simple instruction sets
As resources expanded, we got complex instruction set computers (CISC)
Many modern computers now also have simple instruction sets, called reduced instruction set computers (RISC)


The ARMv8 Instruction Set
The textbook uses a subset of the ARMv8 instruction set, called LEGv8
Commercialized by ARM Holdings (www.arm.com)
ARM processors have a large share of the embedded core market
  Applications in consumer electronics, network/storage equipment, cameras, printers, …
ARMv8 is typical of many modern instruction set architectures (ISAs)
See the ARM Reference Data tear-out card (green card at front of textbook) and Figure 2.1 in the textbook

http://www.mips.com/


§2.2 Operations of the Computer Hardware

Arithmetic Operations
In this section, we will be vague about what variables (such as a, b, and c below) are, and clarify this in the next section
Add and subtract operations require three operands (the number is rigid)
  Two sources and one destination

ADD a, b, c // store b + c in a

All arithmetic operations have this form
An assembly language program has only one instruction per line
The notation “//” indicates the start of a comment, which continues to the end of the line and no further


Arithmetic Operations II
Design Principle 1: Simplicity favours regularity
  Regularity makes implementation simpler
  Simplicity enables higher performance at lower cost


Arithmetic Example
C code:

f = (g + h) - (i + j);

Compiled LEGv8 code:

ADD t0, g, h // temp t0 = g + h
ADD t1, i, j // temp t1 = i + j
SUB f, t0, t1 // f = t0 - t1


§2.3 Operands of the Computer Hardware

Register Operands
Registers are special named storage locations built directly inside the processor
Registers are used for frequently accessed data
ARMv8 has 32 registers, each storing 64 bits
  These registers are collectively called a register file
  31 of these are general purpose and are labeled X0 to X30
  The last register is called XZR and is always zero
Arithmetic operands are restricted: they can only be registers


Register Operands II
As 64 bits of data occurs so often in LEGv8, we give it the name “doubleword”
  We call 32 bits of data a “word”
There are 31 32-bit general purpose sub-registers labeled W0 to W30
Why only 32 registers?
  Design Principle 2: Smaller is faster
  Compare with main memory, which has millions of locations


Register Operand Example
C code:

f = (g + h) - (i + j);

f, …, j in X19, X20, …, X23
Compiled LEGv8 code:

ADD X9, X20, X21
ADD X10, X22, X23
SUB X19, X9, X10


Memory Operands
Main memory is used for composite data
  Arrays, structures, dynamic data
To apply arithmetic operations
  Load values from memory into registers
  Store result from register to memory
This means ARMv8 must include instructions to transfer data between memory and registers
  Called data transfer instructions
To access a location in memory, an instruction must supply the memory address


Memory Operands II
Memory is byte addressed
  Each address identifies an 8-bit byte
Memory is essentially a large, single-dimensional array
  The address acts as the index of the array
  Addresses start at zero and go to $2^{64} - 1$
For example, the address of the third memory location is 2
  The value of memory[2] is 10
If: char v[]; with v = 2, then v[0] = 10 and v[1] = 100, thus the address of v[1] would be v + (1)(1) = 3


Memory Operands III
A doubleword requires 8 bytes to store it
  Addresses of subsequent doublewords thus differ by 8
If: int v[]; (taking each element to be a doubleword) with v = 0, then v[0] = 1, and v[1] = 101 is stored at address v + (1)(8) = 8
Many architectures have an alignment restriction:
  Words must start at addresses that are multiples of 4
  Doublewords must start at addresses that are multiples of 8
ARMv8 does not have this restriction for regular data access
  It does have restrictions for instruction fetches and stack accesses
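
The address arithmetic used in the two array examples above can be sketched in C. This is only an illustration: the function name `element_address` and its parameters are made up for this sketch and are not part of LEGv8 or the textbook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte address of element `index` in an array that starts at byte address
   `base` and whose elements are `elem_size` bytes each (hypothetical helper). */
uint64_t element_address(uint64_t base, uint64_t index, size_t elem_size)
{
    assert(elem_size > 0);
    return base + index * (uint64_t)elem_size;
}
```

With 1-byte elements and base 2, element 1 lands at address 3; with 8-byte doublewords and base 0, element 1 lands at address 8, matching the slides.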


Endianness
Computers can be divided by how they store a multi-byte number (say a word) in byte-addressable memory
Consider the 32 bit hexadecimal number (90AB12CD)16
  To store it, we would have to use 4 bytes, but do we store the leftmost digits first, or the rightmost?
  Big-endian: mem[0] = 90, mem[1] = AB, mem[2] = 12, mem[3] = CD
  Little-endian: mem[0] = CD, mem[1] = 12, mem[2] = AB, mem[3] = 90
ARMv8 can be configured to work as either format
Only important if you access data as both a doubleword, and as 8 separate bytes
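
A minimal C sketch of the two byte orders, using shifts so that it does not depend on the byte order of the machine running it. The function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value with the most significant byte at the lowest address. */
void store_big_endian(uint32_t value, uint8_t mem[4])
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Store a 32-bit value with the least significant byte at the lowest address. */
void store_little_endian(uint32_t value, uint8_t mem[4])
{
    assert(mem != NULL);
    mem[0] = (uint8_t)value;
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}
```

Calling either function with 0x90AB12CD reproduces the mem[0] to mem[3] contents listed on the slide.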


Load Operations
Data transfer operations that copy data from memory to a register are called load operations
The ARMv8 command is called LDUR
It takes as operands:
  A destination register
  A register containing the base address in memory
  An offset constant (e.g. #64)
Consider an array A of 100 doublewords and that the starting address of A is stored in the register X22
To load the data stored at A[8] into the temporary variable X9, the command would be:
  LDUR X9, [X22, #64]
  Copies the data stored at mem[X22 + 64] into register X9
  As a doubleword requires 8 bytes, location A[k] requires an offset of k times 8


Store Operations
Data transfer operations that copy data from a register to memory are called store operations
The ARMv8 command is called STUR
It takes as operands:
  The register to be stored
  A register containing the base address in memory
  An offset constant (e.g. #8)
Consider an array A of 50 doublewords and that the starting address of A is stored in the register X22
To copy the data contained in register X9 to A[1], the command would be:
  STUR X9, [X22, #8]
  Copies the data stored in X9 to mem[X22 + 8]


Memory Operand Example
C code with A an array of doublewords:

A[12] = h + A[8];

Assume variable h has been associated by the compiler with register X21, and that the base address of A is stored in X22
Compiled LEGv8 code:
  Index 8 requires an offset of 64; index 12 requires an offset of 96

LDUR X9,[X22,#64] // temp reg X9 gets A[8]
ADD X9,X21,X9 // reg X9 gets h + A[8]
STUR X9,[X22,#96] // stores result into A[12]


Registers vs. Memory
Registers are faster to access than main memory and cache memory, and use less energy
Operating on memory data requires loads and stores
  More instructions to be executed
Most programs have more variables than registers
  The compiler tries to keep the most frequently used variables in registers, and the rest in main memory
  When the contents of a register must be saved to memory to allow a less frequently used variable to be loaded, we call this “spilling to memory”
The compiler must use registers for variables as much as possible
  Register optimization is important!


Immediate Operands
It is common to use a constant in operations
If the constant “4” is located in memory at location X20 + AddrConstant4, we would need the instructions below to do an add operation

LDUR X9, [X20, AddrConstant4] // X9 now contains 4
ADD X22, X22, X9 // X22 = X22 + X9 (X9 = 4)

A more efficient alternative is the add immediate operation, where the constant data is specified as part of the instruction

ADDI X22, X22, #4 // X22 = X22 + 4

Please read the “elaborations” on pages 73-74 in the textbook
Design Principle 3: Make the common case fast
  Small constants are common
  Immediate operand avoids a load instruction


§2.4 Signed and Unsigned Numbers

Representation of Numbers
The familiar way of representing numbers is to use 10 digits (0, 1, … , 9)
Such numbers are called base-10, or decimal numbers
  e.g. 4831
The position of each digit represents a “power” of the base (10):
• 4831 = $4 \times 10^3 + 8 \times 10^2 + 3 \times 10^1 + 1 \times 10^0$
In logic circuits it is awkward to directly represent digits like 4, 8, and 3
  We want only 2 digits: 0, 1
Therefore we will use base-2, or binary numbers


Binary Integers
We will first only deal with unsigned (+ve only) integers
Binary numbers also use positional notation. Each digit represents a “power of 2”, e.g.:

1101 = $1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$ = (13)10

Since the meaning of a digit depends on the base being used, we sometimes write (1101)2, or (4813)10
The largest number that can be represented in n bits is $2^n - 1$, as zero takes up a spot
  e.g. (1111)2 = (15)10, (1111 1111)2 = (255)10
For unsigned numbers, the rightmost bit is called the least significant bit (LSB) and the leftmost bit the most significant bit (MSB)


Converting Decimal to Binary

Want to convert decimal number $D = d_{k-1} \dots d_1 d_0$ with value (V)10, to binary number $B = b_{n-1} \dots b_2 b_1 b_0$

We would have:

$V = b_{n-1} \times 2^{n-1} + \dots + b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0$

We next note that if we divide V by 2 we get:

$V/2 = b_{n-1} \times 2^{n-2} + \dots + b_2 \times 2^1 + b_1 \times 2^0 + b_0/2 = Q_1 + b_0/2$

With integer division, no fractions: just quotient and remainder.

Above, $Q_1$ is the quotient and $b_0 \in \{0,1\}$ is the remainder (i.e. $b_0 = V - Q_1 \times 2$)


Converting Decimal to Binary II

Thus, if we divide (V)10 by 2, the remainder is $b_0$, the LSB of B

We next note $Q_1$ is also a binary number

If we divide $Q_1$ by 2, we get $b_1$ as the remainder

If we repeat until our quotient is 0, we can extract every digit of B
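
The repeated-division procedure can be sketched in C as follows. The function name `to_binary` and the fixed 65-character buffer are choices made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the binary digits of v into out (MSB first, NUL-terminated)
   using repeated division by 2. Returns the number of digits written. */
size_t to_binary(uint64_t v, char out[65])
{
    char reversed[64];
    size_t n = 0;

    assert(out != NULL);
    do {
        reversed[n++] = (char)('0' + (v % 2)); /* remainder is the next bit, LSB first */
        v /= 2;                                /* the quotient feeds the next step */
    } while (v != 0);

    /* The bits were produced LSB first, so reverse them for printing. */
    for (size_t i = 0; i < n; i++)
        out[i] = reversed[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

For V = 13 the remainders come out as 1, 0, 1, 1, which reversed gives 1101.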


Octal and Hexadecimal Numbers

We have looked at radixes (bases) 10 and 2 so far

Can have any radix r
Thus have number $K = (k_{n-1} k_{n-2} \dots k_1 k_0)_r$

This has base-10 value of:

$V(K) = \sum_{i=0}^{n-1} k_i \times r^i$

For computers, we are interested in octal (radix-8) and hexadecimal (radix-16)


Octal and Hexadecimal Numbers II

Why do we care about hexadecimal?

Easy to convert between binary and hexadecimal
Hexadecimal is more compact, thus easier for humans to use

A 16 digit binary number is only a 4 digit hexadecimal number!

The C language uses the 0xnnnn notation for hexadecimal numbers


Hexadecimal to/from Binary

As $2^4 = 16$, 4 bits represent one hexadecimal digit

Binary to hexadecimal
  From right to left, group the binary digits into groups of 4
  If the last group has fewer than four digits, add zeros to the left to make 4
  Convert each group of 4 to the corresponding hexadecimal digit
  (1010 1111 0010 0101)2 = (AF25)16

Reverse the process:

(A3)16 → (1010 0011)2
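
A small C sketch of the grouping idea for a 16-bit value: each group of 4 bits is extracted with a shift and a mask, then mapped to one hexadecimal digit. The name `to_hex16` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a 16-bit value to 4 hexadecimal digits, one per group of 4 bits. */
void to_hex16(uint16_t value, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";

    assert(out != NULL);
    for (unsigned group = 0; group < 4; group++) {
        unsigned shift = 12u - 4u * group;           /* leftmost group first */
        out[group] = digits[(value >> shift) & 0xFu]; /* keep one 4-bit group */
    }
    out[4] = '\0';
}
```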


Signed Numbers

For an n-bit signed number, the leftmost bit is used as the sign bit.
  The number is negative when $b_{n-1} = 1$
  The number is positive when $b_{n-1} = 0$

$\underbrace{b_{n-1}}_{\text{sign}}\ \underbrace{b_{n-2} \dots b_1 b_0}_{\text{magnitude}}$


Sign and Magnitude Representation

Leftmost bit is the sign bit
The remaining digits represent magnitude
  e.g. +5 = 0101, -5 = 1101
Problem: To add a +ve and -ve number:
  First, compare to determine the largest
  Then, subtract the smallest from the largest
  Too costly!


One's Complement

For n-bit negative number K, the 1's complement, $K_1$, is defined as below. The magnitude of K is |K|

$K_1 = (2^n - 1) - |K|$

If we let n = 4, and K = -5, we have $2^n = 16$, and |K| = 5 = (0101)2

$K_1 = (16 - 1) - 5$ = (10)10 = (1010)2

It can be shown that $K_1$ can be determined by complementing each bit of |K|

Problem: Sometimes needs an extra step when adding +ve and -ve numbers


Two's Complement
This is the representation used in today's computers for signed integers
For n-bit negative number K, the 2's complement, $K_2$, is defined as follows:

$K_2 = 2^n - |K|$

If we let n = 4, and K = -5, we have $2^n = 16$, and |K| = 5 = (0101)2

$K_2 = 16 - 5$ = (11)10 = (1011)2

For an n-bit number:
  the most negative number that can be stored is $-2^{n-1}$
  the largest positive number is $2^{n-1} - 1$


Shortcuts for Two's Complement
Def: Given an n-bit vector P, the bit-wise complement of P, denoted $\overline{P}$, is obtained by complementing each bit of P
For n-bit negative number K, the 1's complement, $K_1$, is defined as below. The magnitude of K is |K|

$K_1 = (2^n - 1) - |K|$

We note that:

$K_1 + 1 = (2^n - 1) - |K| + 1 = 2^n - |K| = K_2$

Short-cut: $K_2 = K_1 + 1 = \overline{|K|} + 1$
For (-52):

$K_2 = \overline{52} + 1 = \overline{0011\ 0100} + 1 = 1100\ 1011 + 1 = 1100\ 1100$
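
As a check on the short-cut, the sketch below computes the n-bit pattern both from the definition ($2^n - |K|$) and by complementing and adding 1. The function names are invented for this example, and n is limited to 31 bits purely to keep the arithmetic inside a `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* n-bit two's complement pattern of -magnitude from the definition 2^n - |K|. */
uint32_t twos_complement_by_definition(uint32_t magnitude, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t mask = ((uint32_t)1 << n) - 1u;
    return (((uint32_t)1 << n) - magnitude) & mask;
}

/* Same pattern via the short-cut: complement every bit of |K|, then add 1. */
uint32_t twos_complement_by_shortcut(uint32_t magnitude, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t mask = ((uint32_t)1 << n) - 1u;
    return ((~magnitude & mask) + 1u) & mask;
}
```

For |K| = 52 and n = 8 both functions give 1100 1100, and for |K| = 5 and n = 4 both give 1011.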


Shortcuts for Two's Complement II

From last slide, we found |-52| = 0011 0100 and $K_2$ = 1100 1100

Shorter-cut: examine bits of |K| right-to-left. Copy bits of |K| up to and including the first bit that is 1, and complement the rest

For (-52), we get:

$|K| = \overbrace{00110}^{\text{complement}}\ \underbrace{100}_{\text{copy}}$

$K_2 = \overbrace{11001}^{\text{complemented}}\ \underbrace{100}_{\text{copied}}$

If you have a negative n-bit number, perform the above steps to find its magnitude


Sign Extension
Sign extension is when you have an n-bit number and you want to represent it as an m-bit number, with m > n (i.e. more bits)
  Need to do this while preserving the numeric value
Method: replicate the sign bit to the left
  For unsigned values, you always extend with 0s
Examples: 8-bit to 16-bit
  +2: 0000 0010 => 0000 0000 0000 0010
  -2: 1111 1110 => 1111 1111 1111 1110
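
The replication of the sign bit can be sketched in C with masks. `sign_extend` and `zero_extend` are hypothetical helper names; both widen the low n bits of a value to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low n bits of value to 64 bits by replicating bit n-1. */
uint64_t sign_extend(uint64_t value, unsigned n)
{
    assert(n >= 1 && n <= 63);
    uint64_t low_mask = ((uint64_t)1 << n) - 1u;
    uint64_t sign_bit = (uint64_t)1 << (n - 1);

    value &= low_mask;
    if (value & sign_bit)
        value |= ~low_mask; /* negative: fill the upper bits with ones */
    return value;
}

/* Zero-extend the low n bits of value to 64 bits (unsigned interpretation). */
uint64_t zero_extend(uint64_t value, unsigned n)
{
    assert(n >= 1 && n <= 63);
    return value & (((uint64_t)1 << n) - 1u);
}
```

Sign-extending the 8-bit pattern 1111 1110 yields all ones followed by 1110, while zero-extending it leaves the value 254.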


Sign Extension II
When loading a 64 bit number into a 64 bit register, there is no sign extension issue
When loading a 32 bit, 16 bit or 8 bit number into a 64 bit register, the number is typically sign extended so it fills the full 64 bit register
The ARMv8 instruction set allows the user to specify if a loaded byte should be sign extended or treated as an unsigned number
  LDURSB: sign-extend loaded byte
  LDURB: zero-extend loaded byte (i.e. treat as an unsigned number)


Sign Extension III
The C programming language treats int and long int variables as signed by default (whether plain char is signed is implementation-defined)
You add the “unsigned” keyword to specify that the integer is unsigned
  Useful when it makes no sense for a number to be negative (e.g. a memory address)
  Useful if you want to compare a value from an 8 bit data register based on bit pattern only
  Allows you to store a larger range of positive numbers (max $2^n - 1$ instead of $2^{n-1} - 1$)
When you use two variables/constants of different sizes in an operation, C will extend the smaller to the size of the larger before the operation takes place


Numerical Overflow
When we work with numbers ourselves, we allow them to be as small or as large as needed
When we store a number in a register or variable of a fixed number of bits, it might be too large or too small to fit
  When this happens, we say overflow has occurred
  This means the number actually stored will be incorrect


Numerical Overflow II
In a 64 bit register:
  Storing a signed (2's complement) integer larger than $2^{63} - 1$ or smaller than $-2^{63}$ will cause overflow
  Storing an unsigned integer larger than $2^{64} - 1$ will cause overflow
If an arithmetic operation (e.g. addition or subtraction) produces a result that is too large or too small to store, this will cause overflow
An unsigned char (8 bits) in C can store numbers 0-255:
  If you try to store 256 = (100000000)2, you would only store the last 8 bits, thus (00000000)2 = 0!
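
A short C sketch of both situations: truncation when a value is stored into 8 bits, and a wrapped result from a 64-bit unsigned addition. The helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the low 8 bits, as storing into an 8-bit unsigned variable does. */
uint8_t store_in_8_bits(uint32_t value)
{
    uint8_t stored = (uint8_t)value;
    assert(stored == value % 256u); /* unsigned conversion is modulo 2^8 */
    return stored;
}

/* Returns 1 if a + b does not fit in 64 unsigned bits, 0 otherwise.
   Unsigned addition wraps around, so a wrapped sum is smaller than a. */
int add_overflows_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(a + b) < a;
}
```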


§2.5 Representing Instructions in the Computer

Representing Instructions
Instructions to the processor are encoded as a 32 bit binary number
  We call the numeric version of instructions machine language
  We call a sequence of binary instructions machine code
For LEGv8 instructions
  Each assembly instruction and its parameters correspond to a unique machine language instruction
  An instruction is encoded as a 32-bit instruction word


Representing Instructions II
A machine code instruction is given in one of a set number of distinct formats (i.e. the layout of the instruction)
  e.g. one for load/store, one for arithmetic operations etc.
Each format is composed of a number of distinct fields such as:
  Opcode: the field that denotes the operation (function) and format of an instruction
  Address: for a load/store, contains the address offset
  Register operand: the 32 registers used in LEGv8 are simply referred to by their number (0 to 31)
Each field of a machine code instruction can be simply thought of as a number
Each format is similar, but not identical


LEGv8 R-format Instructions

This format is used for instructions specified using three registers (R), such as add operations
The instruction fields are:
  opcode: operation code. Dictates which format type to use
  Rm: the second source register operand
  shamt: shift amount (000000 for now) (discussed in Section 2.6)
  Rn: the first source register operand
  Rd: the destination register
Note that for a given format, the number of bits for each field is fixed

opcode    Rm       shamt    Rn       Rd
11 bits   5 bits   6 bits   5 bits   5 bits


R-format Example

ADD X9,X20,X21 // X9 = X20 + X21

opcode           Rm         shamt       Rn         Rd
1112ten          21ten      0ten        20ten      9ten
10001011000two   10101two   000000two   10100two   01001two

(1000 1011 0001 0101 0000 0010 1000 1001)2 = (8B150289)16
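
The packing of the five R-format fields into one 32-bit instruction word can be sketched in C with shifts and ORs. The function `encode_r_format` is a hypothetical helper for illustration; the field widths (11, 5, 6, 5, 5 bits) are those given for the R-format above.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the R-format fields into a 32-bit word, opcode in the top 11 bits. */
uint32_t encode_r_format(uint32_t opcode, uint32_t rm, uint32_t shamt,
                         uint32_t rn, uint32_t rd)
{
    assert(opcode < (1u << 11));            /* 11-bit opcode */
    assert(rm < 32 && rn < 32 && rd < 32);  /* 5-bit register numbers */
    assert(shamt < 64);                     /* 6-bit shift amount */

    return (opcode << 21) | (rm << 16) | (shamt << 10) | (rn << 5) | rd;
}
```

With opcode 1112, Rm = 21, shamt = 0, Rn = 20 and Rd = 9 this gives 0x8B150289, the value worked out above for ADD X9,X20,X21.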




LEGv8 D-format Instructions

Format used for load/store instructions
  Rn: base register
  address: constant offset from contents of base register. Allows access to 256 bytes before or after the address in the base register (i.e. +/- 32 doublewords)
  Rt: destination (if load) register number or source (if store) register number
  op2: expands the opcode field
Design Principle 4: Good design demands good compromises
  Different formats complicate decoding, but allow 32-bit instructions uniformly
  Keep formats as similar as possible

opcode    address   op2      Rn       Rt
11 bits   9 bits    2 bits   5 bits   5 bits


LEGv8 I-format Instructions

Format for instructions with an immediate parameter (e.g. ADDI)
  Rn: source register
  Rd: destination register
Immediate field is zero-extended (i.e. the value is unsigned)
  Allows value to be larger
  Means we need both an ADDI and SUBI instruction

opcode    immediate   Rn       Rd
10 bits   12 bits     5 bits   5 bits


LEGv8 Instruction Encoding

Format for instructions we have introduced so far
In the table:
  “reg” means register number between 0-31
  “address” means 9 or 12 bit constant, depending on format
  “n.a.” means the field does not appear in this format


Corrections: LEGv8 Instruction Encoding

Table 2.5 above has errors:
  Last column should be Rd/Rt
  Last four rows: Rm = n.a. and Rd/Rt = reg


LEGv8 Instruction Encoding Examples

Above gives decimal value for fields for several example instructions

Second page of green card in textbook gives opcodes for instructions

Read example on page 86 of text


The BIG Picture

Stored Program Computers
Instructions are represented as binary numbers, just like data
Instructions and data are stored in memory
A program is just a sequence of machine language instructions and corresponding data
To execute a program:
  Load program into memory
  Tell computer to start executing at memory location of first instruction word of program


Stored Program Computers II
Allows computers to be very general
  Simply create the appropriate sequence of machine code
  Transforms computer into whatever the program does (accounting program, text editor etc.)
Programs can create programs
  e.g., compilers, linkers, …
Binary compatibility allows compiled programs to work on different computers
  Standardized ISAs



§2.6 Logical Operations

Logical Operations
Most of the time we operate on doublewords
Sometimes we want to access only a certain byte within the doubleword
Sometimes we want to access only certain bits in a doubleword
Sometimes we want to be able to set, clear or negate only certain bits in a doubleword
LEGv8 provides instructions that can perform logical operations on registers


Boolean Axioms
Logical AND is “.”, logical OR is “+”, and “$\overline{x}$” (x with an overbar) means logical negation


Bit-wise AND Operation
Bit-wise operations on two n-bit numbers perform bit operations on each pair of bits of the operands
  i.e. logical AND on bits b0 of each number, and bits b1 of each number etc.
The C language bit-wise AND operator is “&”

AND X9,X10,X11 // X9 = X10 & X11

X10  00000000 00000000 00000000 00000000 00000000 00000000 00001101 11000000
X11  00000000 00000000 00000000 00000000 00000000 00000000 00111100 00000000
X9   00000000 00000000 00000000 00000000 00000000 00000000 00001100 00000000


Masking Bits
Sometimes we want to access only certain bits in a variable
Consider an 8 bit variable containing:

(0010 1101)2

What if we only wanted to know if b2 was one or not?
  If we do a bit-wise AND with (0000 0100)2, we set all other bits to zero while leaving b2 unchanged
  Then test to see if the result equals zero
This is called “masking” as it “conceals” some bits
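
The masking test can be written in C with the bit-wise AND operator. `bit_is_set` is a hypothetical helper name used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if bit number `bit` (0 = LSB) of value is one, 0 otherwise. */
int bit_is_set(uint8_t value, unsigned bit)
{
    assert(bit < 8);
    uint8_t mask = (uint8_t)(1u << bit); /* e.g. bit 2 gives 0000 0100 */
    return (value & mask) != 0;          /* all other bits are masked to zero */
}
```

For the value (0010 1101)2, bit 2 is reported as set and bit 1 as clear.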


Bit-wise OR Operation
Performs logical OR bit operations on each pair of bits of the operands
Can be used to set specific bits to “1” and leave other bits unchanged
The C language bit-wise OR operator is “|”

ORR X9,X10,X11 // X9 = X10 | X11

X10  00000000 00000000 00000000 00000000 00000000 00000000 00001101 11000000
X11  00000000 00000000 00000000 00000000 00000000 00000000 00111100 00000000
X9   00000000 00000000 00000000 00000000 00000000 00000000 00111101 11000000


Bit-wise NOT Operation
Performs logical negation on each bit of an n-bit number
  i.e. (0010 1101)2 becomes (1101 0010)2
The C language bit-wise NOT operator is “~”
LEGv8 does not have an explicit bit-wise NOT, but instead makes use of the exclusive OR operation
  An exclusive OR with a “1” always provides negation


EOR Operation
Performs bit-wise exclusive OR operation
Set the second operand to be all ones (like X12 below), and you get a bit-wise complement

EOR X9,X10,X12 // NOT operation of X10

X10  00000000 00000000 00000000 00000000 00000000 00000000 00001101 11000000
X12  11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
X9   11111111 11111111 11111111 11111111 11111111 11111111 11110010 00111111


Shift Operations
In a shift operation, the bits in a register are moved left or right, and zeros are moved in to fill empty slots
Shift left logical
  1010 1101 shifted left by 2 spots gives: 1011 0100
Shift right logical
  1010 1101 shifted right by 2 spots gives: 0010 1011


Multiplying by 2
Shift registers can be used to divide and multiply unsigned integers by 2
Shifting a number to the left multiplies a number by 2
Consider the 5-bit number below, shifted left by 1
  (00101)2 = (5)10 --> (01010)2 = (10)10
Essentially, you are increasing the power of 2 for each digit by 1
  i.e. $b_0 \times 2^0$ becomes $b_0 \times 2^1$, where $2^0 \times 2 = 2^1$


Dividing by 2
Shifting a number to the right divides a number by 2
  If the number is odd, the result is the integer quotient
Consider the 5-bit numbers below, shifted right by 1
  (00110)2 = (6)10 --> (00011)2 = (3)10
  (00101)2 = (5)10 --> (00010)2 = (2)10
Essentially, you are decreasing the power of 2 for each digit by 1


Packing Bytes Into a Word
Shift registers can be used to pack 4 specific bytes of data into a word format
Say we have four 8 bit unsigned variables: v1, v2, v3, v4
We want to store them in a 32 bit unsigned variable V in the sequence: v4 v2 v1 v3
We can use the sequence:
  V = v4, shift left by 8
  V = V + v2, shift left by 8
  V = V + v1, shift left by 8
  V = V + v3
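
The same sequence of adds and shifts can be sketched in C. The function name `pack_bytes` is an assumption for this example; the parameter order mirrors the target byte sequence v4 v2 v1 v3.

```c
#include <assert.h>
#include <stdint.h>

/* Packs four bytes into a 32-bit word in the order v4 v2 v1 v3 (v4 highest). */
uint32_t pack_bytes(uint8_t v4, uint8_t v2, uint8_t v1, uint8_t v3)
{
    uint32_t V = v4;
    V <<= 8;      /* make room for the next byte */
    V += v2;
    V <<= 8;
    V += v1;
    V <<= 8;
    V += v3;
    assert((V >> 24) == v4); /* v4 ends up in the most significant byte */
    return V;
}
```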


Shift Instruction Format

Shift machine language instructions use the R-format
  The shamt field specifies how many positions to shift
  The Rm field is unused and is set to zero
The C language left-shift operator is “<<”
  Shift left LEGv8 operator: LSL
  LSL X11,X19,#4 // reg X11 = reg X19 << 4 bits
The C language right-shift operator is “>>”
  Shift right LEGv8 operator: LSR
  LSR X11,X19,#4 // reg X11 = reg X19 >> 4 bits
NOTE: in the above operations, X19 is not altered




Logical Operations Table
Summary of instructions for bit-wise manipulation

Operation        C     Java   LEGv8
Shift left       <<    <<     LSL
Shift right      >>    >>>    LSR
Bit-by-bit AND   &     &      AND, ANDI
Bit-by-bit OR    |     |      ORR, ORRI
Bit-by-bit NOT   ~     ~      EOR, EORI

Note that Fig 2.8 in the text shows an incorrect LEGv8 operator for bit-wise OR


§2.7 Instructions for Making Decisions

Branch Operations
A common programming action is to evaluate a condition and then choose between two choices
  e.g. an “if” statement
The basic building blocks to implement this are branch statements
There are two basic types:
  Conditional branches
  Unconditional branches
Note: the assembler calculates addresses for branches as well as data addresses for data loads
  Syntax for a label is: “name:”


Branch Operations II
Conditional: branch to a labeled instruction if a condition is true
  Otherwise, continue sequentially
  Compare and branch if zero:
    CBZ register, L1
    if (register == 0) branch to instruction labeled L1;
  Compare and branch if not zero:
    CBNZ register, L1
    if (register != 0) branch to instruction labeled L1;
Unconditional: always branch to labeled instruction
  B L1
  branch always to instruction labeled L1


Compiling If Statements
C code:

if (i==j) f = g+h;
else f = g-h;

Variables f, g, h, i, j stored in registers X19, …, X23
Compiled LEGv8 code:

      SUB X9,X22,X23   // X9 = X22-X23 = i-j
      CBNZ X9,Else     // go to Else if i != j
      ADD X19,X20,X21  // f = g+h
      B Exit           // jump to after else part
Else: SUB X19,X20,X21  // f = g-h
Exit: …

Assembler calculates addresses


Compiling Loop Statements
C code:

while (save[i] == k) i = i + 1;

i in X22, k in X24, address of save in X25
Compiled LEGv8 code:

Loop: LSL X10,X22,#3   // X10 = i * 8
      ADD X10,X10,X25  // X10 = &save[i]
      LDUR X9,[X10,#0] // X9 = save[i]
      SUB X11,X9,X24   // X11 = save[i] - k
      CBNZ X11,Exit    // exit if save[i] != k
      ADDI X22,X22,#1  // i = i+1
      B Loop           // start next iteration of loop
Exit: …                // first instruction after the loop

In the C language, “&” returns the address of the variable in memory, instead of its contents
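
To make the branch structure visible on the C side, the while loop can be rewritten with explicit labels and gotos that mirror the Loop and Exit labels of the assembly. This is only an illustration of the control flow; the function name and the `done` label are invented here, and ordinary C code would keep the while loop.

```c
#include <assert.h>
#include <stddef.h>

/* Same behaviour as: while (save[i] == k) i = i + 1; return i;
   written with labels and gotos to mirror the compiled branch structure. */
long long first_index_not_equal(const long long save[], long long i, long long k)
{
    assert(save != NULL);
loop:
    if (save[i] != k)   /* corresponds to SUB followed by CBNZ ..., Exit */
        goto done;
    i = i + 1;          /* corresponds to ADDI X22,X22,#1 */
    goto loop;          /* corresponds to B Loop */
done:
    return i;
}
```

As with the original loop, the caller must make sure some element from index i onward differs from k, otherwise the loop runs past the end of the array.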


Basic Blocks
Sequences of instructions that end in a branch are so important that they are given their own term
A basic block is a sequence of instructions with
  No embedded branches (except at end)
  No branch targets (except at beginning)
A compiler identifies basic blocks for optimization
An advanced processor can accelerate execution of basic blocks


Additional Comparisons
We want to be able to test for a wide range of conditions:
  Less than (<)
  Less than or equal (<=)
  Greater than (>)
  Greater than or equal (>=)
  Equal (==)
  Not equal (!=)
Can handle these cases by simply keeping track of four extra bits
  The bits record what occurred during an instruction


Setting Flags
These four added bits are called condition codes, or flags
Flags are only set after the execution of the following LEGv8 instructions:
  ADDS, ADDIS, ANDS, ANDIS, SUBS, SUBIS
  These are the same as the instructions before, with an added “and set flags”
The condition codes (flags) are:
  negative (N): result had 1 in MSB
  zero (Z): result was 0
  overflow (V): result overflowed
  carry (C): result had carry out from MSB, or borrow into MSB (consider (92)10 - (48)10)
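
As a rough model of how the four flags could be derived for a 64-bit addition (as ADDS would perform), consider the C sketch below. The struct `flags_t` and function `adds_flags` are hypothetical names, and only the addition case is modelled; subtraction is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* The four condition codes, each 0 or 1 (hypothetical type for this sketch). */
typedef struct { int n, z, c, v; } flags_t;

/* Flags that a 64-bit add of a and b would produce. */
flags_t adds_flags(uint64_t a, uint64_t b)
{
    uint64_t result = a + b; /* wraps modulo 2^64 */
    flags_t f;

    f.n = (int)(result >> 63);  /* negative: MSB of result is 1 */
    f.z = (result == 0);        /* zero: result is 0 */
    f.c = (result < a);         /* carry: unsigned sum wrapped around */
    /* signed overflow: operands have the same sign, result has the other */
    f.v = (int)((~(a ^ b) & (a ^ result)) >> 63);

    assert(!(f.n && f.z));      /* a result cannot be both negative and zero */
    return f;
}
```

Adding 1 to the largest positive signed value sets N and V but not C, while adding 1 to the all-ones pattern sets Z and C but not V.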


More Conditional Branches
Use subtract to set flags, then conditionally branch:
  B.EQ (branch equal)
  B.NE (branch not equal)
  B.LT (less than, signed), B.LO (less than, unsigned)
  B.LE (less than or equal, signed), B.LS (less than or equal, unsigned)
  B.GT (greater than, signed), B.HI (greater than, unsigned)
  B.GE (greater than or equal, signed), B.HS (greater than or equal, unsigned)
  B.MI (branch on minus: N = 1)
  B.PL (branch on plus: N = 0)
  B.VS (branch on overflow set: V = 1)
  B.VC (branch on overflow clear: V = 0)
See Figure 2.10 in the text for how flags are tested for comparisons


Signed vs. Unsigned
Comparing bit patterns differs based on whether the number is treated as signed or unsigned
  For signed comparison, MSB = 1 makes things negative, thus smaller
  For unsigned comparison, MSB = 1 is still positive and makes things even bigger
Consider the example below:
  X22 = 1111 1111 1111 1111 1111 1111 1111 1111
  X23 = 0000 0000 0000 0000 0000 0000 0000 0001
  X22 < X23 # signed
    -1 < +1
  X22 > X23 # unsigned
    +4,294,967,295 > +1
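
The same pair of bit patterns can be compared both ways in C. The helper names below are made up for this sketch; it relies on the fact that converting a signed value to an unsigned type of the same width is defined modulo 2^32 in C.

```c
#include <assert.h>
#include <stdint.h>

/* The unsigned value that has the same 32-bit pattern as a signed value. */
uint32_t same_bits_unsigned(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    /* The MSB is set exactly when the signed value is negative. */
    assert((value < 0) == ((bits >> 31) == 1u));
    return bits;
}

/* Compare treating both patterns as signed numbers. */
int less_than_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Compare treating the same two patterns as unsigned numbers. */
int less_than_unsigned(int32_t a, int32_t b)
{
    return same_bits_unsigned(a) < same_bits_unsigned(b);
}
```

With a = -1 (all ones) and b = 1, the signed comparison says a is smaller, while the unsigned comparison sees 4,294,967,295 and says it is larger.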

Conditional Example

if (a > b) a += 1; // a += 1 means a = a + 1

Assume a in X22, b in X23

      SUBS X9,X22,X23 // use subtract to make comparison
      B.LE Exit       // conditional branch
      ADDI X22,X22,#1
Exit:



§2.8 Supporting Procedures in Computer Hardware

Procedure Calling
A procedure is a stored subroutine that performs a specific task based on the parameters passed to it
  It may also provide a return value
The program that executes the procedure is called the caller
The procedure being executed is called the callee
To execute a procedure, the program must:
1. Place parameters where procedure can access them
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform desired task
5. Place results where caller can access them
6. Return to place of call


Procedure Calling II
Easiest to pass data to/from procedure using registers
  X0-X7: registers used to pass parameters and return values
  LR (X30): the link register that stores the address of the first instruction after the procedure call in the caller
LEGv8 has a branch-and-link (BL) instruction:
  It saves the address of the following instruction (the return address) in register LR, and then branches to the specified address

BL ProcedureAddress


Procedure Calling III
A key part of the stored-program concept is the need to keep track of the address of the current instruction being executed
  This address is stored in the program counter (PC) register
  The BL command saves PC + 4 into register LR
To return from the procedure, the branch register instruction is used:
  BR LR
  The BR instruction copies the address stored in register LR to the PC register


The Stack
What if a procedure:
  needs more than 8 parameters?
  needs to define local variables that don't fit in registers (e.g. structures, arrays etc.)?
  needs to store incoming values of registers that it will be using?
A procedure uses a special area of memory called the stack
A stack is a last-in-first-out queue with operations:
  push (place data on stack)
  pop (remove data from stack)


The Stack II
A stack requires a pointer to the most recently allocated address on the stack
  The stack pointer (SP) register is used for this
  In LEGv8, this is register X28. See the elaboration on page 102 of the text for ARMv8 details.
For historical reasons, the stack grows from higher addresses to lower addresses
  Pushing onto the stack means subtracting from SP
  Popping means adding to SP


Leaf Procedure Example
C code:

long long int leaf_example (long long int g, long long int h, long long int i, long long int j)
{
  long long int f;
  f = (g + h) - (i + j);
  return f;
}

Arguments g, h, i, j in registers X0, …, X3
f in X19
We will save all registers we modify to the stack to ensure we don't lose data needed by the caller

LEGv8 code:

leaf_example:

SUBI SP,SP,#24

STUR X10,[SP,#16]

STUR X9,[SP,#8]

STUR X19,[SP,#0]

ADD X9,X0,X1

ADD X10,X2,X3

SUB X19,X9,X10

ADD X0,X19,XZR

LDUR X10,[SP,#16]

LDUR X9,[SP,#8]

LDUR X19,[SP,#0]

ADDI SP,SP,#24

BR LR


Leaf Procedure Example II

Save X10, X9, X19 on stack

X9 = g + h

X10 = i + j

f = X9 - X10

copy f to return register

Restore X10, X9, X19 from stack

Return to caller

Local Data on the Stack


Shows values of SP and what's on the stack (a) before, (b) during, and (c) after procedure call

Register Usage
In the last example, we stored to the stack all registers that we used
  Don't want to store registers that don't contain needed data
LEGv8 uses the following convention:
  X9 to X15: temporary registers
    Not preserved by the callee, so we don't have to store them
  X19 to X28: saved registers
    If used, the callee saves and restores them


Register Usage II

Above shows which registers must be saved, and which do not



Nested Procedures
Procedures that call other procedures are more complicated, as the next call can overwrite information needed by the current procedure call
For a nested call, the caller needs to save on the stack:
  Its return address
  Any arguments and temporaries needed after the call
Restore from the stack after the call
See the “Nested Procedures” section on page 104 of the text for more details


Local Variables on Stack
Variables that are local to a procedure are called automatic variables
  They are created when the procedure starts and destroyed when the procedure exits
What if not enough registers are free, or the variable type doesn't fit in a register (e.g. an array or structure)?
  Then the local variable is created on the stack


Local Variables on Stack II
The segment of the stack containing a procedure's saved registers and local variables is called a procedure frame or activation record
Some compilers use a frame pointer (register FP or X29)
  Points to the first doubleword of the procedure's frame
  Provides a stable base register for local variable access
  Can just use SP, but that makes variable access more complicated

The Stack with FP


Shows values of SP and what's on the stack (a) before, (b) during, and (c) after procedure call


Variables and the Heap

In addition to automatic variables, we need to allocate static data (e.g. constants) and dynamic data structures

Dynamic data structures are used for variables whose size can change over time (e.g. a linked list)

Dynamic data is stored in a section of the memory called the heap

The C language allocates space on the heap with the malloc() function, and frees it with the free() function.
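A minimal sketch of dynamic data in C, using the linked list mentioned above. The type struct node and the functions list_push, list_sum, and list_free are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical node type for a singly linked list stored on the heap. */
struct node {
    long long int value;
    struct node *next;
};

/* Push a new node on the front of the list; returns the new head,
   or the old head unchanged if malloc fails. */
struct node *list_push(struct node *head, long long int value)
{
    struct node *n = malloc(sizeof *n);   /* dynamic data: lives on the heap */
    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Sum the list; sum is an automatic variable (register or stack). */
long long int list_sum(const struct node *head)
{
    long long int sum = 0;
    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}

/* Return every node to the heap. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The list can grow for as long as malloc succeeds, which is the property that neither registers nor a fixed-size stack frame can provide.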


Memory Layout

The next slide shows the LEGv8 memory convention for allocating memory for use with the Linux operating system

The stack starts at the high end of the memory space and grows down

The heap starts at the low end of memory and grows up to meet the stack

There is also an area for static data, machine code (called text segment), and a reserved area


Memory Layout II

Text: program code
Static data: global variables
  e.g., static variables in C, constant arrays and strings
Dynamic data: heap
  e.g., malloc in C, new in Java
Stack: automatic storage

LEGv8 Registers

X8 is used by procedures that return a result via a pointer
X16-X18 should not be used



Character Data (8 bits)

§2.9 Communicating with People

It is common for computers to store text data
  Most use 8 bits to represent characters
  Most common is the American Standard Code for Information Interchange (ASCII)

Byte-encoded character sets
  ASCII: 128 characters
    95 graphic characters, 33 control characters
  Latin-1: 256 characters
    ASCII, +96 more graphic characters


ASCII Representation of Characters

ASCII only uses the rightmost 7 bits; the eighth is unspecified

Values not shown are control characters such as tab and backspace
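These properties of ASCII can be checked with a few lines of C. This is a sketch that assumes the compiler's execution character set is ASCII-compatible; the function names are hypothetical.

```c
/* Returns 1 if c fits in 7-bit ASCII (eighth bit clear), else 0. */
int is_ascii(unsigned char c) { return (c & 0x80) == 0; }

/* Returns 1 for the 95 graphic (printable) characters, codes 32..126. */
int is_graphic(unsigned char c) { return c >= 32 && c <= 126; }

/* Convert an ASCII lowercase letter to uppercase; the two cases of a
   letter differ by exactly 32. Other characters are returned unchanged. */
unsigned char to_upper_ascii(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 32) : c;
}
```

Codes 0 to 31 plus 127 are the 33 control characters, which is why is_graphic rejects tab and backspace.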


Character Data (32 bits)

The characters of some human languages do not fit in 8 bits
  Need larger formats

Unicode: 32-bit character set
  Used in Java, C++ wide characters, …
  Most of the world’s alphabets, plus symbols

UTF-8, UTF-16: variable-length encodings
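To make "variable-length" concrete, here is a minimal sketch of a UTF-8 encoder in C. The function name utf8_encode is hypothetical; the byte layout follows the standard UTF-8 scheme, in which ASCII characters stay one byte long and larger code points take two to four bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out[] (at least 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (above U+10FFFF or a surrogate). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'A' (U+0041) encodes as the single byte 0x41, while the euro sign (U+20AC) needs the three bytes 0xE2 0x82 0xAC.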


Byte Operations

As working with text is common, we want to be able to work with a single byte
  LEGv8 provides operations to load and store a single byte

Load byte:
  LDURB Rt, [Rn, offset]
  Loads 1 byte into the rightmost 8 bits of the register
  Loads zeros into the remaining 56 bits of Rt
  ARMv8 provides a load version that sign-extends the byte

Store byte:
  STURB Rt, [Rn, offset]
  Stores just the rightmost byte into memory


Halfword Operations

When working with Unicode, it is common to work with 16-bit (halfword) characters
  LEGv8 provides operations to load and store a single halfword

Load halfword:
  LDURH Rt, [Rn, offset]
  Loads 1 halfword into the rightmost 16 bits of the register
  Loads zeros into the remaining 48 bits of Rt
  ARMv8 provides a load version that sign-extends the halfword

Store halfword:
  STURH Rt, [Rn, offset]
  Stores just the rightmost halfword into memory
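The byte and halfword accesses above differ only in width, so one C model covers both. This is a sketch under assumptions: memory is modelled as an array of unsigned char, byte order is assumed to be little-endian, and the lowercase function names are hypothetical. The sign-extending byte load is the one ARMv8 calls LDURSB.

```c
#include <stddef.h>
#include <stdint.h>

/* LDURB model: load one byte, zero-fill the upper 56 bits. */
uint64_t ldurb(const unsigned char *mem, size_t addr)
{
    return mem[addr];
}

/* Sign-extending byte load (LDURSB in ARMv8): copy bit 7 into the upper 56 bits. */
int64_t ldursb(const unsigned char *mem, size_t addr)
{
    int64_t v = mem[addr];
    return (v ^ 0x80) - 0x80;
}

/* LDURH model, assuming little-endian byte order: zero-fill the upper 48 bits. */
uint64_t ldurh(const unsigned char *mem, size_t addr)
{
    return (uint64_t)mem[addr] | ((uint64_t)mem[addr + 1] << 8);
}

/* STURB model: write only the rightmost byte of the register value. */
void sturb(unsigned char *mem, size_t addr, uint64_t reg)
{
    mem[addr] = (unsigned char)(reg & 0xFF);
}
```

Loading the byte 0x80 shows the difference: the zero-extending load yields 128, while the sign-extending load yields -128.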


Character Strings

Characters are normally grouped into strings, which have a variable length
  e.g. “Hello”

The C language uses the null character '\0' (zero) to mark the end of the string

See string copy example on page 112 of text
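A minimal C sketch of copying a null-terminated string one byte at a time, which is the kind of loop that byte loads and stores make possible. The name string_copy and its return value are choices made here, not the textbook's code.

```c
#include <stddef.h>

/* Copy the null-terminated string y into x, including the terminating '\0'.
   The caller must ensure x has room. Returns the number of characters
   copied, not counting the terminator. */
size_t string_copy(char x[], const char y[])
{
    size_t i = 0;
    while ((x[i] = y[i]) != '\0')   /* copy one byte, stop after the null */
        i++;
    return i;
}
```

The loop never needs to know the length in advance; the null character is the only end marker.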

32-bit Constants

§2.10 LEGv8 Addressing for 32-Bit Immediates and Addresses

Most constants are small
  12-bit immediate is sufficient

For the occasional 32-bit constant, LEGv8 has instructions that load a 16-bit constant into either the first, second, third, or fourth 16 bits of the desired register
  Move wide with zeros (MOVZ) loads a 16-bit constant and sets all other bits to zero
  Move wide with keep (MOVK) loads a 16-bit constant and leaves all other bits as they are
  Both instructions take an LSL operand, which is only allowed the values 0, 16, 32, or 48, to specify which 16 bits to overwrite

    MOVZ X9, 255, LSL 16

  These instructions use the IW-format


LEGv8 IW-format Instructions

opcode   quad     immediate   Rd
9 bits   2 bits   16 bits     5 bits

Format for instructions with a wide immediate parameter
  opcode: base opcode is (110100101)2 for MOVZ and (111100101)2 for MOVK
  Rd: destination register
  Quad: pattern “00” represents first 16 bits, …, “11” last 16 bits
    Value is specified by the (0, 16, 32, or 48) parameter to LSL
  Immediate: 16-bit constant to be loaded into Rd register

Actual opcode is 11 bits: it is the concatenation of the 9-bit base opcode and the 2-bit quad field

The textbook sometimes labels this format “IW” and sometimes “IM”
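Packing the four IW-format fields into a 32-bit word can be sketched in C as follows. The function encode_iw and the two macro names are hypothetical; the field widths and base opcodes are the ones given on this slide.

```c
#include <stdint.h>

#define MOVZ_BASE 0x1A5u  /* (110100101)2 */
#define MOVK_BASE 0x1E5u  /* (111100101)2 */

/* Pack an IW-format instruction word. lsl must be 0, 16, 32, or 48;
   returns 0 for any other shift or an out-of-range register number. */
uint32_t encode_iw(uint32_t base_opcode, unsigned rd, uint16_t imm, unsigned lsl)
{
    if (lsl > 48 || lsl % 16 != 0 || rd > 31)
        return 0;
    uint32_t quad = lsl / 16;                 /* 00, 01, 10, or 11 */
    return (base_opcode << 23)                /* 9-bit base opcode  */
         | (quad << 21)                       /* 2-bit quad field   */
         | ((uint32_t)imm << 5)               /* 16-bit immediate   */
         | rd;                                /* 5-bit destination  */
}
```

With these fields, MOVZ X9, 255, LSL 16 packs to 0xD2A01FE9: the top 11 bits are the base opcode followed by quad = 01.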

32-bit Constants Example

Below is an example that:
  loads a 16-bit constant into the 2nd 16 bits of the register, setting all other bits to zero
  then loads a second 16-bit constant into the first 16 bits of the register

MOVZ X9,255,LSL 16

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 0000 0000

MOVK X9,255,LSL 0

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 1111 1111
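The effect of the two instructions on the register value can be modelled in C. The functions movz and movk are hypothetical models that assume lsl is one of 0, 16, 32, or 48.

```c
#include <stdint.h>

/* MOVZ model: place imm in the 16-bit field selected by lsl, zero the rest. */
uint64_t movz(uint16_t imm, unsigned lsl)
{
    return (uint64_t)imm << lsl;
}

/* MOVK model: replace only the selected 16-bit field of reg, keep the rest. */
uint64_t movk(uint64_t reg, uint16_t imm, unsigned lsl)
{
    uint64_t mask = (uint64_t)0xFFFF << lsl;
    return (reg & ~mask) | ((uint64_t)imm << lsl);
}
```

Applying movz(255, 16) and then movk on the result with (255, 0) reproduces the two bit patterns above: 0x00FF0000 followed by 0x00FF00FF.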

Branch Addressing

Unconditional branches use the B-type instruction format shown below
  First field is the 6-bit opcode
  Second field is the signed 26-bit address offset

Offset is relative to the program counter register
  As each instruction is 32 bits (4 bytes, 1 word), the offset specifies how many words the target address is away from the address in the PC register

Example: B 10000
  Address to branch to is: PC + 4 × (10000)10

5        10000ten
6 bits   26 bits
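Computing the branch target from a B-type instruction word can be sketched in C as below. The function name b_target is hypothetical; the sketch sign-extends the 26-bit field and scales it from words to bytes, as described above.

```c
#include <stdint.h>

/* Decode a B-type word: return the branch target for an instruction at pc. */
uint64_t b_target(uint64_t pc, uint32_t instr)
{
    int64_t offset = instr & 0x03FFFFFF;           /* low 26 bits: word offset */
    offset = (offset ^ 0x02000000) - 0x02000000;   /* sign-extend from bit 25 */
    return pc + (uint64_t)(offset * 4);            /* words to bytes */
}
```

For B 10000 the word is opcode 5 in the top 6 bits and 10000 in the low 26 bits, so the target lies 40000 bytes past the PC. A field of all ones is an offset of -1 and branches back one instruction.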

Branch Addressing II

Conditional branches use the CB-type instruction format shown below
  First field is the 8-bit opcode
  Second field is the 19-bit signed address offset
  Third field is the 5-bit register number

Offset is relative to the program counter register
  Offset specified in words (4 bytes), not bytes

Example: CBNZ X19, Exit // go to Exit if X19 != 0
  Address to branch to is: PC + 4 × (Exit)

Read example on page 119 of text

181      Exit      19
8 bits   19 bits   5 bits
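Going the other way, an assembler has to turn a label address into the 19-bit field and check that it fits. The C sketch below shows one way to do that; cb_offset and encode_cbnz are hypothetical names, and 181 is the CBNZ opcode shown in the field layout above.

```c
#include <stdint.h>

/* Compute the 19-bit word offset for a conditional branch at address pc
   to the label at address target. Returns 0 and leaves *field untouched if
   the target is misaligned or out of range (about +/-1 MiB). */
int cb_offset(uint64_t pc, uint64_t target, uint32_t *field)
{
    int64_t bytes = (int64_t)(target - pc);
    if (bytes % 4 != 0)
        return 0;
    int64_t words = bytes / 4;
    if (words < -(1 << 18) || words > (1 << 18) - 1)
        return 0;
    *field = (uint32_t)words & 0x7FFFF;   /* two's complement, 19 bits */
    return 1;
}

/* Assemble CBNZ Rt, label using the CB format: opcode | offset | Rt. */
uint32_t encode_cbnz(unsigned rt, uint32_t field)
{
    return (181u << 24) | (field << 5) | (rt & 31u);
}
```

A label three instructions ahead gives a field value of 3, and CBNZ X19 to that label assembles to 0xB5000073 under this layout. Targets beyond the 19-bit range must be reached some other way, for example with an unconditional branch.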

Decoding Machine Language Figure 2.1 in text explains all of LEGv8 assembly instructions introduced in this chapter

Figure 2.20 in text shows the LEGv8 encoding of opcodes for the LEGv8 machine language

Figure 2.21 in text shows all of the LEGv8 instruction formats (see also page one of the green card)

See “Decoding Machine Code” example on page 123 of text



Synchronization

§2.11 Parallelism and Instructions: Synchronization

Read Section 2.11 for your own interest


Translation and Startup

§2.12 Translating and Starting a Program

This section should be read carefully, and in detail

It is not very important for this course on computer hardware, BUT IS IMPORTANT for your understanding of how software works

I will only cover some highlights, but you should read it in depth


Translation and Startup II

Many compilers produce object modules directly

Static linking

Read Fig 2.22 description in text


Producing an Object Module

Assembler (or compiler) translates program into machine instructions and puts them into an object file

Provides information for building a complete program from the pieces
  Header: describes size and position of other pieces of file
  Text segment: contains the machine code
  Static data segment: data allocated for the life of the program
  Relocation info: for contents that depend on absolute location of loaded program
  Symbol table: global definitions and external references
  Debug info: for associating with source code


Linking Object Modules

A linker is a program that combines independently assembled object files, resolves undefined labels, and produces an executable file

The linker produces an executable image by:

1. Merging object files (including library routines)

2. Resolving labels (determines their addresses)

3. Patching location-dependent and external refs

Linker uses the relocation info and symbol table in each object module to resolve undefined labels

When library routines are added as part of the linking process, this produces a statically linked executable
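The three linker steps can be illustrated with a toy model in C. Everything here is a heavily simplified assumption made for illustration: the structs symbol, reloc, and module, the use of plain long words as "machine code", and the limit of 8 modules do not correspond to any real object file format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified object-module model. */
struct symbol { const char *name; size_t offset; };   /* label defined here */
struct reloc  { size_t slot; const char *name; };     /* word to patch */
struct module {
    const long *text;           size_t text_len;      /* "machine code" words */
    const struct symbol *syms;  size_t nsyms;         /* symbol table */
    const struct reloc *relocs; size_t nrelocs;       /* relocation info */
};

/* Link n modules into image[]; returns words written, or 0 on error. */
size_t link_modules(const struct module *m, size_t n, long *image, size_t cap)
{
    size_t base[8], total = 0;
    if (n > 8)
        return 0;
    /* Step 1: merge text segments, remembering each module's base address. */
    for (size_t i = 0; i < n; i++) {
        if (total + m[i].text_len > cap)
            return 0;
        base[i] = total;
        memcpy(image + total, m[i].text, m[i].text_len * sizeof *image);
        total += m[i].text_len;
    }
    /* Steps 2 and 3: resolve each referenced label, then patch its slot. */
    for (size_t i = 0; i < n; i++) {
        for (size_t r = 0; r < m[i].nrelocs; r++) {
            int found = 0;
            for (size_t j = 0; j < n && !found; j++) {
                for (size_t s = 0; s < m[j].nsyms; s++) {
                    if (strcmp(m[j].syms[s].name, m[i].relocs[r].name) == 0) {
                        image[base[i] + m[i].relocs[r].slot] =
                            (long)(base[j] + m[j].syms[s].offset);
                        found = 1;
                        break;
                    }
                }
            }
            if (!found)
                return 0;   /* undefined label: linking fails */
        }
    }
    return total;
}
```

A label's final address is its module's base in the merged image plus its offset within that module, which is why labels cannot be resolved until the merge in step 1 has fixed every base.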


Loading a Program

A loader is a system program that places a program file into main memory so that it is ready to execute

It loads the executable file on disk into memory:

1. Reads header to determine segment sizes needed

2. Creates virtual address space large enough

3. Copies text (machine code) and data from file to memory

4. Sets up arguments (if any) on stack

5. Initializes registers (including SP, FP)

6. Jumps to startup routine for program
  Copies parameters to argument registers and calls main procedure of program
  When main returns, does exit system call to terminate program


Dynamic Linking

In static linking, all library routines (whether used or not by the program) are added to the executable
  Adds a lot of unneeded bloat to the executable
  If a new version of the library becomes available, the executable doesn't use it

Dynamically linked libraries (DLLs) are library routines that are only linked to a program during execution

A DLL only links/loads a library procedure when it is called
  Requires procedure code to be relocatable
  Automatically picks up new library versions
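The idea of linking a procedure only when it is first called can be sketched with a function pointer in C. This models the idea only: real dynamic linkers use linker-generated stubs and tables, and all names here (lib_square, square_stub, square_slot, resolve_count) are hypothetical.

```c
/* Hypothetical library routine that would live in a shared library. */
static long lib_square(long x) { return x * x; }

static int resolve_count = 0;          /* how many times "linking" ran */

static long square_stub(long x);       /* first-call stub, defined below */

/* Indirection slot: every call goes through this pointer. */
static long (*square_slot)(long) = square_stub;

/* On the first call, find the real routine, patch the slot, then forward. */
static long square_stub(long x)
{
    resolve_count++;
    square_slot = lib_square;          /* later calls skip the stub */
    return square_slot(x);
}

/* What the program calls. */
long square(long x) { return square_slot(x); }
```

Only the first call pays the cost of resolution; after that, each call costs one extra indirection. A routine that is never called is never linked at all.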


C Sort Example

§2.13 A C Sort Example to Put It All Together

This provides a complete example of a sorting procedure written in C and converted to assembly

Includes example of one procedure calling a second

Read Section 2.13 on your own


Arrays vs. Pointers

§2.14 Arrays versus Pointers

Read Section 2.14 on your own


Real Stuff: The Rest of ARMv8 Instruction Set

§2.19 The Rest of ARMv8 Instruction Set

Read Section 2.19 for your own interest
  Contains ARMv8 instructions not in LEGv8
  Discusses differences between LEGv8 and ARMv8
  Useful if you wish to program in ARMv8 assembly in the future


Fallacies and Pitfalls

§2.20 Fallacies and Pitfalls

Read Section 2.20 on your own


Concluding Remarks

§2.21 Concluding Remarks

Read Section 2.21 on your own
