
7: Basic x86 architecture

Computer Architecture and Systems Programming

252-0061-00, Herbstsemester 2013

Timothy Roscoe

7.1: What is an instruction set architecture?

Definitions

• Architecture (also instruction set architecture, ISA): the parts of a processor design that one needs to understand to write assembly code.
  – Examples: instruction set specification, registers.

• Microarchitecture: implementation of the architecture.
  – Examples: cache sizes and core frequency.

• Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc.

Instruction Set Architecture

• Assembly Language View
  – Processor state
    • Registers, memory, …
  – Instructions
    • addl, movl, leal, …
    • How instructions are encoded as bytes

• Layer of Abstraction
  – Above: how to program machine
    • Processor executes instructions in a sequence
  – Below: what needs to be built
    • Use variety of tricks to make it run fast
    • E.g., execute multiple instructions simultaneously

[Figure: layer diagram. Application Program, Compiler and OS sit above the ISA; CPU Design, Circuit Design and Chip Layout sit below it.]

CISC Instruction Sets

• Complex Instruction Set Computer
  – Dominant style through mid-80's

• Stack-oriented instruction set
  – Use stack to pass arguments, save program counter
  – Explicit push and pop instructions

• Arithmetic instructions can access memory
  – addl %eax, 12(%ebx,%ecx,4)
    • Requires memory read and write
    • Complex address calculation

• Condition codes
  – Set as side effect of arithmetic and logical instructions

• Philosophy
  – Add instructions to perform "typical" programming tasks
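The memory-operand example addl %eax, 12(%ebx,%ecx,4) packs an address calculation, a read, an add and a write into one instruction. The following is a minimal C sketch of that behaviour. The function name addl_mem and the flat byte array standing in for memory are assumptions made for illustration; they are not from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: addl %eax, 12(%ebx,%ecx,4)
 * mem is a flat byte array standing in for memory (an assumption);
 * eax, ebx and ecx are the register values. */
void addl_mem(uint8_t *mem, uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    uint32_t addr = ebx + 4 * ecx + 12;        /* complex address calculation */
    uint32_t value;

    memcpy(&value, mem + addr, sizeof value);  /* memory read  */
    value += eax;                              /* arithmetic   */
    memcpy(mem + addr, &value, sizeof value);  /* memory write */
}
```

A load/store (RISC) machine would need separate instructions for the address arithmetic, the load, the add and the store.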

RISC Instruction Sets

• Reduced Instruction Set Computer
  – Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

• Fewer, simpler instructions
  – Might take more to get given task done
  – Can execute them with small and fast hardware

• Register-oriented instruction set
  – Many more (typically 32) registers
  – Use for arguments, return pointer, temporaries

• Only load and store instructions can access memory
  – Similar to Y86 mrmovl and rmmovl (see later!)

• No condition codes
  – Test instructions return 0/1 in register

Contrast with x86 / 64-bit

• Operations are highly uniform
  – All encoded in exactly 32 bits
  – All take the same time to execute (mostly)
  – All operate between registers, or only load/store
  – All operate on 64 or 32 bit quantities (nothing smaller)

• No condition codes: use registers

• Lots of registers, including zero
  – All registers are uniform

Other RISC features (not in Alpha)

• Explicit delay slots (e.g. MIPS)
  – E.g. can't use a value until 2 instructions after the load

• Make most instructions conditional (e.g. ARM)
  – Needs condition codes (why?)
  – Reduces branches, increases code density

• Etc.

• Key message: x86 is not the only way to do this!

CISC vs. RISC

• Original Debate
  – Strong opinions!
  – CISC proponents: easy for compiler, fewer code bytes
  – RISC proponents: better for optimizing compilers, can make run fast with simple chip design

• Current Status
  – For desktop processors, choice of ISA not a technical issue
    • With enough hardware, can make anything run fast
    • Code compatibility more important
  – For embedded processors, RISC still makes sense
    • Smaller, cheaper, less power
    • For how much longer?

Comparison with MIPS (remember Digital Design?)

• MIPS is RISC: Reduced Instruction Set
  – Motivation: simpler is faster
    • Fewer gates ⇒ higher frequency
    • Fewer gates ⇒ more transistors left for cache
  – Seemed like a really good idea

• x86 is CISC: Complex Instruction Set
  – More complex instructions, addressing modes

• Intel turned out to be way too good at manufacturing
• Difference in gate count became too small to make a difference
• x86 inside is mostly RISC anyway, decode logic is small
  – ⇒ Argument is mostly irrelevant these days

There are many architectures…

• You've already seen MIPS 2000 → MIPS 3000 → …
  – Workstations, minicomputers, now mostly embedded networking

• IBM S/360 → S/370 → … → zSeries
  – First to separate architecture from (many) implementations

• ARM (several variants)
  – Very common in embedded systems, basis for Advanced OS course at ETHZ

• IBM POWER → PowerPC (→ Cell, sort of)
  – Basis for all 3 last-gen games console systems

• DEC Alpha
  – Personal favorite; killed by Compaq, team left for Intel to work on…

• Intel Itanium
  – First 64-bit Intel product; very fast (esp. FP), hot, and expensive
  – Mostly overtaken by 64-bit x86 designs

• etc.


Summary

• Architecture vs. Microarchitecture

• Instruction set architectures

• RISC vs. CISC

• x86: comparison with MIPS

7.2: A bit of x86 history

Intel x86 Processors

• The x86 architecture dominates the computer market

• Evolutionary design
  – Backwards compatible all the way back to the 8086, introduced in 1978
  – Added more features as time goes on

• Complex instruction set computer (CISC)
  – Many different instructions with many different formats
    • But only a small subset is encountered in Linux programs
  – Hard to match performance of Reduced Instruction Set Computers (RISC)
  – But Intel has done just that!

Intel x86 Evolution: Milestones

    Name          Date   Transistors   MHz
    8086          1978   29K           5-10
    80386         1985   275K          16-33
    Pentium 4F    2005   230M          2800-3800

• 8086
  – First 16-bit processor. Basis for IBM PC & DOS
  – 1MB address space

• 80386
  – First 32-bit processor, referred to as IA32
  – Added “flat addressing”
  – Capable of running Unix
  – 32-bit Linux/gcc uses no instructions introduced in later models

• Pentium 4F
  – First 64-bit [x86] processor
  – Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line

Intel x86 Processors: Overview

[Figure: timeline of architectures and processors.
  X86-16: 8086, 286
  X86-32/IA32: 386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E
  X86-64 / EM64t: Pentium 4F, Core 2 Duo, Core i7
  Extensions added over time: MMX, SSE, SSE2, SSE3, SSE4]

IA: often redefined as latest Intel architecture

Intel x86 Processors, contd.

• Machine Evolution

    Name          Date   Transistors
    486           1989   1.9M
    Pentium       1993   3.1M
    Pentium/MMX   1997   4.5M
    PentiumPro    1995   6.5M
    Pentium III   1999   8.2M
    Pentium 4     2001   42M
    Core 2 Duo    2006   291M

• Added Features
  – Instructions to support multimedia operations
    • Parallel operations on 1, 2, and 4-byte data, both integer & FP
  – Instructions to enable more efficient conditional operations

x86 Clones: Advanced Micro Devices (AMD)

• Historically
  – AMD has followed just behind Intel
  – A little bit slower, a lot cheaper

• Then
  – Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  – Built Opteron: tough competitor to Pentium 4
  – Developed x86-64, their own extension to 64 bits

• Recently
  – Intel much quicker with dual core design
  – Intel currently far ahead in performance
  – em64t backwards compatible to x86-64

Intel's 64-Bit (partially true…)

• Intel Attempted Radical Shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing

• AMD Stepped in with Evolutionary Solution
  – x86-64 (now called “AMD64”)

• Intel Felt Obligated to Focus on IA64
  – Hard to admit mistake or that AMD is better

• 2004: Intel Announces EM64T extension to IA32
  – Extended Memory 64-bit Technology
  – Almost identical to x86-64!

Intel Nehalem-EX

• Current leader (for the next few weeks)
  – 2.3 billion transistors/die
  – 8 or 10 cores per die
  – 2 threads per core
  – Up to 8 packages (= 128 contexts!)
  – 4 memory channels per package
  – Virtualization support
  – etc.

• Good illustration of why it is hard to teach state-of-the-art processor design!

Intel Single-Chip Cloud Computer - 2010

• Experimental processor (only a few 100 made)
  – Designed for research
  – Working version in our Lab

• 48 old-style Pentium cores

• Very fast interconnection network
  – Hardware support for messaging between cores
  – Variable speed of network

• Non-cache coherent
  – Sharing memory between cores won't work with a conventional OS!

A quick note on syntax

There are two common ways to write x86 Assembler:

• AT&T syntax
  – What we'll use in this course, common on Unix

• Intel syntax
  – Generally used for Windows machines

7.3: Basics of machine code

Assembly programmer's view

• Programmer-Visible State (CPU)
  – PC: Program counter
    • Address of next instruction
    • Called “EIP” (IA32) or “RIP” (x86-64)
  – Register file
    • Heavily used program data
  – Condition codes
    • Store status information about most recent arithmetic operation
    • Used for conditional branching

• Memory
  – Byte addressable array
  – Code, user data, (some) OS data
  – Includes stack used to support procedures

[Figure: the CPU (PC, Registers, Condition Codes) sends addresses to Memory, exchanges data with it, and fetches instructions from it. Memory holds Object Code, Program Data, OS Data and the Stack.]

Compiling into assembly

C code:

    int sum(int x, int y)
    {
      int t = x+y;
      return t;
    }

Generated ia32 assembly:

    sum:
       pushl %ebp
       movl %esp,%ebp
       movl 12(%ebp),%eax
       addl 8(%ebp),%eax
       movl %ebp,%esp
       popl %ebp
       ret

Obtain with command

    gcc -O -S code.c

Produces file code.s

Some compilers use the single instruction “leave” in place of the final movl %ebp,%esp / popl %ebp pair.

Assembly data types

• “Integer” data of 1, 2, or 4 bytes
  – Data values
  – Addresses (untyped pointers)

• Floating point data of 4, 8, or 10 bytes

• No aggregate types such as arrays or structures
  – Just contiguously allocated bytes in memory

Assembly code operations

• Perform arithmetic function on register or memory data

• Transfer data between memory and register
  – Load data from memory into register
  – Store register data into memory

• Transfer control
  – Unconditional jumps to/from procedures
  – Conditional branches

Object code

Code for sum:

    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

  • Total of 13 bytes
  • Each instruction 1, 2, or 3 bytes
  • Starts at address 0x401040

• Assembler
  – Translates .s into .o
  – Binary encoding of each instruction
  – Nearly-complete image of executable code
  – Missing linkages between code in different files

• Linker
  – Resolves references between files
  – Combines with static run-time libraries
    • E.g., code for malloc, printf
  – Some libraries are dynamically linked
    • Linking occurs when program begins execution

Machine instruction example

• C Code
  – Add two signed integers

      int t = x+y;

• Assembly
  – Add 2 4-byte integers
    • “Long” words in GCC parlance
    • Same instruction whether signed or unsigned
  – Operands:
    • x: Register %eax
    • y: Memory M[%ebp+8]
    • t: Register %eax
      – Return function value in %eax

      addl 8(%ebp),%eax

  Similar to expression:

      x += y

  More precisely:

      int eax;
      int *ebp;
      eax += ebp[2]

• Object Code
  – 3-byte instruction
  – Stored at address 0x401046

      0x401046: 03 45 08
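The three object-code bytes 03 45 08 can be taken apart by hand. As a hedged sketch: the second byte of this instruction is an x86 ModRM byte, split into a 2-bit mod field, a 3-bit reg field and a 3-bit rm field. The helper names decode_modrm and reg32_name below are illustrative and do not come from the slides.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* ia32 32-bit register names in encoding order. */
const char *reg32_name(unsigned n)
{
    static const char *names[8] = {
        "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
    };
    return names[n & 0x7];
}
```

For 03 45 08, the opcode 0x03 is an add into a register, decode_modrm(0x45) gives mod = 1 (a one-byte displacement follows), reg = 0 (%eax) and rm = 5 (%ebp), and the last byte is the displacement 0x08. That reads back as addl 8(%ebp),%eax.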

Disassembling object code

Disassembled:

    00401040 <_sum>:
       0:  55          push   %ebp
       1:  89 e5       mov    %esp,%ebp
       3:  8b 45 0c    mov    0xc(%ebp),%eax
       6:  03 45 08    add    0x8(%ebp),%eax
       9:  89 ec       mov    %ebp,%esp
       b:  5d          pop    %ebp
       c:  c3          ret
       d:  8d 76 00    lea    0x0(%esi),%esi

• Disassembler
  – objdump -d p
  – Useful tool for examining object code
  – Analyzes bit pattern of series of instructions
  – Produces approximate rendition of assembly code
  – Can be run on either a.out (complete executable) or .o file

Alternate disassembly

Object:

    0x401040:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

Disassembled:

    0x401040 <sum>:     push %ebp
    0x401041 <sum+1>:   mov  %esp,%ebp
    0x401043 <sum+3>:   mov  0xc(%ebp),%eax
    0x401046 <sum+6>:   add  0x8(%ebp),%eax
    0x401049 <sum+9>:   mov  %ebp,%esp
    0x40104b <sum+11>:  pop  %ebp
    0x40104c <sum+12>:  ret
    0x40104d <sum+13>:  lea  0x0(%esi),%esi

• Within gdb Debugger
  – gdb p
  – disassemble sum
    • Disassemble procedure
  – x/13b sum
    • Examine the 13 bytes starting at sum

What can be disassembled?

• Anything that can be interpreted as executable code
• Disassembler examines bytes and reconstructs assembly source

    % objdump -d WINWORD.EXE

    WINWORD.EXE:     file format pei-i386

    No symbols in "WINWORD.EXE".
    Disassembly of section .text:

    30001000 <.text>:
    30001000: 55              push %ebp
    30001001: 8b ec           mov  %esp,%ebp
    30001003: 6a ff           push $0xffffffff
    30001005: 68 90 10 00 30  push $0x30001090
    3000100a: 68 91 dc 4c 30  push $0x304cdc91


Summary

• Compiling into assembly

• Data types in assembly

• Assembly code operations

• Object code, and disassembling it

7.4: 32-bit x86 architecture

Integer registers (ia32)

    32-bit   16-bit   8-bit high   8-bit low   Origin (mostly obsolete)
    %eax     %ax      %ah          %al         accumulate
    %ecx     %cx      %ch          %cl         counter
    %edx     %dx      %dh          %dl         data
    %ebx     %bx      %bh          %bl         base
    %esi     %si                               source index
    %edi     %di                               destination index
    %esp     %sp                               stack pointer
    %ebp     %bp                               base pointer

• %eax through %edi: general purpose
• %ax through %bp: 16-bit virtual registers (backwards compatibility)
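The 16-bit and 8-bit names are views onto parts of the same 32-bit register. A minimal C sketch of that aliasing follows, using masks and shifts rather than real registers. The function names are illustrative assumptions.

```c
#include <stdint.h>

/* Views of a 32-bit register value, mirroring %eax / %ax / %ah / %al. */
uint16_t low16(uint32_t r) { return (uint16_t)(r & 0xffff); }      /* %ax */
uint8_t  high8(uint32_t r) { return (uint8_t)((r >> 8) & 0xff); }  /* %ah */
uint8_t  low8(uint32_t r)  { return (uint8_t)(r & 0xff); }         /* %al */

/* Writing the low byte (as movb to %al does) leaves the other 24 bits unchanged. */
uint32_t write_low8(uint32_t r, uint8_t v)
{
    return (r & 0xffffff00u) | v;
}
```

With %eax holding 0x12345678, %ax reads as 0x5678, %ah as 0x56 and %al as 0x78.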

Moving data: ia32

• movx Source, Dest
  – x in {b, w, l}
  – movl Source, Dest: Move 4-byte “long word”
  – movw Source, Dest: Move 2-byte “word”
  – movb Source, Dest: Move 1-byte “byte”

• Lots of these in typical code

Moving data: ia32

movl Source, Dest:

• Operand Types
  – Immediate: Constant integer data
    • Example: $0x400, $-533
    • Like C constant, but prefixed with ‘$’
    • Encoded with 1, 2, or 4 bytes
  – Register: One of 8 integer registers
    • Example: %eax, %edx
    • But %esp and %ebp reserved for special use
    • Others have special uses for particular instructions
  – Memory: 4 consecutive bytes of memory at address given by register
    • Simplest example: (%eax)
    • Various other “address modes”

movl operand combinations

    Source   Dest   Src,Dest             C Analog
    Imm      Reg    movl $0x4,%eax       temp = 0x4;
    Imm      Mem    movl $-147,(%eax)    *p = -147;
    Reg      Reg    movl %eax,%edx       temp2 = temp1;
    Reg      Mem    movl %eax,(%edx)     *p = temp;
    Mem      Reg    movl (%eax),%edx     temp = *p;

Cannot do memory-memory transfer with a single instruction

Simple memory addressing modes

• Normal (R) Mem[Reg[R]]
  – Register R specifies memory address

      movl (%ecx),%eax

• Displacement D(R) Mem[Reg[R]+D]
  – Register R specifies start of memory region
  – Constant displacement D specifies offset

      movl 8(%ebp),%edx

Using simple addressing modes

    void swap(int *xp, int *yp)
    {
      int t0 = *xp;
      int t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

    swap:
       pushl %ebp              # Set Up
       movl %esp,%ebp
       pushl %ebx

       movl 12(%ebp),%ecx      # Body
       movl 8(%ebp),%edx
       movl (%ecx),%eax
       movl (%edx),%ebx
       movl %eax,(%edx)
       movl %ebx,(%ecx)

       movl -4(%ebp),%ebx      # Finish
       movl %ebp,%esp
       popl %ebp
       ret


Understanding swap

    void swap(int *xp, int *yp)
    {
      int t0 = *xp;
      int t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

Stack (in memory):

    Offset   Contents
      12     yp
       8     xp
       4     Rtn adr
       0     Old %ebp    ← %ebp
      -4     Old %ebx
             • • •

    Register   Value
    %ecx       yp
    %edx       xp
    %eax       t1
    %ebx       t0

    movl 12(%ebp),%ecx   # ecx = yp
    movl 8(%ebp),%edx    # edx = xp
    movl (%ecx),%eax     # eax = *yp (t1)
    movl (%edx),%ebx     # ebx = *xp (t0)
    movl %eax,(%edx)     # *xp = eax
    movl %ebx,(%ecx)     # *yp = ebx

Understanding swap

    movl 12(%ebp),%ecx   # ecx = yp
    movl 8(%ebp),%edx    # edx = xp
    movl (%ecx),%eax     # eax = *yp (t1)
    movl (%edx),%ebx     # ebx = *xp (t0)
    movl %eax,(%edx)     # *xp = eax
    movl %ebx,(%ecx)     # *yp = ebx

Memory:

    Address   Contents   Offset from %ebp
    0x124     123
    0x120     456
    0x11c
    0x118
    0x114
    0x110     0x120      12  (yp)
    0x10c     0x124       8  (xp)
    0x108     Rtn adr     4
    0x104                 0  ← %ebp
    0x100                -4

Register file as the first four instructions execute:

    Register   After loading yp and xp   After eax = *yp   After ebx = *xp
    %eax                                 456               456
    %edx       0x124                     0x124             0x124
    %ecx       0x120                     0x120             0x120
    %ebx                                                   123
    %esi
    %edi
    %esp
    %ebp       0x104                     0x104             0x104
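The trace above can be replayed in C. This is a minimal sketch: it models the slide's memory image as a small word array and runs the six body instructions one at a time. The helper names M, run_swap_body and load_word are assumptions made for illustration.

```c
#include <stdint.h>

#define MEM_BASE  0x100u
#define MEM_WORDS 10            /* covers addresses 0x100 .. 0x124 */

/* Toy word-addressed memory holding the addresses shown on the slide. */
static uint32_t mem[MEM_WORDS];

static uint32_t *M(uint32_t addr) { return &mem[(addr - MEM_BASE) / 4]; }

/* Runs the six-instruction body of swap against the slide's memory image. */
void run_swap_body(void)
{
    uint32_t eax, ebx, ecx, edx, ebp = 0x104;

    /* Initial memory contents taken from the slide. */
    *M(0x124) = 123;        /* *xp */
    *M(0x120) = 456;        /* *yp */
    *M(0x110) = 0x120;      /* yp, at 12(%ebp) */
    *M(0x10c) = 0x124;      /* xp, at 8(%ebp)  */

    ecx = *M(ebp + 12);     /* movl 12(%ebp),%ecx */
    edx = *M(ebp + 8);      /* movl 8(%ebp),%edx  */
    eax = *M(ecx);          /* movl (%ecx),%eax   */
    ebx = *M(edx);          /* movl (%edx),%ebx   */
    *M(edx) = eax;          /* movl %eax,(%edx)   */
    *M(ecx) = ebx;          /* movl %ebx,(%ecx)   */
}

/* Reads one word of the toy memory, for inspecting the result. */
uint32_t load_word(uint32_t addr) { return *M(addr); }
```

After run_swap_body(), the word at 0x124 holds 456 and the word at 0x120 holds 123, so the two values have been exchanged.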

Complete memory addressing modes

• Most General Form:

      D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]

  – D: Constant “displacement” 1, 2, or 4 bytes
  – Rb: Base register: Any of 8 integer registers
  – Ri: Index register: Any, except for %esp
    • Unlikely you'd use %ebp, either
  – S: Scale: 1, 2, 4, or 8 (why these numbers?)

• Special Cases

      (Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]
      D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]
      (Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]

Address computation examples

    %edx = 0xf000
    %ecx = 0x100

    Expression       Address Computation   Address
    0x8(%edx)        0xf000 + 0x8          0xf008
    (%edx,%ecx)      0xf000 + 0x100        0xf100
    (%edx,%ecx,4)    0xf000 + 4*0x100      0xf400
    0x80(,%edx,2)    2*0xf000 + 0x80       0x1e080
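The four rows above all follow the general form D(Rb,Ri,S). As a minimal sketch, the address arithmetic can be written as one C function. The name effective_address is an assumption, and an omitted register or displacement is passed as 0.

```c
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * rb and ri are the register contents; pass 0 for an omitted field. */
uint32_t effective_address(uint32_t d, uint32_t rb, uint32_t ri, uint32_t s)
{
    return rb + s * ri + d;
}
```

For example, with %edx = 0xf000, the expression 0x80(,%edx,2) corresponds to effective_address(0x80, 0, 0xf000, 2).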

Address computation instruction

• leal Src,Dest
  – Src is address mode expression
  – Set Dest to address denoted by expression

• Uses
  – Computing addresses without a memory reference
    • E.g., translation of p = &x[i];
  – Computing arithmetic expressions of the form x + k*y
    • k = 1, 2, 4, or 8
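Both uses of leal come down to the same address arithmetic, performed without touching memory. The sketch below spells out the two cases in C. The function names are illustrative, and the first assumes the usual layout where element i of an int array sits sizeof(int) * i bytes past the start.

```c
#include <stddef.h>
#include <stdint.h>

/* p = &x[i]: base address plus a scaled index, with no memory access.
 * With 4-byte ints this is the base + 4*index form that leal can compute. */
int *element_address(int *x, size_t i)
{
    return (int *)((char *)x + sizeof(int) * i);
}

/* x + k*y with k in {1, 2, 4, 8}: plain arithmetic in address-mode form. */
uint32_t lea_arith(uint32_t x, uint32_t y, uint32_t k)
{
    return x + k * y;
}
```

Neither function dereferences a pointer, which mirrors the fact that leal only computes the address and never reads from it.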

Summary

• 32-bit x86 registers

• mov instruction: loads and stores

• memory addressing modes
  – Example: swap()

• leal: address computation

7.5: ia32 integer arithmetic

Some arithmetic operations

• Two operand instructions:

    Format             Computation
    addl  Src,Dest     Dest ← Dest + Src
    subl  Src,Dest     Dest ← Dest - Src
    imull Src,Dest     Dest ← Dest * Src
    sall  Src,Dest     Dest ← Dest << Src     Also called shll
    sarl  Src,Dest     Dest ← Dest >> Src     Arithmetic
    shrl  Src,Dest     Dest ← Dest >> Src     Logical
    xorl  Src,Dest     Dest ← Dest ^ Src
    andl  Src,Dest     Dest ← Dest & Src
    orl   Src,Dest     Dest ← Dest | Src

• No distinction between signed and unsigned int (why?)
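One way to approach the "why?" is to model the instructions on raw 32-bit patterns. In the sketch below, a single add serves both interpretations because two's-complement addition yields the same bits either way, while right shifts need two variants. The function names mirror the mnemonics but are otherwise illustrative, and sarl is written with explicit masking so that it does not depend on how a C compiler shifts negative values.

```c
#include <stdint.h>

/* addl: the same 32-bit add works for signed and unsigned operands. */
uint32_t addl(uint32_t dest, uint32_t src) { return dest + src; }

/* shrl: logical right shift, fills vacated bits with zeros. */
uint32_t shrl(uint32_t dest, unsigned n) { return dest >> n; }

/* sarl: arithmetic right shift, fills vacated bits with the sign bit.
 * Valid for shift counts 0..31. */
uint32_t sarl(uint32_t dest, unsigned n)
{
    uint32_t shifted = dest >> n;
    if (dest & 0x80000000u)
        shifted |= ~(0xffffffffu >> n);   /* set the top n bits */
    return shifted;
}
```

For instance, addl(0xffffffff, 2) gives 1 whether 0xffffffff is read as -1 or as 4294967295, while shrl and sarl disagree on 0x80000000 shifted by 4.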

Some arithmetic operations

• One operand instructions

    Format       Computation
    incl Dest    Dest ← Dest + 1
    decl Dest    Dest ← Dest - 1
    negl Dest    Dest ← -Dest
    notl Dest    Dest ← ~Dest

• See book for more instructions

Using leal for arithmetic expressions

    int arith(int x, int y, int z)
    {
      int t1 = x+y;
      int t2 = z+t1;
      int t3 = x+4;
      int t4 = y * 48;
      int t5 = t3 + t4;
      int rval = t2 * t5;
      return rval;
    }

    arith:
       pushl %ebp                  # Set Up
       movl %esp,%ebp

       movl 8(%ebp),%eax           # Body
       movl 12(%ebp),%edx
       leal (%edx,%eax),%ecx
       leal (%edx,%edx,2),%edx
       sall $4,%edx
       addl 16(%ebp),%ecx
       leal 4(%edx,%eax),%eax
       imull %ecx,%eax

       movl %ebp,%esp              # Finish
       popl %ebp
       ret

Understanding arith

    int arith(int x, int y, int z)
    {
      int t1 = x+y;
      int t2 = z+t1;
      int t3 = x+4;
      int t4 = y * 48;
      int t5 = t3 + t4;
      int rval = t2 * t5;
      return rval;
    }

Stack:

    Offset   Contents
             • • •
      16     z
      12     y
       8     x
       4     Rtn adr
       0     Old %ebp    ← %ebp

    movl 8(%ebp),%eax         # eax = x
    movl 12(%ebp),%edx        # edx = y
    leal (%edx,%eax),%ecx     # ecx = x+y (t1)
    leal (%edx,%edx,2),%edx   # edx = 3*y
    sall $4,%edx              # edx = 48*y (t4)
    addl 16(%ebp),%ecx        # ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax    # eax = 4+t4+x (t5)
    imull %ecx,%eax           # eax = t5*t2 (rval)
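The compiled body never multiplies by 48 directly: it forms 3*y with leal and then shifts left by 4. The sketch below checks that this register-level sequence matches the C source. It uses uint32_t instead of int so that wraparound is well defined in C, which is an assumption made for the sketch, and the names arith_c and arith_asm are illustrative.

```c
#include <stdint.h>

/* Transcription of the C source, in unsigned arithmetic (wraps modulo 2^32). */
uint32_t arith_c(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t t1 = x + y;
    uint32_t t2 = z + t1;
    uint32_t t3 = x + 4;
    uint32_t t4 = y * 48;
    uint32_t t5 = t3 + t4;
    return t2 * t5;
}

/* Step-by-step model of the compiled body, one statement per instruction. */
uint32_t arith_asm(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t eax = x;            /* movl 8(%ebp),%eax       */
    uint32_t edx = y;            /* movl 12(%ebp),%edx      */
    uint32_t ecx = edx + eax;    /* leal (%edx,%eax),%ecx   */
    edx = edx + 2 * edx;         /* leal (%edx,%edx,2),%edx */
    edx <<= 4;                   /* sall $4,%edx            */
    ecx += z;                    /* addl 16(%ebp),%ecx      */
    eax = 4 + edx + eax;         /* leal 4(%edx,%eax),%eax  */
    eax *= ecx;                  /* imull %ecx,%eax         */
    return eax;
}
```

With x = 1, y = 2, z = 3 both functions return 606: t2 = 6 and t5 = 5 + 96 = 101.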

Another example

    int logical(int x, int y)
    {
      int t1 = x^y;
      int t2 = t1 >> 17;
      int mask = (1<<13) - 7;
      int rval = t2 & mask;
      return rval;
    }

    logical:
       pushl %ebp              # Setup
       movl %esp,%ebp

       movl 8(%ebp),%eax       # Body: eax = x
       xorl 12(%ebp),%eax      # eax = x^y (t1)
       sarl $17,%eax           # eax = t1>>17 (t2)
       andl $8185,%eax         # eax = t2 & 8185

       movl %ebp,%esp          # Finish
       popl %ebp
       ret

2^13 = 8192, 2^13 – 7 = 8185

7.6: 64-bit x86 architecture

Data representations: ia32 and x86-64

Sizes of C objects (in bytes)

    C data type                      Typical 32-bit   Intel ia32   x86-64
    char                             1                1            1
    short                            2                2            2
    int                              4                4            4
    long                             4                4            8
    long long                        8                8            8
    float                            4                4            4
    double                           8                8            8
    long double                      8                10/12        10/16
    char * (or any other pointer)    4                4            8
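The sizes can be checked on any given machine with sizeof. The short sketch below prints them for whatever compiler and target build it; the function name print_sizes is illustrative, and the output is platform dependent, so no expected values are given here.

```c
#include <stdio.h>

/* Prints the sizes, in bytes, of the C types listed in the table,
 * as seen by the compiler and target that build this file. */
void print_sizes(void)
{
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("int          %zu\n", sizeof(int));
    printf("long         %zu\n", sizeof(long));
    printf("long long    %zu\n", sizeof(long long));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("char *       %zu\n", sizeof(char *));
}
```

Building the same file for a 32-bit and a 64-bit target (with gcc, typically via -m32 and -m64, if the 32-bit libraries are installed) should show long and pointers changing size.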

x86-64 integer registers

    64-bit   Low 32 bits      64-bit   Low 32 bits
    %rax     %eax             %r8      %r8d
    %rbx     %ebx             %r9      %r9d
    %rcx     %ecx             %r10     %r10d
    %rdx     %edx             %r11     %r11d
    %rsi     %esi             %r12     %r12d
    %rdi     %edi             %r13     %r13d
    %rsp     %esp             %r14     %r14d
    %rbp     %ebp             %r15     %r15d

– Extend existing registers. Add 8 new ones.
– Make %ebp/%rbp general purpose


Instructions

• Long word l (4 bytes) ↔ Quad word q (8 bytes)

• New instructions:
  – movl → movq
  – addl → addq
  – sall → salq
  – etc.

• 32-bit instructions that generate 32-bit results
  – Set the upper 32 bits of the 64-bit destination register to 0
  – Example: addl
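To make the zeroing rule concrete, here is a minimal C model of it (not real assembly). The names addl_model, addq_model and check_zero_extension are hypothetical; the 64-bit argument stands for the full destination register.

```c
#include <assert.h>
#include <stdint.h>

/* Models addl: a 32-bit add whose result is zero-extended into the
   full 64-bit destination register. */
uint64_t addl_model(uint64_t dest, uint32_t src)
{
    uint32_t sum = (uint32_t)dest + src;   /* uses only the low 32 bits, wraps modulo 2^32 */
    return (uint64_t)sum;                  /* upper 32 bits become 0 */
}

/* Models addq: a full 64-bit add. */
uint64_t addq_model(uint64_t dest, uint64_t src)
{
    return dest + src;
}

void check_zero_extension(void)
{
    uint64_t reg = 0x1234567800000001ULL;

    /* The 32-bit form discards whatever was in the upper half. */
    assert(addl_model(reg, 1) == 0x0000000000000002ULL);
    /* The 64-bit form keeps it. */
    assert(addq_model(reg, 1) == 0x1234567800000002ULL);
    /* A carry out of bit 31 does not leak into the upper half. */
    assert(addl_model(0xFFFFFFFFULL, 1) == 0);
}
```

Calling check_zero_extension() from a main function runs the checks.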


Swap in 32-bit mode

void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
pushl %ebp              # Setup
movl %esp,%ebp
pushl %ebx

movl 12(%ebp),%ecx      # Body: ecx = yp
movl 8(%ebp),%edx       # edx = xp
movl (%ecx),%eax        # eax = *yp (t1)
movl (%edx),%ebx        # ebx = *xp (t0)
movl %eax,(%edx)        # *xp = t1
movl %ebx,(%ecx)        # *yp = t0

movl -4(%ebp),%ebx      # Finish: restore %ebx
movl %ebp,%esp
popl %ebp
ret


Swap in 64-bit Mode

• Operands passed in registers (why useful?)
  – First (xp) in %rdi, second (yp) in %rsi
  – 64-bit pointers
• No stack operations required
• 32-bit data
  – Data held in registers %eax and %edx
  – movl operation

void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
movl (%rdi), %edx
movl (%rsi), %eax
movl %eax, (%rdi)
movl %edx, (%rsi)
retq


Swap Long Ints in 64-bit Mode

• 64-bit data
  – Data held in registers %rax and %rdx
  – movq operation
  – "q" stands for quad-word

void swap_l (long int *xp, long int *yp)
{
  long int t0 = *xp;
  long int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap_l:
movq (%rdi), %rdx
movq (%rsi), %rax
movq %rax, (%rdi)
movq %rdx, (%rsi)
retq


7.7: Condition codes


Processor State (ia32, Partial)

• Information about currently executing program
  – Temporary data ( %eax, … )
  – Location of runtime stack ( %ebp, %esp )
  – Location of current code control point ( %eip, … )
  – Status of recent tests ( CF, ZF, SF, OF )

Register(s)                       Role
%eax %ecx %edx %ebx %esi %edi     General purpose registers
%esp                              Current stack top
%ebp                              Current stack frame
%eip                              Instruction pointer
CF ZF SF OF                       Condition codes

Page 65: 7: Basic x86 architecture - 1).pdf · 7: Basic x86 architecture . Computer Architecture and Systems Programming . ... • Assembly – Add 2 4-byte integers ... Alternate disassembly

65

Condition codes (implicit setting)

• Single bit registers
  CF  Carry Flag (for unsigned)
  SF  Sign Flag (for signed)
  ZF  Zero Flag
  OF  Overflow Flag (for signed)

• Implicitly set (think of it as side effect) by arithmetic operations
  Example: addl/addq Src,Dest ↔ t = a+b
  – CF set if carry out from most significant bit (unsigned overflow)
  – ZF set if t == 0
  – SF set if t < 0 (as signed)
  – OF set if two's complement (signed) overflow:
    (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)

• Not set by lea instruction
• Full documentation link on course website
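The four rules for addl can be written out as a small C model. This is a sketch under stated assumptions, not a description of the hardware: struct flags, add_flags and check_add_flags are hypothetical names, and the sign of the result is read from bit 31 of the wrapped unsigned sum to avoid relying on signed overflow in C.

```c
#include <assert.h>
#include <stdint.h>

/* Each field holds 0 or 1, like a single-bit condition code. */
struct flags { int cf, zf, sf, of; };

/* Models the condition codes after a 32-bit add t = a+b. */
struct flags add_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t ut = ua + ub;              /* 32-bit sum, wraps modulo 2^32 */
    int t_negative = (int)(ut >> 31);   /* sign bit of the result */
    struct flags f;

    f.cf = ut < ua;                     /* carry out of the most significant bit */
    f.zf = ut == 0;
    f.sf = t_negative;
    f.of = (a > 0 && b > 0 && t_negative) || (a < 0 && b < 0 && !t_negative);
    return f;
}

void check_add_flags(void)
{
    struct flags f;

    f = add_flags(1, 2);                /* ordinary add: no flag set */
    assert(!f.cf && !f.zf && !f.sf && !f.of);

    f = add_flags(INT32_MAX, 1);        /* signed overflow, no unsigned carry */
    assert(!f.cf && !f.zf && f.sf && f.of);

    f = add_flags(-1, 1);               /* 0xFFFFFFFF + 1: carry out, result 0 */
    assert(f.cf && f.zf && !f.sf && !f.of);

    f = add_flags(INT32_MIN, -1);       /* both carry and signed overflow */
    assert(f.cf && !f.zf && !f.sf && f.of);
}
```

Calling check_add_flags() from a main function runs the checks. Note how the same bit pattern gives a carry (unsigned view) without an overflow (signed view) in the third case.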

Page 66: 7: Basic x86 architecture - 1).pdf · 7: Basic x86 architecture . Computer Architecture and Systems Programming . ... • Assembly – Add 2 4-byte integers ... Alternate disassembly

66

Condition Codes (Explicit Setting: Compare)

• Explicit Setting by Compare Instruction
  cmpl/cmpq Src2,Src1
  cmpl b,a like computing a-b without setting destination

  – CF set if the subtraction needs a borrow out of the most significant bit, i.e. a < b as unsigned (used for unsigned comparisons)
  – ZF set if a == b
  – SF set if (a-b) < 0 (as signed)
  – OF set if two's complement (signed) overflow:
    (a>=0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
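The compare rules can be modelled the same way in C. In this sketch, struct flags, cmp_flags and check_cmp_flags are hypothetical names; the overflow flag is computed with the usual sign rule for subtraction (the operands differ in sign and the result's sign differs from a's), which is an equivalent formulation chosen here to avoid signed overflow in C.

```c
#include <assert.h>
#include <stdint.h>

/* Each field holds 0 or 1, like a single-bit condition code. */
struct flags { int cf, zf, sf, of; };

/* Models the condition codes after cmpl b,a, i.e. after computing a-b. */
struct flags cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t ut = ua - ub;              /* 32-bit difference, wraps modulo 2^32 */
    struct flags f;

    f.cf = ua < ub;                     /* borrow: a is below b as unsigned */
    f.zf = ut == 0;                     /* a == b */
    f.sf = (int)(ut >> 31);             /* sign bit of a-b */
    f.of = (int)(((ua ^ ub) & (ua ^ ut)) >> 31);  /* signed overflow */
    return f;
}

void check_cmp_flags(void)
{
    struct flags f;

    f = cmp_flags(5, 3);                /* 5-3 = 2 */
    assert(!f.cf && !f.zf && !f.sf && !f.of);

    f = cmp_flags(3, 5);                /* 3-5 = -2: borrow and negative */
    assert(f.cf && !f.zf && f.sf && !f.of);

    f = cmp_flags(7, 7);                /* equal */
    assert(!f.cf && f.zf && !f.sf && !f.of);

    f = cmp_flags(INT32_MIN, 1);        /* wraps to INT32_MAX: signed overflow */
    assert(!f.cf && !f.zf && !f.sf && f.of);

    f = cmp_flags(-1, 1);               /* signed: less; unsigned: 0xFFFFFFFF is above 1 */
    assert(!f.cf && !f.zf && f.sf && !f.of);
}
```

Calling check_cmp_flags() from a main function runs the checks. The last case shows why signed and unsigned comparisons need different flag combinations.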

Page 67: 7: Basic x86 architecture - 1).pdf · 7: Basic x86 architecture . Computer Architecture and Systems Programming . ... • Assembly – Add 2 4-byte integers ... Alternate disassembly

67

Condition Codes (Explicit Setting: Test)

• Explicit Setting by Test instruction
  testl/testq Src2,Src1
  testl b,a like computing a&b without setting destination

  – Sets condition codes based on value of Src1 & Src2
  – Useful to have one of the operands be a mask

  ZF set when a&b == 0
  SF set when a&b < 0
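A short C model of the two flags described for testl is given below. The names struct test_result, test_flags and check_test_flags are hypothetical. The examples show two common uses: testing a value against itself to learn whether it is zero or negative, and testing against a mask to examine selected bits.

```c
#include <assert.h>
#include <stdint.h>

/* The two condition codes described for testl; each field is 0 or 1. */
struct test_result { int zf, sf; };

/* Models testl b,a: computes a&b and keeps only the flags. */
struct test_result test_flags(int32_t a, int32_t b)
{
    uint32_t t = (uint32_t)a & (uint32_t)b;
    struct test_result r;

    r.zf = t == 0;              /* a&b == 0 */
    r.sf = (int)(t >> 31);      /* sign bit of a&b */
    return r;
}

void check_test_flags(void)
{
    /* Testing a value against itself: is it zero, is it negative? */
    assert(test_flags(0, 0).zf == 1);
    assert(test_flags(-5, -5).sf == 1);
    assert(test_flags(-5, -5).zf == 0);

    /* Testing against the mask 1: is the lowest bit set? */
    assert(test_flags(6, 1).zf == 1);   /* 6 is even, so 6&1 == 0 */
    assert(test_flags(7, 1).zf == 0);   /* 7 is odd */
    assert(test_flags(7, 1).sf == 0);
}
```

Calling check_test_flags() from a main function runs the checks.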

Page 68: 7: Basic x86 architecture - 1).pdf · 7: Basic x86 architecture . Computer Architecture and Systems Programming . ... • Assembly – Add 2 4-byte integers ... Alternate disassembly

68

Reading Condition Codes

• SetX Instructions
  – Set single byte based on combinations of condition codes

SetX    Condition       Description
sete    ZF              Equal / Zero
setne   ~ZF             Not Equal / Not Zero
sets    SF              Negative
setns   ~SF             Nonnegative
setg    ~(SF^OF)&~ZF    Greater (Signed)
setge   ~(SF^OF)        Greater or Equal (Signed)
setl    (SF^OF)         Less (Signed)
setle   (SF^OF)|ZF      Less or Equal (Signed)
seta    ~CF&~ZF         Above (unsigned)
setb    CF              Below (unsigned)
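The conditions in the table translate directly into boolean expressions. The sketch below writes each one as a C function over a flags structure and checks it against hand-worked flag values for a few comparisons. All names (struct flags, set_e, set_g, check_setx and so on) are hypothetical, and the flag values in the comments were derived by hand for the comparison named there.

```c
#include <assert.h>

/* Each field holds 0 or 1, like a single-bit condition code. */
struct flags { int cf, zf, sf, of; };

int set_e(struct flags f)  { return f.zf; }
int set_ne(struct flags f) { return !f.zf; }
int set_s(struct flags f)  { return f.sf; }
int set_ns(struct flags f) { return !f.sf; }
int set_g(struct flags f)  { return !(f.sf ^ f.of) && !f.zf; }
int set_ge(struct flags f) { return !(f.sf ^ f.of); }
int set_l(struct flags f)  { return f.sf ^ f.of; }
int set_le(struct flags f) { return (f.sf ^ f.of) || f.zf; }
int set_a(struct flags f)  { return !f.cf && !f.zf; }
int set_b(struct flags f)  { return f.cf; }

void check_setx(void)
{
    /* Comparing 5 with 3 computes 5-3 = 2: no flag is set. */
    struct flags five_vs_three = { 0, 0, 0, 0 };
    assert(set_g(five_vs_three) && set_ge(five_vs_three) && set_a(five_vs_three));
    assert(!set_l(five_vs_three) && !set_e(five_vs_three));

    /* Comparing 3 with 5 computes 3-5 = -2: borrow sets CF, sign sets SF. */
    struct flags three_vs_five = { 1, 0, 1, 0 };
    assert(set_l(three_vs_five) && set_le(three_vs_five) && set_b(three_vs_five));
    assert(!set_g(three_vs_five) && !set_a(three_vs_five));

    /* Comparing INT_MIN with 1 overflows: SF = 0 but OF = 1,
       so SF^OF still reports "less". */
    struct flags min_vs_one = { 0, 0, 0, 1 };
    assert(set_l(min_vs_one) && !set_ge(min_vs_one));
    assert(set_a(min_vs_one));          /* as unsigned, 0x80000000 is above 1 */

    /* Comparing equal values sets only ZF. */
    struct flags equal = { 0, 1, 0, 0 };
    assert(set_e(equal) && set_ge(equal) && set_le(equal));
    assert(!set_g(equal) && !set_l(equal) && !set_a(equal) && !set_b(equal));
}
```

Calling check_setx() from a main function runs the checks. The overflow case is the reason the signed conditions use SF^OF rather than SF alone.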

Page 69: 7: Basic x86 architecture - 1).pdf · 7: Basic x86 architecture . Computer Architecture and Systems Programming . ... • Assembly – Add 2 4-byte integers ... Alternate disassembly

69

Reading Condition Codes (Cont.)

• setx Instructions: Set single byte based on combination of condition codes
• One of 8 addressable byte registers
  – Does not alter remaining 3 bytes
  – Typically use movzbl to finish job

int gt (int x, int y)
{
  return x > y;
}

Body
movl 12(%ebp),%eax      # eax = y
cmpl %eax,8(%ebp)       # Compare x : y
setg %al                # al = x > y
movzbl %al,%eax         # Zero rest of %eax

Register   Addressable bytes
%eax       %al %ah
%ecx       %cl %ch
%edx       %dl %dh
%ebx       %bl %bh
%esi, %edi, %esp, %ebp: no byte registers


Reading Condition Codes: x86-64

• setx Instructions:
  – Set single byte based on combination of condition codes
  – Does not alter remaining 7 bytes of the 64-bit register

int gt (long x, long y)
{
  return x > y;
}

long lgt (long x, long y)
{
  return x > y;
}

Body (same for both)
xorl %eax, %eax         # eax = 0
cmpq %rsi, %rdi         # Compare x and y
setg %al                # al = x > y

Is the rest of %rax zero? Yes: xorl is a 32-bit instruction, and 32-bit instructions set the high-order 32 bits to 0!


Jumping

jX Instructions: Jump to different part of code depending on condition codes

jX     Condition       Description
jmp    1               Unconditional
je     ZF              Equal / Zero
jne    ~ZF             Not Equal / Not Zero
js     SF              Negative
jns    ~SF             Non-negative
jg     ~(SF^OF)&~ZF    Greater (Signed)
jge    ~(SF^OF)        Greater or Equal (Signed)
jl     (SF^OF)         Less (Signed)
jle    (SF^OF)|ZF      Less or Equal (Signed)
ja     ~CF&~ZF         Above (unsigned)
jb     CF              Below (unsigned)
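A compare followed by a conditional jump is how an if statement is commonly expressed at the machine level. The sketch below imitates that shape using C's goto. The function names max_goto and check_max_goto are hypothetical, and the instruction names in the comments are an illustrative pairing, not the output of any particular compiler.

```c
#include <assert.h>

/* Returns the larger of x and y, written in "compare and jump" style. */
int max_goto(int x, int y)
{
    /* roughly: cmpl y,x followed by jle Else */
    if (x <= y)
        goto Else;
    return x;       /* fall-through path: x > y */
Else:
    return y;       /* jump target: x <= y */
}

void check_max_goto(void)
{
    assert(max_goto(5, 3) == 5);
    assert(max_goto(3, 5) == 5);
    assert(max_goto(-2, -7) == -2);
    assert(max_goto(4, 4) == 4);
}
```

Calling check_max_goto() from a main function runs the checks. Because the comparison is signed, the matching jump in the table is jle rather than an unsigned one.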


Summary

• Condition codes (C, Z, S, O)
• Explicit setting of condition codes
  – Compare
  – Test
• Reading condition codes
  – setX
• Jumps