7: Basic x86 architecture
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2013
Timothy Roscoe
7.1: What is an instruction set architecture?
Definitions
• Architecture (also instruction set architecture, ISA): the parts of a processor design that one needs to understand to write assembly code.
  – Examples: instruction set specification, registers.
• Microarchitecture: the implementation of the architecture.
  – Examples: cache sizes and core frequency.
• Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc.
Instruction Set Architecture
• Assembly language view
  – Processor state
    • Registers, memory, …
  – Instructions
    • addl, movl, leal, …
    • How instructions are encoded as bytes
• Layer of abstraction
  – Above: how to program the machine
    • Processor executes instructions in a sequence
  – Below: what needs to be built
    • Use a variety of tricks to make it run fast
    • E.g., execute multiple instructions simultaneously
[Figure: layers of abstraction, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]
CISC Instruction Sets
• Complex Instruction Set Computer
  – Dominant style through the mid-80s
• Stack-oriented instruction set
  – Use the stack to pass arguments and save the program counter
  – Explicit push and pop instructions
• Arithmetic instructions can access memory
  – addl %eax, 12(%ebx,%ecx,4)
    • Requires a memory read and write
    • Complex address calculation
• Condition codes
  – Set as a side effect of arithmetic and logical instructions
• Philosophy
  – Add instructions to perform “typical” programming tasks
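A minimal sketch, assuming a toy 256-byte memory with native-endian 4-byte accesses (mem, load32, store32, and addl_mem are hypothetical names, not part of any real API): it spells out the work the single CISC instruction addl %eax, 12(%ebx,%ecx,4) performs. On a load/store machine, each of these steps would be a separate instruction. No bounds checking is done, so the computed address must stay inside the array.

```c
#include <stdint.h>
#include <string.h>

static uint8_t mem[256];   /* toy byte-addressable memory (hypothetical) */

/* 4-byte load/store at a byte address; memcpy avoids alignment issues. */
static uint32_t load32(uint32_t addr) {
    uint32_t v;
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}
static void store32(uint32_t addr, uint32_t v) {
    memcpy(&mem[addr], &v, sizeof v);
}

/* addl %eax, 12(%ebx,%ecx,4), written out step by step. */
void addl_mem(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    uint32_t addr = ebx + 4 * ecx + 12;  /* complex address calculation   */
    uint32_t tmp = load32(addr);         /* memory read  (RISC: a load)   */
    tmp += eax;                          /* the actual add, mod 2^32      */
    store32(addr, tmp);                  /* memory write (RISC: a store)  */
}
```

For example, with %ebx = 16 and %ecx = 3 the operand lives at address 16 + 4*3 + 12 = 40, so addl_mem(5, 16, 3) adds 5 to the word stored there.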
RISC Instruction Sets
• Reduced Instruction Set Computer
  – Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)
• Fewer, simpler instructions
  – Might take more of them to get a given task done
  – Can execute them with small and fast hardware
• Register-oriented instruction set
  – Many more (typically 32) registers
  – Used for arguments, return pointer, temporaries
• Only load and store instructions can access memory
  – Similar to Y86 mrmovl and rmmovl (see later!)
• No condition codes
  – Test instructions return 0/1 in a register
Contrast with x86 / 64-bit
• Operations are highly uniform
  – All encoded in exactly 32 bits
  – All take the same time to execute (mostly)
  – All operate between registers, or are only loads/stores
  – All operate on 64- or 32-bit quantities (nothing smaller)
• No condition codes: use registers
• Lots of registers, including one that always reads as zero
  – All registers are uniform
Other RISC features (not in Alpha)
• Explicit delay slots (e.g., MIPS)
  – E.g., can’t use a value until 2 instructions after the load
• Make most instructions conditional (e.g., ARM)
  – Needs condition codes (why?)
  – Reduces branches, increases code density
• Etc.
• Key message: x86 is not the only way to do this!
CISC vs. RISC
• Original debate
  – Strong opinions!
  – CISC proponents: easy for the compiler, fewer code bytes
  – RISC proponents: better for optimizing compilers, can make it run fast with a simple chip design
• Current status
  – For desktop processors, the choice of ISA is not a technical issue
    • With enough hardware, you can make anything run fast
    • Code compatibility is more important
  – For embedded processors, RISC still makes sense
    • Smaller, cheaper, less power
    • For how much longer?
Comparison with MIPS (remember Digital Design?)
• MIPS is RISC: Reduced Instruction Set
  – Motivation: simpler is faster
    • Fewer gates ⇒ higher frequency
    • Fewer gates ⇒ more transistors left for cache
  – Seemed like a really good idea
• x86 is CISC: Complex Instruction Set
  – More complex instructions, addressing modes
    • Intel turned out to be way too good at manufacturing
    • Difference in gate count became too small to make a difference
    • Inside, x86 is mostly RISC anyway; the decode logic is small
  – ⇒ The argument is mostly irrelevant these days
There are many architectures…
• You’ve already seen MIPS 2000 → MIPS 3000 → …
  – Workstations, minicomputers, now mostly embedded networking
• IBM S/360 → S/370 → … → zSeries
  – First to separate architecture from (many) implementations
• ARM (several variants)
  – Very common in embedded systems, basis for the Advanced OS course at ETHZ
• IBM POWER → PowerPC (→ Cell, sort of)
  – Basis for all 3 last-gen games console systems
• DEC Alpha
  – Personal favorite; killed by Compaq, team left for Intel to work on…
• Intel Itanium
  – First 64-bit Intel product; very fast (esp. FP), hot, and expensive
  – Mostly overtaken by 64-bit x86 designs
• etc.
Summary
• Architecture vs. Microarchitecture
• Instruction set architectures
• RISC vs. CISC
• x86: comparison with MIPS
7.2: A bit of x86 history
Intel x86 Processors
• The x86 architecture dominates the computer market
• Evolutionary design
  – Backwards compatible all the way back to the 8086, introduced in 1978
  – Added more features as time went on
• Complex instruction set computer (CISC)
  – Many different instructions with many different formats
    • But only a small subset is encountered in Linux programs
  – Hard to match the performance of Reduced Instruction Set Computers (RISC)
  – But Intel has done just that!
Intel x86 Evolution: Milestones
Name         Date   Transistors   MHz
8086         1978   29K           5-10
  – First 16-bit processor. Basis for IBM PC & DOS
  – 1 MB address space
80386        1985   275K          16-33
  – First 32-bit processor, referred to as IA32
  – Added “flat addressing”
  – Capable of running Unix
  – 32-bit Linux/gcc uses no instructions introduced in later models
Pentium 4F   2005   230M          2800-3800
  – First 64-bit Intel x86 processor
  – Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of the “Core” line
Intel x86 Processors: Overview
[Figure: timeline of Intel processors (8086, 286, 386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E, Pentium 4F, Core 2 Duo, Core i7) plotted against the architectures they implement (X86-16, X86-32/IA32, X86-64/EM64t) and the instruction-set extensions added over time (MMX, SSE, SSE2, SSE3, SSE4).]
IA: often redefined as latest Intel architecture
Intel x86 Processors, contd.
• Machine evolution
  Name          Date   Transistors
  486           1989   1.9M
  Pentium       1993   3.1M
  Pentium/MMX   1997   4.5M
  PentiumPro    1995   6.5M
  Pentium III   1999   8.2M
  Pentium 4     2001   42M
  Core 2 Duo    2006   291M
• Added features
  – Instructions to support multimedia operations
    • Parallel operations on 1-, 2-, and 4-byte data, both integer & FP
  – Instructions to enable more efficient conditional operations
x86 Clones: Advanced Micro Devices (AMD)
• Historically
  – AMD has followed just behind Intel
  – A little bit slower, a lot cheaper
• Then
  – Recruited top circuit designers from Digital Equipment Corp. and other downward-trending companies
  – Built Opteron: tough competitor to Pentium 4
  – Developed x86-64, their own extension to 64 bits
• Recently
  – Intel much quicker with dual-core design
  – Intel currently far ahead in performance
  – EM64T is compatible with x86-64
Intel’s 64-Bit (partially true…)
• Intel attempted radical shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing
• AMD stepped in with evolutionary solution
  – x86-64 (now called “AMD64”)
• Intel felt obligated to focus on IA64
  – Hard to admit mistake or that AMD is better
• 2004: Intel announces EM64T extension to IA32
  – Extended Memory 64-bit Technology
  – Almost identical to x86-64!
Intel Nehalem-EX
• Current leader (for the next few weeks)
  – 2.3 billion transistors/die
  – 8 or 10 cores per die
  – 2 threads per core
  – Up to 8 packages (= 128 contexts with 8 cores per die!)
  – 4 memory channels per package
  – Virtualization support
  – etc.
• Good illustration of why it is hard to teach state-of-the-art processor design!
Intel Single-Chip Cloud Computer - 2010
• Experimental processor (only a few 100 made)
  – Designed for research
  – Working version in our lab
• 48 old-style Pentium cores
• Very fast interconnection network
  – Hardware support for messaging between cores
  – Variable speed of network
• Non-cache-coherent
  – Sharing memory between cores won’t work with a conventional OS!
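A quick note on syntax
There are two common ways to write x86 assembler:
• AT&T syntax
  – What we'll use in this course; common on Unix
• Intel syntax
  – Generally used on Windows machines
• The two differ in operand order (AT&T: source, destination; Intel: destination, source) and notation, e.g., AT&T movl 8(%ebp),%eax is mov eax, DWORD PTR [ebp+8] in Intel syntax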
7.3: Basics of machine code
Assembly programmer’s view
• Programmer-visible state
  – PC: Program counter
    • Address of next instruction
    • Called “EIP” (IA32) or “RIP” (x86-64)
  – Register file
    • Heavily used program data
  – Condition codes
    • Store status information about the most recent arithmetic operation
    • Used for conditional branching
• Memory
  – Byte-addressable array
  – Code, user data, (some) OS data
  – Includes stack used to support procedures
[Figure: the CPU (PC, registers, condition codes) exchanges addresses, data, and instructions with memory, which holds object code, program data, OS data, and the stack.]
Compiling into assembly
C code:
    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }
Generated ia32 assembly:
    sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret
Obtain with command
    gcc -O -S code.c
(on a 64-bit system, add -m32 to get ia32 code; newer compilers may produce somewhat different output)
Produces file code.s
Some compilers use the single instruction “leave” in place of movl %ebp,%esp and popl %ebp
Assembly data types
• “Integer” data of 1, 2, or 4 bytes
  – Data values
  – Addresses (untyped pointers)
• Floating-point data of 4, 8, or 10 bytes
• No aggregate types such as arrays or structures
  – Just contiguously allocated bytes in memory
Assembly code operations
• Perform arithmetic function on register or memory data
• Transfer data between memory and register
  – Load data from memory into register
  – Store register data into memory
• Transfer control
  – Unconditional jumps to/from procedures
  – Conditional branches
Object code
Code for sum:
    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
• Total of 13 bytes
• Each instruction here is 1, 2, or 3 bytes (x86 instructions can be up to 15 bytes long)
• Starts at address 0x401040
• Assembler
  – Translates .s into .o
  – Binary encoding of each instruction
  – Nearly-complete image of executable code
  – Missing linkages between code in different files
• Linker
  – Resolves references between files
  – Combines with static run-time libraries
    • E.g., code for malloc, printf
  – Some libraries are dynamically linked
    • Linking occurs when program begins execution
Machine instruction example
    int t = x+y;              C code
    addl 8(%ebp),%eax         Assembly
    0x401046: 03 45 08        Object code
• C code
  – Add two signed integers
• Assembly
  – Add two 4-byte integers
    • “Long” words in GCC parlance
    • Same instruction whether signed or unsigned
  – Operands:
    • x: Memory M[%ebp+8]
    • y: Register %eax (loaded by the preceding movl 12(%ebp),%eax)
    • t: Register %eax
  – Return function value in %eax
  – Similar to expression: t = y; t += x
  – More precisely:
        int eax;
        int *ebp;
        eax += ebp[2]
• Object code
  – 3-byte instruction
  – Stored at address 0x401046
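To make the ebp[2] reading concrete, the sketch below models the stack frame of sum as an array of 4-byte words, so byte offset k from %ebp becomes index k/4. It assumes the frame layout used by sum (saved %ebp at offset 0, return address at 4, x at 8, y at 12); sum_model and frame are hypothetical names, and the saved %ebp and return address are placeholder values.

```c
#include <stdint.h>

/* Hypothetical model of sum's stack frame: one array slot per 4-byte word. */
int32_t sum_model(int32_t x, int32_t y) {
    int32_t frame[4];
    frame[0] = 0;   /*  0(%ebp): saved %ebp (placeholder)      */
    frame[1] = 0;   /*  4(%ebp): return address (placeholder)  */
    frame[2] = x;   /*  8(%ebp): first argument                */
    frame[3] = y;   /* 12(%ebp): second argument               */

    const int32_t *ebp = frame;  /* %ebp points at the saved %ebp */
    int32_t eax = ebp[3];        /* movl 12(%ebp),%eax            */
    eax += ebp[2];               /* addl 8(%ebp),%eax             */
    return eax;                  /* result is returned in %eax    */
}
```

Because int32_t pointer arithmetic scales by 4, ebp[2] addresses the byte at offset 8, which is exactly the displacement in addl 8(%ebp),%eax.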
Disassembling object code
Disassembled:
    00401040 <_sum>:
       0:  55         push   %ebp
       1:  89 e5      mov    %esp,%ebp
       3:  8b 45 0c   mov    0xc(%ebp),%eax
       6:  03 45 08   add    0x8(%ebp),%eax
       9:  89 ec      mov    %ebp,%esp
       b:  5d         pop    %ebp
       c:  c3         ret
       d:  8d 76 00   lea    0x0(%esi),%esi
(the final lea does nothing; it is padding after the end of sum)
• Disassembler
  – objdump -d p
  – Useful tool for examining object code
  – Analyzes bit pattern of series of instructions
  – Produces approximate rendition of assembly code
  – Can be run on either a.out (complete executable) or .o file
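To illustrate what “analyzes bit pattern” means, the sketch below decodes the 13 bytes of sum by hand. It is deliberately tiny: it knows only the opcodes that occur in sum (0x50-0x5f push/pop, 0x89 and 0x8b mov, 0x03 add, 0xc3 ret) and only register or register-plus-8-bit-displacement operands. A real disassembler such as objdump also handles prefixes, SIB bytes, and hundreds of other opcodes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Register names in the order of their 3-bit encoding (0..7). */
static const char *const REG32[8] = {
    "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
};

/* Format the r/m operand of the ModRM byte m[0]; m[1] is an optional disp8.
   Only mod == 3 (register) and mod == 1 (disp8 + register, no SIB) are handled.
   Returns the number of bytes after the ModRM byte, or -1 if unsupported. */
static int format_rm(const unsigned char *m, size_t avail, char *out, size_t n) {
    unsigned mod = m[0] >> 6, rm = m[0] & 7;
    if (mod == 3) {
        snprintf(out, n, "%s", REG32[rm]);
        return 0;
    }
    if (mod == 1 && rm != 4 && avail >= 2) {   /* rm == 4 would need a SIB byte */
        int d = (signed char)m[1];              /* disp8 is sign-extended */
        if (d < 0)
            snprintf(out, n, "-0x%x(%s)", (unsigned)-d, REG32[rm]);
        else
            snprintf(out, n, "0x%x(%s)", (unsigned)d, REG32[rm]);
        return 1;
    }
    return -1;
}

/* Decode one instruction at p (avail bytes left) into AT&T text.
   Returns its length in bytes, or -1 for anything outside the subset. */
int decode_insn(const unsigned char *p, size_t avail, char *out, size_t n) {
    if (avail == 0) return -1;
    unsigned op = p[0];
    if (op >= 0x50 && op <= 0x57) { snprintf(out, n, "push %s", REG32[op - 0x50]); return 1; }
    if (op >= 0x58 && op <= 0x5f) { snprintf(out, n, "pop %s", REG32[op - 0x58]); return 1; }
    if (op == 0xc3) { snprintf(out, n, "ret"); return 1; }
    if ((op == 0x89 || op == 0x8b || op == 0x03) && avail >= 2) {
        const char *reg = REG32[(p[1] >> 3) & 7];   /* ModRM bits 5..3 */
        char rm[32];
        int extra = format_rm(p + 1, avail - 1, rm, sizeof rm);
        if (extra < 0) return -1;
        if (op == 0x89)      snprintf(out, n, "mov %s,%s", reg, rm);  /* r/m = reg  */
        else if (op == 0x8b) snprintf(out, n, "mov %s,%s", rm, reg);  /* reg = r/m  */
        else                 snprintf(out, n, "add %s,%s", rm, reg);  /* reg += r/m */
        return 2 + extra;
    }
    return -1;
}

/* Print one line per instruction; return the instruction count, or -1. */
int disassemble(const unsigned char *code, size_t len) {
    size_t off = 0;
    int count = 0;
    char text[64];
    while (off < len) {
        int l = decode_insn(code + off, len - off, text, sizeof text);
        if (l < 0) return -1;
        printf("%3zx: %s\n", off, text);
        off += (size_t)l;
        count++;
    }
    return count;
}
```

Fed the 13 bytes of sum, disassemble finds seven instructions at offsets 0, 1, 3, 6, 9, b, and c, the same boundaries objdump reports.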
Alternate disassembly
Disassembled:
    0x401040 <sum>:     push   %ebp
    0x401041 <sum+1>:   mov    %esp,%ebp
    0x401043 <sum+3>:   mov    0xc(%ebp),%eax
    0x401046 <sum+6>:   add    0x8(%ebp),%eax
    0x401049 <sum+9>:   mov    %ebp,%esp
    0x40104b <sum+11>:  pop    %ebp
    0x40104c <sum+12>:  ret
    0x40104d <sum+13>:  lea    0x0(%esi),%esi
• Within gdb debugger
  – gdb p
  – disassemble sum
    • Disassemble procedure
  – x/13b sum
    • Examine the 13 bytes starting at sum
Object:
    0x401040: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
What can be disassembled?
• Anything that can be interpreted as executable code
• Disassembler examines bytes and reconstructs assembly source
    % objdump -d WINWORD.EXE
    WINWORD.EXE:     file format pei-i386
    No symbols in "WINWORD.EXE".
    Disassembly of section .text:
    30001000 <.text>:
    30001000:  55               push   %ebp
    30001001:  8b ec            mov    %esp,%ebp
    30001003:  6a ff            push   $0xffffffff
    30001005:  68 90 10 00 30   push   $0x30001090
    3000100a:  68 91 dc 4c 30   push   $0x304cdc91
Summary
• Compiling into assembly
• Data types in assembly
• Assembly code operations
• Object code, and disassembling it
7.4: 32-bit x86 architecture
Integer registers (ia32)
General-purpose registers, with their 16-bit virtual registers (backwards compatibility) and the origin of each name (mostly obsolete):
    %eax   %ax (%ah, %al)   accumulate
    %ecx   %cx (%ch, %cl)   counter
    %edx   %dx (%dh, %dl)   data
    %ebx   %bx (%bh, %bl)   base
    %esi   %si              source index
    %edi   %di              destination index
    %esp   %sp              stack pointer
    %ebp   %bp              base pointer
Moving data: ia32
• movx Source, Dest
  – x in {b, w, l}
  – movl Source, Dest: Move 4-byte “long word”
  – movw Source, Dest: Move 2-byte “word”
  – movb Source, Dest: Move 1-byte “byte”
• Lots of these in typical code
Moving data: ia32
movl Source, Dest:
• Operand types
  – Immediate: Constant integer data
    • Example: $0x400, $-533
    • Like C constant, but prefixed with ‘$’
    • Encoded with 1, 2, or 4 bytes
  – Register: One of 8 integer registers
    • Example: %eax, %edx
    • But %esp and %ebp reserved for special use
    • Others have special uses for particular instructions
  – Memory: 4 consecutive bytes of memory at address given by register
    • Simplest example: (%eax)
    • Various other “address modes”
movl operand combinations
    Source   Dest   Src,Dest             C Analog
    Imm      Reg    movl $0x4,%eax       temp = 0x4;
    Imm      Mem    movl $-147,(%eax)    *p = -147;
    Reg      Reg    movl %eax,%edx       temp2 = temp1;
    Reg      Mem    movl %eax,(%edx)     *p = temp;
    Mem      Reg    movl (%eax),%edx     temp = *p;
Cannot do memory-memory transfer with a single instruction
Simple memory addressing modes
• Normal: (R) → Mem[Reg[R]]
  – Register R specifies memory address
  – movl (%ecx),%eax
• Displacement: D(R) → Mem[Reg[R]+D]
  – Register R specifies start of memory region
  – Constant displacement D specifies offset
  – movl 8(%ebp),%edx
Using simple addressing modes
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp             # Set up
        movl %esp,%ebp
        pushl %ebx
        movl 12(%ebp),%ecx     # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)
        movl -4(%ebp),%ebx     # Finish
        movl %ebp,%esp
        popl %ebp
        ret
Understanding swap
    void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }
Stack (in memory), offsets relative to %ebp:
    Offset   Contents
      12     yp
       8     xp
       4     Rtn adr
       0     Old %ebp    ← %ebp
      -4     Old %ebx
Register   Value
    %ecx   yp
    %edx   xp
    %eax   t1
    %ebx   t0
    movl 12(%ebp),%ecx    # ecx = yp
    movl 8(%ebp),%edx     # edx = xp
    movl (%ecx),%eax      # eax = *yp (t1)
    movl (%edx),%ebx      # ebx = *xp (t0)
    movl %eax,(%edx)      # *xp = eax
    movl %ebx,(%ecx)      # *yp = ebx
Understanding swap
Memory (%ebp = 0x104; offsets relative to %ebp):
    Address   Value     Offset
    0x124     123
    0x120     456
    0x11c
    0x118
    0x114
    0x110     0x120      12 (yp)
    0x10c     0x124       8 (xp)
    0x108     Rtn adr     4
    0x104                 0   ← %ebp
    0x100                -4
Register file: %edx = 0x124, %ecx = 0x120, %ebp = 0x104
    movl 12(%ebp),%ecx    # ecx = yp
    movl 8(%ebp),%edx     # edx = xp
    movl (%ecx),%eax      # eax = *yp (t1)
    movl (%edx),%ebx      # ebx = *xp (t0)
    movl %eax,(%edx)      # *xp = eax
    movl %ebx,(%ecx)      # *yp = ebx
Understanding swap
Memory: unchanged (123 at 0x124, 456 at 0x120, xp = 0x124 at 8(%ebp), yp = 0x120 at 12(%ebp))
Register file: %eax = 456, %edx = 0x124, %ecx = 0x120, %ebp = 0x104
    movl (%ecx),%eax      # eax = *yp (t1): loads 456 from address 0x120
Understanding swap
Memory: unchanged (123 at 0x124, 456 at 0x120, xp = 0x124 at 8(%ebp), yp = 0x120 at 12(%ebp))
Register file: %eax = 456, %edx = 0x124, %ecx = 0x120, %ebx = 123, %ebp = 0x104
    movl (%edx),%ebx      # ebx = *xp (t0): loads 123 from address 0x124
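The walk-through above stops after the two loads. As a sketch, the C code below replays all six body instructions on a toy memory indexed by address, using the addresses and values from the figure. The names MEM_SIZE, mem, load32, store32, and swap_trace are hypothetical, and the 4-byte accesses are written out in little-endian order, as on x86.

```c
#include <stdint.h>

enum { MEM_SIZE = 0x128 };       /* covers addresses 0x000..0x127 */
static uint8_t mem[MEM_SIZE];    /* toy byte-addressable memory   */

/* Little-endian 4-byte load/store, as x86 does. */
static uint32_t load32(uint32_t addr) {
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}
static void store32(uint32_t addr, uint32_t v) {
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* Sets up the frame from the figure, then replays swap's body. */
void swap_trace(void) {
    uint32_t ebp = 0x104, eax, ebx, ecx, edx;
    store32(0x124, 123);         /* *xp                   */
    store32(0x120, 456);         /* *yp                   */
    store32(ebp + 8, 0x124);     /* xp at 8(%ebp) = 0x10c  */
    store32(ebp + 12, 0x120);    /* yp at 12(%ebp) = 0x110 */

    ecx = load32(ebp + 12);      /* movl 12(%ebp),%ecx   ecx = yp = 0x120 */
    edx = load32(ebp + 8);       /* movl 8(%ebp),%edx    edx = xp = 0x124 */
    eax = load32(ecx);           /* movl (%ecx),%eax     eax = 456        */
    ebx = load32(edx);           /* movl (%edx),%ebx     ebx = 123        */
    store32(edx, eax);           /* movl %eax,(%edx)     *xp = 456        */
    store32(ecx, ebx);           /* movl %ebx,(%ecx)     *yp = 123        */
}
```

After swap_trace runs, address 0x124 holds 456 and address 0x120 holds 123: the values have been exchanged, while xp and yp in the frame are untouched.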
Complete memory addressing modes
• Most general form:
    D(Rb,Ri,S)   Mem[Reg[Rb]+S*Reg[Ri]+D]
  – D: Constant “displacement” of 1, 2, or 4 bytes
  – Rb: Base register: any of 8 integer registers
  – Ri: Index register: any, except for %esp
    • Unlikely you’d use %ebp, either
  – S: Scale: 1, 2, 4, or 8 (why these numbers?)
• Special cases
    (Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]]
    D(Rb,Ri)     Mem[Reg[Rb]+Reg[Ri]+D]
    (Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]]
Address computation examples
    %edx = 0xf000
    %ecx = 0x100

    Expression       Address computation   Address
    0x8(%edx)        0xf000 + 0x8          0xf008
    (%edx,%ecx)      0xf000 + 0x100        0xf100
    (%edx,%ecx,4)    0xf000 + 4*0x100      0xf400
    0x80(,%edx,2)    2*0xf000 + 0x80       0x1e080
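The general form D(Rb,Ri,S) is plain arithmetic, so it can be sketched as a C function. This is a model only: effective_address is a hypothetical name, and an absent base or index register is represented by passing 0. Unsigned 32-bit arithmetic wraps modulo 2^32, like a 32-bit address computation.

```c
#include <stdint.h>

/* Hypothetical model of D(Rb,Ri,S) = Reg[Rb] + S*Reg[Ri] + D.
   Pass 0 for a missing base or index; scale should be 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale,
                           int32_t disp) {
    /* A negative displacement converts to the equivalent unsigned offset,
       so the sum wraps around exactly as a 32-bit adder would. */
    return base + scale * index + (uint32_t)disp;
}
```

For example, effective_address(0xf000, 0x100, 4, 0) yields 0xf400, the third row of the table, and effective_address(0, 0xf000, 2, 0x80) yields 0x1e080.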
Address computation instruction
• leal Src,Dest
  – Src is an address mode expression
  – Set Dest to the address denoted by the expression
• Uses
  – Computing addresses without a memory reference
    • E.g., translation of p = &x[i];
  – Computing arithmetic expressions of the form x + k*y
    • k = 1, 2, 4, or 8
Summary
• 32-bit x86 registers
• mov instruction: loads and stores
• Memory addressing modes
  – Example: swap()
• leal: address computation
7.5: ia32 integer arithmetic
Some arithmetic operations
• Two-operand instructions:
    Format            Computation
    addl Src,Dest     Dest ← Dest + Src
    subl Src,Dest     Dest ← Dest - Src
    imull Src,Dest    Dest ← Dest * Src
    sall Src,Dest     Dest ← Dest << Src    Also called shll
    sarl Src,Dest     Dest ← Dest >> Src    Arithmetic
    shrl Src,Dest     Dest ← Dest >> Src    Logical
    xorl Src,Dest     Dest ← Dest ^ Src
    andl Src,Dest     Dest ← Dest & Src
    orl Src,Dest      Dest ← Dest | Src
• No distinction between signed and unsigned int (why?)
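One way to see the answer to “why?”: with two's-complement encoding, signed and unsigned addition produce the same bit pattern, so one addl serves both. Right shifts are the exception, which is why the table lists both sarl (fill with copies of the sign bit) and shrl (fill with zeros). The sketch below models these on 32-bit values; the helper names are hypothetical, and the arithmetic shift is written portably because >> on a negative signed int is implementation-defined in C.

```c
#include <stdint.h>

/* addl on raw 32-bit patterns: the same operation serves signed and unsigned. */
uint32_t addl_bits(uint32_t a, uint32_t b) {
    return a + b;               /* wraps mod 2^32 */
}

/* shrl: logical right shift, vacated bits filled with 0 (n in 0..31). */
uint32_t shrl(uint32_t x, unsigned n) {
    return x >> n;
}

/* sarl: arithmetic right shift, vacated bits copy the sign bit (n in 0..31).
   For negative x, ~x is non-negative, so shifting it is well defined. */
int32_t sarl(int32_t x, unsigned n) {
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

For instance, the bit pattern of -3 plus 5 gives the pattern of 2 whether the operands are read as signed or unsigned, while shifting the pattern of -16 right by 2 gives -4 with sarl but 0x3ffffffc with shrl.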
Some arithmetic operations
• One-operand instructions:
    Format       Computation
    incl Dest    Dest ← Dest + 1
    decl Dest    Dest ← Dest - 1
    negl Dest    Dest ← -Dest
    notl Dest    Dest ← ~Dest
• See book for more instructions
Using leal for arithmetic expressions
    int arith(int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    arith:
        pushl %ebp                 # Set up
        movl %esp,%ebp
        movl 8(%ebp),%eax          # Body
        movl 12(%ebp),%edx
        leal (%edx,%eax),%ecx
        leal (%edx,%edx,2),%edx
        sall $4,%edx
        addl 16(%ebp),%ecx
        leal 4(%edx,%eax),%eax
        imull %ecx,%eax
        movl %ebp,%esp             # Finish
        popl %ebp
        ret
Understanding arith
    int arith(int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; }

    movl 8(%ebp),%eax          # eax = x
    movl 12(%ebp),%edx         # edx = y
    leal (%edx,%eax),%ecx      # ecx = x+y (t1)
    leal (%edx,%edx,2),%edx    # edx = 3*y
    sall $4,%edx               # edx = 48*y (t4)
    addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
    imull %ecx,%eax            # eax = t5*t2 (rval)
Stack, offsets relative to %ebp:
    Offset   Contents
      16     z
      12     y
       8     x
       4     Rtn adr
       0     Old %ebp    ← %ebp
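As a check on the commented translation, the sketch below mirrors it register by register in C and can be compared against the original arith. arith_asm is a hypothetical name; the registers become local uint32_t variables so that wraparound in sall and imull is well defined, matching the two's-complement behavior of the hardware.

```c
#include <stdint.h>

/* The C function from the slide (inputs assumed small enough not to overflow int). */
int arith(int x, int y, int z) {
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Register-level mirror of the compiled body; unsigned arithmetic wraps mod 2^32. */
int32_t arith_asm(int32_t x, int32_t y, int32_t z) {
    uint32_t eax, ecx, edx;
    eax = (uint32_t)x;      /* movl 8(%ebp),%eax        eax = x       */
    edx = (uint32_t)y;      /* movl 12(%ebp),%edx       edx = y       */
    ecx = edx + eax;        /* leal (%edx,%eax),%ecx    ecx = t1      */
    edx = edx + edx * 2;    /* leal (%edx,%edx,2),%edx  edx = 3*y     */
    edx <<= 4;              /* sall $4,%edx             edx = 48*y    */
    ecx += (uint32_t)z;     /* addl 16(%ebp),%ecx       ecx = t2      */
    eax = edx + eax + 4;    /* leal 4(%edx,%eax),%eax   eax = t5      */
    eax *= ecx;             /* imull %ecx,%eax          eax = rval    */
    return (int32_t)eax;    /* reinterpret the bits as a signed value */
}
```

For x = 1, y = 2, z = 3 both versions compute t2 = 6 and t5 = 1 + 4 + 96 = 101, giving 606. Note that the compiled code never computes t3 on its own: one leal folds x + 4 + t4 into a single step.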
Another example
    int logical(int x, int y)
    {
        int t1 = x^y;
        int t2 = t1 >> 17;
        int mask = (1<<13) - 7;
        int rval = t2 & mask;
        return rval;
    }

    logical:
        pushl %ebp            # Set up
        movl %esp,%ebp
        movl 8(%ebp),%eax     # Body
        xorl 12(%ebp),%eax
        sarl $17,%eax
        andl $8185,%eax
        movl %ebp,%esp        # Finish
        popl %ebp
        ret

    movl 8(%ebp),%eax     # eax = x
    xorl 12(%ebp),%eax    # eax = x^y (t1)
    sarl $17,%eax         # eax = t1>>17 (t2)
    andl $8185,%eax       # eax = t2 & 8185
2^13 = 8192, 2^13 - 7 = 8185: the compiler computes mask at compile time, so it appears only as the constant $8185
7.6: 64-bit x86 architecture
Data representations: ia32 and x86-64
Sizes of C objects (in bytes)
    C data type                     Typical 32-bit   ia32    Intel x86-64
    char                            1                1       1
    short                           2                2       2
    int                             4                4       4
    long                            4                4       8
    long long                       8                8       8
    float                           4                4       4
    double                          8                8       8
    long double                     8                10/12   10/16
    char * (or any other pointer)   4                4       8
x86-64 integer registers
    64-bit   low 32 bits        64-bit   low 32 bits
    %rax     %eax               %r8      %r8d
    %rbx     %ebx               %r9      %r9d
    %rcx     %ecx               %r10     %r10d
    %rdx     %edx               %r11     %r11d
    %rsi     %esi               %r12     %r12d
    %rdi     %edi               %r13     %r13d
    %rsp     %esp               %r14     %r14d
    %rbp     %ebp               %r15     %r15d
• Extend existing registers. Add 8 new ones.
• Make %ebp/%rbp general purpose
Instructions
• Long word l (4 bytes) ↔ Quad word q (8 bytes)
• New instructions:
  – movl → movq
  – addl → addq
  – sall → salq
  – etc.
• 32-bit instructions that generate 32-bit results
  – Set higher-order bits of destination register to 0
  – Example: addl
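A sketch of this rule in C, modelling a 64-bit register as a uint64_t (write32, write8, and addl_model are hypothetical names): a 32-bit result such as that of addl replaces the whole register with its zero-extended value, whereas writing an 8-bit register such as %al (and likewise a 16-bit one) leaves the remaining bits of the full register unchanged.

```c
#include <stdint.h>

/* Writing a 32-bit register (e.g. the destination of addl) zero-extends
   into the full 64-bit register: the upper 32 bits become 0. */
uint64_t write32(uint64_t reg, uint32_t value) {
    (void)reg;                  /* old contents are entirely replaced */
    return (uint64_t)value;
}

/* Writing an 8-bit register such as %al only replaces the low byte. */
uint64_t write8(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xff) | value;
}

/* Model of "addl %esi,%eax" on the 64-bit registers %rax and %rsi. */
uint64_t addl_model(uint64_t rax, uint64_t rsi) {
    uint32_t sum = (uint32_t)rax + (uint32_t)rsi;   /* 32-bit add, wraps */
    return write32(rax, sum);
}
```

So even if %rax starts as 0xffffffff00000001, adding 2 with addl leaves %rax = 3: the old upper half is gone.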
Swap in 32-bit mode
    void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }

    swap:
        pushl %ebp             # Setup
        movl %esp,%ebp
        pushl %ebx
        movl 12(%ebp),%ecx     # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)
        movl -4(%ebp),%ebx     # Finish
        movl %ebp,%esp
        popl %ebp
        ret
Swap in 64-bit mode
• Operands passed in registers (why useful?)
  – First (xp) in %rdi, second (yp) in %rsi
  – 64-bit pointers
• No stack operations required
• 32-bit data
  – Data held in registers %eax and %edx
  – movl operation
    void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }

    swap:
        movl (%rdi), %edx
        movl (%rsi), %eax
        movl %eax, (%rdi)
        movl %edx, (%rsi)
        retq
Swap long ints in 64-bit mode
• 64-bit data
  – Data held in registers %rax and %rdx
  – movq operation
  – “q” stands for quad-word
    void swap_l(long int *xp, long int *yp) { long int t0 = *xp; long int t1 = *yp; *xp = t1; *yp = t0; }

    swap_l:
        movq (%rdi), %rdx
        movq (%rsi), %rax
        movq %rax, (%rdi)
        movq %rdx, (%rsi)
        retq
7.7: Condition codes
Processor state (ia32, partial)
• Information about the currently executing program
  – Temporary data (%eax, …)
  – Location of runtime stack (%ebp, %esp)
  – Location of current code control point (%eip, …)
  – Status of recent tests (CF, ZF, SF, OF)
[Figure: general-purpose registers %eax, %ecx, %edx, %ebx, %esi, %edi; %esp (current stack top); %ebp (current stack frame); %eip (instruction pointer); condition codes CF, ZF, SF, OF.]
Condition codes (implicit setting)
• Single-bit registers
    CF   Carry Flag (for unsigned)
    SF   Sign Flag (for signed)
    ZF   Zero Flag
    OF   Overflow Flag (for signed)
• Implicitly set (think of it as a side effect) by arithmetic operations
  Example: addl/addq Src,Dest ↔ t = a+b
  – CF set if carry out from most significant bit (unsigned overflow)
  – ZF set if t == 0
  – SF set if t < 0 (as signed)
  – OF set if two’s-complement (signed) overflow:
    (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
• Not set by lea instruction
• Full documentation link on course website
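The four rules above can be written out directly. The sketch below (hypothetical names; a model of the rules, not of how the hardware wires them) computes CF, ZF, SF, and OF for a 32-bit addl, taking the operands as raw bit patterns so that no C signed overflow occurs. OF is written as “both inputs have the same sign and the result has the other sign”, which is equivalent to the formula above.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags after addl: t = a + b on 32-bit values. */
struct flags addl_flags(uint32_t a, uint32_t b) {
    uint32_t t = a + b;                            /* wraps mod 2^32 */
    bool a_neg = a >> 31, b_neg = b >> 31, t_neg = t >> 31;
    struct flags f;
    f.cf = t < a;                                  /* carry out of bit 31 */
    f.zf = t == 0;
    f.sf = t_neg;                                  /* t < 0 as signed */
    f.of = (a_neg == b_neg) && (t_neg != a_neg);   /* signed overflow */
    return f;
}
```

For example, 0x7fffffff + 1 sets SF and OF but not CF, while 0xffffffff + 1 sets CF and ZF but not OF: the same addition overflows in one interpretation and not the other.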
Condition codes (explicit setting: compare)
• Explicit setting by compare instruction
    cmpl/cmpq Src2,Src1
    cmpl b,a is like computing a-b without setting a destination
  – CF set if the subtraction needs a borrow, i.e., a < b as unsigned (used for unsigned comparisons)
  – ZF set if a == b
  – SF set if (a-b) < 0 (as signed)
  – OF set if two’s-complement (signed) overflow:
    (a>=0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
Condition codes (explicit setting: test)
• Explicit setting by test instruction
    testl/testq Src2,Src1
    testl b,a is like computing a&b without setting a destination
  – Sets condition codes based on value of Src1 & Src2
  – Useful to have one of the operands be a mask
  – ZF set when a&b == 0
  – SF set when a&b < 0
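A common use is testl %eax,%eax, which ANDs a register with itself to check whether it is zero or negative. The sketch models only the two flags listed above, for 32-bit operands; testl_flags and struct test_flags are hypothetical names. (On real hardware, test also clears CF and OF; the sketch leaves them out.)

```c
#include <stdbool.h>
#include <stdint.h>

struct test_flags { bool zf, sf; };

/* Flags after testl b,a: computed from a & b, which is then discarded. */
struct test_flags testl_flags(uint32_t a, uint32_t b) {
    uint32_t t = a & b;
    struct test_flags f = { t == 0, (t >> 31) != 0 };   /* SF = sign bit of a&b */
    return f;
}
```

With a mask as one operand, ZF answers “are all the masked bits zero?”; for example, testing 0xf0 against the mask 0x0f sets ZF.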
Reading condition codes
• SetX instructions
  – Set single byte based on combinations of condition codes
    SetX    Condition        Description
    sete    ZF               Equal / Zero
    setne   ~ZF              Not Equal / Not Zero
    sets    SF               Negative
    setns   ~SF              Nonnegative
    setg    ~(SF^OF)&~ZF     Greater (Signed)
    setge   ~(SF^OF)         Greater or Equal (Signed)
    setl    (SF^OF)          Less (Signed)
    setle   (SF^OF)|ZF       Less or Equal (Signed)
    seta    ~CF&~ZF          Above (unsigned)
    setb    CF               Below (unsigned)
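To see why the signed conditions use SF^OF rather than SF alone, the sketch below computes the flags of cmpl b,a (a model with hypothetical names) and evaluates a few setX conditions from them exactly as the table defines them. When a-b overflows, the sign of the wrapped result is wrong, and OF flips it back; the test compares the results with C's own comparisons over values that include such overflows.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags { bool cf, zf, sf, of; };

/* Flags after cmpl b,a, i.e. for t = a - b (result discarded). */
struct cmp_flags cmpl_flags(uint32_t a, uint32_t b) {
    uint32_t t = a - b;                            /* wraps mod 2^32 */
    bool a_neg = a >> 31, b_neg = b >> 31, t_neg = t >> 31;
    struct cmp_flags f;
    f.cf = a < b;                                  /* borrow: unsigned a < b */
    f.zf = t == 0;
    f.sf = t_neg;
    f.of = (a_neg != b_neg) && (t_neg != a_neg);   /* signed overflow of a-b */
    return f;
}

bool setl(struct cmp_flags f) { return f.sf ^ f.of; }              /* signed a < b   */
bool setg(struct cmp_flags f) { return !(f.sf ^ f.of) && !f.zf; }   /* signed a > b   */
bool setb(struct cmp_flags f) { return f.cf; }                      /* unsigned a < b */
bool seta(struct cmp_flags f) { return !f.cf && !f.zf; }            /* unsigned a > b */
```

For example, with a = INT32_MAX and b = -1, a-b wraps to a negative value (SF = 1), but OF = 1 as well, so setl correctly reports that a is not less than b.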
Reading condition codes (cont.)
• setX instructions:
  – Set single byte based on combination of condition codes
• One of 8 addressable byte registers
  – Does not alter remaining 3 bytes
  – Typically use movzbl to finish job
    int gt(int x, int y)
    {
        return x > y;
    }

    Body:
        movl 12(%ebp),%eax    # eax = y
        cmpl %eax,8(%ebp)     # Compare x : y
        setg %al              # al = x > y
        movzbl %al,%eax       # Zero rest of %eax
[Figure: the byte registers %al/%ah, %cl/%ch, %dl/%dh, %bl/%bh are the low two bytes of %eax, %ecx, %edx, %ebx.]
Reading condition codes: x86-64
• setX instructions:
  – Set single byte based on combination of condition codes
  – Does not alter remaining 7 bytes of the 64-bit register
    int gt(long x, long y)
    {
        return x > y;
    }

    long lgt(long x, long y)
    {
        return x > y;
    }

    Body (same for both):
        xorl %eax, %eax     # eax = 0
        cmpq %rsi, %rdi     # Compare x and y
        setg %al            # al = x > y
Is all of %rax zero after xorl %eax,%eax? Yes: 32-bit instructions set the high-order 32 bits to 0!
Jumping
• jX instructions
  – Jump to different part of code depending on condition codes
    jX     Condition        Description
    jmp    1                Unconditional
    je     ZF               Equal / Zero
    jne    ~ZF              Not Equal / Not Zero
    js     SF               Negative
    jns    ~SF              Non-negative
    jg     ~(SF^OF)&~ZF     Greater (Signed)
    jge    ~(SF^OF)         Greater or Equal (Signed)
    jl     (SF^OF)          Less (Signed)
    jle    (SF^OF)|ZF       Less or Equal (Signed)
    ja     ~CF&~ZF          Above (unsigned)
    jb     CF               Below (unsigned)
Summary
• Condition codes (C, Z, S, O)
• Explicit setting of condition codes
  – Compare
  – Test
• Reading condition codes
  – setX
• Jumps