Computer Architecture
With basic OS services
Karst Koymans
Informatics Institute
University of Amsterdam
(version 17.33, 2017/09/15 13:02:54)
Tuesday, September 12, 2017
Karst Koymans (UvA) Computer Architecture Tuesday, September 12, 2017 1 / 84
1 Terminal usage
2 Execution of a command line
3 Inspecting processes and executables
4 The gcc compiler
5 Processor architectures
History of Intel x86
6 Assembly language
Assembly examples
Terminal usage
Terminal connected to a mainframe
Serial connection
ASCII over RS232 using a TTY (teletypewriter; teleprinter)
Teletype Model 33 ASR (1963)
Before ASCII, the 5-bit Baudot code was used (Model 32)
Remote connections used a modem (modulator/demodulator)
Physical significance of CRLF
Terminal: Teletype Model 33 ASR
Source: https://en.wikipedia.org/wiki/File:Teletype-IMG_7287.jpg
Terminals under X
Modern “terminals” are virtual
An xterm process runs locally and communicates
with the X server for (screen) output and (keyboard) input
usually via TCP/IP (or pipe or Unix domain socket)
and also with some program (for instance a shell)
using the terminal via a “virtual serial line”
A virtual terminal or pseudoterminal has two sides
Master: /dev/ptmx
Slaves: /dev/pts/0, /dev/pts/1, . . .
The master loops back to the slave and vice versa
The client program just sees the “serial line”
Terminals over the network
A remote connection of a virtual terminal introduces
a second virtual terminal on the remote computer
For instance ssh (secure shell) connects to a daemon program (sshd)
which starts another program (often again a shell)
with its own (remote) pseudoterminal
As a result, two (virtual) terminals are involved
With all related issues (TERM, locales, . . . )
One real and one virtual terminal in an ssh session
Source: https://cse.sc.edu/~matthews/Courses/510/
Note: the first terminal could also be made virtual/pseudo
Execution of a command line
What happens when a user types a command?
We work in the Linux environment
Other OSs are based on similar principles
We suppose we work in one of the standard “shells”
Most people use bash
I like zsh
As an example command take
ls $HOME/*.txt
Interpretation of a command by the shell
The shell (sh, bash, dash, ksh, csh, tcsh, zsh)
reads a line from its STDIN, for instance: “ls $HOME/*.txt”
The shell processes this input
variable expansion (local and from environment)
$HOME −→ /home/stephen
pathname expansion via shell matching
*.txt −→ a.txt b.txt
. . . (more expansions)
The final result is a word list: arg0 arg1 . . . argn
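The expansion steps above can be tried from C with the POSIX wordexp(3) routine, which performs shell-style variable and pathname expansion and returns a word list. This is only a sketch: expand_line is a hypothetical helper name, and real shells use their own expansion code rather than wordexp.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Expand one command line the way a POSIX shell would (variables,
 * pathname patterns, quoting) and copy up to max words into out[].
 * Returns the number of words stored, or -1 on error.
 * expand_line is a hypothetical helper, not part of any shell. */
int expand_line(const char *line, char *out[], int max)
{
    wordexp_t we;

    /* WRDE_NOCMD: treat command substitution $(...) as an error */
    if (wordexp(line, &we, WRDE_NOCMD) != 0)
        return -1;

    int n = 0;
    for (size_t i = 0; i < we.we_wordc && n < max; i++)
        out[n++] = strdup(we.we_wordv[i]);   /* caller frees each word */

    wordfree(&we);
    return n;
}
```

Calling expand_line("ls $HOME/*.txt", words, 16) would yield arg0 = "ls" followed by one word per matching file (or the unexpanded pattern if nothing matches).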
Search for a command in the file system
Search for arg0 (ls) using the $PATH environment variable
zsh uses the stat system call, before using execve
sh uses the execve system call right away
So sh “searches” just by attempting to execute
How do we know all this?
Reading source code and/or
Executing the strace command
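A minimal sketch of the "stat first" search strategy described for zsh, assuming a colon-separated directory list in the same layout as $PATH. The function name find_in_path and its signature are made up for illustration and are not taken from any shell's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look for `name` in each directory of the colon-separated `path`,
 * checking every candidate with stat(2) before anything is executed.
 * On success the full pathname is written to out and 0 is returned;
 * otherwise -1 is returned and the contents of out are unspecified. */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *start = path;

    for (;;) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        /* An empty entry in the list means the current directory. */
        const char *dir = len ? start : ".";
        int dlen = len ? (int)len : 1;

        int n = snprintf(out, outlen, "%.*s/%s", dlen, dir, name);
        struct stat sb;
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &sb) == 0 && S_ISREG(sb.st_mode) &&
            access(out, X_OK) == 0)
            return 0;

        if (!end)
            break;
        start = end + 1;
    }
    return -1;
}
```

The sh strategy from the slide would skip the stat call and simply try execve on each candidate in turn until one succeeds.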
The strace command
strace
Uses the ptrace (process trace) system call
strace -e trace=process <<command>>
Trace system calls related to processes
strace -e trace=file <<command>>
Trace system calls using file names
strace -e trace=network <<command>>
Trace network related system calls
strace -e trace=memory <<command>>
Trace memory related system calls
Search for a command in the file system
The stat system call
int stat(const char *pathname, struct stat *buf);
Struct stat contains fields like
st_dev st_ino
st_mode st_nlink
st_uid st_gid
st_size st_atim
st_mtim st_ctim
Search for a command in the file system
Before executing execve, the shell forks
Otherwise it won’t survive the execve
pid_t fork(void); /* fresh (copy of) memory */
fork() creates a new process
The fork system call is “expensive” because of lots of copying (page tables, even with copy-on-write)
Therefore new system calls were introduced: vfork, clone
vfork() is an ugly hack (legacy), which needs careful application
clone() is a more recent improvement with fine-grained control
The vfork and clone system calls
long clone(unsigned long flags, ...);
clone creates a new PID, but in fact this is a thread
TGID (thread group id) is the “parent thread”
Also getpid reports the TGID, not the actual PID
Again a great Linux hack to fool us all (legacy)
Flags fine-tune resource sharing between parent and child
pid_t vfork(void); /* shared memory with parent */
This is a special case of clone() with the flags
CLONE_VM | CLONE_VFORK | SIGCHLD
The execve system call
Now finally the child process executes the execve system call
int execve(const char *filename,
char *const argv[],
char *const envp[]);
filename is the (usually fully qualified) pathname of the executable
argv is the argument vector (arg0, arg1, ... argn)
envp is the environment (array of “key=value” strings)
The parent uses the wait system call to wait for the child to finish
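The fork, execve and wait sequence from the last few slides can be sketched in a few lines of C. The helper name run_and_wait is hypothetical; a real shell also handles signals, job control and redirections, all of which are left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork, let the child execve argv[0], and wait for it in the parent.
 * Returns the child's exit status (0..255), or -1 on error or if the
 * child was terminated by a signal. No $PATH search is done here:
 * argv[0] must already be a pathname. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(argv[0], argv, environ);  /* only returns on failure */
        _exit(127);                      /* conventional "cannot execute" status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running this under strace -e trace=process shows which system calls the C library actually issues for fork and wait on a given system.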
Executable file formats
Many different formats
Some examples
a.out (“assembler output”, oldest Unix format)
COFF (“Common Object File Format”, Unix System V)
ELF (“Executable and Linkable Format”, SVR4)
Mach-O (“Mach Object”, NeXTSTEP, macOS)
PE/PE32+ (“Portable Executable”, Windows)
which is a modified kind of COFF
Executables
Executable files come in two forms
A binary executable, on Linux
ELF format (Executable and Linkable Format)
The older a.out format is deprecated
An interpreter script/file
The first line is of the form “#! interpreter [optionalargument]”
interpreter [optionalargument] filename arg1 ... argn
is what finally gets executed
interpreter must (1) be a binary (executable)
(1) Since Linux 2.6.28 it seems four-level-deep recursion is allowed
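The two forms of executable can be told apart by their first bytes: ELF files start with the magic bytes 0x7f 'E' 'L' 'F', interpreter scripts with "#!". The sketch below only illustrates that check; the enum and the function name classify_header are invented for this example and do not mirror the kernel's own code.

```c
#include <stddef.h>
#include <string.h>

enum exec_kind { EXEC_UNKNOWN, EXEC_ELF, EXEC_SCRIPT };

/* Classify a file by the first bytes of its contents. */
enum exec_kind classify_header(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return EXEC_ELF;        /* binary executable in ELF format */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return EXEC_SCRIPT;     /* interpreter script */
    return EXEC_UNKNOWN;
}
```

To use it on a real file, read the first few bytes with fread and pass the buffer and the number of bytes read.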
ELF (Executable and Linkable Format)
Generated by the compiler (gcc, cc, ...)
gcc/cc1 (with assembler (as) and linker (ld)) does all the hard work
gcc generates machine code with supporting information
entry point of executable (_start)
the dynamic linking loader to use (.interp section)
readelf -p .interp <<binary>>
The (“ld.so”) dynamic linker/loader is itself statically linked
Shared objects are loaded in a fixed place in physical memory
but in a different virtual memory location per process
using position independent code (PIC)
managed by the dynamic linker/loader during runtime (lazily)
Inspecting processes and executables
The ps command
ps has a weird mix of options (legacy)
-- (GNU long format)
- (Traditional Unix format)
(BSD format; no hyphen)
which can even be mixed...
Examples
ps --format comm,pid,ppid,args,psr
ps -eFww
ps xf
See also the pstree command
The readelf command
The readelf command shows the structure of ELF binaries
readelf -We <<binary>>
Show ELF header, program (segment) headers and section headers
readelf -Wx <<section>> <<binary>>
Hex dump «section»
readelf -Wp <<section>> <<binary>>
Print strings from «section»
Sections are used for the linking phase (relocation)
Segments are used for loading during runtime (program execution)
Dynamic linker uses .got and .plt sections
The objdump command
The objdump command has capabilities like readelf,
but can also disassemble!
objdump -wf <<binary>>
Show part of ELF header of «binary»
objdump -wdj <<section>> <<binary>>
Only show «section» disassembly from «binary»
The nm and strip commands
The nm command lists symbols defined in an object file
A symbol is a symbolic name for the memory address
of some function or a piece of data
The strip command removes symbols from object files
Once linked, many symbols are not needed any more in the final binary
Except relocation information for dynamically linked shared libraries
The main application for strip is reducing size and hiding content
The ldd and “ld.so” commands
The ldd command shows shared object dependencies
ldd calls the “ld.so” dynamic linker/loader, which is actually
/lib64/ld-linux-x86-64.so.2 −→ (symlink)
/lib/x86_64-linux-gnu/ld-2.24.so (or similar)
with the LD_TRACE_LOADED_OBJECTS environment variable set
Be careful with using ldd on untrusted binaries
Some parts of the binary might get executed
The lsof command
The lsof command is a Swiss Army knife showing open files
It also can show the shared objects loaded into the address space
lsof -p $$
lsof has many more interesting options, for instance
lsof -i
listing Internet address use
lsof <<file>>
listing all processes using «file»
The /proc process pseudo-filesystem
The /proc file system provides an interface to kernel data
cat /proc/$$/environ | tr '\0' '\n'
shows environment of the current shell ($$)
where $$ can be replaced by any «pid»
cat /proc/$$/maps
shows currently mapped memory regions
cat /proc/$$/status
shows process status
cat /proc/mounts
shows currently mounted filesystems
The gcc compiler
An example program
Example C program that prints “Hello OS3!”
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("Hello OS3!\n");
return 2017;
/* What is the actual exit status? Why? */
}
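The question in the comment can be checked experimentally. On Linux only the low 8 bits of the value passed to exit are reported to the parent, so 2017 is observed as 2017 & 0xFF = 225. The sketch below demonstrates this with a forked child; observed_exit_status is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that terminates with `code` and return the exit status
 * the parent observes through waitpid: only the low 8 bits survive.
 * Returns -1 if fork or waitpid fails. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a shell the same value shows up in $? after running the compiled hello program.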
Compiling the example program
gcc hello.c
generates an executable named “a.out” but in ELF format
gcc -o hello hello.c
generates an executable named “hello” in ELF format
gcc -v -o hello hello.c
shows the different stages of the compilation
Preprocessing
Option “-E” only preprocesses your source code
Header file inclusion and macro processing
cpp is a separate command to do preprocessing
The gcc driver calls cc1 to do the preprocessing
cc1 uses an internal cpp routine
gcc -E -o hello.i hello.c
Compilation
Option “-S” compiles your source code into assembly
Output is text file with assembly instructions
No binary output yet
gcc -S -o hello.s hello.i
The gcc driver now calls cc1 to do the compilation proper
To eliminate the exception handling frames
add the “-fno-asynchronous-unwind-tables” option
Assembly
Option “-c” generates object (binary) code from assembly
gcc -c -o hello.o hello.s
Skips the linking phase
Generates a relocatable object file
The gcc driver calls as to do the assembly
as --64 -o hello.o hello.s
This example is for a 64-bit system
Linking
The standard invocation of gcc finally links the executable
gcc -o hello hello.o
Implemented by collect2
which itself calls the real linker ld
ld may combine several object files
common relocatable files (*.o)
static libraries, archives (*.a)
archives are created by the ar command
Useful gcc options
To save all intermediate files invoke gcc as follows
gcc -save-temps -o hello hello.c
To show gcc driver steps without executing
gcc -### -o hello hello.c
To show gcc driver step details
gcc -dumpspecs
Dynamic libraries
Result of the linking phase
a static (2) executable
which can run by itself
a dynamically linked executable
which needs Shared Objects (.so) to run
Shared objects
use Position Independent Code (PIC)
are loaded and linked at runtime by the dynamic linker/loader
(2) Almost nothing is statically linked nowadays (even libc needs dynamic linking)
C runtime
C startup files added by the gcc driver and given to the linker
crt1.o, crti.o, crtbegin.o, crtend.o, crtn.o
Entry point is the routine _start (in crt1.o)
After initialization main is called
int main(int argc, char *argv[]) (official standard)
The environment is given by extern char **environ;
int main(int argc, char *argv[], char *envp[]) (Linux)
After the environment there is still more information
for the dynamic linker (also see getauxval(3))
C runtime special kernel support
On startup the kernel supplies a special shared object
called vDSO (“virtual Dynamic Shared Object”)
which is used for performance reasons with very frequent,
but easy and uniform system calls like gettimeofday(2)
ldd shows the vDSO as
linux-vdso.so.1 (on x86_64, 64-bit)
linux-gate.so.1 (on i386, 32-bit)
The dynamic linker finds the vDSO via the auxiliary vector
which the kernel puts on the stack right after the environment (at higher addresses)
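The auxiliary vector can be queried from C with getauxval(3) from glibc. The sketch below reads two entries: AT_PAGESZ (the system page size) and AT_SYSINFO_EHDR (the address where the kernel mapped the vDSO). The two function names are hypothetical wrappers added for illustration.

```c
#include <sys/auxv.h>
#include <unistd.h>

/* Page size as reported by the kernel in the auxiliary vector. */
unsigned long page_size_from_auxv(void)
{
    return getauxval(AT_PAGESZ);
}

/* Address of the vDSO's ELF header, or 0 if the kernel supplied none. */
unsigned long vdso_base_from_auxv(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

The value returned by vdso_base_from_auxv() should fall inside the [vdso] region listed in /proc/self/maps.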
Processor architectures
Central processors
Many different (micro)processors (CPUs) and architectures
CISC
x86 (Intel)
x86-64 (AMD/Intel)
z/Architecture (IBM)
RISC
Alpha (DEC)
ARM/ARMv8 (ARM)
MIPS (MIPS)
PowerPC (Apple/IBM/Motorola)
SPARC (Sun)
Other
Itanium/IA-64 (HP/Intel)
Basic CPU operation
Operation of a CPU, repeat
Fetch instruction
Decode instruction
Execute instruction
Structure of a CPU
Registers (local memory)
ALU (Arithmetic Logic Unit)
CU (Control Unit)
Hardwired
Microprogrammed
MMU (Memory Management Unit; external memory)
Caches
Processor architectures History of Intel x86
Intel 8008
General registers (8-bit)
A(ccumulator), B, C, D, E
Indirect addressing registers (8-bit)
H(igh), L(ow) used as a 14-bit pair H∥L (concatenation)
Program counter (14-bit)
PC (with push-down stack of size 7 14-bit registers)
Status register (8-bit) with flags
C(arry), P(arity), Z(ero), S(ign)
Intel 8008 registers
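[Figure: Intel 8008 register layout. A, B, C, D, E: bits 7..0. Pair H∥L: H in bits 13..8, L in bits 7..0. Flags: S (bit 7), Z (bit 6), P (bit 2), C (bit 0). PC: bits 13..0.]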
Intel 8080
General registers (8-bit)
A(ccumulator), B, C, D, E
Concatenations B∥C and D∥E as 16-bit registers
Indirect addressing registers (8-bit)
H(igh), L(ow) used as a 16-bit pair H∥L
Program counter and stack pointer (16-bit)
PC, SP
Status register (8-bit) with flags
C(arry), P(arity), Z(ero), S(ign), A(djust) (3)
(3) Adjust is the Auxiliary Carry of the 4 lowest bits, for BCD
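Intel 8080 registers
[Figure: Intel 8080 register layout. Accumulator A: 8 bits. Register pairs B∥C, D∥E, H∥L: first register in bits 15..8, second in bits 7..0. Program Counter PC and Stack Pointer SP: bits 15..0. Status flags: S (bit 7), Z (bit 6), AC (bit 4), P (bit 2), C (bit 0).]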
Intel 8086 (1)
General registers (16-bit)
AX (== AH∥AL)
BX (== BH∥BL)
CX (== CH∥CL)
DX (== DH∥DL)
Index registers (16-bit)
SI (Source Index)
DI (Destination Index)
BP (Base Pointer)
SP (Stack Pointer)
Program counter (16-bit)
IP (Instruction Pointer)
Intel 8086 (2)
Status register (16-bit) with new flags
T(rap), for single stepping
I(nterrupt enable)
D(irection), for string processing
O(verflow), a kind of “signed carry”
Segment registers (16-bit)
CS (Code Segment)
DS (Data Segment)
ES (Extra Segment)
SS (Stack Segment)
Intel 8086 registers (1)
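[Figure: Intel 8086 general and index registers. General registers AX, BX, CX, DX: high byte (AH, BH, CH, DH) in bits 15..8, low byte (AL, BL, CL, DL) in bits 7..0. Index registers SI (Source Index), DI (Destination Index), BP (Base Pointer), SP (Stack Pointer): bits 15..0, drawn on a 20-bit scale with bits 19..16 zero.]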
Intel 8086 registers (2)
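[Figure: Intel 8086 instruction pointer, flags and segment registers. Instruction Pointer IP: bits 15..0. Status flags: O (bit 11), D (bit 10), I (bit 9), T (bit 8), S (bit 7), Z (bit 6), AC (bit 4), P (bit 2), C (bit 0). Segment registers CS (Code Segment), DS (Data Segment), ES (Extra Segment), SS (Stack Segment): drawn on a 20-bit scale in bits 19..4, with bits 3..0 zero.]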
Segmented addressing calculations
Address space size is extended to (about) 2^20 (and not 2^32)
For instance a segmented stack pointer (SS:SP)
is calculated as 16 ∗ SS + SP (or (SS << 4) + SP)
Giving access to an address space of 1 Mebibyte (or even more?)
A20 gate
FFFF:FFFF = 10FFEF,
accessing 1 MiB plus 65520 bytes (almost 64 KiB)
Does this wrap around (A20 line closed) or not (A20 line open)?
This is still an issue (4) today (legacy!) in real mode boot code
(4) For details, see http://wiki.osdev.org/A20_Line
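The segment arithmetic and the effect of the A20 line can be written out as a small C function. This is a sketch of the calculation only; real_mode_address is a hypothetical name, and masking to 20 bits models a disabled A20 line in the simplest possible way.

```c
#include <stdint.h>

/* Physical address for segment:offset in real mode.
 * With the A20 line enabled the full 21-bit sum is used;
 * with it disabled, bit 20 is dropped and addresses wrap at 1 MiB. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return a20_enabled ? linear : (linear & 0xFFFFFu);
}
```

For FFFF:FFFF this gives 0x10FFEF with A20 enabled and 0x0FFEF with A20 disabled, which is the wrap-around the slide asks about.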
Intel 80386 (1)
Introduces the IA-32 (or i386) ISA (Instruction Set Architecture)
Virtual Memory
Protected Mode
Older 16-bit mode is called “Real Mode”
General registers (32-bit)
EAX (AX in low order 16 bits)
EBX (BX in low order 16 bits)
ECX (CX in low order 16 bits)
EDX (DX in low order 16 bits)
Index registers (32-bit)
ESI (SI in low order 16 bits)
EDI (DI in low order 16 bits)
EBP (BP in low order 16 bits)
ESP (SP in low order 16 bits)
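Intel 80386 general registers
[Figure: 32-bit registers EAX (Extended accumulator), EBX (Extended base register), ECX (Extended count register), EDX (Extended data register), bits 31..0. EAX contains AX in bits 15..0, which splits into AH (bits 15..8) and AL (bits 7..0); similar for the extended B, C and D registers.]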
Intel 80386 index registers
[Figure: ESI (Extended Source Index), EDI (Extended Destination Index), EBP (Extended Base Pointer) and ESP (Extended Stack Pointer), bits 31..0, each containing its 16-bit predecessor (SI, DI, BP, SP) in bits 15..0.]
Intel 80386 (2)
Instruction Pointer (32-bit)
EIP (IP in low order 16 bits)
Segment registers (16-bit)
CS (Code Segment)
DS (Data Segment)
ES (Extra Segment)
FS (F or Extra2 Segment)
GS (G or Extra3 Segment)
SS (Stack Segment)
Now segments are indices into a table describing
The Segment Base address (32-bit)
The Segment Limit (20-bit)
Status register EFLAGS (32-bit) with new flags
Address size issues
Intel Pentium Pro introduces PAE (Physical Address Extension)
Circumventing the 2^32 byte = 4 GiB (Gibibyte) limit
Page table entries grow from 32 bits to 64 bits to hold larger physical addresses
Page frame numbers increase from 20 bits to 24 bits in size
Each memory page is 4 KiB (5) (12 bits of offset)
Maximum physical memory size grows from 4 GiB to 64 GiB
Per process virtual address space stays at 32 bits (max 4 GiB)
Virtual to physical address translation uses multi-level page tables
(5) PSE (Page Size Extension) increases this to 2 MiB,
or even 1 GiB (in 64-bit long mode)
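The 4 GiB and 64 GiB figures follow directly from the bit counts: a physical address is a page frame number followed by a 12-bit page offset. A tiny C helper makes the arithmetic explicit; addressable_bytes is a name invented for this sketch.

```c
#include <stdint.h>

/* Number of bytes addressable with frame_bits of page frame number
 * and offset_bits of offset inside a page (sum must be below 64). */
uint64_t addressable_bytes(unsigned frame_bits, unsigned offset_bits)
{
    return (uint64_t)1 << (frame_bits + offset_bits);
}
```

Without PAE, 20 + 12 = 32 bits gives 4 GiB; with PAE, 24 + 12 = 36 bits gives 64 GiB.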
AMD’s 64-bit solution
AMD introduces a 64-bit architecture (Opteron, 2003)
Now known as x86-64 (x86_64, x64, AMD64, Intel 64)
Intel had their own IA-64 architecture on their Itanium chips
IA-64 is a completely different architecture and ISA
Intel adopted AMD64 ISA (IA-32e −→ EM64T −→ Intel 64)
with only minor differences
hence essentially compatible
x86-64 main registers
Source: https://www.cs.purdue.edu/homes/cs250/LectureNotes/x86-64-registers.png
X86-64 general purpose registers (64-bit)
RAX (EAX/AX/AL in low order 32/16/8 bits; AH in bits 15-8)
RBX (EBX/BX/BL in low order 32/16/8 bits; BH in bits 15-8)
RCX (ECX/CX/CL in low order 32/16/8 bits; CH in bits 15-8)
RDX (EDX/DX/DL in low order 32/16/8 bits; DH in bits 15-8)
R8, . . . , R15 (extra 64-bit registers)
Low order 32/16/8 bits addressable as R8D/R8W/R8B
and similarly for R9, . . . , R15
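How the smaller register names overlay the 64-bit register can be mimicked in C with casts and shifts. This only models the bit positions listed above, not the processor's behaviour when such a sub-register is written; the view_* function names are hypothetical.

```c
#include <stdint.h>

/* Views of a 64-bit register value, named after the parts of RAX. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */
```

For rax = 0x1122334455667788 the views are EAX = 0x55667788, AX = 0x7788, AL = 0x88 and AH = 0x77.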
X86-64 index, pointer and instruction registers
Index registers (64-bit)
RSI (ESI/SI/SIL in low order 32/16/8 bits)
RDI (EDI/DI/DIL in low order 32/16/8 bits)
Pointer registers (64-bit)
RBP (EBP/BP/BPL in low order 32/16/8 bits)
RSP (ESP/SP/SPL in low order 32/16/8 bits)
Instruction register (64-bit)
RIP (EIP/IP in low order 32/16 bits)
X86-64 status, segment and hidden registers
Status register RFLAGS (64-bit) with new flags
Segment registers (still 16-bit) are deprecated in “long mode”
Except for FS and GS and some protection flags for the others
Hidden processor registers
GDTR (Global Descriptor Table Register), . . .
control, debug, . . . registers
All of this is just the programming model
The real hardware looks very different
X86-64 floating point and multimedia registers
Floating point registers (80-bit)
and MMX (MultiMedia Extensions) (64-bit)
FPR0, . . . , FPR7 (MMX0, . . . , MMX7 in low order 64 bits)
SIMD (Single Instruction Multiple Data) extensions,
including SSE (Streaming SIMD Extensions)
XMM0, . . . , XMM7 (128-bit)
AVX (Advanced Vector Extensions)
YMM0, . . . , YMM15 (256-bit)
with XMM0, . . . , XMM15 in the low order 128 bits
ZMM0, . . . , ZMM31 (512-bit), containing YMM0-31 and XMM0-31
Processor operating modes
The CPU supports two main modes (with submodes)
Long mode (used in 64-bit OS)
64-bit mode (flat address space)
Compatibility mode (for 16- or 32-bit protected mode programs)
Legacy mode (used in 32-bit OS)
Protected mode (virtual memory enabled)
Virtual 8086 mode (“real mode”, but still protected)
Real mode (when powered on; real legacy)
Assembly language
Assembly syntax for x86
There are two syntax flavours in use
Intel syntax (“dst before src”)
used mainly in the Windows world (Netwide Assembler, NASM)
AT&T syntax (“src before dst”)
used mainly in the Unix world (GNU Assembler, gas/as)
gas can also work with Intel syntax,
using the -msyntax=intel option or
the .intel_syntax directive
gcc can also generate Intel syntax using -masm=intel
Comparison of Intel and AT&T syntax
Intel                        AT&T
mov rax, 15                  mov $15, %rax
add rax, rbx                 addq %rbx, %rax
sub edi, esi                 subl %esi, %edi
imul r9w, r8w                imulw %r8w, %r9w
or bpl, 0xf                  orb $0xf, %bpl
[base+index*scale+displ]     displ(base,index,scale)
Opcodes
Opcodes are coding instructions, working on operands
Data transfer: reading, writing, copying, moving
Computational (arithmetic and logic): add, subtract, multiply,
divide, shift, compare, and, or, xor, negation
Control flow: branching, jumping, procedure call
. . .
Operands
Source: immediate, register, memory, I/O port
Destination: register, memory, I/O port
Operand addressing modes (1)
Immediate
Value (=14) encoded in instruction itself
Example: movq $14,%rax (Intel: mov rax,0xe)
Register (direct)
Operand is in a register, which is specified in the instruction
Example: movq %rbx,%rax (Intel: mov rax,rbx)
Memory (direct)
Operand is in memory; memory address (=14) is in the instruction
Example: movq 14,%rax (Intel: mov rax,[14])
Operand addressing modes (2)
Register (indirect)
Register contains a pointer giving the address of the operand
Example: movq (%rbx),%rax (Intel: mov rax,[rbx])
Memory (indirect)
Memory contains a pointer giving the address of the operand
This is not used on x86
It needs two memory operations
Indexed
Address based on base, scale, index and displacement
Example: movq 4(%rbx,%rcx,2),%rax
(Intel: mov rax,[rbx+rcx*2+0x4])
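The address computed by the indexed mode is base + index * scale + displacement. Written as C, with effective_address as a hypothetical helper name, the calculation looks like this:

```c
#include <stdint.h>

/* Effective address of displ(base,index,scale), that is
 * [base + index*scale + displ]. On x86 the scale is 1, 2, 4 or 8
 * and the displacement is a signed value. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t displ)
{
    /* Sign-extend the displacement before adding it. */
    return base + index * scale + (uint64_t)(int64_t)displ;
}
```

With rbx = 0x1000 and rcx = 3, the example operand 4(%rbx,%rcx,2) refers to address 0x1000 + 3*2 + 4 = 0x100A. This is the same arithmetic a C compiler needs for an array element access such as a[i] with 2-byte elements.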
I/O ports
Separate address space of 2^16 bytes
Modern OSs use mostly memory mapped I/O
During boot BIOS/UEFI uses I/O ports in real mode
Operand sizes
Byte (8 bits)
Word (16 bits)
Doubleword (32 bits)
Quadword (64 bits)
Doublequadword/Octaword (128 bits)
. . .
Endianness
Enough to start a digital war
Again legacy issues
Depends on how you “picture” sequences
Better to use first/last instead of high/low or most/least significant
Significance is only significant for numbers ;)
First/last refers to writing timing/direction/order
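The "first/last" view can be made concrete by writing the bytes of a 32-bit number into an array in both orders. The sketch below uses hypothetical helper names; x86 stores the least significant byte first (little-endian).

```c
#include <stdint.h>

/* Store v with its least significant byte first (little-endian). */
void store_le32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Store v with its most significant byte first (big-endian). */
void store_be32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For v = 0x11223344 the first byte written is 0x44 in the little-endian case and 0x11 in the big-endian case.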
Procedure (subroutine) calls
call/ret (always used in pairs)
call pushes the return address (next PC)
onto the stack and jumps into the called procedure
ret pops the return address from the stack and continues from there
enter/leave (not always used in pairs)
Utility instructions to manipulate stack frames
Easy, but often slower than manipulating the stack explicitly yourself
Library calls
Like procedure calls, but
jumping outside of your own code
into a precompiled library
often shared code with other programs
Needs a standardized interface between caller and callee
Caller: procedure calling another one
Callee: procedure called by another one
The ltrace command
ltrace is used for following library calls
ltrace <<command>>
Traces the library calls «command» makes
Hooks into the dynamic linking system
ltrace -c <<command>>
Gives nice statistics of library use
ltrace -S <<command>>
Also includes system calls (like strace)
C calling conventions for x86 architecture
Part of the System V Application Binary Interface
Big differences between IA-32 (32 bit) and x86-64 (64 bit)
C calling conventions for IA-32
Return value in eax (or edx∥eax if the result is too big)
Parameters to the procedure pushed on the stack (right to left)
Callee must preserve ebx, esi, edi, ebp, esp
Caller must preserve eax, ecx, edx (if needed)
C calling conventions for x86-64
Return value in rax (or rdx∥rax if the result is too big)
First six parameters in registers rdi, rsi, rdx, rcx, r8, r9
Remaining parameters pushed on the stack (right to left)
Callee must preserve rbx, rsp, rbp, r12, r13, r14, r15
Caller must preserve rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11 (if needed)
Caller also cleans up the stack
System calls
int 0x80
Software interrupt, used for making a system call (deprecated)
sysenter/sysexit
Faster way of making a system call (used on IA-32)
syscall/sysret
Faster way of making a system call (used on x86-64)
Usually an application uses a library (glibc) with a wrapper
The wrapper executes the actual system call instruction (int, sysenter or syscall)
Some “system calls” (“vsyscall”, for example gettimeofday())
might never leave userland, using the vDSO
IA-32 wrappers may execute the __kernel_vsyscall routine from vDSO
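From C, the difference between a dedicated wrapper and the raw system call can be seen with the generic syscall(2) function from glibc, which takes the system call number as its first argument. The wrapper name raw_getpid is invented for this sketch.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID through the generic syscall(2) wrapper
 * instead of the dedicated getpid() wrapper. Which instruction is
 * executed underneath is decided by the C library for the platform. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both raw_getpid() and getpid() should return the same value; strace shows the getpid system call being made.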
Debugging with gdb
gdb
may use debugging symbols if compilation is done with gcc -g
uses the ptrace system call
gdb <<binary>> runs a «binary» under gdb control
(gdb) break main
(gdb) run
(gdb) info registers
(gdb) next
(gdb) disassemble
Assembly language Assembly examples
IA32 exit system call
.global _start
.text
_start:
mov $1,%eax # system call 1 is exit
mov $42,%ebx # exit value
int $0x80 # execute system call
“Compile” (assemble) with
gcc -m32 -nostdlib -o exit_int_80 exit_int_80.s
Works also on x86-64, compiling with -m64, but is deprecated
x86-64 exit system call
.global _start
.text
_start:
mov $60,%eax # exit is system call number 60
mov $42,%edi # exit value
syscall
“Compile” (assemble) with
gcc -nostdlib -o exit_syscall exit_syscall.s
This does not work on IA-32 (has no syscall)