crash n' burn: writing linux application fault handlers

Crash N' Burn
OR: When bad things happen to good programs...
Version 1.1

Gilad Ben-Yossef, Chief Coffee Drinker, Codefidence Ltd. http://codefidence.com


What is this tutorial about?

Segmentation fault: core dumped


Dealing with faults


What's wrong with core dumps?

- Instant gratification
- No space left on device for a 753 MB core dump
- No source, no (network) access, but working code needed for the paycheck
- Access to external state (e.g. an FPGA)
- Easier access to the internal state machine
- Custom fault behavior
- Haiku error messages

Haiku error messages?

First smoke, then silence.
This thousand dollar router
dies so beautifully.

Segmentation fault: core dumped

The Plan

We shall:

- Trap signals sent by the kernel in response to faults (SIGSEGV and friends)
- Print a backtrace and custom state information (haiku form optional)
- ???
- Profit!

Easy to do. Difficult to do right.

Signals

- Signals are asynchronous notifications sent to a process by the kernel, by another process, or by the process itself.
- A process can register a signal handler function to respond to a signal.
- Process faults make the kernel generate a signal...
- ...which the process can catch and respond to.
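A minimal sketch of the register-then-deliver cycle described above, with the process sending a signal to itself. It uses SIGUSR1 rather than a fault signal so it can run safely; the function names `note_signal` and `self_signal_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written from the handler; sig_atomic_t is the type C guarantees can be
 * safely assigned from a signal handler. */
static volatile sig_atomic_t last_signal = 0;

static void note_signal(int signum)
{
    last_signal = signum;
}

/* Register note_signal() for SIGUSR1, then send SIGUSR1 to ourselves.
 * Returns the signal number the handler recorded, or -1 on error. */
int self_signal_demo(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = note_signal;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return -1;

    /* raise() does not return until the handler has run. */
    if (raise(SIGUSR1) != 0)
        return -1;
    return last_signal;
}
```

A fault works the same way, except that the kernel, not raise(), generates the signal.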

Signals Worth Catching

- SIGQUIT - Quit from keyboard
- SIGILL - Illegal instruction
- SIGABRT - Abort signal from abort(3)
- SIGFPE - Floating point exception
- SIGSEGV - Invalid memory reference
- SIGBUS - Bus error (bad memory access)

Catching Signals

    int sigaction(int signum,
                  const struct sigaction *act,
                  struct sigaction *oldact);

Registers a signal handler, where:

- signum: the signal number.
- act: pointer to the new struct sigaction.
- oldact: pointer to a buffer to be filled with the current struct sigaction (or NULL, if not interested).
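The oldact argument is what lets a handler be installed temporarily and the previous disposition put back afterwards. The sketch below shows that save/restore round trip; `swap_and_restore` and `ignore_it` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Handler that does nothing; it only exists so there is something to install. */
static void ignore_it(int signum)
{
    (void)signum;
}

/* Install a handler for signum, then restore whatever was there before,
 * using oldact to remember it. Returns 0 on success, -1 on error. */
int swap_and_restore(int signum)
{
    struct sigaction act, old;

    memset(&act, 0, sizeof(act));
    act.sa_handler = ignore_it;
    sigemptyset(&act.sa_mask);

    /* oldact receives the disposition that was in effect until now. */
    if (sigaction(signum, &act, &old) != 0)
        return -1;

    /* ... ignore_it() is the active handler here ... */

    /* Passing the saved structure back as act restores it. */
    return sigaction(signum, &old, NULL);
}
```

Passing NULL for act, as in `sigaction(signum, NULL, &cur)`, only queries the current disposition without changing it.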

Catching Signals cont.

The sigaction structure is defined as:

    struct sigaction {
        void     (*sa_handler)(int);
        void     (*sa_sigaction)(int, siginfo_t *, void *);
        sigset_t   sa_mask;
        int        sa_flags;
        ...
    };

Where:

- sa_handler and sa_sigaction are two forms of signal handler callback (they may share storage, so set only one). We'll use the SA_SIGINFO flag to choose the sa_sigaction form.
- sa_mask holds the mask of signals which will be blocked while the callback runs. We'll set all bits.

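Registering Handler Example

    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = my_handler;   /* three-argument form */
    sigfillset(&act.sa_mask);        /* block all signals while handling */
    act.sa_flags = SA_SIGINFO;       /* select sa_sigaction over sa_handler */
    return sigaction(SIGSEGV, &act, NULL);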
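Extending the registration example to every signal in the "Signals Worth Catching" list might look like the sketch below. The handler body is a placeholder that writes a fixed message and exits; `fault_handler`, `fault_signals` and `install_fault_handlers` are hypothetical names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Three-argument handler, selected by SA_SIGINFO. Placeholder only:
 * it reports that a fatal signal arrived and exits immediately. */
static void fault_handler(int signum, siginfo_t *info, void *context)
{
    static const char msg[] = "fatal signal caught\n";

    (void)signum;
    (void)info;
    (void)context;
    write(STDERR_FILENO, msg, sizeof(msg) - 1);   /* async-signal-safe */
    _exit(1);                                     /* exit() is not */
}

/* The signals from the "Signals Worth Catching" list. */
static const int fault_signals[] = {
    SIGQUIT, SIGILL, SIGABRT, SIGFPE, SIGSEGV, SIGBUS
};

/* Returns 0 if every handler was installed, -1 otherwise. */
int install_fault_handlers(void)
{
    struct sigaction act;
    size_t i;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = fault_handler;
    sigfillset(&act.sa_mask);      /* block all signals while handling */
    act.sa_flags = SA_SIGINFO;     /* use the sa_sigaction form */

    for (i = 0; i < sizeof(fault_signals) / sizeof(fault_signals[0]); i++)
        if (sigaction(fault_signals[i], &act, NULL) != 0)
            return -1;
    return 0;
}
```

Sharing one sigaction structure across all signals keeps the mask and flags consistent; the handler can tell the signals apart through its first argument.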

Signal Handler

Signal handler prototype:

    void handler(int signal, siginfo_t *siginfo, void *context);

Where:

- signal is the signal number.
- siginfo is a pointer to a siginfo_t structure.
- context is a pointer to an architecture-specific structure (a ucontext_t) holding the context of the interrupted program.

Signal info

The siginfo_t structure holds information about the signal delivered. Interesting fields for exceptions include:

- si_errno: an errno value. Not always filled in on all platforms/versions.
- si_code: an error description code. It's an index into a list of specific error descriptions; see sigaction(2).
- si_addr: the fault address. For SIGILL, SIGFPE, SIGSEGV, and SIGBUS only.
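Turning si_code into text is a common first step in a fault report. The sketch below covers only the SIGSEGV and SIGBUS codes that POSIX defines; `fault_code_name` is a hypothetical helper, and the descriptions are paraphrases of sigaction(2), not strings taken from any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Map (signal, si_code) to a short description. Anything not covered
 * falls through to "other". The result is a string literal, so it can
 * be handed straight to write() from inside a signal handler. */
const char *fault_code_name(int signum, int code)
{
    switch (signum) {
    case SIGSEGV:
        switch (code) {
        case SEGV_MAPERR: return "address not mapped to object";
        case SEGV_ACCERR: return "invalid permissions for mapped object";
        }
        break;
    case SIGBUS:
        switch (code) {
        case BUS_ADRALN: return "invalid address alignment";
        case BUS_ADRERR: return "nonexistent physical address";
        case BUS_OBJERR: return "object-specific hardware error";
        }
        break;
    }
    return "other";
}
```

Inside a handler this would be called as `fault_code_name(signal, siginfo->si_code)`, next to printing `siginfo->si_addr`.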

Signal Context

A structure that saves the hardware context which the signal interrupted

- Architecture specific
- Undocumented
- Changes between releases

E.g. getting the instruction pointer (IP) on various architectures, after casting context to ucontext_t *:

- x86: context->uc_mcontext.gregs[REG_EIP] (REG_RIP on x86-64)
- PPC: context->uc_mcontext.regs->nip

Check out sys/ucontext.h for your favorite architecture
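One way to hide these differences is a single helper with per-architecture branches. This is a sketch assuming glibc's sys/ucontext.h field names; `context_ip` is a hypothetical name. The PPC path from the slide is left out because it also needs struct pt_regs from the kernel headers, so unsupported architectures simply return NULL.

```c
#define _GNU_SOURCE          /* glibc exposes REG_RIP / REG_EIP only with this */
#include <stddef.h>
#include <ucontext.h>

/* Return the interrupted instruction pointer from the third handler
 * argument, or NULL if this architecture is not handled here. */
void *context_ip(void *context)
{
    ucontext_t *uc = context;

#if defined(__x86_64__)
    return (void *)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    return (void *)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__aarch64__)
    return (void *)uc->uc_mcontext.pc;
#else
    (void)uc;
    return NULL;             /* add your favorite architecture here */
#endif
}
```

The address this returns is the one the stack-munging discussion later recommends reporting instead of relying on backtrace() alone.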

Getting a Backtrace

glibc backtrace support:

    #include <execinfo.h>

    int backtrace(void **buffer, int size);

Fills buffer with up to size call stack addresses and returns how many were stored.

    char **backtrace_symbols(void *const *buffer, int size);

Returns a malloc()-ed array of strings with the function names. The returned buffer needs to be free()-ed.

    void backtrace_symbols_fd(void *const *buffer, int size, int fd);

Prints the function names to file descriptor fd, without calling malloc().

Symbol names are taken from the dynamic symbol table; link with -rdynamic to populate it.
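A minimal sketch of the fd-based variant: capture the stack with backtrace() and hand it to backtrace_symbols_fd(), so no malloc()-ed array is involved. The 64-frame limit and the name `dump_backtrace` are arbitrary choices for illustration.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64   /* arbitrary upper bound on reported frames */

/* Write the current call stack to fd, one frame per line.
 * Link with -rdynamic so function names show up.
 * Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}
```

Usage: `dump_backtrace(STDERR_FILENO);`. Note that when called from a signal handler, the frames start at the handler itself, not at the faulting instruction.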

Naïve Example

WARNING! The code you are about to see is wrong. It is also very common...

What's Wrong?

- Async-signal-unsafe functions
- Heap usage after malloc arena corruption
- Not thread safe
- Signal-handler-induced stack munging hides the real fault location (on some architectures, at least)

Async-signal Safety

- Signal handlers run asynchronously.
- Locks can't be shared between the signal handler and the main program: if a lock is taken and the signal handler is called, we have a deadlock.
- Only the async-signal-safe functions listed in POSIX.1-2003 may be used. See signal(7), or signal-safety(7) on newer systems, for the list.
- fprintf(), malloc(), backtrace_symbols() and fflush() are not on the list.
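Since printf() is off the list, handlers typically format numbers by hand and emit them with write(), which is on it. The helpers below are a sketch of that approach; `fmt_hex` and `safe_puts` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format v as "0x..." in lowercase hex into buf, which must hold at least
 * 3 + 2 * sizeof(uintptr_t) bytes. Calls no library functions.
 * Returns the length written, excluding the terminating NUL. */
size_t fmt_hex(char *buf, uintptr_t v)
{
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, i = 0;

    do {                                   /* emit digits, least significant first */
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[i++] = '0';
    buf[i++] = 'x';
    while (n > 0)                          /* copy them back in reverse order */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return i;
}

/* Write a NUL-terminated string using only write(). */
void safe_puts(int fd, const char *s)
{
    size_t len = 0;

    while (s[len] != '\0')
        len++;
    while (len > 0) {
        ssize_t rc = write(fd, s, len);
        if (rc <= 0)
            return;                        /* nothing safe left to report to */
        s += rc;
        len -= (size_t)rc;
    }
}
```

In a handler: `char buf[32]; fmt_hex(buf, (uintptr_t)siginfo->si_addr); safe_puts(STDERR_FILENO, buf);`.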

Heap Usage

The fault may have occurred due to malloc arena corruption. Trying to malloc() / free() memory may then lead to a double fault. So don't...

- Do not call malloc() / free() at all.
- Do not call functions that do: free() and backtrace_symbols() are obviously out.

Detecting Heap Usage

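Poison __malloc_hook and friends:

    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    void *kill_malloc(size_t size, const void *caller)
    {
        (void)size;
        __malloc_hook = NULL;   /* let printf() allocate without recursing here */
        printf("Malloc called from %p\n", caller);
        abort();
    }

    __malloc_hook = kill_malloc;   /* at start-up */

These hooks were removed from the glibc API in glibc 2.34, so this only works with older glibc.

Poison the heap:

    char *p = sbrk(0);            /* current program break (top of heap) */
    memset(p - 1024, 42, 1024);   /* scribble over the last 1 KB below it */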

Dynamic linker heap usage

backtrace() and friends dynamically load the unwinder from libgcc_s.so on first use, and the dynamic linker calls malloc() to load the new library. So...

- Make a dummy call to backtrace() when installing the handler, to force the linker to load libgcc with a sane heap.
- Or statically link libgcc in.
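The dummy-call advice amounts to very little code. This sketch shows it as a separate function, `preload_backtrace` (a hypothetical name), meant to be called from the same start-up code that installs the handlers.

```c
#include <execinfo.h>

/* Call once at start-up, while the heap is still sane. The first
 * backtrace() call may cause libgcc to be loaded (and malloc() to run);
 * doing it here keeps that work out of the fault path.
 * Returns the number of frames captured (at most 1). */
int preload_backtrace(void)
{
    void *dummy[1];

    return backtrace(dummy, 1);
}
```

The returned frame is discarded; only the side effect of loading the library matters.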

Thread Safety

Multiple threads can fault together, which will garble our output.

- Use a spin lock in the signal handler to block concurrent faulting threads.
- Can't spin on the lock if the contending thread is of higher RT priority: the lock holder may never get to run.
- Use pthread_spin_trylock() and sleep with pselect() if it fails.
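A sketch of the try-lock-then-sleep scheme above. The 10 ms back-off and the names `fault_lock_init`, `fault_lock_acquire` and `fault_lock_release` are assumptions for illustration. pselect() is on the POSIX async-signal-safe list; the pthread spin lock functions are not, although glibc implements them with plain atomic operations. On glibc older than 2.34, link with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

static pthread_spinlock_t fault_lock;

/* Call once at start-up, before installing the fault handlers. */
int fault_lock_init(void)
{
    return pthread_spin_init(&fault_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Serialize faulting threads. Instead of spinning, which would starve a
 * lower-priority holder under real-time scheduling, try the lock and
 * sleep for 10 ms in pselect() whenever it is busy. */
void fault_lock_acquire(void)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 };

    while (pthread_spin_trylock(&fault_lock) != 0)
        pselect(0, NULL, NULL, NULL, &pause, NULL);
}

void fault_lock_release(void)
{
    pthread_spin_unlock(&fault_lock);
}
```

The handler would call fault_lock_acquire() before writing anything, so each thread's report comes out whole.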

Handler Stack Munging

[Figure: the original user-mode stack (frames for foo(...) and bar(...), at addresses 0x1234... and 0x1255...) compared with the munged user-mode stack seen once the handler is called (foo(...) at 0x1234..., the kernel's signal-handling trampoline in the vsyscall page at 0xffffe..., and signal_handler() at 0x1266...); the handler returns through the trampoline.]

Putting It All Together

- Fork a watchdog process, sleeping on a pipe, to handle faults (a system-wide daemon is also possible).
- Collect information in the signal handler and send it over the pipe to the watchdog process for analysis, printing, etc.
- Finalize by sending the backtrace_symbols_fd() output down the pipe.
- Use the EIP from the signal context to overcome stack munging.
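A minimal sketch of the watchdog side: fork a child that blocks in read() on a pipe and copies whatever arrives to an output descriptor. The name `start_watchdog` and the plain copy loop are assumptions; a real watchdog would analyze the report, and since the child has its own copy of the heap from fork time, it can afford malloc()-based helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a watchdog that copies everything written to the pipe to out_fd.
 * On success, *report_fd gets the write end for the fault handler and
 * the child's pid is returned; returns -1 on error. */
pid_t start_watchdog(int out_fd, int *report_fd)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: drop the write end so read() returns 0 (EOF) once the
         * parent closes its end or dies. */
        char buf[512];
        ssize_t n;

        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof(buf))) > 0)
            if (write(out_fd, buf, (size_t)n) != n)
                break;
        _exit(0);
    }

    close(fds[0]);        /* parent only writes */
    *report_fd = fds[1];
    return pid;
}
```

In the fault handler, the write end is passed to the write()-only helpers and to backtrace_symbols_fd(); waitpid() from sys/wait.h reaps the watchdog on a clean shutdown.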

Questions?

Slides & code at: http://tuxology.net

© 2008 Codefidence Ltd. Released under a CC-by-sa 2.5 License.
