debugging linux


Upload: andrea-righi

Post on 16-Apr-2017


Debugging techniques in the Linux kernel

Agenda

Overview (kernel programming)

Kernel crash classification

Debugging techniques

Example(s)

Q/A

What's a kernel?

The kernel provides an abstraction layer for the applications to use the physical hardware resources

Kernel basic facilities

Process management

Memory management

Device management

System call interface

User space

Good for debugging (gdb)

Lots of user-space libraries available

Unpredictable latency (context switch, scheduler, syscall, ...)

Overhead

Cannot fully interact with interrupt routines

Cannot access certain memory addresses

Harder to share certain features with other drivers

Reliability: user processes can be terminated upon critical system events (OOM, filesystem errors, etc.)

Kernel space

Written in C and assembly

No ordinary debugging tools (only special ones such as kgdb, UML, ...)

Bugs can hang the entire system

User memory is swappable, kernel memory can't be swapped out

Kernel stack size is small (8K / 4K - THREAD_SIZE_ORDER)

Floating point is forbidden

Userspace libraries are not available

The Linux kernel must be portable (this matters if you plan to contribute to mainline)

Closed source kernel modules taint the kernel
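The taint state appears in oops and panic reports as a string of letters after "Tainted:". As a rough user-space sketch, a few of those letters can be decoded with a lookup table. The table below is deliberately partial, and the names taint_flag and taint_meaning are invented for this illustration, not kernel APIs.

```c
#include <stddef.h>

/* Partial, illustrative table of taint letters (the kernel defines more). */
struct taint_flag {
	char letter;
	const char *meaning;
};

static const struct taint_flag taint_flags[] = {
	{ 'P', "proprietary module loaded" },
	{ 'G', "only GPL-compatible modules loaded" },
	{ 'F', "module was force-loaded" },
	{ 'O', "out-of-tree module loaded" },
	{ 'W', "kernel issued a warning earlier" },
	{ 'D', "kernel oops or BUG happened earlier" },
};

/* Return a description for one taint letter, or NULL if it is not in the table. */
const char *taint_meaning(char letter)
{
	size_t i;

	for (i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++)
		if (taint_flags[i].letter == letter)
			return taint_flags[i].meaning;
	return NULL;
}
```

With this table, a report showing "Tainted: G O" reads as: no proprietary module, but an out-of-tree module was loaded.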

Example kernel module

#include <linux/init.h>
#include <linux/module.h>

/* Module constructor */
static int __init hello_init(void)
{
	printk(KERN_INFO "Hello, world!\n");
	return 0;
}

/* Module destructor */
static void __exit hello_exit(void)
{
	printk(KERN_INFO "Goodbye\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Andrea Righi");
MODULE_DESCRIPTION("BetterEmbedded hello world example");

Kernel problems

Kernel panic (fatal error for the system)

Kernel oops (non-fatal error)

Wrong result (fatal from user's perspective)

Kernel panic

No recovery is possible

Example: exception in an atomic context (e.g., an interrupt)

Typically results in a system reboot (panic=N), a blinking LED, or just a hang

[ 165.552280] general protection fault: 0000 [#1] PREEMPT SMP
[ 165.553055] Modules linked in: crashtest(O) [last unloaded: crashtest]
[ 165.553092] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 3.10.0-rc7+ #535
[ 165.553092] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 165.553092] task: ffff88003d90a2c0 ti: ffff88003d92e000 task.ti: ffff88003d92e000
[ 165.553092] RIP: 0010:[] [] __kmalloc_track_caller+0xd5/0x2b0
[ 165.553092] RSP: 0018:ffff88003e003988 EFLAGS: 00010206
[ 165.553092] RAX: 0000000000000000 RBX: ffff88003e1d6a20 RCX: 00000000000be841
[ 165.553092] RDX: 00000000000be801 RSI: 0000000000000000 RDI: 0000000000000001
[ 165.553092] RBP: ffff88003e0039c8 R08: 00000000001d6a20 R09: 0000000000000000
[ 165.553092] R10: 0000000000000000 R11: 0000000000000001 R12: 7878787878787878
[ 165.553092] R13: 0000000000010220 R14: 0000000000000240 R15: ffff88003d801780
[ 165.553092] FS: 0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[ 165.553092] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 165.553092] CR2: 00000000081ab008 CR3: 0000000037dc8000 CR4: 00000000000006e0
[ 165.553092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 165.553092] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 165.553092] Stack:
[ 165.553092] 00000000000be801 ffff88003d92ffd8 ffffffff8161683d ffff880034e3f300
[ 165.553092] ffff88003e003a17 0000000000000020 0000000000000240 0000000000000000
[ 165.553092] ffff88003e003a00 ffffffff8161433c ffff880034e3f300 0000000000000020
...
[ 165.553092] Call Trace:
[ 165.553092]
[ 165.553092] [] ? __alloc_skb+0x7d/0x290
[ 165.553092] [] __kmalloc_reserve.isra.52+0x3c/0xa0
[ 165.553092] [] __alloc_skb+0x7d/0x290
[ 165.553092] [] tcp_send_ack+0x3b/0xf0
[ 165.553092] [] __tcp_ack_snd_check+0x5e/0xa0
[ 165.553092] [] tcp_rcv_established+0x204/0x6f0
[ 165.553092] [] ? put_lock_stats.isra.26+0xe/0x40
[ 165.553092] [] tcp_v4_do_rcv+0x161/0x360
[ 165.553092] [] ? _raw_spin_lock_nested+0x79/0x90
[ 165.553092] [] tcp_v4_rcv+0x731/0x980
[ 165.553092] [] ? __lock_is_held+0x5f/0x80
[ 165.553092] [] ip_local_deliver_finish+0xc8/0x2f0
[ 165.553092] [] ? ip_local_deliver_finish+0x4a/0x2f0
[ 165.553092] [] ip_local_deliver+0x47/0x80
[ 165.553092] [] ip_rcv_finish+0x140/0x5e0
[ 165.553092] [] ip_rcv+0x233/0x380
[ 165.553092] [] __netif_receive_skb_core+0x6a2/0x970
[ 165.553092] [] ? __netif_receive_skb_core+0x50/0x970
[ 165.553092] [] __netif_receive_skb+0x21/0x70
[ 165.553092] [] netif_receive_skb+0x23/0x1f0
[ 165.553092] [] napi_gro_receive+0x98/0xd0
[ 165.553092] [] e1000_clean_rx_irq+0x18a/0x520
[ 165.553092] [] e1000_clean+0x251/0x910
[ 165.553092] [] ? put_lock_stats.isra.26+0xe/0x40
[ 165.553092] [] ? lock_release_holdtime.part.27+0xd4/0x160
[ 165.553092] [] net_rx_action+0xd5/0x2e0
[ 165.553092] [] __do_softirq+0xf7/0x420
[ 165.553092] [] irq_exit+0xb5/0xc0
[ 165.553092] [] do_IRQ+0x63/0xd0
[ 165.553092] Code: c8 48 8b 55 c0 48 8b 81 38 e0 ff ff a8 08 0f 85 5f 01 00 00 4c 8b 23 4d 85 e4 0f 84 15 01 00 00 49 63 47 20 48 8d 4a 40 4d 8b 07 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 97 49 63
[ 165.553092] RIP [] __kmalloc_track_caller+0xd5/0x2b0
[ 165.553092] RSP
[ 165.553092] ---[ end trace baac76a23c6da73c ]---
[ 165.553092] Kernel panic - not syncing: Fatal exception in interrupt
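Each entry in the trace has the form symbol+0xOFFSET/0xSIZE: the faulting address lies OFFSET bytes into a function that is SIZE bytes long. A small user-space sketch for pulling those fields apart when post-processing a saved log might look like this. The struct and function names (trace_entry, parse_trace_entry) are made up for the example.

```c
#include <stdio.h>

/* One "symbol+0xoff/0xsize" entry from a kernel call trace. */
struct trace_entry {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total size of the function in bytes */
};

/* Parse e.g. "tcp_send_ack+0x3b/0xf0". Returns 0 on success, -1 otherwise. */
int parse_trace_entry(const char *text, struct trace_entry *out)
{
	if (sscanf(text, "%63[^+]+0x%lx/0x%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}
```

For the RIP line above, this yields offset 0xd5 inside __kmalloc_track_caller, a function of 0x2b0 bytes.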

Kernel oops

A message is displayed in the log when a recoverable error has occurred in kernel space

Example: access to a bad address (e.g., a NULL pointer dereference)

An oops does not mean the system has crashed

Current process is killed

Oops message is displayed along with a registers dump and a stack trace

[ 75.962412] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 75.963046] IP: [] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] PGD 3a78d067 PUD 362be067 PMD 0
[ 75.963046] Oops: 0002 [#1] PREEMPT SMP
[ 75.963046] Modules linked in: crashtest(O)
[ 75.963046] CPU: 0 PID: 1587 Comm: bash Tainted: G O 3.10.0-rc7+ #535
[ 75.963046] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 75.963046] task: ffff88003a7ec580 ti: ffff8800362f6000 task.ti: ffff8800362f6000
[ 75.963046] RIP: 0010:[] [] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] RSP: 0018:ffff8800362f7e78 EFLAGS: 00010297
[ 75.963046] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000004e
[ 75.963046] RDX: 0000000000000000 RSI: ffffffffa0000469 RDI: ffff8800362f7eaa
[ 75.963046] RBP: ffff8800362f7ee0 R08: 0000000000000000 R09: 0000000000000000
[ 75.963046] R10: ffff88003a7ec580 R11: 0000000000000000 R12: 0000000000000003
[ 75.963046] R13: 000000000000000a R14: ffff8800362f7f50 R15: 0000000000000000
[ 75.963046] FS: 0000000000000000(0000) GS:ffff88003de00000(0063) knlGS:00000000f75f76c0
[ 75.963046] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 75.963046] CR2: 0000000000000000 CR3: 0000000036209000 CR4: 00000000000006f0
[ 75.963046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 75.963046] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 75.963046] Stack:
[ 75.963046] ffffffff811b66cb 0000000000000000 0000000000000000 ffff88003a7ec580
[ 75.963046] ffff8800362f7ec8 4f49545045435845 000000000000004e 0000000000000000
[ 75.963046] 0000000000000000 00000000463b9fa0 ffff8800362fd300 000000000000000a
[ 75.963046] Call Trace:
[ 75.963046] [] ? vfs_write+0x1bb/0x1f0
[ 75.963046] [] proc_reg_write+0x3d/0x80
[ 75.963046] [] vfs_write+0xc8/0x1f0
[ 75.963046] [] SyS_write+0x55/0xa0
[ 75.963046] [] sysenter_dispatch+0x7/0x1f
[ 75.963046] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 75.963046] Code: e1 f3 6f e1 48 c7 c7 60 09 00 a0 e8 d5 f3 6f e1 e9 e2 fd ff ff c7 45 d0 78 56 34 12 e9 d6 fd ff ff e8 bf fc ff ff e9 cc fd ff ff 04 25 00 00 00 00 00 00 00 00 e9 bc fd ff ff eb fe 66 c7 07
[ 75.963046] RIP [] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] RSP
[ 75.963046] CR2: 0000000000000000
[ 75.998054] ---[ end trace 33bbddb47601039c ]---
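On x86, the number after "Oops:" in a page-fault report is the hardware page-fault error code. Its three lowest bits say whether the page was present, whether the access was a write, and whether the CPU was in user mode. The following user-space sketch decodes just those three bits; the names fault_info and decode_fault are assumptions made for this example.

```c
#include <stdbool.h>

/* The three lowest bits of an x86 page-fault error code. */
struct fault_info {
	bool protection;	/* bit 0: 1 = protection violation, 0 = page not present */
	bool write;		/* bit 1: 1 = write access, 0 = read access */
	bool user_mode;		/* bit 2: 1 = fault in user mode, 0 = in kernel mode */
};

/* Decode the value printed after "Oops:" (given as a number, e.g. 0x0002). */
struct fault_info decode_fault(unsigned long error_code)
{
	struct fault_info f;

	f.protection = (error_code & 0x1) != 0;
	f.write = (error_code & 0x2) != 0;
	f.user_mode = (error_code & 0x4) != 0;
	return f;
}
```

The "Oops: 0002" in the log above therefore decodes as a write to a non-present page while running in kernel mode, which matches the NULL pointer dereference (CR2 is 0).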

Kernel fault classification

panic("have a nice day... ;-)")

BUG() / BUG_ON(condition)

exception (e.g., invalid opcode, division by zero, ...)

memory corruption

stack overflow/underflow

NOTE: in kernel space the stack size is limited to 2 pages (8K on almost all architectures)

write after free

write to a bad address

concurrent access without protections (locks, etc.)

soft lockup: a CPU is held without giving other tasks a chance to run

hard lockup: a CPU is held without giving other tasks or interrupts a chance to run

hung task: task doesn't get a chance to run for more than N seconds

scheduling while atomic

deadlock

use FPU registers in kernel space
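The kernel stack figures quoted earlier (8K / 4K, 2 pages) come from a simple shift: on x86 the kernel defines THREAD_SIZE as PAGE_SIZE << THREAD_SIZE_ORDER. A tiny user-space sketch of that arithmetic follows. The function name thread_size is invented here, and the page size is passed in as a parameter rather than taken from any kernel header.

```c
/* Kernel stack size in bytes for a given page size and THREAD_SIZE_ORDER. */
unsigned long thread_size(unsigned long page_size, unsigned int order)
{
	return page_size << order;
}

/* Number of pages the stack occupies for a given THREAD_SIZE_ORDER. */
unsigned long thread_pages(unsigned int order)
{
	return 1UL << order;
}
```

With 4096-byte pages, order 1 gives the 8K (2-page) stack and order 0 gives the 4K stack.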

Useful debugging kernel options

Kernel Hacking section:

CONFIG_KALLSYMS_ALL: print function names instead of addresses in kernel messages

CONFIG_FRAME_POINTER: get useful stack info in case of kernel bugs

CONFIG_DEBUG_ATOMIC_SLEEP: check for sleeping inside atomic sections (e.g., sleeping in an interrupt handler, sleeping while a lock is held, etc.)

CONFIG_LOCKUP_DETECTOR: detect hard and soft lockups

CONFIG_LOCKDEP: lock dependency engine (deadlock detection)

CONFIG_DYNAMIC_FTRACE: enable individual function tracing dynamically (via debugfs /sys/kernel/debug/tracing)

Debugging techniques

blinking LED

printk()

procfs

SysRq key (Documentation/sysrq.txt)

function instrumentation (kprobes)

dynamic ftrace (CONFIG_DYNAMIC_FTRACE)

debugger (kgdb)

printk()

Advantages

easy to use

no other system support needed

Disadvantages

have to modify and rebuild the kernel/modules

no interactive debugging

printk(): levels

printk levels

KERN_EMERG: system is unusable

KERN_ALERT: action must be taken immediately

KERN_CRIT: critical condition

KERN_ERR: error condition

KERN_WARNING: warning condition

KERN_NOTICE: normal but significant condition

KERN_INFO: informational

KERN_DEBUG: debug message

Show kernel messages:

# dmesg

Redirect all kernel messages to the console:

# echo 8 > /proc/sys/kernel/printk
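Writing 8 works because a message reaches the console only when its level is numerically lower than the console loglevel, and the eight levels run from 0 (KERN_EMERG) to 7 (KERN_DEBUG). A minimal user-space sketch of that comparison is shown below. The enum and function names are hypothetical, chosen for this example only.

```c
/* Hypothetical names; the values follow the order of the eight levels above. */
enum log_level {
	LVL_EMERG = 0,
	LVL_ALERT,
	LVL_CRIT,
	LVL_ERR,
	LVL_WARNING,
	LVL_NOTICE,
	LVL_INFO,
	LVL_DEBUG	/* = 7 */
};

/* Non-zero if a message at msg_level would be shown on the console. */
int reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With a console loglevel of 8 even LVL_DEBUG (7) passes the check; with 7, debug messages stay in the log buffer only and are still visible through dmesg.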

procfs

#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Produces the content shown when /proc/myproc is read */
static int procfs_read(struct seq_file *m, void *v)
{
	...
}

/* Handles data written to /proc/myproc */
static ssize_t procfs_write(struct file *file, const char __user *ubuf,
			    size_t count, loff_t *pos)
{
	...
}

static int procfs_open(struct inode *inode, struct file *file)
{
	return single_open(file, procfs_read, NULL);
}

static int procfs_release(struct inode *inode, struct file *file)
{
	/* Free what single_open() allocated */
	return single_release(inode, file);
}

/* Note: kernels >= 5.6 use struct proc_ops here instead of file_operations */
static const struct file_operations procfs_fops = {
	.open		= procfs_open,
	.read		= seq_read,
	.write		= procfs_write,
	.llseek		= seq_lseek,
	.release	= procfs_release,
};

static int __init myproc_init(void)
{
	if (!proc_create("myproc", 0666, NULL, &procfs_fops))
		return -ENOMEM;
	return 0;
}

static void __exit myproc_exit(void)
{
	remove_proc_entry("myproc", NULL);
}

Kprobes (Kernel probes)

Kprobes make it possible to break dynamically into almost any kernel routine and collect debugging and performance information (CONFIG_KPROBES=y)

A probe can trap at almost any kernel code address, with a handler routine that is invoked when the breakpoint is hit

How does it work?

Make a copy of the probed instruction and replace the original instruction with a breakpoint instruction (int3 on x86)

When the breakpoint is hit, a trap occurs, CPU's registers are saved and the control passes to the Kprobes pre-handler

The saved instruction is executed in single-step mode

The Kprobes post-handler is executed

The rest of the original function is executed
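The order of the steps above can be mimicked with a toy user-space model. This is only an illustration of the control flow, not how the kernel implements it: there is no real breakpoint or single-stepping, and every name in it (toy_probe, toy_hit_breakpoint, ...) is invented for the sketch.

```c
#include <stddef.h>

#define MAX_EVENTS 8

/* Toy stand-in for a probe: two handlers plus a log of what ran, in order. */
struct toy_probe {
	void (*pre_handler)(struct toy_probe *p);
	void (*post_handler)(struct toy_probe *p);
	const char *events[MAX_EVENTS];
	size_t n_events;
};

static void record(struct toy_probe *p, const char *what)
{
	if (p->n_events < MAX_EVENTS)
		p->events[p->n_events++] = what;
}

static void toy_pre(struct toy_probe *p)  { record(p, "pre_handler"); }
static void toy_post(struct toy_probe *p) { record(p, "post_handler"); }

/* Build a probe with both handlers installed and an empty event log. */
struct toy_probe toy_probe_new(void)
{
	struct toy_probe p = { toy_pre, toy_post, { NULL }, 0 };
	return p;
}

/* Models what happens once the breakpoint at the probed address is hit. */
void toy_hit_breakpoint(struct toy_probe *p)
{
	if (p->pre_handler)
		p->pre_handler(p);			/* trap -> pre-handler */
	record(p, "single-step saved instruction");	/* copied instruction runs */
	if (p->post_handler)
		p->post_handler(p);			/* then the post-handler */
	record(p, "resume original function");		/* back to the probed code */
}
```

After one call to toy_hit_breakpoint(), the event log holds the four steps in the same order as the list above.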

Kprobes (example)

#include <linux/module.h>
#include <linux/kprobes.h>

/* Pre-handler: runs just before the probed instruction */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* Do something here... */
	return 0;
}

static struct kprobe my_kp = {
	.pre_handler	= my_handler,
	.symbol_name	= "schedule_timeout",
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_INFO "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

Dump a stack trace

static const char function_name[] = "schedule_timeout";

/* Pre-handler: dump the stack and log the caller and the first argument */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	dump_stack();
	/* regs->di holds the first argument on x86_64 only */
	printk(KERN_INFO "%s called %s(%d)\n",
	       current->comm, function_name, (int)regs->di);
	return 0;
}

static struct kprobe my_kp = {
	.pre_handler	= my_handler,
	.symbol_name	= function_name,
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_INFO "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

Dynamic ftrace

# mount -t debugfs none /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          usleep-2665  [001] ....  4186.475355: sys_nanosleep
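Each entry in the trace output carries the task name and PID, the CPU, four flag characters, a timestamp, and the traced function. When a saved trace needs post-processing, a line can be split into those fields with a short user-space parser such as the sketch below. The names ftrace_line and parse_ftrace_line are made up for this example, and it assumes the task name contains no spaces.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one function-tracer line, e.g.
 * "usleep-2665  [001] ....  4186.475355: sys_nanosleep" */
struct ftrace_line {
	char task[64];
	int pid;
	int cpu;
	char flags[8];		/* irqs-off, need-resched, hardirq/softirq, preempt-depth */
	double timestamp;	/* seconds */
	char function[64];
};

/* Returns 0 on success, -1 if the line does not match the expected layout. */
int parse_ftrace_line(const char *line, struct ftrace_line *out)
{
	char task_pid[64];
	char *dash;

	if (sscanf(line, " %63s [%d] %7s %lf: %63s", task_pid, &out->cpu,
		   out->flags, &out->timestamp, out->function) != 5)
		return -1;

	/* The PID follows the last '-' (task names may contain dashes). */
	dash = strrchr(task_pid, '-');
	if (!dash || sscanf(dash + 1, "%d", &out->pid) != 1)
		return -1;
	*dash = '\0';
	strcpy(out->task, task_pid);	/* fits: both buffers are 64 bytes */
	return 0;
}
```

Applied to the line above, this gives task "usleep", PID 2665, CPU 1, and function sys_nanosleep. The flags field "...." means none of the four conditions was set.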