Deadlock analysis of spin_lock() in the Linux kernel
Outline
• spin_lock and semaphore in the Linux kernel
  – Introduction and differences.
  – Deadlock example of spin_lock.
• What is context
  – What is "context".
  – Control flow of a procedure call and of an interrupt handler.
• Log analysis
• Conclusion
  – How to prevent spin_lock deadlock.
Spin lock & Semaphore
• Semaphore:
  – When its initial value is 1, it can serve as a mutex lock that protects a critical section, just like a spin lock.
  – Unlike a spin lock, a thread that fails to get the lock goes to sleep while it waits.
• Spin lock:
  – A thread that fails to get the lock does not go to sleep; it keeps looping (busy-waiting), trying to get the lock.
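The busy-waiting behaviour described above can be sketched in user space with C11 atomics. This is only an illustration, not the kernel's implementation: the names demo_spinlock_t, demo_spin_trylock, demo_spin_lock, demo_spin_unlock and demo_count_to are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a spin lock (not the kernel type). */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and now belongs to the caller. */
static bool demo_spin_trylock(demo_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait: the caller never sleeps, it retries until the holder releases. */
static void demo_spin_lock(demo_spinlock_t *l)
{
    while (!demo_spin_trylock(l))
        ; /* spin */
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Protect a counter with the lock; returns the final counter value. */
int demo_count_to(int n)
{
    demo_spinlock_t lock = DEMO_SPINLOCK_INIT;
    int counter = 0;

    for (int i = 0; i < n; i++) {
        demo_spin_lock(&lock);              /* enter the critical section */
        assert(!demo_spin_trylock(&lock));  /* held: a second taker would spin */
        counter++;
        demo_spin_unlock(&lock);            /* leave the critical section */
    }
    return counter;
}
```

A semaphore would differ only in the waiting path: instead of the empty loop in demo_spin_lock, the waiter would be put to sleep and woken up by the releaser.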
Spin lock
• Spin lock usage as a mutex lock:
  Thread A and Thread B each run:
    spin_lock(&mutex_lock);
    critical section code
    spin_unlock(&mutex_lock);
  1. Thread A starts execution and acquires mutex_lock.
  2. A timer interrupt preempts thread A. Kernel code: the thread's time slice has decreased to zero, so its context is saved and the processor is assigned to another thread.
  3. Thread B calls spin_lock(&mutex_lock), fails to get the lock, and keeps looping, trying to get it, until it is preempted.
  4. A timer interrupt preempts thread B (same kernel code as in step 2).
  5. Thread A finishes its critical section.
What is context
• What does "context" mean?
  – A set of dedicated hardware resources that a program uses in order to execute successfully.
  • Such as:
    – general-purpose registers for computing.
    – stack memory to support procedure calls.
  – But from the kernel's point of view, the "dedicated context of a process" is simulated, because the real resources are limited.
  • The kernel slices time and saves and restores context in order to emulate a multi-processor environment.
  • A program (process) behaves as if it had a dedicated context.
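Saving and restoring a context can be tried out in user space with the glibc ucontext functions (getcontext, makecontext, swapcontext). The sketch below is only an analogy for what the kernel does on a context switch, and it assumes a glibc-based Linux system; the names task, record and run_context_switch_demo are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;   /* saved register sets */
static char task_stack[64 * 1024];      /* private stack for the second flow */
static int trace[4];
static int trace_len;

static void record(int step)
{
    assert(trace_len < 4);
    trace[trace_len++] = step;
}

/* Second flow of control: it is suspended halfway and resumed later. */
static void task(void)
{
    record(1);
    swapcontext(&task_ctx, &main_ctx);  /* save task registers, resume main */
    record(3);                          /* continues exactly where it stopped */
}

/* Returns 1 if the two flows interleaved in the order 1, 2, 3, 4. */
int run_context_switch_demo(void)
{
    trace_len = 0;

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;       /* where to continue when task() returns */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  /* save main, switch to task */
    record(2);
    swapcontext(&main_ctx, &task_ctx);  /* save main, resume task */
    record(4);

    return trace_len == 4 && trace[0] == 1 && trace[1] == 2 &&
           trace[2] == 3 && trace[3] == 4;
}
```

Neither flow has to know that it was suspended: each one sees its registers and stack exactly as it left them, which is the "dedicated context" illusion described above.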
What is context
• What are user context and interrupt context?
  – User context: provided by the kernel's context-switch facility, which is triggered by the timer interrupt. Its owner is called a user process; it runs user-space code in user mode or kernel-space code in svc mode.
  – Interrupt context: the subset of registers that the interrupt handler saves and restores by itself.
  • In fact, part of the interrupt context (registers) is the same context (registers) that belongs to some user process.
[Figure: processor time axis. Thread A, A's subroutine, and Thread B run in turn, separated by timer interrupts; a PCI bus interrupt runs Int_handler().]
Int_handler()
  Save every register that will be used later onto the stack.
  ……
  Restore the registers that were used, and jump to the return address (r14 register).
What is context
• Comparing an interrupt handler with a procedure call.
  – An interrupt handler runs like a procedure call.
  – The differences are:
  • int_handler does not receive any parameters and does not return any value.
  • The program is not even aware that int_handler has executed.
[Figure: processor time axis. Thread A, its subroutine, and Thread B run in turn, separated by timer interrupts and a PCI bus interrupt.]
Void Foo(void): user space
  Save every register that will be used later onto the stack.
  Read parameters from the parameter registers.
  …
  Put the return value in the parameter register.
  Restore the registers that were used, and jump to the return address (r14).
Int_handler(): kernel space
  Save every register that will be used later onto the stack.
  ……
  Restore the registers that were used, and jump to the return address (r14).
double-acquire deadlock (1/2)
• spin_lock convention
  – Unlike the spin lock implementations in some other operating systems, the Linux kernel's spin lock is not recursive.
  – A double-acquire deadlock example follows:
    Thread A
      spin_lock(&mutex_lock);
      fooB();
      spin_unlock(&mutex_lock);
    Void fooB(void)
      Save every register that will be used later onto the stack.
      Read parameters from the parameter registers.
      …
      spin_lock(&mutex_lock);   (thread A already holds mutex_lock, so this spins forever)
      …
      Put the return value in the parameter register.
      Restore the registers that were used, and jump to the return address (r14).
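The nested acquire in fooB() can be reproduced in user space without hanging the program by giving the spin loop an upper bound. This is a sketch under stated assumptions: bounded_spin_lock, spin_unlock_demo and double_acquire_would_deadlock are hypothetical helpers, and a real spin_lock() has no such bound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the slide's mutex_lock (a plain C11 flag, not a kernel lock). */
static atomic_flag mutex_lock = ATOMIC_FLAG_INIT;

/* Spin at most max_spins times; returns true if the lock was acquired.
 * A real spin_lock() has no limit and would loop forever instead. */
static bool bounded_spin_lock(atomic_flag *lock, long max_spins)
{
    for (long i = 0; i < max_spins; i++)
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;
    return false;
}

static void spin_unlock_demo(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Mirrors fooB() on the slide: takes the lock its caller already holds. */
static bool fooB(void)
{
    bool got = bounded_spin_lock(&mutex_lock, 1000000);
    if (got)
        spin_unlock_demo(&mutex_lock);
    return got;
}

/* Mirrors thread A; returns true if the nested acquire could never succeed. */
bool double_acquire_would_deadlock(void)
{
    bool outer = bounded_spin_lock(&mutex_lock, 1);
    assert(outer);                  /* the lock starts out free */
    (void)outer;

    bool inner = fooB();            /* same context, same lock */

    spin_unlock_demo(&mutex_lock);
    return !inner;
}
```

No number of retries helps here, because the only code that could release the lock is the caller that is waiting for fooB() to return.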
double-acquire deadlock (2/2)
• spin_lock synchronization between user context and interrupt context
  – Double-acquire deadlock example (2):
    Thread A
      spin_lock(&mutex_lock);
      (the SDIO interrupt happens just after thread A gets the spin lock)
      spin_unlock(&mutex_lock);
    Sdio_int_handler()
      Save every register that will be used later onto the stack.
      …
      spin_lock(&mutex_lock);
      …
      Restore the registers that were used, and jump to the return address (r14).
    Sdio_int_handler busy-waits on mutex_lock. On the same processor, thread A cannot run again until the handler returns, so the lock is never released.
  – An example that does not cause a double-acquire deadlock:
    Thread A
      spin_lock(&mutex_lock);
      (a timer interrupt happens just after thread A gets the spin lock)
      spin_unlock(&mutex_lock);
    Kernel code: the thread's time slice has decreased to zero, so its context is saved and the processor is assigned to another thread.
    Thread B's user code executes.
    The SDIO interrupt then happens, while thread A still holds the spin lock.
    Sdio_int_handler()
      Save every register that will be used later onto the stack.
      …
      spin_lock(&mutex_lock);
      …
      Restore the registers that were used, and jump to the return address (r14).
    Sdio_int_handler and thread B busy-wait on mutex_lock.
Log Analysis (1)
• In our case, CheckCallbackTimeout() might interrupt WiMAXQueryImformation() in user context (the CM_Query thread).
  Thread A
    spin_lock(&mutex_lock);
    (a timer interrupt happens just after thread A gets the spin lock)
    spin_unlock(&mutex_lock);
  Kernel code:
    …
    If (timer has to be executed) { CheckCallbackTimeout(); }
    ……
    Return;
  CheckCallbackTimeout { LDDB_spin_lock(); … }
Log Analysis (2)
• The timer callback function is called in __irq_svc.
• __irq_svc is a subroutine that is only called by the irq handler.
Conclusion – Immediate Solution
• Use spin_lock_irqsave and spin_unlock_irqrestore.
  – Turn off interrupts before acquiring the spin lock.
Conclusion – what action we have to take right now
• What we should do before implementation
  – Identify the contexts that use the same lock for synchronization.
  – Prevent the double-acquire deadlock scenario with the interrupt-disabling API when a lock is shared between interrupt context and user context.
  – Avoid using semaphores in interrupt context.
  – Leave the interrupt handler as soon as possible, and postpone the work to another user context, such as a work queue.
• Turn on CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCK_ALLOC, CONFIG_DEBUG_SPINLOCK
  – These options help with debugging.
Appendix-context switch
• Context-switch code
  – Restore and jump should be combined into an atomic operation.
Copyright 2009 FUJITSU LIMITED
Thread A
  spin_lock(&mutex_lock);
  ……
  spin_unlock(&mutex_lock);
  ……
  Sleep(2000ms);
  ……
Thread B
  Sema_get(&mutex_lock)
Timer interrupt code:
  …
  if the thread's time slice has decreased to zero {
    save r0~r15 into current's TCB;
    restore B's r0~r14 registers;
    jump r15 <- B's TCB[15] + 3
  }
  return from interrupt;
Sleep function (kernel code):
  ……
  save r0~r1 into current's TCB;
  restore A's r0~r14 registers;
  jump r15 <- A's TCB[15] + 3
  return;
Semaphore function (kernel code):
  ….
  if semaphore is zero {
    save r0~r14 into current's TCB;
    restore A's r0~r14 registers;
    jump r15 <- B's TCB[15] + 3
  }
  return;