
1

Solaris Scheduling

Bongio Jeremy

Wenjin Hu

2

Overview

Table driven
Loadable class module
Thread-level scheduling

The Solaris kernel may be seen as a bundle of kernel threads.
A kernel thread is the entity that is scheduled by the kernel. If no lightweight process is attached, it is also known as a system thread.
The kernel is preemptable.

3

Class and priority

Class        Global priority        Class (user) priority
Interrupt    100~109 / 160~169      0~9    (not really a scheduling class)
Real Time    100~159                0~59
SYS          60~99                  0~39
TimeShare    0~59                   0~59

Interactive shares the TS dispatch table.

4

Priority Classes

Figure: Global Priority Scheme and Scheduling Classes (1)

5

Class and priority (cont'd)

Two-level priority:
Systemwide-relative priority (global priority): NOT tunable (the global range)
Class-relative priority (class priority): tuned by the kernel/dispatcher (the adjustment range)
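To make the two-level scheme concrete, here is a minimal user-space sketch (not kernel code; the class base values are taken from the class/priority table above, and for TS the real mapping actually goes through the dispatch table rather than simple addition) showing how a class-relative priority plus a class base yields a systemwide global priority.

#include <stdio.h>

/* class bases taken from the priority table above */
enum { TS_BASE = 0, SYS_BASE = 60, RT_BASE = 100 };

/* global priority = class base + class-relative priority */
static int global_pri(int class_base, int class_pri) {
    return class_base + class_pri;
}

int main(void) {
    printf("TS  user pri 59 -> global %d\n", global_pri(TS_BASE, 59));   /* 59  */
    printf("SYS pri      10 -> global %d\n", global_pri(SYS_BASE, 10));  /* 70  */
    printf("RT  user pri 35 -> global %d\n", global_pri(RT_BASE, 35));   /* 135 */
    return 0;
}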

6

/uts/common/sys/class.h

105 typedef struct sclass {
106     char *cl_name;              /* class name */
107     /* class specific initialization function */
108     pri_t (*cl_init)(id_t, int, classfuncs_t **);
        /* scheduling-class-dependent (class_ops & thread_ops) */
        /* thread can enter the class */
109     classfuncs_t *cl_funcs;     /* pointer to classfuncs structure */
        /* kernel lock for synchronized access to the class structure */
110     krwlock_t *cl_lock;         /* class structure read/write lock */
111     int cl_count;               /* # of threads trying to load class */
112 } sclass_t;

7

SYS

Used when a thread holds a critical resource
Preemptable
NOT time sliced
Priorities defined in a simple array
NOT loadable
Works within the TS framework: TS/IA threads may be temporarily adjusted to SYS

8

/uts/common/disp/ts_dptbl.c

77 #define TSGPUP0  0   /* Global priority for TS user priority 0 */
78 #define TSGPKP0  60  /* Global priority for TS kernel priority 0 */
79
80 /*
81  * array of global priorities used by ts procs sleeping or
82  * running in kernel mode after sleep
83  */
84
85 pri_t config_ts_kmdpris[] = {
86     TSGPKP0,    TSGPKP0+1,  TSGPKP0+2,  TSGPKP0+3,
87     TSGPKP0+4,  TSGPKP0+5,  TSGPKP0+6,  TSGPKP0+7,
88     TSGPKP0+8,  TSGPKP0+9,  TSGPKP0+10, TSGPKP0+11,
89     TSGPKP0+12, TSGPKP0+13, TSGPKP0+14, TSGPKP0+15,
90     TSGPKP0+16, TSGPKP0+17, TSGPKP0+18, TSGPKP0+19,
91     TSGPKP0+20, TSGPKP0+21, TSGPKP0+22, TSGPKP0+23,
92     TSGPKP0+24, TSGPKP0+25, TSGPKP0+26, TSGPKP0+27,
93     TSGPKP0+28, TSGPKP0+29, TSGPKP0+30, TSGPKP0+31,
94     TSGPKP0+32, TSGPKP0+33, TSGPKP0+34, TSGPKP0+35,
95     TSGPKP0+36, TSGPKP0+37, TSGPKP0+38, TSGPKP0+39
96 };

9

Realtime

Capable of preempting SYS
Memory locking
Runs at a fixed priority, which can be configured (the kernel cannot change its priority)

10

/src/uts/common/sys/rt.h

46 typedef struct rtdpent {
47     pri_t rt_globpri;   /* global (class independent) priority */
48     int   rt_quantum;   /* default quantum associated with this level */
49 } rtdpent_t;

11

/uts/common/disp/rt_dptbl.c

73 #define RTGPPRIO0  100  /* Global priority for RT priority 0 */
74
75 rtdpent_t config_rt_dptbl[] = {
76
77     /* prilevel        Time quantum */
78
79      RTGPPRIO0,     100,
80      RTGPPRIO0+1,   100,
81      RTGPPRIO0+2,   100,
97      RTGPPRIO0+18,  80,
102     RTGPPRIO0+23,  60,
107     RTGPPRIO0+28,  60,
112     RTGPPRIO0+33,  40,
117     RTGPPRIO0+38,  40,
122     RTGPPRIO0+43,  20,
127     RTGPPRIO0+48,  20,
137     RTGPPRIO0+58,  10,
138     RTGPPRIO0+59,  10
139 };

12

Dispatcher Table

Contains default values for priority and priority readjustment:
The thread's user priority is used as an index to obtain its global priority
The global priority level determines its time quantum
The table indicates how to adjust the priority (a lookup sketch follows the TS example)

TS example
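As a hedged illustration of the table-driven idea (a simplified user-space model, not the kernel's ts_dptbl; the dptbl_ent struct name and all values are invented for the example), the class-relative priority is used as an index, and the entry yields the global priority, the time quantum, and the level to fall back to when the quantum expires.

#include <stdio.h>

/* simplified model of a TS dispatch table entry (fields follow tsdpent_t) */
typedef struct {
    int globpri;   /* global (class independent) priority */
    int quantum;   /* time quantum, in clock ticks */
    int tqexp;     /* new class priority when the quantum expires */
    int slpret;    /* new class priority after returning from sleep */
} dptbl_ent;

/* tiny example table: lower priority levels get longer quanta */
static const dptbl_ent dptbl[] = {
    { 0, 20, 0, 5 },   /* level 0 */
    { 1, 16, 0, 6 },   /* level 1 */
    { 2, 12, 1, 7 },   /* level 2 */
    { 3,  8, 2, 8 },   /* level 3 */
    { 4,  4, 3, 9 },   /* level 4 */
};

int main(void) {
    int level = 2;                       /* class-relative priority used as index */
    const dptbl_ent *e = &dptbl[level];
    printf("level %d -> global pri %d, quantum %d ticks\n",
           level, e->globpri, e->quantum);
    printf("on quantum expiration the level drops to %d\n", e->tqexp);
    return 0;
}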

13

/src/uts/common/sys/ts.h

47 typedef struct tsdpent {
48     pri_t ts_globpri;   /* global (class independent) priority */
49     int   ts_quantum;   /* time quantum given to procs at this level */
                           /* favors IA or CPU-bound */
       /* parameters to calculate the class-relative priority */
       /* deduct 10 from current globpri value (decreasing the priority) */
50     pri_t ts_tqexp;     /* ts_umdpri assigned when proc at this level */
51                         /* exceeds its time quantum */
52     pri_t ts_slpret;    /* ts_umdpri assigned when proc at this level */
53                         /* returns to user mode after sleeping */
       /* controls I/O-bound decay */
       /* threshold */
54     short ts_maxwait;   /* bumped to ts_lwait if more than ts_maxwait */
55                         /* secs elapse before receiving full quantum */
56     short ts_lwait;     /* ts_umdpri assigned if ts_dispwait exceeds */
57                         /* ts_maxwait */
       /* controls thread starvation */
58 } tsdpent_t;

14

/uts/common/disp/ts_dptbl.c

77 #define TSGPUP0  0   /* Global priority for TS user priority 0 */
78 #define TSGPKP0  60  /* Global priority for TS kernel priority 0 */

98 tsdpent_t config_ts_dptbl[] = {
99
100 /*  glbpri       qntm  tqexp  slprt  mxwt   lwt */
101
102     TSGPUP0+0,   20,   0,     50,    0,     50,
124     TSGPUP0+22,  12,   12,    52,    0,     52,
136     TSGPUP0+34,  8,    24,    53,    0,     53,
149     TSGPUP0+47,  4,    37,    58,    0,     58,
159     TSGPUP0+57,  4,    47,    58,    0,     59,
160     TSGPUP0+58,  4,    48,    58,    0,     59,
161     TSGPUP0+59,  2,    49,    59,    32000, 59
162 };

15

Thread priority calculation & Dispatcher Algorithm

The quantum corresponds to the class-relative priority: lower priority but a longer quantum, to favor IA threads.
ts_cpupri is used to index into the TS dispatch table and is updated to the corresponding ts_tqexp value.
The user-mode priority is then calculated, and ts_globpri = TSGPUP0 + ts_umdpri (ts_umdpri is used as the index).
t_pri = ts_globpri, or
t_pri = the lowest SYS priority if ts_flags has TSKPRI set (indicating the thread is working at SYS class priority).
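The sketch below (a user-space model with an invented five-level table, not the actual ts_tick() code) walks through the recalculation the slide describes: the quantum-expiration column lowers ts_cpupri, the user-mode priority is clamped in the spirit of TS_NEWUMDPRI, and the new global priority and quantum are read back out of the table.

#include <stdio.h>

#define TS_MAXUMDPRI 4          /* highest level in this toy table */

typedef struct {
    int globpri;                /* global priority for this level */
    int quantum;                /* ticks granted at this level */
    int tqexp;                  /* level assigned after quantum expiration */
} ts_ent;

/* toy stand-in for ts_dptbl[]; the real values live in ts_dptbl.c */
static const ts_ent tbl[] = {
    { 0, 20, 0 }, { 1, 16, 0 }, { 2, 12, 1 }, { 3, 8, 2 }, { 4, 4, 3 },
};

static int clamp(int pri) {     /* mirrors the TS_NEWUMDPRI clamping idea */
    if (pri > TS_MAXUMDPRI) return TS_MAXUMDPRI;
    if (pri < 0) return 0;
    return pri;
}

int main(void) {
    int ts_cpupri = 3, ts_upri = 0, ts_boost = 0;   /* a CPU-bound TS thread */

    ts_cpupri = tbl[ts_cpupri].tqexp;               /* quantum expired: drop a level */
    int ts_umdpri = clamp(ts_cpupri + ts_upri + ts_boost);
    int t_pri = tbl[ts_umdpri].globpri;             /* new global priority */

    printf("new level %d, t_pri %d, next quantum %d ticks\n",
           ts_umdpri, t_pri, tbl[ts_cpupri].quantum);
    return 0;
}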

16

/uts/common/sys/ts.h

64 typedef struct tsproc {
65     int     ts_timeleft;  /* time remaining in procs quantum */
       /* updated per sec by ts_update() and compared with tsdpent.ts_maxwait */
66     uint_t  ts_dispwait;  /* wall clock seconds since start */
67                           /* of quantum (not reset upon preemption) */
71     pri_t   ts_umdpri;    /* user mode priority within ts class */
       /* adjustment: ts_umdpri = ts_cpupri + ts_upri + ts_boost */
68     pri_t   ts_cpupri;    /* system controlled component of ts_umdpri */
69     pri_t   ts_uprilim;   /* user priority limit */
70     pri_t   ts_upri;      /* user priority */
74     char    ts_boost;     /* interactive priority offset */
       /* distinguishes between IA and TS */
75     uchar_t ts_flags;     /* flags defined below */
72     pri_t   ts_scpri;     /* remembered priority, for schedctl */
73     char    ts_nice;      /* nice value for compatibility */
76     kthread_t *ts_tp;     /* pointer to thread */
77     struct tsproc *ts_next; /* link to next tsproc on list */
78     struct tsproc *ts_prev; /* link to previous tsproc on list */
79 } tsproc_t;

17

/uts/common/sys/thread.h

106 typedef struct _kthread {
        /* set by tsdpent_t.ts_globpri */
120     pri_t  t_pri;       /* assigned thread priority */
        /* scheduling-class-specific structure linked to every kthread */
129     struct thread_ops *t_clfuncs;  /* scheduling class ops vector */
130     void   *t_cldata;   /* per scheduling class specific data */
        /* inherited from parent thread; the initial thread is LWP/kthread */
121     pri_t  t_epri;      /* inherited thread priority */
        /* the SYS thread priority for this thread */
        /* raise the TS/IA thread to the SYS class if a critical resource is obtained */
195     uint_t t_kpri_req;  /* kernel priority required */
336 } kthread_t;

18

/uts/common/disp/ts.c

128 #define TS_NEWUMDPRI(tspp) \
129 { \
130     pri_t pri; \
131     pri = (tspp)->ts_cpupri + (tspp)->ts_upri + (tspp)->ts_boost; \
132     if (pri > ts_maxumdpri) \
133         (tspp)->ts_umdpri = ts_maxumdpri; \
134     else if (pri < 0) \
135         (tspp)->ts_umdpri = 0; \
136     else \
137         (tspp)->ts_umdpri = pri; \
138     ASSERT((tspp)->ts_umdpri >= 0 && (tspp)->ts_umdpri <= ts_maxumdpri); \
139 }

19

1659 ts_tick(kthread_t *t)

1693     tspp->ts_cpupri = ts_dptbl[tspp->ts_cpupri].ts_tqexp;
1694     TS_NEWUMDPRI(tspp);
1696     new_pri = ts_dptbl[tspp->ts_umdpri].ts_globpri;

1706     if ((t->t_schedflag & TS_LOAD) &&
1710         tspp->ts_timeleft =
1711             ts_dptbl[tspp->ts_cpupri].ts_quantum;

20

TS Class vs IA Class

They use the same dispatcher table.
IA is for windowing applications (this could be observed in last semester's project: the active window gets more chances to run).
IA shares TS's tsproc_t thread data structure, distinguished by the ts_flags flag.

21

TS Class vs IA Class (cont'd)

ts_boost for IA is +10 (cancelling the effect of ts_tqexp); ts_boost for TS is 0.
IA uses setfrontdq() to get scheduled as soon as possible.
TS uses:
    setbackdq() to maintain a balance in queue depth across processors
    setfrontdq() for threads that have been waiting for a while
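To illustrate the front-versus-back insertion policy, here is a small user-space sketch (a doubly linked queue model with invented types, not the kernel's setfrontdq()/setbackdq() implementation): IA-style threads are inserted at the head so they run as soon as possible, while TS-style threads queue at the tail.

#include <stdio.h>

typedef struct thr {
    const char *name;
    struct thr *next, *prev;
} thr_t;

typedef struct { thr_t *head, *tail; } dispq_t;

/* insert at the head: the thread runs as soon as possible (IA-style) */
static void set_front(dispq_t *q, thr_t *t) {
    t->prev = NULL; t->next = q->head;
    if (q->head) q->head->prev = t; else q->tail = t;
    q->head = t;
}

/* insert at the tail: the thread waits its turn (TS-style) */
static void set_back(dispq_t *q, thr_t *t) {
    t->next = NULL; t->prev = q->tail;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
}

int main(void) {
    dispq_t q = { NULL, NULL };
    thr_t ts1 = { "ts1" }, ts2 = { "ts2" }, ia = { "ia" };

    set_back(&q, &ts1);     /* ordinary TS threads queue at the back */
    set_back(&q, &ts2);
    set_front(&q, &ia);     /* the interactive thread jumps the queue */

    for (thr_t *t = q.head; t != NULL; t = t->next)
        printf("%s ", t->name);        /* prints: ia ts1 ts2 */
    printf("\n");
    return 0;
}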

22

/uts/common/sys/ts.h

61 /*
62  * time-sharing class specific thread structure
63  */
64 typedef struct tsproc {
65     int     ts_timeleft;  /* time remaining in procs quantum */
66     uint_t  ts_dispwait;  /* wall clock seconds since start */
67                           /* of quantum (not reset upon preemption) */
68     pri_t   ts_cpupri;    /* system controlled component of ts_umdpri */
69     pri_t   ts_uprilim;   /* user priority limit */
70     pri_t   ts_upri;      /* user priority */
71     pri_t   ts_umdpri;    /* user mode priority within ts class */
72     pri_t   ts_scpri;     /* remembered priority, for schedctl */
73     char    ts_nice;      /* nice value for compatibility */
74     char    ts_boost;     /* interactive priority offset */
75     uchar_t ts_flags;     /* flags defined below */
76     kthread_t *ts_tp;     /* pointer to thread */
77     struct tsproc *ts_next; /* link to next tsproc on list */
78     struct tsproc *ts_prev; /* link to previous tsproc on list */
79 } tsproc_t;
80
81
82 /* flags */
83 #define TSKPRI     0x01  /* thread at kernel mode priority */
84 #define TSBACKQ    0x02  /* thread goes to back of disp q when preempted */
85 #define TSIA       0x04  /* thread is interactive */
86 #define TSIASET    0x08  /* interactive thread is "on" */
87 #define TSIANICED  0x10  /* interactive thread has been niced */
88 #define TSRESTORE  0x20  /* thread was not preempted, due to schedctl */
89                          /* restore priority from ts_scpri */

23

/uts/common/disp/ts.c

1653  * Check for time slice expiration. If time slice has expired
1654  * move thread to priority specified in tsdptbl for time slice expiration
1655  * and set runrun to cause preemption.
1656  */
1657
1658 static void
1659 ts_tick(kthread_t *t)

1668     if ((tspp->ts_flags & TSKPRI) == 0) {
1669         if (--tspp->ts_timeleft <= 0) {
1670             pri_t new_pri;
1671
1672             /*
1673              * If we're doing preemption control and trying to
1674              * avoid preempting this thread, just note that
1675              * the thread should yield soon and let it keep
1676              * running (unless it's been a while).
1677              */
1678             if (t->t_schedctl && schedctl_get_nopreempt(t)) {
1679                 if (tspp->ts_timeleft > -SC_MAX_TICKS) {
1680                     DTRACE_SCHED1(schedctl__nopreempt,
1681                         kthread_t *, t);
1682                     schedctl_set_yield(t, 1);
1683                     thread_unlock_nopreempt(t);
1684                     return;
1685                 }
1686


25

/uts/common/disp/ts.c

1686
1687             TNF_PROBE_2(schedctl_failsafe,
1688                 "schedctl TS ts_tick", /* CSTYLED */,
1689                 tnf_pid, pid, ttoproc(t)->p_pid,
1690                 tnf_lwpid, lwpid, t->t_tid);
1691         }
1692         tspp->ts_flags &= ~TSRESTORE;
1693         tspp->ts_cpupri = ts_dptbl[tspp->ts_cpupri].ts_tqexp;
1694         TS_NEWUMDPRI(tspp);
1695         tspp->ts_dispwait = 0;
1696         new_pri = ts_dptbl[tspp->ts_umdpri].ts_globpri;
1697         ASSERT(new_pri >= 0 && new_pri <= ts_maxglobpri);
1698         /*
1699          * When the priority of a thread is changed,
1700          * it may be necessary to adjust its position
1701          * on a sleep queue or dispatch queue.
1702          * The function thread_change_pri accomplishes
1703          * this.
1704          */
1705         if (thread_change_pri(t, new_pri, 0)) {
1706             if ((t->t_schedflag & TS_LOAD) &&
1707                 (lwp = t->t_lwp) &&
1708                 lwp->lwp_state == LWP_USER)
1709                 t->t_schedflag &= ~TS_DONT_SWAP;
1710             tspp->ts_timeleft =
1711                 ts_dptbl[tspp->ts_cpupri].ts_quantum;
1712         } else {
1713             tspp->ts_flags |= TSBACKQ;
1714             cpu_surrender(t);
1715         }
1716         TRACE_2(TR_FAC_DISP, TR_TICK,
1717             "tick:tid %p old pri %d", t, oldpri);
1718     } else if (t->t_state == TS_ONPROC &&
1719         t->t_pri < t->t_disp_queue->disp_maxrunpri) {
1720         tspp->ts_flags |= TSBACKQ;
1721         cpu_surrender(t);
1722     }


27

Priority Inheritance

Prevents priority inversion.
Each thread has two priorities: a global priority and an inherited priority.

The inherited priority is normally zero unless the thread is sitting on a resource that is required by a higher priority thread.

When a thread blocks on a resource, it attempts to "will" or pass on its priority to all threads that are directly or indirectly blocking it. The pi_willto() function checks each thread that is blocking the resource or that is blocking a thread in the synchronization chain. When it sees threads that are at a lower priority, those threads inherit the priority of the blocked thread. It stops traversing the synchronization chain when it hits an object that is not blocked or is at a higher priority than the willing thread. (A simplified sketch follows below.)

What if someone maliciously grabs a resource?
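The following user-space sketch models the willing traversal described above (invented thr_t type and priorities; it is not Solaris's pi_willto() code): the blocked thread walks the chain of threads blocking it and raises any lower-priority holder to its own priority, stopping at the end of the chain or at a higher-priority thread.

#include <stdio.h>

typedef struct thr {
    const char *name;
    int pri;                 /* effective (inherited) priority */
    struct thr *blocked_on;  /* thread holding the resource we want, or NULL */
} thr_t;

/* will our priority down the blocking chain (simplified pi_willto idea) */
static void will_priority(thr_t *waiter) {
    for (thr_t *t = waiter->blocked_on; t != NULL; t = t->blocked_on) {
        if (t->pri >= waiter->pri)
            break;                    /* holder already runs at least as hot */
        t->pri = waiter->pri;         /* holder inherits the waiter's priority */
    }
}

int main(void) {
    thr_t low  = { "low",  10, NULL };
    thr_t mid  = { "mid",  30, &low };   /* mid waits on a lock held by low */
    thr_t high = { "high", 50, &mid };   /* high waits on a lock held by mid */

    will_priority(&high);
    printf("low=%d mid=%d high=%d\n", low.pri, mid.pri, high.pri); /* 50 50 50 */
    return 0;
}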

28

Mechanism / Frame

Dispatcher: manages the queues of runnable threads, runs the highest-priority thread, and recalculates thread priorities.
Multiple dispatch queues:
    one for each processor
    one systemwide kernel preempt queue for unbound RT threads
    one kernel preempt queue for each processor set, for RT threads
Doubly linked lists
Clock-driven

29

/src/uts/common/sys/disp.h

47 typedef struct dispq {
48     kthread_t *dq_first;   /* first thread on queue or NULL */
49     kthread_t *dq_last;    /* last thread on queue or NULL */
50     int       dq_sruncnt;  /* number of loaded, runnable */
51                            /* threads on queue */
52 } dispq_t;

30

/uts/common/sys/disp.h

54 /*
55  * Dispatch queue structure.
56  */
57 typedef struct _disp {
58     disp_lock_t disp_lock;   /* protects dispatching fields */
59     pri_t    disp_npri;      /* # of priority levels in queue */
60     dispq_t  *disp_q;        /* the dispatch queue */
61     dispq_t  *disp_q_limit;  /* ptr past end of dispatch queue */
62     ulong_t  *disp_qactmap;  /* bitmap of active dispatch queues */
63
64     /*
65      * Priorities:
66      *   disp_maxrunpri is the maximum run priority of runnable threads
67      *   on this queue.  It is -1 if nothing is runnable.
68      *
69      *   disp_max_unbound_pri is the maximum run priority of threads on
70      *   this dispatch queue but runnable by any CPU.  This may be left
71      *   artificially high, then corrected when some CPU tries to take
72      *   an unbound thread.  It is -1 if nothing is runnable.
73      */
74     pri_t    disp_maxrunpri;        /* maximum run priority */
75     pri_t    disp_max_unbound_pri;  /* max pri of unbound threads */
76
77     volatile int disp_nrunnable;    /* runnable threads in cpu dispq */
78
79     struct cpu *disp_cpu;    /* cpu owning this queue or NULL */
80 } disp_t;

31

Dispatcher

ts_tick() recalculates the priority of the running process.
ts_update() recalculates the priority of processes on a dispatch queue or sleep queue.
setfrontdq() and setbackdq() will cause preempt().
ts_tick() will cause cpu_surrender().
ts_tick() or ts_yield() will cause swtch(); swtch() will call disp().
disp() looks for the highest-priority thread to run (see the search-order sketch below):
    First search the kernel preempt queue
    Then search the queue of the current CPU
    Then search the dispatch queues of other CPUs
    Otherwise run the idle thread
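The search order can be sketched in user-space C as follows (invented sched_state type and priorities; a model of the decision, not the kernel's disp() code): the kernel preempt queue is checked first, then the local CPU's queue, then other CPUs' queues, and finally the idle thread.

#include <stdio.h>

#define NCPU 2

typedef struct {
    int kpq_maxpri;            /* highest priority in the kernel preempt queue, -1 if empty */
    int cpu_maxpri[NCPU];      /* highest priority in each per-CPU queue, -1 if empty */
} sched_state;

/* returns a short description of where the next thread to run comes from */
static const char *pick_next(const sched_state *s, int mycpu) {
    /* 1. kernel preempt queue, if it beats the local queue */
    if (s->kpq_maxpri >= 0 && s->kpq_maxpri >= s->cpu_maxpri[mycpu])
        return "kernel preempt queue";
    /* 2. this CPU's own dispatch queue */
    if (s->cpu_maxpri[mycpu] >= 0)
        return "local CPU dispatch queue";
    /* 3. steal work from another CPU's queue */
    for (int c = 0; c < NCPU; c++)
        if (c != mycpu && s->cpu_maxpri[c] >= 0)
            return "another CPU's dispatch queue";
    /* 4. nothing runnable anywhere: run the idle thread */
    return "idle thread";
}

int main(void) {
    sched_state s = { .kpq_maxpri = -1, .cpu_maxpri = { -1, 59 } };
    printf("CPU 0 runs: %s\n", pick_next(&s, 0));   /* another CPU's dispatch queue */
    s.kpq_maxpri = 109;
    printf("CPU 1 runs: %s\n", pick_next(&s, 1));   /* kernel preempt queue */
    return 0;
}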

32

/uts/common/disp/disp.c

685     while ((pri = kpq->disp_maxrunpri) >= 0 &&
686         pri >= dp->disp_maxrunpri &&
687         (cpup->cpu_flags & CPU_OFFLINE) == 0 &&
        /* fetch the best-priority thread from the kernel preempt queue */
688         (tp = disp_getbest(kpq)) != NULL) {
695     }

698     pri = dp->disp_maxrunpri;

707     if (pri == -1) {
708         if (!(cpup->cpu_flags & CPU_OFFLINE)) {
            /* find a processor with the highest-priority thread */
710             if ((tp = disp_getwork(cpup)) == NULL) {
711                 tp = cpup->cpu_idle_thread;
718             }
719         } else {
721             tp = cpup->cpu_idle_thread;
727         }

734     dq = &dp->disp_q[pri];
735     tp = dq->dq_first;

33

Scheduler activation

Preemption control
Management of the LWP-to-user-thread problem
Management of keeping the correct number of LWPs available for a threaded process

34

LWP

A lightweight process (LWP) lets a thread execute a call to a function that is part of another process's address space, pass the function arguments, and get a return value as if the function were part of the calling process.
A lightweight process can be considered the swappable portion of a kernel thread. Another way to view lightweight processes is as "virtual CPUs" which perform the processing for applications. Application threads are attached to available lightweight processes, which are attached to a kernel thread, which is scheduled on the system's CPU dispatch queue.
Communication between the kernel and the user-level threads library is based on shared memory pages and the lwp_schedctl() system call.
In the primordial thread t0, sc_init() establishes the shared memory pages and the upcall door.

35

User thread vs Kernel thread

A kernel thread is the entity that is scheduled by the kernel. If no lightweight process is attached, it is also known as a system thread. It uses kernel text and global data, but has its own kernel stack, as well as a data structure to hold scheduling and synchronization information. Kernel threads can be independently scheduled on CPUs. Context switching between kernel threads is very fast because memory mappings do not have to be flushed.

User threads are scheduled via a scheduler in libthread. This scheduler does implement priorities, but does not implement time slicing. If time slicing is desired, it must be programmed in.

36

User thread vs Kernel thread (cont'd)

User threads use the thread library's own scheduler.
Solaris currently ships with two threads libraries:
    libthread.so supports the Solaris threads interfaces; user threads are created by a call to thr_create(3THR) (Solaris threads).
    libpthread.so provides the POSIX (Portable Operating System Interface for Unix) threads APIs; user threads are created by a call to pthread_create(3THR) (POSIX threads).
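For reference, here is a minimal POSIX threads example (standard pthread API, not taken from the slides) that creates one user thread with pthread_create() and waits for it; on Solaris this user thread ultimately runs on an LWP/kernel thread scheduled as described above. Compile with -lpthread.

#include <pthread.h>
#include <stdio.h>

/* the function the new user thread will run */
static void *worker(void *arg) {
    (void)arg;
    printf("hello from a user thread\n");
    return NULL;
}

int main(void) {
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(tid, NULL);       /* wait for the user thread to finish */
    return 0;
}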

37

User thread vs Kernel thread

Figure: The Multithreaded Process Model (1)

38

Scheduler activation

schedctl_init() initializes, turning on preemption control.
sc_init() establishes a door for kernel-to-user upcalls.
schedctl_block() determines if this LWP is the last one in the process when an LWP is about to sleep.

39

Preemption control

In ts_tick(), if ts_timeleft reaches 0, the kthread is given a few extra ticks beyond its time quantum to free its critical resources.
It gets one more time slice to run; the scheduler activation is not allowed to keep the thread running indefinitely.
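From user level, preemption control is requested through the schedctl interfaces. The sketch below assumes the Solaris schedctl_init()/schedctl_start()/schedctl_stop() calls and the <schedctl.h> header; exact availability should be verified on your release. This matches the ts_tick() path shown earlier: while the nopreempt hint is set, the thread gets a few extra ticks; once it has been a while, it is preempted anyway.

#include <schedctl.h>   /* Solaris preemption-control interface (assumed header) */
#include <stdio.h>

int main(void) {
    /* set up the shared page used by the kernel and this LWP */
    schedctl_t *sc = schedctl_init();
    if (sc == NULL) {
        fprintf(stderr, "schedctl_init failed\n");
        return 1;
    }

    schedctl_start(sc);     /* "please don't preempt me": entering a critical section */
    /* ... briefly hold a critical user-level lock here ... */
    schedctl_stop(sc);      /* critical section done; yield if the kernel asked us to */

    return 0;
}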

40

Citation

1. Solaris Internals
2. http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/process.html

41

Questions?

42

Scheduler activation

Gives the kthread a few extra clock ticks beyond its time quantum to complete its task and free the lock.
Activated by a mutex lock.
Only one extra ts_tick() is given.

43

Processor Set

RT threads run in the same processor set.
Interrupts are disabled in the RT set.

44

Kernel Service

System Calls: The kernel executes requests submitted by processes via system calls. The system call interface invokes a special trap instruction.

Hardware Exceptions: The kernel notifies a process that attempts illegal activities such as dividing by zero or overflowing the user stack.

Hardware Interrupts: Devices use interrupts to notify the kernel of status changes (such as I/O completions).

Resource Management: The kernel manages resources via special processes such as the pagedaemon.

45

Hash table

150 #define TS_LISTS 16     /* number of lists, must be power of 2 */
153 #define TS_LIST_HASH(tp)  (((uintptr_t)(tp) >> 9) & (TS_LISTS - 1))

235 static kmutex_t ts_dptblock;              /* protects time sharing dispatch table */
236 static kmutex_t ts_list_lock[TS_LISTS];   /* protects tsproc lists */
237 static tsproc_t ts_plisthead[TS_LISTS];   /* dummy tsproc at head of lists */
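As a small user-space demonstration of the hash (made-up addresses; only the TS_LIST_HASH expression comes from the slide), tsproc pointers 512 bytes apart land on consecutive lists, spreading lock contention across the ts_list_lock[] array.

#include <stdint.h>
#include <stdio.h>

#define TS_LISTS 16     /* number of lists, must be a power of 2 */
#define TS_LIST_HASH(tp) (((uintptr_t)(tp) >> 9) & (TS_LISTS - 1))

int main(void) {
    /* made-up tsproc addresses, 512 bytes apart, as an illustration */
    for (int i = 0; i < 4; i++) {
        uintptr_t addr = 0x30001000u + (uintptr_t)i * 512;
        printf("tsproc at 0x%lx -> list %lu\n",
               (unsigned long)addr, (unsigned long)TS_LIST_HASH(addr));
    }
    return 0;
}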

46

Key feature

Table driven
Priority inversion and priority inheritance
User-mode threads and kernel-mode threads
Kernel preemptable
Scheduler activation
Processor sets