The Linux Scheduler: A Decade of Wasted Cores (jplozi/wastedcores/files/extended_talk.pdf)
TRANSCRIPT
Jean-Pierre Lozi
Baptiste Lepers
Fabien Gaud
Alexandra Fedorova
Justin Funston
Vivien Quéma
THE LINUX SCHEDULER: A DECADE OF WASTED CORES
IS THE SCHEDULER OF YOUR MACHINE WORKING?
It must be! 15 years ago, Linus Torvalds was already saying:
“And you have to realize that there are not very many things that have aged as well as the scheduler. Which is just another proof that scheduling is easy.”
Since then, people have been running applications on their multicore machines all the time: they run, CPU usage is high, everything seems fine.
But would you notice if some cores remained idle intermittently, when they shouldn’t?
Do you keep monitoring tools (htop) running all the time?
Even if you do, would you be able to tell faulty behavior from normal noise?
Would you ever suspect the scheduler?
THIS TALK
Over the past few years of working on various projects, we sometimes saw strange, hard-to-explain performance results.
An example: when running a TPC-H benchmark on a 64-core machine, our runs were much faster when pinning threads to cores than when we let the Linux scheduler do its job.
Memory locality issue? Impossible: hardware counters showed no difference in the % of remote memory accesses, in cache misses, etc.
Contention over some resource (spinlock, etc.)? We investigated this for a long time, but couldn’t find anything that looked off.
Overhead of context switches? Threads moved a lot, but we proved that the overhead was negligible.
We ended up suspecting the core behavior of the scheduler.
We implemented high-resolution tracing tools and saw that some cores were idle while others were overloaded...
This is how we found our first performance bug, which made us investigate more...
In the end: four Linux scheduler performance bugs that we found, analyzed and fixed
Always the same symptom: idle cores while others are overloaded
The bug-hunting was tough, and led us to develop our own tools
After fixing some of the bugs:
12-23% performance improvement on a popular database with TPC-H
137× performance improvement on HPC workloads
Not always possible to provide a simple, working fix...
Intrinsic problems with the design of the scheduler?
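The recurring symptom named above (idle cores while others are overloaded) can be phrased as a simple check over per-core runqueue lengths. The sketch below is illustrative only: the function name `find_wasted_cores` and the input format (a list of runnable-thread counts, one per core) are assumptions, not the tracing tools developed for the talk.

```python
from typing import List, Tuple

def find_wasted_cores(runnable: List[int]) -> Tuple[List[int], List[int]]:
    """Return (idle_cores, overloaded_cores) when the symptom is present.

    runnable[i] is the number of runnable threads on core i (hypothetical
    input; a real tool would have to sample this from the kernel).
    A core counts as idle with 0 runnable threads and as overloaded with
    2 or more, because at least one of those threads is waiting and could
    be running on the idle core instead.
    """
    idle = [core for core, n in enumerate(runnable) if n == 0]
    overloaded = [core for core, n in enumerate(runnable) if n >= 2]
    if idle and overloaded:
        return idle, overloaded
    # Either no core is idle or no thread is waiting: nothing is wasted.
    return [], []

if __name__ == "__main__":
    print(find_wasted_cores([0, 3, 1, 1]))  # core 0 idle while core 1 is overloaded
    print(find_wasted_cores([1, 1, 1, 1]))  # one thread per core: no symptom
```

A single sample like this says little on its own. The point made in the talk is that the condition shows up intermittently, so a check of this kind would need to run at high frequency to be useful.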
Main takeaway of our analysis: more research must be directed towards implementing an efficient scheduler for multicore architectures, because contrary to what a lot of us think, this is *not* a solved problem!
Need convincing? Let’s go through it together...
...starting with a bit of background...
THE COMPLETELY FAIR SCHEDULER (CFS): CONCEPT
[Figure: Core 0 to Core 3 above a single runqueue of threads labeled R = 103, R = 82, R = 24, R = 18 and R = 12, plus R = 112 for the thread that is enqueued again.]
One runqueue where threads are globally sorted by runtime
When a thread is done running for its timeslice: enqueued again
Some tasks have a lower niceness and thus have a longer timeslice (allowed to run longer)
Cores get their next task from the global runqueue
Of course, this cannot work with a single runqueue because of contention
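The conceptual model on this slide can be sketched with a priority queue keyed by runtime. This is a toy simulation under stated assumptions, not the kernel's implementation: the `timeslice` rule, its `base_slice` value and the thread names are made up for illustration; only the runtimes (103, 82, 24, 18, 12) come from the figure.

```python
import heapq
from typing import Dict, List, Tuple

def timeslice(niceness: int, base_slice: int = 10) -> int:
    """Hypothetical rule: the lower the niceness, the longer the timeslice."""
    # Niceness ranges from -20 (highest priority) to 19 (lowest priority).
    return base_slice + (19 - niceness)

def simulate(threads: Dict[str, Tuple[int, int]], n_cores: int,
             rounds: int) -> List[List[str]]:
    """threads maps a thread name to (runtime so far, niceness).

    In each round, every core takes the thread with the smallest runtime
    from the single global runqueue, runs it for its timeslice, and the
    thread is then enqueued again with its updated runtime.
    """
    runqueue = [(runtime, name) for name, (runtime, _) in threads.items()]
    heapq.heapify(runqueue)
    schedule = []
    for _ in range(rounds):
        count = min(n_cores, len(runqueue))
        picked = [heapq.heappop(runqueue) for _ in range(count)]
        schedule.append([name for _, name in picked])
        for runtime, name in picked:
            niceness = threads[name][1]
            heapq.heappush(runqueue, (runtime + timeslice(niceness), name))
    return schedule

if __name__ == "__main__":
    # Runtimes from the figure; all threads are assumed to have niceness 0.
    threads = {"a": (103, 0), "b": (82, 0), "c": (24, 0), "d": (18, 0), "e": (12, 0)}
    for round_no, running in enumerate(simulate(threads, n_cores=4, rounds=2)):
        print(round_no, running)
```

In a real multicore system every pop and push on this one shared queue would need synchronization, which is the contention problem the slide ends on.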
CFS: IN PRACTICE
One runqueue per core to avoid contention
CFS periodically balances “loads”:
load(task) = weight¹ × % CPU use²
¹ The lower the niceness, the higher the weight
² We don’t want a high-priority thread that sleeps a lot to take a whole CPU for itself and then mostly sleep!
Since there can be many cores: hierarchical approach!
[Figure: Core 0 and Core 1, each with its own runqueue; one runqueue holds a single thread with W=6, the other holds six threads with W=1 each.]
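As a worked example of the load formula above, the sketch below computes a per-core load, assumed here to be the sum of weight × CPU use over the threads on that core's runqueue. The weights mirror the figure (one thread with W=6, six threads with W=1); the CPU-use fractions and the function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Thread = Tuple[float, float]  # (weight, fraction of CPU used, from 0.0 to 1.0)

def task_load(weight: float, cpu_use: float) -> float:
    """load(task) = weight x % CPU use, with cpu_use given as a fraction."""
    return weight * cpu_use

def core_load(runqueue: List[Thread]) -> float:
    """Load of a core, assumed to be the sum of the loads of its threads."""
    return sum(task_load(weight, cpu_use) for weight, cpu_use in runqueue)

# One heavy thread on one core, six light threads on the other,
# all assumed to be fully CPU-bound (cpu_use = 1.0).
heavy_core = [(6.0, 1.0)]
light_core = [(1.0, 1.0)] * 6
print(core_load(heavy_core), core_load(light_core))  # 6.0 6.0

# If the heavy thread sleeps half of the time, its load drops to 3.0,
# so it no longer justifies keeping a whole core to itself.
print(core_load([(6.0, 0.5)]))  # 3.0
```

Weighting by CPU use is what footnote 2 asks for: a high-priority thread only counts for as much load as it actually consumes.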
CFS IN PRACTICE: HIERARCHICAL LOAD BALANCING
[Figure: animation of hierarchical load balancing on four cores (Core 0 to Core 3) running nine threads with L=1000 and one thread with L=3000. The per-core loads start at L=2000, L=6000, L=1000 and L=3000. Balancing inside each group of cores (“Balanced!”) gives L=2000, L=4000, L=3000 and L=3000, so the two groups have AVG(L)=3500 and AVG(L)=2500. Balancing between the groups then brings every core to L=3000 and both groups to AVG(L)=3000 (“Balanced!”).]
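The two balancing levels in the figure can be reproduced with a small simulation. This is a simplified sketch, not the kernel's algorithm: the rule in `balance_pair` (move the smallest thread while that narrows the gap), the function names, and the assignment of the loads to particular cores and groups are all assumptions; only the load values come from the slides.

```python
from typing import List

Core = List[int]    # loads of the threads on one core's runqueue
Group = List[Core]  # cores that are balanced together at the lower level

def core_loads(groups: List[Group]) -> List[List[int]]:
    return [[sum(core) for core in group] for group in groups]

def avg_load(group: Group) -> float:
    return sum(sum(core) for core in group) / len(group)

def balance_pair(src: Core, dst: Core) -> None:
    """Move threads from src to dst while each move narrows the gap."""
    while src:
        thread = min(src)
        if sum(src) - sum(dst) <= thread:
            break  # moving this thread would not reduce the imbalance
        src.remove(thread)
        dst.append(thread)

def balance_within(groups: List[Group]) -> None:
    """Lower level: balance the busiest and least busy core of each group."""
    for group in groups:
        balance_pair(max(group, key=sum), min(group, key=sum))

def balance_between(groups: List[Group]) -> None:
    """Higher level: only the average loads of the groups are compared."""
    busy = max(groups, key=avg_load)
    idle = min(groups, key=avg_load)
    if avg_load(busy) > avg_load(idle):
        balance_pair(max(busy, key=sum), min(idle, key=sum))

# Assumed layout: per-core loads 6000 and 1000 in one group, 2000 and 3000 in the other.
groups = [[[1000] * 6, [1000]], [[1000, 1000], [3000]]]
print(core_loads(groups))  # [[6000, 1000], [2000, 3000]]
balance_within(groups)
print(core_loads(groups))  # [[4000, 3000], [2000, 3000]]
balance_between(groups)
print(core_loads(groups))  # [[3000, 3000], [3000, 3000]]
```

Note that the group holding the single L=3000 thread cannot be balanced any further at the lower level, because moving that thread would only swap which core is busier.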
CFS IN PRACTICE: HIERARCHICAL LOAD BALANCING
Note that only the average load of groups is considered
If for some reason the lower-level load-balancing fails, nothing happens at a higher level:
[Figure: four cores in two groups. Core 0: L=0 and Core 1: L=6000 (AVG(L)=3000); Core 2: L=3000 and Core 3: L=3000 (AVG(L)=3000). The higher level sees equal averages and reports "Balanced!", although Core 0 is idle (!!!).]
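The comparison on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `higher_level_is_balanced` and the zero tolerance are made up for the example, and the real load calculation is more involved.

```python
from statistics import mean

def higher_level_is_balanced(groups, tolerance=0):
    """Compare groups by their average load only (hypothetical helper)."""
    averages = [mean(cores) for cores in groups]
    return max(averages) - min(averages) <= tolerance

# Core loads from the slide: (Core 0, Core 1) and (Core 2, Core 3).
groups = [[0, 6000], [3000, 3000]]

balanced = higher_level_is_balanced(groups)
idle_core_count = sum(load == 0 for cores in groups for load in cores)

# The averages are equal, so the higher level does nothing,
# even though one core has no load at all.
print(balanced, idle_core_count)
```

The point of the sketch is that an average hides how the load is distributed inside a group: an idle core and an overloaded core cancel each other out.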
![Page 52: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/52.jpg)
CFS IN PRACTICE: MORE HEURISTICS
Load calculations are actually more complicated and use more heuristics.
One of them aims to increase fairness between “sessions”.
Objective: making sure that launching lots of threads from one terminal doesn’t prevent other processes on the machine (potentially from other users) from running.
Otherwise, it would be easy to use more resources than other users by spawning many threads.
![Page 56: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/56.jpg)
CFS IN PRACTICE: MORE HEURISTICS
[Figure: two sessions (tty 1 and tty 2), one running a single thread and the other four threads, every thread with L=1000. With equal per-thread loads, the single-thread session ends up with 50% of a CPU and the four-thread session with 150%.]
![Page 61: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/61.jpg)
CFS IN PRACTICE: MORE HEURISTICS
Solution: divide the load of a task by the number of threads in its tty...
[Figure: the single-thread session keeps L=1000, while each of the four threads of the other session now has L=250. After balancing, each session gets 100% of a CPU.]
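The effect of the per-session division can be checked with a small Python sketch. Several things here are assumptions made for the example: the session names `single` and `multi`, a machine with two cores, the thread placements (read off the figures), and the rule that a core splits its time between threads in proportion to their load.

```python
from collections import defaultdict

def session_cpu_shares(placement, thread_load):
    """Return the CPU percentage each session receives.

    placement: one list per core, holding the session name of every thread on it.
    thread_load: load of a single thread of each session.
    Each core splits its time between threads in proportion to their load.
    """
    shares = defaultdict(float)
    for core in placement:
        per_session = defaultdict(int)
        for session in core:
            per_session[session] += thread_load[session]
        core_total = sum(per_session.values())
        for session, load in per_session.items():
            shares[session] += 100.0 * load / core_total
    return dict(shares)

threads = {"single": 1, "multi": 4}  # hypothetical session names

# Without the heuristic: every thread has L=1000, so two cores hold 2 + 3 threads.
flat_load = {name: 1000 for name in threads}
before = session_cpu_shares([["single", "multi"], ["multi"] * 3], flat_load)

# With the heuristic: L = 1000 / #threads in the session, i.e. 1000 vs 4 x 250.
divided_load = {name: 1000 // count for name, count in threads.items()}
after = session_cpu_shares([["single"], ["multi"] * 4], divided_load)

print(before)  # {'single': 50.0, 'multi': 150.0}
print(after)   # {'single': 100.0, 'multi': 100.0}
```

Dividing by the thread count makes the total load of both sessions equal (1000 each), so a load-balanced placement gives each session one full core.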
![Page 64: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/64.jpg)
BUG 1/4: GROUP IMBALANCE
Two sessions (tty 1 and tty 2), one with a single thread and one with eight threads.
Single-thread session:
Load(thread) = %cpu × weight / #threads = 100 × 10 / 1 = 1000
Eight-thread session:
Load(thread) = %cpu × weight / #threads = 100 × 10 / 8 = 125
[Figure: one thread with L=1000 and eight threads with L=125.]
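The formula above, written out as a Python function so the two values can be reproduced. The function and variable names are illustrative only; the inputs (100% CPU, weight 10, 1 and 8 threads) are the ones on the slide.

```python
def thread_load(cpu_percent, weight, threads_in_session):
    """Load(thread) = %cpu x weight / #threads, as written on the slide."""
    return cpu_percent * weight / threads_in_session

single = thread_load(100, 10, 1)  # thread of the single-thread session
eight = thread_load(100, 10, 8)   # each thread of the eight-thread session

# Both sessions carry the same total load, which is the goal of the heuristic.
session_totals = (single * 1, eight * 8)
print(single, eight, session_totals)  # 1000.0 125.0 (1000.0, 1000.0)
```

Note that a single thread from the first session now weighs as much as all eight threads of the second one, which is what sets up the imbalance discussed next.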
![Page 74: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/74.jpg)
BUG 1/4: GROUP IMBALANCE
[Figure: four cores in two groups. Core 0: L=0 and Core 1: L=1000 (the single L=1000 thread); Core 2: L=500 and Core 3: L=500 (four L=125 threads each). Each pair of cores is reported as "Balanced!", and since AVG(L)=500 for both groups the higher level is "Balanced!" too, although Core 0 stays idle (!!!).]
![Page 80: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/80.jpg)
BUG 1/4: GROUP IMBALANCE
Another example, on a 64-core machine, with load balancing:
First between pairs of cores (Bulldozer architecture, a bit like hyperthreading)
Then between NUMA nodes
User 1 launches: ssh <machine> R & ssh <machine> R &
User 2 launches: ssh <machine> make -j 64 kernel
The bug happens at two levels:
The other core of the pair is idle
Other cores on the NUMA node are less busy...
![Page 81: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/81.jpg)
BUG 1/4: GROUP IMBALANCE
A simple solution: balance the minimum load of groups instead of the average
[Figure, animated over the same four cores (Core 0: L=0, Core 1: L=1000, Core 2: L=500, Core 3: L=500).
Step 1: each pair of cores is "Balanced!" at the lower level, but the groups now compare MIN(L)=0 with MIN(L)=500.
Step 2: two L=125 threads move to Core 0, giving Core 0: L=250, Core 1: L=1000, Core 2: L=250, Core 3: L=500. Both groups have MIN(L)=250, so the higher level is "Balanced!".
Step 3: the lower level rebalances Cores 2 and 3 to L=375 each (MIN(L)=375). Both levels report "Balanced!" and no core is idle.]
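The difference between the two metrics can be seen on the core loads from the figure. This is a sketch, not the actual fix: the helper `imbalance` is a hypothetical name, and the code only compares group metrics without moving any thread.

```python
from statistics import mean

def imbalance(groups, metric):
    """Gap between the largest and smallest per-group metric (hypothetical helper)."""
    values = [metric(cores) for cores in groups]
    return max(values) - min(values)

# (Core 0, Core 1) and (Core 2, Core 3), before any thread is moved.
initial = [[0, 1000], [500, 500]]
# Same cores once two L=125 threads have moved from Core 2 to Core 0.
after_move = [[250, 1000], [250, 500]]

print(imbalance(initial, mean))    # 0: the averages hide the idle core
print(imbalance(initial, min))     # 500: the minimum exposes it
print(imbalance(after_move, min))  # 0: both groups now have MIN(L)=250
```

With the average, the idle core is invisible at the higher level. With the minimum, a group containing an idle core always reports 0, so any busier group shows up as imbalanced.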
![Page 101: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/101.jpg)
BUG 1/4: GROUP IMBALANCE
A simple solution: balance the minimum load of groups instead of the average
After the fix, make runs 13% faster, and R is not impacted
A simple solution, but is it ideal? The minimum load is more volatile than the average...
It may cause lots of unnecessary rebalancing. Is a revamp of the load calculations needed?
![Page 105: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/105.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Hierarchical load balancing is based on groups of cores named scheduling domains
Based on affinity, i.e., pairs of cores, dies, CPUs, NUMA nodes, etc.
Each scheduling domain contains groups that are the lower-level scheduling domains
For instance, on our 64-core AMD Bulldozer machine:
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 18
![Page 106: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/106.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Hierarchical load balancing is based on groups of cores named scheduling domains
Based on affinity, i.e., pairs of cores, dies, CPUs, NUMA nodes, etc.
Each scheduling domain contains groups that are the lower-level scheduling domains
For instance, on our 64-core AMD Bulldozer machine:
At level 1, each pair of core (scheduling domains) contain cores (scheduling groups)
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 18
![Page 107: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/107.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Hierarchical load balancing is based on groups of cores named scheduling domains
Based on affinity, i.e., pairs of cores, dies, CPUs, NUMA nodes, etc.
Each scheduling domain contains groups that are the lower-level scheduling domains
For instance, on our 64-core AMD Bulldozer machine:
At level 1, each pair of core (scheduling domains) contain cores (scheduling groups)
At level 2, each CPU (s.d.) contain pairs of cores (s.g.)
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 18
![Page 108: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/108.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Hierarchical load balancing is based on groups of cores named scheduling domains
Based on affinity, i.e., pairs of cores, dies, CPUs, NUMA nodes, etc.
Each scheduling domain contains groups that are the lower-level scheduling domains
For instance, on our 64-core AMD Bulldozer machine:
At level 1, each pair of core (scheduling domains) contain cores (scheduling groups)
At level 2, each CPU (s.d.) contain pairs of cores (s.g.)
At level 3, each group of directly connected CPUs (s.d.) contain CPUs (s.g.)
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 18
![Page 109: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/109.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Hierarchical load balancing is based on groups of cores named scheduling domains
Based on affinity, i.e., pairs of cores, dies, CPUs, NUMA nodes, etc.
Each scheduling domain contains groups that are the lower-level scheduling domains
For instance, on our 64-core AMD Bulldozer machine:
At level 1, each pair of cores (scheduling domain) contains cores (scheduling groups)
At level 2, each CPU (s.d.) contains pairs of cores (s.g.)
At level 3, each group of directly connected CPUs (s.d.) contains CPUs (s.g.)
At level 4, the whole machine (s.d.) contains groups of directly connected CPUs (s.g.)
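The two lowest levels of this hierarchy can be sketched as plain sets of core numbers. This assumes a contiguous numbering (cores 2k and 2k+1 form a pair, cores 8k to 8k+7 form a CPU), which is a hypothetical simplification; levels 3 and 4 depend on the interconnect and are left out here.

```python
CORES_PER_PAIR = 2  # cores come in pairs that share resources
CORES_PER_CPU = 8   # assumed contiguous numbering, for illustration only

def level1_groups(core):
    """Level 1: the domain is the pair; its groups are the individual cores."""
    first = core - core % CORES_PER_PAIR
    return [{c} for c in range(first, first + CORES_PER_PAIR)]

def level2_groups(core):
    """Level 2: the domain is the CPU; its groups are the pairs of cores."""
    first = core - core % CORES_PER_CPU
    return [set(range(p, p + CORES_PER_PAIR))
            for p in range(first, first + CORES_PER_CPU, CORES_PER_PAIR)]

def domain(groups):
    """A scheduling domain is the union of its scheduling groups."""
    return set().union(*groups)

print(level1_groups(5))          # groups of the pair that contains core 5
print(level2_groups(5))          # groups of the CPU that contains core 5
print(domain(level2_groups(5)))  # the level-2 domain itself
```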
![Page 110: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/110.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Bulldozer 64-core: eight CPUs with 8 cores each, non-complete interconnect graph!
![Page 111: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/111.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
At the first level, the first core balances load with the other core on the same pair (because they share resources, high affinity)
![Page 112: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/112.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
At the 2nd level, the first pair balances load with other pairs on the same CPU
![Page 113: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/113.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
At the 3rd level, the first CPU balances load with directly connected CPUs
![Page 114: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/114.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
At the 4th level, the first group of directly connected CPUs balances load with the other groups of directly connected CPUs
![Page 115: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/115.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Groups of CPUs built by: (1) picking first CPU and looking for all directly connected CPUs
![Page 116: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/116.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Groups of CPUs built by: (2) picking first CPU not in a group and looking for all directly connected CPUs
![Page 117: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/117.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
And then stop, because all CPUs are in a group
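The construction steps above can be sketched as follows. The interconnect in `LINKS` is a made-up 8-CPU graph chosen for illustration, not the real topology of the machine, and the function names are hypothetical.

```python
# Hypothetical interconnect between 8 CPUs (NOT the real Bulldozer topology).
LINKS = [(0, 1), (0, 2), (0, 4), (0, 6), (1, 3), (1, 5), (1, 7), (2, 3),
         (2, 4), (2, 6), (3, 4), (3, 5), (3, 7), (4, 5), (5, 7), (6, 7)]

def neighbours(links, num_cpus):
    """Build the 'directly connected' relation as an adjacency map."""
    adj = {cpu: set() for cpu in range(num_cpus)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def build_groups_greedy(adj):
    """Pick the first CPU not yet in a group, group it with its neighbours, repeat."""
    groups, covered = [], set()
    for cpu in sorted(adj):
        if cpu in covered:
            continue
        group = {cpu} | adj[cpu]
        groups.append(group)
        covered |= group
    return groups

groups = build_groups_greedy(neighbours(LINKS, 8))
print(groups)                 # two groups cover all eight CPUs
print(groups[0] & groups[1])  # CPUs that belong to both groups
```

With this graph the construction stops after two groups, and several CPUs end up in both of them.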
![Page 119: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/119.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Suppose we taskset an application on these two nodes, two hops apart
![Page 120: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/120.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
And threads are created on this core
![Page 121: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/121.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Load gets correctly balanced on the pair of cores
![Page 122: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/122.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Load gets correctly balanced on the CPU (8 threads)
![Page 123: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/123.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
No stealing at level 3, because nodes not directly connected (1 hop apart)
![Page 124: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/124.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
At level 4, stealing between the red and green groups... Overloaded node in both groups!
![Page 128: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/128.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
load(red) = 16 * load(thread)
load(green) = 16 * load(thread)
Both groups appear equally loaded, so nothing is stolen!!!
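A small sketch of why the two loads come out equal. The group memberships and CPU numbers below are hypothetical; all that matters is that both pinned CPUs sit in both groups.

```python
RED = {0, 1, 2, 4, 6}       # hypothetical group built around the first CPU
GREEN = {1, 2, 3, 4, 5, 7}  # hypothetical group built around the first uncovered CPU

def group_load(group, threads_per_cpu):
    """Load of a group, counted in units of load(thread)."""
    return sum(threads_per_cpu.get(cpu, 0) for cpu in group)

# Application pinned on CPUs 1 and 2; all 16 threads were created on CPU 1.
threads_per_cpu = {1: 16, 2: 0}

load_red = group_load(RED, threads_per_cpu)
load_green = group_load(GREEN, threads_per_cpu)
print(load_red, load_green)  # identical, so level 4 sees nothing to steal
```

Because the overloaded CPU is counted in both groups, the comparison at level 4 cannot tell that CPU 2 is idle.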
![Page 131: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/131.jpg)
BUG 2/4: SCHEDULING GROUP CONSTRUCTION
Fix: build the domains by creating one “directly connected” group for every CPU
Instead of the first CPU and the first one not “covered” by a group
Performance improvement of NAS applications on two nodes:

| Application | Time with bug (s) | Time after fix (s) | Improvement |
|---|---|---|---|
| BT | 99 | 56 | 1.75x |
| CG | 42 | 15 | 2.73x |
| EP | 73 | 36 | 2x |
| LU | 1040 | 38 | 27x |

Very good improvement for LU: with the bug it can’t use all 16 cores, so it runs more threads than cores
The fix solves the resulting spinlock issues (convoys)
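The idea of the fix can be sketched by building one group per CPU and comparing loads from the idle CPU's point of view. As before, `LINKS` is a made-up interconnect and the helper names are hypothetical; this illustrates the idea and is not the actual kernel patch.

```python
# Hypothetical interconnect between 8 CPUs (NOT the real Bulldozer topology).
LINKS = [(0, 1), (0, 2), (0, 4), (0, 6), (1, 3), (1, 5), (1, 7), (2, 3),
         (2, 4), (2, 6), (3, 4), (3, 5), (3, 7), (4, 5), (5, 7), (6, 7)]

def neighbours(links, num_cpus):
    adj = {cpu: set() for cpu in range(num_cpus)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def build_groups_per_cpu(adj):
    """One 'directly connected' group for every CPU, keyed by that CPU."""
    return {cpu: {cpu} | adj[cpu] for cpu in adj}

def group_load(group, threads_per_cpu):
    return sum(threads_per_cpu.get(cpu, 0) for cpu in group)

groups = build_groups_per_cpu(neighbours(LINKS, 8))
threads_per_cpu = {1: 16, 2: 0}  # pinned on CPUs 1 and 2, all threads on CPU 1

local_load = group_load(groups[2], threads_per_cpu)   # group seen from idle CPU 2
remote_load = group_load(groups[1], threads_per_cpu)  # group around overloaded CPU 1
print(local_load, remote_load)  # now different, so CPU 2 can steal
```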
![Page 139: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/139.jpg)
BUG 3/4: MISSING SCHEDULING DOMAINS
In addition to this, when domains are re-built, levels 3 and 4 are not re-built...
I.e., no balancing between directly connected or 1-hop CPUs (i.e., between any CPUs)
Happens for instance when disabling and re-enabling a core
Launch an application, first thread created on CPU 1
First thread will stay on CPU 1, next threads will be created on CPU 1 (default Linux)
The whole application will stay on CPU 1 forever!

| Application | Time with bug (s) | Time after fix (s) | Improvement |
|---|---|---|---|
| BT | 122 | 23 | 5.2x |
| CG | 134 | 5.4 | 25x |
| EP | 72 | 18 | 4x |
| LU | 2196 | 16 | 137x |
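A toy model of the consequence: without the cross-CPU levels, periodic balancing can only spread threads over the cores of one CPU. The contiguous core numbering and the 64-thread workload are assumptions made for illustration.

```python
NUM_CPUS = 8
CORES_PER_CPU = 8  # assumed contiguous numbering: CPU k owns cores 8k .. 8k+7

def balancing_span(core, cross_cpu_levels_present):
    """Cores that periodic balancing can move a thread to, starting from `core`."""
    if cross_cpu_levels_present:
        return set(range(NUM_CPUS * CORES_PER_CPU))
    first = core - core % CORES_PER_CPU
    return set(range(first, first + CORES_PER_CPU))

def threads_per_core(num_threads, span):
    return num_threads / len(span)

buggy = balancing_span(8, cross_cpu_levels_present=False)  # levels 3 and 4 missing
fixed = balancing_span(8, cross_cpu_levels_present=True)

print(threads_per_core(64, buggy))  # threads stacked on a single CPU
print(threads_per_core(64, fixed))  # threads spread over the whole machine
```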
![Page 144: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/144.jpg)
BUG 4/4: OVERLOAD-ON-WAKEUP
Until now, we analyzed the behavior of the periodic, (buggy) hierarchical load balancing that uses (buggy) scheduling domains
But there is another way load is balanced: threads get to pick on which core they get woken up when they are done blocking (after a lock acquisition, an I/O)...
Here is how it works: when a thread wakes up, it looks for non-busy cores on the same CPU in order to decide on which core it should wake up.
Only cores that are on the same CPU, in order to improve data locality...
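The wakeup policy described above can be sketched as a core-selection function. This is a simplification with hypothetical names and an assumed contiguous core numbering; the actual kernel logic is more involved.

```python
CORES_PER_CPU = 8  # assumed contiguous numbering: CPU k owns cores 8k .. 8k+7

def pick_wakeup_core_local(prev_core, runnable):
    """Pick a non-busy core, looking only at the CPU the thread last ran on.

    `runnable[c]` is the number of runnable threads on core c.
    """
    first = prev_core - prev_core % CORES_PER_CPU
    for core in range(first, first + CORES_PER_CPU):
        if runnable[core] == 0:
            return core
    return prev_core  # every local core is busy: stay put, even if other CPUs are idle

# CPU 0 fully busy, CPU 1 fully idle: the thread still wakes up on CPU 0.
print(pick_wakeup_core_local(3, [1] * 8 + [0] * 8))
```

Restricting the search to the local CPU favours data locality, at the cost of ignoring idle cores elsewhere.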
![Page 148: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/148.jpg)
BUG 4/4: OVERLOAD-ON-WAKEUP
Commercial DB with TPC-H, 64 threads on 64 cores, nothing else on the machine.
With threads pinned to cores, it works fine. With Linux scheduling, execution is much slower: there are phases with overloaded cores while other cores stay idle for a long time!
![Page 154: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/154.jpg)
BUG 4/4
Beginning: 8 threads / CPU, cores busy
Occasionally, 1 DB thread is migrated to another CPU because a transient thread that appeared during rebalancing looked like an imbalance (only instant loads considered)
Now, 9 threads on one CPU and 7 on another. The CPU with 9 threads is slow and slows down the whole execution because all threads wait for each other (barriers), i.e., idle cores everywhere...
Barriers: threads keep sleeping and waking up, but the extra thread never wakes up on an idle core, because the wakeup algorithm only considers the local CPU!
Periodic rebalancing can’t rebalance load most of the time because there are many idle cores ⇒ hard to see an imbalance between the 9-thread and 7-thread CPUs...
“Solution”: wake up on the core that has been idle for the longest time (not great for energy)
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 25
9 threads
7 threads Idle (long)
Slowed down execution
![Page 155: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/155.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 156: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/156.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 157: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/157.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
↑ Fundamental issue here
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 158: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/158.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
↑ Fundamental issue here
to periodically balance threads between scheduling domains.
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 159: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/159.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
↑ Fundamental issue here
to periodically balance threads between scheduling domains.
↑ Fundamental issue here
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 160: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/160.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
↑ Fundamental issue here
to periodically balance threads between scheduling domains.
↑ Fundamental issue here
In addition to this, threads balance load by selecting core where to wake up.
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 161: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/161.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine usually considered a solved problem
To recap, on Linux, load balancing works that way:
Hierarchical rebalancing uses a metric named load,
↑ Fundamental issue here
to periodically balance threads between scheduling domains.
↑ Fundamental issue here
In addition to this, threads balance load by selecting core where to wake up.
↑ Fundamental issue here
THE LINUX SCHEDULER: A DECADE OF WASTED CORES 26
![Page 162: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/162.jpg)
WHERE DO WE GO FROM HERE?
Load balancing on a multicore machine is usually considered a solved problem
To recap, load balancing on Linux works this way:
Hierarchical rebalancing uses a metric named load (fundamental issue here)
to periodically balance threads between scheduling domains (fundamental issue here).
In addition, threads balance load by selecting the core on which to wake up (fundamental issue here).
Wait, does anything work at all?
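To make the recap concrete, here is a toy balancing pass between two groups of cores. It is a sketch under assumptions, not the kernel's algorithm: the function names, the use of an average load per group, and the numeric load values are all made up for illustration. Each core is a list of per-thread loads, and each group is a list of cores.

```python
from typing import List

Group = List[List[int]]  # a group of cores; each core is a list of per-thread loads


def group_load(group: Group) -> float:
    """Average load per core in a group (assumed metric for this sketch)."""
    return sum(sum(core) for core in group) / len(group)


def balance_step(groups: List[Group]) -> bool:
    """Move one thread from the busiest group to the least loaded one.

    Returns True if a thread was migrated, False if the groups look balanced.
    """
    loads = [group_load(g) for g in groups]
    src = max(range(len(groups)), key=loads.__getitem__)
    dst = min(range(len(groups)), key=loads.__getitem__)
    if loads[src] <= loads[dst]:
        return False  # averages are equal: nothing looks wrong from here
    busiest_core = max(groups[src], key=sum)
    idlest_core = min(groups[dst], key=sum)
    thread = min(busiest_core)  # migrate the lightest thread of the busiest core
    busiest_core.remove(thread)
    idlest_core.append(thread)
    return True
```

In this model, a group with one heavy thread and an idle core, `[[1000], []]`, has the same average load as a group whose cores each run two lighter threads, `[[250, 250], [250, 250]]`, so `balance_step` does nothing and the idle core stays idle. That is one way a load metric compared between groups can hide an imbalance.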
![Page 166: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/166.jpg)
WHERE DO WE GO FROM HERE?
Many major issues went unnoticed for years in the scheduler... How can we prevent this from happening again?
Code testing: no clear fault (no crash, no deadlock, etc.); existing tools don’t target these bugs
Performance regression testing: usually done with 1 app on a machine to avoid interactions; insufficient coverage
Model checking, formal proofs: complex, parallel code; so far, nobody knows how to do it...
![Page 171: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/171.jpg)
WHERE DO WE GO FROM HERE?
A pragmatic “solution”: can’t prevent bugs, let’s detect them with a sanity checker
Always same symptom: some idle cores while others overloaded
[Flowchart: every second, check: is a core idle while another core is overloaded? If yes, monitor thread migrations, creations, and destructions for 100 ms. If the imbalance is not fixed, report a bug.]
Not an assertion/watchdog: it might not be a bug; the situation has to last for a long time
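The checker's decision logic can be sketched as follows. This is a simplified illustration, not the authors' tool: the function names are hypothetical, "overloaded" is assumed to mean two or more runnable threads on a core, and instead of monitoring migrations, creations, and destructions it simply re-samples per-core runqueue lengths during the monitoring window.

```python
from typing import List, Sequence

CHECK_PERIOD_S = 1.0    # from the slide: check every second
MONITOR_WINDOW_S = 0.1  # from the slide: monitor for 100 ms


def invariant_violated(nr_running: Sequence[int]) -> bool:
    """True if some core is idle while another core is overloaded (assumed: >= 2 threads)."""
    return any(n == 0 for n in nr_running) and any(n >= 2 for n in nr_running)


def should_report(snapshots: List[Sequence[int]]) -> bool:
    """Decide whether to report a bug.

    snapshots holds per-core runqueue lengths: the first one is taken at the
    periodic check, the rest during the monitoring window. A violation is only
    reported if it persists in every snapshot, since a short-lived imbalance
    that the scheduler fixes by itself is not a bug.
    """
    return bool(snapshots) and all(invariant_violated(s) for s in snapshots)
```

For example, `should_report([[2, 0], [2, 0], [2, 0]])` is true because the imbalance lasts for the whole window, whereas `should_report([[2, 0], [1, 1]])` is false because the scheduler fixed it in time.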
![Page 177: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/177.jpg)
WHERE DO WE GO FROM HERE?
We might miss some bugs. Not an issue: bugs that impact performance happen often, so we’ll eventually catch them
Low overhead; possible to check less often to reduce it further (it will just take longer to detect bugs)
All bugs presented here were detected with the sanity checker
Possible to replay bugs and produce graphical traces to understand them better
![Page 186: The Linux Scheduler: a Decade of Wasted Coresjplozi/wastedcores/files/extended_talk.pdf · THE LINUX SCHEDULER: A DECADE OF WASTED CORES 1 Jean-Pierre Lozi jplozi@unice.fr Baptiste](https://reader035.vdocuments.site/reader035/viewer/2022070720/5ee0f74ead6a402d666c049a/html5/thumbnails/186.jpg)
CONCLUSION
Scheduling (as in dividing CPU cycles among threads) was thought to be a solved problem.
Analysis: fundamental issues in the load metric, scheduling domains, scheduling choices...
Very bug-prone implementation following years of adapting to hardware
Can’t ensure a simple “invariant”: no idle cores while other cores are overloaded
Proposed fixes: not always satisfactory
Proposed pragmatic detection approach (“sanity checker”): helpful
Code testing, performance regression testing, model checking / proofs: can’t work for now.
Our takeaway: more research must be directed towards implementing an efficient and reliable scheduler for multicore architectures!