What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 1/205
What Do You Mean by “Cache Friendly”?
Björn Fahller
typedef uint32_t (*timer_cb)(void*);
struct timer {
    uint32_t deadline;
    timer_cb callback;
    void* userp;
    struct timer* next;
    struct timer* prev;
};

static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts };

timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    timer* iter = timeouts.prev;
    while (iter != &timeouts && is_after(iter->deadline, deadline))
        iter = iter->prev;
    return add_behind(iter, deadline, cb, userp);
}
void cancel_timer(timer* t)
{
    t->next->prev = t->prev;
    t->prev->next = t->next;
    free(t);
}
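The slide code relies on two helpers it never shows, `is_after` and `add_behind`. Their definitions are not in the talk, so the following is a hedged sketch: `is_after` as a wrap-around-safe "later than" comparison of 32-bit tick counts, and `add_behind` as a plain allocate-and-link insertion after a given node.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

typedef uint32_t (*timer_cb)(void*);
typedef struct timer {
    uint32_t deadline;
    timer_cb callback;
    void* userp;
    struct timer* next;
    struct timer* prev;
} timer;

// Hypothetical: "a is after b", safe across uint32_t wrap-around.
static bool is_after(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) > 0;
}

// Hypothetical: allocate a node and link it in directly after pos.
static timer* add_behind(timer* pos, uint32_t deadline,
                         timer_cb cb, void* userp) {
    timer* t = (timer*)malloc(sizeof(timer));
    t->deadline = deadline;
    t->callback = cb;
    t->userp = userp;
    t->next = pos->next;
    t->prev = pos;
    pos->next->prev = t;
    pos->next = t;
    return t;
}
```

With these in place, `schedule_timer` walks backwards from the sentinel until it finds the node the new deadline belongs after, and `add_behind` splices the new node into the circular list in O(1).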
Simplistic model of cache behaviour

Includes
● The cache is small
● and consists of fixed-size lines
● and a data access hit is very fast
● and a data access miss is very slow

Excludes
● Multiple levels of caches
● Associativity
● Threading

All models are wrong, but some are useful
const int* hot = 0x4001;
const int* cold = 0x4042;
int* also_cold = 0x4080;

int a = *hot;
int c = *cold;
*also_cold = a;
also_cold[1] = c;

[Figure: animation of the simplistic cache model. A small cache holds a few line tags (0x3A10, 0x4010, 0x4000, 0x4FF0) next to memory lines 0x4000..0x40F0; each access pulls the containing line into the cache, evicting an older one.]
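The animation can be reproduced with a toy implementation of the simplistic model: a set of resident line tags, where the first touch of a line is a miss and every later touch of the same line is a hit. The 64-byte line size and the fully-associative, no-eviction behaviour are simplifying assumptions of this sketch, not claims about the hardware in the talk.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy cache for the simplistic model: fully associative, 64-byte
// lines, no eviction. An access maps the address to its line tag;
// a tag already resident is a (fast) hit, anything else a (slow) miss.
struct ToyCache {
    static constexpr std::uintptr_t kLine = 64;
    std::unordered_set<std::uintptr_t> lines;  // resident line tags
    int hits = 0;
    int misses = 0;

    void access(std::uintptr_t addr) {
        std::uintptr_t tag = addr / kLine;
        if (lines.count(tag)) {
            ++hits;
        } else {
            ++misses;
            lines.insert(tag);
        }
    }
};
```

Replaying the four accesses from the slide (`*hot`, `*cold`, `*also_cold = a`, `also_cold[1] = c`) against this model gives three misses, one per distinct line, and one hit, because `also_cold[1]` lands on the line already loaded for `*also_cold`.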
Analysis of implementation

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<uint32_t> dist;

    for (int k = 0; k < 10; ++k) {
        timer* prev = nullptr;
        for (int i = 0; i < 20'000; ++i) {
            timer* t = schedule_timer(dist(gen),
                                      [](void*){ return 0U; },
                                      nullptr);
            if (i & 1) cancel_timer(prev);
            prev = t;
        }
        while (shoot_first())
            ;
    }
}
bool shoot_first()
{
    if (timeouts.next == &timeouts) return false;
    timer* t = timeouts.next;
    t->callback(t->userp);
    cancel_timer(t);
    return true;
}
Analysis of implementation

valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes

Essentially a profiler that collects info about call hierarchies, number of calls, and time spent. The CPU simulator is not cycle accurate, so see timing results as a broad picture.

Simulates a CPU cache, flattened to 2 levels, L1 and LL. It shows you where you get cache misses. L1 is by default a model of your host CPU's L1, but you can change size, line size, and associativity.

Collects statistics per instruction instead of per source line. Can help pinpoint bottlenecks.

Simulates a branch predictor.

Very slow!
Live demo
typedef uint32_t (*timer_cb)(void*);
typedef struct timer {
    uint32_t deadline;       // 4 bytes
                             // 4 bytes padding for alignment
    timer_cb callback;       // 8 bytes
    void* userp;             // 8 bytes
    struct timer* next;      // 8 bytes
    struct timer* prev;      // 8 bytes
} timer;                     // sum = 40 bytes

66% of all L1d cache misses

Rule of thumb: follow a pointer => expect a cache miss

33% of all L1d cache misses
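The byte counts on the slide can be checked at compile time with `offsetof` and `sizeof`. The figures assume a typical 64-bit (LP64) target with 8-byte pointers and 8-byte alignment; on other ABIs the padding and total differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t (*timer_cb)(void*);
typedef struct timer {
    uint32_t deadline;
    timer_cb callback;
    void* userp;
    struct timer* next;
    struct timer* prev;
} timer;

// On LP64, 4 bytes of padding follow `deadline` so that `callback`
// lands on an 8-byte boundary, giving 40 bytes in total.
static_assert(offsetof(timer, callback) == 8,  "4B deadline + 4B padding");
static_assert(offsetof(timer, userp)    == 16, "pointer-sized members");
static_assert(offsetof(timer, next)     == 24, "pointer-sized members");
static_assert(offsetof(timer, prev)     == 32, "pointer-sized members");
static_assert(sizeof(timer)             == 40, "sum = 40 bytes");
```

At 40 bytes per node, a node can straddle two 64-byte cache lines, which makes every pointer hop in the list potentially two misses rather than one.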
Chasing pointers is expensive. Let's get rid of the pointers.
typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;
struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;
24 bytes per entry. No pointer chasing.
Linear structure
timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
        timeouts[idx] = std::move(timeouts[idx-1]);
        --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb};
    return next_id++;
}
Linear insertion sort
void cancel_timer(timer t)
{
    auto i = std::find_if(timeouts.begin(), timeouts.end(),
                          [t](const auto& e) { return e.id == t; });
    timeouts.erase(i);
}
Linear search
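A self-contained reduction of the vector-based scheduler (callbacks dropped, and a plain `>` standing in for the talk's wrap-around-safe `is_after`) shows what the insertion sort produces: the vector stays sorted by deadline, so scanning it touches consecutive cache lines instead of chasing pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Reduced timer_data: callback and userp omitted for brevity.
struct timer_data {
    uint32_t deadline;
    uint32_t id;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;

// Plain ">" stands in for the talk's wrap-around-safe is_after.
static bool is_after(uint32_t a, uint32_t b) { return a > b; }

// Same shape as the slide: push an empty slot, shift later deadlines
// up until the new element's position is found (insertion sort).
uint32_t schedule_timer(uint32_t deadline) {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
        timeouts[idx] = std::move(timeouts[idx-1]);
        --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id};
    return next_id++;
}
```

Scheduling deadlines 30, 10, 20 leaves the vector as {10, 20, 30}: sorted, contiguous, and friendly to the hardware prefetcher.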
Analysis of implementation

perf stat -e cycles,instructions,l1d-loads,l1d-load-misses

Presents statistics from a whole run of the program, using counters from HW and the Linux kernel.

Number of cycles per instruction is a proxy for how much the CPU is working or waiting.

Number of reads from the L1d cache, and number of misses. Speculative execution can make these numbers confusing.

Very fast!
Analysis of implementation

perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr

Records where in your program the counters are gathered.

Records call-graph info, instead of just the location. LBR requires no special compilation flags.

Very fast!
Live demo
Linear search is expensive. Maybe try binary search?
typedef uint32_t (*timer_cb)(void*);
struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
};
struct timer {
    uint32_t deadline;
    uint32_t id;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;
timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
                              element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
}
Binary search for insertion point
Linear insertion
void cancel_timer(timer t)
{
  timer_data element{t.deadline, t.id, nullptr, nullptr};
  auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
                                   element, is_after);
  auto i = std::find_if(lo, hi,
                        [t](const auto& e) { return e.id == t.id; });
  if (i != hi) { timeouts.erase(i); }
}
Binary search for timers with the same deadline
Linear search for matching id
Linear removal
Live demo
Searches not visible in profiling. Number of reads reduced.
Number of cache misses high. memmove() dominates.
Failed branch predictions can lead to cache entry eviction!
Maybe try a map?
typedef uint32_t (*timer_cb)(void*);

struct timer_data {
  void* userp;
  timer_cb callback;
};

struct is_after {
  bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; }
};

using timer_map = std::multimap<uint32_t, timer_data, is_after>;
using timer = timer_map::iterator;

static timer_map timeouts;
timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb}));
}

void cancel_timer(timer t)
{
  timeouts.erase(t);
}
bool shoot_first()
{
  if (timeouts.empty()) return false;
  auto i = timeouts.begin();
  i->second.callback(i->second.userp);
  timeouts.erase(i);
  return true;
}
Live demo
Faster, but lots of cache misses when comparing keys and rebalancing the tree.
What did I say about chasing pointers?
[Chart: execution time in seconds (log scale, 1e-8 to 1e-2) vs. number of elements (1 to 10000) for linear, bsearch, and map.]
[Chart: execution time relative to linear (bsearch/linear, map/linear) vs. number of elements.]
Can we get log(n) lookup without chasing pointers?
Enter the HEAP
[Diagram: a heap with root 3, children 5 and 8, below them 6, 10, 10, 14, and leaves 9, 15, 13, 12, 11.]
● Perfectly balanced partially sorted tree
● Every node is sorted after or same as its parent
● No relation between siblings
● At most one node with only one child, and that child is the last node
[Diagram: inserting the value 7 into the heap.]
Insertion:
● Create space
● Trickle down greater nodes
● Insert into space
[Diagram: popping the top of the heap.]
Pop top:
● Remove top
● Trickle up lesser child
● move-insert last into hole
[Diagram: the heap laid out in an array — values 5, 6, 7, 9, 10, 8, 14, 10, 12, 13, 11, 15 at indexes 1 through 12.]
Addressing: the index of a parent node is half (rounded down) of that of a child.
Array indexes! No pointer chasing!
The heap is not searchable, so how do we handle cancellation?
[Diagram: heap entries index into a separate actions array.]

struct timer_action {
  uint32_t (*callback)(void*);
  void* userp;
};

struct timeout {
  uint32_t deadline;
  uint32_t action_index;
};

Only 8 bytes per element of working data in the heap.
Cancel by setting callback to nullptr.
struct timer_data {
  uint32_t deadline;
  uint32_t action_index;
};

struct is_after {
  bool operator()(const timer_data& lh, const timer_data& rh) const
  {
    return lh.deadline < rh.deadline;
  }
};

std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts;

timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  auto action_index = actions.push(cb, userp);
  timeouts.push(timer_data{deadline, action_index});
  return action_index;
}

Container adapter that implements a heap
bool shoot_first()
{
  while (!timeouts.empty()) {
    auto& t = timeouts.top();
    auto& action = actions[t.action_index];
    if (action.callback) break;
    actions.remove(t.action_index);
    timeouts.pop();
  }
  if (timeouts.empty()) return false;
  auto& t = timeouts.top();
  auto& action = actions[t.action_index];
  action.callback(action.userp);
  actions.remove(t.action_index);
  timeouts.pop();
  return true;
}

Pop off any cancelled items
Live demo
A lot fewer of everything! And nearly twice as fast, too.
[Chart: execution time in seconds vs. number of elements for linear, bsearch, map, and heap.]
[Chart: relative execution time (heap/linear, heap/map) vs. number of elements.]
But there are many cache misses in the adjust-heap functions.
Can we do better?
How do the entries fit in cache lines?
Every generation is on a new cache line.
Can we do better?
Three generations per cache line!
[Diagram: the first B-heap block — values 5, 6, 7, 9, 10, 8, 14 at indexes 1 through 7, index 0 unused — followed by child blocks at indexes 8–15 and 16–23, each block filling one cache line.]
class timeout_store {
  static constexpr size_t block_size = 8;
  static constexpr size_t block_mask = block_size - 1U;

  static size_t block_offset(size_t idx) { return idx & block_mask; }
  static size_t block_base(size_t idx) { return idx & ~block_mask; }
  static bool is_block_root(size_t idx) { return block_offset(idx) == 1; }
  static bool is_block_leaf(size_t idx)
  {
    return (idx & (block_size >> 1)) != 0U;
  }
  ...
};

[Diagram: one block — root at index 1, children 2 and 3, leaves 4 through 7, index 0 unused.]
class timeout_store {
  static constexpr size_t block_size = 8;
  static constexpr size_t block_mask = block_size - 1U;

  static size_t block_offset(size_t idx);
  static size_t block_base(size_t idx);
  static bool is_block_root(size_t idx);
  static bool is_block_leaf(size_t idx);

  static size_t left_child_of(size_t idx)
  {
    if (!is_block_leaf(idx)) return idx + block_offset(idx);
    auto base = block_base(idx) + 1;
    return base * block_size + child_no(idx) * block_size * 2 + 1;
  }
  ...
};
static size_t parent_of(size_t idx)
{
  auto const node_root = block_base(idx);
  if (!is_block_root(idx)) return node_root + block_offset(idx) / 2;
  auto parent_base = block_base(node_root / block_size - 1);
  auto child = ((idx - block_size) / block_size - parent_base) / 2;
  return parent_base + block_size / 2 + child;
}
class timeout_store {
  ...
  using allocator = align_allocator<64>::type<timer_data>;
  std::vector<timer_data, allocator> bheap_store;
};
template <size_t N>
struct align_allocator {
  template <typename T>
  struct type {
    using value_type = T;
    static constexpr std::align_val_t alignment{N};
    T* allocate(size_t n)
    {
      return static_cast<T*>(operator new(n * sizeof(T), alignment));
    }
    void deallocate(T* p, size_t) { operator delete(p, alignment); }
  };
};
Aligned operator new and delete came with C++17.
Live demo
[Chart: execution time in seconds (log scale, up to 0.1 s) against number of elements (1 to 1000000) for the linear, bsearch, map, heap, and bheap timer stores.]
[Chart: execution time relative to map — the heap/map and bheap/map factors (0.1 to 0.9) against number of elements (1 to 1000000).]
Rules of thumb
● Following a pointer is a cache miss, unless you have information to the contrary
● Smaller working data set is better
● Use as much of a cache entry as you can
● Sequential memory accesses can be very fast due to prefetching
● Fewer evicted cache lines means more data in hot cache for the rest of the program
● Mispredicted branches can evict cache entries
● Measure, measure, measure
Resources
Ulrich Drepper - “What every programmer should know about memory” http://www.akkadia.org/drepper/cpumemory.pdf
Milian Wolff - “Linux perf for Qt Developers” https://www.youtube.com/watch?v=L4NClVxqdMw
Travis Downs - “Cache counters rant” https://gist.github.com/travisdowns/90a588deaaa1b93559fe2b8510f2a739
Emery Berger - “Performance Matters” https://www.youtube.com/watch?v=r-TLSBdHe1A
@bjorn_fahller
@rollbear
Björn Fahller
What Do You Mean by “Cache Friendly”?