ever growing cpu states: context switch with less memory ... · context switch with less memory and...

26
1 Ever Growing CPU States: Context Switch with Less Memory and Better Performance Fenghua Yu <[email protected]>

Upload: others

Post on 27-May-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

1

Ever Growing CPU States:

Context Switch with Less Memory and Better Performance

Fenghua Yu <[email protected]>

Page 2: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

2

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 3: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

3

Terms

FP: Floating Point

SSE: Streaming SIMD Extensions

AVX/AVX2/AVX-512: Advanced Vector Extensions

MPX: Memory Protection Extensions

Extended States (xstates): Currently include FP/SSE/AVX2/MPX/AVX-512

registers

Xsave area: kernel mem allocated for xstates context per process defined in

xsave_struct

To reduce confusion, the following terms are used in this presentation:

Page 4: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

4

CPU

Registers

Introduction – X86 Context Switch Flow

xstates

basic

Kernel memory

Register context for process A

xstates registers

per_event registers

Basic registers (segment, ip, cr, etc)

Register context for process B

Xstates registers

Per_event registers

Basic registers(segment, ip, cr, etc)

Perf_event

Save registers

Restore registers

Page 5: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

5

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 6: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

6

Why Care Xstates? Large Portion of CPU States Are

From Xstates

xstates(FP/SSE/AVX/MPX/AVX-512)

Perf_eventregisters

Basic registers

Total 3480 bytes/process currently maintained in kernel

xstates perf_event registers basic registers

72%

24%4%

Impact of improperly handling

large xstates:

1. Large memory footprint

2. Large cache footprint

3. Slow context switch

execution

4. Slow response to user and

bad user experience

5. Overall performance

degradation

Page 7: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

7

And Xstates Are Growing Over Years

0

500

1000

1500

2000

2500

3000

Xsta

tes S

ize (

byte

s)

Time

Legacy FP State (160 bytes) Legacy SSE State (352 bytes) AVX2(YMM_H) (256 bytes) MPX (128 bytes) AVX-512 (1600 bytes)

Page 8: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

8

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 9: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

9

Optimization 1: Init Optimization

The state is not used and is not saved/restored during context switch

Process

life time

FP

SSE

YMM_H

MPX

AVX-

512

The state is used and is saved/restored during context switch

Start First Use FP First Use AVX2 First Use SSE First Use AVX512 Done

Start saving/restoring a state only after it is first used

Page 10: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

10

Optimization 2: Modified Optimization

CPU

Kernel memory

Xsave area for process P

1. Switch to P: restores all xstates

3. Switch out P: only saves modified

FP and MPX registers back

FP SSE YMM_H MPX AVX512

10.50+b*c=10

Y= encrypt(x);

AVXcalculations

2. P runs: changes FP

and MPX registers

Example of how only modified FP and MPX registers are detected and saved

FP SSE YMM_H MPX AVX512

Page 11: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

11

Optimization 3: Compacted Format of Xsave Area

Xstates Byte

offset

Legacy FP State 0

Legacy SSE State 160

Xsave Header Data 512

YMM_H State (256 bytes) 576

MPX_BNDREGS(64 bytes) 832

MPX BNDCSR (64 bytes) 896

AVX-512 KMASK (64 bytes) 960

AVX-512 ZMM_H (512 bytes) 1024

AVX-512 ZMM (1024 bytes) 1536

Xstates Byte

offset

Legacy FP State 0

Legacy SSE State 160

Xsave Header Data 512

AVX-512 KMASK (64 bytes) 576

AVX-512 ZMM_H (512 bytes) 640

AVX-512 ZMM (1024 bytes) 1152

Scenario 1: FP/SSE/AVX/MPX/AVX3

are enabled in processor

Scenario 2: Only FP/SSE/AVX512 are

enabled in processor

Total size: 2560 bytes/process

Total size: 2176 bytes/process

Scenario 2 occupies 384 bytes or

15% less mem than scenario 1 for

xsave area per process

Page 12: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

12

Xstates Context Switch Instructions Overview

Instructions Format Optimization States Xsave

AreaStandard Compacted Init Modified User Supervisor

fxsave/

Fxrstor Legacy FP, SSE

xsave/

xrstor Legacy

FP, SSE

AVX2, AVX-512, MPX

xsaveopt/

xrstor

xsavec/

xrstor

xsaves/

xrstors All above + Supervisor

States

Page 13: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

13

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 14: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

14

Saving Current Xstates to Previous Process

fxsave

xsave

xsaveopt

xsaves

Save xstate

xsaves

CPU

xsaveopt

xsave

Kernel memory

Xsave area in prev process

Xsave area in next process

Y

Y

Y

N

N

N

Save xstate

registers

N

Page 15: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

15

Loading New Xstates from Next Process

fxrstor

xrstor

xrstors

restore xstate

xsaves

CPUxsave

Y

Y

N

N

Kernel memory

Xsave area in prev process

Xsave area in next processRestore xstate

registers

Page 16: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

16

Standard Format of Xsave Area in User Space for Backward

Compatibilty with Legacy Applications

Process P:

xrstorxsave

CPU

signal handler

signal context: Standard

format of xsave area

Kernel memory

Compacted formatted

xsave area for process P

xrstors

xsaves

Kernel

space

User

space

Page 17: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

17

Kernel memory

Xsave area for process P

Kernel API for Accessing Registers in Compacted

Format of Xsave Area

Legacy FP State 0

Legacy SSE State 160

Xsave Header Data 512

MPX_BNDREGS 832

MPX_BNDCSR 896

Scenario: Only FP/SSE/MPX are

enabled in processor

+

MPX caller

Base addr

MPX registersaddr

MPX registers offset

get_xsave_addr

All offsets in xsave area are calculated during kernel boot from cupid(eax=0x0d, ecx=n, n>1)

Page 18: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

18

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 19: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

19

Potential Security Concern for Supervisor States

CPU

Kernel

space

User

space

Ptrace tool

Kernel buffer

User buffer User buffer

Supervisor xstates

read by user

Kernel buffer

Supervisor xstates

written by userxsavesxrstors

Page 20: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

20

Solution for Security Concern for Supervisor States

CPU

Kernel

space

User

space

Ptrace tool

Kernel buffer

User buffer User buffer

No supervisor xstates

leaked to user

Kernel buffer

No supervisor xstates

written by userxrstors xsaves

filter out supervisor xstates filter out supervisor xstates

Page 21: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

21

Agenda

• Introduction

• Impact of xstates in Context Switch

• Context Switch Optimizations for xstates

• Kernel Implementation for xstates Context Switch

• Security Concern and Solution

• Status and Future Work

• Q & A

Page 22: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

22

Patches Status

Instructions Kernel

fxsave/

fxrstor

In 3.16 or earlier version

xsave/

xrstor

xsaveopt/

xrstor

xsavec/

xrstor

Not implemented

xsaves/

xrstors

In 3.17

Page 23: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

23

Future Work

• Init optimization for xrstor/xrstors in kernel.

• Init optimization for xsaveopt/xsavec/xsaves is implemented in

processor

• Enable supervisor xstates once hardware implementation is

available.

• Currently there is no supervisor xstate implemented yet.

• Performance improvement measurement

Page 24: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

24

Acknowledgements

Asit Mallick, H. Peter Anvin, Glenn Williamson, Bruce Schlobohm

(Intel SSG/OTC)

Page 25: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

25

References

[1] Intel 64 and IA-32 Architectures Software Developer’s Manual

(Volume 1, 2, 3)

[2] Intel® Architecture Instruction Set Extensions Programming

Reference

[3] Linux Kernel Source Tree.

Page 26: Ever Growing CPU States: Context Switch with Less Memory ... · Context Switch with Less Memory and Better Performance ... AVX/AVX2/AVX-512: Advanced Vector Extensions ... Intel 64

Q & A