wei bai with li chen, kai chen, dongsu han, chen tian, hao wang sing group @ hkust...

52
Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU, June 1st, 2015

Upload: douglas-parks

Post on 04-Jan-2016

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

Wei Baiwith Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang

SING Group @ HKUST

Information-Agnostic Flow Scheduling for Commodity Data Centers

1SJTU, June 1st, 2015

Page 2: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

2

Data Centers

Page 3: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

3

This talk is about the transport inside the data center

Page 4: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

4

Data Center Networks

• High bandwidth• Low latency• Commodity switches• Single administrative domain

Page 5: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

5

Data Center Transport

• Cloud applications– Desire low latency for short messages

• Goal: Minimize flow completion time (FCT)– Many flow scheduling proposals…

Page 6: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

6

The State-of-the-art Solutions

• PDQ [SIGCOMM’12]

• pFabric [SIGCOMM’13]

• PASE [SIGCOMM’14]• …

All assume prior knowledge of flow size information to approximate ideal preemptive Shortest Job First (SJF) with customized network elements

Page 7: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

7

The State-of-the-art Solutions

• PDQ [SIGCOMM’12]

• pFabric [SIGCOMM’13]

• PASE [SIGCOMM’14]• …

All assume prior knowledge of flow size information to approximate ideal preemptive Shortest Job First (SJF) with customized network elements

Not feasible for some applications

Page 8: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

8

The State-of-the-art Solutions

• PDQ [SIGCOMM’12]

• pFabric [SIGCOMM’13]

• PASE [SIGCOMM’14]• …

All assume prior knowledge of flow size information to approximate ideal preemptive Shortest Job First (SJF) with customized network elements

Hard to deploy in practice

Page 9: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

9

Question

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

Page 10: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

10

Design Goal 1

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

Information-agnostic: not assume a priori knowledge of flow size information available from the applications

Page 11: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

11

Design Goal 2

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

FCT minimization: minimize average and tail FCTs of short flows & not adversely affect FCTs of large flows

Page 12: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

12

Design Goal 3

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

Readily-deployable: work with existing commodity switches & be compatible with legacy network stacks

Page 13: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

13

Question

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

Our answer: PIAS

Page 14: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

14

PIAS’S DESIGN

Page 15: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

15

Design Rationale

• PIAS performs Multi-Level Feedback Queue (MLFQ) to emulate Shortest Job First

Priority 1

Priority 2

Priority K

……

High

Low

Page 16: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

16

Design Rationale

• PIAS performs Multi-Level Feedback Queue (MLFQ) to emulate Shortest Job First

Priority 1

Priority 2

Priority K

……

Page 17: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

17

Simple Example Illustrating PIAS

Congestion

Page 18: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

18

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 1 with 10 packets and flow 2 with 2 packets arrive

Priority 3

Page 19: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

19

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 1 and 2 transmit simultaneously

Priority 3

Page 20: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

20

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 2 finishes while flow 1 is demoted to priority 2

Priority 3

Page 21: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

21

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 3 with 2 packets arrives

Priority 3

Page 22: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

22

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 3 and 1 transmit simultaneously

Priority 3

Page 23: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

23

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 3 finishes while flow 1 is demoted to priority 3

Priority 3

Page 24: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

24

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 4 with 2 packets arrives

Priority 3

Page 25: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

25

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Flow 4 and 1 transmit simultaneously

Priority 3

Page 26: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

26

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Priority 3

Flow 4 finishes while flow 1 is demoted to priority 4

Page 27: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

27

Simple Example Illustrating PIAS

Priority 1

Priority 2

Priority 4

Priority 3

Eventually, flow 1 finishes in priority 4

With MLFQ, PIAS can emulate Shortest Job First without prior knowledge of flow size information

Page 28: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

28

How to implement?

Priority 1

Priority 2

Priority K

……

• Strict priority queueing on switches• Packet tagging as a shim layer at end hosts

• priorities:

• demotion thresholds: • The threshold to demote priority

from to is

Page 29: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

29

How to implement?

Priority 1

Priority 2

Priority K

……

• Strict priority queueing on switches• Packet tagging as a shim layer at end hosts

1

Page 30: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

30

How to implement?

Priority 1

Priority 2

Priority K

……

• Strict priority queueing on switches• Packet tagging as a shim layer at end hosts

Page 31: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

31

How to implement?

Priority 1

Priority 2

Priority K

……

• Strict priority queueing on switches• Packet tagging as a shim layer at end hosts

2

Page 32: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

32

• Thresholds depend on:– Flow size distribution– Traffic load

• Solution:– Solve a FCT minimization problem to calculate

demotion thresholds • Problem:– Traffic is highly dynamic

Determine Thresholds

Traffic variations -> Mismatched thresholds

Page 33: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

33

Impact of Mismatches

High

Low

10MB

10MB

20KB

Page 34: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

34

• When the threshold is perfect (20KB)

Impact of Mismatches

Page 35: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

35

• When the threshold is too small (10KB)

Impact of Mismatches

Increased latency for short flows

Page 36: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

36

• When the threshold is too large (1MB)

Impact of Mismatches

Increased latency for short flows

Page 37: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

37

• When the threshold is too large (1MB)

Impact of Mismatches

Increased latency for short flows

Leverage ECN to keep low buffer occupation

Page 38: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

38

• When the threshold is too small (10KB)

Handle Mismatches

If we enable ECN

Page 39: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

39

• When the threshold is too small (10KB)

Handle Mismatches

ECN can keep low latency

Page 40: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

40

• When the threshold is too large (1MB)

Handle Mismatches

If we enable ECN

Page 41: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

41

• When the threshold is too large (1MB)

Handle Mismatches

ECN can keep low latency

Page 42: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

42

PIAS in 1 Slide

• PIAS packet tagging– Maintain flow states and mark packets with priority

• PIAS switches– Enable strict priority queueing and ECN

• PIAS rate control– Employ Data Center TCP to react to ECN

Page 43: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

43

Testbed Experiments

• PIAS prototype– http://sing.cse.ust.hk/projects/PIAS

• Testbed Setup– A Gigabit Pronto-3295 switch– 16 Dell servers

• Benchmarks– Web search (DCTCP paper)– Data mining (VL2 paper)– Memcached

Page 44: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

44

Small Flows (<100KB)

Web Search Data Mining

Compared to DCTCP, PIAS reduces average FCT of small flows by up to 47% and 45%

47%45%

Page 45: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

45

NS2 Simulation Setup

• Topology– 144-host leaf-spine fabric with 10G/40G links

• Workloads– Web search (DCTCP paper) – Data mining (VL2 paper)

• Schemes– Information-agnostic: PIAS, DCTCP and L2DCT – Information-aware: pFabric

Page 46: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

46

Overall Performance

Web Search Data Mining

PIAS has an obvious advantage over DCTCP and L2DCT in both workloads.

Page 47: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

47

Small Flows (<100KB)

Simulations confirm testbed experiment results

Web Search Data Mining

40% - 50% improvement

Page 48: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

48

Comparison with pFabric

PIAS only has 4.9% performance gap to pFabric for small flows in data mining workload

Web Search Data Mining

Page 49: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

49

Conclusion

• PIAS: practical and effective– Not assume flow information from applications

– Enforce Multi-Level Feedback Queue scheduling

– Use commodity switches & legacy network stacks

Information-agnostic

FCT minimization

Readily deployable

Page 50: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

50

Thanks!

Page 51: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

51

Starvation

• Measurement– 5000 flows, 5.7 million MTU-sized packets– 200 timeouts, 31 two consecutive timeouts

• Solutions– Per-port ECN pushes back high priority flows when

many low priority flow get starved– Treating a long-term starved flow as a new flow

Page 52: Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING Group @ HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,

52

Persistent Connections

• Solution: periodically reset flow states based on more behaviors of traffic– When a flow idles for some time, we reset the

bytes sent of this flow to 0.– Define a flow as packets demarcated by incoming

packets with payload within a single connection