high-performance networks and architectures (hina) - flowtune · 2019. 12. 30. · datacenter...

50
Jonathan Perry, Hari Balakrishnan and Devavrat Shah Flowtune Flowlet Control for Datacenter Networks

Upload: others

Post on 01-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Jonathan Perry, Hari Balakrishnan and Devavrat Shah

Flowtune

Flowlet Control for

Datacenter Networks

Page 2: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Software in the Datacenter

Page 3: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

• Response Time: Productivity, Revenue, Reputation

• microservices develop network is centraldeployscale

Software in the Datacenter

Page 4: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Traditional approach is packet-centric

Switch Mechanisms

Server Mechanisms

Implicit Allocation

Several RTTto converge

Changes manycomponents

Page 5: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Implicit Allocation

Several RTTto converge

Changes manycomponents

Allocate network resources• Explicitly (maximize utility)

• Quickly, Consitently

• Flexibly (in software)

Page 6: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune’s approach

1. Flowlet control

send() flowlet

2. Logically centralized• Reduce RTT dependence

Page 7: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

AB

Allocator

R2R1

C

Hadoop on Server A has data for B:

Page 8: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

A Allocator “Hadoop on A has data for B”

AB

Allocator

R2R1

C

Hadoop on Server A has data for B:

Page 9: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

A Allocator “Hadoop on A has data for B”

Allocator Assign rates

AB

Allocator

R2R1

C

Hadoop on Server A has data for B:

Page 10: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

A Allocator “Hadoop on A has data for B”

Allocator Assign rates

Allocator A “Send at 40Gbps”

AB

Allocator

R2R1

C

Hadoop on Server A has data for B:

Page 11: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

Now say ad_server on Server C has data for B:

AB

Allocator

R2R1

C

Page 12: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

Now say ad_server on Server C has data for B:

C Allocator “ad_server on C has data for B”

AB

Allocator

R2R1

C

Page 13: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

Now say ad_server on Server C has data for B:

C Allocator “ad_server on C has data for B”

Allocator Assign rates

AB

Allocator

R2R1

C

Page 14: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Example

Now say ad_server on Server C has data for B:

C Allocator “ad_server on C has data for B”

Allocator Assign rates

Allocator A “Send at 5Gbps”

Allocator C “Send at 35Gbps”

AB

Allocator

R2R1

C

Page 15: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Hadoop flowlet rate 2x $0.05Ads flowlet rate 2x $0.20

Network Utility Maximization (NUM)

Ads flowlet:Hadoop flowlet:

Kelly et al., Journal of the Operational Research Society, 1998

Page 16: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

NUM Iterative Optimizer

- =

2. Each flow 𝑠 chooses rate 𝑥𝑠

1. Each link ℓ chooses price 𝑝ℓ using

×

3. Goto 1

SupplyDemand

flow rates on ℓ − link capacity

Page 17: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Adjusting prices

Page 18: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Adjusting prices

NewtonExact

Diagonal(NED)

Page 19: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Increasing responsiveness

Solution 1: Updateinputs

Run 100iterations

Outputrates

But: too slow!

Page 20: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Increasing responsiveness

Solution 1:

Solution 2:

Updateinputs

Run 100iterations

Outputrates

But: too slow!

Updateinputs

Run 1iteration

Outputrates

$0.01$0.05$0.09

Page 21: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Increasing responsiveness

Solution 1:

Solution 2:

Updateinputs

Run 100iterations

Outputrates

But: too slow!

But: queueing, packet drops!

Updateinputs

Run 1iteration

Outputrates

Page 22: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Increasing responsiveness

Solution 1:

Solution 2:

Solution 3:

Updateinputs

Run 100iterations

Outputrates

But: too slow!

But: queueing, packet drops!

Updateinputs

Run 1iteration

Outputrates

Updateinputs

Run 1iteration

Normalizerates

Outputrates

Page 23: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune normalizes rates

Page 24: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune normalizes rates

𝑟ℓ =allocation

capacity

Page 25: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune normalizes rates

ෝ𝑥𝑠 =𝑥𝑠

max 𝑟ℓ

𝑟ℓ =allocation

capacity

Page 26: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune normalizes rates

99.7% of optimal

throughput

ෝ𝑥𝑠 =𝑥𝑠

max 𝑟ℓ

𝑟ℓ =allocation

capacity

Page 27: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Architecture

Endpoints

Optimizer

Normalizer

rates

flowletstart/end

normalizedrates

Allocator

Page 28: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Multicore

Core 1: 𝑝1 𝑝2 𝑝3 𝑝4

Core 2: 𝑝5 𝑝6 𝑝7 𝑝81 2 3 4

8

76

5

Page 29: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Multicore

Core 1: 𝑝1 𝑝2 𝑝3 𝑝4

Core 2: 𝑝5 𝑝6 𝑝7 𝑝81 2 3 4

8

76

5

Page 30: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Multicore

Core 1: 𝑝1 𝑝2 𝑝3 𝑝4

Core 2: 𝑝5 𝑝6 𝑝7 𝑝81 2 3 4

8

76

5

Page 31: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Multicore

For each flow 𝑥 compute 𝑔(links 𝑥 traverses)

For each link ℓ compute 𝑓(flows on ℓ)

Core 1: 𝑝1 𝑝2 𝑝3 𝑝4

Core 2: 𝑝5 𝑝6 𝑝7 𝑝81 2 3 4

8

76

5

Page 32: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Source

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 33: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 34: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 35: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 36: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 37: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 38: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 39: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 40: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 41: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 42: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 43: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Sou

rce

Blo

ck1

core

0core

1core

2core

3

Blo

ck2

core

4core

5core

6core

7

Blo

ck 3

core

8core

9core

10core

11

Blo

ck 4

core

12core

13core

14core

15Block 1 Block 2 Block 3 Block 4

Destination

Block 1 Block 2 Block 3 Block 4

Multicore

Page 44: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

In the paper…

core

0core

1core

2core

3

core

4core

5core

6core

7

core

8core

9core

10core

11

core

12core

13core

14core

15

core

0core

1core

2core

3core

4core

5core

6core

7

core

8core

9core

10core

11

core

12core

13core

14core

15

core

0core

1core

2core

3

core

4core

5core

6core

7

core

8core

9core

10core

11

core

12core

13core

14core

15

Flow view Link view

Page 45: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

4608 servers in < 31𝜇𝑠

Communication

>1

2of time

Page 46: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

EC2: Resource Allocation

senders

receiver

8 senders, every 50 milliseconds

Page 47: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Ns-2: Flowtune converges quickly to a fair allocation

Every 10 milliseconds:

senders

receiver

Page 48: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Overhead is lowFr

acti

on

of

net

wo

rk c

apac

ity

Inside the Social Network’s (Datacenter) Network, Roy et al., SIGCOMM’15

99% of links< 10% utilized

Page 49: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Open Questions

• Handling mice• Bypass the allocator? Fastpass?

• External traffic• Measure & react?

• Deadlines, Co-flow• Market?

• Multicore: 3-tier Clos, WAN

Page 50: High-performance Networks and Architectures (HiNA) - Flowtune · 2019. 12. 30. · Datacenter Networks. Software in the Datacenter •Response Time: Productivity, Revenue, Reputation

Flowtune

Give application developers control over network transport

Explicit PolicyApplication

DevelopersNetworkEngineers