Dynamic Resource Allocation in Apache Spark

Post on 23-Jan-2018


Dynamic Resource Allocation in Apache Spark

Yuta Imai @imai_factory

1. RDD Graph

val text = "Hello Spark, this is my first Spark application."
val textArray = text.split(" ").map(_.replaceAll(" ", ""))
val result = sc.parallelize(textArray).map(item => (item, 1)).reduceByKey((x, y) => x + y).collect()

[Figure: RDD graph. Array -> ParallelCollectionRDD (Partition 0 to 3) -> MapPartitionsRDD (Partition 0 to 3) -> ShuffledRDD (Partition 0 and 1), built by sc.parallelize() .map(…) .reduceByKey(…) .collect()]
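
As a rough illustration of what the map and reduceByKey steps compute, here is a minimal sketch that runs the same pairing and per-key sum on plain Scala collections, with no SparkContext. The object name WordCountLocal and the use of groupBy as a stand-in for the shuffle are assumptions made for this sketch; they are not part of the slides.

```scala
object WordCountLocal {
  // Counts occurrences of each space-separated token, mirroring the slide's pipeline.
  def count(text: String): Map[String, Int] = {
    val textArray = text.split(" ").map(_.replaceAll(" ", ""))
    textArray
      .map(item => (item, 1))   // same (item, 1) pairing as the Spark map step
      .groupBy(_._1)            // local stand-in for grouping records by key
      .map { case (key, pairs) => key -> pairs.map(_._2).reduce((x, y) => x + y) }
  }
}
```

For example, WordCountLocal.count("a b a") yields a map with a -> 2 and b -> 1. Punctuation is not stripped in this sketch, so a token such as "Spark," is counted separately from "Spark".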

2. DAGScheduler

[Figure: the RDD graph ParallelCollectionRDD (Partition 0 to 3) -> MapPartitionsRDD (Partition 0 to 3) -> ShuffledRDD (Partition 0 and 1), with its dependencies marked NarrowDependency and ShuffleDependency. The graph is divided into Stage 0 and Stage 1, which run as Task 0 to Task 5.]
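
The stage split can be sketched in plain Scala: walk a linear lineage, start a new stage at every shuffle dependency, and create one task per partition of the last RDD in each stage. This is a simplified model, not Spark's DAGScheduler; the names StageSplit, Node, and Stage are hypothetical, and only a single chain of RDDs is handled.

```scala
object StageSplit {
  sealed trait Dep
  case object Narrow extends Dep
  case object Shuffle extends Dep

  // One RDD in a linear lineage: its name, partition count, and dependency on its parent.
  final case class Node(name: String, partitions: Int, dep: Option[Dep])

  // A stage groups RDDs joined by narrow dependencies; tasks = partitions of its last RDD.
  final case class Stage(id: Int, rdds: List[String], tasks: Int)

  def split(lineage: List[Node]): List[Stage] = {
    // Groups are built in reverse; a shuffle dependency opens a new group.
    val groups = lineage.foldLeft(List.empty[List[Node]]) {
      case (Nil, n)                                    => List(List(n))
      case (cur :: done, n) if n.dep.contains(Shuffle) => List(n) :: cur :: done
      case (cur :: done, n)                            => (n :: cur) :: done
    }
    groups.reverse.zipWithIndex.map { case (g, i) =>
      val ordered = g.reverse
      Stage(i, ordered.map(_.name), ordered.last.partitions)
    }
  }

  // The lineage from the slides: map is a narrow dependency, reduceByKey a shuffle dependency.
  val example: List[Node] = List(
    Node("ParallelCollectionRDD", 4, None),
    Node("MapPartitionsRDD", 4, Some(Narrow)),
    Node("ShuffledRDD", 2, Some(Shuffle))
  )
}
```

With the example lineage, StageSplit.split produces Stage 0 with 4 tasks and Stage 1 with 2 tasks, which matches the six tasks (Task 0 to Task 5) on the slide.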

3. TaskScheduler

[Figure: Task 0 to Task 3, covering Partition 0 to 3, are assigned to Executors. Each Worker Node hosts an Executor with a Thread running iterator.map(…).map(...)... and Storage; a Shuffle File is also shown.]
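
The iterator.map(…).map(...)... label refers to chained transformations being evaluated lazily, one element at a time, inside a single task. A minimal sketch with plain Scala iterators follows; the object name PipelinedTask and the two example functions (add one, then double) are assumptions chosen for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object PipelinedTask {
  // Applies two chained map steps to one partition and records the call order.
  def run(partition: Seq[Int]): (Seq[Int], Seq[String]) = {
    val trace = ArrayBuffer.empty[String]
    val pipeline = partition.iterator
      .map { x => trace += s"f($x)"; x + 1 } // first map step
      .map { y => trace += s"g($y)"; y * 2 } // second map step, fed element by element
    val out = pipeline.toList // nothing runs until the iterator is consumed
    (out, trace.toList)
  }
}
```

Running PipelinedTask.run(Seq(1, 2)) returns the values 4 and 6, and the trace reads f(1), g(2), f(2), g(3): each element passes through both steps before the next one is read, so no intermediate collection is built between the two maps.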

DYNAMIC RESOURCE ALLOCATION

Dynamic Resource Allocation
•  Adds extra executors to an app which has pending tasks.
   – Offloads the challenge of exact resource planning for an app.
•  Removes idle executors from an app.
   – Helps a long-running app to free idle executors.

Overview

[Figure: Tasks and Executors over time, shown in three phases: Insufficient capacity, Optimal capacity, Idle executors.]

Request Policy
•  An app starts with a user-specified # of executors.

./bin/spark-submit \
  --class <main-class> --master <master-url> \
  --num-executors <# of executors>

•  After spark.dynamicAllocation.schedulerBacklogTimeout (sec), the app starts requesting new executors if it has pending task(s).

•  The app requests new executors every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (sec), doubling the # of requests like 1, 2, 4, 8, 16…
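
The timing in the request policy above can be sketched as a small function: the first request fires once the backlog has lasted schedulerBacklogTimeout, each later round fires sustainedSchedulerBacklogTimeout after the previous one, and the number requested doubles per round. This is a simplified model with a hypothetical name (RequestPolicySketch); it assumes the backlog never clears and ignores any upper bound on the number of executors.

```scala
object RequestPolicySketch {
  // Returns (seconds since the backlog began, executors requested in that round).
  // Intended for a small number of rounds; 1 << i overflows Int for i >= 31.
  def schedule(backlogTimeoutSec: Long, sustainedTimeoutSec: Long, rounds: Int): Seq[(Long, Int)] =
    (0 until rounds).map { i =>
      val at = backlogTimeoutSec + i * sustainedTimeoutSec
      val toAdd = 1 << i // 1, 2, 4, 8, 16, ...
      (at, toAdd)
    }
}
```

With both timeouts set to 1 second, RequestPolicySketch.schedule(1, 1, 5) requests 1, 2, 4, 8, and 16 executors at seconds 1 through 5.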

Remove Policy
•  An app removes an executor when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds.
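
The removal rule is a simple threshold check, sketched below. The object name RemovePolicySketch, the executor ids, and the idleSince map (the time each executor became idle) are hypothetical; Spark tracks this state internally.

```scala
object RemovePolicySketch {
  // idleSince: executor id -> time in seconds at which the executor became idle.
  // Returns the executors that have been idle for more than idleTimeoutSec.
  def toRemove(idleSince: Map[String, Long], nowSec: Long, idleTimeoutSec: Long): Set[String] =
    idleSince.collect {
      case (id, since) if nowSec - since > idleTimeoutSec => id
    }.toSet
}
```

For example, with a 60 second timeout at time 100, an executor idle since second 0 is removed, while one idle since second 50 is kept.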

External Shuffle Service

[Figure: two Worker Nodes, each hosting an Executor (Thread, Storage) that runs iterator.map(…).map(...)...; in the last frame a Shuffle Service is added on each Worker Node.]
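
Dynamic allocation and the external shuffle service are typically switched on together through configuration properties. The sketch below collects the properties named in these slides, plus the two enable flags, and renders them as --conf arguments for spark-submit. The values shown are illustrative assumptions, not recommendations, and the helper object DynamicAllocationConf is hypothetical.

```scala
object DynamicAllocationConf {
  // Property names are Spark configuration keys; the values are example settings only.
  val settings: Seq[(String, String)] = Seq(
    "spark.dynamicAllocation.enabled" -> "true",
    "spark.shuffle.service.enabled" -> "true",
    "spark.dynamicAllocation.schedulerBacklogTimeout" -> "1s",
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout" -> "1s",
    "spark.dynamicAllocation.executorIdleTimeout" -> "60s"
  )

  // Renders each setting as a "--conf key=value" pair for the spark-submit command line.
  def asSubmitArgs: Seq[String] =
    settings.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

The resulting arguments can be appended to the ./bin/spark-submit invocation shown in the Request Policy slide.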
