Dynamic Resource Allocation in Apache Spark

Dynamic Resource Allocation in Apache Spark, Yuta Imai (@imai_factory)

Upload: yuta-imai

Post on 23-Jan-2018

TRANSCRIPT

Page 1: Dynamic Resource Allocation in Apache Spark

Dynamic Resource Allocation in Apache Spark

Yuta Imai @imai_factory

Page 2: Dynamic Resource Allocation in Apache Spark

1. RDD Graph

val text = "Hello Spark, this is my first Spark application."
val textArray = text.split(" ").map(_.replaceAll(" ", ""))
val result = sc.parallelize(textArray)
  .map(item => (item, 1))
  .reduceByKey((x, y) => x + y)
  .collect()

Page 3: Dynamic Resource Allocation in Apache Spark

2. DAGScheduler

[Diagram: Array -> ParallelCollectionRDD (Partition 0 to 3) -> MapPartitionsRDD (Partition 0 to 3) -> ShuffledRDD (Partition 0 to 1) -> Array]

sc.parallelize() .map(…) .reduceByKey(…) .collect()

Page 4: Dynamic Resource Allocation in Apache Spark

2. DAGScheduler

[Diagram: Array -> ParallelCollectionRDD (Partition 0 to 3) -> MapPartitionsRDD (Partition 0 to 3) -> ShuffledRDD (Partition 0 to 1) -> Array]

sc.parallelize() .map(…) .reduceByKey(…) .collect()

Dependency types marked on the graph: NarrowDependency, ShuffleDependency

Page 5: Dynamic Resource Allocation in Apache Spark

2. DAGScheduler

[Diagram: Array -> ParallelCollectionRDD (Partition 0 to 3) -> MapPartitionsRDD (Partition 0 to 3) -> ShuffledRDD (Partition 0 to 1) -> Array]

sc.parallelize() .map(…) .reduceByKey(…) .collect()

Dependency types marked on the graph: NarrowDependency, ShuffleDependency

Stages marked on the graph: Stage 0, Stage 1

Tasks marked on the graph: Task 0, Task 1, Task 2, Task 3, Task 4, Task 5
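The stage and task split on this slide can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's DAGScheduler: `RddNode`, `StageSketch`, `splitIntoStages` and `tasksPerStage` are hypothetical names, and the model assumes a linear lineage in which a new stage begins at each shuffle dependency and each stage runs one task per partition of its last RDD.

```scala
// Toy model only: the names below are illustrative and not part of Spark's API.
final case class RddNode(name: String, numPartitions: Int, shuffleFromParent: Boolean)

object StageSketch {
  // Lineage from the slides, listed from source to result.
  val lineage: List[RddNode] = List(
    RddNode("ParallelCollectionRDD", 4, shuffleFromParent = false),
    RddNode("MapPartitionsRDD", 4, shuffleFromParent = false), // narrow dependency
    RddNode("ShuffledRDD", 2, shuffleFromParent = true)        // shuffle dependency
  )

  // Start a new stage at every shuffle dependency;
  // RDDs linked by narrow dependencies stay in the current stage.
  def splitIntoStages(nodes: List[RddNode]): List[List[RddNode]] = {
    val newestFirst = nodes.foldLeft(List.empty[List[RddNode]]) { (stages, node) =>
      stages match {
        case current :: done if !node.shuffleFromParent => (current :+ node) :: done
        case _                                          => List(node) :: stages
      }
    }
    newestFirst.reverse
  }

  // Assumption: one task per partition of the last RDD in each stage.
  def tasksPerStage(nodes: List[RddNode]): List[Int] =
    splitIntoStages(nodes).map(_.last.numPartitions)

  def main(args: Array[String]): Unit =
    tasksPerStage(lineage).zipWithIndex.foreach { case (n, i) =>
      println(s"Stage $i: $n tasks")
    }
}
```

Under these assumptions the model yields four tasks for Stage 0 and two for Stage 1, which agrees with the six tasks (Task 0 to Task 5) drawn on the slide.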

Page 6: Dynamic Resource Allocation in Apache Spark

3. TaskScheduler

[Diagram: Partition 0 to 3 of two RDDs, grouped into Task 0, Task 1, Task 2 and Task 3, which are sent to Executors]

Page 7: Dynamic Resource Allocation in Apache Spark

[Diagram: two Worker Nodes, each with an Executor whose Thread runs iterator.map(…).map(...)...; labels also include Storage and Shuffle File]

Page 8: Dynamic Resource Allocation in Apache Spark

DYNAMIC RESOURCE ALLOCATION

Page 9: Dynamic Resource Allocation in Apache Spark

Dynamic Resource Allocation

•  Adds extra executors to an app that has pending tasks.
   – Offloads the challenge of exact resource planning for an app.
•  Removes idle executors from an app.
   – Helps a long-running app free its idle executors.

Page 10: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

Page 11: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

Insufficient capacity

Page 12: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

Insufficient capacity

Page 13: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

Insufficient capacity

Page 14: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

Insufficient capacity -> Optimal capacity

Page 15: Dynamic Resource Allocation in Apache Spark

Overview

Tasks

Executors

✔ ✔

Insufficient capacity -> Optimal capacity -> Idle executors

Page 16: Dynamic Resource Allocation in Apache Spark

Tasks

Executors

✔ ✔

Overview

Insufficient capacity -> Optimal capacity -> Idle executors -> Optimal capacity

Page 17: Dynamic Resource Allocation in Apache Spark

Request Policy

•  An app starts with a user-specified number of executors.

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --num-executors <# of executors>

•  After spark.dynamicAllocation.schedulerBacklogTimeout (sec), the app starts requesting new executors if it has pending task(s).

•  The app then requests new executors every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (sec), doubling the number requested each round: 1, 2, 4, 8, 16…
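The request policy above can be sketched as a small simulation in plain Scala. This is a simplified model, not Spark's implementation: `RequestPolicySketch`, `Round` and `schedule` are hypothetical names, the backlog is assumed to persist until `needed` executors have been requested, and limits such as a maximum executor count are ignored.

```scala
// Simplified simulation of the doubling request policy; names are illustrative only.
object RequestPolicySketch {
  // atSec: seconds since the backlog started; requested: executors asked for in this round;
  // total: executors asked for so far.
  final case class Round(atSec: Long, requested: Int, total: Int)

  def schedule(needed: Int, backlogTimeoutSec: Long, sustainedTimeoutSec: Long): List[Round] = {
    @annotation.tailrec
    def loop(atSec: Long, batch: Int, total: Int, acc: List[Round]): List[Round] =
      if (total >= needed) acc.reverse
      else {
        // Never ask for more than is still missing.
        val requested = math.min(batch, needed - total)
        val newTotal = total + requested
        // The next round fires after the sustained timeout and doubles the batch size.
        loop(atSec + sustainedTimeoutSec, batch * 2, newTotal, Round(atSec, requested, newTotal) :: acc)
      }
    // The first round fires once the backlog has lasted backlogTimeoutSec.
    loop(backlogTimeoutSec, 1, 0, Nil)
  }

  def main(args: Array[String]): Unit =
    schedule(needed = 20, backlogTimeoutSec = 1L, sustainedTimeoutSec = 1L).foreach(println)
}
```

With both timeouts set to one second and 20 executors needed, the model requests 1, 2, 4, 8 and then the remaining 5 executors over five rounds, so a growing backlog is covered in a handful of rounds rather than one executor at a time.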

Page 18: Dynamic Resource Allocation in Apache Spark

Remove Policy

•  An app removes an executor when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds.
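The remove policy reduces to a single comparison per executor. The sketch below is a minimal model in plain Scala, not Spark's code: `RemovePolicySketch` and `executorsToRemove` are hypothetical names, and executors that are currently busy are assumed to be absent from the input map.

```scala
// Minimal model of the idle-timeout rule; names are illustrative only.
object RemovePolicySketch {
  // idleSinceSec maps an executor id to the time (sec) at which it last became idle.
  def executorsToRemove(idleSinceSec: Map[String, Long], nowSec: Long, idleTimeoutSec: Long): Set[String] =
    idleSinceSec.collect {
      // Remove only executors idle for strictly more than the timeout.
      case (id, since) if nowSec - since > idleTimeoutSec => id
    }.toSet

  def main(args: Array[String]): Unit = {
    val idle = Map("exec-1" -> 0L, "exec-2" -> 50L)
    println(executorsToRemove(idle, nowSec = 100L, idleTimeoutSec = 60L))
  }
}
```

In the usage example, `exec-1` has been idle for 100 seconds and is selected for removal, while `exec-2` has been idle for only 50 seconds and is kept.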

Page 19: Dynamic Resource Allocation in Apache Spark

External Shuffle Service

[Diagram: two Worker Nodes, each with an Executor whose Thread runs iterator.map(…).map(...)...; labels also include Storage]

Page 20: Dynamic Resource Allocation in Apache Spark

External Shuffle Service

[Diagram: two Worker Nodes, each with an Executor whose Thread runs iterator.map(…).map(...)...; labels also include Storage]

Page 21: Dynamic Resource Allocation in Apache Spark

External Shuffle Service

[Diagram: two Worker Nodes, each with an Executor whose Thread runs iterator.map(…).map(...)...; labels also include Storage and two Shuffle Service boxes]
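As a rough illustration of how the pieces in this deck are typically switched on together, the sketch below renders a set of Spark properties as `--conf` arguments for spark-submit. The property names are Spark configuration keys; the values are example choices rather than recommendations, and `DynamicAllocationConfSketch` and `toSubmitArgs` are hypothetical helper names. It assumes the external shuffle service has already been started on each worker node according to the cluster manager's documentation.

```scala
// Sketch: renders Spark properties as spark-submit "--conf" arguments.
// Values are examples only; adjust them for the actual cluster.
object DynamicAllocationConfSketch {
  val settings: Seq[(String, String)] = Seq(
    "spark.dynamicAllocation.enabled" -> "true",
    // Lets shuffle files remain readable after the executor that wrote them is removed.
    "spark.shuffle.service.enabled" -> "true",
    "spark.dynamicAllocation.minExecutors" -> "1",
    "spark.dynamicAllocation.maxExecutors" -> "20",
    "spark.dynamicAllocation.schedulerBacklogTimeout" -> "1s",
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout" -> "1s",
    "spark.dynamicAllocation.executorIdleTimeout" -> "60s"
  )

  // Each property becomes a "--conf key=value" pair.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(("./bin/spark-submit" +: toSubmitArgs(settings)).mkString(" "))
}
```

Keeping the settings as data makes it easy to print them, diff them between environments, or pass the same pairs to a configuration object in application code.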