CS 696 Intro to Big Data: Tools and Methods Fall Semester, 2017
Doc 16 Spark, Cluster, AWS EMR Nov 7, 2017
Copyright ©, All rights reserved. 2017 SDSU & Roger Whitney, 5500 Campanile Drive, San Diego, CA 92182-7700 USA. OpenContent (http://www.opencontent.org/opl.shtml) license defines the copyright on this document.
SMACK
2
Hot topic in Bay area
Scala, Spark
Apache Mesos - distributed system kernel
Apache Akka - highly concurrent, distributed, resilient message-driven applications on the JVM
Apache Cassandra - distributed database
Apache Kafka - distributed message/streaming platform
Towards AWS
3
Need Spark program packaged in jar file
Issues:
Packaging in jar
Running in local cluster of one machine
Logging
File references
Spark Program & Packaging in Jar
4
Put program in object
Packaging in jar file
Package your code, not the Spark jars - Spark adds 200 MB
By hand using the jar command
Using sbt
Why Jar Size Matters
5
[Diagram: the application jar is sent from the master to each slave, so jar size matters]
Jar File & Spark Jars
6
When running a Spark program, Spark supplies all the Spark dependencies
If your jar file does not contain the Spark jars, it cannot run by itself
If your jar file does contain the Spark jars, it can run by itself and can run in Spark, but you are passing an unneeded 200 MB to each slave
You do need to include all other needed resources in your jar file
Sample Program
7
import org.apache.spark.{SparkConf, SparkContext}
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    rdd.saveAsTextFile("SimpleAppOutput")
    sc.stop()
  }
}
build.sbt
8
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
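A note the slides imply but don't show: plain sbt package only packages your own classes, so the Spark jars stay out automatically. If you later build a fat jar (for example with the sbt-assembly plugin) to bundle other resources, the usual way to keep the ~200 MB of Spark out of it is to mark the Spark dependency as provided:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"

Spark supplies these classes itself when the job runs under spark-submit.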
File Structure
9
simpleApp
simpleApp/build.sbt
src/
src/main
src/main/scala
src/main/scala/SimpleApp.scala
Compiling the Example Using sbt
10
From the simpleApp directory
->sbt package
[info] Updated file /Users/whitney/Courses/696/Fall17/SparkExamples/simpleApp/project/build.properties: set sbt.version to 1.0.2
[info] Loading project definition from /Users/whitney/Courses/696/Fall17/SparkExamples/simpleApp/project
[info] Updating {file:/Users/whitney/Courses/696/Fall17/SparkExamples/simpleApp/project/}simpleapp-build...
[info] Done updating.
[warn] Run 'evicted' to see detailed eviction warnings
...
[info] Compiling 1 Scala source to /Users/whitney/Courses/696/Fall17/SparkExamples/simpleApp/target/scala-2.11/classes ...
[info] Done compiling.
[info] Packaging /Users/whitney/Courses/696/Fall17/SparkExamples/simpleApp/target/scala-2.11/simple-project_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 14 s, completed Nov 4, 2017 4:24:36 PM
Note size of Jar file
11
Running in Temp Spark Runtime
12
->spark-submit target/scala-2.11/simple-project_2.11-1.0.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/04 16:30:13 INFO SparkContext: Running Spark version 2.2.0
....
17/11/04 16:30:15 INFO SparkContext: Successfully stopped SparkContext
17/11/04 16:30:15 INFO ShutdownHookManager: Shutdown hook called
17/11/04 16:30:15 INFO ShutdownHookManager: Deleting directory /private/var/folders/br/q_fcsjqc8xj9qn0059bctj3h0000gr/T/spark-8930a3ab-b041-4ed4-8203-fc8369b9c374
I set SPARK_HOME and put SPARK_HOME/bin & SPARK_HOME/sbin on my path:
setenv SPARK_HOME /Java/spark-2.2.0-bin-hadoop2.7
Then run SPARK_HOME/bin/spark-submit from the simpleApp directory
Starting a Spark Cluster of One
13
Command SPARK_HOME/sbin/start-master.sh
->start-master.sh starting org.apache.spark.deploy.master.Master, logging to /Java/spark-2.2.0-bin-hadoop2.7/logs/spark-whitney-org.apache.spark.deploy.master.Master-1-air-6.local.out
Master Web Page
14
localhost:8080 127.0.0.1:8080 0.0.0.0:8080
Starting slave on local machine
15
Command SPARK_HOME/sbin/start-slave.sh
->start-slave.sh spark://air-6.local:7077
starting org.apache.spark.deploy.worker.Worker, logging to /Java/spark-2.2.0-bin-hadoop2.7/logs/spark-whitney-org.apache.spark.deploy.worker.Worker-1-air-6.local.out
Master Web Page
16
Submitting Job to Spark on Cluster
17
->spark-submit --master spark://air-6.local:7077 target/scala-2.11/simple-project_2.11-1.0.jar
run SPARK_HOME/bin/spark-submit from simpleApp
Master Web Page
18
Application Page
19
Starting/Stopping Master/Slave
20
Commands in SPARK_HOME/sbin
->start-master.sh
->start-slave.sh spark://air-6.local:7077
->stop-master.sh
->stop-slave.sh
->start-all.sh
->stop-all.sh
spark-submit
21
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
Spark Properties
22
name, master, logging, memory, etc.
https://spark.apache.org/docs/latest/configuration.html
name - displayed in Spark Master Web page
master
23
Master URL - Meaning
local - Run Spark locally with one worker thread
local[K] - Run Spark locally with K worker threads
local[K,F] - Run Spark locally with K worker threads and F maxFailures
local[*] - Run Spark locally with as many worker threads as logical cores on your machine
local[*,F] - Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures
spark://HOST:PORT - Connect to the given Spark standalone cluster master
spark://HOST1:PORT1,HOST2:PORT2 - Connect to the given Spark standalone cluster with standby masters with Zookeeper
mesos://HOST:PORT - Connect to the given Mesos cluster
yarn - Connect to a YARN cluster in client or cluster mode
Examples
24
->spark-submit target/scala-2.11/simple-project_2.11-1.0.jar
Start Spark master-slave using default values

->spark-submit --master spark://air-6.local:7077 \
    target/scala-2.11/simple-project_2.11-1.0.jar
Submit job to existing master

->spark-submit --master "local[*]" target/scala-2.11/simple-project_2.11-1.0.jar
Start Spark master-slave using all cores
Setting Properties
25
In precedence order (example below):
In program
submit command
config file
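As an illustration (the property and values here are only examples, not from the slides), the same setting can be given in all three places; the value set in the program wins over the submit command, which wins over the config file:

In program: conf.set("spark.executor.memory", "2g")
Submit command: spark-submit --conf spark.executor.memory=2g ...
Config file ($SPARK_HOME/conf/spark-defaults.conf): spark.executor.memory 2g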
Setting master in Code
26
import org.apache.spark.{SparkConf, SparkContext}
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    rdd.saveAsTextFile("SimpleAppOutput")
    sc.stop()
  }
}
Don't set the master in code. It overrides the value given on the command line and in the config file, so you will not be able to change master settings without recompiling.
Warning
27
import org.apache.spark.{SparkConf, SparkContext}
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    rdd.saveAsTextFile("SimpleAppOutput")
    sc.stop()
  }
}
Spark will not overwrite existing files. If you run this a second time without removing the output files, you get an exception.
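One simple way to avoid the exception (a sketch, not from the slides) is to write to a fresh output path on each run, for example by appending a timestamp:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    // Hypothetical convention: a unique directory per run, so saveAsTextFile never collides
    val outputPath = "SimpleAppOutput-" + System.currentTimeMillis
    rdd.saveAsTextFile(outputPath)
    sc.stop()
  }
}

Alternatively, delete the old output directory before re-running.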
Using IntelliJ
28
Using IntelliJ
29
Edit build.sbt file to add libraryDependencies
name := "Your Project"
version := "0.1"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
SBT
30
http://www.scala-sbt.org
Commands:
clean
update - dependencies
compile
package - generate jar file
test
run - not useful with Spark
Issue - Debugging
31
Debugger not available for program running on cluster
Print statements - don't count on seeing them from slaves
Logging - Spark uses log4j 1.2
1/2 of Default Output
32
->spark-submit --master spark://air-6.local:7077 simpleappintell_2.11-0.1.jar
log4j:WARN No appenders could be found for logger (root).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
cat in the hat
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/04 22:16:37 INFO SparkContext: Running Spark version 2.2.0
17/11/04 22:16:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/04 22:16:38 INFO SparkContext: Submitted application: Simple Application
17/11/04 22:16:38 INFO SecurityManager: Changing view acls to: whitney
17/11/04 22:16:38 INFO SecurityManager: Changing modify acls to: whitney
17/11/04 22:16:38 INFO SecurityManager: Changing view acls groups to:
17/11/04 22:16:38 INFO SecurityManager: Changing modify acls groups to:
17/11/04 22:16:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(whitney); groups with view permissions: Set(); users with modify permissions: Set(whitney); groups with modify permissions: Set()
17/11/04 22:16:38 INFO Utils: Successfully started service 'sparkDriver' on port 52153.
17/11/04 22:16:38 INFO SparkEnv: Registering MapOutputTracker
17/11/04 22:16:38 INFO SparkEnv: Registering BlockManagerMaster
17/11/04 22:16:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/04 22:16:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/04 22:16:38 INFO DiskBlockManager: Created local directory at /private/var/folders/br/q_fcsjqc8xj9qn0059bctj3h0000gr/T/blockmgr-f07bc14c-79a1-4402-aa1f-8df995460e47
17/11/04 22:16:38 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/11/04 22:16:38 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/04 22:16:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/04 22:16:39 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
17/11/04 22:16:39 INFO SparkContext: Added JAR file:/Users/whitney/Courses/696/Fall17/SparkExamples/simpleAppIntell/target/scala-2.11/simpleappintell_2.11-0.1.jar at spark://192.168.0.102:52153/jars/simpleappintell_2.11-0.1.jar with timestamp 1509858999020
17/11/04 22:16:39 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://air-6.local:7077...
17/11/04 22:16:39 INFO TransportClientFactory: Successfully created connection to air-6.local/192.168.0.102:7077 after 23 ms (0 ms spent in bootstraps)
17/11/04 22:16:39 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20171104221639-0004
Log4j
33
OFF (most specific, no logging)
FATAL (most specific, little data)
ERROR
WARN
INFO
DEBUG
TRACE (least specific, a lot of data)
ALL (least specific, all data)
Log levels can be specified per package and per class
The log configuration also determines the format and the location of the output
Setting Level in Code
34
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, LogManager, Logger}

object SimpleApp {
  def main(args: Array[String]) {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val log = LogManager.getRootLogger
    log.info("Start")
    println("cat in the hat")
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    rdd.saveAsTextFile("SimpleAppOutput2")
    log.info("End")
    sc.stop()
  }
}
Output
35
->spark-submit --master spark://air-6.local:7077 simpleappintell_2.11-0.1.jar
log4j:WARN No appenders could be found for logger (root).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
cat in the hat
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/05 12:04:37 INFO root: End
Again - Do you want to set log level in Code
36
Can set the level in a config file - Spark ships a template at $SPARK_HOME/conf/log4j.properties.template
By default Spark will look for $SPARK_HOME/conf/log4j.properties
But the config file is not part of your program
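To use the template as a starting point you can copy it into place (this step is implied, not shown on the slide):

->cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties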
Quiet Log config
37
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org=WARN
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
Master Logging vs Slave Logging
38
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, LogManager, PropertyConfigurator, Logger}

object SimpleApp {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.info("Start")
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10)
    val stringRdd = rdd.map { value =>
      log.info(value)
      value.toString
    }
    log.info("End")
    sc.stop()
  }
}
Master - the log calls outside the map run on the master
Slave - the log call inside the map runs on the slaves
Error on running: the log4j Logger is not serializable, so capturing it in the map closure fails
Serializable Logger
39
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{LogManager, Logger}

object DistributedLogger extends Serializable {
  @transient lazy val log = Logger.getLogger(getClass.getName)
}
Main
40
object SimpleApp {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.info("Start")
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10)
    val result = rdd.map { i =>
      DistributedLogger.log.warn("i = " + i)
      i + 10
    }
    result.saveAsTextFile("SimpleAppOutput")
    log.info("End")
    sc.stop()
  }
}
Running
41
->spark-submit target/scala-2.11/simpleappintell_2.11-0.1.jar
17/11/06 16:59:40 INFO root: Start
17/11/06 16:59:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:> (0 + 0) / 8]17/11/06 16:59:44 WARN DistributedLogger$: i = 7
17/11/06 16:59:44 WARN DistributedLogger$: i = 8
17/11/06 16:59:44 WARN DistributedLogger$: i = 9
17/11/06 16:59:44 WARN DistributedLogger$: i = 6
17/11/06 16:59:44 WARN DistributedLogger$: i = 3
17/11/06 16:59:44 WARN DistributedLogger$: i = 4
17/11/06 16:59:44 WARN DistributedLogger$: i = 1
17/11/06 16:59:44 WARN DistributedLogger$: i = 5
17/11/06 16:59:44 WARN DistributedLogger$: i = 2
17/11/06 16:59:44 WARN DistributedLogger$: i = 10
17/11/06 16:59:44 INFO root: End
Logging DataFrames
42
To log operations on DataFrames you need to use a UDF
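The slides don't show an example; here is a minimal sketch of the idea, reusing the DistributedLogger object from the Serializable Logger slide (the object name DataFrameLogging and the column name are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object DataFrameLogging {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.appName("DataFrame Logging").getOrCreate()
    import spark.implicits._

    val df = (1 to 10).toDF("value")

    // The UDF body runs on the executors, so log through the serializable DistributedLogger
    val loggedIncrement = udf { (v: Int) =>
      DistributedLogger.log.warn("value = " + v)
      v + 10
    }

    df.select(loggedIncrement($"value").as("result")).show()
    spark.stop()
  }
}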
Amazon Elastic Map-Reduce (EMR)
43
Hadoop, Hive, Spark, etc on Cluster
Predefined set of languages/tools available
Can create cluster of machines
https://aws.amazon.com
Create a new account
Get 12 months free access
AWS Free Tier
44
12 months free
EC2 - compute instances
740 hours per month
Billed in hour increments
Billed per instance

S3 - storage
5 GB
20,000 Get requests

RDS - MySQL, PostgreSQL, SQL Server
20 GB
750 hours

EC2 Container - Docker images
500 MB

Students and I were charged last year
AWS Educate
45
https://aws.amazon.com/education/awseducate/
SDSU is an institutional member
Students get $100 credit
EC2 Pricing
46
Price per hour (On Demand / Spot)
m1.medium - $0.0047
m1.large - $0.0?
m1.xlarge - $0.352
m3.xlarge - $0.0551
m4.large - $0.1 On Demand / $0.0299 Spot
c1.medium - $0.0132
c1.xlarge - $0.057
Basic Outline
47
Develop & test Spark locally
Upload program jar file & data to S3
Configure & launch cluster:
AWS Management Console
AWS CLI
SDKs
Monitor cluster
Make sure you terminate cluster when done
Simple Storage System - S3
48
Files are stored in buckets
Bucket names are global
Supports two file system schemes: s3 - files divided into blocks, s3n - native file system
Accessing files (Spark example below):
S3 console
Third party tools
REST
Java, C#, etc.
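From a Spark program, once credentials are configured, an S3 object can be read directly with an s3n URL (the bucket and key here are made up for illustration):

val rdd = sc.textFile("s3n://my-bucket/input/data.txt")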
Amazon S3
49
S3 Creating a Bucket
50
S3 Costs
51
AWS Free Usage Tier
New AWS customers receive each month for one year 5 GB of Amazon S3 storage in the Standard Storage class, 20,000 Get Requests, 2,000 Put Requests, and 15 GB of data transfer out
Storage class: Standard / Standard - Infrequent Access / Glacier
First 50 TB / month: $0.023 per GB / $0.0125 per GB / $0.004 per GB
Next 450 TB / month: $0.022 per GB / $0.0125 per GB / $0.004 per GB
Over 500 TB / month: $0.021 per GB / $0.0125 per GB / $0.004 per GB
S3 Objects
52
Objects contain:
Object data
Metadata

Size: 1 byte to 5 gigabytes per object

Object data:
Just bytes
No meaning associated with the bytes

Metadata:
Name-value pairs to describe the object
Some HTTP headers used, e.g. Content-Type
S3 Buckets
53
Namespace for objects
No limit on the number of objects per bucket
Only 100 buckets per account
Each bucket has a name
Up to 255 bytes long
Cannot be the same as an existing bucket name used by any S3 user
Bucket Names
54
Bucket names must:
Contain only lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
Start with a number or letter
Be between 3 and 255 characters long
Not be in an IP address style (e.g., "192.168.5.4")

To conform with DNS requirements, Amazon recommends:
Bucket names should not contain underscores (_)
Bucket names should be between 3 and 63 characters long
Bucket names should not end with a dash
Bucket names cannot contain dashes next to periods (e.g., "my-.bucket.com" and "my.-bucket" are invalid)
Key
55
Unique identifier for an object within a bucket
Object URL:
http://bucketName.s3.amazonaws.com/Key

http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl
Bucket = doc
Key = 2006-03-01/AmazonS3.wsdl
Access Control Lists (ACL)
56
Each bucket has an ACL
Determines who has read/write access

Each object can have an ACL
Determines who has read/write access

An ACL consists of a list of grants
A grant contains one grantee and one permission
S3 Data Consistency Model
57
Updates to a single object at a key in a bucket are atomic
But a read after a write may return the old value - changes may take time to propagate
No object locking - if two writes to the same object occur at the same time, the one with the later timestamp wins
CAP Theorem
58
The CAP theorem says that in a distributed system you cannot have all three of: Consistency, Availability, and tolerance to network Partitions
Consistency
59
[Diagram: Machine 1 and Machine 2 both hold A = 2 - consistent; then Machine 1 holds A = 2 while Machine 2 holds A = 3 - not consistent]
Partition
60
[Diagram: Machine 1 and Machine 2 each hold A = 2, then become partitioned]
Machine 1 cannot talk to machine 2
But how does machine 1 tell the difference between no connection and a very slow connection or busy machine 2?
Latency
61
Latency - time between making a request and getting a response
Distributed systems always have latency
In practice a partition is detected by latency
When there is no response within a given time frame, assume we are partitioned
Available
62
[Diagram: Machine 1 and Machine 2 each hold A = 2; a client cannot access the value of A - not available]
What does "not available" mean? No connection? A slow connection? What is the difference?
Some say highly available, meaning low latency
In practice, availability and latency are related
Consistency over Latency
63
[Diagram: Machine 1 and Machine 2 both start with A = 2; "Set A to 3" arrives at Machine 1, A is locked on both machines, the write is applied to both, then A is unlocked; both machines end with A = 3]
Write requests are queued until A is unlocked
Increased latency, but the system is still available
Latency over Consistency
64
[Diagram: Machine 1 and Machine 2 both start with A = 2; "Set A to 3" is applied immediately at Machine 1 (A = 3) while Machine 2 still has A = 2; the write is forwarded later and Machine 2 eventually sets A = 3]
Write requests are accepted immediately
Low latency, but the system is temporarily inconsistent
Latency over Consistency - Write Conflicts
65
[Diagram: Machine 1 and Machine 2 both start with A = 2; "Set A to 3" arrives at Machine 1 while "Subtract 1 from A" arrives at Machine 2, giving A = 3 and A = 1; each write is then forwarded to the other machine and applied in a different order, leaving A = ? on both]
Need a policy to make the system consistent
Partition
66
[Diagram: Machine 1 and Machine 2 both start with A = 2 and become partitioned; during the partition "Set A to 3" is applied at Machine 1 (A = 3) and "Subtract 1 from A" at Machine 2 (A = 1); when the partition heals, A = ? on both]
Need a policy to make the system consistent
CAP Theorem
67
Not a theorem
Too simplistic
What is availability?
What is a partition of the network?
Misleading
The intent of CAP was to focus designers' attention on the tradeoffs in distributed systems
How to handle partitions in the network:
Consistency
Latency
Availability
CAP & S3
68
S3 favors latency over consistency
Running Program on AWS EMR
69
Make sure program runs locally
Create jar file containing code
Make sure that the jar file contains a manifest
Create s3 bucket(s) for:
jar file
logs
input
output
Upload jar & data files to s3
Test Program - SimpleApp
70
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.LogManager

object SimpleApp {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.info("Start")
    if (args.length < 1) {
      log.error("Missing argument")
      return
    }
    val outputFile = args(0)
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10)
    rdd.saveAsTextFile(outputFile)
    log.info("End")
    sc.stop()
  }
}
Packaging SimpleApp using SBT
71
->sbt package
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/whitney/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/whitney/Courses/696/Fall17/SparkExamples/simpleAppIntell/project
[info] Loading settings from build.sbt ...
[info] Set current project to SimpleAppIntell (in build file:/Users/whitney/Courses/696/Fall17/SparkExamples/simpleAppIntell/)
[success] Total time: 2 s, completed Nov 6, 2017 4:05:00 PM
In project directory
Packaging SimpleApp using SBT
72
In project directory
->sbt
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/whitney/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/whitney/Courses/696/Fall17/SparkExamples/simpleAppIntell/project
[info] Loading settings from build.sbt ...
[info] Set current project to SimpleAppIntell (in build file:/Users/whitney/Courses/696/Fall17/SparkExamples/simpleAppIntell/)
[info] sbt server started at 127.0.0.1:4172
sbt:SimpleAppIntell> package
[success] Total time: 2 s, completed Nov 6, 2017 4:06:33 PM
sbt:SimpleAppIntell>
I use SBT shell as it is faster when needing to repeat operations
Result of SBT package
73
Note: I renamed the jar file simpleapp.jar
Contents of simpleapp.jar
74
MANIFEST.MF:
Manifest-Version: 1.0
Implementation-Title: SimpleAppIntell
Implementation-Version: 0.1
Specification-Vendor: default
Specification-Title: SimpleAppIntell
Implementation-Vendor-Id: default
Specification-Version: 0.1
Implementation-Vendor: default
Main-Class: SimpleApp

Note:
When running SimpleApp locally you don't need to use --class - Spark finds the main class from the manifest
When running on AWS you need to use --class
Running Program on AWS EMR
75
Make sure program runs locally
Create jar file containing code
Make sure that the jar file contains a manifest
Create s3 bucket(s) for:
jar file
logs
input
output
Upload jar & data files to s3
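The lecture uses the S3 console for the upload, but the same thing can be done from the command line with the AWS CLI mentioned at the end of this deck (bucket and file names here are made up):

->aws s3 cp simpleapp.jar s3://your-bucket/
->aws s3 cp mydata.txt s3://your-bucket/input/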
My S3 Buckets
76
Spark on AWS - EMR Console
77
78
You can either use Spark option on Quick Options or use Advanced Options
Advanced Options
79
Spark Application Setup
80
You have to give --class ClassName in Spark-submit options
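For example (bucket names made up for illustration), with the simpleapp.jar built earlier the step would be configured roughly as:

Spark-submit options: --class SimpleApp
Application location: s3://your-bucket/simpleapp.jar
Arguments: s3://your-bucket/output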
81
Using the custom jar option - useful when cloning steps
Output
82
Warning on AWS
83
It can take 5-10 minutes to start a cluster
Logs do not show your logging statements
When you configure steps incorrectly they fail, and the error messages are not very helpful
SSH to your Master Node
84
Create Amazon EC2 Key pair
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
Instructions
Open EC2 Dashboard - Select Key Pairs
SSH to your Master Node
85
In Create Cluster - Quick Options
SSH to your Master Node
86
Click for Instructions
Command-line Tools
87
Flintrock - open-source command-line tool for launching Apache Spark clusters
https://github.com/nchammas/flintrock

aws cli - Amazon's command-line tool
https://aws.amazon.com/cli/