![Page 1: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/1.jpg)
Databricks Community Cloud
By: Robert Sanders
![Page 2: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/2.jpg)
Databricks Community Cloud
• Free/Paid Standalone Spark Cluster
• Online Notebook
    • Python
    • R
    • Scala
    • SQL
• Tutorials and Guides
• Shareable Notebooks
![Page 3: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/3.jpg)
Why is it useful?
• Learning about Spark
• Testing different versions of Spark
• Rapid Prototyping
• Data Analysis
• Saved Code
• Others…
![Page 4: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/4.jpg)
Forums
https://forums.databricks.com/
![Page 5: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/5.jpg)
Login/Sign Up
https://community.cloud.databricks.com/login.html
![Page 6: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/6.jpg)
Home Page
![Page 7: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/7.jpg)
Active Clusters
![Page 8: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/8.jpg)
Create a Cluster - Steps
1. From the Active Clusters page, click the “+ Create Cluster” button
2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click “Create Cluster”
5. Wait for the Cluster to start up and be in a “Running” state
![Page 9: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/9.jpg)
Create a Cluster
![Page 10: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/10.jpg)
Active Clusters
![Page 11: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/11.jpg)
Active Clusters – Spark Cluster UI - Master
![Page 12: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/12.jpg)
Workspaces
![Page 13: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/13.jpg)
Create a Notebook - Steps
1. Right click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you’ve created that you want to attach to the Notebook
5. Click the “Create” button
![Page 14: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/14.jpg)
Create a Notebook
![Page 15: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/15.jpg)
Notebook
![Page 16: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/16.jpg)
Using the Notebook
![Page 17: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/17.jpg)
Using the Notebook – Code Snippets
> sc
> sc.parallelize(1 to 5).collect()
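The two snippets above inspect the SparkContext and turn a local range into an RDD. As a slightly longer sketch of the same idea, the cell below chains a transformation and two actions. It assumes a Databricks Scala notebook attached to a running cluster, where `sc` is already defined; the value names (`numbers`, `squares`, `total`, `evens`) are illustrative and not from the original slides.

```scala
// Assumption: this runs in a Databricks Scala notebook, where `sc`
// (the SparkContext) is predefined by the notebook environment.

// Distribute a local range across the cluster as an RDD.
val numbers = sc.parallelize(1 to 5)

// Transformation: evaluated lazily, nothing runs yet.
val squares = numbers.map(n => n * n)

// Action: triggers the computation and returns the sum to the driver.
val total = squares.reduce(_ + _)

// Another action: keep the even numbers and bring them back as a local Array.
val evens = numbers.filter(_ % 2 == 0).collect()
```

Run the cell with Shift + Enter; the notebook prints the type and value of each `val` beneath the cell.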
![Page 18: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/18.jpg)
Using the Notebook - Shortcuts
| Shortcut | Action |
| --- | --- |
| Shift + Enter | Run Selected Cell and Move to Next Cell |
| Ctrl + Enter | Run Selected Cell |
| Option + Enter | Run Selected Cell and Insert Cell Below |
| Ctrl + Alt + P | Create Cell Above Current Cell |
| Ctrl + Alt + N | Create Cell Below Selected Cell |
![Page 19: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/19.jpg)
Tables
![Page 20: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/20.jpg)
Create a Table - Steps
1. From the Tables section, click “+ Create Table”
2. Select the Data Source (the steps below assume you’re using File as the Data Source)
3. Upload a file from your local file system
    1. Supported file types: CSV, JSON, Avro, Parquet
4. Click Preview Table
5. Fill in the Table Name
6. Select the File Type and other Options depending on the File Type
7. Change Column Names and Types as desired
8. Click “Create Table”
![Page 21: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/21.jpg)
Create a Table – Upload File
![Page 22: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/22.jpg)
Create a Table – Configure Table
![Page 23: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/23.jpg)
Create a Table – Review Table
![Page 24: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/24.jpg)
Notebook – Access Table
![Page 25: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/25.jpg)
Notebook – Access Table – Code Snippets
> sqlContext
> sqlContext.sql("show tables").collect()
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
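A query through `sqlContext.sql` returns a DataFrame, so a table can also be explored with the DataFrame API instead of SQL strings. The cell below is a minimal sketch of that, assuming a Databricks Scala notebook where `sqlContext` is predefined. To stay independent of any uploaded file it builds a tiny stand-in DataFrame: the rows, the `Name` column, and the value names are hypothetical, while `Allegiances` is a column name used later in the deck.

```scala
// Assumption: this runs in a Databricks Scala notebook, where `sqlContext`
// is predefined by the notebook environment.
import sqlContext.implicits._                 // enables .toDF on local collections
import org.apache.spark.sql.functions.desc

// Hypothetical sample rows standing in for the uploaded `got` table.
val sample = Seq(
  ("Character A", "House Stark"),
  ("Character B", "House Stark"),
  ("Character C", "Lannister")
).toDF("Name", "Allegiances")

// Count rows per allegiance and sort the largest group first.
val byAllegiance = sample
  .groupBy("Allegiances")
  .count()
  .orderBy(desc("count"))

// Bring the (small) result back to the driver as an Array[Row].
byAllegiance.collect()
```

With the real table, the same `groupBy` / `count` chain could be applied to the `got` DataFrame from the snippet above in place of `sample`.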
![Page 26: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/26.jpg)
Notebook – Display
![Page 27: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/27.jpg)
Notebook – Data Cleaning for Charting
![Page 28: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/28.jpg)
Notebook – Plot Options
![Page 29: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/29.jpg)
Notebook – Charting
![Page 30: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/30.jpg)
Notebook – Display and Charting – Code Snippets
> display(got)
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String] (_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf{ deathYear: Integer => if(deathYear != null) 1 else 0}
> val gotCleaned = got.filter("Allegiances != \"None\"").withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances")).withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
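The two UDFs in the snippet above wrap ordinary Scala functions, so their logic can be checked without a cluster. The sketch below restates the same function bodies as plain methods; the method names (`cleanAllegiance`, `isDeath`) and the sample inputs are illustrative assumptions, not part of the original slides. It can be pasted into a notebook cell or a Scala REPL.

```scala
// Plain-Scala versions of the two UDF bodies from the snippet above.
// Method names and sample inputs are hypothetical, chosen for illustration.

// Lower-case the value and drop a leading "house " prefix.
def cleanAllegiance(raw: String): String =
  raw.toLowerCase().replace("house ", "")

// A boxed Integer can be null (a missing "Death Year"), so null maps to 0.
def isDeath(deathYear: Integer): Int =
  if (deathYear != null) 1 else 0

val withPrefix    = cleanAllegiance("House Stark")   // "stark"
val withoutPrefix = cleanAllegiance("Lannister")     // "lannister"
val hasYear       = isDeath(Integer.valueOf(299))    // 1
val noYear        = isDeath(null)                    // 0
```

Taking a boxed `Integer` rather than `Int` mirrors the original UDF: a primitive `Int` cannot represent a missing value, so the null check would never fire.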
![Page 31: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/31.jpg)
Publish Notebook - Steps
1. While in a Notebook, click “Publish” on the top right
2. Click “Publish” on the pop up
3. Copy the link and send it out
![Page 32: Databricks Community Cloud](https://reader035.vdocuments.site/reader035/viewer/2022081507/587852a11a28ab68198b696f/html5/thumbnails/32.jpg)
Publish Notebook