Using Apache Spark in the Cloud: A DevOps Perspective with Telmo Oliveira

Posted on 21-Jan-2018

Category: Data & Analytics

TRANSCRIPT

Telmo Oliveira, Toon

Using Spark in the Cloud: A DevOps perspective

Requirements

• Seamless transition

• Ensure data anonymity

• Move fast, optimise later

• Ensure multi-tenancy

• As little disturbance as possible to the DS team

• Cluster timeouts

• Autoscaling

• Spot instances

• Well-documented API
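To make the four platform features above concrete, here is a minimal sketch of how they might appear in a cluster request. Everything in it is an assumption for illustration: the `ClusterSpec` class, its field names, and the example values are hypothetical and do not correspond to any particular vendor's API or to the setup described in the talk.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical cluster request; field names are illustrative only."""
    name: str
    min_workers: int            # autoscaling lower bound
    max_workers: int            # autoscaling upper bound
    idle_timeout_minutes: int   # shut the cluster down after this much inactivity
    use_spot_instances: bool    # prefer cheaper, interruptible capacity

    def __post_init__(self) -> None:
        # Reject specs that could never be scheduled or would never time out.
        if self.min_workers < 1:
            raise ValueError("min_workers must be at least 1")
        if self.max_workers < self.min_workers:
            raise ValueError("max_workers must be >= min_workers")
        if self.idle_timeout_minutes <= 0:
            raise ValueError("idle_timeout_minutes must be positive")


def to_request_body(spec: ClusterSpec) -> str:
    """Serialise the spec as JSON for a (hypothetical) cluster-management API."""
    return json.dumps(asdict(spec), sort_keys=True)


# Example values are made up for illustration.
spec = ClusterSpec(
    name="ds-adhoc",
    min_workers=2,
    max_workers=8,
    idle_timeout_minutes=30,
    use_spot_instances=True,
)
print(to_request_body(spec))
```

Validating the spec before it is sent keeps mistakes such as an inverted autoscaling range out of the API call, which matters when cluster definitions are generated by scripts rather than typed by hand.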

Infrastructure as code

• Repeatability

• Fast deployment

• Resilience

• Documentation

Terraform

• S3 buckets

• EC2 instances

• Network topology

• Log management

• RDS instances

• IAM roles/policies

Ansible

• User management

• Databases and ACLs

• Custom app deployment

Architecture Overview

Airflow

• External Hive metastore

• Send logs to S3

• Authorisation

• i3.2xlarge nodes
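As a rough illustration of the first two items above, the sketch below assembles Spark properties for an external Hive metastore and for writing Spark event logs to S3. It is one possible reading of "send logs to S3", not the configuration used in the talk. The property names are standard Spark settings; the helper functions, the JDBC URL, and the bucket name are hypothetical placeholders.

```python
def build_spark_conf(metastore_jdbc_url: str, log_bucket: str) -> dict:
    """Return Spark properties for an external Hive metastore and S3 event logs."""
    return {
        # Use Hive as the catalog implementation instead of the in-memory one.
        "spark.sql.catalogImplementation": "hive",
        # Point the Hive client at an external metastore database (JDBC URL).
        "spark.hadoop.javax.jdo.option.ConnectionURL": metastore_jdbc_url,
        # Write Spark event logs to S3 so they outlive the cluster.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": f"s3a://{log_bucket}/spark-events/",
    }


def render_spark_defaults(conf: dict) -> str:
    """Render properties in spark-defaults.conf style: key, whitespace, value."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


# Placeholder values, assumed for illustration only.
conf = build_spark_conf(
    metastore_jdbc_url="jdbc:mysql://metastore.example.internal:3306/hive",
    log_bucket="example-spark-logs",
)
print(render_spark_defaults(conf))
```

Keeping the metastore and the logs outside the cluster is what allows clusters to be short-lived: table definitions and job history remain available after a cluster has been terminated.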

Future plans

• Streaming

• Real time services

• Improve CI/CD

What’s all this for?

Thanks to the team

Aemro Amare, Barend Garvelink, Bert Jan Katsman, Kliment Markovski, Miquel Monreal, Stanislava Potupchik

Questions?
