Gerard Toonstra | Migrating to the Cloud: Our Journey | 28-11-2019

Data Center Architecture

[Diagram, built up over several slides: Data Source 1, Data Source 2, Data Source 3 and External Systems (...); Staging; Data Warehouse; Data Mart 1 with OLAP 1; Data Mart 2 with OLAP 2; Azkaban]


Migration Steps

● Move OLAP Server from Data Center to Cloud
● Move SQL Server from Data Center to Cloud
● Move away from Azkaban and adopt Airflow
● Create data validation mechanisms


Azkaban vs Airflow

Azkaban
● Task configuration separate from code
● Interface is not great for browsing historical runs
● Hard to rerun individual tasks

Airflow
● Dates as first-class citizens; easy to trigger historical runs
● Templated arguments, like SQL queries
● A lot of building blocks designed for data engineering work
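The points about dates and templated arguments can be illustrated with a small plain-Python sketch. It is not Airflow code: `string.Template` stands in for Airflow's Jinja templating (where the run date is available as `{{ ds }}`), and the query text, the table `orders` and the helpers `render_query` and `backfill` are made up for illustration. The idea is that a query is written once with a date placeholder and rendered per run date, which is what makes triggering a historical run cheap.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical query; $ds plays the role of a run-date placeholder.
QUERY = Template("SELECT * FROM orders WHERE order_date = '$ds'")


def render_query(run_date: date) -> str:
    """Render the query for one run date (ISO format, e.g. 2019-11-28)."""
    return QUERY.substitute(ds=run_date.isoformat())


def backfill(start: date, days: int) -> list:
    """Render the same query for a range of consecutive historical run dates."""
    return [render_query(start + timedelta(days=i)) for i in range(days)]


if __name__ == "__main__":
    for sql in backfill(date(2019, 11, 26), 3):
        print(sql)
```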


What do we need to migrate?


Big daily process


Identifying and splitting up


Introducing checkpoints

Organizing it in Airflow
● Multiple pipelines instead of one
● Interdependent communication
● Multiple “checkpoints”
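One way to picture a checkpoint between pipelines is as a small gate at the start of the downstream pipeline that only passes once the upstream pipeline has finished for the same run date. The sketch below is plain Python under that assumption; `completed_runs`, `mark_done`, `checkpoint` and `run_pipeline` are hypothetical names. In Airflow itself this role is commonly played by a sensor task, but the slides do not say which mechanism was used.

```python
from datetime import date

# Hypothetical in-memory record of finished (pipeline, run_date) pairs.
# A real setup would read this from the scheduler or a tracking table.
completed_runs = set()


def mark_done(pipeline: str, run_date: date) -> None:
    """Record that a pipeline finished for the given run date."""
    completed_runs.add((pipeline, run_date))


def checkpoint(upstream: str, run_date: date) -> bool:
    """Return True if the upstream pipeline finished for this run date."""
    return (upstream, run_date) in completed_runs


def run_pipeline(name: str, upstream, run_date: date) -> str:
    """Run a pipeline unless its upstream checkpoint has not passed yet."""
    if upstream is not None and not checkpoint(upstream, run_date):
        return "blocked"
    mark_done(name, run_date)
    return "done"
```

Calling `run_pipeline("heavy_calculations", "load_staging", some_date)` before the staging load has been marked done returns `"blocked"`; the same call succeeds afterwards.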


Our new daily process(es!)

Along the time axis:
● Load to staging area: 325 tasks
● Heavy calculations: 50 tasks
● Loading data warehouse tables: 140 tasks
● Process semantic layer: 20 tasks


Checkpoints approach

Advantages
● Easier to read and reason about
● Maintainable
● Separate logical units
● Less dependency management
● Easier to test

Disadvantages
● Interdependency checks can fail, blocking the next step
● More code because of the interdependency checks
● Generally slower


Ignoring the black box

Observing behaviour

[Diagram: Source, "?" (the black box), Staging]


Old code

Most code looked like this:

def main():
    result = run_query('some_query.sql')
    filename = create_csv(result)
    upload_to_gcs(filename, GCS_FILENAME)
    load_to_mssql(GCS_FILENAME, MSSQL_TABLE)


New code

Now we have this:

OracleToMsSqlOperator(
    sql='some_query.sql',
    source_conn_id='oracle',
    target_conn_id='mssql')


Advantages

Now we have this:
● Configuration as code
    ○ Easier to read
    ○ Very easy to test
● Less code to maintain
    ○ Written and maintained by Airflow contributors
    ○ Custom code is rare instead of the default
● Quicker to create new pipelines


Summary

Airflow
● All configuration now in code
● Building blocks for faster pipeline development
● A lot less code
● Manageable daily process


SQL Server

SQL Server from Data Center to Cloud

[Diagram: Data Center (Physical Server) to Cloud]


SQL Server

Amazon Relational Database Service (RDS)
● Simple to set up and configure
● Supports multiple database providers
● Patching the database software, backing up databases and some other DBA tasks are managed by AWS itself


SQL Server

Step by Step - SQL Server Migration to Cloud

1. New SQL Server Instance on RDS
2. Deploy DW onto new Instance
3. Populate historical tables
4. Configure daily ETL in Airflow
5. Data Validation tools


SQL Server

Step by Step - SQL Server Migration to Cloud

MyDB:
  Type: "AWS::RDS::DBInstance"
  Properties:
    AllocatedStorage: "100"
    DBInstanceClass: db.m1.small
    Engine: sqlserver-se
    EngineVersion: "14.00.3015.40.v1"


SQL Server

Step by Step - SQL Server Migration to Cloud

Deploy DW onto new Instance: TeamCity Deployment


SQL Server

Step by Step - SQL Server Migration to Cloud

Configure daily ETL in Airflow: [Diagram: Data Source 1, Data Source 2, Data Source 3, ETL]


SQL Server

Step by Step - SQL Server Migration to Cloud

Data Validation tools: Apache Beam, NBi


Summary

SQL Server on RDS
● We can easily scale our instance
● No server maintenance
● All configuration in code (CloudFormation) facilitates maintenance
● Backup mechanism offered by AWS needs good understanding


OLAP Server

What is an OLAP database?
● OLAP stands for OnLine Analytical Processing
● An OLAP database is a multi-dimensional array of data, commonly referred to as a “cube”
● This technology is used to facilitate query processing on a data warehouse


OLAP Server

OLAP on top of Data warehouse

[Diagram: Data warehouse; Report 1, Report 2, Report N]


OLAP Server

SQL Server Analysis Services (SSAS)


OLAP Server

How to migrate our OLAP Server?


OLAP Server

Main Challenges

No support for our OLAP technology
● Owning and supporting our own VM (EC2)
● Configuring the VM using “code” (no UI on Windows Server Core)


OLAP Server

Main Challenges

Weekly Recycling (wipe)
● Keep same machine configurations after recycling
● Keep data in OLAP Server after recycling


OLAP Server

1st step - AMI (basebox)


OLAP Server

2nd step - CloudFormation (AWS Architecture)


OLAP Server

3rd step - Configurations and Backups


OLAP Server

4th step - Integration with our ETL pipeline


OLAP Server

Integrate OLAP Server with Airflow

Steps: Create Partition, Process Partition

● Partition 2019W04
● Partition 2019W03
● Partition 2019W02
● ...
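The partition names on the slide look like a year followed by a week number. A minimal sketch of deriving such a name from a run date, assuming ISO week numbering (the slides do not say which week convention is used) and a hypothetical helper `partition_name`:

```python
from datetime import date


def partition_name(run_date: date) -> str:
    """Build a weekly partition name such as 'Partition 2019W04'.

    Assumes ISO-8601 weeks, so the year is the ISO year, which can differ
    from the calendar year in the first and last days of a year.
    """
    iso_year, iso_week, _weekday = run_date.isocalendar()
    return f"Partition {iso_year}W{iso_week:02d}"
```

A scheduled task could call this with its run date to decide which partition to create and process.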


OLAP Server

Integrate OLAP Server with Airflow and... USERS

Partition 2019W04

Partition 2019W03

Partition 2019W02

...


Summary

OLAP Server on EC2
● We can easily scale our instance
● Infrastructure as Code facilitates maintenance
● Easy to rebuild the machine if it gets corrupted
● A lot of effort spent on understanding AWS & CI tools
● Load balancers don’t like SSAS


Automated validation


Automated validation

Same result set is important

PK   value
1    A
2    B
3    C

PK   value
4    D
3    C
2    B

SELECT TOP 3 * FROM Foo ORDER BY PK


Automated validation

Getting the hashes

PK   hash
1    c4ca4238
2    c81e728d
3    eccbc87e

PK   hash
1    c4ca4238
2    c81e728d
3    a87ff679
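The eight-character values on the slide match the first eight hex digits of the MD5 digests of the strings "1" to "4". A minimal sketch of hashing one row per primary key, assuming MD5 over the row's column values joined with a separator; the names `row_hash` and `hash_rows`, the `|` separator and the truncation length are illustrative choices, not taken from the talk:

```python
import hashlib


def row_hash(values, length: int = 8) -> str:
    """Hash one row: join its column values and keep a short MD5 prefix."""
    joined = "|".join(str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:length]


def hash_rows(rows):
    """Map primary key -> hash of the remaining columns.

    Each row is a tuple whose first element is the primary key.
    """
    return {row[0]: row_hash(row[1:]) for row in rows}
```

Comparing one short hash per key avoids comparing every column of every row between the two systems.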


Automated validation

Comparing hashes

[Diagram: Source hashes and Target hashes are compared with Apache Beam, giving three outputs: Not in source, Not in target, Different]
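The comparison can be sketched on plain dictionaries. The talk uses Apache Beam for this step; the code below is a single-machine stand-in that produces the same three output categories, and `compare_hashes` is a hypothetical name.

```python
def compare_hashes(source: dict, target: dict) -> dict:
    """Classify primary keys by comparing source and target row hashes."""
    return {
        # Key exists in the target only.
        "not_in_source": sorted(k for k in target if k not in source),
        # Key exists in the source only.
        "not_in_target": sorted(k for k in source if k not in target),
        # Key exists on both sides but the row hashes differ.
        "different": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }
```

With the two hash tables from the previous slide as input, only primary key 3 ends up under `different`.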


Automated validation

Grouping the output

Table   Type            Count
A       not_in_target   0
A       not_in_source   5
A       different       1000
B       not_in_target   20
B       not_in_source   0
B       different       500
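Grouping per table and difference type is a counting step. A minimal sketch with `collections.Counter`, assuming each detected difference is a `(table, primary_key, type)` tuple; that record layout and the name `group_differences` are hypothetical:

```python
from collections import Counter


def group_differences(differences):
    """Count differences per (table, type) pair."""
    return Counter(
        (table, diff_type) for table, _primary_key, diff_type in differences
    )
```

Combinations that never occur simply read as zero from the counter, which matches the zero rows in the table above.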


Automated validation

Daily report

Table   Difference   Difference yesterday
A       5,000        0
B       300          300
C       20           10,000


Automated validation

What’s different?

Table   Primary Key   Type
A       1             not_in_target
A       2             not_in_source
A       3             different


Automated validation

Automated validation steps
1. Get result set from source and target
2. Calculate hashes
3. Compare hashes, track differences
4. Store counts of differences in tracking tables
5. Talk through differences every day


Custom validation

NBi
● Unit testing for Business Intelligence, based on NUnit
● For tables where the logic changed and that therefore need custom validation
● For validating the OLAP Server output


Summary

Validation
● Automated validation for most of our data
● Custom validation for tables that changed
● Custom validation for important parts of the OLAP Server

Tools: Apache Beam, NBi


What we gained


What we learned

Lessons learned from this migration (1 / 2)
● Not everything you have in the data center will be supported by AWS as it is
● Fewer SQL Server DB monitoring capabilities in comparison to the data center. No server admin on RDS.
● Doing two migrations in parallel (Azkaban → Airflow, data center → AWS) makes the migration process much longer, but the target environment shadows your still-active production environment.


What we learned

Lessons learned from this migration (2 / 2)
● You should arrange training on AWS/DevOps upfront
● Think about infrastructure as code, both for Airflow pipelines and for weekly OLAP recycling: everything is in code now, with less in documentation and manual routines
● AWS flexibility allows you to scale your infrastructure with ease


Timeline


Gerard Toonstra | Data Discovery with Amundsen | 28-11-2019 | https://www.careersatcoolblue.com

Join at slido.com with #bigdata2019