Moving the Elephant in the Room: Data Migration at Scale

Post on 14-Jan-2017

Category:

Software

Data Migration at Scale
MOVING THE ELEPHANT IN THE ROOM

· BDPA Los Angeles Chapter
· 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone

WHO AM I?

· Web Presence Management
· SaaS
· Big data
· Collect 6 TB of raw web data a week
· Scalable Collection & ETL pipelines
· Final Product: reports
· 6 years running
· Tons of data!

CONDUCTOR, INC.

· Growth
  · More users
  · More data
· Systems have to keep up!

WHY WE CARE ABOUT SCALABILITY

HORIZONTAL SCALING

VERTICAL SCALING

· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
  · Can require massive changes
  · No cure-all

SCALABILITY IN THE REAL WORLD

· Save money
· Improve performance
· Clear the way for progress

WHY REPLACE AN UNSCALABLE SYSTEM?

· If it ain’t broke…
· Significant Resource Investment
  · Time
  · Money
· Software Downtime
· Data Quality Concerns

WHY NOT?

1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system
   · while minimizing risk and cost

Simple, no???

YOUR TASK, AT A GLANCE

TALKING ABOUT THE ELEPHANT
Identifying an Unscalable System

· MySQL
· Normalized data model
· Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine

CASE STUDY: LEGACY REPORTING DATABASE (Overview)

Talking about the Elephant: Diagnosing an Unscalable System

· Powerful hardware isn’t cheap.
· Vertical Scaling
· Obsolete Schema
· Difficult to back up
· Queries aren’t getting any faster.

CASE STUDY: LEGACY REPORTING DATABASE (Unsustainable)

· If your solution…
  · Scales vertically
  · Prevents progress
  · Can’t perform at scale
  · Is difficult/slow/expensive to upgrade

…It’s time for a change!

SEE FOR YOURSELF

FINDING A BIGGER ROOM
Vetting Scalable Alternatives

· Price-efficient
· Easy to maintain
· Scales Horizontally

WHAT TO LOOK FOR

Finding a Bigger Room: Vetting Scalable Alternatives

· Write once, read many
· De-normalized reports
· High storage capacity
· High Availability

CASE STUDY: AWS S3 DATASTORE (Our Use Case)

· Write once, read many
  · Decent write performance, great read performance
· De-normalized reports
  · Flat files
· High storage capacity
  · No defined space limit
· High Availability
  · Configurable file replication

CASE STUDY: AWS S3 DATASTORE (Technical Overview)

· Cheap
· Cloud-based
· Architecture facilitates testing
· Easy to back up

CASE STUDY: AWS S3 DATASTORE (Benefits)

· “Eventual Consistency”
· Switching to non-relational storage is nontrivial
  · Application code must change
  · Migration path gets complicated

CASE STUDY: AWS S3 DATASTORE (Caveats)
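
One way the "application code must change" caveat can show up: a reader that runs right after a writer may not see the new object yet. The sketch below is only an illustration of that idea, not the talk's code. `read_with_retry` and `LaggingStore` are hypothetical names, and the in-memory store merely imitates delayed visibility so the example runs without any cloud account.

```python
import time
from typing import Callable, Dict, Optional


def read_with_retry(fetch: Callable[[], Optional[bytes]],
                    attempts: int = 5,
                    base_delay: float = 0.01) -> bytes:
    """Call fetch() until it returns data, sleeping longer after each miss."""
    for attempt in range(attempts):
        data = fetch()
        if data is not None:
            return data
        # Exponential backoff: base_delay, 2x, 4x, ...
        time.sleep(base_delay * (2 ** attempt))
    raise TimeoutError(f"object not visible after {attempts} attempts")


class LaggingStore:
    """Hypothetical stand-in for an eventually consistent store.

    A key only becomes readable after `lag` reads have missed it.
    """

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self._data: Dict[str, bytes] = {}
        self._reads: Dict[str, int] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._reads[key] = 0

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._reads[key] += 1
        if self._reads[key] <= self.lag:
            return None  # written, but not yet visible to this reader
        return self._data[key]


if __name__ == "__main__":
    store = LaggingStore(lag=2)
    store.put("report-1", b"rows")
    print(read_with_retry(lambda: store.get("report-1")))
```

Passing the fetch step in as a callable keeps the retry policy separate from whichever storage client the application actually uses.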

MOVING THE ELEPHANT
Migrating Legacy Data to the New System

· Time Frame
  · Scheduling Constraints
· Operational Cost
  · Resource Constraints
· Standards for data parity

INITIAL CONSIDERATIONS

Moving the Elephant: Migrating Legacy Data to the New System

· Two-month finish line
· Developed COGS models
· Built data validation software

CASE STUDY: OUR UPFRONT PLANNING
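
The slides do not show the data validation software, so the following is only a minimal sketch of what a data parity check could look like. It assumes both systems can be read into dictionaries keyed by a record identifier. `fingerprint` and `parity_report` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(record: dict) -> str:
    """Hash a record in a canonical form so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True,
                           separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(legacy: Dict[str, dict],
                  migrated: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare two key -> record mappings and list every discrepancy."""
    missing = sorted(k for k in legacy if k not in migrated)
    unexpected = sorted(k for k in migrated if k not in legacy)
    mismatched = sorted(
        k for k in legacy
        if k in migrated and fingerprint(legacy[k]) != fingerprint(migrated[k])
    )
    return {"missing": missing, "unexpected": unexpected,
            "mismatched": mismatched}


if __name__ == "__main__":
    legacy = {"a": {"id": 1, "visits": 10}, "b": {"id": 2, "visits": 5}}
    migrated = {"a": {"visits": 10, "id": 1}, "b": {"id": 2, "visits": 6}}
    print(parity_report(legacy, migrated))
```

An empty report for a partition is one possible definition of "parity". A real standard might also tolerate rounding or type differences between the relational rows and the flat files.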

· Can be scaled up or down
  · Speed up to save time
  · Slow down to save resources
· Can be run in a testing capacity
  · Configurable data sources/sinks
  · Configurable hardware resource use

IDEAL MIGRATION SOFTWARE CHARACTERISTICS
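
The characteristics above can be sketched as a small configurable runner. This is an illustrative assumption, not the software described in the talk. `MigrationConfig`, `run_migration`, and all field names are hypothetical. The source and sink are plain callables so they can be swapped for test doubles, and `batch_size` and `pause_seconds` stand in for the speed/resource dial.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MigrationConfig:
    source: Callable[[], Iterable[dict]]   # where records come from
    sink: Callable[[List[dict]], None]     # where batches are written
    batch_size: int = 100                  # larger batches: faster, heavier
    pause_seconds: float = 0.0             # longer pauses: slower, lighter
    dry_run: bool = False                  # read everything, write nothing


def _flush(cfg: MigrationConfig, batch: List[dict]) -> int:
    """Write one batch (unless dry_run), then pause as configured."""
    if not cfg.dry_run:
        cfg.sink(list(batch))
    if cfg.pause_seconds:
        time.sleep(cfg.pause_seconds)
    return len(batch)


def run_migration(cfg: MigrationConfig) -> int:
    """Stream records from source to sink in batches; return records processed."""
    processed = 0
    batch: List[dict] = []
    for record in cfg.source():
        batch.append(record)
        if len(batch) >= cfg.batch_size:
            processed += _flush(cfg, batch)
            batch = []
    if batch:  # final partial batch
        processed += _flush(cfg, batch)
    return processed


if __name__ == "__main__":
    written: List[List[dict]] = []
    cfg = MigrationConfig(source=lambda: ({"id": i} for i in range(5)),
                          sink=written.append, batch_size=2)
    print(run_migration(cfg), [len(b) for b in written])
```

Setting `dry_run=True` exercises the read path and batching without touching the target, which is one way to run the same code "in a testing capacity".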

· Oozie and Hive
· Controllable time/resource tradeoff
· Testable in a QA environment

OUR MIGRATION SOFTWARE

· Easy to track progress
· Enables concurrency
· Dilutes failure risks
· E.g. Conductor “Time Periods”

AN INCREMENTAL MIGRATION: PARTITIONING DATA
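
A minimal sketch of a partitioned migration, assuming each partition can be migrated independently. The month labels are made-up stand-ins for "Time Periods", and `migrate_partitions` and `migrate_one` are hypothetical names. Each partition succeeds or fails on its own, so progress is a simple list and a failure only costs one partition's worth of work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple


def migrate_partitions(
    partitions: Iterable[str],
    migrate_one: Callable[[str], None],
    max_workers: int = 4,
) -> Tuple[List[str], Dict[str, Exception]]:
    """Migrate partitions concurrently; return (completed, failures)."""
    done: List[str] = []
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, p): p for p in partitions}
        for future in as_completed(futures):
            partition = futures[future]
            try:
                future.result()
                done.append(partition)
            except Exception as exc:  # one bad partition must not stop the rest
                failed[partition] = exc
    return sorted(done), failed


if __name__ == "__main__":
    def migrate_one(period: str) -> None:
        # Placeholder for the real copy step for one time period.
        if period == "2016-02":
            raise RuntimeError("simulated failure")

    done, failed = migrate_partitions(["2016-01", "2016-02", "2016-03"],
                                      migrate_one)
    print("done:", done, "to retry:", sorted(failed))
```

The failed partitions can be passed straight back into `migrate_partitions` for a retry, without redoing the ones that already completed.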

· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· E.g. Conductor Searchlight 3.0 Beta Program

AN INCREMENTAL RELEASE

YOU CAN DO IT!

QUESTIONS?
Thanks for Listening!

(We’re Hiring!)