tungsten university: load a vertica data warehouse with mysql data

35
©Continuent 2013 Tungsten University: Load a Vertica Data Warehouse with MySQL Data Robert Hodges CEO, Continuent

Upload: continuent

Post on 25-Jun-2015

423 views

Category:

Technology


0 download

DESCRIPTION

Continuent Tungsten offers real-time replication from MySQL to a variety of DBMS types including Vertica. In this Tungsten University webcast we will show you the details of setting up MySQL-to-Vertica replication, including the following topics: • Introduction to Continuent Tungsten features for data warehouse loading • Installation for MySQL to Vertica replication • Best practices for applications: charsets, schema design, time zones, etc. • Techniques for filtering and transforming data • Performance tuning and trouble shooting • Adapting batch loading for new use cases The webinar includes technical information and a live demo to help you get your own data warehouse loading application up and running quickly.

TRANSCRIPT

Page 1: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Tungsten University: Load a Vertica Data

Warehouse with MySQL DataRobert Hodges

CEO, Continuent

Page 2: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Introducing Continuent

2

• The leading provider of clustering and replication for open source DBMS

• Our Product: Continuent Tungsten

• Clustering - Commercial-grade HA, performance scaling and data management for MySQL

• Replication - Flexible, high-performance data movement

Page 3: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

OLTP and Data Warehouse Fundamentals

3

Page 4: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

The Contenders

4

Popular open source RDBMS for transaction

processing

Popular closed source RDBMS

for analytics

Page 5: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Storage Layout in MySQL

5

id cust_id prod_id ...

1 335301 532 ...

2 2378 6235 ...

3 ... ... ...

Sales Table

id sku type

532 C00135 consumer

533 S09957 specialty

... ...

Product Tableprod_id id

532 1

6235 2

... ...

Prod_ID Index

Row format makes table scans very

slow

Indexes slow OLTP

Low/no data compression

Limited indextypes

Limited join

types

Page 6: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Storage Layout in Vertica

6

Sales Table

cust_id

335301

2378

...

prod_id

532

6235

...

Fast scans on columns

Updates to single rows are

hideously slow

quantity

1

3

...

id

1

2

3

Every column is an index

Good compression

id

532

533

...

sku

C00135

S09957

...

type

consumer

specialty

...

Product Table

Fast joins with parallel

query

Page 7: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Traditional ETL Problems

7

MySQL

SalesTable

SalesTable

LoadTransferExtract

Date columns = intrusive

Batch-oriented = not timely

Scan for changes = performance hit

Page 8: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Questions for Real-Time Loading

• Do I need to transform data and if so how?

• Do I need to clean up bad information?

• Do I need to process UPDATE/DELETE too?

• Do I need to load from multiple sources?

• How timely do loads need to be?

• What if something fails?

8

Page 9: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Tungsten Replicator Basics

9

Page 10: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Real-Time Data Replication

10

MySQL

SalesTable

SalesTable

Fast propagation = timely

No SQL changes = transparent

Automatic change capture = low impact

DBMSLogs

Data Replication

Page 11: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Tungsten Master/Slave in Action

11

Master

(Transactions + Metadata)

Slave

THL

DBMSLogs

Replicator

(Transactions + Metadata)

THLReplicator

Download transactions via network

Apply using JDBC

Page 12: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Pipelines with Parallel Apply

12

Extract Filter Apply

StageExtract Filter Apply

StageStage

Pipeline

RemoteMaster

TransactionHistory Log

ParallelQueue

SlaveDBMS

Extract Filter ApplyExtract Filter ApplyExtract Filter Apply

(Assign Shard ID)

Page 13: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Real-Time Batch Loading

13

MySQL Tungsten Master Replicator

Service my2vr

MySQLExtractorSpecial Filters* pkey - Fill in pkey info* colnames - Fill in names* replicate - Ignore tables

binlog_format=row

Tungsten Slave Replicator

Service my2vr

MySQLBinlog

CSVFilesCSVFilesCSVFilesCSVFilesCSVFiles

Large transaction batches to leverage load parallelization

Single transactions from OLTP operations

Page 14: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Batch Loading--The Gory Details

14

Replicator

Service my2vrTransactions from master

CSVFilesCSVFilesCSVFiles

StagingTablesStagingTablesStagingTables

Base Tables

Base Tables

Base Tables

Merge Script

(or)COPY

directly to base tables

COPY to stage tables SELECT to

base tables

Page 15: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Setting Up MySQL to Vertica Replication

15

Page 16: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

DEMO

16

MySQL to Vertica replication with some bells and a whistle

MySQL

db01db02db03

db01renamed02

Xsysbenchsysbenchsysbench

Page 17: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Get the Code

wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-285.tar.gz

tar -xf tungsten-replicator-2.1.0-285.tar.gz

cd tungsten-replicator-2.1.0-285

17

Page 18: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Installing MySQL Master

18

tools/tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ --master-host=mysql1 \ --cluster-hosts=mysql1 \ --datasource-user=tungsten \ --datasource-password=secret \ --home-directory=/opt/continuent \ --buffer-size=100 \ --java-file-encoding=UTF8 \ --java-user-timezone=GMT \ --mysql-use-bytes-for-string=false \ --svc-extractor-filters=replicate,colnames,pkey \ --property=replicator.filter.pkey.addPkeyToInserts=true \ --property=replicator.filter.pkey.addColumnsToDeletes=true \ --property=replicator.filter.replicate.do=db01.*,db02.* \ --start-and-report

Page 19: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Installing Vertica Slave

19

$ tools/tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ --home-directory=/opt/continuent \ --cluster-hosts=vertica1 \ --master-host=mysql1 \ --datasource-type=vertica \ --datasource-user=dbadmin \ --datasource-password=secret \ --datasource-port=5433 \ --batch-enabled=true --batch-load-template=vertica6 \ --vertica-dbname=bigdata \ --java-user-timezone=GMT \ --java-file-encoding=UTF8 \ --svc-applier-filters=dbtransform \ --property=replicator.filter.dbtransform.from_regex1=db02 \ --property=replicator.filter.dbtransform.to_regex1=renamed02 \ --property=replicator.stage.q-to-dbms.blockCommitRowCount=25000 \ --start-and-report

Page 20: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Generate Schema Using ddlscan

20

•Data types?•Column lengths?•Naming conventions?•Staging tables?

MySQL Tables

ddlscan

Page 21: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Tungsten ddlscan Utility

cd /opt/continuent/tungsten/tungsten-replicator/bin

# Base table generation../ddlscan -template ddl-mysql-vertica.vm \ -db db01 -user tungsten -pass secret >> ddl.sql

# Staging table generation./ddlscan -template ddl-mysql-vertica-staging.vm \ -db db01 -user tungsten -pass secret >> ddl.sql

# Load into Verticavsql -Udbadmin -wsecret < ddl.sql

21

Page 22: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Checking Status

# Checking status on mastertrepctl -host logos1 heartbeattrepctl -host logos1 status

# Checking status on slavetrepctl -host vertica1 status

# Checking detailed performance of apply task. trepctl -host vertica1 status -name tasks

22

Page 23: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Application Tips and Tricks

23

Page 24: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Application Design Practices

24

• Primary keys on all tables

• (Tungsten requires single column keys)

• Clean schema design *really* helps

• UTF-8 character set--or at least be consistent

• Use GMT timezone--or be very consistent about dates

• Use row replication on MySQL master

Page 25: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Transforming Data -- Replicator Filters

25

• Tables to ignore/include?

• Schema/table/column renaming?

• Map names to upper/lower case?

• Drop data?

tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ ... --svc-extractor-filters=pkey,colnames,replicate \ --property=replicator.filter.replicate.do=db01.*,db02.*\ ...

Page 26: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

List of Commonly Used Filters

26

• CDC -- Transform log to record of changes

• colnames -- Add column names

• dbtransform -- Change db name only

• enumtostring -- Make MySQL enums a string

• pkey -- Add primary key metadata

• rename -- Rename db/table/column

• replicate -- Replicate/don’t replicate tables

• zerodate2null -- Make MySQL ‘0’ dates null

Page 27: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Transforming Data -- Staging Server(s)

27

OLTP Servers

StagingServer with Triggers/SQL

Vertica Cluster

Page 28: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Transforming Data -- Merge Script Hacks

28

# Hacked load script for Vertica--deletes always precede inserts, so# inserts can load directly.

# Extract deleted data keys and put in temp CSV file for deletes. !egrep '^"D",' %%CSV_FILE%% |cut -d, -f4 > %%CSV_FILE%%.deleteCOPY %%STAGE_TABLE_FQN%% FROM '%%CSV_FILE%%.delete' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'

# Delete rows using an IN clause. You could also set a column value to # mark deleted rows. DELETE FROM %%BASE_TABLE%% WHERE %%BASE_PKEY%% IN (SELECT %%STAGE_PKEY%% FROM %%STAGE_TABLE_FQN%%)

# Load inserts directly into base table from a separate CSV file. !egrep '^"I",' %%CSV_FILE%% |cut -d, -f4- > %%CSV_FILE%%.insert

COPY %%BASE_TABLE%% FROM '%%CSV_FILE%%.insert' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'

Page 29: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Provisioning -- Using CSV

29

mysql> SELECT * from sales INTO OUTFILE ‘sales.csv’;...(Fix up data if necessary)...vsql> COPY sales FROM 'sales.csv' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"';

Page 30: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Provisioning Using a Sandbox Server

30

OLTP Server

Temporary Sandbox Server

Vertica Cluster

1. Restore logical backup

2. Replicate restored transactions

3. Replicate normally after restore loads

Page 31: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Parallel Provisioning from Sandbox

31

OLTP Server

Temporary Sandbox Server

Vertica Cluster

1. Restore logical backup

2. Replicate restored data in parallel

3. Replicate normally after restore loads

Page 32: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Complex Topologies: Fan-In

32

VerticaCluster

logos1

Master

logos2

Master

logos2

SlaveServices

logos1

Page 33: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Wrapping Up

33

Page 34: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2013

Tungsten University Sessions

34

• Load a Vertica Data Warehouse with MySQL Data (May 30 10am PDT and June 4, 4pm CEST)

Send feedback to: [email protected]

Page 35: Tungsten University: Load A Vertica Data Warehouse With MySQL Data

©Continuent 2012.

Continuent Web Page:http://www.continuent.com

Tungsten Replicator 2.0:http://code.google.com/p/tungsten-replicator

Our Blogs:http://scale-out-blog.blogspot.comhttp://!yingclusters.blogspot.comhttp://datacharmer.org/bloghttp://www.continuent.com/news/blogs

560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009e-mail: [email protected]