Migrating from PostgreSQL to MySQL at Cocolog
Naoto Yokoyama, NIFTY Corporation
Garth Webb, Six Apart
Lisa Phillips, Six Apart
Credits: Kenji Hirohama, Sumisho Computer Systems Corp.
Agenda
1. What is Cocolog
2. History of Cocolog
3. DBP: Database Partitioning
4. Migration from PostgreSQL to MySQL
1. What is Cocolog
What is Cocolog
NIFTY Corporation
Established in 1986
A Fujitsu Group company
NIFTY-Serve (licensed and interconnected with CompuServe)
One of the largest ISPs in Japan
Cocolog
First blog community at a Japanese ISP
Based on TypePad technology by Six Apart
Several hundred million PV/month
History
Dec/02/2003: Cocolog launch (for ISP users)
Nov/24/2005: Cocolog Free launch (free service)
Apr/05/2007: Cocolog for Mobile Phone launch
2008/04: 700 thousand users
Cocolog (Screenshot of home page)
TypePad / Cocolog
Cocolog template sets
Cocolog Growth (Users) ■ Cocolog ■ Cocolog Free (chart spanning phases 1–4)
Cocolog Growth (Entries) ■ Cocolog ■ Cocolog Free (chart spanning phases 1–4)
Technology at Cocolog
Core System
Linux 2.4/2.6
Apache 1.3/2.0/2.2 & mod_perl
Perl 5.8 + CPAN
PostgreSQL 8.1
MySQL 5.0
memcached / TheSchwartz / cfengine
Eco System
LAMP, LAPP, Ruby + ActiveRecord, Capistrano, etc.
Monitoring Management Tool
Proprietary in-house development with PostgreSQL, PHP, and Perl
Monitoring points (in order of priority)
response time of each post
number of spam comments/trackbacks
number of comments/trackbacks
source IP address of spam
number of entries
number of comments via mobile devices
page views via mobile devices
time of batch completion
amount of API usage
bandwidth usage
DB
Disk I/O
Memory and CPU usage
time of VACUUM ANALYZE
APP
number of active processes
CPU usage
Memory usage
Monitored layers: Service / APL / DB / Hardware
2. History of Cocolog
Phase 1: 2003/12 ~ (Entries: 0.04 million)
Register
PostgreSQL
NAS
WEB
Static contents Published
Before DBP: 10 servers
TypePad
PodcastPortal
Profile Etc..
Phase 2: 2004/12 ~ (Entries: 7 million)
Rich template / Publish Book
Tel Operator Support
NAS
WEB
Static contents Published
PostgreSQL
Register
TypePad 2004/12 ~
2005/5 ~
Before DBP: 50 servers
Phase 2 – Problems
The system is tightly coupled: the database server receives queries from multiple points, so it is difficult to change the system design and database schema.
Phase 3: 2006/3 ~ (Entries: 12 million)
NAS
WEB
Static contents Published
Web-API
memcached
PodcastPortal
Profile Etc..
PostgreSQL
Rich template / Publish Book
Tel Operator Support
Register / TypePad
Before DBP: 200 servers
Phase 4: 2007/4 ~ (Entries: 16 million)
Web-API
NAS
WEB
Static contents Published
memcached
Atom
MobileWEB
Rich template / Publish Book
Tel Operator Support
Register
Typepad
PostgreSQL
Before DBP: 300 servers
Now 2008/4 ~
Web-API
NAS
WEB
Static contents Published
memcached
Atom
MobileWEB
Typepad
Rich template / Publish Book
Tel Operator Support
Register
Multi MySQL
After DBP: 150 servers
3. TypePad Database Partitioning
Steps for Transitioning
• Server Preparation Hardware and software setup
• Global Write Write user information to the global DB
• Global Read Read/write user information on the global DB
• Move Sequence Table sequences served by global DB
• User Data Move Move user data to user partitions
• New User Partition All new users saved directly to user partition 1
• New User Strategy Decide on a strategy for the new user partition
• Non User Data Move Move all non-user owned data
Storage
TypePad Overview (Pre-DBP)
Database(Postgres)
Static Content (HTML, Images, etc)
ApplicationServer
WebServer
TypeCastServer
ATOMServer
MEMCACHED
Data Caching servers to reduce DB load
Dedicated Server for TypeCast (via ATOM)
https(443) / http(80)
http(80): atom api
memcached(11211)
postgres(5432)
MailServer
Internet
nfs(2049)
ADMIN(CRON)Server
smtp(25) / pop(110)
Blog Readers
Blog Owners
Mobile Blog Readers
smtp(25) / pop(110)
Cron Server for periodic asynchronous tasks
TypePad
Non-User Role
Why Partition?
TypePad
User Role(User0)
All inquiries (access) go to one DB (Postgres)
Current setup → After DBP
Inquiries (access) are divided among several DBs (MySQL)
TypePad
GlobalRole
Non-UserRole
User Role(User1)
User Role(User2)
User Role(User3)
Non-User Role
Server Preparation
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
New expanded setup
DB(MySQL) for partitioned data
Current Setup
Job Server+ TypePad + Schwartz
SchwartzDB
User information is partitioned
Maintains user mapping and primary key generation
Stores job details
Server for executing Jobs
※Grey areas are not used in current steps
Asynchronous Job Server
Information that does not need to be partitioned (such as session information)
Global WriteCreating the user map
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
①
②
Explanation
①: For new registrations only, uniquely identifying user data is written to the global DB
②: This same data continues to be written to the existing DB
DB(MySQL) for partitioned data
Asynchronous Job Server
Maintains user mapping and primary key generation
※Grey areas are not used in current steps
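The dual write described on this slide can be sketched as follows. This is an illustrative Python sketch, not TypePad's actual (Perl) code; `Registration` and its fields are hypothetical names, and plain dicts stand in for the global and legacy database handles.

```python
class Registration:
    """Hypothetical sketch of the 'Global Write' step."""

    def __init__(self):
        self.global_db = {}    # new global role: user map (MySQL)
        self.existing_db = {}  # legacy data (PostgreSQL)

    def register(self, user_id, record):
        # (1) identifying data is written to the global DB;
        #     every user still lives on the original partition
        self.global_db[user_id] = {"partition": "user0"}
        # (2) the same data continues to be written to the existing DB
        self.existing_db[user_id] = record
```

Writing to both stores during the transition means the global user map can be built up without any read path depending on it yet.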
Global ReadUse the user map to find the user partition
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: Migrate existing user data to the global DB
②: At the start of the request, the application queries the global DB for the location of the user's data
③: The application then talks to this DB for all queries about this user. At this stage the global DB points to the user0 partition in all cases.
DB(MySQL) for partitioned data
Maintains user mapping and primary key generation
①Migrate existing
user data
Asynchronous Job Server
②
③
※Grey areas are not used in current steps
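A minimal sketch of the lookup this slide describes, assuming a user map on the global DB keyed by user ID; all names (`user_map`, `partitions`) are illustrative, and dicts stand in for real DB connections.

```python
# Global DB: maps each user to the partition holding their data.
# At this stage every user still maps to user0, as the slide notes.
user_map = {101: "user0", 102: "user0"}

# Partition handles (plain dicts here; really separate DB connections)
partitions = {"user0": {101: {"blog": "cocolog"}, 102: {"blog": "typepad"}}}

def db_for_user(user_id):
    # One query against the global DB at the start of the request
    return partitions[user_map[user_id]]

def load_user(user_id):
    # Every subsequent query about this user goes to that partition
    return db_for_user(user_id)[user_id]
```

The key property is that the application resolves the partition once per request, so moving a user later only requires updating the map.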
Move SequenceMigrating primary key generation
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as “pseudo-sequences”.
②: The application requests new primary keys from the global DB rather than the user partition.
DB(MySQL) for partitioned data
Maintains user mapping and primary key generation
①
※Grey areas are not used in current steps
Migrate sequence management
Asynchronous Job Server
②
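A “pseudo-sequence” can be sketched as a one-row-per-sequence counter table. SQLite stands in for MySQL below; on real MySQL this increment-and-read is typically done atomically with `UPDATE pseq SET v = LAST_INSERT_ID(v + 1)` followed by `SELECT LAST_INSERT_ID()`. The table and column names are hypothetical.

```python
import sqlite3

# Counter table on the global DB replacing Postgres sequences
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pseq (name TEXT PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO pseq VALUES ('entry_id', 0)")

def nextval(name):
    # Increment and read inside one transaction so two callers can
    # never observe the same value
    with conn:
        conn.execute("UPDATE pseq SET v = v + 1 WHERE name = ?", (name,))
        row = conn.execute(
            "SELECT v FROM pseq WHERE name = ?", (name,)).fetchone()
        return row[0]
```

Centralizing key generation on the global DB keeps primary keys unique across all user partitions.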
User Data MoveMoving user data to the new user-role partitions
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: Existing users to be migrated are submitted to the Job Server as new Schwartz jobs; user data is then migrated asynchronously
②: If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later
③: After being migrated, all of a user's data exists on the user-role DB partitions
④: Once all user data is migrated, only non-user data remains on Postgres
DB(MySQL) for partitioned data
Stores job details
Server for executing Jobs
Maintains user mapping and primary key generation
User information is partitioned
①
②
※Grey areas are not used in current steps
③
Migrating each user data
DB(MySQL) for partitioned data
④
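The deferred-comment handling in steps ① and ② can be sketched like this. It is an illustrative Python sketch, not the Schwartz-based Perl implementation; the `deferred` list stands in for the Schwartz DB, and dicts stand in for the source and target partitions.

```python
class UserMigrator:
    """Hypothetical sketch of the 'User Data Move' step."""

    def __init__(self, source, target):
        self.source = source      # PostgreSQL user0 partition (dict)
        self.target = target      # MySQL user-role partition (dict)
        self.migrating = set()    # users currently being moved
        self.deferred = []        # comments parked during a move

    def post_comment(self, user_id, comment):
        if user_id in self.migrating:
            self.deferred.append((user_id, comment))  # publish later
        elif user_id in self.target:
            self.target[user_id].append(comment)      # already moved
        else:
            self.source[user_id].append(comment)      # not yet moved

    def migrate(self, user_id):
        self.migrating.add(user_id)
        self.target[user_id] = self.source.pop(user_id)  # copy user data
        self.migrating.discard(user_id)
        for uid, c in list(self.deferred):  # replay parked comments
            if uid == user_id:
                self.target[uid].append(c)
                self.deferred.remove((uid, c))
```

Parking writes during the copy avoids losing comments without taking the blog offline.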
New User PartitionNew registrations are created on one user role partition
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: When new users register, their data is written to a user-role partition
②: Non-user data continues to be served off Postgres
DB(MySQL) for partitioned data
Maintains user mapping and primary key generation
User information is partitioned
①
②
※Grey areas are not used in current steps
Asynchronous Job Server
New User StrategyPick a scheme for distributing new users
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: When new users register, their data is written to one of the user-role partitions according to a set distribution method (round robin, random, etc.)
②: Non-user data continues to be served off Postgres
DB(MySQL) for partitioned data
Maintains user mapping and primary key generation
User information is partitioned
①
②
※Grey areas are not used in current steps
Asynchronous Job Server
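Round robin, one of the distribution methods the slide mentions, can be sketched as follows; the names are hypothetical, and the `user_map` dict stands in for the mapping table on the global DB.

```python
import itertools

# New-user partitions available after DBP (illustrative names)
partitions = ["user1", "user2", "user3"]
_next_partition = itertools.cycle(partitions)

def assign_partition(user_id, user_map):
    # Pick the next partition in rotation and record the choice in the
    # user map (which really lives on the global DB)
    user_map[user_id] = next(_next_partition)
    return user_map[user_id]
```

Because every lookup already goes through the user map, the distribution scheme can be changed later without touching existing users.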
Non User Data MoveMigrate data that cannot be partitioned by user
Non-User Role
TypePad
User Role(User0)
DB(PostgreSQL)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation ①: Migrate non-user role data left on PostgreSQL to the MySQL side.
DB(MySQL) for partitioned data
Maintains user mapping and primary key generation
User information is partitioned
①
※Grey areas are not used in current steps
Migrate non-User data
Asynchronous Job Server
Information that does not need to be partitioned (such as session information)
Data migration done
Non-User Role
TypePad
User Role(User0)
DB(Postgres)
User Role(User1)
User Role(User2)
User Role(User3)
GlobalRole
Non-UserRole
Job Server+ TypePad + Schwartz
SchwartzDB
Explanation
①: All data access is now done through MySQL
②: Continue to use TheSchwartz for asynchronous jobs
DB(MySQL) for partitioned data
Stores job details
Server for executing Jobs
Maintains user mapping and primary key generation
User information is partitioned
①
※Grey areas are not used in current steps
①
② Asynchronous Job Server
Information that does not need to be partitioned (such as session information)
Storage
The New TypePad configuration
Database(MySQL)
Static Content (HTML, Images, etc)
ApplicationServer
WebServer
TypeCastServer
ATOMServer
MEMCACHED
Data Caching servers to reduce DB load
Dedicated Server for TypeCast (via ATOM)
https(443)http(80)
http(80) : atom api
memcached(11211)
MySQL(3306)
MailServer
Internet
nfs(2049)
ADMIN(CRON)Server
smtp(25) / pop(110)
Blog Readers
Blog Owners (management interface)
Mobile Blog Readers
smtp(25) / pop(110)
Cron Server for periodic asynchronous tasks
JobServer
TheSchwartz server for running ad-hoc jobs asynchronously
4. Migration from PostgreSQL to MySQL
DB Node Spec History
Time: 2003/12 → 2007/11 (in order of upgrade)

OS (RedHat)     CPU (Xeon)                               MEM    DiskArray
7.4   (2.4.9)   1.8GHz / 512K × 1                        1GB    No
ES2.1 (2.4.9)   3.2GHz / 1M × 2                          4GB    No
ES2.1 (2.4.9)   3.2GHz / 1M × 2                          4GB    Yes
AS2.1 (2.4.9)   3.2GHz / 1M × 4                          12GB   Yes
AS4   (2.6.9)   3.2GHz / 1M × 4                          12GB   Yes
AS4   (2.6.9)   MP 3.3GHz / 1M × 4 (2 cores × 4)         16GB   Yes
History of scale up PostgreSQL server, Before DBP
DB DiskArray Spec [FUJITSU ETERNUS8000]
Best I/O transaction performance in the world (vendor claim)
146GB (15 krpm) × 32 disks, RAID-10
MultiPath FibreChannel 4Gbps
QuickOPC (One Point Copy): OPC copy functions let you create a duplicate copy of any data from the original at any chosen time.
http://www.computers.us.fujitsu.com/www/products_storage.shtml?products/storage/fujitsu/e8000/e8000
History of scale up PostgreSQL server, Before DBP
Scale out MySQL servers, After DBP
Role configuration: each role is configured as an HA cluster
HA software: NEC ClusterPro
Shared Storage
Scale out MySQL servers, After DBP
PostgreSQL
FibreChannel SAN
DiskArray
…
heart beat
MySQLRole3
MySQLRole2
MySQLRole1
TypePadApplication
Scale out MySQL servers, After DBP
Backup Replication w/ Hot backup
Scale out MySQL servers, After DBP
PostgreSQL
FibreChannel SAN
DiskArray
…
heart beat
MySQLRole3
MySQLRole2
MySQLRole1
MySQLBackupRole
TypePadApplication
mysqld mysqld mysqld
rep rep rep
opc
mysqld mysqld mysqld
Troubles with PostgreSQL 7.4 – 8.1
Data size over 100 GB; 40% is index
Severe data fragmentation
VACUUM
“VACUUM ANALYZE” causes performance problems
Takes too long to VACUUM large amounts of data
dump/restore is the only solution for de-fragmentation
Auto-vacuum
We don't use auto-vacuum because we are worried about latent response time
Troubles with PostgreSQL 7.4 – 8.1
Character set
PostgreSQL accepted out-of-boundary UTF-8: Japanese extended character sets and multibyte sequences that should normally be rejected with an error were stored instead.
“Cleaning” data
Removing character sequences that are outside the boundaries of UTF-8.
Steps: PostgreSQL dumpall → split for piconv → UTF8 -> UCS2 -> UTF8 & merge → PostgreSQL restore
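The UTF8 -> UCS2 -> UTF8 round trip cleans the data because invalid byte sequences fail to decode and UCS-2 cannot represent code points above U+FFFF. A Python sketch of the same idea (the real pipeline ran piconv over split dump files; `clean_utf8` is a hypothetical name):

```python
def clean_utf8(raw: bytes) -> bytes:
    """Emulate the piconv UTF8 -> UCS2 -> UTF8 round trip."""
    # Drop byte sequences that are not valid UTF-8
    text = raw.decode("utf-8", errors="ignore")
    # The UCS-2 leg cannot represent code points above U+FFFF,
    # so those characters are dropped as well
    text = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return text.encode("utf-8")
```

After this pass the dump contains only data MySQL's stricter character-set handling will accept.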
Migration from PostgreSQL to MySQL using a TypePad script
Steps:
PostgreSQL -> PerlObject & tmp publish
MySQL -> PerlObject & last publish
diff tmp & last objects (data check)
diff tmp & last published files (file check)
PostgreSQL
Document
Object
tmp
Document
Object
last
File check
data check
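The data check and file check above can be sketched as follows; `load_and_publish` is a hypothetical stand-in for loading a Perl object through the TypePad script and publishing it, and dicts stand in for the two databases.

```python
def load_and_publish(db, entry_id):
    # Stand-in for "DB -> PerlObject & publish" in the deck
    obj = db[entry_id]
    html = "<p>%s</p>" % obj["body"]   # pretend this is the published file
    return obj, html

def verify(pg_db, mysql_db, entry_id):
    tmp_obj, tmp_file = load_and_publish(pg_db, entry_id)       # tmp publish
    last_obj, last_file = load_and_publish(mysql_db, entry_id)  # last publish
    data_ok = tmp_obj == last_obj      # diff tmp & last objects
    file_ok = tmp_file == last_file    # diff tmp & last published files
    return data_ok and file_ok
```

Comparing both the loaded objects and the published output catches corruption that a row count alone would miss.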
Troubles with MySQL
convert_tz function doesn't support input values outside the range of Unix time
sort order: rows come back in a different order without an “ORDER BY” clause
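The sort-order difference bites wherever the application relied on implicit row order; the fix is an explicit ORDER BY. A sketch of the safe pattern, with SQLite standing in for either engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)", [(2, "b"), (1, "a")])

# Without ORDER BY, row order is engine-dependent: PostgreSQL and MySQL
# may legitimately return the same rows in different orders. Always
# state the intended order explicitly.
rows = conn.execute("SELECT id, title FROM entries ORDER BY id").fetchall()
```

Queries whose results feed published pages are the ones worth auditing first, since a reordered result changes the rendered output.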
Cocolog Future Plans
Dynamic Job queue
Consulting by Sumisho Computer Systems Corp.
System integrator; first and best partner of MySQL in Japan since 2003
Provides MySQL consulting, support, and training services
HA, maintenance, online backup, Japanese character support
Questions