drupal campchicago2010.rachel.datamigration

Drupal Migration Migrating 100,000 pages of content From Legacy CMS to Drupal Rachel Jaro Solutions Architect at PrometSource www.prometsource.com

Upload: andrew-kucharski

Post on 20-Jan-2015

DESCRIPTION

Promet Source - Rachel Jaro - Drupal Camp Chicago presentation on Data Migration. Migrating 100,000 pages of content from a legacy CMS to Drupal.

TRANSCRIPT

Page 1: Drupal campchicago2010.rachel.datamigration

Drupal Migration
Migrating 100,000 pages of content
From Legacy CMS to Drupal

Rachel Jaro
Solutions Architect at PrometSource

www.prometsource.com

Page 2: Drupal campchicago2010.rachel.datamigration

Overview

We’ll talk about:
• Successful migration recipe
• Common questions you should be asking before you start
• Top 3 tools to do migration in Drupal
• Issues
  • Tools to use in URL rewriting
  • File management comparison in D6
• Testing
• Deploying solution

Page 3: Drupal campchicago2010.rachel.datamigration

Data Migration

“Data migration solutions extract data from a source system, correct errors, reformat, restructure and load the data into a replacement target system”.

It sounds simple, but poorly managed data migration is the most common cause of failure in implementing a replacement system.

-- Gershon Pick, March 2001

Page 4: Drupal campchicago2010.rachel.datamigration

Successful Migration Recipe

Page 5: Drupal campchicago2010.rachel.datamigration

Planning

Source: http://www.flickr.com/photos/bjornmeansbear/4380595283/

Page 6: Drupal campchicago2010.rachel.datamigration

Plan: What to Ask

• Node types (content separation, fields)
  • Do you want to separate content into pages, articles, biographies, news, etc.?
  • What fields are needed for each node?
  • Who can access it?
  • Do you really need that content type, or can we just use taxonomies for similar content?

Page 7: Drupal campchicago2010.rachel.datamigration

Plan: What to Ask

• Taxonomy (categorization, tags)
  • Do you need to categorize nodes? Would you need different access?
  • What kind of taxonomy groups or vocabularies would you need?
• Permissions (per node) and user roles
  • Who is going to use the site? What are their access rights?

Page 8: Drupal campchicago2010.rachel.datamigration

Plan: What to Ask

• New URL mapping
  • Do you need to make SEO-friendly URLs?
• Files, file permissions and file directory
  • Do you need an advanced file management or document management tool?
  • Do you need a simpler solution? How simple?
  • Do you need access rights for each folder?
  • Do you need a browser-type interface to access them?
  • What kind of files do you need to store? Images, PDFs?

Page 9: Drupal campchicago2010.rachel.datamigration

Build

Page 10: Drupal campchicago2010.rachel.datamigration

Requirements

• Use CSV files to import data
• Divide migration into groups or sections
• Map and replace old URLs with SEO-friendly URLs
  Before: 05-200.htm

Page 11: Drupal campchicago2010.rachel.datamigration

Data in CSV Example

December 13, 2005 3:39:54 PM||||||||||December 13, 2005||||||||||Report Spotlights Need for Reform in Jackpot Jurisdictions||||||||||/press/releases/2005/december/||||||||||05-200||||||||||{UUID}|||||||||| Economics^^^^^^^^^^Economy ||||||||||

<p> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </p>

<p> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </p>

$$$$$$$$$$

Separator: ||||||||||
End of Row: $$$$$$$$$$
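
A minimal sketch of how an export in this format could be split into records, assuming the separators shown on the slide. The column names are hypothetical (guessed from the sample row, which does not label them), and treating `^^^^^^^^^^` as a multi-value separator inside one field is likewise an assumption based on the "Economics^^^^^^^^^^Economy" value. The sample body is abbreviated.

```python
FIELD_SEP = "|" * 10  # field separator shown on the slide
ROW_END = "$" * 10    # end-of-row marker shown on the slide
MULTI_SEP = "^" * 10  # assumed: separates several values inside one field

# Hypothetical column names, guessed from the sample row.
COLUMNS = ["created", "published", "title", "path",
           "legacy_id", "uuid", "terms", "body"]


def parse_export(text):
    """Split a legacy export into a list of dicts, one per row."""
    rows = []
    for raw_row in text.split(ROW_END):
        if not raw_row.strip():
            continue  # skip the empty chunk after the final row marker
        fields = [f.strip() for f in raw_row.split(FIELD_SEP)]
        row = dict(zip(COLUMNS, fields))
        # Turn the multi-valued field into a clean list of terms.
        row["terms"] = [t.strip()
                        for t in row.get("terms", "").split(MULTI_SEP)
                        if t.strip()]
        rows.append(row)
    return rows


sample = (
    "December 13, 2005 3:39:54 PM||||||||||December 13, 2005||||||||||"
    "Report Spotlights Need for Reform in Jackpot Jurisdictions||||||||||"
    "/press/releases/2005/december/||||||||||05-200||||||||||{UUID}||||||||||"
    " Economics^^^^^^^^^^Economy ||||||||||\n<p>Lorem Ipsum ...</p>\n$$$$$$$$$$"
)
records = parse_export(sample)
print(records[0]["legacy_id"], records[0]["terms"])
```

Long, unusual separators like these make it unlikely that HTML in the body field collides with a delimiter, which is why a plain string split is enough here.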

Page 12: Drupal campchicago2010.rachel.datamigration

Content Type Division

Example: CNN.com
Divide migration sequences into US, World, Politics, Justice, etc.
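
One way to picture this division is to bucket legacy pages by the first segment of their path, so each section can be migrated and tested as its own batch. This is only an illustrative sketch: the paths are made up, and using the first path segment as the section is an assumption, not something the slides specify.

```python
from collections import defaultdict


def section_of(path):
    """Return the first path segment, e.g. '/world/story.htm' -> 'world'."""
    parts = [p for p in path.split("/") if p]
    return parts[0].lower() if parts else "(root)"


def group_by_section(paths):
    """Bucket legacy paths so each section becomes one migration batch."""
    batches = defaultdict(list)
    for path in paths:
        batches[section_of(path)].append(path)
    return dict(batches)


# Hypothetical legacy paths, loosely modeled on the CNN example.
legacy_paths = [
    "/us/story-1.htm",
    "/world/story-2.htm",
    "/politics/story-3.htm",
    "/world/story-4.htm",
]
for section, items in group_by_section(legacy_paths).items():
    print(section, len(items))
```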

Page 13: Drupal campchicago2010.rachel.datamigration

Solutions/Tools

• TW and Migrate modules combo
• node_import()
• Drush + custom script

Page 14: Drupal campchicago2010.rachel.datamigration

TW & Migrate Module Combo

• http://drupal.org/project/tw
  • Supports the Migrate module by letting it run views of the source data
• http://drupal.org/project/migrate
  • A flexible framework for migrating content

Page 15: Drupal campchicago2010.rachel.datamigration

Migrate Module

Features:
• Users browse their legacy data using views
• Support for creating Drupal nodes, users, and comments is included
• Hooks permit migration of other types of content
• Provides a dashboard for running mini migrations
• Drush support

Page 16: Drupal campchicago2010.rachel.datamigration

Why I did not choose Migrate

• Importing to MySQL was not an option; CSV files were used instead
• Cannot map old URLs to new URLs

Page 17: Drupal campchicago2010.rachel.datamigration

node_import()

http://drupal.org/project/node_import

Features:
• Easy to learn, point and click
• Uses CSV to upload content
• Can easily delete previously imported data
• Can download errors when an import fails, for easy reference when fixing issues

Page 18: Drupal campchicago2010.rachel.datamigration

node_import() Problems

• I can’t map old URLs to new URLs
• No Drush support
• It doesn’t save my old settings for a CSV

Page 19: Drupal campchicago2010.rachel.datamigration

Drush + Custom script

Flexibility - I can do whatever I want with the data

Page 20: Drupal campchicago2010.rachel.datamigration

Create your own migration script

[demo]

Page 21: Drupal campchicago2010.rachel.datamigration

Issues

• File management
• URL rewriting

Page 22: Drupal campchicago2010.rachel.datamigration

File Management

Client requirements:
• Intuitive
• Has WYSIWYG support
• Access control – upload, edit, delete, revise files by different roles
• Revision control – optional but good to have
• Limited time!

Page 23: Drupal campchicago2010.rachel.datamigration

File Management Modules

*DbFm was not included due to problems encountered during tests in D6

Page 24: Drupal campchicago2010.rachel.datamigration

URL Rewriting

Source: http://www.flickr.com/photos/randomfactor/483264915/

Page 25: Drupal campchicago2010.rachel.datamigration

URL Rewriting Solution

Not recommended:
• .htaccess
  • Too many URLs to handle; too much server load

Recommended:
• pathauto + path_redirect modules
  • Automated alias settings
  • 301 redirects set
• global redirect

Additional reference: http://acquia.com/blog/migrating-drupal-way-part-ii-saving-those-old-urls
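
To make the old-to-new mapping concrete, here is a small sketch that derives an alias from a title and builds a lookup of legacy URL to new URL, where each pair would become one 301 redirect. It illustrates the idea only and is not how pathauto or path_redirect are implemented; the new URL pattern (legacy path plus title slug) and the helper names are assumptions, while the legacy file name follows the "Before: 05-200.htm" example.

```python
import re


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_redirects(rows):
    """Map each legacy URL to its new alias (one 301 redirect per pair)."""
    redirects = {}
    for row in rows:
        old_url = row["path"] + row["legacy_id"] + ".htm"
        new_url = row["path"] + slugify(row["title"])  # assumed alias pattern
        redirects[old_url] = new_url
    return redirects


rows = [{
    "path": "/press/releases/2005/december/",
    "legacy_id": "05-200",
    "title": "Report Spotlights Need for Reform in Jackpot Jurisdictions",
}]
for old, new in build_redirects(rows).items():
    print(old, "->", new)
```

Keeping the mapping as data rather than as thousands of .htaccess rules is the point of the recommendation above: the redirect can be looked up per request instead of scanning a long rule list.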

Page 26: Drupal campchicago2010.rachel.datamigration

URL Checker

http://drupal.org/project/linkchecker

Page 27: Drupal campchicago2010.rachel.datamigration

Access control Alternative

/default/files/PressReleases
/default/files/Documents
/default/files/International
/default/files/International/America
/default/files/International/England
/default/files/International/Asia

Page 28: Drupal campchicago2010.rachel.datamigration

Test, Test and did I say Test?

Source: http://www.flickr.com/photos/paperpariah/2424107350/

Page 29: Drupal campchicago2010.rachel.datamigration

Common problems

• Broken links
• Misconfigured page
• Empty pages
• Invalid date
• File not found or orphan pages
• Page format

Test when CACHE is on
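
Some of these problems can be caught automatically before anyone clicks through the site. The sketch below checks one migrated record for an invalid date, an empty page, and links that still point at legacy .htm files. The field names and the date format (taken from the "December 13, 2005 3:39:54 PM" sample) are assumptions, and month names are parsed in the default English locale.

```python
import re
from datetime import datetime

# Assumed format, matching "December 13, 2005 3:39:54 PM".
DATE_FORMAT = "%B %d, %Y %I:%M:%S %p"


def check_record(record):
    """Return a list of problems found in one record (empty list = OK)."""
    problems = []
    try:
        datetime.strptime(record.get("created", ""), DATE_FORMAT)
    except ValueError:
        problems.append("invalid date")
    body = record.get("body", "")
    # Strip tags so a body containing only markup counts as an empty page.
    if not re.sub(r"<[^>]+>", "", body).strip():
        problems.append("empty page")
    # Flag links that still point at legacy .htm pages.
    if re.search(r'href="[^"]+\.htm"', body):
        problems.append("legacy link")
    return problems


good = {"created": "December 13, 2005 3:39:54 PM", "body": "<p>Lorem Ipsum</p>"}
bad = {"created": "13/12/2005", "body": '<p><a href="05-199.htm"></a></p>'}
print(check_record(good))
print(check_record(bad))
```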

Page 30: Drupal campchicago2010.rachel.datamigration

Deployment

Page 31: Drupal campchicago2010.rachel.datamigration

Deployment

2 ways to deploy your data to the live environment:
1. All at once
2. Divide and conquer

Page 32: Drupal campchicago2010.rachel.datamigration

Deployment: Divide and Conquer

Example: CNN Division

Page 33: Drupal campchicago2010.rachel.datamigration

Deployment Mockup

* shadow box is your migrated data’s production box
* old CMS is still active at this time

Page 34: Drupal campchicago2010.rachel.datamigration

Deployment

• Coordination between the old CMS and Drupal
• URL testing

Page 35: Drupal campchicago2010.rachel.datamigration

Deployment Mockup

* shadow box is your migrated data’s production box
* replacing old CMS with Drupal

Page 36: Drupal campchicago2010.rachel.datamigration

Deployment

Pros:
• Less risk, less stress
• Editors can continue daily data entry

Cons:
• URL rewriting can be tricky
• Updating the production box with new content can be an arduous task

Page 37: Drupal campchicago2010.rachel.datamigration

Deployment: Updating Production

Automation:
• SVN
• Drush scripts to migrate content from the tester’s box to the shadow box
• Deploy – http://drupal.org/project/deploy

Manual:
• Document configuration changes
• Document database changes

Page 38: Drupal campchicago2010.rachel.datamigration

Recap

• SDLC + Agile
• Common questions you should be asking before you start
• Top 3 tools to do migration in Drupal
  • TW & Migrate, node_import(), Drush
• Issues
  • File management comparison in D6
  • Tools to use in URL rewriting
• Testing
• Deployment solution

Page 39: Drupal campchicago2010.rachel.datamigration

Questions?

Page 40: Drupal campchicago2010.rachel.datamigration

Resources

• http://groups.drupal.org/content-migration-import-and-export
• http://drupal.org/handbook/migrating