© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.



SEIZE THE DATA. 2015

BIG DATA CONFERENCE 2015

Boston August 10-13

Vertica Backup and Restore

Ramesh Narayanan, Vertica Professional Services

Aug 10, 2015


Module Overview

Backup and Restore

Copy Vertica Database

Online Recovery


Backup and Restore


Backup - Overview

Backup is the process of copying the actual data files to a specified location

• Vertica data and backup files are written once

− Once a file is written, Vertica never modifies it

• The number of files increases with each backup

• The Tuple Mover keeps the number of files under control

− The TM ‘mergeout’ process consolidates smaller ROS containers into larger ones

• To back up, copy the Vertica files to stable storage

− Can be direct-attached storage, NFS mounts or a SAN

− Those files can then be moved to tape or integrated with other backup tools


Backup – When?


• As part of a regular disaster recovery strategy

− Nightly or weekly, depending on business continuity requirements and available resources

• After loading or altering a large volume of data

• Before maintenance tasks

− Upgrading to another version of Vertica

− Dropping a partition

− Adding, removing or replacing nodes (back up both before and after)


Backup and Restore – Options

There are several ways to take a Vertica backup

• Backup and restore by database

− The most common backup process

− Backs up the entire database, including all schemas and the objects within them

• Backup and restore by schema

− Multi-tenant databases where tenants need different backup frequencies

− Multi-application clusters with different backup requirements or policies

• Backup and restore by table

− Back up only the most critical tables

− Restore selected tables for QA or testing

− Backup frequency depends on how critical the data is and how much data loss and recovery time the business can tolerate
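The three scopes differ only in which objects vbr.py is asked to copy. In the Vertica 7.x sample configuration files we are aware of, object-level scope is set with an `objects` parameter in the `[Misc]` section that takes a comma-separated list of schema or `schema.table` names, and leaving it out backs up the whole database. Treat that parameter name and its separator as assumptions to verify against your version's documentation. The helpers `objects_value` and `backup_scope` below are hypothetical; they only build and classify that string.

```python
import re

# Simplified rule for unquoted identifiers; quoted identifiers are not handled.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Accepts either "schema" or "schema.table".
_OBJECT_RE = re.compile(rf"^{_IDENT}(?:\.{_IDENT})?$")


def objects_value(objects):
    """Build the value for the (assumed) [Misc] objects parameter.

    Returns None when no objects are given, meaning the parameter should be
    omitted and the whole database backed up.
    """
    if not objects:
        return None
    for name in objects:
        if not _OBJECT_RE.match(name):
            raise ValueError(f"not a schema or schema.table name: {name!r}")
    return ",".join(objects)


def backup_scope(objects):
    """Classify the backup scope implied by an object list."""
    if not objects:
        return "database"
    if all("." not in name for name in objects):
        return "schema"
    if all("." in name for name in objects):
        return "table"
    return "mixed"  # both schemas and individual tables listed


if __name__ == "__main__":
    critical = ["finance.ledger", "finance.accounts"]
    print(backup_scope(critical), objects_value(critical))
```

A table-level configuration for QA refreshes and a schema-level one per tenant can then be generated from the same list of names, which keeps the backup frequency decision separate from the config syntax.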


Vertica Backup Restore – VBR

vbr.py is a Python script located under /opt/vertica/bin

• Run vbr.py with different options to back up and restore data

• Create a configuration file

− vbr.py --setupconfig

− Runs interactively, gathers all parameters and creates the configuration file

• VBR parameters

− Database name, schema name, snapshot name, object names

− Restore points, backup location, node names, temporary directories, etc.
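vbr.py --setupconfig builds the configuration file interactively. When many clusters or snapshots each need their own file, generating them can be less error-prone. The sketch below does that with Python's `configparser`. The section and key names (`[Misc]` with `snapshotName`, `restorePointLimit`, `tempDir`, `objects`; `[Database]` with `dbName`, `dbUser`; `[Mapping]` with `node_name = host:directory`) are our assumption based on Vertica 7.x sample files, not something these slides show; compare with a file produced by --setupconfig on your own version before relying on it. `build_vbr_config` is a hypothetical helper.

```python
import configparser
import io


def build_vbr_config(db_name, db_user, snapshot_name, restore_point_limit,
                     node_backup_dirs, temp_dir="/tmp/vbr", objects=None):
    """Render a vbr.py-style INI configuration as text.

    node_backup_dirs maps each Vertica node name to "host:/backup/dir".
    All section and key names are assumptions (see lead-in).
    """
    # interpolation=None so '%' in paths is written literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve key case, e.g. snapshotName

    cfg["Misc"] = {
        "snapshotName": snapshot_name,
        "restorePointLimit": str(restore_point_limit),
        "tempDir": temp_dir,
    }
    if objects:  # omitted for a full-database backup
        cfg["Misc"]["objects"] = objects

    cfg["Database"] = {"dbName": db_name, "dbUser": db_user}
    cfg["Mapping"] = dict(node_backup_dirs)

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_vbr_config(
        "VMart", "dbadmin", "nightly", 3,
        {"v_vmart_node0001": "backup01:/backups",
         "v_vmart_node0002": "backup01:/backups"},
    ))
```

The generated text would be saved to a file and passed to `vbr.py --task backup --config-file <file>`.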


vbr.py --setupconfig options


vbrtest.ini


Vertica Backup Restore – VBR

A few parameters explained

• Snapshot name – all backup files are stored under a directory with this name

• Restore points – the number of incremental backups kept in addition to the full backup

• Nodes

− Names of the nodes in the cluster

− Data is backed up from every node in the cluster

• Backup directory

− Location where the backup files are stored

− If it is an NFS mount, a separate directory is created for each node under the backup directory
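The restore points setting bounds how many backups accumulate. A minimal model of that retention rule, assuming the newest N restore points are kept and older ones become removable, is sketched below. It is a conceptual illustration only, not vbr.py's own pruning logic; the config key we believe corresponds to it is `restorePointLimit`, which should be checked against your version's documentation.

```python
from datetime import datetime


def apply_restore_point_limit(restore_points, limit):
    """Split restore-point timestamps into (kept, pruned).

    Conceptual model: the newest `limit` restore points are kept and older
    ones become candidates for removal. Not vbr.py's implementation.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(restore_points, reverse=True)  # newest first
    return ordered[:limit], ordered[limit:]


if __name__ == "__main__":
    points = [datetime(2015, 8, d) for d in (3, 7, 5, 9, 1)]
    kept, pruned = apply_restore_point_limit(points, limit=3)
    print("keep:", [p.date().isoformat() for p in kept])
    print("prune:", [p.date().isoformat() for p in pruned])
```

Sizing the backup location then comes down to the full backup plus roughly `limit` incremental directories per node.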


VBR preparation

Steps and some prerequisites

• Configure the backup location on all nodes

• Verify that the database is running

• If data is backed up to separate backup hosts, make sure those hosts are running

− Backups can be written to the cluster nodes themselves

− Backups can also be written to a dedicated host with SAN storage

• Backup directory permissions and contents

− The user who starts the backup process must have write permission

− The backup directory contains a sub-directory for each node (if it is an NFS location)

− Under the backup directory, VBR creates a sub-directory for each snapshot

− The full backup and each incremental backup are stored in separate directories
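The permission prerequisite is easy to check before the first run. The sketch below, a hypothetical `check_backup_dirs` helper, reports local backup directories that are missing, are not directories, or cannot be written by the current OS user. It is meant to be run on each node or backup host as the same user that will start vbr.py, because `os.access` tests that user's permissions; it does not check remote hosts or ssh access.

```python
import os


def check_backup_dirs(paths):
    """Return a list of problems found with local backup directories."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"{path}: does not exist")
        elif not os.path.isdir(path):
            problems.append(f"{path}: not a directory")
        # creating files in a directory needs both write and execute bits
        elif not os.access(path, os.W_OK | os.X_OK):
            problems.append(f"{path}: not writable by this user")
    return problems


if __name__ == "__main__":
    import sys
    for problem in check_backup_dirs(sys.argv[1:]):
        print(problem)
```

Running it with the backup directories from the configuration file as arguments prints nothing when all prerequisites on that host are met.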


Performing a Backup

How to run the vbr.py script

• vbr.py --task backup --config-file <myconfigfile>

− The same command is used for full and incremental backups

• The first run does a full backup

− All data files are copied to the sub-directory named after the snapshot

• Subsequent runs are incremental

− Only files added since the last backup are copied

− Files are only added or deleted, never modified

− Each incremental backup goes into a separate sub-directory with a timestamp

− Each incremental backup also adds its files to the full backup
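Because data files are only added or deleted, never modified, deciding what an incremental backup must copy reduces to comparing file names. The sketch below models that with plain sets; `plan_incremental` is hypothetical and only illustrates the idea, it is not how vbr.py is implemented. With an empty backup set, the same function yields the first, full backup.

```python
def plan_incremental(current_files, backed_up_files):
    """Plan an incremental backup for write-once data files.

    Since a file is never modified after it is written, names are enough:
      - files present now but absent from the backup must be copied
      - files in the backup but gone now (for example merged away by
        mergeout) are not needed for the new restore point, although
        older restore points may still reference them
    """
    current = set(current_files)
    backed_up = set(backed_up_files)
    to_copy = sorted(current - backed_up)
    removed = sorted(backed_up - current)
    return to_copy, removed


if __name__ == "__main__":
    # Two small ROS files were merged into one larger file since last time.
    before = ["ros_001", "ros_002", "ros_003"]
    now = ["ros_001", "ros_004"]
    print(plan_incremental(now, before))
```

This also shows why mergeout matters for backups: each merge produces new files that the next incremental run has to copy.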


VBR Process Infographics


Performing a Restore

The same vbr.py script is used for restore

• vbr.py --task restore --config-file <myconfigfile>

− Uses the same configuration file as the backup

• A restore can be scoped

− The entire database, a specific schema or a table, depending on the configuration file used

− Vertica copies the files from the backup location to the data directory location

• Some key features

− Vertica has no concept of transaction logging

− There is no roll forward or roll back of transactions

− Objects can be restored to the timestamp of the last snapshot
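The backup, restore and copycluster commands in this deck differ only in the `--task` value, so scheduled jobs can share one small wrapper. The sketch below builds the argument list from the path and flags shown on the slides and runs it with `subprocess.run`, raising if vbr.py exits with a non-zero status. The function names are hypothetical, and the wrapper checks nothing beyond what the slides describe; for instance, in the Vertica versions we know, a full-database restore expects the database to be stopped, which this sketch does not verify.

```python
import subprocess

VBR = "/opt/vertica/bin/vbr.py"  # location given on the VBR slide
TASKS = {"backup", "restore", "copycluster"}  # tasks covered in this deck


def vbr_command(task, config_file):
    """Build the argv for a vbr.py task."""
    if task not in TASKS:
        raise ValueError(f"unsupported task: {task}")
    return [VBR, "--task", task, "--config-file", config_file]


def run_vbr(task, config_file):
    """Run vbr.py; raises subprocess.CalledProcessError on failure."""
    subprocess.run(vbr_command(task, config_file), check=True)


if __name__ == "__main__":
    print(" ".join(vbr_command("restore", "vbrtest.ini")))
```

Passing an argument list rather than a shell string avoids quoting problems with configuration file paths.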


Copy Vertica Database


Copy Vertica Database

This VBR task copies the entire database (cluster) to a target cluster

• When is copycluster needed?

− To maintain a warm-standby cluster for disaster recovery

− To provide an alternative cluster for a different set of users or applications

• Prerequisites

− Source and target clusters must have the same number of nodes

− Database name, node names and the dbadmin user must be the same on both sides

− Password-less ssh must be set up between all nodes on both sides

− The target database must be shut down before starting the process

• vbr.py --task copycluster --config-file <cfgfile>

− The task runs as one continuous transaction
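Most copycluster prerequisites are simple equality checks that can run before the job starts. The sketch below encodes the ones listed on the slide; `ClusterInfo` and `copycluster_problems` are hypothetical, and the values would have to be gathered from each cluster by other means. Password-less ssh is not modeled here and still has to be verified separately.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    """Hypothetical description of one side of a copycluster job."""
    db_name: str
    db_admin: str
    node_names: list
    db_running: bool


def copycluster_problems(source, target):
    """Check the slide's prerequisites; return a list of problems found."""
    problems = []
    if len(source.node_names) != len(target.node_names):
        problems.append("node counts differ")
    if source.db_name != target.db_name:
        problems.append("database names differ")
    if sorted(source.node_names) != sorted(target.node_names):
        problems.append("node names differ")
    if source.db_admin != target.db_admin:
        problems.append("dbadmin users differ")
    if target.db_running:
        problems.append("target database must be shut down first")
    return problems


if __name__ == "__main__":
    nodes = ["v_vmart_node0001", "v_vmart_node0002"]
    src = ClusterInfo("VMart", "dbadmin", nodes, db_running=True)
    dst = ClusterInfo("VMart", "dbadmin", list(nodes), db_running=True)
    print(copycluster_problems(src, dst))
```

An empty list means the listed checks pass; the source database is allowed to keep running.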


Online Recovery


Node Recovery

Vertica has a highly available MPP architecture, but nodes may go down…

• A node can recover from failure

− If the cluster is K-safe, a node can rebuild its data from other nodes in the cluster

− In a full recovery, the node rebuilds its data from scratch

• Incremental recovery

− The node rebuilds starting from its current persisted state

− To speed up a full recovery, restore a prior backup for that node and then perform an incremental recovery

• RAID 10 is best practice

− RAID arrays (5, 6, 10) can be rebuilt without impact on other cluster nodes


Monitor Recovery

• Monitor disk space

− df -h

− SELECT * FROM v_monitor.disk_storage;

• Monitor recovery

− tail vertica.log

− SELECT * FROM v_monitor.recovery_status;
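The df check can be scripted so that it runs on every node while a recovery is in progress; recovery progress itself still comes from vertica.log and v_monitor.recovery_status. The sketch below uses Python's `shutil.disk_usage`. The helper names and the 90% threshold are hypothetical, and the percentage is computed as used/total, which can differ slightly from the Use% column of df because df excludes blocks reserved for the superuser.

```python
import shutil


def disk_usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def low_space(paths, threshold_pct=90.0):
    """Return the paths whose filesystem usage is at or above threshold_pct."""
    return [p for p in paths if disk_usage_percent(p) >= threshold_pct]


if __name__ == "__main__":
    # Replace "." with the node's data and catalog locations.
    for path in low_space(["."], threshold_pct=90.0):
        print(f"warning: {path} is {disk_usage_percent(path):.1f}% full")
```

A recovering node rebuilds data it does not yet have, so watching free space on the data path during recovery is the main point of this check.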

QUESTIONS?

Please attend our Q&A with HP Big Data experts today

Marina Ballroom, Lobby level

10:15 am • 10:30 am

12:00 pm • 1:00 pm

2:30 pm • 3:00 pm

4:30 pm • 5:00 pm
