
    Efficient Oracle Backups with Pure Storage FlashArray
    Danny Higgins, Sales Engineer, March 2015


    Table of Contents

    Executive Summary
    Audience
    Oracle Backup Strategies
    Traditional Backup Methods
        RMAN Full Compressed Backups
        RMAN Incrementally Updating Backups
        RMAN Incrementally Updating Backups on the FlashArray
        FlashRecover Snapshot Database Backups
    Conclusion
    Appendix
        Appendix 1 - Incrementally Updating Backups on the FlashArray
        Appendix 2 - Configuring The FlashProtect Backup Strategy


    Executive Summary

    Protecting your organization's biggest asset (its data) is the most important responsibility of any IT department, and in most cases it's the Database Administrators who are responsible for ensuring this data is readily available and well protected. There are various backup and recovery tools and different backup strategies that DBAs can use to ensure databases are protected against hardware failures, natural disasters, human errors and other unexpected causes of data corruption. The common theme across these different methods is that the larger the database, the more difficult and expensive it becomes to ensure it is properly backed up. This paper demonstrates how features built into the Pure Storage FlashArray can be used to drive efficiency by significantly reducing the amount of time and cost associated with backing up and protecting Oracle Databases. Although this document focuses on Oracle, many of the same principles can easily be adopted for other database flavors.

    Audience

    The target audience for this document is Oracle DBAs, storage administrators and IT infrastructure professionals concerned with the performance and economics of providing data protection for large Oracle Database implementations.

    Oracle Backup Strategies

    Providing host level database backups for very large databases can represent a significant cost, both in terms of storage capacity and in the negative effect they can have on database performance. As databases grow ever larger, the backup windows, capacity requirements and performance implications increase. Selecting the right backup strategy and technology is essential to reduce these prohibitive costs and ensure a scalable, robust and performant solution to protect your organization's data. The table below shows how the Pure Storage FlashArray can be used to meet these needs by comparing key metrics when running different backup strategies on the FlashArray compared to legacy disk-based arrays.

    Backup Type                  | Storage          | DB Size | DB Space Used | Backup Space | Total Space Used | Backup Duration | Host CPU Consumption
    RMAN Full Compressed         | Traditional Disk | 1000GB  | 1000GB        | 500GB        | 1500GB           | 8hrs            | HIGH
    RMAN Incrementally Updating  | Traditional Disk | 1000GB  | 1000GB        | 2000GB       | 3000GB           | 30mins          | LOW
    RMAN Incrementally Updating  | Pure FlashArray  | 1000GB  | 333GB         | 250GB        | 588GB            | 5mins           | MINIMAL
    FlashRecover Snapshot        | Pure FlashArray  | 1000GB  | 333GB         | 150GB        | 488GB            | 0mins           | ZERO

    Table 1 - Comparison of backup space, duration and CPU requirements for various backup strategies


    Traditional Backup Methods

    Backup and recovery are key responsibilities of the Database Administrator. Losing your company's data is simply not an option. The loss of key customer or transactional data can have serious revenue implications and adversely affect a company's reputation with its customers.

    Maintaining readily available backups of large data volumes is expensive and choosing the most appropriate backup strategy has traditionally been a balancing act between cost, efficiency and availability.

    The most widely used backup tool for Oracle databases is Oracle's own Recovery Manager (RMAN), which has shipped with the database software since Oracle version 9i. It has been widely adopted by DBAs as it automates many of the previously manual tasks using a simple command set, and it also maintains an RMAN Catalog (RCAT) containing a history of what types of backup have been taken and where they are located. This RCAT metadata simplifies the recovery process: when RMAN is used, DBAs no longer have to work out which combination of datafiles, control files and archived redo logs is required to restore and recover a database. Instead a DBA can issue simple commands to instruct RMAN to recover a database (or subset of a database) to a specific point in time, and the RMAN engine will work out what recovery files are required and complete the restore and recovery process on the DBA's behalf.
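    As a rough illustration of this simplicity (the timestamp here is arbitrary and the database would need to be mounted first), a point-in-time restore and recovery can be expressed in a handful of RMAN commands:

    RMAN> RUN {
      SET UNTIL TIME "TO_DATE('2015-03-01 12:00:00','YYYY-MM-DD HH24:MI:SS')";
      RESTORE DATABASE;
      RECOVER DATABASE;
    }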

    There are other methods of backing up Oracle databases that aim to leverage third-party storage features such as snapshots and split mirrors. Whilst these have been effective at reducing the amount of time it takes to back up a large database, or in offloading the backup IO workload away from the database server, they are generally complicated, require co-ordination between DBAs, storage admins and server admins, and require additional disk capacity and therefore cost.

    To understand how the Pure Storage FlashArray can significantly reduce the cost and complexity of your backup strategy, we'll take a look at the most common strategies being used today for Oracle backups.

    In these scenarios we'll assume best practices are being followed, that the database is configured with a Fast Recovery Area (FRA), and that Automatic Storage Management (ASM) is being used to provide disk groups for +DATA and +FRA. The same principles apply even when ASM is not in use and database files are stored on OS based filesystems.

    RMAN Full Compressed Backups

    RMAN> BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG;

    This is a common backup strategy used for small to medium sized databases. It's been considered effective in its storage capacity requirements as the database is read from the +DATA area and the backup pieces are compressed on-the-fly before they are written down to the +FRA area. The compression typically achieves 5x reduction, so if the database is 500GB then the backup would compress down to approximately 100GB.

    For this backup strategy we need space to keep a minimum of 2 backups in the FRA, as we need to complete today's backup before we can safely remove yesterday's backup. So in the example here we need at least 200GB for the 2 x 100GB backups. We also need to reserve room for the archived redo logs (let's say 50GB in this example, although it varies depending on database changes), so we have a total space requirement of 250GB for our +FRA area and an +FRA:+DATA ratio of 0.5:1.
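    As a sketch of how this sizing would translate into database configuration (the 250G figure is simply the example value above, not a recommendation):

    SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 250G SCOPE=BOTH;
    SQL> ALTER SYSTEM SET db_recovery_file_dest = '+FRA' SCOPE=BOTH;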

    Figure 1 - RMAN compressed backup. In this example filesystems such as /u01/oracle/data and /u01/oracle/fra could be substituted if ASM was not in use.

    This backup strategy can work well for small databases, but it does not scale, and as the database grows it will begin to cause problems that have a negative impact on database performance. The RMAN backup job runs on the database host and the compression consumes a large amount of CPU resource, which reduces the amount of CPU available for the database to service application and end user operations. The compression also slows down the backup process, meaning that a non-compressed 500GB backup that would normally run in 1 hour takes 4 times longer and runs for 4 hours before it completes. So in this example we have a 4hr period where the DB host is running a heavy CPU workload to compress the backup and is impacting general database performance. It's possible to speed the backup job up by running it in multiple parallel streams, but this further exacerbates the CPU problem, as now we have multiple RMAN jobs all running heavy CPU intensive compression routines.
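    For illustration, the parallel streams mentioned above are configured with an RMAN setting along these lines (four channels is an arbitrary example):

    RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO COMPRESSED BACKUPSET;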

    Eventually this backup strategy becomes unsustainable: as the database grows to multi-terabyte volumes of data, the RMAN compressed backup duration can increase to 20 hours or more, and we get to the point where we can no longer schedule it on a daily basis. Even worse, the backup is running nearly all the time throughout the business day, hitting the CPUs hard and impacting database performance.

    RMAN Incrementally Updating Backups

    RMAN> RUN {
      RECOVER COPY OF DATABASE WITH TAG 'my_database' UNTIL TIME 'SYSDATE -7';
      BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'my_database' DATABASE;
    }

    This backup strategy was introduced in Oracle 10g to allow faster daily backups that consume less CPU resource. It is the default backup strategy for backups scheduled by OEM. It works by maintaining a copy of the source database (a datafile copy) in the +FRA area. Making this copy is a one-time operation and all subsequent backups are incremental backups (only the data blocks that have been modified each day are backed up). 7 days' worth of rolling incremental backups are kept in the +FRA and the previous week's incremental is merged into the datafile copy at the end of the backup job. This means there is a copy of the database as it was at t-7 days available in the +FRA. This can be used to recover the source database to any point in time within the last 7 days by restoring the datafiles, applying the relevant incremental backups and rolling forward through the redo logs.

    Figure 2 - RMAN incrementally updating backup. In this example the same /u01/oracle/data and /u01/oracle/fra filesystems could be substituted if ASM was not in use.

    Note: compression is not supported in this strategy, as this would prevent incrementals being merged into the datafile copy.

    A block change-tracking (BCT) file is used to keep track of any changed block in the source database. This makes the incremental backups efficient and quick because the entire database does not need to be scanned to determine which blocks need to be saved in the incremental backup.
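    For reference, enabling block change tracking is a single SQL command; the sketch below assumes the BCT file is placed in the +DATA disk group, and the v$block_change_tracking view can be queried to confirm it is active:

    SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+DATA';
    SQL> SELECT status, filename FROM v$block_change_tracking;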

    As we're only backing up a small subset of the overall data (just the changed blocks), and there is no compression routine, the backup durations are an order of magnitude less than for Compressed Full Backups. This shorter backup duration and lack of compression mean that there is minimal performance impact while this backup is in progress.

    So it's clear that the Incrementally Updating Backup Strategy is far more efficient than the Compressed Full Backup strategy, but there's a penalty to pay for this in terms of the storage capacity required in the +FRA area, which makes this strategy a lot more expensive than compressed backups.

    For a 1TB database we need 1TB to store the datafile copy, and we also need room to store the 7 days' worth of incremental backups and archived redo logs. A realistic +FRA size would be 2TB, giving a +FRA:+DATA ratio of 2:1. So while we have scalable backups that don't take too long to run and don't degrade database performance with large CPU overheads, we need 4x the storage capacity to run this strategy. Storage is not cheap at these multi-terabyte volumes, so even if we use tier 2 disks for our +FRA disk group this becomes an expensive option.

    Fortunately, running the RMAN Incrementally Updating Backups on the Pure Storage FlashArray is not expensive at all; in fact it's much cheaper than running the Full Compressed backup even on tier 2 disk. Many IT professionals may have the following mindset: "Storing database backups on flash? Isn't it normal practice to store backups on cheap SATA disk? This sounds crazy, flash is too expensive for backups." Well, let's take a look at how this works cost effectively on the FlashArray.


    RMAN Incrementally Updating Backups on the FlashArray

    The good news is that this is the exact same backup strategy outlined in the previous section. We do not need to make any changes to the backup job and we get all the benefits of minimal CPU use and short backup windows. In fact the backup windows are even shorter, as the large amount of IO required to read from the source database and write the backup happens many times faster on the FlashArray. The same Incrementally Updating backup that took 30 mins on traditional disk storage completes in 5 minutes on the FlashArray.

    The only change we need to make is to store the +DATA and +FRA ASM disk groups on the same Pure Storage FlashArray; this migration can be performed non-disruptively while the database is in use by leveraging the ASM rebalancing feature.
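    A minimal sketch of that migration, assuming hypothetical disk names (PURE_DATA01 for the new FlashArray LUN, LEGACY_DATA01 for the old disk): the new disk is added and the old one dropped in a single rebalance operation, which can be monitored via v$asm_operation.

    SQL> ALTER DISKGROUP DATA ADD DISK '/dev/oracleasm/disks/PURE_DATA01' DROP DISK LEGACY_DATA01 REBALANCE POWER 8;
    SQL> SELECT operation, state, est_minutes FROM v$asm_operation;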

    In the diagram below the blue boxes show what the database host and DBAs see (exactly the same as before), while the orange boxes show what is actually stored on the FlashArray.

    Figure 3 - RMAN incrementally updating backup on the Pure Storage Array. In this example filesystems such as /u01/oracle/data and /u01/oracle/fra could be substituted if ASM was not in use

    The first thing to notice is that, thanks to the FlashReduce pattern removal, de-duplication and compression, the source database is much smaller. Databases usually achieve 2-4x data reduction on the FlashArray, and 3x reduction is shown here. So we have already made big capacity savings on the +DATA disk group; let's look at what happens to the +FRA backup area. The datafile copy is essentially an exact copy of the source database, which means it is almost all duplicate blocks; these get removed by the FlashReduce de-duplication and replaced with metadata pointers, so we are left with very little to store in the +FRA. The datafile copy is visible to the DBA as a full size 1TB copy of the database that can be used for restores, but the representation on the FlashArray is mostly metadata pointers that refer back to the original blocks of the source database. To see an example of this in action refer to Appendix 1.

    So when we run this backup strategy on the FlashArray with a 1TB database, the datafile copy uses approx. 100GB of raw space; when we add the reduced incrementals and redo logs we need approx. 250GB of raw space on the FlashArray. This gives an +FRA:+DATA ratio of 0.25:1, which is even smaller than for the RMAN compressed backups, whilst providing all the efficiency savings of the Incrementally Updating Backup Strategy.

    DBAs are comfortable using RMAN, and this strategy requires no change and delivers great results. But let's look at how we can improve things further by leveraging the FlashArray snapshot features.

    FlashRecover Snapshot Database Backups

    pureuser@pure> purevol snap --suffix ORADB1 asm_data asm_control_redo asm_fra

    The FlashRecover Snapshot feature allows instantaneous snapshots of any volume, which can be run ad hoc or regularly scheduled via the Purity console. These snapshots require zero CPU resource on the database hosts and initially require very minimal space on the array. As data changes, the snapshot will consume space to record the delta.

    Snapshots can be used straight out of the box on any existing database to provide a crash-consistent, moment-in-time backup. This is useful as a recovery point for data upgrades, major changes or as a pre-batch marker, but it is not suitable as a replacement for a full blown backup and recovery strategy. The reason is that it is a volume-based snapshot of a single point in time; it is not journaled, so it is not possible to roll forward to a user specified point in time in the same way as we can with an RMAN restore followed by a recovery through the redo logs. In most cases this makes the FlashRecover Snapshot feature a complement to an RMAN backup strategy and not a replacement.

    RMAN Incrementally Updating Backups are already extremely efficient on the FlashArray and are recommended for most use cases, especially as Oracle DBAs are well versed with RMAN. However, if you need to achieve the absolute pinnacle of backup efficiency, with 1-2 second backup durations and zero CPU consumption, then FlashRecover Snapshots can be implemented as a replacement for RMAN with some careful planning of the database layout, simple scripting and well documented recovery plans.

    Before considering this option please be aware of the following caveats:

    The snapshot based backup strategy operates at the volume level. This means restores will affect the entire volume, and for this reason this strategy is not suitable for consolidation (or Schema as a Service) type databases where multiple applications run out of a single database. For these types of consolidation databases a more granular recovery of single application schemas is required, and this can only be provided by RMAN tablespace point in time recovery (TSPITR). For consolidation databases the RMAN Incrementally Updating Backup Strategy on the FlashArray is recommended.
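    For context, an automated RMAN TSPITR of a single application's tablespace looks something like the sketch below (the tablespace name, target time and auxiliary destination are purely illustrative):

    RMAN> RECOVER TABLESPACE app_a_data UNTIL TIME "TO_DATE('2015-03-01 09:00:00','YYYY-MM-DD HH24:MI:SS')" AUXILIARY DESTINATION '/u01/aux';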

    Some of the advanced features of RMAN, such as block level recovery and block consistency checking, will not be available if RMAN datafile backups are dropped in favor of snapshots, although the FlashArray does have its own robust integrity and error checking algorithms.

    For databases that run a single application then the snapshot based strategy can provide the most efficient solution offering instantaneous backups with minimal capacity requirements. Care must be taken when implementing this strategy, as there are some conditions for its use.

    1) Dedicated ASM disk groups (or filesystems) per database. Many sites use a shared +DATA and +FRA diskgroup for multiple databases; the traditional reasoning behind this is that more spindles provide greater performance and that one shared pool of storage reduces wastage. Both of these points are irrelevant for the FlashArray, as performance is not based on spindles and there is no wastage thanks to the built-in thin provisioning of storage. To allow databases to be restored independently, the entire ASM disk group (or filesystem) needs to be recovered from the snapshot image, so each database must have its own dedicated disk group.

    2) An additional ASM disk group is required to store the database control files and online redo logs. This is a requirement for point in time recovery, as these files should not be restored from an earlier snapshot image. The +CONTROL_REDO and +FRA diskgroups should not be touched in restore operations; only the +DATA diskgroup should be restored from an earlier snapshot.

    3) If a Fast Recovery Area (FRA) is used for archived redo logs then some co-ordination is required to ensure the archive log deletion policy matches the snapshot retention schedule, e.g. if we wish to keep 1 week's worth of snapshots and allow a point in time recovery then we also need to ensure 1 week's worth of redo logs is retained. RMAN is not aware of FlashRecover Snapshots and vice versa, so the DBA will need to ensure the 2 parts of the strategy stay in sync with each other (one approach is sketched below).
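    One way a DBA might keep the two in sync (a sketch, assuming the 7-day snapshot retention used in this paper) is to prune archived logs with an RMAN command such as:

    RMAN> DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-7';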

    Careful planning of the database layout is required to use the FlashRecover Snapshot strategy and allow point in time recovery. The key to making this work is to segregate the datafiles from the metadata held in the control files and from the recovery data in the online and archived redo logs.

    Figure 4 - Database layout for FlashRecover Snapshot based backups. Example filesystems are /u01/oracle/data, /u01/oracle/control_redo and /u01/oracle/fra if ASM was not in use.

    The ASM diskgroups shown above (or equivalent dedicated filesystems) are required for each database that will adopt this strategy. The underlying volume(s) for the +DATA diskgroup are configured with a daily FlashRecover Snapshot schedule where snapshots are retained for 7 days. Only the datafiles should be placed in this +DATA diskgroup; all other files such as online redo logs, control files and spfiles must be segregated into the dedicated +CONTROL_REDO disk group, while the +FRA is retained for archive logs (and flashback logs if required).

    In the event of any condition requiring a restore, the contents of +DATA can be restored instantaneously using the FlashRecover Snapshots. At this point the control file will recognize the datafiles are not up to date and a manual recovery can take place using the archived redo logs and online redo logs.

    Appendix 2 provides an example of the detailed steps required to implement FlashRecover Snapshot Database Backups and how to complete a recovery from this strategy.


    Conclusion

    This paper compares and contrasts various backup options available to Oracle database administrators. It demonstrates that by leveraging features built into the Pure Storage FlashArray, customers can realize the following benefits:

    1. Reduced capacity requirement for database backup operations

    2. Reduced time for database backup operations

    3. Improved CPU utilization, by not consuming host CPU for mundane tasks like backup compression.


    Appendix

    Appendix 1 - Incrementally Updating Backups on the FlashArray

    This appendix shows what is visible to the end-user (DBA) and what is visible at the array level when the efficient Incrementally Updating backup strategy is used on the Pure Storage FlashArray.

    A RAC database has been built that contains 456GB of data in the +DBDATA ASM diskgroup.

    The overall array capacity looks like this. There are other volumes on the array as well as the database shown above, but we can see the overall data reduction is 3.3:1.

    (Screenshot: 456GB of database space)


    Now we drill down to see just the volume for the Oracle DATA and can see that the total space required to store 456GB of data is 160GB.

    If we look at the volume for the FRA then we can see that nothing is in use yet (just 215MB of archived redo logs), as we have no backup at this stage.

    Now we'll take the incrementally updating backup. As there is no prior backup at this stage, this initial backup will be a full uncompressed datafile copy of the source datafiles (456GB).

    There are a few key things to notice here. Firstly, the input size is 445GB (this is the size of the source database to be backed up). Secondly, the output size is also 445GB (meaning that as far as RMAN is concerned 445GB of data was written down into the backup set). Lastly, the time taken to write this was 11 minutes, so the output rate was 672MB/s.
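    (As a rough sanity check on that figure: 445GB written in 11 minutes is approximately 445,000MB / 660s, or about 674MB/s, in line with the reported 672MB/s.)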


    This can be verified by looking at the ASM FRA disk group, which is now showing 446GB of space occupied, so as far as RMAN is concerned everything looks normal and we have a full-size copy of the database. Let's look at how this has affected what's stored on the Pure Storage array.

    The first thing to notice is that the overall data reduction ratio has shot up from 3.3:1 to 5.1:1, and overall used space has increased from 323GB to only 354GB. So for the extra 446GB backup we're storing on the array, we've only used an additional 31GB. The reason is that this backup is almost the same as the original datafiles (not exactly the same, as some of the original blocks are being modified by the live applications), so it deduplicates down very well. We can look at a capacity view of the DATA and FRA volumes to see exactly how this is represented.

    (Screenshot: FRA volume with the 446GB backup)


    For the DATA volume the first thing we see is that the data reduction ratio has gone up from 2.6:1 to 4.8:1, yet the used space has gone down from 198GB to only 91GB. This may be confusing if you are not familiar with the way de-duplicated data is reported by the array. The duplicate data (the original datafiles and their backup copies) is now represented as shared data, so it is no longer associated with a specific volume in the capacity view. If this deduplicated data were shown for both the DATA and FRA volumes (which both share it) it would be double counted, so instead we report it as shared space. We can see from the overall capacity view that we now have a 159GB bucket of shared space, shown in dark blue in the overall capacity reporting. If we refer back to the overall capacity view we see that shared space has grown from 53GB to 159GB.

    Another advantage of running this strategy on flash is that we can very quickly recover a lost datafile without any restore, by switching to the datafile copy. As this copy is also stored on high-speed flash, there are no performance concerns, as there would be with backups stored on a lower performance tier of storage.

    For example, if we lost datafile 4 and this was a bigfile tablespace of several TB, then it would take hours to restore from disk, and even on flash we could be looking at restore times of tens of minutes.

    Using this strategy on flash we can instantly switch to the copy using the following RMAN commands and have our database up and running immediately.

    ALTER DATABASE DATAFILE 4 OFFLINE;
    SWITCH DATAFILE 4 TO COPY;
    RECOVER DATAFILE 4;
    ALTER DATABASE DATAFILE 4 ONLINE;

    When we understand how this backup strategy interacts with the FlashArray, the performance and capacity savings become obvious, and it's easy to see how large-scale backup cost reductions can be made, as well as a massive reduction in your RTO.


    Appendix 2 - Configuring The FlashProtect Backup Strategy

    Here you will find example steps for configuring new ASM disk groups and a database to leverage the FlashProtect backup strategy along with a restore and recovery example.

    Note: If the Oracle host is not a physical server but a VMware guest, then the LUNs should be presented as Raw Device Mappings (RDMs), or vVols when available.

    Create LUNs on the array for each of the 3 ASM disk groups:

    1) Control files and online redo logs
    2) Datafiles
    3) FRA for archived redo logs

    pureuser@pure> purevol create --size 1T Oracle_RDM_CONTROL_REDO
    pureuser@pure> purevol create --size 1T Oracle_RDM_DATA
    pureuser@pure> purevol create --size 1T Oracle_RDM_FRA

    Using the purevol list command we can see the serial numbers of these LUNs, which we'll need later to map to the correct ASM devices.

    pureuser@RedDot> purevol list
    Name                      Size   Source  Created                  Serial
    Oracle_RDM_CONTROL_REDO   1000G  -       2014-12-04 16:59:27 SGT  764E5101EECBAC3C0001147D
    Oracle_RDM_DATA           1000G  -       2014-12-04 16:59:27 SGT  764E5101EECBAC3C0001147E
    Oracle_RDM_FRA            1000G  -       2014-12-04 16:59:27 SGT  764E5101EECBAC3C0001147F

    After the volumes have been attached to the database hosts we need to identify them by their SCSI IDs and use them to create ASM disk groups.


    We can match the serial numbers shown on the Pure console with the SCSI IDs of the devices on the host.

    for d in `ls /dev/sd*1`
    > do
    > echo ${d}
    > /lib/udev/scsi_id --whitelisted --device=${d}
    > done
    /dev/sda1
    /dev/sdb1
    /dev/sdc1
    3624a9370764e5101eecbac3c0001147d
    /dev/sdd1
    3624a9370764e5101eecbac3c0001147e
    /dev/sde1
    3624a9370764e5101eecbac3c0001147f

    From this we determine that sdc1 is for the control_redo volume, sdd1 is for data and sde1 is for FRA.

    Based on this we create labels for the devices using ASMLib:

    oracleasm createdisk CONTROL_REDO01 /dev/sdc1
    oracleasm createdisk DATA01 /dev/sdd1
    oracleasm createdisk FRA01 /dev/sde1

    Next we create the ASM disk groups using these devices:

    CREATE DISKGROUP CONTROL_REDO EXTERNAL REDUNDANCY DISK '/dev/oracleasm/disks/CONTROL_REDO01' SIZE 1023994M
      ATTRIBUTE 'compatible.asm'='12.1.0.0.0','au_size'='1M';
    CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK '/dev/oracleasm/disks/DATA01' SIZE 1023994M
      ATTRIBUTE 'compatible.asm'='12.1.0.0.0','au_size'='1M';
    CREATE DISKGROUP FRA EXTERNAL REDUNDANCY DISK '/dev/oracleasm/disks/FRA01' SIZE 1023994M
      ATTRIBUTE 'compatible.asm'='12.1.0.0.0','au_size'='1M';

    Note: If regular filesystems are used instead of ASM then equivalent dedicated filesystems (e.g. /u01/oracle/control_redo, /u01/oracle/data and /u01/oracle/fra) can be substituted.

    We will then use these ASM diskgroups to create the database and place the datafiles, redo logs, control file etc. in the relevant disk group.

    Note: if using Grid Infrastructure for RAC or Oracle Restart then the +DATA diskgroup must not be used for any Grid Infrastructure related files such as the OCR. Please use the +CONTROL_REDO group for OCR and Voting Disk files.

    When creating a database that will make use of the FlashRecover Snapshot backup strategy we must ensure that only the datafiles go into the +DATA disk group. So for the DBCA storage location we will choose to place all files in a common location in the +CONTROL_REDO group, and then we will manually move the datafiles to +DATA during the DBCA wizard.


    For the recovery options we will specify a Fast Recovery Area in the +FRA diskgroup and ensure archiving is enabled.


    On the storage screen we will now select the datafiles and manually change the file directory of the datafiles to use the +DATA group instead of +CONTROL_REDO. All other files mentioned under control file and redo log groups should stay in the +CONTROL_REDO disk group.

    After the database has been created we set parameters to ensure that future datafiles and redo members are created in the appropriate ASM disk group.

    oracle@TOSG2ORA> alter system set db_create_file_dest='+DATA' scope=both;
    System altered.
    oracle@TOSG2ORA> alter system set db_create_online_log_dest_1='+CONTROL_REDO' scope=both;
    System altered.
    oracle@TOSG2ORA> alter system set control_file_record_keep_time=30 scope=both;
    System altered.

    Now for some final checks to make sure all database files are in the correct disk groups. This is very important for this snapshot based backup strategy, as we can't afford to overwrite any redo or control file when we recover the datafile volume with a previous snapshot during a restore operation.

    oracle@TOSG2ORA> select name from v$datafile;
    NAME
    ---------------------------------------------------------------------------------
    +DATA/tosg2ora/system01.dbf
    +DATA/tosg2ora/sysaux01.dbf
    +DATA/tosg2ora/undotbs01.dbf
    +DATA/tosg2ora/users01.dbf
    +DATA/tosg2ora/example01.dbf

    oracle@TOSG2ORA> select name from v$tempfile;
    NAME
    ---------------------------------------------------------------------------------
    +DATA/tosg2ora/temp01.dbf

    oracle@TOSG2ORA> select name from v$controlfile;
    NAME
    ---------------------------------------------------------------------------------
    +CONTROL_REDO/tosg2ora/control01.ctl
    +CONTROL_REDO/tosg2ora/control02.ctl

    oracle@TOSG2ORA> select member from v$logfile;
    MEMBER
    ---------------------------------------------------------------------------------
    +CONTROL_REDO/tosg2ora/redo03.log
    +CONTROL_REDO/tosg2ora/redo02.log
    +CONTROL_REDO/tosg2ora/redo01.log

    oracle@TOSG2ORA> show parameter spfile;
    NAME     TYPE     VALUE
    -------- -------- ------------------------------
    spfile   string   +CONTROL_REDO/tosg2ora/spfiletosg2ora.ora

    Once we have confirmed that only datafiles (and tempfiles) exist in the +DATA diskgroup then we are ready to set up the snapshot based backup strategy.

    In the array console go to the Protection tab and create a new source group by clicking on the + icon in the left hand pane.

    Name the protection group based on the database name and ASM diskgroup that we will snapshot (in this case TOSG2ORA-DATA).


    Now select the TOSG2ORA-DATA source group and click the Members link so that we can add the volume(s) to the group. To do this use the wheel icon and select 'Add Volumes'.

    Now click on the relevant volume (Oracle_RDM_DATA in this case) to select it, and then click Confirm.

    Note: ASM supports maximum LUN sizes of 2TB so there could be multiple volumes for the ASM +DATA disk group depending on its size.

    Now that we have members for this source group we can click on the Schedule link to setup the snapshot schedule.

    Use the edit button at the bottom of the screen if you wish to change the default hourly schedule.

    The scheduler is very flexible, and in this example we will take a snapshot every hour that will be retained for 1 day; after this we will keep one of each day's hourly snapshots for a further 7 days.

    This means it will be possible to recover the database to any point in the last seven days by recovering the target date's snapshot and then rolling forward through up to a maximum of 24 hours of redo. We can recover to any point in the last 24 hours by only having to roll forward through a maximum of 1 hour's redo, so this would provide a very quick RTO.

    The schedule should be customised to suit your own individual RPO and RTO requirements. By understanding that the snapshots happen instantly and initially consume zero space (only the changes consume space), we can provide a very granular snapshot frequency with much longer recovery windows than would normally be possible with traditional disk-based RMAN backup strategies.


    Now that the snapshot schedule has been put in place for the +DATA volume we need to configure RMAN.

    There will be no RMAN database backups in this strategy (as we use the Pure FlashRecover Snapshots instead), but we need to ensure the archived redo logs are retained for at least 7 days to match the snapshot retention schedule.

    RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
    RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;
    RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
    RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/app/oracle/product/db/11gR2/dbs/cf_bkup_TOSG2ORA.%F';

    The last FORMAT command ensures a backup of the control file will be maintained in $ORACLE_HOME/dbs for any structural db changes (e.g. datafile additions). These control file backups may be required under certain conditions for FlashRecover Snapshot restores.

    The datafiles are now protected with snapshots, and we can roll forward if necessary with the archived redo logs that we've configured to maintain a 7 day history. So now we will load some data into the database, simulate the loss of a datafile, use the snapshots to restore the datafiles and then use the archived redo logs to roll forward to the point of failure.

    To load data in this example a freeware benchmarking tool called swingbench was used to quickly generate a sales history schema and several GB of data in the +DATA disk group of the database.


    After the swingbench load completes we have 1.6GB of data in the SALES schema

    Looking at the archive log history we can see the loading of this schema took place between 17:27 and 17:29 and generated 29 x 50MB redo logs.

    oracle@TOSG2ORA> select sequence#, first_time from v$log_history;

     SEQUENCE# FIRST_TIME
    ---------- --------------------
             1 18-DEC-14 16:46:01
             2 18-DEC-14 17:27:48
             3 18-DEC-14 17:27:51
             4 18-DEC-14 17:27:54
             5 18-DEC-14 17:27:57
             6 18-DEC-14 17:28:00
             7 18-DEC-14 17:28:03
             8 18-DEC-14 17:28:06
             9 18-DEC-14 17:28:09

            22 18-DEC-14 17:28:49
            23 18-DEC-14 17:28:52
            24 18-DEC-14 17:28:55
            25 18-DEC-14 17:28:58
            26 18-DEC-14 17:29:01
            27 18-DEC-14 17:29:04
            28 18-DEC-14 17:29:07
            29 18-DEC-14 17:29:11
            30 18-DEC-14 17:29:14
            31 18-DEC-14 21:00:38

    oracle@TOSG2ORA> select sum(bytes/1024/1024) MB from dba_segments where owner='SALES';
            MB
    ----------
     1668.4375

    oracle@TOSG2ORA> select TABLESPACE_NAME, BYTES/1024/1024 MB, FILE_NAME from DBA_DATA_FILES;
    TABLESPACE_NAME   MB        FILE_NAME
    ----------------- --------- ----------------------------------------------
    USERS             5         +DATA/tosg2ora/users01.dbf
    UNDOTBS1          625       +DATA/tosg2ora/undotbs01.dbf
    SYSAUX            580       +DATA/tosg2ora/sysaux01.dbf
    SYSTEM            760       +DATA/tosg2ora/system01.dbf
    EXAMPLE           346.25    +DATA/tosg2ora/example01.dbf
    SH                17472     +DATA/tosg2ora/datafile/sh.262.866478419
    TEST              100       +DATA/tosg2ora/datafile/test.263.866649197
    TEST2             100       +DATA/tosg2ora/datafile/test2.264.866652947
    SALES             1800      +DATA/tosg2ora/datafile/sales.265.866654387


    We will now simulate the loss of the SALES tablespace holding the data we just loaded by deleting (or moving) the datafile +DATA/tosg2ora/datafile/sales.265.866654387 and then attempting to restart the database.

    RMAN> copy datafile '+DATA/tosg2ora/datafile/sales.265.866654387' to '+FRA';

    srvctl stop database -d TOSG2ORA -o immediate

    SQL> alter diskgroup DATA DROP FILE '+DATA/tosg2ora/datafile/sales.265.866654387';
    Diskgroup altered.

    srvctl start database -d TOSG2ORA
    PRCR-1079 : Failed to start resource ora.tosg2ora.db
    CRS-5017: The resource action "ora.tosg2ora.db start" encountered the following error:
    ORA-01157: cannot identify/lock data file 9 - see DBWR trace file
    ORA-01110: data file 9: '+DATA/tosg2ora/datafile/sales.265.866654387'
    . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/ora3/crs/trace/ohasd_oraagent_oracle.trc".
    CRS-2674: Start of 'ora.tosg2ora.db' on 'ora3' failed

    So now the database will not start, as the datafile for the sales history is missing.

    To restore this we will unmount the ASM +DATA disk group, use the snapshot from before the data load to restore it, and then use the archived redo logs in the +FRA diskgroup to roll forward and replay the data load up to the point of failure.

    We know the data load started at 17:27, so we will use the snapshot immediately before this, which we can see from the list of snapshots above is TOSG2ORA-DATA.51.Oracle_RDM_DATA.

    Note: We could easily recover to the latest snapshot (which includes all the loaded data), but the purpose of this demonstration is to show how we can use a combination of snapshots and archived redo logs to recover to any point in time and provide a fully fledged backup and recovery solution.
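    For reference, if we wanted to stop at a specific point in time rather than roll forward all the way to the point of failure (as the demonstration below does), the recovery phase would look something like this sketch; the timestamp is illustrative, and an incomplete recovery like this must be opened with RESETLOGS:

    RMAN> STARTUP MOUNT
    RMAN> RUN {
      SET UNTIL TIME "TO_DATE('18-DEC-2014 17:27:00','DD-MON-YYYY HH24:MI:SS')";
      RECOVER DATABASE;
    }
    RMAN> ALTER DATABASE OPEN RESETLOGS;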


    First we must stop the +DATA diskgroup:

    srvctl stop diskgroup -g DATA -f
    Note: for non ASM we would unmount the filesystem as root, e.g. umount /u01/oracle/data

    Now we log in to the array as pureuser and overwrite the original Oracle_RDM_DATA volume with the snapshot from before the data load. (This must be done from the command line; it is not possible to overwrite a live volume from the GUI. This is by design, to prevent accidents.)

    pureuser@RedDot> purevol copy --overwrite TOSG2ORA-DATA.51.Oracle_RDM_DATA Oracle_RDM_DATA
    Name             Size   Source           Created                  Serial
    Oracle_RDM_DATA  1000G  Oracle_RDM_DATA  2014-12-18 17:21:15 SGT  764E5101EECBAC3C0001147E

    With the volume rolled back we can bring the diskgroup and database back up:

    oracle@ora3:/home/oracle [+ASM] > srvctl start diskgroup -g DATA
    Note: for non ASM we would re-mount the filesystem, e.g. mount /u01/oracle/data

    oracle@ora3:/home/oracle [TOSG2ORA] > srvctl start database -d TOSG2ORA
    PRCR-1079 : Failed to start resource ora.tosg2ora.db
    CRS-5017: The resource action "ora.tosg2ora.db start" encountered the following error:
    ORA-01113: file 1 needs media recovery
    ORA-01110: data file 1: '+DATA/tosg2ora/system01.dbf'
    . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/ora3/crs/trace/ohasd_oraagent_oracle.trc".
    CRS-2674: Start of 'ora.tosg2ora.db' on 'ora3' failed

    Now we get a different error, as the control file (stored on +CONTROL_REDO) is ahead of the datafiles in +DATA, which have been recovered to an earlier point in time from the snapshot.

    This is telling us that the datafiles need to be recovered by rolling forward through the archived redo logs.

    RMAN> startup mount
    Oracle instance started
    database mounted
    Total System Global Area    1286066176 bytes
    Fixed Size                     2252904 bytes
    Variable Size                402657176 bytes
    Database Buffers             872415232 bytes
    Redo Buffers                   8740864 bytes

    RMAN> recover database;
    Starting recover at 19 Dec 2014 12:33:00


    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=136 device type=DISK
    starting media recovery
    archived log for thread 1 with sequence 1 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_1.564.866654869
    archived log for thread 1 with sequence 2 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_2.565.866654871
    archived log for thread 1 with sequence 3 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_3.566.866654875
    archived log for thread 1 with sequence 4 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_4.567.866654877
    archived log for thread 1 with sequence 5 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_5.568.866654881
    archived log for thread 1 with sequence 6 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_6.569.866654883
    archived log for thread 1 with sequence 7 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_7.570.866654887
    archived log for thread 1 with sequence 8 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_8.571.866654889
    archived log for thread 1 with sequence 9 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_9.572.866654893
    archived log for thread 1 with sequence 10 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_10.573.866654895
    archived log for thread 1 with sequence 11 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_11.574.866654899
    archived log for thread 1 with sequence 12 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_12.575.866654903
    archived log for thread 1 with sequence 13 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_13.576.866654905
    archived log for thread 1 with sequence 14 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_14.577.866654909
    archived log for thread 1 with sequence 15 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_15.578.866654911
    archived log for thread 1 with sequence 16 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_16.579.866654915
    archived log for thread 1 with sequence 17 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_17.580.866654917
    archived log for thread 1 with sequence 18 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_18.581.866654921
    archived log for thread 1 with sequence 19 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_19.582.866654923
    archived log for thread 1 with sequence 20 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_20.583.866654927
    archived log for thread 1 with sequence 21 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_21.584.866654929
    archived log for thread 1 with sequence 22 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_22.585.866654933
    archived log for thread 1 with sequence 23 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_23.586.866654935
    archived log for thread 1 with sequence 24 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_24.587.866654939
    archived log for thread 1 with sequence 25 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_25.588.866654941
    archived log for thread 1 with sequence 26 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_26.589.866654945
    archived log for thread 1 with sequence 27 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_27.590.866654947
    archived log for thread 1 with sequence 28 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_28.591.866654951
    archived log for thread 1 with sequence 29 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_29.592.866654955
    archived log for thread 1 with sequence 30 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_30.593.866667639
    archived log for thread 1 with sequence 31 is already on disk as file +FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_31.594.866674001
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_1.564.866654869 thread=1 sequence=1
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_2.565.866654871 thread=1 sequence=2
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_3.566.866654875 thread=1 sequence=3
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_4.567.866654877 thread=1 sequence=4
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_5.568.866654881 thread=1 sequence=5
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_6.569.866654883 thread=1 sequence=6
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_7.570.866654887 thread=1 sequence=7
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_8.571.866654889 thread=1 sequence=8
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_9.572.866654893 thread=1 sequence=9
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_10.573.866654895 thread=1 sequence=10
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_11.574.866654899 thread=1 sequence=11
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_12.575.866654903 thread=1 sequence=12
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_13.576.866654905 thread=1 sequence=13
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_14.577.866654909 thread=1 sequence=14
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_15.578.866654911 thread=1 sequence=15
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_16.579.866654915 thread=1 sequence=16
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_17.580.866654917 thread=1 sequence=17
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_18.581.866654921 thread=1 sequence=18
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_19.582.866654923 thread=1 sequence=19
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_20.583.866654927 thread=1 sequence=20
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_21.584.866654929 thread=1 sequence=21
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_22.585.866654933 thread=1 sequence=22
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_23.586.866654935 thread=1 sequence=23
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_24.587.866654939 thread=1 sequence=24
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_25.588.866654941 thread=1 sequence=25
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_26.589.866654945 thread=1 sequence=26
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_27.590.866654947 thread=1 sequence=27
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_28.591.866654951 thread=1 sequence=28
    archived log file name=+FRA/tosg2ora/archivelog/2014_12_18/thread_1_seq_29.592.866654955 thread=1 sequence=29
    media recovery complete, elapsed time: 00:00:15
    Finished recover at 19 Dec 2014 12:33:16

    RMAN> alter database open;
    database opened

    So here we have restored to our earlier snapshot in less than 1 second and then recovered the database to replay the SALES schema recreation and the 1.6GB data load in 15 seconds.

    Sixteen seconds of actual processing plus a minute or two to type in the commands, and we are up and running again. Attempting this type of restore using a traditional RMAN restore operation on traditional disk would take many times longer, and there would be at least 30 minutes to 1 hour of downtime.

    So although there may be additional integration work, if your organization needs the fastest backup and recovery times possible then the FlashRecover Snapshot feature offers a viable and low-cost solution.

    WARNING: If new datafiles have been added since the snapshot then recovery will fail, as the current controlfile has a record of the new datafile but it will be missing from the disk group. In this case we must either use a later snapshot that already contains the new datafile, or use a backup controlfile from the ones saved in $ORACLE_HOME/dbs and then follow the steps in "Recovering Through an Added Datafile: Scenario".


    Pure Storage, Inc. Twitter: @purestorage

    650 Castro Street, Suite #260

    Mountain View, CA 94041

    T: 650-290-6088 F: 650-625-9667

    Sales: [email protected]

    Support: [email protected] Media: [email protected]

    General: [email protected]