Compressing Very Large Data Sets in Oracle


  • Compressing Very Large Data Sets in Oracle
    Luca Canali, CERN
    UKOUG Conference, Birmingham, December 1st, 2009

  • Outline
    - Physics databases at CERN and compression
    - Use cases for compression and archive for Physics DBs @ CERN
      - Current experience with 10g
      - Advanced compression tests with 11g
    - Oracle and compression internals
      - A closer look at how compression is implemented in Oracle
      - BASIC, OLTP, and columnar compression
    - Appendix: how to play with hybrid columnar without Exadata

  • CERN and LHC

  • LHC Data
    - LHC data correspond to about 20 million CDs each year: a CD stack with one year of LHC data would be ~20 km tall (vs. the Concorde at 15 km, a balloon at 30 km, Mt. Blanc at 4.8 km)
    - RDBMS play a key role in the analysis of LHC data

  • Why Compression? The high-level view:
    - Databases are growing fast, beyond the TB scale
    - CPUs are becoming faster
    - There is an opportunity to reduce storage cost by using compression techniques
    - And to gain performance while doing so
    - Oracle provides compression in the RDBMS

  • Making It Work in the Real World
    - Evaluate gains case by case
      - Not all applications can profit
      - Not all data models allow for it
    - Compression can give significant gains for some applications
    - In other cases applications can be modified to take advantage of compression
    - Comment: our experience of deploying partitioning is on the same track; implementation involves developers and DBAs

  • Evaluating Compression Benefits
    - Compressing segments in Oracle saves disk space
      - Can save HW cost
      - Beware that capacity is often not as important as the number of disks, which determines max IOPS
    - Compressed segments need fewer blocks, so
      - Less physical IO is required for a full scan
      - Less logical IO / space occupied in the buffer cache (see the sketch below)
    - Beware: compressed segments will make you consume more CPU
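
    A quick way to see the block savings, as a minimal sketch: compare the dictionary block counts of a table and a compressed copy of it (MYTABLE and MYTABLE_COMP are hypothetical names used for illustration):

      -- Block and size comparison from the data dictionary
      -- (MYTABLE and MYTABLE_COMP are hypothetical segment names)
      select segment_name, blocks, round(bytes/1024/1024) as size_mb
      from   user_segments
      where  segment_name in ('MYTABLE', 'MYTABLE_COMP');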

  • Compression and Expectations
    - Can a 10 TB DB be shrunk to 1 TB of storage with 10x compression?
    - Not really, unless one can get rid of indexes
      - Data-warehouse-like workloads with only FULL SCAN operations
      - Data very rarely read (data on demand, almost taken offline)
    - Licensing costs
      - Advanced Compression option required for anything but basic compression
      - Exadata storage required for hybrid columnar

  • Use Case 1: Data Life Cycle
    - Transactional application with historical data
    - Data has an active part (high DML activity)
    - Older data is made read-only (or read-mostly)
    - As data ages, it becomes less and less used

  • Example of Archiving and Partitioning for Physics Applications
    - Separation of the active and the read-mostly parts
      - Data is inserted with a timestamp (log-like)
      - Most queries specify a timestamp value or a time interval
      - Range/interval partitioning by time comes naturally (see the sketch below)
      - Examples: Atlas PANDA, DASHBOARD
    - Other option: the application takes care of using multiple tables
      - Example: PVSS
      - More flexible, but the app needs to maintain metadata
    - Alternative: archiving as post-processing
      - This step allows modifications to be added, e.g. changing the index structure (Atlas PANDA)
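
    A minimal sketch of time-based partitioning for a log-like table, using 11g interval partitioning; the table and column names are illustrative, not taken from the applications named above:

      -- Log-like table, range-partitioned by time; Oracle creates
      -- new daily partitions automatically as data arrives
      create table job_log (
        log_time  date           not null,
        job_id    number         not null,
        message   varchar2(4000)
      )
      partition by range (log_time)
      interval (numtodsinterval(1, 'DAY'))
      ( partition p_first values less than
          (to_date('2009-01-01', 'YYYY-MM-DD')) );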

  • Archive DB and Compression
    - Non-active data can be compressed
      - To save storage space
      - In some cases to speed up queries that do full scans
    - Compression can be applied as post-processing
      - On read-mostly data partitions (e.g. Atlas PVSS, Atlas MDT DCS, LCG SAME)
      - With ALTER TABLE MOVE or online redefinition (see the sketch below)
    - Active data
      - Not compressed, or compressed for OLTP (11g)
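
    A sketch of compression as post-processing, reusing the hypothetical partitioned table from the previous example; all names are illustrative:

      -- Compress a read-mostly partition after the fact
      alter table job_log move partition p_old compress for oltp;

      -- The move leaves local index partitions unusable
      alter index job_log_ix rebuild partition p_old;

    DBMS_REDEFINITION can be used instead when the segment must remain available during the operation.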

  • Data Archive and Indexes
    - Indexes do not compress well
      - Drop indexes in the archive when possible
      - Risk: the archive compression factor is dominated by index segments
    - Important details when using partitioning
      - Local partitioned indexes are preferred for ease of maintenance and performance
      - Note the limitation: the columns of a unique index must be a superset of the partitioning key
    - May require some index changes for the archive
      - Disable PKs in the archive table (e.g. Atlas PANDA)
      - Or change the PK to add the partitioning key (see the sketch below)
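
    A sketch of the second option, extending the PK with the partitioning key so that its index can be created LOCAL (illustrative names, continuing the hypothetical job_log example):

      -- A unique index can be LOCAL only if it contains the
      -- partitioning key, so log_time is added to the PK
      alter table job_log
        add constraint job_log_pk
        primary key (job_id, log_time)
        using index local;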

  • Use Case 2: Archive DB
    - Move data from production to a separate archive DB
    - Cost reduction: the archive DB is sized for capacity instead of IOPS
    - Maintenance: reduces the impact of production DB growth
    - Operations: the archive DB is less critical for HA than production

  • Archive DB in Practice
    - Detach old partitions from prod and load them on the archive DB
      - Can use partition exchange to a table (see the sketch below)
      - Transportable tablespaces are another tool that can help
    - Archive DB post-move jobs can implement compression for archive (Exadata, 11gR2)
    - Post-move jobs may also be implemented to drop indexes
    - Difficult point: one needs to move a consistent set of data
      - Applications need to be developed to support this move
      - Access to the archive data needs to be validated with application owners/developers
      - New releases of the software need to be able to read archived data
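
    A minimal sketch of detaching a partition by exchange; the staging table must match the partitioned table's structure, and all names are illustrative:

      -- On production: swap the aged partition with an empty
      -- staging table; this is a dictionary operation, no rows
      -- are physically moved
      alter table job_log
        exchange partition p_old with table job_log_stage
        including indexes without validation;

    The staging table, now holding the old rows, can then be shipped to the archive DB (e.g. with Data Pump or transportable tablespaces) and exchanged into a partitioned archive table there.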

  • Example of Data Movement

  • Use Case 3: Read-Only Large Fact Table
    - Data-warehouse-like workload
      - Data is loaded once and queried many times
      - Table access involves many full scans
    - Compression
      - Saves space
      - Reduces physical IO
      - Can be beneficial for performance (see the sketch below)
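
    A sketch of such a fact table with BASIC compression; note that BASIC compression only acts on rows inserted via direct-path operations, hence the APPEND hint on the load (all names are hypothetical):

      -- Empty compressed fact table with the shape of the staging table
      create table sales_fact
      compress basic
      as select * from sales_staging where 1 = 0;

      -- Direct-path load, so the rows are stored compressed
      insert /*+ append */ into sales_fact
      select * from sales_staging;
      commit;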

  • Is There Value in the Available Compression Technology?
    - Measured compression factors for tables:
      - About 3x for BASIC and OLTP
        - In prod at CERN, e.g.: PVSS, LCG SAME, Atlas TAGs, Atlas MDT DCS
      - 10-20x for hybrid columnar (archive); more details in the following
    - Compression can help with the use cases described above
      - Worth investigating the technology further
      - Compression for archive is very promising

  • Compression for Archive: Some Tests and Results
    - Tests of hybrid columnar compression
      - Exadata V1 rack, Oracle 11gR2
      - Courtesy of Oracle: remote access to a test machine in Reading

  • Advanced Compression Tests
    - Representative subsets of production data exported to the Exadata V1 machine:
      - Applications: PVSS (slow control system for the detector and accelerator)
      - GRID monitoring applications
      - File transfer applications (PANDA)
      - Log application for ATLAS
    - Exadata machine in Reading, UK, accessed remotely for a 2-week test
    - Tests focused on:
      - OLTP and hybrid columnar compression factors
      - Query speedup

  • Hybrid Columnar Compression on Oracle 11gR2 and Exadata
    [Chart: compression factor. Data from Svetozar Kapusta, Openlab (CERN)]

  • Full Scan Speedup: a Basic Test

  • IO Reduction for Full Scan Operations on Compressed Tables
    [Chart: speedup of full scan. Data from Svetozar Kapusta, Openlab (CERN)]

  • Time to Create Compressed Tables
    [Chart: data from Svetozar Kapusta, Openlab (CERN)]

  • A Closer Look at Compression... the Technology Part

  • Oracle Segment Compression: What Is Available
    - Heap table compression:
      - Basic (from 9i)
      - For OLTP (from 11gR1)
      - Hybrid columnar (11gR2, Exadata)
    - Other compression technologies (sketched below)
      - Index compression: key factoring; applies also to IOTs
      - SecureFiles (LOB) compression: 11g compression and deduplication
      - Compressed external tables (11gR2)
      - Details not covered here
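
    Two minimal sketches of the other technologies mentioned above, with illustrative names:

      -- Index key compression: repeated leading key values are
      -- stored once per leaf block (key factoring)
      create index job_log_ix2 on job_log (job_id, log_time) compress 1;

      -- SecureFiles (LOB) compression and deduplication, 11g
      create table doc_archive (
        id   number primary key,
        body clob
      )
      lob (body) store as securefile (compress deduplicate);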

  • Compression: Syntax
    - 11gR2 syntax for compressed tables. Example (pick exactly one of the compression clauses):

      create table MYCOMP_TABLE
        compress BASIC                     -- or
        compress for OLTP                  -- or
        compress for QUERY [LOW|HIGH]      -- or
        compress for ARCHIVE [LOW|HIGH]
      as select * ...
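
    The resulting setting can be verified from the data dictionary; a minimal sketch, reusing the MYCOMP_TABLE name from the example above:

      -- COMPRESSION and COMPRESS_FOR are available in 11g
      select table_name, compression, compress_for
      from   user_tables
      where  table_name = 'MYCOMP_TABLE';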