
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Real-World Performance Training: Loading Data

Real-World Performance Team


Agenda

1. The DW/BI Death Spiral
2. Parallel Execution
3. Loading Data
4. Exadata and Database In-Memory
5. Dimensional Queries

Retail Demo


The Schema

Oracle Retail Data Warehouse


Retail Demonstration
Table Sizes

Table         Size of Source Data (GB)   Number of Rows (Millions)
Transactions                      51.8                       463.7
Payments                          54.2                       463.7
Line Items                       940.8                     6,980.6
Total                          1,046.9                     7,908.0

Retail Demonstration
Table Sizes – Default Compression

Table         Size of Table (GB)   Compression Ratio
Transactions                29.1            1.78 : 1
Payments                    29.2            1.86 : 1
Line Items                 257.1            3.66 : 1
Total                      315.5            3.32 : 1

Retail Demonstration
Table Sizes – Hybrid Columnar Compression

Table         Size of Table (GB)   Compression Ratio
Transactions                 4.8           10.82 : 1
Payments                     4.9           10.99 : 1
Line Items                  55.0           17.11 : 1
Total                       64.7           16.18 : 1

Retail Demo

• Setup
• Initial Data Load
• Flash - Scan and Count Rows
• Gather Statistics
• Load Daily Data
• Validate Daily Data
• Transform Daily Data
• Exchange In Daily Data
• Gather Incremental Stats

Loading a Data Warehouse

• Two broad approaches

– ETL: Extract Transform Load

– ELT: Extract Load Transform

Oracle Confidential – Internal/Restricted/Highly Restricted 33


Loading a Data Warehouse
ETL – Extract Transform Load

• Extract the data from the source system. In many cases, this is the Data Warehouse itself
• Perform Transformation and Validation, usually on some middle tier server
• Load the data into the Data Warehouse
– Often the data is written to the Data Warehouse using DML operations: inserts, updates and deletes. In turn, this may require indexes in order to perform well
• A whole business has been developed around “data integration” products and services, such as
– Informatica
– Ab Initio

Loading a Data Warehouse
ELT – Extract Load Transform

• Extract the data from the source system
• Load the data as-is into “staging” tables on the Data Warehouse system
• Validation and Transformation are performed via SQL and set-based processing techniques
• Final data is added to the target fact or dimension table
– Partition Exchange is an effective technique for this step

Loading a Data Warehouse
Extract

• Extracting data from a source system is often the most challenging step
– What tools are available depends on the data source
– For Oracle, there is no “data unload” product
    • Home grown tools
    • FastReader from WisdomForce (now Informatica)
    • Data Pump Export, Transportable Tablespaces
– Compression Benefits
    • Reduced time to copy data over the network
    • Increased load performance
– Where will the data be staged?
    • DBFS, NFS, ZFS
    • USB Drive!

Loading a Data Warehouse
Loading

• Data Loading is a CPU/Memory constrained operation
– Data loads scale well over multiple CPUs, cores and hosts (assuming no other form of contention)
– Memory usage for metadata associated with highly partitioned objects can become significant at high DOP
• Use external tables with a parallel SQL statement (e.g. CTAS, CREATE TABLE AS SELECT, or IAS, INSERT AS SELECT) to minimize on-disk and in-memory metadata. Do NOT use multiple threads of SQL*Loader
– Using external tables is also much simpler than having to manage multiple threads of SQL*Loader

Loading a Data Warehouse
Loading

• Direct Path Load
– Enabled using the APPEND hint
– Default for CTAS and for Parallel Inserts
• Why Direct Path Load?
– Allows a single parallel insert operation to efficiently load data from multiple parallel server processes
    • Significant performance improvement for parallel DML/DDL operations
– Required for basic/default and HCC compression
– Minimal undo, and no redo for the loaded data when the table is NOLOGGING
– Bypasses buffer cache
• Possible Issues
– Only one direct path load into a table/partition at a time
– NOLOGGING loads generate no redo for the data, so they are not propagated to a Data Guard standby
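As a rough illustration of the direct path bullets, the sketch below shows a parallel INSERT with the APPEND hint. The table names SALES_STAGE and SALES_EXT and the degree of parallelism of 8 are assumptions made for the example; none of them come from the slides.

```sql
-- Hypothetical names: SALES_STAGE (target) and SALES_EXT (source).

-- Parallel DML is disabled by default and has to be enabled per session.
alter session enable parallel dml;

-- APPEND requests a direct path load: rows are formatted into new blocks
-- above the high water mark instead of passing through the buffer cache.
insert /*+ append parallel(t, 8) */ into SALES_STAGE t
select /*+ parallel(s, 8) */ *
from   SALES_EXT s;

-- The loading session cannot query SALES_STAGE until the transaction ends;
-- a query issued before this commit fails with ORA-12838.
commit;
```

One way to confirm that the load really used direct path is to look at the execution plan: a direct path insert appears as LOAD AS SELECT, while a conventional insert appears as LOAD TABLE CONVENTIONAL.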

Loading a Data Warehouse
Anatomy of an External Table

External Table Definition

create table FAST_LOAD
(
  column definition list ...
)
organization external
(
  type oracle_loader
  default directory SPEEDY_FILESYSTEM
  access parameters
  (
    records delimited by newline
    preprocessor exec_file_dir:'zcat.sh'
    characterset 'ZHS16GBK'
    badfile ERROR_DUMP:'FAST_LOAD.bad'
    logfile ERROR_DUMP:'FAST_LOAD.log'
    fields
    (
      file column mapping list ...
    )
  )
  location
  ('file_1.gz', 'file_2.gz', 'file_3.gz', 'file_4.gz')
)
reject limit 1000
parallel 4
/

• default directory: reference the mount point
• preprocessor: uncompress the data using a secure wrapper
• characterset: the character set must match the character set of the files
• location: note the compressed files. The number of files should match or be a multiple of the DoP
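Once an external table such as FAST_LOAD exists, the load itself can be a single parallel statement. The following is a minimal sketch, assuming a staging table called STAGE_1 and that NOLOGGING and basic compression are acceptable for staging data; those choices are illustrative and not taken from the slides.

```sql
-- STAGE_1 is an assumed staging table name; FAST_LOAD is the external
-- table from the definition slide. The DOP of 4 matches its four files.
create table STAGE_1
  nologging
  compress
  parallel 4
as
select /*+ parallel(f, 4) */ *
from   FAST_LOAD f;
```

CTAS always uses direct path, so no APPEND hint is needed here. Rows that fail to parse are written to the bad file named in the external table definition, up to the reject limit.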

Loading a Data Warehouse
Validation and Transformation

• Elimination of duplicates
– Outer Join back to the table
– Window function
– Aggregate with HAVING clause
• Foreign Key References
– Outer Joins between tables
• The choice of techniques will be dependent on the following
– Good/Bad validation of the data
– The desire to identify and locate bad rows e.g. find ROWIDs
– The desire to programmatically eliminate bad rows

Data Validation SQL
Duplicate Rows

Simply check the data:

select pk, count(*)
from   DIRTY_DATA
group  by pk
having count(*) > 1;

Obtain one of the ROWIDs of duplicates to investigate:

select pk, count(*), max(rowid)
from   DIRTY_DATA
group  by pk
having count(*) > 1;

Query the rows you wish to keep, eliminating duplicates based on the load time:

select column_list
from
(
  select a.*,
         row_number() over
         (
           partition by pk
           order by load_time desc
         ) rowno
  from   DIRTY_DATA a
)
where rowno = 1;

Data Validation SQL
Orphaned Row Check

Look for orphans (child rows with no matching parent):

select C.rowid
from   PARENT P
right outer join CHILD C
on     P.pk = C.fk
where  P.pk is null;

Look for parents with no children:

select P.rowid
from   PARENT P
left outer join CHILD C
on     P.pk = C.fk
where  C.fk is null;

Loading a Data Warehouse
Data Transformation vs Data Modification

• Data Transformation
– Change data by performing transformations into a new table
– Consistent and Predictable Performance
– Supports Direct Path Loads and Compression
• Data Modification
– Change data in place
– Update, Delete, Insert
– Overhead and performance impact of changing existing blocks
– Does not work well with compression

Loading a Data Warehouse
Data Transformation SQL

• Use either
– INSERT /*+ APPEND */ INTO … SELECT
– CREATE TABLE … AS SELECT
• Using an INSERT
– Constraints such as NOT NULL can be correctly applied and enforced
– Data types, column lengths and precision can be defined and preserved
• Using a CTAS
– DDL (not DML)
– Some optimizations are available that are currently disabled for DML. This may change over time.
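To make the INSERT versus CTAS trade-off concrete, here is a small sketch of both forms. The staging table names and the columns (pk, symbol, amount, load_time) are invented for the example.

```sql
-- Hypothetical tables and columns, for illustration only.

-- Option 1: pre-create the target, then direct path insert.
-- Data types, lengths and NOT NULL constraints are declared explicitly
-- and are enforced during the load.
create table STAGE_2
( pk         number        not null
, symbol     varchar2(10)  not null
, amount     number(12,2)
, load_time  date          not null
);

insert /*+ append */ into STAGE_2 (pk, symbol, amount, load_time)
select pk, symbol, amount, load_time
from   STAGE_1;

commit;

-- Option 2: CTAS. Column definitions are derived from the SELECT list,
-- so a CAST is needed wherever a specific type is required.
create table STAGE_2_CTAS
as
select pk
,      cast(symbol as varchar2(10)) as symbol
,      cast(amount as number(12,2)) as amount
,      load_time
from   STAGE_1;
```

With the INSERT form, a row that violates a NOT NULL constraint makes the statement fail, which is often the desired behaviour for a validation stage.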

Rewriting DML as Transformation
Delete

Original DML:

alter session enable parallel dml;

delete from tx_log
where  symbol = 'JAVA';

commit;

Rewritten as a transformation:

alter session enable parallel dml;

insert /*+ append */ into tx_log_new
select * from tx_log
where  symbol != 'JAVA'
or     symbol is null;   -- the DELETE leaves NULL symbols in place, so keep them

commit;

alter table tx_log rename to tx_log_old;
alter table tx_log_new rename to tx_log;

The predicate is the complement of the DELETE predicate: it selects the rows to keep. Rows with a NULL symbol are not matched by the DELETE, so the rewrite has to keep them explicitly.

Rewriting DML as Transformation
Update

Original DML:

alter session enable parallel dml;

update sales_ledger
set    tax_rate = 9.9
where  tax_rate = 9.3
and    sales_date > date '2009-01-01';

commit;

Rewritten as a transformation:

alter session enable parallel dml;

insert /*+ append */ into sales_ledger_new
select
  <column list>,
  case
    when sales_date > date '2009-01-01'
     and tax_rate = 9.3
    then 9.9
    else tax_rate
  end,
  <column list>
from sales_ledger;

commit;

The UPDATE predicates are moved to the SELECT list in a CASE expression to transform the rows.

Loading a Data Warehouse
Example Workflow

• An example workflow may be:
– Load data into first staging table
    • Basic data integrity, nulls, data types etc.
– Check the data, writing “good” data to a second staging table
    • Uniqueness, foreign keys, business rules etc.
    • Apply constraints with “RELY DISABLE NOVALIDATE”
– Transform the data into a third staging table
    • Tax corrections, time zone corrections, consolidate codes etc.
    • Gather statistics on final staging table, including synopsis
– Use Partition Exchange to “swap in” the staging table to the Fact table
    • Gather Global statistics, which will be rolled up using partition synopses
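The “RELY DISABLE NOVALIDATE” step might look like the sketch below. The staging table STAGE_2, its columns pk and fk, and the constraint names are assumptions for illustration; PARENT stands in for whatever dimension table the foreign key points to.

```sql
-- Hypothetical names: STAGE_2, its columns pk and fk, and the constraint
-- names. The data has already been checked with set-based SQL, so the
-- constraints are declared as metadata only: not enforced (DISABLE), not
-- re-checked (NOVALIDATE), but trusted by the optimizer (RELY).
alter table STAGE_2
  add constraint stage_2_pk primary key (pk)
  rely disable novalidate;

-- Assumes PARENT already has a primary key on pk that is itself marked
-- RELY; Oracle rejects a RELY foreign key that references a NORELY key.
alter table STAGE_2
  add constraint stage_2_parent_fk foreign key (fk)
  references PARENT (pk)
  rely disable novalidate;
```

Because nothing is validated, declaring these constraints is a metadata-only operation and does not rescan the staging data.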

Example Workflow
Load Data into Staging Table

[Diagram: Daily Partitioned Table (partitions 5-25 to 5-31) with Partition Synopses; staging tables Stage_1, Stage_2, Stage_3; error tables Stage_1_err, Stage_2_err]
• Load data from external table into stage_1

Data Load
Validation

[Diagram: Daily Partitioned Table (partitions 5-25 to 5-31) with Partition Synopses; staging tables Stage_1, Stage_2, Stage_3; error tables Stage_1_err, Stage_2_err]
• Valid data transformed into stage_2
• Invalid data transformed into stage_1_err
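One possible way to split valid and invalid rows in a single pass is a multitable insert. This is a sketch only: the columns (pk, fk, amount, load_time) and the PARENT table are assumed, and the rules applied (orphaned foreign keys and all but the newest copy of a duplicate key count as invalid) are just an example of the checks described earlier.

```sql
-- Hypothetical columns and PARENT table; STAGE_1, STAGE_2 and STAGE_1_ERR
-- follow the names in the workflow diagram. Assumes PARENT.pk is unique,
-- so the outer join does not multiply staging rows.
alter session enable parallel dml;

insert /*+ append */ first
  when parent_pk is null or rowno > 1 then
    into STAGE_1_ERR (pk, fk, amount, load_time)
    values (pk, fk, amount, load_time)
  else
    into STAGE_2 (pk, fk, amount, load_time)
    values (pk, fk, amount, load_time)
select s.pk
,      s.fk
,      s.amount
,      s.load_time
,      p.pk as parent_pk
,      row_number() over (partition by s.pk
                          order by s.load_time desc) as rowno
from   STAGE_1 s
left outer join PARENT p
on     p.pk = s.fk;

commit;
```

STAGE_1 is read once, and every row lands in exactly one of the two targets, so the row counts of STAGE_2 and STAGE_1_ERR should add up to the row count of STAGE_1.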

Data Load
Transformation

[Diagram: Daily Partitioned Table (partitions 5-25 to 5-31) with Partition Synopses; staging tables Stage_1, Stage_2, Stage_3; error tables Stage_1_err, Stage_2_err]
• Data transformation into stage_3: VAT codes, time zones etc.
• Invalid data transformed into stage_2_err

Data Load
Gather Statistics

[Diagram: Daily Partitioned Table (partitions 5-25 to 5-31) with Partition Synopses; staging tables Stage_1, Stage_2, Stage_3; error tables Stage_1_err, Stage_2_err]
• Gather statistics
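A sketch of the statistics step, assuming the partitioned fact table is called SALES_FACT (a made-up name) and that the database is 12c or later, where a synopsis can be created on a non-partitioned staging table:

```sql
-- SALES_FACT is a hypothetical name for the daily partitioned fact table.
begin
  -- Maintain partition synopses on the fact table so that global statistics
  -- can be derived from them instead of rescanning every partition.
  dbms_stats.set_table_prefs(user, 'SALES_FACT', 'INCREMENTAL', 'TRUE');

  -- Request a table-level synopsis on the staging table so that it can
  -- serve as the partition synopsis after the exchange (12c onwards).
  dbms_stats.set_table_prefs(user, 'STAGE_3', 'INCREMENTAL', 'TRUE');
  dbms_stats.set_table_prefs(user, 'STAGE_3', 'INCREMENTAL_LEVEL', 'TABLE');

  -- Gather statistics, including the synopsis, on the final staging table.
  dbms_stats.gather_table_stats(ownname => user, tabname => 'STAGE_3');
end;
/
```

The preferences only need to be set once per table; the gather call is the part that runs with every daily load.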

Data Load
Partition Exchange

[Diagram: Daily Partitioned Table (partitions 5-25 to 5-31) with Partition Synopses; staging tables Stage_1, Stage_2, Stage_3; error tables Stage_1_err, Stage_2_err]
• Exchange stage_3 with partition
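The exchange itself is a single DDL statement. In this sketch the fact table name SALES_FACT and the partition name P_0531 are hypothetical, and the follow-up statistics call assumes the INCREMENTAL preference is already enabled on the fact table.

```sql
-- SALES_FACT and P_0531 are assumed names for the fact table and the
-- empty daily partition that receives the new data.
alter table SALES_FACT
  exchange partition P_0531
  with table STAGE_3
  including indexes
  without validation;

-- With INCREMENTAL enabled on SALES_FACT and default gather options, this
-- refreshes the global statistics by aggregating the partition synopses.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES_FACT');
end;
/
```

WITHOUT VALIDATION skips the check that every row belongs in the partition, which is reasonable here only because the earlier staging steps have already validated the data.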

Data Transformation SQL
Transformation vs. Modification

Each driver is listed with its effect under Transformation and under Modification:

• Compression
– Transformation: No impact
– Modification: Compression may be lost, which can severely impact performance
• Fragmentation
– Transformation: None
– Modification: Fragmentation, row chaining, and holes will almost certainly take place
• Logging and UNDO
– Transformation: Possible to eliminate
– Modification: Will take place and may impact performance and administration requirements
• Indexes
– Transformation: Indexes need to be rebuilt if used
– Modification: Indexes will be maintained in place. This may be a performance overhead, and bitmap indexes may become fragmented and require rebuilding
• Meta Data
– Transformation: Grants etc. will require redefinition
– Modification: No impact
• Space
– Transformation: Overhead of maintaining multiple copies of the data
– Modification: Overhead of UNDO and logging
• Coding
– Transformation: New code must be written and new techniques taught
– Modification: Old code runs, with performance challenges
• 3rd Party Issues
– Transformation: May not be supported by tool vendors

Loading a Data Warehouse
Summary

• Data validation and modification
– Best executed in the database via SQL
– This presents big challenges to users who have committed to classic ETL tools such as Informatica
– Changes to data are better made via transformation and redefinition than via classic OLTP DML statements (delete, update, merge)
    • Allows exploitation of hardware and parallelism
    • Minimizes fragmentation and maximizes compression
    • Minimizes logging and minimizes recovery
– Set-based techniques make efficient use of CPU and I/O