Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 1


Big Data Connectors: High Performance Integration for Hadoop and Oracle Database

Melli Annamalai, Sue Mavris, Rob Abbott

Program Agenda

 Big Data Connectors: Brief Overview
 Connecting Hadoop with Oracle Database
   –  Oracle Direct Connector for HDFS
   –  Oracle Loader for Hadoop
   –  Performance

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Oracle’s Big Data Platform

[Figure: platform stages: Stream, Acquire, Organize & Discover, Analyze, Visualize & Decide]

Oracle’s Big Data Platform

[Figure: Hadoop and Oracle Database]

Oracle Big Data Connectors

 Oracle Direct Connector for HDFS

 Oracle Loader for Hadoop

 Oracle R Connector for Hadoop

 Oracle Data Integrator Application Adapters for Hadoop

Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

 Access data resident on Hadoop from Oracle Database
 Load data from Hadoop into Oracle Database
 Analyze all data together:
   –  Data processed on Hadoop along with data in Oracle Database

Oracle R Connector for Hadoop
R Analytics leveraging Hadoop and HDFS

 Linearly Scale a Robust Set of R Algorithms
 Leverage MapReduce for R Calculations
 Compute Intensive Parallelism for Simulations

[Figure: Oracle R Client, Hadoop (MAP and REDUCE tasks), HDFS]

Oracle Data Integrator Application Adapters for Hadoop
Improving Productivity and Efficiency for Big Data

[Figure labels: Activates; Transforms via MapReduce (Hive); Loads; Oracle Database]

Benefits
 Consistent tooling across BI/DW, SOA, Integration and Big Data
 Reduces complexities of processing Hadoop through graphical tooling
 Improves productivity when processing Big Data (Structured + Unstructured)

Big Data Connectors

ORACLE LOADER FOR HADOOP
ORACLE DIRECT CONNECTOR FOR HDFS

Loading and Accessing Data from Hadoop

[Figure: MapReduce jobs on Hadoop (INPUT 1, INPUT 2, LOG FILES; MAP, SHUFFLE/SORT, and REDUCE stages) and Oracle Database]

Example Use Case

BUSINESS PROBLEM: Need insight into customer web activity (clickstream data)
CONNECT HADOOP WITH ORACLE DATABASE: Aggregate raw data and load into database for analysis

BUSINESS PROBLEM: Need to connect web activity with transactional activity
CONNECT HADOOP WITH ORACLE DATABASE: Perform analysis on in-place data by running Oracle SQL queries

Usage Scenarios

 Bulk load large volumes of data
   –  Example: Historical data, daily uploads of data gathered during the day
 Loads at regular frequency
   –  Example: 24/7 monitoring of log feeds
 Loads at irregular frequency
   –  Example: Monitoring of sensor feeds
 Access data files in place on HDFS

Oracle Direct Connector for HDFS
Accessing HDFS Data from Oracle Database

[Figure: SQL Query, External Table, and HDFS Client in Oracle Database; HDFS]

Features
 Access and analyze data in place on HDFS
 Query and join data on HDFS with database resident data
 Load into the database using SQL if required
 Automatic load balancing to maximize performance
 Access or load into the database in parallel using external table mechanism

Oracle Direct Connector for HDFS
External Tables

 Access data on HDFS via external tables
   –  No DML operations, and no indexes can be created on external tables
 Data files can be text files or Oracle Data Pump files (created by Oracle Loader for Hadoop)
 Parallelism is controlled by the external table definition
 Data files are grouped to distribute load evenly across PQ slaves
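The slide says data files are grouped to distribute load evenly across PQ slaves but does not say how. One plausible way to picture it is a greedy largest-first assignment, sketched below in Python. This is an illustration only, not the connector's actual algorithm; the function name `group_files`, the file names, and the sizes are all hypothetical.

```python
import heapq

def group_files(file_sizes, num_groups):
    """Greedy largest-first grouping: give each file to the currently lightest group."""
    # Heap entries are (total_size, group_index), so the lightest group pops first.
    heap = [(0, i) for i in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    # Place the biggest files first so small files can even out the totals later.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups

# Hypothetical HDFS data files and their sizes in MB (not from the slides).
sizes = {"part-0": 900, "part-1": 500, "part-2": 400, "part-3": 300, "part-4": 300}
groups = group_files(sizes, 2)
totals = [sum(sizes[f] for f in g) for g in groups]
print(groups, totals)
```

With these made-up sizes, each of the two groups ends up with 1200 MB, so two parallel query slaves would read roughly equal volumes.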

Oracle Direct Connector for HDFS
3 Simple Steps

 Create external table
 Run the Oracle Direct Connector for HDFS utility to publish HDFS content to the external table
 Access and load into the database using SQL

>hadoop jar \
   $ODCH_HOME/jlib/orahdfs.jar \
   oracle.hadoop.hdfs.extab.ExternalTable \
   -conf MyConf.xml \
   -publish

Performance Comparison

[Chart: Load rate (TB/hour), Fuse-DFS vs. Oracle Direct Connector for HDFS; axis 0 to 6]

[Chart: CPU Usage (CPU seconds used per GB), Fuse-DFS vs. Oracle Direct Connector for HDFS; axis 0 to 180]

Key Benefits

 Uniquely enables access to HDFS data files from Oracle Database
 Performance
   –  12 TB/hour from Oracle Big Data Appliance to Oracle Exadata
   –  5x to 20x faster than comparable third party products
 Easy to use for Oracle DBAs and Hadoop developers
 Developed and supported by Oracle

Oracle Loader for Hadoop

[Figure: ORACLE LOADER FOR HADOOP as a MapReduce job with MAP, SHUFFLE/SORT, and REDUCE stages]

Features
 Offloads data pre-processing from the database server to Hadoop
 Works with a range of input data formats
 Handles skew in input data to maximize performance
 Online and offline modes (offline: create Oracle Data Pump files on HDFS)
 Connect to the database from reducer nodes, load into database partitions in parallel (JDBC or direct path)
 Read target table metadata from the database
 Partition, sort, and convert into Oracle data types on Hadoop

Oracle Loader for Hadoop
Input Formats

 Delimited text InputFormat
 Hive tables InputFormat
 Avro record InputFormat
 User written InputFormat
 (Planned) Regular expression InputFormat
 (Planned) Oracle NoSQL Database InputFormat

Automatically Handle Input Data Skew
Load Balancing across Reducers

 Distribute load evenly across reduce tasks
   –  All reducers do approximately the same amount of work
   –  Avoids slowdown because of unbalanced reducer loads
   –  Maximizes performance
 Data is sampled to determine optimal partitioning of map output keys
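To make the sampling idea concrete, the Python sketch below picks reducer key ranges from quantiles of a random sample, so that skewed keys still spread roughly evenly across reducers. It illustrates the general technique of sample-based range partitioning, not Oracle Loader for Hadoop's actual implementation. The function names, the synthetic key distribution, and the sample size are assumptions made for the example.

```python
import bisect
import random
from collections import Counter

def choose_boundaries(sample_keys, num_reducers):
    """Pick num_reducers - 1 split points at evenly spaced quantiles of the sample."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]

def assign_reducer(key, boundaries):
    """Return the index of the reducer whose key range contains key."""
    return bisect.bisect_right(boundaries, key)

rng = random.Random(0)
# Synthetic skewed keys (assumption): most values cluster near zero, with a long tail.
keys = [rng.expovariate(1 / 50.0) for _ in range(20000)]
sample = rng.sample(keys, 1000)          # a small sample stands in for the full input
boundaries = choose_boundaries(sample, 4)
load = Counter(assign_reducer(k, boundaries) for k in keys)
print(boundaries, dict(load))
```

Because the split points follow the sampled distribution rather than fixed-width key ranges, each of the four reducers receives close to a quarter of the records even though the keys themselves are heavily skewed.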

Oracle Loader for Hadoop
2 Simple Steps

 Create target table
 Submit Oracle Loader for Hadoop job to the cluster

>hadoop jar \
   $OLH_HOME/jlib/oraloader.jar \
   oracle.hadoop.loader.OraLoader \
   -conf MyConf.xml

Performance Comparison
Third party products

[Chart: Load rate (TB/hour), comparable third party product vs. Oracle Loader for Hadoop; axis 0 to 2.5]

[Chart: CPU Usage (CPU seconds used per GB), comparable third party product vs. Oracle Loader for Hadoop; axis 0 to 700]

Key Benefits

 Load directly from HDFS, Hive tables, … into Oracle Database without intermediate staging files
 Performance
   –  10x faster than comparable third party products
 Offload database server processing on to Hadoop
   –  Minimizes impact on performance SLAs of production applications
 Easy to use for Oracle DBAs and Hadoop developers
 Developed and supported by Oracle

Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

[Figure: ORACLE LOADER FOR HADOOP (MAP, SHUFFLE/SORT, and REDUCE stages) and ORACLE DIRECT CONNECTOR FOR HDFS (SQL Query, External Table, HDFS Client) with Oracle Database]

Performance Summary

•  12 TB / HOUR (66 BILLION ROWS)
•  5 – 20 TIMES FASTER THAN THIRD PARTY PRODUCTS
•  REDUCED DATABASE CPU USAGE IN COMPARISON
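As a rough sanity check on the headline figure, the Python snippet below converts 12 TB/hour and 66 billion rows into per-second and per-row terms. It assumes decimal terabytes and assumes that the 66 billion rows are the row count for one hour of loading at that rate; the slide states neither explicitly.

```python
TB = 10**12                      # assumption: decimal terabytes
bytes_per_hour = 12 * TB         # load rate quoted on the slide
rows_per_hour = 66 * 10**9       # assumption: 66 billion rows loaded in that same hour

gb_per_second = bytes_per_hour / 3600 / 10**9
rows_per_second = rows_per_hour / 3600
bytes_per_row = bytes_per_hour / rows_per_hour

print(f"{gb_per_second:.2f} GB/s, "
      f"{rows_per_second / 1e6:.1f} million rows/s, "
      f"{bytes_per_row:.0f} bytes/row")
```

Under those assumptions the rate works out to about 3.33 GB/s, about 18.3 million rows per second, and an average row size of about 182 bytes.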

Summary

 High performance connectors for load and access of data from a Hadoop cluster
 Fast and efficient connectors support a range of use cases
 Simple to set up, easy to use for developers
 Developed and supported by Oracle

Q & A
