big data analytics - indico


Post on 26-Feb-2022


Page 1: Big Data Analytics - Indico
Page 2: Big Data Analytics - Indico

Oracle’s Big Data Solutions

Jean-Philippe Breysse

Oracle Suisse

Page 3: Big Data Analytics - Indico

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

Page 4: Big Data Analytics - Indico

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

USE CASE 3: LOG ANALYSIS OF SERVERS

Short description:
– Daily log analysis

Issues:
– Find correlations that reveal what leads to failures
– Log files are stored as flat files
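A first pass over flat log files like these could count which messages tend to appear shortly before a failure on the same server. The sketch below is only an illustration: the line layout (`<timestamp> <host> <LEVEL> <message>`), the use of `ERROR` as the failure marker, and the window size are all assumptions, since the slides do not describe the log format.

```python
from collections import Counter, defaultdict, deque


def events_before_failures(lines, window=3):
    """Count messages seen in the last `window` lines before each ERROR, per host."""
    # One sliding window of recent non-error messages per host.
    recent = defaultdict(lambda: deque(maxlen=window))
    counts = Counter()
    for line in lines:
        # Assumed flat-file layout: "<timestamp> <host> <LEVEL> <message>"
        parts = line.strip().split(" ", 3)
        if len(parts) < 4:
            continue  # skip malformed lines
        _, host, level, message = parts
        if level == "ERROR":
            counts.update(recent[host])  # credit everything that preceded the failure
            recent[host].clear()
        else:
            recent[host].append(message)
    return counts


sample = [
    "2012-01-01T00:00:01 srv1 INFO disk usage 91%",
    "2012-01-01T00:00:02 srv1 WARN swap in use",
    "2012-01-01T00:00:03 srv1 ERROR service crashed",
    "2012-01-01T00:01:01 srv2 WARN swap in use",
    "2012-01-01T00:01:02 srv2 ERROR service crashed",
]
print(events_before_failures(sample).most_common(1))
```

Counting co-occurrences this way only suggests candidates; a real correlation study would also need the frequency of each message when no failure follows.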

Page 5: Big Data Analytics - Indico


Oracle Technology mapped to Analytics Landscape

[Figure: matrix mapping Oracle products to the analytics landscape.

Columns (stages): Data, Acquire, Organize, Analyze, Decide

Rows (data structure): Structured, Semi-structured, Unstructured

Data types: Master & Reference; Transactions; Machine Generated; Text, Image, Video, Audio

Products shown: Oracle 12g, Files, Oracle NoSQL, Oracle Hadoop HDFS, Oracle Data Integrator, Oracle Hadoop MapReduce, Oracle Essbase, Oracle R Enterprise & Oracle Data Mining, Oracle BI Enterprise, Oracle Real Time Decisions, Oracle Endeca Information Discovery, Oracle TimesTen, Endeca MDEX, Oracle GoldenGate]

Page 6: Big Data Analytics - Indico

Agenda

• Big Data

• Solution Spectrum

• Inside the Big Data Appliance

• Big Data Applications Software

• Big Data Analytics

• Conclusions

Page 7: Big Data Analytics - Indico

Big Data

Why Everyone Should Care

Page 8: Big Data Analytics - Indico

Tapping into Diverse Data Sets

Transactions

Information Architectures

Today:

Decisions based on database data

Big Data:

Decisions based on all your data

Video and Images

Machine-Generated Data Social Data

Documents

Page 9: Big Data Analytics - Indico


A bit of history ...

Hadoop: initially developed by Doug Cutting (creator of Nutch, an open-source web search engine) and Yahoo. Inspired by Google’s papers on GFS and MapReduce (2003-2004), the work resulted in Apache Hadoop (2006).

Amazon Dynamo (2007): a highly available distributed key-value store; its design influenced later distributed data stores.

Cassandra: developed at Facebook (2008) to power its Inbox Search feature. A column-oriented distributed database, initially based on Dynamo and on Bigtable (built by Google).

Voldemort: a distributed key-value data store used by LinkedIn for high-scalability storage (NoSQL key-value).

Cloudera: contributes to Hadoop and related Apache projects and provides a commercial distribution of Hadoop.

Page 10: Big Data Analytics - Indico


So What is Big Data Anyway?

It’s a matter of perspective. Big Data is both:

• LARGE AND VARIABLE DATASETS that are difficult for traditional database tools to manage easily, including datasets that once seemed unimportant or too problematic to deal with. Big Data datasets include:
− Extremely large files of unstructured or semi-structured data
− Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information

• A NEW SET OF TECHNOLOGIES that can economically capture, store, manage, and extract value from Big Data datasets, thus facilitating better, more informed business decisions

Structured Data vs. Unstructured Data

Relational databases work best with structured data: data that has an underlying structure (schema) and a size that fits easily within the confines of database columns and rows. Unstructured data is highly variable, lacks a fixed structure, and is often too large for an RDBMS to handle easily.

Source: IDC Digital Universe Study, Extracting Value from Chaos, June 2011 (sponsored by EMC)

Page 11: Big Data Analytics - Indico

Drive Value from Big Data

Building a Big Data Platform

Page 12: Big Data Analytics - Indico

Divided Solution Spectrum

[Figure: solution spectrum across the stages Acquire, Organize, Analyze.

Components shown: Distributed File Systems, Transaction (Key-Value) Stores, MapReduce Solutions, ETL, DBMS (OLTP), DBMS (DW), Advanced Analytics

Camp labels: NoSQL (flexible, specialized, developer-centric); SQL (trusted, secure, administered)

Axis: Data Variety, from schema-less/unstructured to schema]

Page 13: Big Data Analytics - Indico

Hadoop to Oracle – Bridging the Gap

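[Figure: the same spectrum across the stages Acquire, Organize, Analyze, with named technologies.

Components shown: HDFS, Cassandra, Hadoop MapReduce, ETL, RDBMS (OLTP), RDBMS (DW), Advanced Analytics, with Oracle Loader for Hadoop as the bridge between the Hadoop and RDBMS sides

Axis: Data Variety, from schema-less/unstructured to schema]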

Page 14: Big Data Analytics - Indico

Oracle Integrated Software Solution

[Figure: Oracle integrated software across the stages Acquire, Organize, Analyze.

Components shown: Hadoop HDFS, Oracle NoSQL DB, MapReduce, Oracle Data Integrator, Oracle Loader for Hadoop, Oracle (OLTP), Oracle (DW), Oracle Analytics (Data Mining, R, Spatial, Graph), OBI EE

Axis: Data Variety, from schema-less/unstructured to schema]

Page 15: Big Data Analytics - Indico

Inside the Big Data Appliance

Overview

Page 16: Big Data Analytics - Indico

Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Oracle Engineered Solutions

[Figure: Oracle engineered systems across the stages Acquire, Organize, Analyze.

Components shown: HDFS, Hadoop, Oracle NoSQL DB, Oracle Data Integrator, Oracle Loader for Hadoop, Oracle Database (OLTP), Oracle Database (DW), In-DB Analytics (“R”, Mining, Text, Graph, Spatial), Oracle BI EE

Axes: Data Variety (unstructured to schema); Information Density]

Big Data Appliance
• Hadoop
• NoSQL Database
• Oracle Loader for Hadoop
• Oracle Data Integrator

Oracle Exadata
• OLTP & DW
• Data Mining & Oracle R
• Semantics
• Spatial

Exalytics
• Speed of Thought Analytics

Page 17: Big Data Analytics - Indico

Big Data Appliance Usage Model

[Figure: Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics linked by InfiniBand. Stage labels: Stream, Acquire, Organize, Analyze & Visualize]

Page 18: Big Data Analytics - Indico

Oracle Big Data Appliance Hardware

• 18 Sun X4270 M2 servers
– 48 GB memory per node = 864 GB memory
– 12 Intel cores per node = 216 cores
– 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet
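The rack totals quoted on this slide are simply the per-node figures multiplied by the 18 nodes. A minimal sketch of that arithmetic, with illustrative names that do not come from any Oracle tool:

```python
# Per-node figures quoted on the slide for one full rack.
NODES_PER_RACK = 18
PER_NODE = {"memory_gb": 48, "cores": 12, "storage_tb": 24}


def rack_totals(racks=1):
    """Return aggregate raw capacity for the given number of full racks."""
    nodes = NODES_PER_RACK * racks
    return {name: value * nodes for name, value in PER_NODE.items()}


print(rack_totals())  # {'memory_gb': 864, 'cores': 216, 'storage_tb': 432}
```

Passing a larger `racks` value gives the raw totals for a multi-rack configuration under the same per-node assumptions.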

Page 19: Big Data Analytics - Indico

Scale Out to Infinity

Scale out by connecting racks to each other using InfiniBand

• Expand up to eight racks without additional switches

• Scale beyond eight racks by adding an additional switch

Page 20: Big Data Analytics - Indico

Oracle Big Data Appliance Software

• Oracle Enterprise Linux 5.6
• Oracle HotSpot Java VM
• Cloudera’s Distribution including Apache Hadoop
• Cloudera Manager
• Open Source Distribution of R
• Oracle NoSQL Database Community Edition

Page 21: Big Data Analytics - Indico

Big Data Application Software

Acquire New Information

Page 22: Big Data Analytics - Indico

Key-Value Store Workloads

• Large dynamic schema based data repositories
• Data capture
– Web applications (click-through capture)
– Online retail
– Sensor/statistics/network capture (factory automation for example)
– Backup services for mobile devices
• Data services
– Scalable authentication
– Real-time communication (MMS, SMS, routing)
– Personalization
– Social Networks

Page 23: Big Data Analytics - Indico

Oracle NoSQL DB

A distributed, scalable key-value database

• Simple Data Model
– Key-value pair with major+sub-key paradigm
– Read/insert/update/delete operations
• Scalability
– Dynamic data partitioning and distribution
– Optimized data access via intelligent driver
• High availability
– One or more replicas
– Disaster recovery through location of replicas
– Resilient to partition master failures
– No single point of failure
• Transparent load balancing
– Reads from master or replicas
– Driver is network topology & latency aware
• Elastic (Planned for Release 2)
– Online addition/removal of Storage Nodes
– Automatic data redistribution

[Figure: applications, each with a NoSQLDB Driver, accessing Storage Nodes in Data Center A and Data Center B]
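The major+sub-key idea can be illustrated with a toy in-memory store. This is not the Oracle NoSQL DB API: the class, its method names, and the choice to hash only the major key when picking a partition are assumptions made for the sketch.

```python
import hashlib
from collections import defaultdict


class ToyKeyValueStore:
    """In-memory illustration of a major+sub-key store (hypothetical, not Oracle's API)."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # One dict per partition: {major_key: {sub_key: value}}
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def partition_for(self, major_key):
        # Hash only the major key, so all sub-keys of one major key
        # land on the same partition (an assumption of this sketch).
        digest = hashlib.sha256(major_key.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.num_partitions

    def put(self, major_key, sub_key, value):
        # Covers both insert and update.
        self.partitions[self.partition_for(major_key)][major_key][sub_key] = value

    def get(self, major_key, sub_key):
        record = self.partitions[self.partition_for(major_key)].get(major_key, {})
        return record.get(sub_key)

    def delete(self, major_key, sub_key):
        record = self.partitions[self.partition_for(major_key)].get(major_key, {})
        if sub_key in record:
            del record[sub_key]
            return True
        return False


store = ToyKeyValueStore()
store.put("user/42", "email", "a@example.com")
store.put("user/42", "phone", "555-0100")
print(store.get("user/42", "email"))
```

Keeping all sub-keys of a major key together means one partition can serve every field of a logical record, which is one plausible reason for splitting the key this way.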

Page 24: Big Data Analytics - Indico

Big Data Application Software

Organizing Data for Analysis

Page 25: Big Data Analytics - Indico


Oracle Loader for Hadoop Features

• Load data into a partitioned or non-partitioned table
– Single level, composite or interval partitioned table
– Support for scalar datatypes of Oracle Database
– Load into Oracle Database 11g Release 2
• Runs as a Hadoop job and supports standard options
• Pre-partitions and sorts data on Hadoop
• Online and offline load modes

Page 26: Big Data Analytics - Indico


Oracle Loader for Hadoop

[Figure: a chain of MapReduce jobs (MAP, SHUFFLE/SORT, and REDUCE stages) processing INPUT 1 and INPUT 2, with ORACLE LOADER FOR HADOOP as the last job in the flow]

Page 27: Big Data Analytics - Indico


Oracle Loader for Hadoop: Online Option

[Figure: ORACLE LOADER FOR HADOOP job with MAP, SHUFFLE/SORT, and REDUCE stages, annotated with the following steps]

• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Connect to the database from reducer nodes, load into database partitions in parallel
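The flow on this slide (read the table's partitioning metadata, partition and sort the records, then load each partition from its own reducer) can be mimicked in plain Python. This is a simulation of the idea, not Oracle Loader for Hadoop code: the partition ranges, the `year` and `id` fields, and `load_partition` are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical target-table metadata: partition name -> (low, high) year, high exclusive.
TABLE_PARTITIONS = {"p2010": (2010, 2011), "p2011": (2011, 2012)}


def partition_of(record):
    """Map step: pick the target partition for one record."""
    for name, (low, high) in TABLE_PARTITIONS.items():
        if low <= record["year"] < high:
            return name
    raise ValueError(f"no partition for year {record['year']}")


def shuffle_and_sort(records):
    """Shuffle/sort step: group records by partition and sort each group by id."""
    grouped = defaultdict(list)
    for record in records:
        grouped[partition_of(record)].append(record)
    return {name: sorted(rows, key=lambda r: r["id"]) for name, rows in grouped.items()}


def load_partition(name, rows):
    """Reduce step placeholder: a real reducer would write the rows to the database."""
    return len(rows)


records = [
    {"id": 3, "year": 2011},
    {"id": 1, "year": 2010},
    {"id": 2, "year": 2011},
]
loaded = {name: load_partition(name, rows) for name, rows in shuffle_and_sort(records).items()}
print(loaded)
```

Because each group maps to exactly one target partition, the per-partition loads are independent of each other and could run in parallel.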

Page 28: Big Data Analytics - Indico


Oracle Loader for Hadoop: Offline Option

[Figure: ORACLE LOADER FOR HADOOP job with MAP, SHUFFLE/SORT, and REDUCE stages, annotated with the following steps]

• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Write from reducer nodes to Oracle Data Pump files
• Import into the database in parallel using external table mechanism

Page 29: Big Data Analytics - Indico


Selecting an Output Option for Your Use Case

Oracle Loader for Hadoop

Output Option | Use Case Characteristics
Online load with JDBC | The simplest use case for non-partitioned tables
Online load with Direct Path | Fast online load for partitioned tables
Offline load with Data Pump files | Fastest load method for external tables

On Oracle Big Data Appliance

Direct HDFS | Leave data on HDFS; parallel access from database; import into database when needed
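The Oracle Loader for Hadoop rows of this table can be read as a simple decision rule. The function below is one possible encoding of that guidance; its name and its two boolean parameters are hypothetical, and it covers only the three loader options, not Direct HDFS.

```python
def choose_output_option(online, partitioned):
    """Return the loader output option the table suggests (illustrative only)."""
    if not online:
        # Offline mode writes Data Pump files that are read through external tables.
        return "Offline load with Data Pump files"
    if partitioned:
        return "Online load with Direct Path"
    return "Online load with JDBC"


print(choose_output_option(online=True, partitioned=True))
```

In practice the choice may also depend on factors the slide does not list, so treat this as a reading aid rather than a selection tool.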

Page 30: Big Data Analytics - Indico


Automate Usage of Oracle Loader for Hadoop

Oracle Data Integrator (ODI)

• ODI has knowledge modules to
– Generate data transformation code to run on Hive/Hadoop
– Invoke Oracle Loader for Hadoop
• Use the drag-and-drop interface in ODI to
– Include invocation of Oracle Loader for Hadoop in any ODI packaged flow

Page 31: Big Data Analytics - Indico


Page 32: Big Data Analytics - Indico

Big Data Analytics

Real Time Analytics Platform

Page 33: Big Data Analytics - Indico

R Statistical Programming Language

• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality plots
• Highly extensible with open source community R packages

Page 34: Big Data Analytics - Indico

Drive Value from Big Data

Conclusions

Page 35: Big Data Analytics - Indico

Big Data Appliance

Big Data for the Enterprise

• Optimized and Complete
– Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
– Analyze all your data
• Easy to Deploy
– Risk Free, Quick Installation and Setup
• Single Vendor Support
– Full Oracle support for the entire system and software set

Page 36: Big Data Analytics - Indico

Oracle Integrated Solution Stack for Big Data

[Figure: solution stack by stage.

ACQUIRE: HDFS, Oracle NoSQL Database, Enterprise Applications

ORGANIZE: Hadoop (MapReduce), Oracle Loader for Hadoop, Oracle Data Integrator

ANALYZE: Data Warehouse, In-Database Analytics

DECIDE: Oracle Analytic Applications]

Page 37: Big Data Analytics - Indico

Oracle: Big Data for the Enterprise

• The most comprehensive solution
– Includes everything needed to acquire, organize and analyze all your data
• Optimized for Extreme Analytics
– Deepest analytics portfolio with access to all data
• Engineered to Work Together
– Eliminate deployment risk and support risk
• Enterprise Ready
– Deliver extreme performance and scalability

Page 38: Big Data Analytics - Indico

Questions

Page 39: Big Data Analytics - Indico