big data analytics - indico
Oracle’s Big Data Solutions
Jean-Philippe Breysse
Oracle Suisse
The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions.
The development, release, and timing of any features or
functionality described for Oracle’s products remain at the
sole discretion of Oracle.
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
USE CASE 3: LOG ANALYSIS OF SERVERS
Short description:
– Daily log analysis
Issues:
– Find correlations that show what leads to failures
– Log files are stored as flat files
Oracle Technology mapped to Analytics Landscape
[Diagram: Oracle products mapped across the stages Acquire, Organize, Analyze, Decide]
Data structure: Structured, Semi-structured, Unstructured
Data types: Master & Reference; Transactions; Machine Generated; Text, Image, Video, Audio
Products shown: Oracle 12g; Files; Oracle NoSQL; Oracle Hadoop HDFS; Oracle Data Integrator; Oracle Hadoop MapReduce; Oracle Essbase; Oracle R Enterprise & Oracle Data Mining; Oracle BI Enterprise; Oracle Real Time Decisions; Oracle Endeca Information Discovery; Oracle TimesTen; Endeca MDEX; Oracle GoldenGate
Agenda
• Big Data
• Solution Spectrum
• Inside the Big Data Appliance
• Big Data Applications Software
• Big Data Analytics
• Conclusions
Big Data
Why Everyone Should Care
Tapping into Diverse Data Sets
Information Architectures
Today: Decisions based on database data
Big Data: Decisions based on all your data
Data sets: Transactions, Documents, Video and Images, Machine-Generated Data, Social Data
A bit of history ...
Hadoop: Developed initially by Doug Cutting (Nutch, an open-source web search engine) and Yahoo. Inspired by Google’s papers on GFS and MapReduce (2003-2004), it resulted in Apache Hadoop (2006)
Amazon Dynamo (2007): a highly available key-value store built on distributed systems techniques
Cassandra: developed at Facebook (2008) to power its Inbox Search feature; a column-oriented distributed database based initially on Dynamo and on Bigtable (built by Google)
Voldemort: a distributed key-value data store used by LinkedIn for high-scalability storage (NoSQL key-value)
Cloudera: contributes to Hadoop and related Apache projects and provides a commercial distribution of Hadoop
So What is Big Data Anyway?
It’s a matter of perspective. Big Data is both:
• LARGE AND VARIABLE DATASETS that are difficult for traditional database tools to manage, including datasets that once seemed unimportant or too problematic to deal with. Big Data datasets include:
− Extremely large files of unstructured or semi-structured data
− Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information
• A NEW SET OF TECHNOLOGIES that can economically capture, store, manage, and extract value from Big Data datasets, thus facilitating better, more informed business decisions
Structured Data vs. Unstructured Data
Relational databases work best with structured data: data that has an underlying structure (schema) and a size that fits the confines of database columns and rows. Unstructured data is highly variable, lacks a fixed structure, and is often too large for an RDBMS to handle easily.
Source: IDC Digital Universe Study, Extracting Value from Chaos, June 2011 (sponsored by EMC)
Drive Value from Big Data
Building a Big Data Platform
Divided Solution Spectrum
[Diagram: stages Acquire, Organize, Analyze along a Data Variety axis from Schema-less / Unstructured to Schema]
NoSQL: Flexible, Specialized, Developer Centric
SQL: Trusted, Secure, Administered
Components: Distributed File Systems; Transaction (Key-Value) Stores; MapReduce Solutions; ETL; DBMS (OLTP); DBMS (DW); Advanced Analytics
Hadoop to Oracle – Bridging the Gap
[Diagram: stages Acquire, Organize, Analyze along a Data Variety axis from Schema-less / Unstructured to Schema]
Components: HDFS; Cassandra; Hadoop MapReduce; ETL; Oracle Loader for Hadoop; RDBMS (OLTP); RDBMS (DW); Advanced Analytics
Oracle Integrated Software Solution
[Diagram: stages Acquire, Organize, Analyze along a Data Variety axis from Schema-less / Unstructured to Schema]
Components: Hadoop HDFS; Oracle NoSQL DB; MapReduce; Oracle Data Integrator; Oracle Loader for Hadoop; Oracle (OLTP); Oracle (DW); Oracle Analytics (Data Mining, R, Spatial, Graph); OBI EE
Inside the Big Data Appliance
Overview
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Engineered Solutions
[Diagram: stages Acquire, Organize, Analyze; axes Data Variety (Unstructured to Schema) and Information Density]
Components: HDFS; Hadoop; Oracle NoSQL DB; Oracle Data Integrator; Oracle Loader for Hadoop; Oracle Database (OLTP); Oracle Database (DW); In-DB Analytics (“R”, Mining, Text, Graph, Spatial); Oracle BI EE
Big Data Appliance
• Hadoop
• NoSQL Database
• Oracle Loader for Hadoop
• Oracle Data Integrator
Oracle Exadata
• OLTP & DW
• Data Mining & Oracle R
• Semantics
• Spatial
Exalytics
• Speed of Thought Analytics
Big Data Appliance Usage Model
[Diagram: Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics connected via InfiniBand]
Stages: Stream, Acquire, Organize, Analyze & Visualize
Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 servers
– 48 GB memory per node = 864 GB memory
– 12 Intel cores per node = 216 cores
– 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet
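The rack totals on this slide follow directly from the per-node figures. A minimal Python sketch of that arithmetic, assuming the 18-node rack described here (the names `NODES_PER_RACK`, `PER_NODE` and `rack_totals` are illustrative only, not part of any Oracle tool):

```python
# Per-node figures taken from the slide; all names are illustrative.
NODES_PER_RACK = 18
PER_NODE = {"memory_gb": 48, "cores": 12, "storage_tb": 24}

def rack_totals(nodes=NODES_PER_RACK, per_node=PER_NODE):
    """Multiply each per-node figure by the number of nodes in the rack."""
    return {name: value * nodes for name, value in per_node.items()}

totals = rack_totals()
print(totals)  # {'memory_gb': 864, 'cores': 216, 'storage_tb': 432}
```

Calling `rack_totals(nodes=36)` would give the raw figures for two such racks, under the same per-node assumptions.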
Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch
Oracle Big Data Appliance Software
• Oracle Enterprise Linux 5.6
• Oracle HotSpot Java VM
• Cloudera’s Distribution including Apache Hadoop
• Cloudera Manager
• Open Source Distribution of R
• Oracle NoSQL Database Community Edition
Big Data Application Software
Acquire New Information
Key-Value Store Workloads
• Large dynamic schema based data repositories
• Data capture
• Web applications (click-through capture)
• Online retail
• Sensor/statistics/network capture (factory automation for example)
• Backup services for mobile devices
• Data services
• Scalable authentication
• Real-time communication (MMS, SMS, routing)
• Personalization
• Social Networks
Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
– Key-value pair with major+sub-key paradigm
– Read/insert/update/delete operations
• Scalability
– Dynamic data partitioning and distribution
– Optimized data access via intelligent driver
• High availability
– One or more replicas
– Disaster recovery through location of replicas
– Resilient to partition master failures
– No single point of failure
• Transparent load balancing
– Reads from master or replicas
– Driver is network topology & latency aware
• Elastic (Planned for Release 2)
– Online addition/removal of Storage Nodes
– Automatic data redistribution
[Diagram: two Applications, each with a NoSQLDB Driver, accessing Storage Nodes in Data Center A and Data Center B]
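The major+sub-key data model and the read/insert/update/delete operations can be illustrated with a toy in-memory store. This is a sketch only, not the Oracle NoSQL Database API: the class `MajorMinorStore`, its method names, and the choice to hash only the major key when picking a partition are assumptions made for illustration.

```python
import hashlib

class MajorMinorStore:
    """Toy in-memory key-value store; NOT the Oracle NoSQL Database API."""

    def __init__(self, num_partitions=4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, major):
        # Assumption: hash only the major key, so records that share a
        # major key always land in the same partition.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def put(self, major, minor, value):
        """Insert a new record or update an existing one."""
        self.partitions[self._partition_for(major)][(major, minor)] = value

    def get(self, major, minor):
        """Read one record, or None if it does not exist."""
        return self.partitions[self._partition_for(major)].get((major, minor))

    def delete(self, major, minor):
        """Delete one record; return True if something was removed."""
        part = self.partitions[self._partition_for(major)]
        return part.pop((major, minor), None) is not None

    def multi_get(self, major):
        """Read every sub-key stored under one major key."""
        part = self.partitions[self._partition_for(major)]
        return {mn: v for (mj, mn), v in part.items() if mj == major}

store = MajorMinorStore()
user = ("users", "jsmith")                      # major key (hypothetical)
store.put(user, ("profile",), {"name": "J. Smith"})
store.put(user, ("clicks", "2012-03-01"), 17)
store.put(user, ("clicks", "2012-03-01"), 18)   # update: put on an existing key
```

Here `store.multi_get(user)` returns both records stored under the same major key from a single partition, which is the kind of access pattern a major+sub-key layout is meant to make cheap.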
Big Data Application Software
Organizing Data for Analysis
Oracle Loader for Hadoop Features
• Load data into a partitioned or non-partitioned table
– Single level, composite or interval partitioned table
– Support for scalar datatypes of Oracle Database
– Load into Oracle Database 11g Release 2
• Runs as a Hadoop job and supports standard options
• Pre-partitions and sorts data on Hadoop
• Online and offline load modes
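The idea of pre-partitioning and sorting data before the load can be sketched in plain Python. This is not how Oracle Loader for Hadoop is implemented; it only mimics the map / shuffle / reduce steps on a single machine. The range bounds in `PARTITION_BOUNDS` and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical range-partitioned target table: exclusive upper bounds per partition.
# Keys below 100 go to partition 0, 100-199 to 1, 200-299 to 2, the rest to 3.
PARTITION_BOUNDS = [100, 200, 300]

def partition_of(key):
    """Map step: pick the target table partition for a record key."""
    return bisect_right(PARTITION_BOUNDS, key)

def prepartition_and_sort(records):
    """Group records by target partition (shuffle), then sort each group (reduce)."""
    groups = defaultdict(list)
    for key, payload in records:
        groups[partition_of(key)].append((key, payload))
    return {p: sorted(rows) for p, rows in sorted(groups.items())}

rows = [(250, "c"), (17, "a"), (120, "b"), (305, "d"), (101, "e")]
batches = prepartition_and_sort(rows)
for partition, batch in batches.items():
    print(partition, batch)
```

Each entry in `batches` corresponds to one target partition, already sorted by key, so in a real job each reducer could hand its batch to the database independently of the others.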
Oracle Loader for Hadoop
[Diagram: INPUT 1 and INPUT 2 pass through MAP, SHUFFLE/SORT and REDUCE stages, with one stage labeled ORACLE LOADER FOR HADOOP]
Oracle Loader for Hadoop: Online Option
[Diagram: MAP, SHUFFLE/SORT and REDUCE stages of the ORACLE LOADER FOR HADOOP job]
• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Connect to the database from reducer nodes, load into database partitions in parallel
Oracle Loader for Hadoop: Offline Option
[Diagram: MAP, SHUFFLE/SORT and REDUCE stages of the ORACLE LOADER FOR HADOOP job]
• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Write from reducer nodes to Oracle Data Pump files
• Import into the database in parallel using external table mechanism
Selecting an Output Option for a Use Case
Oracle Loader for Hadoop
Output Option | Use Case Characteristics
Online load with JDBC | The simplest use case for non-partitioned tables
Online load with Direct Path | Fast online load for partitioned tables
Offline load with Data Pump files | Fastest load method for external tables
On Oracle Big Data Appliance: Direct HDFS | Leave data on HDFS; parallel access from database; import into database when needed
Automate Usage of Oracle Loader for Hadoop
Oracle Data Integrator (ODI)
• ODI has knowledge modules to
– Generate data transformation code to run on Hive/Hadoop
– Invoke Oracle Loader for Hadoop
• Use the drag-and-drop interface in ODI to
– Include invocation of Oracle Loader for Hadoop in any ODI packaged flow
Big Data Analytics
Real Time Analytics Platform
R Statistical Programming Language
• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality plots
• Highly extensible with open source community R packages
Drive Value from Big Data
Conclusions
Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
– Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
– Analyze all your data
• Easy to Deploy
– Risk Free, Quick Installation and Setup
• Single Vendor Support
– Full Oracle support for the entire system and software set
Oracle Integrated Solution Stack for Big Data
ACQUIRE: Oracle NoSQL Database; HDFS; Enterprise Applications
ORGANIZE: Hadoop (MapReduce); Oracle Loader for Hadoop; Oracle Data Integrator
ANALYZE: In-Database Analytics; Data Warehouse
DECIDE: Oracle Analytic Applications
Oracle: Big Data for the Enterprise
• The most comprehensive solution
– Includes everything needed to acquire, organize and analyze all your data
• Optimized for Extreme Analytics
– Deepest analytics portfolio with access to all data
• Engineered to Work Together
– Eliminate deployment risk and support risk
• Enterprise Ready
– Deliver extreme performance and scalability
Questions