Hadoop, Oracle and the Industrial Revolution of Data
DESCRIPTION
Presentation given at Oracle Open World 2012
TRANSCRIPT
© 2012 Quest Software Inc. All rights reserved.
Hadoop, Oracle and the industrial revolution of data
Guy Harrison, VP R&D, Database Management
Guy Harrison, Executive Director, R&D Business Intelligence Software
Introductions
www.guyharrison.net [email protected]
http://twitter.com/guyharrison
Quest
[Chart: Star Trek shirt fatality analysis. Fatality percentage (Pct), axis 0 to 80, by shirt colour: Blue, Yellow, Red]
What is Big Data?
The 3-4 V’s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Community advantage
Volume: Data volumes have always been increasing
2006 Perspective
But the vastness is becoming mind-boggling
[Chart: log-scale axis from 1.00E+09 to 1.00E+23, marked Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte. Categories: Human Brain; Living Human Genomes; Digital information 2008; Total Digital capacity; Digital information created 2011. Data labels as extracted: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21]
Velocity
Fail whales
The Industrial Revolution of Data
Variety
Big Data is driven by the smallest devices
Samsung Galaxy S III specifications
Quad-core 1.4 GHz CPU
1GB RAM
64GB Storage
1080p display
GSM/Bluetooth/WiFi Network
8MP Camera
GPS & Compass
Name: Willy Bowman
Nationality: German
DON’T MENTION THE WAR
Data Input
Siri
“Siri call me an ambulance”
From now on, I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby:
Brain Control
All of this requires and generates big datasets
But what are they good for?
Value?
Achieve competitive advantage from Big Data using Collective Intelligence, Machine Learning and Predictive Analytics
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: How do we derive value from the data?
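To make "inputs from crowds" concrete, here is a minimal sketch of a co-occurrence recommender, one simple form of collective intelligence. It is not taken from the presentation: the basket data and the function names `build_cooccurrence` and `recommend` are hypothetical, and a production recommender would normally weight and normalize these counts.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    together = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            together[a][b] += 1
    return together


def recommend(together, item, n=2):
    """Return up to n items most often seen alongside `item`."""
    return [other for other, _ in together[item].most_common(n)]


# Hypothetical purchase baskets, invented for illustration only.
baskets = [
    ["hadoop book", "pig book", "hive book"],
    ["hadoop book", "hive book"],
    ["hadoop book", "oracle book"],
    ["oracle book", "sql book"],
]

together = build_cooccurrence(baskets)
print(recommend(together, "hadoop book", n=1))
```

No rule about books is coded anywhere; the suggestion comes entirely from what the "crowd" bought together, which is why more data tends to improve this kind of program.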
Applications
Collective Intelligence
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
Predictive Analytics
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
Collective Intelligence beats Artificial Intelligence?
For the past 40 years, AI has been consistently disappointing
Google: pioneers of big data
Google File System (GFS)
Map Reduce, BigTable, Chubby
Google Applications
Google Software Architecture
[Diagram: START, a large number of MAP tasks running in parallel, REDUCE]
Map Reduce
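As a rough illustration of the map, shuffle and reduce steps, here is a single-process sketch in plain Python. It does not use Hadoop, and the names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce`, along with the sample log lines, are hypothetical. In a real cluster the map calls would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines):
    """Run map over every line, then shuffle, then reduce each group."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, invented for illustration only.
log_lines = ["GET /index GET /about", "GET /index"]
counts = map_reduce(log_lines)
print(counts)
```

Each `map_phase` call only ever sees one line, so the map work can be spread across as many machines as there are chunks of input.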
[Diagram labels: HDFS; MAPPER tasks in SCAN, SORT and AGGREGATE stages; REDUCE; CLIENT]
Multi-stage Map-Reduce
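A multi-stage job feeds the output of one map-reduce pass into the next. The sketch below chains two passes in plain Python, assuming a hypothetical `run_job` helper and invented click data. It does not reproduce the exact stages in the diagram.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map-reduce pass and return the reducer outputs."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: count hits per (user, site).
def count_mapper(record):
    user, site = record
    yield (user, site), 1


def count_reducer(key, values):
    return key, sum(values)


# Stage 2: from the stage 1 counts, keep each user's most visited site.
def top_mapper(record):
    (user, site), hits = record
    yield user, (hits, site)


def top_reducer(user, values):
    hits, site = max(values)
    return user, site, hits


# Hypothetical click records, invented for illustration only.
clicks = [
    ("dick", "ebay"),
    ("dick", "google"),
    ("dick", "google"),
    ("jane", "facebook"),
]

stage1 = run_job(clicks, count_mapper, count_reducer)
stage2 = run_job(stage1, top_mapper, top_reducer)
print(stage2)
```

The second stage never reads the raw clicks, only the much smaller aggregate produced by the first, which is the usual reason for splitting work into stages.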
Hadoop: Open Source Map-Reduce Stack
Hadoop at Yahoo!
Yahoo! Hadoop cluster:
− 4000 nodes
− 16PB disk
− 64 TB of RAM
− 32,000 Cores
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with the JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with the NAME NODE and SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
Hadoop Architecture (1.0)
Schema on Read vs Schema on Write
[Diagram: two pipelines compared]
Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract Load Transform, Data Warehouse, Utilize
Schema on Read: Data Load, Hadoop, Analyse, Cleanse, Code, Utilize
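The schema-on-read idea can be sketched as follows: raw text is stored untouched, and a schema is applied only when the data is read. The function `read_with_schema`, the column names and the sample lines are all hypothetical. This is not how Hive or any specific Hadoop tool implements it.

```python
import csv
import io

# Raw data is loaded as-is, including a line that fits no schema.
RAW = "2012-10-01,dick,ebay,3\n2012-10-01,jane,google,5\nmalformed line\n"


def read_with_schema(raw_text, schema):
    """Apply (name, cast) pairs to each row at read time; skip rows that do not fit."""
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != len(schema):
            continue  # wrong number of fields for this schema
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, row)}
        except ValueError:
            continue  # a field could not be cast to the requested type


# One possible schema, chosen by the reader rather than fixed at load time.
hits_schema = [("day", str), ("user", str), ("site", str), ("hits", int)]
rows = list(read_with_schema(RAW, hits_schema))
print(sum(row["hits"] for row in rows))
```

A different reader could apply a different schema to the same stored bytes, whereas schema on write forces that decision, and the cleansing that goes with it, before the load.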
Hadoop Ecosystem
Hadoop File System (HDFS)
Hadoop Map Reduce
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
HBase
HBase is a real-time database built on Hadoop
[Diagram: Oracle and HBase storage layers compared]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFile, HDFS, Disks
Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . .            675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . .            856,767

HBase Data Model
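The wide-row layout in the last two tables can be sketched with nested dictionaries: one entry per name, holding only the site columns that row actually has. The function name `to_wide_rows` is hypothetical, and this ignores HBase features such as column families and cell timestamps. It illustrates only the sparse, per-row column idea.

```python
from collections import defaultdict

# The (name, site, counter) observations from the first table.
observations = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(observations):
    """Pivot (name, site, counter) tuples into one sparse row per name."""
    rows = defaultdict(dict)
    for name, site, counter in observations:
        rows[name][site] = counter
    return dict(rows)


wide = to_wide_rows(observations)
print(sorted(wide["Jane"]))
```

Jane's row simply has no Ebay column, so no storage is spent on an empty cell, and a new site becomes a new column without any change to a table definition.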
Hive
SQL
JAVA
Results
Pig
Pig Latin
SQL or Hive QL
Meanwhile, back at the Death Star….
Oracle Exadata
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
[Bar chart, axis $0 to $6,000]
Exadata: $4,911
Hadoop: $750
Exadata vs Hadoop $$/TB (Hardware only)
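The size of the gap is easier to see as a ratio. The sketch below assumes the chart's two data labels map to Exadata at $4,911 per TB and Hadoop at $750 per TB, hardware only. The helper names and the 100 TB example are hypothetical.

```python
# Hardware-only cost per terabyte, as read from the chart (assumed mapping).
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def cost_ratio(costs, platform, baseline):
    """How many times more expensive per TB `platform` is than `baseline`."""
    return costs[platform] / costs[baseline]


def hardware_cost(costs, platform, terabytes):
    """Hardware cost of storing `terabytes` TB at the charted $/TB rate."""
    return costs[platform] * terabytes


ratio = cost_ratio(COST_PER_TB, "Exadata", "Hadoop")
print(f"Exadata costs about {ratio:.1f}x as much per TB")
print(hardware_cost(COST_PER_TB, "Hadoop", 100))
print(hardware_cost(COST_PER_TB, "Exadata", 100))
```

These figures cover hardware only, so they say nothing about software licences, administration or the very different workloads the two platforms are built for.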
Economies
18 Sun X4270 M2 servers
− 48GB RAM per node (864GB total)
− 2x6 Core CPU per node (216 total)
− 12x2TB HDD per node (216 spindles, 432 TB)
− 40Gb/s Infiniband between nodes
− 10Gb/s Ethernet to datacentre
Competitive Pricing
www.oracle.com/us/bigdata/index.html
Oracle Big Data Appliance
Big Data Appliance Software
Cloudera Enterprise
Oracle Enterprise R
Oracle NoSQL
Oracle Big Data Connectors
Oracle’s Storage Hierarchy
[Diagram: products arranged by Latency and Storage Costs]
ORACLE EXADATA
ORACLE EXALOGIC
ORACLE BIG DATA APPLIANCE
ORACLE NOSQL
ORACLE LOADER FOR HADOOP
APACHE HADOOP
ORACLE RDBMS
ORACLE WEBLOGIC
ORACLE EXALYTICS
ORACLE ESSBASE
ORACLE TIMES TEN
Hadoop and RDBMS integration
Scenario #1: Reference data in RDBMS
CUSTOMERS
WEBLOGS
PRODUCTS
HDFS
RDBMS
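One common way to combine a small RDBMS reference table with large HDFS data is a replicated (map-side) join: every mapper holds a copy of the reference rows and enriches each log record as it streams past. The slide does not say how the join is performed, so this is only a sketch of that one technique. The customer rows, the tab-separated log format and the name `join_mapper` are hypothetical.

```python
# Hypothetical CUSTOMERS rows, as if exported once from the RDBMS.
CUSTOMERS = {101: "Dick", 102: "Jane"}


def join_mapper(weblog_line, customers):
    """Join one 'customer_id<TAB>url' log line to the in-memory customer table."""
    customer_id, url = weblog_line.split("\t")
    name = customers.get(int(customer_id))
    if name is not None:  # drop hits from customers not in the reference data
        yield name, url


# Hypothetical WEBLOGS lines, as if read from HDFS.
weblogs = ["101\t/index", "102\t/cart", "999\t/index"]

joined = [pair for line in weblogs for pair in join_mapper(line, CUSTOMERS)]
print(joined)
```

This only works while the reference table is small enough to copy to every node; the large table is never moved or sorted.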
Scenario #2: Hadoop for off-line analytics
CUSTOMERS
PRODUCTS
RDBMS
SALES HISTORY
HDFS
Scenario #3: MapReduce output to RDBMS
WEBLOGS SUMMARY
RDBMS
DB QUERY TOOL
WEBLOGS
HDFS
Scenario #4: Hadoop as RDBMS “active archive”
SALES 2011
HDFS
RDBMS
QUERY TOOL
SALES 2010
SALES 2009
SALES 2008
SALES 2009
SALES 2008
The Big Data Stack
HDFS
MAP-REDUCE HBASE
PIG
CASCADING
MAHOUT
JAVA API HIVE
R (ET AL)
JAVA API
DATA SCIENTIST
The Big Data Stack
HDFS
MAP-REDUCE HBASE
PIG
CASCADING
MAHOUT
JAVA API HIVE
R (ET AL)
JAVA API
DATA SCIENTIST
BIG DATA ANALYTIC PLATFORM
Big Data Analytics Platform
BIG DATA ANALYTICS
INDEXING AND SEARCH
VISUALIZATION
RECOMMENDERS
CLUSTERING
CLASSIFICATION
EXPERT SYSTEMS (LIKE WATSON)
OPTIMIZATION
ADVERTISING
BASKET ANALYSIS
SENTIMENT ANALYSIS
In Summary
Hadoop is….
[Bar chart, axis $0 to $6,000]
Exadata: $4,911
Hadoop: $750
Exadata vs Hadoop $$/TB (Hardware only)
Economical
Scalable
• 4000 nodes at Yahoo!
• >100 PB at Facebook
• 10,000 node design goal for Hadoop 2.0
A platform for AI, CI & analytics
ETL “Free”
[Diagram: two pipelines compared]
Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract Load Transform, Data Warehouse, Utilize
Schema on Read: Data Load, Hadoop, Analyse, Cleanse, Code, Utilize
The most concrete technology enabling the Big Data revolution
Hadoop is not….
A replacement for RDBMS
But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS
Suitable for OLTP
Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra
A complete solution
Hadoop alone only solves the storage challenge of Big Data
Shameless plugs
Toad for Cloud Databases
Work with Hive, HBase, Oracle, SQL Server, Cassandra, MySQL, MongoDB, BI servers and other NoSQL and SQL datastores
Toad for Cloud Databases
• Federated SQL queries across Hive, HBase, NoSQL, RDBMS
[Chart: 50M row, 50GB Oracle table to 16-node Hadoop cluster. Elapsed Time (ms), axis 0 to 7,000, against Number of mappers, axis 0 to 35. Series: SQOOP; SQOOP with Quest Connector]
Quest Connector for Oracle and Hadoop
Hi-speed, bi-directional data transfer between Hadoop, Hive and Oracle
Business Intelligence solutions with first class support for Hadoop, Oracle and many other platforms
Toad BI Suite
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
Batched HDFS File Copy
Audit / Change Data
HBase RealTime replication
SharePlex® for Hadoop
• Hive Query IDE
• Oracle <-> Hadoop data management
• Basic Hadoop administration
• ETA beta H1 2013
Toad for Hadoop
Summary:
The future belongs to those of us prepared to wear funny hats and glasses
The connected and mobile internet requires and produces “big data” that is qualitatively different from the data we’ve had before
− Requiring different types of datastores
Enterprises can leverage big data for competitive advantage
− Requiring different types of analytical engines