Hadoop, Oracle and the Big Data Revolution: Collaborate 2013
DESCRIPTION
Presentation given at Collaborate 2013
TRANSCRIPT
Hadoop, Oracle and the Industrial Revolution of Data
Guy Harrison, Dell Software Group
Executive Director, R&D, Information Management Group
3 Software Group
Introductions
www.guyharrison.net
http://twitter.com/guyharrison
Dell, Quest and Toad
Chart: Star Trek shirt fatality analysis (Pct, axis 0 to 80) for Blue, Yellow and Red shirts
Quest Software is now part of Dell
“Big” Data?
Three or Four “V”s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Collective advantage
Data volumes have always been increasing….
2006 Perspective
Though the absolute volumes are boggling…
Chart (log scale, 1E+09 to 1E+23; Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte): Human Brain, Living Human Genomes, Digital information 2008, Total Digital capacity, Digital information created 2011
Plotted values: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21
Velocity
Fail whales
Variety, or the Industrial Revolution of Data
Data: now and then
1993: Generated internally; Key to operational efficiency
2013: Generated externally; Key to competitiveness; Source of product innovation; Changing our world
“Big” data driven by the smallest devices
Smartphone hardware
• Quad-core 1.4 GHz CPU
• 1GB RAM
• 64GB Storage
• 1080p display
• GSM/Bluetooth/WiFi Network
• 8MP Camera
• GPS & Compass
Smartphone software
Name: Willy Bowman
Nationality: German
DON’T MENTION THE WAR
Data Input
Siri
“Siri call me an ambulance”
From now on, I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby:
Sixth-Sense
Brain Control
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
All this requires and generates huge data sets
But what else are they good for?
The data “exhaust” itself generates new opportunities
Companies want to generate competitive advantage through “Big Data analytics”
Big Data Analytics
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
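As a minimal sketch of what "extrapolating from existing data into the future" can mean, the snippet below fits a straight line to a few observations and projects it forward. The monthly figures and the function name `fit_line` are invented for illustration; real predictive analytics would use far richer models and data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Hypothetical monthly figures, invented purely for illustration.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept  # extrapolate beyond the observed months
```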
Collective Intelligence
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
Predictive Analytics
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
Collective Intelligence beats Artificial Intelligence?
For the last 40 years AI has been consistently disappointing
In 2011 AI made a comeback
Google: Pioneers of Big Data
Google Software Architecture
Google Applications
Map Reduce, BigTable, Chubby
Google File System (GFS)
Map Reduce
Diagram: START, then many parallel MAP tasks, then REDUCE
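The map and reduce pattern in the diagram can be sketched in a few lines of single-process Python. This is only an illustration of the idea using the classic word-count example; it is not Google's or Hadoop's implementation, and the function names are made up for this sketch.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different node; here they are mapped in a loop.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big Hadoop", "data data"])
```

Because each call to `map_phase` depends only on its own line, the map work can be spread across as many machines as there are input splits, which is what the many MAP boxes in the diagram suggest.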
Multi-stage Map-Reduce
Diagram: HDFS feeds a stage of MAPPER tasks (SCAN, SORT), then a second stage of MAPPER tasks (AGGREGATE), then REDUCE, then the CLIENT
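A rough sketch of chaining stages, loosely following the SCAN/SORT, AGGREGATE and REDUCE labels in the diagram. The record layout, the filter threshold and the function names are assumptions made for illustration; a real job would run each stage as distributed tasks rather than list comprehensions.

```python
from collections import Counter

def scan_sort(partition, min_amount):
    """Stage 1 (hypothetical): scan one partition, keep matching rows, sort them."""
    return sorted(row for row in partition if row[1] >= min_amount)

def aggregate(sorted_rows):
    """Stage 2 (hypothetical): partial totals per key for one partition."""
    totals = Counter()
    for key, amount in sorted_rows:
        totals[key] += amount
    return totals

def reduce_all(partials):
    """Final reduce: merge the partial totals into one result for the client."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts key by key
    return dict(result)

# Two partitions of (region, amount) rows, standing in for blocks of a file in HDFS.
partitions = [
    [("east", 50), ("west", 5), ("east", 20)],
    [("west", 30), ("east", 1)],
]
stage1 = [scan_sort(p, min_amount=10) for p in partitions]
stage2 = [aggregate(rows) for rows in stage1]
result = reduce_all(stage2)
```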
Schema on Read vs Schema on Write
Schema on Write
Data → Extract, Load, Transform → Data Warehouse → Utilize
Steps: Code, Cleanse, Normalize, Aggregate, Analyse
Schema on Read
Data → Load → Hadoop → Utilize
Steps: Code, Cleanse, Analyse
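To make the schema-on-read idea concrete, here is a small sketch in which raw records are stored untouched and only parsed, cleansed and typed when they are read. The JSON record format and the field names are assumptions for illustration only.

```python
import json

# Raw lines are loaded as-is; nothing is validated or transformed at load time.
RAW_LINES = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',  # missing field
    'not json at all',                     # malformed record
]

def read_with_schema(raw_lines):
    """Apply a schema while reading: parse, cleanse and type each raw record."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleanse: skip records that cannot be parsed
        yield {
            "user": record.get("user", "unknown"),
            "site": record.get("site", "unknown"),
            "hits": int(record.get("hits", 0)),
        }

rows = list(read_with_schema(RAW_LINES))
```

A different analysis could read the same raw lines with a different schema, which is the flexibility that schema on write gives up in exchange for cleaner, pre-validated tables.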
Hadoop: Open Source Map-Reduce Stack
Hadoop at Yahoo
Yahoo! Hadoop cluster: 4000 nodes, 16 PB disk, 64 TB of RAM, 32,000 cores
Hadoop 1.0 Architecture
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
Cluster nodes (repeated many times): DATA NODE + TASK TRACKER
Hadoop File System (HDFS)
Hadoop Map Reduce
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
HBase: A real-time database built on Hadoop
Oracle: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Table, MemStore, Write-Ahead Log, HFile, HDFS, Disks
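The roles of the write-ahead log, MemStore and HFiles can be illustrated with a toy key-value store. This is a simplified sketch, not HBase code: the class name, the flush threshold and the lookup strategy are all assumptions made for illustration.

```python
class TinyStore:
    """Toy write path: log first, then memory, then flush to sorted files."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # write-ahead log: append-only record of every change
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # flushed, immutable, sorted files (newest last)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # durability first
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered rows out as one sorted file and start a new buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):  # newest file wins
            for k, v in hfile:
                if k == key:
                    return v
        return None

store = TinyStore()
store.put("row2", "b")
store.put("row1", "a")  # second write triggers a flush
store.put("row2", "c")  # newer value stays in the memstore
```

The parallel with the Oracle side of the slide is loose but useful: the log plays the part of redo, the MemStore that of the buffer cache, and the flushed files that of datafiles.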
HBase Data Model
Source data:
Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230
Relational model (normalized):
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230
HBase model (one sparse row per name, one column per site):
Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . .            675,230
Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . .            856,767
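The wide, sparse rows on this slide can be mimicked with nested dictionaries, where each row holds only the columns it actually has. This is a sketch of the data shape only, using the slide's own values; it does not use the HBase client API, and the helper names are invented.

```python
# Each row stores only the columns it actually has; column names are data.
hbase_rows = {
    1: {"Name": "Dick", "Ebay": 507018, "Google": 690414,
        "Facebook": 723649, "MadBillFans.com": 675230},
    2: {"Name": "Jane", "Google": 716426, "Facebook": 643261,
        "ILoveLarry.com": 856767},
}

def counter_for(rows, row_id, site):
    """Look up one cell; a missing column simply means no value was stored."""
    return rows[row_id].get(site)

def to_relational(rows):
    """Unpivot the wide rows back into (name, site, counter) tuples."""
    return sorted(
        (cols["Name"], site, count)
        for cols in rows.values()
        for site, count in cols.items()
        if site != "Name"
    )

flat = to_relational(hbase_rows)
```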
Hive
Diagram: SQL → JAVA → RESULTS
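Hive accepts SQL-like queries and runs them as MapReduce jobs. The sketch below shows, in plain Python, roughly how a GROUP BY with SUM decomposes into a map step and a reduce step. The table `hits`, its rows and the function names are hypothetical, and this is not the code Hive itself generates.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical query: SELECT site, SUM(counter) FROM hits GROUP BY site
hits = [("Dick", "Google", 3), ("Jane", "Google", 4), ("Dick", "Ebay", 5)]

def mapper(row):
    """Emit the GROUP BY column as key and the aggregated column as value."""
    _name, site, counter = row
    return site, counter

def reducer(site, counters):
    """Apply the aggregate function (SUM) to all values for one key."""
    return site, sum(counters)

# Sorting by key stands in for the shuffle between map and reduce.
mapped = sorted(map(mapper, hits), key=itemgetter(0))
result = [
    reducer(site, [counter for _, counter in group])
    for site, group in groupby(mapped, key=itemgetter(0))
]
```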
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
Pig
Pig Latin
SQL or Hive QL
Meanwhile, back at the Deathstar…
Oracle Exadata
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
Economies
Exadata vs Hadoop $$/TB (Hardware only)
Exadata: $4,911
Hadoop: $750
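The gap between the two quoted figures is easier to appreciate with a little arithmetic. The sketch below uses only the slide's hardware-only $/TB numbers; the one-petabyte scenario and the assumption that 1 PB equals 1000 TB are added for illustration.

```python
# Hardware-only cost per terabyte, as quoted on the slide (USD).
cost_per_tb = {"Exadata": 4911, "Hadoop": 750}

def storage_cost(platform, terabytes):
    """Hardware cost of storing a given volume at the quoted $/TB rate."""
    return cost_per_tb[platform] * terabytes

# How many times more expensive per terabyte is Exadata than Hadoop?
ratio = cost_per_tb["Exadata"] / cost_per_tb["Hadoop"]

# Illustrative scenario: one petabyte, taken here as 1000 TB.
petabyte_gap = storage_cost("Exadata", 1000) - storage_cost("Hadoop", 1000)
```

On these figures the per-terabyte hardware cost differs by a factor of about 6.5.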
Oracle Big Data Appliance
18 Sun X4270 M2 servers
− 48GB RAM per node (864GB total)
− 2x6 Core CPU per node (216 total)
− 12x2TB HDD per node (216 spindles, 432 TB)
− 40Gb/s InfiniBand between nodes
− 10Gb/s Ethernet to datacentre
Competitive Pricing
www.oracle.com/us/bigdata/index.html
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
ORACLE EXADATA
ORACLE EXALOGIC
ORACLE BIG DATA APPLIANCE
ORACLE NOSQL
ORACLE LOADER FOR HADOOP
APACHE HADOOP
ORACLE RDBMS
ORACLE WEBLOGIC
ORACLE EXALYTICS
ORACLE ESSBASE
ORACLE TIMES TEN
Axes: Latency, Storage Costs
The following week at the Borg collective….
Pg. 118© 2012 Quest Software Inc. All rights reserved. 118
Integrating Hadoop and RDBMS
Scenario #1: Reference data in RDBMS
RDBMS: CUSTOMERS, PRODUCTS
HDFS: WEBLOGS
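One way to picture Scenario #1 is a lookup join: the small reference table is read from the RDBMS once, and each weblog record from HDFS is enriched against it. In this sketch SQLite stands in for the RDBMS and a Python list stands in for files in HDFS; the table layout, log format and function name are assumptions for illustration.

```python
import sqlite3

# Stand-in for the RDBMS holding reference data (table and columns are hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Dick"), (2, "Jane")])

# Stand-in for weblog lines stored in HDFS: "customer_id<TAB>url".
weblog_lines = ["1\t/home", "2\t/cart", "1\t/checkout", "9\t/home"]

# Load the small reference table once so each mapper can join without a round trip.
customers = dict(db.execute("SELECT id, name FROM customers"))

def map_join(line):
    """Enrich one log line with the customer name (None if the id is unknown)."""
    customer_id, url = line.split("\t")
    return customers.get(int(customer_id)), url

joined = [map_join(line) for line in weblog_lines]
```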
Scenario #2: Hadoop for off-line analytics
RDBMS: CUSTOMERS, PRODUCTS
HDFS: SALES HISTORY
Scenario #3: MapReduce output to RDBMS
RDBMS: WEBLOGS SUMMARY
DB QUERY TOOL
HDFS: WEBLOGS
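Scenario #3 can be sketched as: reduce the bulky raw logs to a small summary, load only that summary into the RDBMS, and query it with ordinary SQL. Here `Counter` stands in for the MapReduce job and SQLite for the RDBMS; the log format and the table name `weblogs_summary` are assumptions for illustration.

```python
import sqlite3
from collections import Counter

# Stand-in for raw weblogs in HDFS: one requested URL per line (hypothetical format).
weblogs = ["/home", "/cart", "/home", "/home", "/cart", "/help"]

# Stand-in for the MapReduce job: count hits per URL.
summary = Counter(weblogs)

# Load only the small summary into the RDBMS (SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE weblogs_summary (url TEXT PRIMARY KEY, hits INTEGER)")
db.executemany("INSERT INTO weblogs_summary VALUES (?, ?)", summary.items())

# An ordinary SQL tool can now query the summary.
top = db.execute(
    "SELECT url, hits FROM weblogs_summary ORDER BY hits DESC LIMIT 2"
).fetchall()
```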
Scenario #4: Hadoop as RDBMS “active archive”
SALES 2011
HDFS
RDBMS
QUERY TOOL
SALES 2010
SALES 2009
SALES 2008
SALES 2009
SALES 2008
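In an "active archive" arrangement the query tool has to know which store holds which slice of data. The sketch below routes requests by year. Which years live in which store is an assumption made purely for illustration (the slide text does not make the split explicit), as are the function names.

```python
# Assumption for illustration: recent years stay in the RDBMS, older years move to HDFS.
RDBMS_YEARS = {2011, 2010}
HDFS_YEARS = {2009, 2008}

def route(year):
    """Decide which store a query for one year of SALES data should go to."""
    if year in RDBMS_YEARS:
        return "RDBMS"
    if year in HDFS_YEARS:
        return "HDFS"
    raise KeyError(f"no SALES data for {year}")

def plan(years):
    """Split a multi-year request into per-store lists of years, newest first."""
    stores = {"RDBMS": [], "HDFS": []}
    for year in sorted(years, reverse=True):
        stores[route(year)].append(year)
    return stores

query_plan = plan([2008, 2011, 2009])
```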
The Big Data Stack
HDFS
MAP-REDUCE
HBASE
PIG
CASCADING
MAHOUT
JAVA API
HIVE
R (ET AL)
JAVA API
DATA SCIENTIST
BIG DATA ANALYTICS SOFTWARE
BIG DATA ANALYTICS
INDEXING AND SEARCH
VISUALIZATION
RECOMMENDERS
CLUSTERING
CLASSIFICATION
EXPERT SYSTEMS (LIKE WATSON)
OPTIMIZATION
MACHINE LEARNING
PREDICTIVE ANALYTICS
COLLECTIVE INTELLIGENCE
BASKET ANALYSIS
SENTIMENT ANALYSIS
In Summary….
Hadoop is….
Economical
Exadata vs Hadoop $$/TB (Hardware only)
Exadata: $4,911
Hadoop: $750
Proven at Scale
A platform for Advanced analytics
ETL Free
Schema on Write
Data → Extract, Load, Transform → Data Warehouse → Utilize
Steps: Code, Cleanse, Normalize, Aggregate, Analyse
Schema on Read
Data → Load → Hadoop → Utilize
Steps: Code, Cleanse, Analyse
The most concrete technology enabling the Big Data revolution
Hadoop is not….
A replacement for RDBMS
But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS
Suitable for OLTP
Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra
A complete solution
Hadoop alone only solves the storage challenge of Big Data
Shameless plugs
Toad for Cloud Databases
Toad BI Suite
Business Intelligence solutions with first class support for Hadoop, Oracle and many other platforms
Kitenga Analytics Suite
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit / Change Data
HBase RealTime replication
Toad for Hadoop
Hive Query IDE
Oracle <-> Hadoop data management
Basic Hadoop administration
Beta June
THANK YOU
[email protected]
@guyharrison
guyharrison.net