Emulex and the Evaluator Group Present: Why I/O Is Strategic for Big Data
DESCRIPTION
This webcast is the fourth in a series on why I/O is strategic for the data center. John Webster, senior partner at the Evaluator Group, will discuss why I/O is critically important to meet the bandwidth demands of big data deployments. As the data center infrastructure scales upward, so will the need for I/O to scale dynamically to meet these needs.
TRANSCRIPT
1
Presented by: Emulex and Evaluator Group
Why I/O Is Strategic for Big Data
2
Webcast Housekeeping
1. All attendees will be on mute during the presentation
2. Please submit your questions via the text/chat feature
3. We will do all Q&A at the end of the presentation
3
Katherine Lane, Director of Corporate Communications
Why I/O Is Strategic
4
Why I/O Is Strategic?
Building a Virtual Panel of Experts!
5
Topics for the Virtual Panel
Server Virtualization
Cloud Computing
Network Convergence
Big Data
© 2012 Evaluator Group, Inc.
Moving the Elephant Through the Pipes
John Webster, Senior Partner
Evaluator Group
Overview
“Big data” can mean two different things
- Storage for large amounts of data
- Analytics against very large amounts of data
- I/O is critical for both
Big Data Apps
- Personalized healthcare
- Online-style shopping for bricks-and-mortar retailers
- Fraud detection
Marketing Needs It Now
- Correlate customer data with social media data feeds
- Understand the buyer as an individual
04/10/2023 7
[Figure: Data Analytics Model for Individualized Marketing]
Diagram labels: Logs, Tweets; Location; HDFS; NoSQL DB; Customer Profiles; High Scale Data Reductions; BI and Analytics; Expert System; NoSQL DB (Low Latency); Predictions on Buying Behavior
1) Identify User
2a) Lookup User Profile
2b) Lookup Location
3) Input Into
4) Real-time: Determine Best Offer For This Customer
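The numbered flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for the low-latency NoSQL lookups, the offer table stands in for the analytics model, and every name here (PROFILES, LOCATIONS, OFFERS, handle_request) is hypothetical rather than taken from the presentation. The slide text does not say what step 3 feeds into, so the sketch folds steps 3 and 4 into one function.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the low-latency NoSQL lookups on the slide.
PROFILES = {"u42": {"segment": "outdoor", "lifetime_spend": 1200.0}}
LOCATIONS = {"u42": "store-17"}

# Hypothetical offer table keyed by (segment, location); a real system
# would score offers with a trained model instead of a lookup.
OFFERS = {("outdoor", "store-17"): "10% off hiking boots"}
DEFAULT_OFFER = "free shipping on next order"


@dataclass
class Context:
    user_id: str
    profile: dict
    location: str


def identify_user(session: dict) -> str:
    """Step 1: identify the user (here, read an id from the session)."""
    return session["user_id"]


def build_context(user_id: str) -> Context:
    """Steps 2a and 2b: look up the user profile and the location."""
    return Context(
        user_id,
        PROFILES.get(user_id, {}),
        LOCATIONS.get(user_id, "unknown"),
    )


def best_offer(ctx: Context) -> str:
    """Steps 3 and 4: pass the context to a toy model and pick an offer."""
    key = (ctx.profile.get("segment"), ctx.location)
    return OFFERS.get(key, DEFAULT_OFFER)


def handle_request(session: dict) -> str:
    """Run the whole flow for one incoming request."""
    return best_offer(build_context(identify_user(session)))
```

Each request performs two keyed reads before the offer is chosen, which is why the slide pairs the real-time path with a low-latency store.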
Distributed, Shared-Nothing Architectures for Big Data Analytics
[Figure: NODE 1, NODE 2, NODE 3 ... NODE n and CONTROL, with a DAS label under each node, connected through a network switch; layers labeled Network Layer, Compute Layer, Storage Layer]
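As a rough illustration of what "shared-nothing" means in practice, the sketch below gives each node its own private store (standing in for DAS), assigns every record to exactly one node by hashing its key, and has each node reduce its own partition so that only small partial results cross the network. The class and function names are hypothetical, and hash partitioning is one common placement scheme chosen for the example, not something the slide specifies.

```python
import hashlib
from typing import Dict, List


class Node:
    """One cluster node with its own direct-attached storage (DAS)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.das: Dict[str, int] = {}  # local data only; nothing is shared

    def local_sum(self) -> int:
        # Compute runs where the data lives, so only this small result
        # has to travel over the network layer.
        return sum(self.das.values())


def owner(key: str, nodes: List[Node]) -> Node:
    """Pick the node that owns a key using a stable hash (illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def load(records: Dict[str, int], nodes: List[Node]) -> None:
    """Place each record on exactly one node's local storage."""
    for key, value in records.items():
        owner(key, nodes).das[key] = value


def cluster_sum(nodes: List[Node]) -> int:
    """Each node reduces its own partition; partial sums are then added."""
    return sum(node.local_sum() for node in nodes)
```

Loading the data is the step that moves every record across the network once, which is one reason load and unload throughput matters alongside internal cluster bandwidth.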
CAP theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
A distributed system can satisfy any two of these guarantees at the same time, but not all three
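A toy model can make the trade-off concrete. The sketch below assumes a store with just two replicas and a link that can be cut; once the link is down, a "CP" store rejects writes to stay consistent, while an "AP" store accepts them and lets the replicas diverge. The class names and the two mode labels are illustrative choices, not part of the presentation.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to protect consistency."""


class Replica:
    def __init__(self) -> None:
        self.value = None


class TwoNodeStore:
    """Two replicas joined by a link that can be cut (a partition)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica: Replica, value) -> None:
        peer = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Normal case: the write reaches both replicas.
            replica.value = peer.value = value
        elif self.mode == "CP":
            # Consistency over availability: reject the write rather
            # than let the replicas diverge.
            raise Unavailable("cannot reach peer during partition")
        else:
            # Availability over consistency: accept the write locally;
            # the peer holds stale data until the partition heals.
            replica.value = value
```

In both modes the system keeps running through the partition; what differs is whether it gives up availability (the rejected write) or consistency (the stale peer).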
The Impact of Network and I/O Performance
The effects of network performance inside the analytics system, both positive and negative, are felt directly by the users of analytics applications.
The rate at which data flows between storage and processors within a Hadoop cluster has a direct effect on cluster performance and scalability.
Getting data into and out of distributed computing clusters impacts how quickly query results are delivered to users.
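A back-of-the-envelope calculation shows why link speed matters for getting data into and out of a cluster. The sketch below estimates the time to move a given volume over a single link. The 1 TB data size and the 0.9 efficiency factor are assumptions picked for illustration, not measurements from the presentation, and the model ignores disk speed, parallel links, and contention.

```python
def transfer_seconds(data_bytes: float, link_gbps: float,
                     efficiency: float = 0.9) -> float:
    """Idealised time to move data_bytes over one network link.

    efficiency is an assumed usable fraction of line rate after
    protocol overhead; it is not a measured value.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return data_bytes * 8 / usable_bits_per_second


ONE_TB = 1e12  # bytes (decimal terabyte)

for gbps in (1, 10):
    hours = transfer_seconds(ONE_TB, gbps) / 3600
    print(f"{gbps:>2} GbE: {hours:.2f} h to move 1 TB")
```

Under these assumptions, loading 1 TB takes roughly two and a half hours over 1GbE and about fifteen minutes over 10GbE, a tenfold difference that users see as delay before queries can run.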
Internal Network Throughput 1GbE
Internal Network Throughput 10GbE
Load/Unload Throughput
Why Enterprise IT is Now Involved
Distributed computing for analytics (Hadoop, for example) is moving from science experiment to mission-critical
Emerging enterprise Hadoop use cases include:
- Hadoop for very large data sets that can’t be analyzed economically by the data warehouse
- Hadoop on the front end of the data warehouse
- Hadoop as data convergence engine: combine new unstructured data sources with structured data warehouse data
- Hadoop as the back end to the data warehouse
Also emerging is the need to bring Hadoop under the data governance umbrella
- Use case for NAS/SAN attached to Hadoop clusters?
- At what cost?
Is Hadoop Ready for Prime Time?
Hadoop was not born and raised in the highly risk-averse enterprise data center
Hadoop puts forward a different and inefficient operational model from the standpoint of enterprise IT
Hadoop introduces enterprise security and data governance issues
Shared Storage as Secondary Storage
[Figure: NODE 1, NODE 2, NODE 3 ... NODE n and CONTROL connected through a network switch; layers labeled Network Layer, Compute Layer, Storage Layer; storage labeled SAN/NAS]
Shared Storage as Primary Storage
[Figure: NODE 1, NODE 2, NODE 3 ... NODE n and CONTROL connected through a network switch; layers labeled Network Layer, Compute Layer, Storage Layer; storage labeled SAN and Scale-out NAS]
Evaluating Hadoop as a Storage Device
Single Points of Failure Eliminated?
SSD and automated tiering?
Dedupe?
Snapshots?
Insert your hot-button storage feature here:
__________
Enterprise IT and Big Data Analytics
There will be Big Data: Storage and Apps
The traditional data warehouse will continue to evolve
Distributed computing clusters (XxSQL, Hadoop) will achieve prominence in enterprise data centers
Shared storage, while controversial within some circles, can be applied
Communications bandwidth is as important a resource as compute and storage
© 2011 Emulex Corporation