Webinar: Increasing Business Agility with Real-time Processing with Apache Hadoop and Spark
Post on 16-Jul-2015
© 2015 Nebula, Inc. All rights reserved.
© 2015 Scalr, Inc. All rights reserved. (cloud) Computing for the Enterprise
Increasing Business Agility with Real-time Processing using Apache
Hadoop and Spark
Agenda
• Big Data and Real-time
Processing
– Use cases
– Why Hadoop and Spark?
– What’s required?
• Successfully Designing an
Elastic Compute
Infrastructure
• Solutions Demo
– Hadoop and Spark, powered by
Nebula and Scalr
Presenters
• Huy Nguyen, Sr. Director, Product Marketing
• Thomas Orozco, Product Manager
Evolution of Big Data and its Impact
• Businesses are pressed to operate in real-time
for competitive edge
• Mere minutes can make the difference between
a brilliantly handled crisis and a full-blown
social media disaster
• User-, machine-, or sensor-generated data must
be processed in real-time
• Weekly reports, scheduled jobs, and batch
reporting alone are no longer sufficient
• After-the-fact data is losing its competitive
advantage
• Data is more relevant to the business if it’s
“fresh data”
• Ability to act right now as things are happening
Batch Processing and Real-time Processing: It’s all about ‘now’
• Batch Processing: acting on “Data at Rest”; static infrastructure
• Real-time Processing: acting on “Data in Motion”; requires an elastic infrastructure
Uses for Real-time, Stream Processing
IT Management:
Log processing, analysis, and log-driven alerting, infrastructure fault
protection, intelligence and surveillance, fraud detection, etc…
Brand Management and Customer Engagement:
Sentiment analysis, data mining on social media streams and
user-generated content, algorithmic trading, geospatial location, etc…
Conversion Optimization:
Clickstream analysis and real-time targeted offer generation
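To make the "log-driven alerting" use case concrete, here is a minimal sketch in plain Python (not Spark code). It fires an alert when a given number of error lines arrive within a sliding time window. The class name `ErrorRateAlert`, the 60-second window, the threshold of 3, and the sample log lines are all assumptions made for illustration; none of them come from the webinar.

```python
from collections import deque


class ErrorRateAlert:
    """Fire an alert when too many error lines arrive within a time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._error_times = deque()  # timestamps of recent error lines

    def observe(self, timestamp, line):
        """Process one log line; return True if the alert should fire."""
        if "ERROR" in line:
            self._error_times.append(timestamp)
        # Drop errors that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._error_times and self._error_times[0] <= cutoff:
            self._error_times.popleft()
        return len(self._error_times) >= self.threshold


def run_demo():
    # Hypothetical (timestamp_seconds, log_line) pairs.
    alert = ErrorRateAlert(window_seconds=60, threshold=3)
    events = [
        (0, "INFO service started"),
        (10, "ERROR disk latency high"),
        (20, "ERROR disk latency high"),
        (25, "INFO heartbeat"),
        (30, "ERROR disk unreachable"),
        (200, "INFO heartbeat"),
    ]
    return [alert.observe(ts, line) for ts, line in events]
```

Each line is evaluated as it arrives, so the alert can fire on the third error rather than in a later scheduled report.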
Why use Hadoop + Spark for Real-Time Processing?
Plenty of alternatives exist:
• Mesos (+ Spark), Storm, Message Queue (+ custom processing tier)
Hadoop + Spark stack offers unique benefits:
• Familiar and high-level API (HDFS distributed storage abstraction, YARN scheduling…
and rescheduling).
• Integrates naturally with traditional batch jobs (e.g. process log streams in real-time to
flag high-priority events, and run traditional map-reduce jobs on them later on).
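The batch integration point can be sketched in plain Python as a rough illustration; this is not the Spark or Hadoop API. One function handles log lines one at a time and flags high-priority events, and a second function later runs a map-and-reduce style count over everything that was stored. The list used as `store` stands in for durable storage such as HDFS, and the set of "high-priority" levels is an assumption.

```python
from collections import Counter
from functools import reduce

HIGH_PRIORITY_LEVELS = {"ERROR", "FATAL"}  # assumed meaning of "high priority"


def flag_stream(lines, store):
    """Real-time path: store every record, yield the high-priority ones."""
    for line in lines:
        level, _, message = line.partition(" ")
        record = {"level": level, "message": message}
        store.append(record)  # stand-in for durable storage such as HDFS
        if level in HIGH_PRIORITY_LEVELS:
            yield record


def batch_count_by_level(store):
    """Batch path: map each record to (level, 1), then reduce by summing."""
    mapped = ((record["level"], 1) for record in store)

    def reducer(counts, pair):
        key, value = pair
        counts[key] += value
        return counts

    return dict(reduce(reducer, mapped, Counter()))


def run_demo():
    store = []
    lines = ["INFO login ok", "ERROR payment failed", "INFO logout", "FATAL db down"]
    flagged = [record["message"] for record in flag_stream(lines, store)]
    return flagged, batch_count_by_level(store)
```

Both paths read the same records, which is the property the slide highlights: the streaming job and the later batch job share one storage layer.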
What’s Required: The Move from Batch Processing to Real-time Processing
Hadoop YARN & Apache Spark: Build processing workflows that parse, categorize, and
score information in real-time
Hadoop evolved from being “MapReduce
+ HDFS” to “YARN + HDFS”
YARN is used to distribute tasks across a
set of computing nodes — regardless of
whether these tasks are batch, interactive,
or real-time data access
Apache Spark, a cluster-computing platform
that supports real-time, streaming workloads,
backed by the robust HDFS storage engine
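As a rough picture of a workflow that parses, categorizes, and scores information as it arrives, the sketch below chains three Python generators. It is a plain-Python illustration, not Spark code. The `user,action,amount` record format, the category names, and the scoring rule are all hypothetical.

```python
def parse(raw_events):
    """Parse 'user,action,amount' strings into dicts, skipping malformed lines."""
    for raw in raw_events:
        parts = raw.split(",")
        if len(parts) != 3:
            continue
        user, action, amount = parts
        try:
            value = float(amount)
        except ValueError:
            continue
        yield {"user": user, "action": action, "amount": value}


def categorize(events):
    """Attach a category to each event (hypothetical two-way split)."""
    for event in events:
        event["category"] = "purchase" if event["action"] == "buy" else "browse"
        yield event


def score(events):
    """Attach a score: purchases weighted by amount, browsing a flat 1.0."""
    for event in events:
        if event["category"] == "purchase":
            event["score"] = event["amount"] * 2.0
        else:
            event["score"] = 1.0
        yield event


def run_pipeline(raw_events):
    # Generators are lazy, so each event flows through all three stages
    # before the next one is read.
    return list(score(categorize(parse(raw_events))))
```

For example, `run_pipeline(["alice,buy,10.0", "bob,view,0", "garbled"])` yields two scored events and drops the malformed line.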
What’s Required: Decoupling the Compute/Storage Tier & Auto-scaling
Decouple the compute tier from the storage tier for real-time processing
• Dynamically scaling the storage tier would result in major inefficiencies or data loss
Processing tier (application and infrastructure) must be able to “auto scale” compute
resources as the volume, velocity, and variety of big data increase
(Diagram labels: Big Data, Storage, Compute, Processing Tier)
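One simple way to think about auto-scaling the compute tier is as a sizing rule: given the current input rate and an assumed per-node throughput, how many compute nodes are needed? The sketch below is an illustration only, not Scalr's or YARN's actual scaling logic. The function name, parameters, and default bounds are hypothetical. Only the compute node count changes; the storage tier is left alone.

```python
import math


def desired_nodes(events_per_second, capacity_per_node, min_nodes=1, max_nodes=10):
    """Return how many compute nodes are needed for the current input rate.

    The result is clamped to [min_nodes, max_nodes] so the tier never
    scales to zero and never exceeds the (assumed) pool limit.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = math.ceil(events_per_second / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

With an assumed capacity of 1000 events per second per node, a feed of 2500 events per second maps to 3 nodes, and an idle feed falls back to the 1-node minimum.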
Suggested Architecture for Real-time Big Data Processing
A. Hadoop Compute Tier (YARN)
• One resource manager
• One history server
• Multiple node managers
B. Hadoop Storage Tier (HDFS)
• One name node
• Multiple data nodes
C. Client Nodes
• Dispatch real-time data processing jobs
D. Intelligent Cloud Mgmt Platform from Scalr
• Orchestration and auto-scaling of applications
E. Turnkey Private Cloud Infrastructure from Nebula
• Elastic, on-demand cloud computing infrastructure
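The node counts in the suggested architecture can be written down as a small data structure and checked. The sketch below is a hypothetical model, not a Hadoop, Scalr, or Nebula configuration format. It reads "multiple" as "at least two" and assumes at least one client node; both are interpretations, not statements from the slide.

```python
from dataclasses import dataclass


@dataclass
class ClusterLayout:
    resource_managers: int
    history_servers: int
    node_managers: int
    name_nodes: int
    data_nodes: int
    client_nodes: int

    def problems(self):
        """List the ways this layout departs from the suggested architecture."""
        issues = []
        if self.resource_managers != 1:
            issues.append("compute tier should have one resource manager")
        if self.history_servers != 1:
            issues.append("compute tier should have one history server")
        if self.node_managers < 2:
            issues.append("compute tier should have multiple node managers")
        if self.name_nodes != 1:
            issues.append("storage tier should have one name node")
        if self.data_nodes < 2:
            issues.append("storage tier should have multiple data nodes")
        if self.client_nodes < 1:
            issues.append("at least one client node is needed to dispatch jobs")
        return issues
```

A layout such as `ClusterLayout(1, 1, 3, 1, 3, 1)` passes with no issues, while a layout with a single node manager is reported.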
INTRODUCTION TO NEBULA
Nebula Turnkey Private Cloud
Fastest path to OpenStack
Nebula productizes OpenStack in a highly cost-efficient, fast time-to-value, secure and scalable enterprise-class product
Cost-efficient: Software delivered using an appliance with off-the-shelf, industry-standard servers and storage – freedom of choice
Fast time-to-value: Curated OpenStack (rack integration or multi-rack integration), enabling customers/partners to spend their resources building applications, not building infrastructure
Open, Secure & Scalable: Identical clouds to deliver consistent and predictable performance with open connectors for a turnkey ecosystem
Enterprise-class: Highly available with connectors to existing enterprise workflows & architecture (identity, storage, networking) for zero disruption to IT
Nebula Turnkey Private Cloud
Workloads: DevOps / DevTest, Genome Sequencing, Big Data / Real-time, Media Rendering
Self-Service IT, Process Improvements, API / Integration
Cosmos Software: Storage, Compute, Network, Management & Orchestration, Identity/Security
Enterprise Integration: Active Directory, Identity, Storage, Networking, VLANs
The Only Enterprise-ready, Turnkey Solution for OpenStack Private Clouds
Auto-scaling with Nebula and Scalr
Traditional Infrastructure: Fixed Compute, Storage, Network
Private Cloud: Shared Resource Pool
• As real-time data feeds increase, the YARN tier can be provisioned to scale out across multiple servers
• As data feeds decrease, resources can be de-provisioned and returned to the shared pool
• Nebula enables resource pooling of compute, storage, and network services for scale-out readiness
(Diagram: repeated “YARN Tier w/ Spark” boxes)
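The provision and de-provision cycle against a shared pool can be modelled in a few lines of Python. This is a toy model for illustration and does not use any Nebula, Scalr, or OpenStack API. The class name `SharedPool` and the pool size of six servers are assumptions.

```python
class SharedPool:
    """Toy model of a shared server pool feeding an elastic YARN tier."""

    def __init__(self, total_servers):
        self.free = total_servers  # servers not currently assigned
        self.yarn_tier = 0         # servers currently running the YARN tier

    def scale_to(self, target):
        """Move the YARN tier toward `target` servers; return its new size."""
        if target < 0:
            raise ValueError("target must be non-negative")
        delta = target - self.yarn_tier
        if delta > 0:
            # Scale out: take only as many servers as the pool can spare.
            granted = min(delta, self.free)
            self.free -= granted
            self.yarn_tier += granted
        else:
            # Scale in: hand the surplus servers back to the shared pool.
            self.free += -delta
            self.yarn_tier = target
        return self.yarn_tier


def run_demo():
    pool = SharedPool(total_servers=6)
    sizes = [pool.scale_to(4), pool.scale_to(8), pool.scale_to(1)]
    return sizes, pool.free
```

In the demo the request for 8 servers is capped at the 6 available, and scaling back to 1 returns 5 servers to the pool for other workloads.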
INTRODUCTION TO SCALR
Scalr is used to:
• Orchestrate Resources: Provisioning, Templating, Auto-scaling, …
• Define and Enforce Policies: Lease Management, Network Policies, RBAC, …
• Centrally Manage Clouds: Multi-Cloud, Cost Analytics, SSO, CMDB, ITSM integrations, …
Summary
• Businesses need to operate in real-time to maintain competitive edge
• Emergent big data technology such as Hadoop YARN and Apache Spark can build processing workflows that parse, categorize, and score information in real-time
• Data processing tiers (from application to infrastructure) must be able to auto-scale to accommodate the 3 Vs of Big Data
• Nebula’s turnkey private cloud and Scalr’s intelligent Cloud Management Platform meet these demands by delivering an orchestrated infrastructure that can auto scale compute and storage resources on-demand to process data feeds in real-time
For more information: www.nebula.com or www.scalr.com
Benefits of Real-Time Processing
React to changing business conditions in real time
• Adapt and react quickly to data, market conditions and events happening in the
outside world
Faster time-to-market
• Development and deployment
Delivering the best user experience
• Personalized experience