Elastic Compute for Batch Platform
Date: 13th Apr 2015
Name: Muralidhar Sortur
Content
• Background on Batch
• Infra need for Batch
• Available technology
• Mesos Ecosystem
• BAAS
• Architecture and Design
• Demo
• What Next?
• Questions
Background
• Common use cases of batch processing: statement generation, bank postings, risk evaluation, credit score calculation, inventory management, portfolio optimization, data backup, data crunching, information extraction, etc.
• JSR-352: Batch Applications for the Java Platform
• Logging, checkpointing & parallelization
• Start, Stop, Restart, Kill
• Requires a huge dedicated backend infrastructure
• ROI
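The checkpointing and restart behaviour listed above can be illustrated with a small sketch. It is plain Python, not the JSR-352 API, and every name in it (`run_chunked`, `demo`, the `next_index` key) is hypothetical. It only shows the idea: commit a checkpoint after each chunk, and on restart resume from the last committed chunk.

```python
# Hypothetical sketch of chunk-style checkpointing and restart (not the JSR-352 API).

def run_chunked(items, process, checkpoint, chunk_size=3):
    """Process items in chunks, recording the next index after each chunk."""
    start = checkpoint.get("next_index", 0)  # resume point after a restart
    for begin in range(start, len(items), chunk_size):
        for item in items[begin:begin + chunk_size]:
            process(item)
        # Commit the checkpoint only after the whole chunk succeeded.
        checkpoint["next_index"] = min(begin + chunk_size, len(items))
    return checkpoint


def demo():
    items = list(range(10))
    done = []
    checkpoint = {}
    fail_once = {"armed": True}

    def process(item):
        # Simulate a one-time failure on item 7 to force a restart.
        if item == 7 and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("simulated failure")
        done.append(item)

    try:
        run_chunked(items, process, checkpoint)
    except RuntimeError:
        pass  # job failed; the checkpoint still points at the failed chunk
    failed_at = checkpoint["next_index"]
    run_chunked(items, process, checkpoint)  # restart from the checkpoint
    return failed_at, done
```

In this sketch an item that was processed before the failure inside the failed chunk is processed again on restart, so the per-item work is assumed to be idempotent or wrapped in a transaction.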
“Batch” As Is
• Infrastructure: dedicated VMs, underutilized resources, long time to production
• Operation: dedicated ops team to address production job failures
• Development: no standard way to accomplish a task
Where does “Batch Infra” stand?
[Diagram labels: Batch Infra, Commercial Solutions, PaaS]
Infra need for batch
• Exec Environment: RHEL V?, Ubuntu, CentOS, Windows?
• Resources: CPU, Memory, Network
• Data: Hadoop, Swift store, SQL, NoSQL
Infra need for batch
[Diagram labels: PaaS, SaaS]
• Exec Environment: freedom to use different flavors and versions of containers
• Resources: fail-proof, billed as per usage
• Data: abstraction, consistency, availability, performance
Static Partitioning & Resource Utilization
Infra
Mesos Ecosystem
Apache Mesos Architecture
[Diagram labels: Chronos scheduler, Chronos executor]
Resource Offer
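As a rough illustration of the resource offer idea, the sketch below simulates the framework side of the exchange: the master offers resources available on agents, and the framework's scheduler either launches tasks that fit an offer or declines it. This is a toy model in plain Python, not the Mesos API. The `Offer` and `Task` classes, the `handle_offers` function, and the first-fit placement policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str      # hypothetical agent (slave) name
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def handle_offers(offers, pending):
    """Launch pending tasks that fit an offer (first fit); decline unused offers."""
    launched, declined = [], []
    pending = list(pending)  # work on a copy so the caller's list is untouched
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        used = False
        for task in list(pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                launched.append((task.name, offer.agent))
                cpus -= task.cpus
                mem -= task.mem_mb
                pending.remove(task)
                used = True
        if not used:
            declined.append(offer.agent)  # resources go back to the master
    return launched, declined, [t.name for t in pending]
```

A declined offer simply returns to the pool, where the master is free to offer the same resources to another framework.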
What is BAAS?
1. Batch as a Hosted service
2. Self service
3. Fault tolerant
4. Better monitoring & notification
5. Better ways to interact with outside world (DBs, services, Hadoop, streams, messaging)
6. Flexibility in writing batch job
7. Open source!
Our Approach
• Abstracting user experience from specifics of technology
• Reduce opex and capex
• Proactive monitoring and alerting
• Self service capability
• Fault tolerant
BAAS Detailed View
Components shown in the diagram:
• BAAS REST Service
• Chronos Scheduler
• Application Life Cycle Management
• Mesos Master (resource manager)
• Mesos slaves (three shown)
• Docker
• ZooKeeper
BAAS Deployment View
Interacting modules in Docker
[Diagram labels: libraries, Manifest, cleanup, Execute, Deployer, Docker, LOGSTORE, MANIFEST STORE, MAVEN REPO, APP LIFE CYCLE MGMT MYSQL DB, GITHUB AUTHN, PASS]
Create Batch Flow
Schedule and Run Batch Job
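To make the scheduling step more concrete, here is a sketch of what a scheduled, Docker-based job definition for Chronos might look like when built from Python. The field names (`name`, `command`, `schedule`, `epsilon`, `owner`, `cpus`, `mem`, `container`) follow the Chronos job JSON as commonly documented, but should be checked against the Chronos version in use. The helper name `chronos_job`, the image name, the command, the owner address and the resource values are all made up for illustration. Nothing is sent over the network here.

```python
import json


def chronos_job(name, image, command, start_iso, period="P1D",
                owner="batch-team@example.com"):
    """Build a job definition dict (hypothetical helper, illustrative values)."""
    return {
        "name": name,
        "command": command,
        # ISO 8601 repeating interval: repeat forever, from start, every period.
        "schedule": f"R/{start_iso}/{period}",
        "epsilon": "PT30M",   # how late the job may still be started
        "owner": owner,
        "cpus": 0.5,          # assumed resource request
        "mem": 512,           # assumed, in MB
        "container": {"type": "DOCKER", "image": image},
    }


job = chronos_job(
    name="statement-generation",
    image="example/batch-runner:latest",   # hypothetical image
    command="run-batch --job statement-generation",  # hypothetical command
    start_iso="2015-04-13T02:00:00Z",
)
payload = json.dumps(job, indent=2)
```

In a deployment like the one described, a payload of this kind would presumably be submitted by the BAAS REST service to the Chronos scheduler rather than by the batch developer directly.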
Operation
Self Service
• Creating & registering a batch app
• Provisioning
• Build & deployment
• Scheduling
• Monitoring batch jobs during execution
• Email alerts for failures and long-running jobs
• Killing a running batch job
• Restarting a failed job
• Report generation for batch job executions
• Chaining the execution of batch jobs
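The chaining item in the list above can be sketched as a dependency-ordered run in which a job is skipped when any of its parents did not succeed. This is an illustrative model only, not how BAAS or Chronos implements chaining. The function name `run_chain`, the status strings, and the job names in the usage note are assumptions. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def run_chain(jobs, parents):
    """Run jobs in dependency order.

    jobs:    name -> callable returning True on success
    parents: name -> set of parent job names (all must also be keys of jobs)
    """
    status = {}
    graph = {name: parents.get(name, set()) for name in jobs}
    for name in TopologicalSorter(graph).static_order():
        # A job runs only if every parent finished successfully.
        if any(status.get(p) != "SUCCESS" for p in parents.get(name, set())):
            status[name] = "SKIPPED"
            continue
        status[name] = "SUCCESS" if jobs[name]() else "FAILED"
    return status
```

For example, with `extract -> transform -> load` chained and `transform` failing, `load` ends up `SKIPPED` while an unrelated job still runs. Restarting a failed job would then amount to re-running the chain from the failed node.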
Development
• Do I need to code all steps in a batch?
• Is there a standard way of writing a step/task?
• How can I pass parameters from one step to another?
• Why should everyone implement the same task in a different way?
• Where can I contribute a well-defined task/step to be used in a batch job?
Storage
• Is there an abstraction for batch storage needs?
• Is it possible to isolate storage and the choice of storage from code?
• Should I write a separate batch app for taking backups?
Batch Reusable Infra component
BRIC Tasklet
[Diagram labels: 0 to N Inputs, output, Repeat Status]
Inputs can be:
1. Hardcoded parameters
2. Output of the previous Tasklet
3. Command line parameters
Output can be passed to other Tasklets.
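The tasklet contract described above (0 to N inputs from three sources, one output, and a repeat status) can be sketched as follows. This is a Python illustration, not the actual BRIC code. The class and function names, the `previous` input key, and the two status values (named loosely after the Spring Batch convention, which is an assumption about the underlying framework) are all hypothetical.

```python
from enum import Enum


class RepeatStatus(Enum):
    FINISHED = "FINISHED"
    CONTINUABLE = "CONTINUABLE"   # tasklet asks to be invoked again


class Tasklet:
    def __init__(self, name, func, fixed=None):
        self.name = name
        self.func = func            # func(inputs) -> (output, RepeatStatus)
        self.fixed = fixed or {}    # 1. hardcoded parameters

    def execute(self, previous_output=None, cli_params=None):
        inputs = dict(self.fixed)
        if previous_output is not None:
            inputs["previous"] = previous_output   # 2. previous tasklet
        inputs.update(cli_params or {})            # 3. command line parameters
        return self.func(inputs)


def run_pipeline(tasklets, cli_params=None):
    """Run tasklets in order, feeding each output to the next tasklet."""
    upstream = None
    for tasklet in tasklets:
        while True:
            output, status = tasklet.execute(upstream, cli_params)
            if status is RepeatStatus.FINISHED:
                break
        upstream = output
    return upstream
```

A job is then just an ordered list of reusable tasklets. For instance, a "read" tasklet, a "scale" tasklet with a hardcoded factor, and a "sum" tasklet that also reads an offset from the command line can be combined without any of them knowing about the others.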
Storage as a Service
[Diagram labels: Storage Manager, Batch Job Jcloud API, Compute, Event manager, BAAS, Meta Data Repo]
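One way to picture isolating the choice of storage from batch code is a small storage interface with interchangeable backends. The sketch below is an assumption-laden illustration in Python. It does not use the Jcloud API from the slide, and the names `BatchStore`, `InMemoryStore`, `LocalDirStore` and `write_statement` are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BatchStore(ABC):
    """Hypothetical storage interface that batch job code is written against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BatchStore):
    """Backend for tests: keeps blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class LocalDirStore(BatchStore):
    """Backend that writes each key as a file under a root directory."""

    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        (self._root / key).write_bytes(data)

    def get(self, key):
        return (self._root / key).read_bytes()


def write_statement(store: BatchStore, customer_id: str, body: str) -> str:
    """Job code: depends only on the interface, not on the backend."""
    key = f"statement-{customer_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Swapping the backend (for example to an object store) would then be a configuration decision made by the platform, with no change to `write_statement`.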
“Batch” with this
• Infrastructure: no dedicated VMs, improved resource utilization, time to production of less than a day
• Operation: complete self service; the developer is responsible for taking action on failure
• Development: standard way to accomplish a task
Resources
• http://stackoverflow.com/questions/18285212/how-to-scale-docker-containers-in-production
• http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
• http://www.wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos/
• http://www.slideshare.net/tomasbart/introduction-to-apache-mesos
• http://www.slideshare.net/pacoid/strata-sc-2014-apache-mesos-as-an-sdk-for-building-distributed-frameworks
• https://www.docker.com/tryit/