how to run from a zombie: cloudstack distributed process management

HOW TO RUN FROM A ZOMBIE: CLOUDSTACK DISTRIBUTED PROCESS MANAGEMENT John Burwell ([email protected] | [email protected] @john_burwell ) Tuesday, June 25, 13

Upload: john-burwell

Post on 02-Jul-2015

DESCRIPTION

Exploration of CloudStack's distributed process management requirements and the challenges they present in the context of CAP theorem. These challenges will be addressed through a distributed process model that emphasizes efficiency, fault tolerance, and operational transparency.

TRANSCRIPT

Page 1: How to Run from a Zombie: CloudStack Distributed Process Management

HOW TO RUN FROM A ZOMBIE: CLOUDSTACK DISTRIBUTED PROCESS MANAGEMENT

John Burwell

([email protected] | [email protected] @john_burwell)

Page 2: How to Run from a Zombie: CloudStack Distributed Process Management

I Am Not A Zombie

• Apache CloudStack PMC Member

• Consulting Engineer @ Basho Technologies

• Ran operations and designed automated provisioning for hybrid analytic/virtualization clouds

• Led architectural design and server-side development of a SaaS physical security platform

Page 3: How to Run from a Zombie: CloudStack Distributed Process Management

Current Process Management

• No consistent system-wide model

• Fail slowly, fail quietly

• Resource overcommitment issues

• Lack of instrumentation

Page 4: How to Run from a Zombie: CloudStack Distributed Process Management

What is a cloud?

Page 5: How to Run from a Zombie: CloudStack Distributed Process Management

Page 6: How to Run from a Zombie: CloudStack Distributed Process Management

Hopefully not ...

Page 7: How to Run from a Zombie: CloudStack Distributed Process Management

Page 8: How to Run from a Zombie: CloudStack Distributed Process Management

Page 9: How to Run from a Zombie: CloudStack Distributed Process Management

Page 10: How to Run from a Zombie: CloudStack Distributed Process Management

Hosts

Virtual Routers

Virtual Machines

Primary Storage

Networks

Secondary Storage

Load Balancers

Zone

Cluster

Pod

Page 11: How to Run from a Zombie: CloudStack Distributed Process Management

Resource

Process State

A “thing” with a bounded capacity

Partition

Orchestration

Page 12: How to Run from a Zombie: CloudStack Distributed Process Management

At its core, CloudStack ...

Integrates infrastructure components

Manages resources

Page 13: How to Run from a Zombie: CloudStack Distributed Process Management

Page 14: How to Run from a Zombie: CloudStack Distributed Process Management

Consistency

Availability

Partition Tolerance

PICK 2

Page 15: How to Run from a Zombie: CloudStack Distributed Process Management

CloudStack provides zones, clusters, and pods to partition resources.

Page 16: How to Run from a Zombie: CloudStack Distributed Process Management

Orchestration operations are eventually consistent

Page 17: How to Run from a Zombie: CloudStack Distributed Process Management

Page 18: How to Run from a Zombie: CloudStack Distributed Process Management

... but resource operations must be consistent & serialized.

Page 19: How to Run from a Zombie: CloudStack Distributed Process Management

Page 20: How to Run from a Zombie: CloudStack Distributed Process Management

A system cannot be simultaneously consistent and available.

Page 21: How to Run from a Zombie: CloudStack Distributed Process Management

AP: Orchestration Processes

CP: Resource Management Processes

Page 22: How to Run from a Zombie: CloudStack Distributed Process Management

CP Resource?

• Ordered/Serialized operations

• Prevent overcommitment

• Execution location independent

• Lock free

Page 23: How to Run from a Zombie: CloudStack Distributed Process Management

Orchestration Coordination

1. Build a list of commands to be executed against a resource

2. Enqueue the list of commands to the resource management layer for execution

3. A process applies the commands to the resource

4. Aggregate the results from the reply
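
The four steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not CloudStack code: `CoordinationSketch`, `Command`, `enqueue`, and `aggregate` are hypothetical names, and a single-threaded executor stands in for the resource management layer's queue and process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names are CloudStack APIs.
final class CoordinationSketch {

    // A command applied to one resource; returns a short result description.
    interface Command {
        String apply(String resourceId);
    }

    // Steps 2 and 3: the single-threaded executor stands in for the resource's
    // queue and the one process draining it, so commands run strictly in order.
    static CompletableFuture<List<String>> enqueue(ExecutorService resourceQueue,
                                                   String resourceId,
                                                   List<Command> commands) {
        return CompletableFuture.supplyAsync(() -> {
            List<String> results = new ArrayList<>();
            for (Command command : commands) {
                results.add(command.apply(resourceId));
            }
            return results;
        }, resourceQueue);
    }

    // Step 4: aggregate the per-command results carried by the reply.
    static String aggregate(List<String> results) {
        return String.join(", ", results);
    }

    public static void main(String[] args) {
        ExecutorService resourceQueue = Executors.newSingleThreadExecutor();
        try {
            // Step 1: build the list of commands for one resource.
            List<Command> commands = Arrays.asList(
                    id -> "allocated volume on " + id,
                    id -> "started vm on " + id);
            CompletableFuture<List<String>> reply =
                    enqueue(resourceQueue, "host-1", commands);
            System.out.println(aggregate(reply.join()));
        } finally {
            resourceQueue.shutdown();
        }
    }
}
```

The orchestration side never touches the resource directly; it only builds the list and waits on the reply, which keeps it free to stay available while the resource side stays serialized.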

Page 24: How to Run from a Zombie: CloudStack Distributed Process Management

[Diagram: Resource, Process State, Queue, Unit of Work, Exclusive Consumer; associations are labeled 1 to 1]

Page 25: How to Run from a Zombie: CloudStack Distributed Process Management

Unit Of Work (UoW)

• Definition: An ordered list of commands executed against one and only one resource.

• Created in the Orchestration layer

• Executed by processes in the resource management layer

• Failure of a command halts UoW execution
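
A unit of work as defined above might look like the following in Java. This is a minimal sketch, assuming hypothetical `UnitOfWork`, `Command`, and `Result` types that do not come from CloudStack; it shows only the two properties the slide names: the commands are ordered and bound to a single resource, and the first failure halts execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the types below are illustrative, not CloudStack classes.
final class UnitOfWorkSketch {

    interface Command {
        void execute(String resourceId) throws Exception;
    }

    // Outcome of a run: how many commands completed and the failure
    // that halted execution, if any.
    static final class Result {
        final int completed;
        final Exception failure; // null when every command succeeded

        Result(int completed, Exception failure) {
            this.completed = completed;
            this.failure = failure;
        }

        boolean succeeded() {
            return failure == null;
        }
    }

    // An ordered list of commands bound to exactly one resource.
    static final class UnitOfWork {
        final String resourceId;
        final List<Command> commands;

        UnitOfWork(String resourceId, List<Command> commands) {
            this.resourceId = resourceId;
            this.commands = new ArrayList<>(commands); // copy so the order cannot change later
        }

        // Runs the commands in order; the first failure halts execution.
        Result run() {
            int completed = 0;
            for (Command command : commands) {
                try {
                    command.execute(resourceId);
                } catch (Exception e) {
                    return new Result(completed, e);
                }
                completed++;
            }
            return new Result(completed, null);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        UnitOfWork uow = new UnitOfWork("storage-1", Arrays.asList(
                id -> log.add("create volume on " + id),
                id -> { throw new IllegalStateException("attach failed"); },
                id -> log.add("not reached")));
        Result result = uow.run();
        System.out.println(result.completed + " completed, succeeded=" + result.succeeded());
    }
}
```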

Page 26: How to Run from a Zombie: CloudStack Distributed Process Management

Instrumentation

• Collect and report statistics on a per resource basis

• Inspect and remove pending UoWs for a resource

• Kill a running process

• View a history of UoWs completed by a resource
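
One possible shape for the per-resource bookkeeping behind these operations is sketched below. `ResourceMonitor` and its methods are hypothetical names chosen for the example, UoWs are represented by plain string ids, and killing a running process is left out because it depends on how processes are actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: one monitor instance per resource.
final class InstrumentationSketch {

    static final class ResourceMonitor {
        private final Deque<String> pending = new ArrayDeque<>(); // queued UoW ids
        private final List<String> history = new ArrayList<>();   // completed UoWs
        private int failures;                                      // simple statistic

        synchronized void enqueued(String uowId) {
            pending.addLast(uowId);
        }

        // Records the outcome of a UoW and moves it from pending to history.
        synchronized void finished(String uowId, boolean ok) {
            pending.remove(uowId);
            history.add(uowId + (ok ? ":ok" : ":failed"));
            if (!ok) {
                failures++;
            }
        }

        // Inspect pending UoWs (a snapshot, safe to iterate).
        synchronized List<String> pendingUows() {
            return new ArrayList<>(pending);
        }

        // Remove a pending UoW before it runs; returns false if it was not queued.
        synchronized boolean removePending(String uowId) {
            return pending.remove(uowId);
        }

        synchronized List<String> history() {
            return new ArrayList<>(history);
        }

        synchronized int failureCount() {
            return failures;
        }
    }

    public static void main(String[] args) {
        ResourceMonitor monitor = new ResourceMonitor();
        monitor.enqueued("uow-1");
        monitor.enqueued("uow-2");
        monitor.finished("uow-1", true);
        System.out.println("pending=" + monitor.pendingUows()
                + " history=" + monitor.history()
                + " failures=" + monitor.failureCount());
    }
}
```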

Page 27: How to Run from a Zombie: CloudStack Distributed Process Management

When Gravity Fails

• Process execution fails

• Resources become unavailable

• Slow consumers

Page 28: How to Run from a Zombie: CloudStack Distributed Process Management

Fail Fast; Fail Loudly

• If the resource can be returned to a consistent state, reply with the process failure

• If the resource cannot be returned to a consistent state, transition the resource to a failure state, drain the queue of pending UoWs, and reply with the process failure for each UoW

• The orchestration layer will determine the appropriate recovery strategy (e.g. retry request on another resource)
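
The two failure paths can be sketched as follows. This is an illustration only: the `Resource` class, its `State` enum, and the reply strings are assumptions made for the example, and how a real implementation decides whether a resource is recoverable is outside its scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the fail-fast rules; names are not CloudStack APIs.
final class FailFastSketch {

    enum State { AVAILABLE, FAILED }

    static final class Resource {
        State state = State.AVAILABLE;
        final Queue<String> pending = new ArrayDeque<>(); // ids of queued UoWs
        final List<String> replies = new ArrayList<>();   // replies sent to orchestration

        // Called when the running UoW fails. `recoverable` says whether the
        // resource could be returned to a consistent state.
        void onFailure(String uowId, boolean recoverable) {
            replies.add(uowId + ": failed");
            if (recoverable) {
                return; // resource stays AVAILABLE; queued UoWs still run
            }
            // Unrecoverable: mark the resource failed and drain the queue,
            // replying with a failure for every pending UoW.
            state = State.FAILED;
            String queued;
            while ((queued = pending.poll()) != null) {
                replies.add(queued + ": failed (resource unavailable)");
            }
        }
    }

    public static void main(String[] args) {
        Resource host = new Resource();
        host.pending.add("uow-2");
        host.pending.add("uow-3");
        host.onFailure("uow-1", false);
        System.out.println(host.state + " " + host.replies);
    }
}
```

Every caller gets an explicit failure reply right away, so the orchestration layer can pick a recovery strategy instead of waiting on a queue that will never drain.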

Page 29: How to Run from a Zombie: CloudStack Distributed Process Management

Preventing A Logjam

• Bounded Queues

• Request and Message Timeouts

• A failure to enqueue a request, or a request timeout, triggers the resource’s circuit breaker
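
A bounded queue with an enqueue timeout and a very simple circuit breaker could be combined as below. This is a sketch under assumptions: `ResourceQueue`, `submit`, `requestTimedOut`, and `reset` are hypothetical names, and the breaker here only opens and is reset manually, whereas a production breaker would usually also have a half-open probing state.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: bounded queue + timeouts + circuit breaker per resource.
final class LogjamSketch {

    static final class ResourceQueue {
        private final BlockingQueue<String> queue;
        private final long enqueueTimeoutMillis;
        private volatile boolean circuitOpen; // open = reject new requests immediately

        ResourceQueue(int capacity, long enqueueTimeoutMillis) {
            this.queue = new ArrayBlockingQueue<>(capacity); // bounded
            this.enqueueTimeoutMillis = enqueueTimeoutMillis;
        }

        // Returns true if the UoW was accepted. If the queue stays full for the
        // whole timeout, the enqueue fails and the breaker opens.
        boolean submit(String uowId) {
            if (circuitOpen) {
                return false; // fail fast while the breaker is open
            }
            try {
                boolean accepted =
                        queue.offer(uowId, enqueueTimeoutMillis, TimeUnit.MILLISECONDS);
                if (!accepted) {
                    circuitOpen = true;
                }
                return accepted;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        // Called by the requester when a reply did not arrive in time.
        void requestTimedOut() {
            circuitOpen = true;
        }

        boolean isCircuitOpen() {
            return circuitOpen;
        }

        // Manual reset, for example after an operator has checked the resource.
        void reset() {
            circuitOpen = false;
        }
    }

    public static void main(String[] args) {
        ResourceQueue queue = new ResourceQueue(1, 10);
        System.out.println(queue.submit("uow-1")); // fits in the queue
        System.out.println(queue.submit("uow-2")); // queue full, no consumer
        System.out.println(queue.isCircuitOpen());
    }
}
```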

Page 30: How to Run from a Zombie: CloudStack Distributed Process Management

How could we implement this model?

Page 31: How to Run from a Zombie: CloudStack Distributed Process Management

Lightweight Threads

A thread that is not scheduled by the operating system, avoiding context switch overhead.

Page 32: How to Run from a Zombie: CloudStack Distributed Process Management

Actor Model

• An actor represents state and behavior

• Communicate by message passing

• Each actor is allocated a lightweight thread and mailbox

• Location independent
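
To make the mailbox idea concrete, here is a hand-rolled actor in plain Java. It is a sketch, not how Akka or Quasar implement actors: `ResourceActor` is a hypothetical class, an ordinary OS thread stands in for the lightweight thread, and location independence is not modeled. What it does show is that state touched only by the actor's own thread needs no locks, and that processing one message at a time prevents overcommitting a bounded capacity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of an actor guarding one resource's capacity.
final class ActorSketch {

    static final class ResourceActor {
        private static final int STOP = Integer.MIN_VALUE; // stop signal ("poison pill")

        private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
        private final int capacity;
        private int committed; // read and written only by the actor's thread while it runs
        private Thread thread;

        private ResourceActor(int capacity) {
            this.capacity = capacity;
        }

        // Creates the actor and starts the thread that drains its mailbox.
        static ResourceActor start(int capacity) {
            ResourceActor actor = new ResourceActor(capacity);
            actor.thread = new Thread(actor::drainMailbox);
            actor.thread.start();
            return actor;
        }

        // Message passing: ask the actor to reserve `units` of capacity.
        void tell(int units) {
            mailbox.add(units);
        }

        // Sends the stop signal, waits for the actor to finish, and returns
        // the capacity it committed.
        int stopAndGetCommitted() {
            mailbox.add(STOP);
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return committed;
        }

        private void drainMailbox() {
            try {
                while (true) {
                    int units = mailbox.take(); // one message at a time
                    if (units == STOP) {
                        return;
                    }
                    if (committed + units <= capacity) {
                        committed += units; // accept the reservation
                    }
                    // otherwise reject: accepting would overcommit the resource
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ResourceActor host = ResourceActor.start(10);
        host.tell(4);
        host.tell(5);
        host.tell(3); // rejected: 4 + 5 + 3 exceeds the capacity of 10
        host.tell(1);
        System.out.println("committed=" + host.stopAndGetCommitted());
    }
}
```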

Page 33: How to Run from a Zombie: CloudStack Distributed Process Management

Mailbox

Resource Actor

FSM

Orchestration

Unit of Work

Page 34: How to Run from a Zombie: CloudStack Distributed Process Management

Java Actor Frameworks

• Akka (http://akka.io)

• Quasar (https://github.com/puniverse/quasar)

Page 35: How to Run from a Zombie: CloudStack Distributed Process Management

Summary

• Orchestration and Resource Management must be properly divided to satisfy CAP

• To provide resource serialization guarantees, assign a queue and a process to each resource

• Fail fast, fail loudly

• An Actor Model based on lightweight threads may provide the scalability required to dedicate a queue and process per resource

Page 36: How to Run from a Zombie: CloudStack Distributed Process Management

Thoughts? Questions?

Page 37: How to Run from a Zombie: CloudStack Distributed Process Management

Thank you!

Slides available @ http://speakerdeck.com/jburwell
