A tale of disaster recovery (Cfengine everyday, practices and tools)

A tale of disaster recovery
Cfengine everyday, practices and tools
Nicolas Charles <[email protected]>
Jonathan Clarke <[email protected]>
FOSDEM 2011 @Brussels, Belgium

Upload: normation

Post on 17-May-2015


Category:

Technology



DESCRIPTION

After a brief presentation of configuration management (CM) basics, we start with an ill-fated tale from the recent past about disaster recovery (also known as a case study, if you must): how our CM saved us, how it didn't, and what could have been done better. This could lead to a discussion about best practices. We use Cfengine 3, and will introduce the software and give an overview of the main differences with other open source CM tools before explaining why we like this choice. But Cfengine is not everything: what enables us to manage our configuration completely are the practices and tools we've built around it.

TRANSCRIPT

Page 1: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

A tale of disaster recovery
Cfengine everyday, practices and tools

Nicolas Charles <[email protected]>
Jonathan Clarke <[email protected]>

FOSDEM 2011 @Brussels, Belgium

Page 2: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

About the speakers

Nicolas Charles

Cfengine contributor

Cfengine ”Community Champion” (C3)

Scala Developer

Jonathan Clarke

OpenLDAP committer

Sysadmin

But we get on pretty well! (mostly...)

Page 3: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Agenda

1) Configuration Management 101

2) Our choice of tool

3) A tale of disaster recovery

4) Introducing Cfengine 3

5) Why we love Cfengine 3

Page 4: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

A bit about Configuration Management...

Page 5: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Configuration management: What is it?

Configuration Management is a field of management that focuses on establishing and maintaining consistency of a system (..) throughout its life

Software configuration management is the task of tracking and controlling changes in the software

Sources:
http://en.wikipedia.org/wiki/Configuration_management
http://en.wikipedia.org/wiki/Software_configuration_management

Page 6: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Configuration management: Why is it useful?

● Control changes
● Reproduce over time and nodes
● Audit and keep history data
● Repair automatically

Page 7: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

What we chose, and why

Configuration Management Tools

Page 8: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Our choice

Back in mid-2009
Needed a configuration management tool
Criteria:
● Open source
● Multi-platform agent (including Windows)
● Resilient
● Non-disruptive

Page 9: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Our choice: candidates

● Cfengine 3
● Puppet
● Chef

Page 10: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Our choice: candidates

Cfengine 3

More on this choice later...

Page 11: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

An ill-fated tale from the recent past

Disaster Recovery

(CASE STUDY)

Page 12: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Before the disaster...

Our company's IT infrastructure

Small company: small requirements
● Web site, email
● Git repository, Redmine...

Small company: small budget
● All on one hosted server

Page 13: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Asking for trouble?

● Just one hosted server!
● Critical services!

No, a ”safe” configuration:
● Redundant hardware, 3-disk RAID-5 array
● All services automatically installed and set up using Configuration Management
● Backups: daily (several off-site locations)
● Several VMs to separate services

Page 14: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

A critical failure

2 hard drives fail simultaneously

→ RAID-5 array is down

→ Almost all services fail immediately

→ ”The end of the world as we know it”

→ Need to rebuild everything NOW

Page 15: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Recovering

Step 1: Panic!
Step 2: Get a new server
Step 3: Reinstall base OS + virtualization
Step 4: Restore VM configuration... whoops
Step 4: Re-create the VMs manually
Step 5: Reinstall each OS in each VM...

Page 16: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Recovering

Step 6: Install Configuration Management
Step 7: Sit back and watch all the services coming back online as if by magic!
Step 8: Huh, where's my data?
Step 9: Manually restore backups
Step 10: Make a list of missing data...

Page 17: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Lessons learned

1) Hard disks fail reliably

2) Restoring virtualization setups:
● Backing up the config files would have helped
● Need CM tools to describe the desired state! (Cfengine Nova does this)

3) Configuration Management should tie in to our backup system

4) Backups were lacking some files: always test!
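
Lessons 3 and 4 can be illustrated with a small Python sketch. It assumes the configuration management tool can export the list of paths each service needs, and that the backup system can list what it actually holds; both lists, the function name and the paths below are hypothetical. Comparing the two before a disaster gives the "list of missing data" ahead of time.

```python
def missing_from_backup(expected_paths, backup_paths):
    """Return the expected paths that the backup does not contain, sorted."""
    return sorted(set(expected_paths) - set(backup_paths))


# Hypothetical paths the managed services depend on.
expected = [
    "/etc/vm-configs/web.conf",
    "/srv/git",
    "/var/lib/redmine/files",
]

# Hypothetical listing of what the nightly backup really contains.
backed_up = ["/srv/git", "/var/mail"]

missing = missing_from_backup(expected, backed_up)
for path in missing:
    print("not in backup:", path)
```

Running a check like this on a schedule, and actually restoring a sample, is one way to "always test".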

Page 18: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Wishlist and discussion

Integrating Configuration Management tools and backup systems is a crucial step for CM to be efficient for disaster recovery
● What do others do?

Provisioning VMs and their resources (disks, network) should be automated too
● Cloud providers are one solution
● What about ”plain” virtualization?

Page 19: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

A bit about Cfengine 3...

Sources: across the Internet

Page 20: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine: History

Source: http://verticalsysadmin.com/blog/uncategorized/relative-origins-of-cfengine-chef-and-puppet

Page 21: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine 3: Intro

● Configuration management software
● Written in C
● Two versions:
  ● Community (GPL v3)
  ● Nova (closed source): Community + extra features
● Backed by Cfengine AS, a Norway-based company founded in 2009

Page 22: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine 3: Features

According to the KU Leuven comparative study of configuration management systems:
● Very mature
● Cross-platform (*BSD, AIX, HP-UX, Linux, Mac OS X, Solaris, Windows)
● Strongly distributed
● Based on state description and convergence
● Very high scalability (> 10000 nodes)
● Very small footprint

Source: http://distrinet.cs.kuleuven.be/software/sysconfigtools/overview

Page 23: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine 3: Components

Cf-agent
● Runs on all managed hosts
● Applies configuration: this is the heart
● Can connect to cf-serverd to get policies / files

Cf-serverd
● Distributes policies and files
● Must be run on policy server(s)
● Usually run on all hosts to enable remote runs

Cf-monitord
● Collects statistics on all nodes

Page 24: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine 3: Promises

● Configuration rules are called promises
● ”Promise” to be in the desired state
● Cfengine agent handles the steps to get there: convergence

Promise theory is based on research done at the University of Oslo
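
To make convergence concrete, here is a minimal Python sketch. It is not Cfengine syntax and not how cf-agent is implemented; the `Promise` class, the `converge` function and the dictionary standing in for a host are all hypothetical. The property it shows is that the agent compares the actual state with the promised state and acts only where they differ, so running it again changes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    """A desired state: `key` on the host should have the value `desired`."""
    key: str
    desired: str


def converge(host: dict, promises: list) -> list:
    """Bring `host` to the promised state; return the keys that were repaired."""
    repaired = []
    for promise in promises:
        if host.get(promise.key) != promise.desired:
            host[promise.key] = promise.desired  # repair only what has drifted
            repaired.append(promise.key)
    return repaired


# Hypothetical host state: one setting is wrong, one is missing.
host = {"sshd.PermitRootLogin": "yes"}
promises = [
    Promise("sshd.PermitRootLogin", "no"),
    Promise("ntp.server", "pool.ntp.org"),
]

first_run = converge(host, promises)   # repairs both settings
second_run = converge(host, promises)  # already converged, nothing to do
print(first_run, second_run)
```

This is also why step 7 of the recovery worked: pointing the agent at a freshly installed machine is just the case where almost every promise needs repairing.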

Page 25: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Cfengine 3: Usage examples

● Large companies (Facebook, AMD, …)
● Critical systems: Joint Australian Tsunami Warning Centre
● Personal computers
● Mobile devices: Nokia N900
● Underwater devices: army submarines
● Small and medium companies...

Page 26: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Why we love Cfengine 3...

Sources: our experience and opinions

Page 27: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Memory usage

Daemon consumption on managed hosts

Page 28: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Multi-platform

● Define a configuration for all operating systems
  ● Windows, Linux
● Make it ”transparent” (forget about the complexity)
● Existing standard library handling the differences between each OS and distribution
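
As an illustration of what "handling the differences" means, the Python sketch below maps one abstract request (install a package) to a platform-specific command. It is not the Cfengine standard library; the table, the platform names used as keys and the function name are assumptions made for this example, and the command is only built, never executed.

```python
# Hypothetical lookup table: platform name -> package installation command.
INSTALL_COMMANDS = {
    "debian": ["apt-get", "install", "-y"],
    "redhat": ["yum", "install", "-y"],
    "suse": ["zypper", "--non-interactive", "install"],
}


def install_command(platform: str, package: str) -> list:
    """Return the command line that installs `package` on `platform`."""
    if platform not in INSTALL_COMMANDS:
        raise ValueError(f"no install method known for platform {platform!r}")
    return INSTALL_COMMANDS[platform] + [package]


# The policy author writes one request; the lookup hides the differences.
for platform in ("debian", "redhat", "suse"):
    print(platform, "->", " ".join(install_command(platform, "openssh-server")))
```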

Page 29: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

File editing

Only change what you need to
● You like your distribution's defaults?
● You have various different systems already set up and just need to change something?

Search for lines and replace/delete/add them

Only change one field in a file
● /etc/passwd for example...
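
The "only change one field" idea can be sketched in a few lines of Python. This is an illustration of the technique, not Cfengine's own file editing; the `set_field` helper and the sample entries are hypothetical. One field of one matching entry is changed, and every other line and field is left exactly as the distribution or the administrator wrote it.

```python
def set_field(lines, match_user, field_index, new_value, sep=":"):
    """Return a copy of `lines` with one field of the matching entry changed."""
    result = []
    for line in lines:
        fields = line.split(sep)
        if (
            fields[0] == match_user
            and len(fields) > field_index
            and fields[field_index] != new_value
        ):
            fields[field_index] = new_value
            line = sep.join(fields)
        result.append(line)
    return result


# Hypothetical /etc/passwd content; field 6 (counting from 0) is the login shell.
passwd = [
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:1000:1000:Alice:/home/alice:/bin/sh",
]

edited = set_field(passwd, "alice", 6, "/bin/bash")
print("\n".join(edited))
```

Applying the same edit a second time returns the same lines, which is the behaviour a convergent file edit needs.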

Page 30: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

Complex tasks

Powerful class system to trigger promises
● Based on the node itself
● Based on time
● Based on whatever you might imagine

Complex workflows can be created
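
A rough Python sketch of the class idea follows. It is a simplification, not Cfengine's implementation: the function names, the promise names and the rule that a guard is a plain set of classes that must all be defined are assumptions for this example (real class expressions also allow "or" and "not"). Classes are facts that are true for this node at this moment, and a promise is only considered when its guard is satisfied.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def discover_classes(hostname: str, os_name: str, now: datetime) -> set:
    """Build the set of classes that are true for this node right now."""
    return {
        "any",                                         # always defined
        os_name,                                       # based on the node itself
        hostname.replace("-", "_").replace(".", "_"),  # based on the node itself
        DAYS[now.weekday()],                           # based on time
        f"Hr{now.hour:02d}",                           # based on time
    }


def select_promises(promises: dict, classes: set) -> list:
    """Keep the promises whose guard classes are all defined."""
    return [name for name, guard in promises.items() if guard <= classes]


# Hypothetical promises, each guarded by a set of classes.
promises = {
    "rotate_logs": {"linux", "Sunday", "Hr03"},
    "install_iis": {"windows"},
    "sync_clock": {"any"},
}

classes = discover_classes("web-1", "linux", datetime(2011, 2, 6, 3, 0))
selected = select_promises(promises, classes)
print(selected)
```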

Page 31: A tale of Disaster Recovery (Cfengine everyday, practices and tools)

  

FOSDEM 2011 Configuration Management room

Thank you!

And those brave enough to wake up early