Configuration Management 101 - A tale of disaster recovery using CFEngine 3

Post on 17-May-2015


  

A tale of disaster recovery

CFEngine everyday, practices and tools

Nicolas Charles <nch@normation.com>

Jonathan Clarke <jcl@normation.com>

RMLL 2011 @Strasbourg, France

  

About the speakers

Nicolas Charles

CFEngine contributor

CFEngine “Community Champion” (C3)

Jonathan Clarke

CFEngine contributor

Contributor to various LDAP FLOSS projects

But we get on pretty well! (mostly...)

Scala Developer / Sysadmin

  

Agenda

1) Configuration Management 101

2) A tale of disaster recovery

3) Our choice of tool

4) About CFEngine 3

  

A bit about Configuration Management...

  

Configuration management: what is it?

Configuration Management is a field of management that focuses on establishing and maintaining consistency of a system (...) throughout its life.

Software configuration management is the task of tracking and controlling changes in the software.

Sources:

http://en.wikipedia.org/wiki/Configuration_management

http://en.wikipedia.org/wiki/Software_configuration_management

  

Why configuration management?

A server crashed.

Install a new one, people can't work without it!

OK, it'll be done in about two days...

There's a new critical security patch we must deploy on all our servers!

Get it out quickly!

Right, I'll put the whole team on it.

  

Why configuration management?

Automation

Industrialization, Reproducibility

  

Why configuration management?

How do we set up service X?

Ask Jim, he's the expert on that.

But he left the company...

Huh, this server has been logging errors for a few weeks.

Oh? I think Michael changed something on it recently... He'll tell you what it was.

Damn, he's on vacation!

  

Why configuration management?

Building up knowledge

History, Documentation

  

Why configuration management?

An intruder just stole our data using a vulnerability in a module we don't need...

I thought the project specification ensured that we disabled that?

Er, it did, but we enabled it to solve a problem and forgot to disable it afterwards... sorry...

  

Why configuration management?

Vigilance

Alerts, Automatic repairs

  

Why configuration management?

I don't understand how this server is set up. It doesn't match our best practices.

Oh, that's a legacy server...

Give me details on our current security policy.

Well, it's a collection of little things, here and there...

Ah... Well, OK. Tell me: is it fully applied on all our critical servers?

Er...

  

Why configuration management?

Rationalization

Control, Normalization

  

Configuration management

Rationalization

Control, Normalization

Vigilance

Alerts, Automatic repairs

Building up knowledge

History, Documentation

Automation

Industrialization, Reproducibility

  

An ill-fated tale from the recent past

Disaster Recovery

(CASE STUDY)

  

Before the disaster...

Our company's IT infrastructure

Small company: small requirements
● Web site, email
● Git repository, Redmine...

Small company: small budget
● All on one hosted server

  

Asking for trouble?

Just one hosted server! Critical services!

No, a “safe” configuration:
● Redundant hardware, 3-disk RAID-5 array
● All services automatically installed and set up using Configuration Management
● Backups: daily (several off-site locations)
● Several VMs to separate services

  

A critical failure

2 hard drives fail simultaneously

→ RAID-5 array is down

→ Almost all services fail immediately

→ “The end of the world as we know it”

→ Need to rebuild everything NOW

  

Recovering

Step 1: Panic!
Step 2: Get a new server
Step 3: Reinstall base OS + virtualization
Step 4: Restore VM configuration (whoops)
Step 4: Re-create the VMs manually
Step 5: Reinstall each OS in each VM...

  

Recovering

Step 6: Install Configuration Management
Step 7: Sit back and watch all the services coming back online as if by magic!
Step 8: Huh, where's my data?
Step 9: Manually restore backups
Step 10: Make a list of missing data...

  

Lessons learned

1) Hard disks fail reliably

2) Restoring virtualization setups:
● Backing up the config files would have helped
● Need CM tools to describe the desired state! (CFEngine Nova does this)

3) Configuration Management should tie in to our backup system

4) Backups were lacking some files: always test!

  

Wishlist and discussion

Integrating Configuration Management tools and backup systems is a crucial step for CM to be efficient for disaster recovery
● What do others do?

Provisioning VMs and their resources (disks, network) should be automated too
● Cloud providers are one solution
● What about “plain” virtualization?

  

What we chose, and why

Configuration Management Tools

  

Our choice

Back in mid-2009

Needed a configuration management tool

Criteria:
● Open source
● Multi-platform agent (including Windows)
● Resilient
● Non-disruptive

  

Our choice: candidates

CFEngine 3

Puppet

Chef

  

Our choice: candidates

CFEngine 3

More on this choice later...

  

A bit about CFEngine 3...

Sources: across the Internet

  

CFEngine: History

Source: http://verticalsysadmin.com/blog/uncategorized/relative-origins-of-cfengine-chef-and-puppet

  

CFEngine 3: Intro

Configuration management software

Written in C

Two versions:
● Community (GPL v3)
● Nova (closed source)
  ● Community + extra features
  ● Some features released in Community

Backed by CFEngine AS, a Norway-based company founded in 2009

  

CFEngine 3: Components

cf-agent
● Runs on all managed hosts
● Applies configuration: this is the heart
● Can connect to cf-serverd to get policies / files

cf-serverd
● Distributes policies and files
● Must be run on policy server(s)
● Usually run on all hosts to enable remote runs

cf-monitord
● Collects statistics on all nodes

  

Memory usage

Daemon consumption on managed hosts

  

CFEngine 3: Usage examples
● Large companies
● Critical systems: Joint Australian Tsunami Warning Centre
● Personal computers
● Mobile devices: Nokia N900
● Underwater devices: army submarines
● Small and medium companies...
● Community

  

Feature: Multi-platform

Define a configuration for all operating systems
● Windows, Linux

Make it “transparent” (forget about the complexity)

Existing standard library handling the differences between each OS and distribution

  

CFEngine 3: Promises

Configuration rules are called promises
● “Promise” to be in the desired state
● The CFEngine agent handles the steps to get there: convergence

Promise theory is based on research done at the University of Oslo
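
A minimal sketch of what a single promise can look like. The bundle name (motd_permissions), the body name (motd_mode) and the target file (/etc/motd) are hypothetical and do not come from the talk. The policy only describes the desired end state; if the file already matches it, a further agent run should change nothing, which is the convergence idea described above.

```cf3
# Hypothetical example: names and paths below are illustrative only.
body common control
{
  bundlesequence => { "motd_permissions" };
}

bundle agent motd_permissions
{
  files:
      # Promise: this file exists and has the ownership and mode given below
      "/etc/motd"
        create => "true",
        perms  => motd_mode;
}

# Desired permissions, described as a state rather than as commands to run
body perms motd_mode
{
  mode   => "644";
  owners => { "root" };
  groups => { "root" };
}
```

Saved to a file, this could be tried with cf-agent -f pointing at that file; whether it runs unchanged depends on the CFEngine version in use.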

  

Feature: File editing

Only change what you need to
● You like your distribution's defaults?
● You have various different systems already set up and just need to change something?

Search for lines and replace/delete/add them

Only change one field in a file
● /etc/passwd for example
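
The two kinds of edit mentioned above (adding a line, changing one field) could be sketched as follows. This is an illustration under assumptions, not policy from the talk: the user "deploy", the host entry, and the bundle and body names (edit_examples, set_login_shell, ensure_line, colon_field) are all made up. The edit_field body is modelled on the column-editing body found in the CFEngine standard library, written out here so the example does not depend on it.

```cf3
# Hypothetical example: user, host entry and all bundle/body names are illustrative.
body common control
{
  bundlesequence => { "edit_examples" };
}

bundle agent edit_examples
{
  files:
      # Change only the 7th field (login shell) of one account
      "/etc/passwd"
        edit_line => set_login_shell("deploy", "/bin/bash");

      # Make sure one line is present, leaving the rest of the file alone
      "/etc/hosts"
        edit_line => ensure_line("192.0.2.10 backup.example.com");
}

bundle edit_line set_login_shell(user, shell)
{
  field_edits:
      # Select the line that starts with the user name
      "$(user):.*"
        edit_field => colon_field("7", "$(shell)");
}

bundle edit_line ensure_line(line)
{
  insert_lines:
      # Inserted only if the exact line is not already in the file
      "$(line)";
}

body edit_field colon_field(index, value)
{
  field_separator    => ":";
  select_field       => "$(index)";
  value_separator    => ",";
  field_value        => "$(value)";
  field_operation    => "set";
  extend_fields      => "false";
  allow_blank_fields => "true";
}
```

Every other line and field of both files is left as the distribution or the administrator set it.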

  

Feature: Complex tasks

Powerful class system to trigger promises
● Based on the node itself
● Based on time
● Based on whatever you might imagine

Complex workflows can be created
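
A short sketch of how classes can guard promises. linux, windows, Monday and Hr02 are classes the agent defines by itself from the node and the clock; the class maintenance_mode, the marker file /etc/maintenance and the bundle name are hypothetical. Reports are used only to keep the example harmless; any promise type can be guarded the same way.

```cf3
# Hypothetical example: "maintenance_mode" and its marker file are illustrative.
body common control
{
  bundlesequence => { "class_examples" };
}

bundle agent class_examples
{
  classes:
      # Custom class: defined only when the marker file exists on this node
      "maintenance_mode" expression => fileexists("/etc/maintenance");

  reports:
    # Based on the node itself
    linux::
      "This node runs Linux";

    windows::
      "This node runs Windows";

    # Based on time: Mondays between 02:00 and 02:59
    Monday.Hr02::
      "Inside the weekly maintenance window";

    # Combined: "." means AND, "!" means NOT
    linux.!maintenance_mode::
      "Linux node in normal operation";
}
```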

  

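Configuration example

Install the LAMP stack

bundle agent caller
{
  vars:
    # Packages that make up the LAMP stack
    "pkg_list" slist => { "httpd", "php5", "mysql" };

  packages:
    # Iterates over the list: install each package, or update it if already present
    "${pkg_list}"
      package_policy => "addupdate",
      package_method => generic;  # "generic" is a body from the CFEngine standard library
}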

  

RMLL 2011

Thank you!

  

CFEngine 3: Features

According to the KU Leuven comparative study of configuration management systems:
● Very mature
● Cross-platform (*BSD, AIX, HP-UX, Linux, Mac OS X, Solaris, Windows)
● Strongly distributed
● Based on state description and convergence
● Very high scalability (> 10000 nodes)
● Very small footprint

Source: http://distrinet.cs.kuleuven.be/software/sysconfigtools/overview
