Autobots @ realestate.com.au Automate ALL THE THINGS (or at least the things that really matter)

Upload: ggiesemann

Post on 19-Jul-2015


TRANSCRIPT

Page 1: Autobots @ REA

Autobots @ realestate.com.au

Automate ALL THE THINGS (or at least the things that really matter)

Page 2: Autobots @ REA

Technical Lead @ realestate.com.au

Who am I?

@ggiesemann @geekle

Some newcomers mistake me for a sysadmin :(

Geoffrey Giesemann

Page 3: Autobots @ REA

(and what the hell am I talking about?)

What's the problem?

Page 4: Autobots @ REA

WTF is going on?!?

I hope my wife still recognises me...

(deployments are *hard*)

Page 5: Autobots @ REA

(just the usual suspects)

Why so hard?

Page 6: Autobots @ REA
Page 7: Autobots @ REA
Page 8: Autobots @ REA
Page 9: Autobots @ REA

(and who do we blame?)

How do we fix this?

Page 10: Autobots @ REA

deploybot

Page 11: Autobots @ REA
Page 12: Autobots @ REA

with_down_in_load_balancer(server) do
  with_down_in_nagios(server) do
    puppet(server)
    raise "borked!" unless server.working?
  end
end
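One way wrappers like these could be built is with Ruby blocks and `ensure`, so the server is always put back into rotation even when the deploy step raises. This is a minimal sketch, not deploybot's actual code: `FakeServer`, its `log`, the `deploy` method, and the bodies of the helpers are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for a real server; it only records what happened to it.
class FakeServer
  attr_reader :log

  def initialize(healthy: true)
    @healthy = healthy
    @log = []
  end

  def working?
    @healthy
  end
end

# Take the server out of the load balancer, run the block, always put it back.
def with_down_in_load_balancer(server)
  server.log << :lb_disabled
  yield
ensure
  server.log << :lb_enabled
end

# Silence monitoring alerts for the duration of the block.
def with_down_in_nagios(server)
  server.log << :nagios_downtime_started
  yield
ensure
  server.log << :nagios_downtime_ended
end

# Placeholder for the real configuration run.
def puppet(server)
  server.log << :puppet_run
end

# Same shape as the slide: nested wrappers around the deploy step.
def deploy(server)
  with_down_in_load_balancer(server) do
    with_down_in_nagios(server) do
      puppet(server)
      raise "borked!" unless server.working?
    end
  end
end
```

Whether a failed server should really be re-enabled is a design choice. This sketch assumes the load balancer's own health monitor keeps a broken server out of rotation.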

Page 13: Autobots @ REA
Page 14: Autobots @ REA
Page 15: Autobots @ REA

schemabot

Page 16: Autobots @ REA
Page 17: Autobots @ REA
Page 18: Autobots @ REA

What's worked well?

Page 19: Autobots @ REA

Service Discovery!

Page 20: Autobots @ REA

$ grep -r 'aa01' nagios/ | wc -l
17
$ grep -r 'aa01' puppet/ | wc -l
4
$ grep -r 'aa01' deploybot/ | wc -l
0

Page 21: Autobots @ REA
Page 22: Autobots @ REA

> show lb vserver agentadmin-prod

agentadmin-prod (125.56.204.120:80) - HTTP Type: ADDRESS State: UP

... blah blah ...

Bound Service Groups:

1) Group Name: agentadmin

1) agentadmin (192.168.25.1: 80) - HTTP State: UP Weight: 1

Persistence Cookie Value : my_random_str=9999

2) agentadmin (192.168.25.2: 80) - HTTP State: UP Weight: 1

Persistence Cookie Value : my_random_str=9999

... etc etc ...

> show servicegroup agentadmin

agentadmin - HTTP

State: ENABLED Monitor Threshold : 0

Max Conn: 0 Max Req: 0 Max Bandwidth: 0 kbits

Monitor Name: http-diagnositic-warmup State: ENABLED Weight: 1

1) 192.168.25.1:80 State: UP Server Name: 192.168.25.1

Probes: 131205 Failed [Total: 2525 Current: 0]

Last response: Success - HTTP response code 200 received.

2) 192.168.25.2:80 State: UP Server Name: 192.168.25.2

Probes: 131322 Failed [Total: 2428 Current: 0]

Last response: Success - HTTP response code 200 received.
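The listing above is what makes service discovery possible: the load balancer already knows which servers back each service, so a tool can ask it instead of keeping its own host list. A minimal sketch of pulling the member addresses out of `show servicegroup` text follows. The `members` helper, the `Member` struct, and the regular expression are assumptions based only on the sample output shown here, not deploybot's actual parser.

```ruby
# Sample text in the shape of the `show servicegroup` output above.
OUTPUT = <<~TEXT
  agentadmin - HTTP
  State: ENABLED Monitor Threshold : 0
  1) 192.168.25.1:80 State: UP Server Name: 192.168.25.1
  Probes: 131205 Failed [Total: 2525 Current: 0]
  2) 192.168.25.2:80 State: UP Server Name: 192.168.25.2
  Probes: 131322 Failed [Total: 2428 Current: 0]
TEXT

Member = Struct.new(:address, :port, :state)

# Return one Member per numbered "N) ip:port State: X" line.
def members(output)
  pattern = /^\s*\d+\)\s+(\d+\.\d+\.\d+\.\d+):(\d+)\s+State:\s+(\w+)/
  output.scan(pattern).map do |ip, port, state|
    Member.new(ip, port.to_i, state)
  end
end

members(OUTPUT).each { |m| puts "#{m.address}:#{m.port} #{m.state}" }
```

With a list like this, a deploy tool needs no hard-coded host names, which is consistent with the zero `aa01` matches in deploybot/.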

Page 23: Autobots @ REA

Standardised Monitoring!

Page 24: Autobots @ REA
Page 25: Autobots @ REA

$ curl http://aa01/diagnostic/status/nagios
OK - the application is functioning correctly

$ curl http://ar01/diagnostic/status/nagios
OK - the application is functioning correctly

Page 26: Autobots @ REA

● custom app health checks
  ○ https://github.com/tribune/is_it_working
  ○ https://github.com/blythedunham/health_monitor

● monitor requests in munin
  ○ https://github.com/pka/rack-monitor
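A standardised status endpoint like the one curled earlier can be tiny. The sketch below is a plain Rack-style object (anything that responds to `call(env)` and returns status, headers, and body), written without loading any gem. The class name `DiagnosticStatus`, the check names, and the CRITICAL message are illustrative assumptions, not what the libraries listed above actually provide.

```ruby
class DiagnosticStatus
  # checks: hash of name => callable that returns true when healthy
  def initialize(checks)
    @checks = checks
  end

  # Rack-style entry point: run every check and report in one line of text.
  def call(_env)
    failed = @checks.reject { |_name, check| safe(check) }.keys
    if failed.empty?
      respond(200, "OK - the application is functioning correctly")
    else
      respond(500, "CRITICAL - failed checks: #{failed.join(', ')}")
    end
  end

  private

  # A check that raises counts as a failed check, not a crashed endpoint.
  def safe(check)
    check.call
  rescue StandardError
    false
  end

  def respond(status, body)
    [status, { "content-type" => "text/plain" }, [body]]
  end
end

status, _headers, body = DiagnosticStatus.new(database: -> { true }).call({})
puts "#{status} #{body.first}"
```

Because every application answers at the same path in the same format, one monitoring check and one load balancer monitor can cover all of them.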

Page 27: Autobots @ REA

Better Error Handling!

Page 28: Autobots @ REA
Page 29: Autobots @ REA

● Don't bother trying to handle errors - just make sure you can recover from them quickly!

● As long as you have some app servers alive, things will work out!
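The two points above suggest a rollout that does no clever error handling at all: deploy to one server at a time and stop at the first failure, leaving the untouched servers to keep serving traffic. A minimal sketch, where `rolling_deploy` and the block passed to it are hypothetical names rather than deploybot's API, and `aa02` and `aa03` are made-up hosts alongside `aa01`:

```ruby
# Deploy to servers one at a time and stop at the first failure.
# Returns [done, failed, untouched] so a human can fix things and re-run.
def rolling_deploy(servers)
  done = []
  servers.each_with_index do |server, i|
    begin
      yield server
      done << server
    rescue StandardError => e
      warn "deploy to #{server} failed: #{e.message}"
      # Everything after the failed server is left alone and still in service.
      return [done, server, servers.drop(i + 1)]
    end
  end
  [done, nil, []]
end

done, failed, untouched = rolling_deploy(%w[aa01 aa02 aa03]) do |server|
  raise "borked!" if server == "aa02"
end
puts "done=#{done.inspect} failed=#{failed.inspect} untouched=#{untouched.inspect}"
```

The only recovery logic is "stop and tell someone", which keeps the tool simple and relies on the remaining servers for availability.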

Page 30: Autobots @ REA

FIN

Page 31: Autobots @ REA

● http://markerguru.deviantart.com/
● http://www.quickmeme.com/
● http://www.doingitwrong.com/
● Everyone who was part of @rea_autobots

Thanks!