god - process and task monitoring done right


Upload: jnewland

Post on 20-Jun-2015


TRANSCRIPT

Page 1: God - Process and Task Monitoring Done Right

god
process and task monitoring done right

Jesse Newland
jnewland.com
[email protected]

Page 2: God - Process and Task Monitoring Done Right
Page 3: God - Process and Task Monitoring Done Right

FAILWHALE NEEDS NO INTRODUCTION

Page 4: God - Process and Task Monitoring Done Right

Like it or not, the web is 24/7/365

Page 5: God - Process and Task Monitoring Done Right

But who wants to be online 24/7/365?

Page 6: God - Process and Task Monitoring Done Right

Sometimes, you’ve just gotta take a walk

Page 7: God - Process and Task Monitoring Done Right

ZOMG WHAT NOW?

Page 8: God - Process and Task Monitoring Done Right

Process monitoring

Page 9: God - Process and Task Monitoring Done Right

sudo gem install god

Page 10: God - Process and Task Monitoring Done Right

written by: Tom Preston-Werner

Page 11: God - Process and Task Monitoring Done Right

Follow along at home:

    git clone git://github.com/jnewland/god_examples.git

Page 12: God - Process and Task Monitoring Done Right

The Basics

Page 13: God - Process and Task Monitoring Done Right

    $ ruby scripts/crashy.rb
    Wed Jul 09 13:53:13 -0400 2008
    Wed Jul 09 13:53:14 -0400 2008
    Wed Jul 09 13:53:15 -0400 2008
    /Users/jnewland/src/god_examples/lib/god_test.rb:28:in `crash': Crash! (RuntimeError)
        from /Users/jnewland/src/god_examples/lib/god_test.rb:20:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `loop'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:15:in `initialize'
        from scripts/crashy.rb:4:in `new'
        from scripts/crashy.rb:4

Page 14: God - Process and Task Monitoring Done Right

    # simple.god
    # The simplest possible watch
    God.watch do |w|
      w.name = 'crashy'
      w.interval = 1.seconds
      w.start = 'ruby scripts/crashy.rb'

      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.running = false
        end
      end
    end
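The `process_running` condition comes down to asking the operating system whether a PID is still alive. A minimal sketch of that idea in plain Ruby, assuming a signal-0 probe; `process_alive?` is a hypothetical helper for illustration, not part of god's API:

```ruby
require 'rbconfig'

# Hypothetical helper: true if a process with this PID exists.
# Signal 0 runs the existence/permission check without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists but belongs to another user
end

# Start a short-lived child, then poll it the way a watch would.
pid = Process.spawn(RbConfig.ruby, '-e', 'sleep 0.2')
puts "running? #{process_alive?(pid)}"
Process.wait(pid)   # reap the child so the PID is released
puts "running? #{process_alive?(pid)}"
```

A watch with `c.running = false` in `start_if` would run the start command whenever a check like this comes back false.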

Page 15: God - Process and Task Monitoring Done Right

$ god -h

...

    Options:
        -c, --config-file CONFIG    Configuration file
        -p, --port PORT             Communications port (default 17165)
        -b, --auto-bind             Auto-bind to an unused port number
        -P, --pid FILE              Where to write the PID file
        -l, --log FILE              Where to write the log file
        -D, --no-daemonize          Don't daemonize
        -v, --version               Print the version number and exit

Page 16: God - Process and Task Monitoring Done Right

    $ god -c simple.god -D
    [... 20:19:33 #10897] INFO: Using pid file directory: /Users/jnewland/.god/pids
    [... 20:19:34 #10897] INFO: Started on drbunix:///tmp/god.17165.sock
    [... 20:19:34 #10897] INFO: crashy move 'unmonitored' to 'up'
    [... 20:19:34 #10897] INFO: crashy moved 'unmonitored' to 'up'
    [... 20:19:34 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
    [... 20:19:34 #10897] INFO: crashy move 'up' to 'start'
    [... 20:19:34 #10897] INFO: crashy start: ruby scripts/crashy.rb
    [... 20:19:34 #10897] INFO: crashy moved 'up' to 'up'
    [... 20:19:34 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:35 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:36 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:37 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:38 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:39 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:40 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
    [... 20:19:40 #10897] INFO: crashy move 'up' to 'start'
    [... 20:19:40 #10897] INFO: crashy start: ruby scripts/crashy.rb
    [... 20:19:40 #10897] INFO: crashy moved 'up' to 'up'
    [... 20:19:40 #10897] INFO: crashy [ok] process is running (ProcessRunning)
    [... 20:19:41 #10897] INFO: crashy [ok] process is running (ProcessRunning)

Page 23: God - Process and Task Monitoring Done Right

    $ god -c simple.god
    $ ps ax | grep ruby
    12512 ?? Ss 0:00.03 ruby /Users/jnewland/src/god_examples/scripts/crashy.rb
    12484 s001 S 0:00.36 /usr/bin/ruby /usr/bin/god -c simple.god
    $ god -h
    ...
    Commands:
        start <task or group name>      start task or group
        restart <task or group name>    restart task or group
        stop <task or group name>       stop task or group
        monitor <task or group name>    monitor task or group
        unmonitor <task or group name>  unmonitor task or group
        remove <task or group name>     remove task or group from god
        load <file>                     load a config into a running god
        log <task name>                 show realtime log for given task
        status                          show status of each task
        quit                            stop god
        terminate                       stop god and all tasks
        check                           run self diagnostic

Page 24: God - Process and Task Monitoring Done Right

    $ god status
    crashy: up
    $ god restart crashy
    Sending 'restart' command

    The following watches were affected:
      crashy
    $ god stop crashy
    Sending 'stop' command

    The following watches were affected:
      crashy
    $ god status
    crashy: unmonitored
    $ god start crashy
    Sending 'start' command

    The following watches were affected:
      crashy
    $ god status
    crashy: up

Page 25: God - Process and Task Monitoring Done Right

Controlling Leaky Processes

Page 26: God - Process and Task Monitoring Done Right

    # leaky.god
    God.watch do |w|
      w.name = "leaky"
      w.interval = 5.seconds
      w.start = 'ruby scripts/leaky.rb'

      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.running = false
        end
      end

      w.restart_if do |restart|
        restart.condition(:memory_usage) do |c|
          c.above = 2.megabytes
        end
      end
    end

Page 27: God - Process and Task Monitoring Done Right

CPU Usage

Page 28: God - Process and Task Monitoring Done Right

      w.restart_if do |restart|
        restart.condition(:cpu_usage) do |c|
          c.above = 50.percent
          c.times = [3, 5]
        end
      end
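`c.times = [3, 5]` reads as "trigger when 3 of the last 5 checks were over the limit", so a single spike does not cause a restart. A rough model of that sliding window in plain Ruby; `TimesWindow` and the sample numbers are made up for illustration and are not god's internals:

```ruby
# Hypothetical model of a "3 out of the last 5 checks" trigger.
class TimesWindow
  def initialize(needed, window)
    @needed  = needed   # how many failing checks are required
    @window  = window   # how many recent checks are remembered
    @history = []
  end

  # Record one check result; return true when the trigger should fire.
  def record(over_threshold)
    @history << over_threshold
    @history.shift if @history.size > @window
    @history.count(true) >= @needed
  end
end

window  = TimesWindow.new(3, 5)
samples = [62, 40, 71, 45, 90]   # made-up CPU percentages
fired   = samples.map { |cpu| window.record(cpu > 50) }
p fired   # => [false, false, false, false, true]
```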

Page 29: God - Process and Task Monitoring Done Right

HTTP Status Codes

Page 30: God - Process and Task Monitoring Done Right

      w.restart_if do |restart|
        restart.condition(:http_response_code) do |c|
          c.host = 'localhost'
          c.port = '80'
          c.path = '/heartbeat'
          c.code_is_not = %w(200 304)
        end
      end
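The same kind of check can be tried by hand with Ruby's standard `Net::HTTP`. This is only a sketch of the idea: `unhealthy_code?` and `heartbeat_failed?` are hypothetical helpers, and treating a refused connection or timeout as a failure is an assumption of this example, not documented god behavior.

```ruby
require 'net/http'

# Hypothetical helper mirroring `code_is_not`: unhealthy when the status
# code is not one of the accepted codes.
def unhealthy_code?(code, accepted = %w(200 304))
  !accepted.include?(code.to_s)
end

# Fetch the heartbeat path and report whether a restart would be warranted.
# Connection errors and timeouts count as unhealthy here (an assumption).
def heartbeat_failed?(host, port, path)
  response = Net::HTTP.start(host, Integer(port),
                             open_timeout: 2, read_timeout: 2) do |http|
    http.get(path)
  end
  unhealthy_code?(response.code)
rescue SystemCallError, SocketError, Timeout::Error
  true
end

puts unhealthy_code?('200')   # accepted code
puts unhealthy_code?('500')   # anything else counts as a failure
```

Calling `heartbeat_failed?('localhost', '80', '/heartbeat')` would exercise the full request against the settings shown on the slide.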

Page 31: God - Process and Task Monitoring Done Right

Notifications

Page 32: God - Process and Task Monitoring Done Right

    # email_contacts.god
    God::Contacts::Email.message_settings = {
      :from => '[email protected]'
    }

    God::Contacts::Email.server_settings = {
      :address => "smtp.jnewland.com",
      :port => 25,
      :domain => "jnewland.com",
      :authentication => :plain,
      :user_name => "god",
      :password => ""
    }

    God.contact(:email) do |c|
      c.name = 'jesse'
      c.email = '[email protected]'
    end

Page 33: God - Process and Task Monitoring Done Right

    # http://github.com/mojombo/god/tree/master/lib/god/contacts/jabber.rb
    require 'jabber'

    God::Contacts::Jabber.settings = {
      :jabber_id => '[email protected]',
      :password => ' '
    }

    God.contact(:jabber) do |c|
      c.name = 'jesse'
      c.jabber_id = '[email protected]'
    end

Page 34: God - Process and Task Monitoring Done Right

      w.restart_if do |restart|
        restart.condition(:cpu_usage) do |c|
          c.above = 50.percent
          c.times = [3, 5]
          c.notify = "jesse"
        end
      end

Page 35: God - Process and Task Monitoring Done Right

Monitoring Mongrels

Page 36: God - Process and Task Monitoring Done Right

Putting it all together

• Process Running

• Memory Usage

• CPU Usage

• HTTP Response Code

• Notifications

• Capistrano?

• Web Interface?

Page 37: God - Process and Task Monitoring Done Right

    # rails/config/god/app.god
    RAILS_ROOT = ENV['RAILS_ROOT'] ||= "/var/www/apps/test/current"
    RUBY = `which ruby`.chomp
    MONGREL_RAILS = `which mongrel_rails`.chomp
    RAILS_ENV = ENV['RAILS_ENV'] ||= 'production'
    MONGRELS = 2
    MONGREL_START_PORT = 3000
    USER = GROUP = 'deploy'

    0.upto(MONGRELS-1) do |n|
      port = MONGREL_START_PORT+n
      God.watch do |w|
        w.group = 'mongrels'
        w.name = "mongrel_#{port}"
        w.uid = USER
        w.gid = GROUP
        w.interval = 30.seconds
        w.start = "#{RUBY} #{MONGREL_RAILS} start --environment #{RAILS_ENV} --chdir #{RAILS_ROOT} --port #{port}"
        w.start_grace = 90.seconds
        w.restart_grace = 90.seconds
        w.log = File.join(RAILS_ROOT, "log/mongrel_#{port}.log")

        # process running

        # memory usage

        # cpu usage

        # http response code
      end
    end
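The loop defines one watch per port, all in the `mongrels` group. A quick plain-Ruby check of which watch names it produces, using the two constants from the slide (nothing here touches god itself):

```ruby
# Values copied from the slide; the loop is ordinary Ruby.
MONGRELS = 2
MONGREL_START_PORT = 3000

watch_names = 0.upto(MONGRELS - 1).map do |n|
  port = MONGREL_START_PORT + n
  "mongrel_#{port}"
end

p watch_names   # => ["mongrel_3000", "mongrel_3001"]
```

Each name can be passed to commands such as `god restart`, and the group name `mongrels` addresses both watches at once.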

Page 38: God - Process and Task Monitoring Done Right

Pulse Controller

    class PulseController < ApplicationController
      session :off

      def pulse
        if (ActiveRecord::Base.connection.execute("select 1").num_rows rescue 0) == 1
          render :text => "OK #{Time.now.utc.to_s(:db)}"
        else
          render :text => 'ERROR', :status => :internal_server_error
        end
      end
    end

Page 39: God - Process and Task Monitoring Done Right

Capistrano

Page 40: God - Process and Task Monitoring Done Right

    # rails/config/deploy.rb
    role :app, "test.jnewland.com"

    require 'san_juan'
    san_juan.role :app, %w(mongrels)

    # overwrite the default start, stop, and restart tasks to use god
    namespace :deploy do

      desc "Use god to restart the app"
      task :restart do
        god.all.reload
        god.app.mongrels.restart
      end

      desc "Use god to start the app"
      task :start do
        god.all.start
      end

      desc "Use god to stop the app"
      task :stop do
        god.all.terminate
      end

    end

Page 41: God - Process and Task Monitoring Done Right

$ cap -T

...

    cap god:all:quit               # Quit god, but not the processes it's monitoring
    cap god:all:reload             # Reloading God Config
    cap god:all:start              # Start god
    cap god:all:start_interactive  # Start god interactively
    cap god:all:status             # Describe the status of the running tasks on ...
    cap god:all:terminate          # Terminate god and all monitored processes
    cap god:app:mongrels:log       # Log mongrels
    cap god:app:mongrels:remove    # Remove mongrels
    cap god:app:mongrels:restart   # Restart mongrels
    cap god:app:mongrels:start     # Start mongrels
    cap god:app:mongrels:stop      # Stop mongrels
    cap god:app:mongrels:unmonitor # Unmonitor mongrels
    cap god:app:quit               # Quit god, but not the processes it's monitoring
    cap god:app:reload             # Reload the god config file
    cap god:app:start              # Start god
    cap god:app:start_interactive  # Start god interactively
    cap god:app:status             # Describe the status of the running tasks
    cap god:app:terminate          # Terminate god and all monitored processes

...

Page 43: God - Process and Task Monitoring Done Right

ZOMG WHAT NOW?

Page 44: God - Process and Task Monitoring Done Right

#rails/config/god/app.god

...

    require 'god_web'
    GodWeb.watch(:port => 3003)

...

Page 45: God - Process and Task Monitoring Done Right
Page 46: God - Process and Task Monitoring Done Right
Page 48: God - Process and Task Monitoring Done Right

Advanced Features

Page 49: God - Process and Task Monitoring Done Right

Lambda Conditions

    # jabber_bot.god
      w.restart_if do |restart|
        restart.condition(:lambda) do |c|
          c.interval = 15.seconds
          c.lambda = lambda do
            require 'xmpp4r-simple'
            im = Jabber::Simple.new(
              '[email protected]',
              PASSWORDS['[email protected]']
            )
            im.deliver('[email protected]', 'ping')
            sleep(5)
            return true unless im.received_messages?
            chat = im.received_messages.find { |msg| msg.type == :chat}
            return true unless chat.body =~ /pong/
          end
        end
      end
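A lambda condition triggers when the lambda returns a truthy value, and `return` inside a Ruby `lambda` exits only the lambda itself. The sketch below replays the ping/pong decision without a Jabber server: `Message`, `needs_restart`, and the injected message lists are stand-ins invented for this example, and the `nil` guard is an addition that is not on the slide.

```ruby
# Hypothetical stand-in for a received Jabber message.
Message = Struct.new(:type, :body)

# Same decision as the slide's check, with the messages passed in.
needs_restart = lambda do |messages|
  return true if messages.empty?            # no reply at all
  chat = messages.find { |msg| msg.type == :chat }
  return true if chat.nil?                  # extra guard, not on the slide
  return true unless chat.body =~ /pong/    # reply was not a pong
  false                                     # healthy: leave the bot alone
end

p needs_restart.call([])                             # => true
p needs_restart.call([Message.new(:chat, 'pong')])   # => false
p needs_restart.call([Message.new(:chat, 'huh?')])   # => true
```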

Page 50: God - Process and Task Monitoring Done Right

Behaviors

    # custom_behavior.god
    module God
      module Behaviors
        class Speak < Behavior

          def before_start
            `say "Starting now"`
            'announced start'
          end

          def before_stop
            `say "Stopping now"`
            'announced stop'
          end

        end
      end
    end

    God.watch do |w|
      ...
      w.behavior(:speak)
      ...
    end

Page 51: God - Process and Task Monitoring Done Right

mongrel_cluster

    # mongrel_cluster.god
    require 'lib/god_mongrel_cluster'

    Dir.glob('/etc/mongrel_cluster/*.conf').each do |mongrel_cluster|
      cluster = GodMongrelCluster.new(mongrel_cluster)
      cluster.watch
    end

Page 52: God - Process and Task Monitoring Done Right

Questions?

Page 53: God - Process and Task Monitoring Done Right

http://www.flickr.com/photos/stuckincustoms/522313332/
http://www.flickr.com/photos/91499534@N00/2335651912/
http://www.flickr.com/photos/code_martial/1411893703/
http://www.flickr.com/photos/extranoise/163847669/
http://www.flickr.com/photos/vanz/2480741207/
http://www.flickr.com/photos/smartjunco/281071006/
http://www.flickr.com/photos/davesag/8312984/
http://www.flickr.com/photos/gaetanlee/298178764/
http://www.flickr.com/photos/vrogy/511644410/
http://www.flickr.com/photos/jeffsmallwood/299208539/
http://www.flickr.com/photos/cjdaniel/2240123159/
http://www.flickr.com/photos/bobbygreg/139080175/
http://www.flickr.com/photos/lordelo/12958772/

Hooray Flickr! (And Creative Commons)