Migrating from Backgroundrb to Resque
DESCRIPTION
We upgraded from Backgroundrb to Resque. The pagers have stopped buzzing, and we are very pleased with the migration. Getting the last 5% of the Resque setup right was a little tricky. This presentation shares some of the implementation details (code and config files) to help others make their Resque setup rock solid.
TRANSCRIPT
2010-10-12
to: Resque
from: Backgroundrb
@kbrock
Summary
- Background queues let us defer logic outside the browser request and response.
- Backgroundrb crashed often for us. We moved to Resque and it hasn't crashed since.
- Backgroundrb is easier to run out of the box.
- Adding just a little code makes Resque just as easy, without sacrificing any of its added flexibility.
Why did we upgrade?
- bdrb paged the boss 4 times my first weekend
- memory leaks caused crashes
- monit can't restart workers in backgroundrb
- move to an active project (a la heroku, github, redis)
What does each bring to the table?
                              bdrb    resque
adhoc (out of request)        yes     yes
delay (run/remind)            yes     resque-scheduler
schedule (cron)               yes     resque-scheduler
mail (invisible/out of req)   code    resque_mailer
status reporting              code    resque-meta, web
backgroundrb does most of what we need out of the box
resque has plugins to make up the difference
Bdrb Components
[Diagram: rails (enqueue), main queue, queue manager, scheduler, workers (mailer, work), bdrb yml. Legend: Monitored, we started, data.]
simple w/ 1 queue (add started_at for delayed jobs)
scheduler is a special worker - managed by 1 process (is a runner/worker)
Resque Components
[Diagram: rails (enqueue), main queue, delayed queue, rake, scheduler, schedule, workers (mailer, work), resque-web. Legend: Monitored, we started, data. Numbered callouts 1-6.]
many moving parts
simplified in that all workers are the same
scheduler simply adds entries in the queue (instead of MetaWorker/running jobs)
web ui is a nice touch
1. Ad-hoc Enqueuing
                     bdrb    resque
args                 hash    ruby, checked
enqueue AR objects   yes     -
mail (invisible)     yes     yes
AR objects - crept up in the action_mailer deliver calls
Looks like bdrb wins here, but not enqueuing AR objects is best practice
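The note about not enqueuing AR objects follows from how Resque stores jobs: arguments are serialized to JSON, so only simple values survive the trip through Redis. A minimal sketch of the "enqueue the id, load the record in the job" pattern, using only the standard library. `FakeUser`, `WelcomeMailJob`, and `round_trip` are hypothetical stand-ins for illustration, not names from our app:

```ruby
require 'json'

# Hypothetical stand-in for an ActiveRecord model (assumption for illustration).
class FakeUser
  attr_reader :id, :email

  @records = {}

  def initialize(id, email)
    @id = id
    @email = email
  end

  def self.create(id, email)
    @records[id] = new(id, email)
  end

  def self.find(id)
    @records.fetch(id)
  end
end

# Simulates what happens to job arguments on their way through the queue:
# dumped to JSON on enqueue, parsed again by the worker.
def round_trip(*args)
  JSON.parse(JSON.generate(args))
end

# Hypothetical job: takes the record id and looks the record up when it runs.
class WelcomeMailJob
  def self.perform(user_id)
    user = FakeUser.find(user_id)
    "mail to #{user.email}"
  end
end

user = FakeUser.create(7, 'pat@example.com')
args = round_trip(user.id)               # enqueue the id, not the object
result = WelcomeMailJob.perform(*args)
```

Because the job re-reads the record when it runs, it also sees the current state of the row rather than a stale copy from enqueue time.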
Ad-hoc/Delayed (bdrb)
class JobWorker < BackgrounDRb::MetaWorker
  set_worker_name :job_worker

  def purge_job_logs()
    JobLog.purge_expired!
    persistent_job.finish!
  end

  def self.perform_later(*args)
    MiddleMan.worker(:job_worker).enq_purge_job_logs(
      :job_key => new_job_key, :arg => args)
  end

  def self.perform_at(*args)
    time = args.shift
    # pass args as-is (a splat is not valid as a hash value)
    MiddleMan.worker(:job_worker).enq_purge_job_logs(
      :job_key => new_job_key, :arg => args, :scheduled_at => time)
  end

  def self.new_job_key()
    "purge_job_logs_#{ActiveSupport::SecureRandom.hex(8)}"
  end
end
don't need to do a command pattern (our code didn't)
scheduled_at = beauty of SQL
parent class
enqueue knows queue name (code not loaded)
Ad-hoc/Delayed (resque)
class PurgeJobLogs
  @queue = :job_worker

  # Resque workers call self.perform on the job class
  def self.perform()
    JobLog.purge_expired!
  end

  def self.perform_later(*args)
    Resque.enqueue(self, *args)
  end

  # enqueue_at comes from resque-scheduler
  def self.perform_at(*args)
    time = args.shift
    Resque.enqueue_at(time, self, *args)
  end
end
Enqueue needs worker class to know the name of the queue (even if called directly into Resque)
interface only (perform_{at,later}) -> abstracted out to parent?
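One way the perform_{at,later} interface could be pulled out of each job is a small module that job classes extend. This is a sketch under assumptions, not the code we shipped: `Enqueueable` and `FakeBackend` are hypothetical names, and the in-memory backend only exists so the example runs without Redis.

```ruby
# Hypothetical in-memory backend standing in for Resque. It mirrors the two
# calls used on the slide: enqueue(klass, *args) and enqueue_at(time, klass, *args).
class FakeBackend
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(klass, *args)
    @jobs << { :class => klass, :args => args, :at => nil }
  end

  def enqueue_at(time, klass, *args)
    @jobs << { :class => klass, :args => args, :at => time }
  end
end

# Shared interface: extend a job class with this instead of repeating
# perform_later / perform_at in every job.
module Enqueueable
  attr_writer :backend

  # Swappable backend; defaults to the in-memory fake in this sketch.
  def backend
    @backend ||= FakeBackend.new
  end

  def perform_later(*args)
    backend.enqueue(self, *args)
  end

  def perform_at(time, *args)
    backend.enqueue_at(time, self, *args)
  end
end

class PurgeJobLogs
  extend Enqueueable
  @queue = :job_worker

  def self.perform(*args)
    :purged
  end
end

PurgeJobLogs.perform_later
PurgeJobLogs.perform_at(Time.at(0), 'old')
jobs = PurgeJobLogs.backend.jobs
```

In an app with Resque and resque-scheduler loaded, setting `PurgeJobLogs.backend = Resque` would route the same two calls to `Resque.enqueue` and `Resque.enqueue_at`.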
2. Scheduled Enqueuing
                   bdrb      resque
sched any method   yes x2    command
scheduler          yes       yes+
adhoc jobs         -         yes
bdrb: need to define the schedule in 2 places, yml and ruby. We ran into a case where this caused a problem.
resque: web ui for easy adhoc kicking off of resque commands (very useful in staging)
Scheduled (bdrb)
:backgroundrb:
  :ip: 127.0.0.1
  :port: 11006
  :environment: development

:schedules:
  :scheduled_worker:
    :purge_job_logs:
      :trigger_args: 0 */5 * * * *
Evidence of framework - scheduled_worker defined here, need meta worker (so it can be run)
Scheduled (bdrb)
class ScheduledWorker < BackgrounDRb::MetaWorker
  extend BdrbUtils::CronExtensions
  set_worker_name :scheduled_worker

  threaded_cron_job(:purge_job_logs) { JobLog.purge_expired! }
end
scheduler = MetaWorker. Defined 2 times - so it calls your code, so can call "any static method"
Scheduled (resque)
---
clear_logs:
  cron: "*/10 * * * *"
  class: PurgeJobLogs
  queue: job_worker
  description: Remove old logs
queue_name (so scheduler does not need to load worker into memory to enqueue)
cron is standard format (remove 'seconds') - commands
scheduler in separate process (can run when workers are stopped / changed) - minimal env
scheduler injects into queue (vs runs jobs) - so can adhoc inject via web
no ruby code for this
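Since the schedule is plain YAML with no Ruby behind it, a typo only shows up when a job fails to fire. A small sanity check can catch the two mistakes that are easy to make when porting from bdrb: a missing key, and a cron line that still carries the leading seconds field. This is a sketch only; `schedule_problems` is a hypothetical helper, not part of resque-scheduler.

```ruby
require 'yaml'

# Example schedule in the same shape as the slide.
SCHEDULE_YAML = <<-EOS
clear_logs:
  cron: "*/10 * * * *"
  class: PurgeJobLogs
  queue: job_worker
  description: Remove old logs
EOS

REQUIRED_KEYS = %w[cron class queue]

# Returns a list of human-readable problems; an empty list means the
# schedule passed these (deliberately simple) checks.
def schedule_problems(schedule)
  problems = []
  schedule.each do |name, entry|
    missing = REQUIRED_KEYS - entry.keys
    problems << "#{name}: missing #{missing.join(', ')}" unless missing.empty?
    fields = entry['cron'].to_s.split
    problems << "#{name}: cron has #{fields.size} fields, expected 5" unless fields.size == 5
  end
  problems
end

problems = schedule_problems(YAML.load(SCHEDULE_YAML))

# A bdrb-style trigger with a leading seconds field is flagged.
bdrb_style = {
  'purge' => { 'cron' => '0 */5 * * * *', 'class' => 'PurgeJobLogs', 'queue' => 'job_worker' }
}
bdrb_problems = schedule_problems(bdrb_style)
```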
3. Processes/Worker management
                     bdrb    resque
knows queues         yes     us, command, web
pids                 yes     us+
mem leak resistant   -       yes
workers/queue        1       <1 - ∞
pause workers        -       yes
Discover previous queues (not all) via 'resque list' / web
bdrb: creates 1 worker/queue (creates pid file + 1 pid for backgroundrb) - monit can't restart
we manage processes: 1+ workers/queue - 1+ queues/worker
pause/restart workers
worker list (resque)
primary:
  queues: background,mail
secondary:
  queues: mail,background
can have multiple workers running the same queues
can have multiple queues in 1 worker
worker pool can be:
  * generalized
  * response focused
  * schedule focused
  * changed at runtime
inverted priority list - prevents starvation
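The inverted priority list works because a Resque worker checks its queues in the order they are listed and takes from the first one that has work. The sketch below models that with plain arrays; `reserve` and the backlog contents are hypothetical, a simplified model rather than Resque's own code.

```ruby
require 'yaml'

WORKERS_YAML = <<-EOS
primary:
  queues: background,mail
secondary:
  queues: mail,background
EOS

# Pops the next job for a worker: the first non-empty queue in the
# worker's list wins, which is how list order acts as priority.
def reserve(queue_names, queues)
  queue_names.each do |name|
    job = queues[name].shift
    return [name, job] if job
  end
  nil
end

order = {}
YAML.load(WORKERS_YAML).each do |worker, config|
  order[worker] = config['queues'].split(',')
end

# Hypothetical backlog: both queues have work waiting.
queues = { 'background' => ['report'], 'mail' => ['welcome'] }

first_primary   = reserve(order['primary'], queues)
first_secondary = reserve(order['secondary'], queues)
idle            = reserve(order['primary'], queues)
```

With both queues backed up, each worker drains its own first choice, so neither queue waits on the other. When one queue is empty, its worker falls through to the other queue instead of sitting idle.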
4. Running Workers
namespace :resque do
  desc 'start all background resque daemons'
  task :start_daemons do
    mrake_start "resque_scheduler resque:scheduler"
    workers_config.each do |worker, config|
      mrake_start "resque_#{worker} resque:work QUEUE=#{config['queues']}"
    end
  end

  desc 'stop all background resque daemons'
  task :stop_daemons do
    sh "./script/monit_rake stop resque_scheduler"
    workers_config.each do |worker, config|
      sh "./script/monit_rake stop resque_#{worker} -s QUIT"
    end
  end

  def self.workers_config
    YAML.load(File.open(ENV['WORKER_YML'] || 'config/resque_workers.yml'))
  end

  def self.mrake_start(task)
    sh "nohup ./script/monit_rake start #{task} RAILS_ENV=#{ENV['RAILS_ENV']} >> log/daemons.log &"
  end
end
Deploying (cap)
namespace :resque do
  desc "Stop the resque daemon"
  task :stop, :roles => :resque do
    run "cd #{current_path} && RAILS_ENV=#{rails_env} WORKER_YML=#{resque_workers_yml} rake resque:stop_daemons; true"
  end

  desc "Start the resque daemon"
  task :start, :roles => :resque do
    run "cd #{current_path} && RAILS_ENV=#{rails_env} WORKER_YML=#{resque_workers_yml} rake resque:start_daemons"
  end
end
5. Monitoring Workers (monit.erb)
check process resque_scheduler
  with pidfile <%= @rails_root %>/tmp/pids/resque_scheduler.pid
  group resque
  alert [email protected]
  start program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake start resque_scheduler resque:scheduler'"
  stop program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake stop resque_scheduler'"

<% YAML.load(File.open(Rails.root+'/config/production/resque/resque_workers.yml')).each_pair do |worker, config| %>
check process resque_<%=worker%>
  with pidfile <%= @rails_root %>/tmp/pids/resque_<%=worker%>.pid
  group resque
  alert [email protected]
  start program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake start resque_<%=worker%> resque:work QUEUE=<%=config['queues']%>'"
  stop program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake stop resque_<%=worker%>'"

<% end %>
use template to generate monit file
Monitoring Rake Processes
#!/bin/sh
# wrapper to daemonize rake tasks: see also http://mmonit.com/wiki/Monit/FAQ#pidfile

usage() {
  echo "usage: ${0} [start|stop] name target [arguments]"
  echo "\tname is used to create or read the log and pid file names"
  echo "\tfor start: target and arguments are passed to rake"
  echo "\tfor stop: target and arguments are passed to kill (e.g.: -n 3)"
  exit 1
}
[ $# -lt 2 ] && usage

cmd=$1
name=$2
shift ; shift

pid_file=./tmp/pids/${name}.pid
log_file=./log/${name}.log
# ...
Monitoring Processes
case $cmd in
  start)
    if [ ${#} -eq 0 ] ; then
      echo -e "\nERROR: missing target\n"
      usage
    fi
    pid=`cat ${pid_file} 2> /dev/null`
    if [ -n "${pid}" ] ; then
      ps ${pid}
      if [ $? -eq 0 ] ; then
        echo "ensure process ${name} (pid: ${pid_file}) is not running"
        exit 1
      fi
    fi
    echo $$ > ${pid_file}
    exec 2>&1 rake $* 1>> ${log_file}
    ;;
  stop)
    pid=`cat ${pid_file} 2> /dev/null`
    [ -n "${pid}" ] && kill $* ${pid}
    rm -f ${pid_file}
    ;;
  *)
    usage
    ;;
esac
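The start branch refuses to launch when the pid file points at a live process, and carries on when the pid file is missing or stale. For anyone who would rather do that check from Ruby (for example inside a rake task), here is a rough equivalent. The helper names are hypothetical; `Process.kill(0, pid)` is the standard way to probe a pid without delivering a signal.

```ruby
require 'tmpdir'

# True if a process with this pid exists. Signal 0 performs the
# existence/permission check without delivering a signal.
def process_running?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false          # no such process
rescue Errno::EPERM
  true           # exists, but owned by another user
end

# Reads a pid file; returns nil if it is missing or empty.
def read_pid(pid_file)
  text = File.read(pid_file).strip
  text.empty? ? nil : Integer(text)
rescue Errno::ENOENT
  nil
end

# True when it is safe to start: no pid file, or the recorded pid is stale.
def safe_to_start?(pid_file)
  pid = read_pid(pid_file)
  pid.nil? || !process_running?(pid)
end

# Usage: check before and after writing our own (live) pid.
results = Dir.mktmpdir do |dir|
  pid_file = File.join(dir, 'resque_primary.pid')
  before = safe_to_start?(pid_file)          # no pid file yet
  File.write(pid_file, Process.pid.to_s)
  after = safe_to_start?(pid_file)           # our own pid is alive
  [before, after]
end
```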
Monitoring Web
6. Running Web
namespace :resque do
  task :setup => :environment

  desc 'kick off resque-web'
  task :web => :environment do
    $stdout.sync = true
    $stderr.sync = true
    puts `env RAILS_ENV=#{RAILS_ENV} resque-web #{RAILS_ROOT}/config/initializers/resque.rb`
  end
end
initializer
# this runs in sinatra and rails - so don't use Rails.env
rails_env = ENV['RAILS_ENV'] || 'development'
rails_root = ENV['RAILS_ROOT'] || File.join(File.dirname(__FILE__), '../..')

redis_config = YAML.load_file(rails_root + '/config/redis.yml')
Resque.redis = redis_config[rails_env]

require 'resque_scheduler'
require 'resque/plugins/meta'
require 'resque_mailer'

Resque.schedule = YAML.load_file(rails_root + '/config/resque_schedule.yml')
Resque::Mailer.excluded_environments = [:test, :cucumber]
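The initializer indexes config/redis.yml by environment name. The slides don't show that file, so the layout below is an assumption: one "host:port" string per environment, which is a form `Resque.redis=` accepts. The hosts and ports are made up, and `redis_location` is a hypothetical helper that fails loudly when an environment has no entry instead of passing nil along.

```ruby
require 'yaml'

# Assumed contents of config/redis.yml (hosts and ports are invented).
REDIS_YAML = <<-EOS
development: localhost:6379
test: localhost:6380
production: redis.internal:6379
EOS

# Picks the Redis location for an environment, raising when it is missing.
def redis_location(config, env)
  config.fetch(env) do
    raise ArgumentError, "no redis config for environment #{env.inspect}"
  end
end

config = YAML.load(REDIS_YAML)
location = redis_location(config, 'production')

missing_message =
  begin
    redis_location(config, 'staging')
    nil
  rescue ArgumentError => e
    e.message
  end
```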
5. Monitoring Work
                  bdrb                resque
ad-hoc queries    SQL                 redis query
did it run?       custom              resque-meta
did it fail?      hoptoad             yes
rerun             -                   yes
have id           yes                 resque-meta
queue health      sample controller   yes
Did the job run?
Resque assumes everything worked and only tells you about failures. That was not good enough for us.
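To answer "did it run?" you need a record of successes as well as failures. Resque job hooks are class methods whose names start with the hook type (for example `around_perform_...`), so one option is a mixin that logs every run. This is a sketch with assumptions: `RecordsRuns`, `RUN_LOG`, and `run_job` are hypothetical, and `run_job` is only a simplified stand-in for the worker invoking the hook.

```ruby
# Hypothetical in-memory run log; in production this would live somewhere
# durable (a database table or Redis).
RUN_LOG = []

# Mixin adding an around_perform-style hook that records every run,
# successful or not.
module RecordsRuns
  def around_perform_record_run(*args)
    started = Time.now
    yield
    RUN_LOG << { :job => name, :args => args, :status => :ok, :started => started }
  rescue StandardError => e
    RUN_LOG << { :job => name, :args => args, :status => :failed, :error => e.message }
    raise
  end
end

class PurgeJobLogs
  extend RecordsRuns

  def self.perform(days)
    raise ArgumentError, 'days must be positive' unless days > 0
    :purged
  end
end

# Stand-in for the worker: wraps perform in the hook.
def run_job(klass, *args)
  klass.around_perform_record_run(*args) { klass.perform(*args) }
end

run_job(PurgeJobLogs, 30)
begin
  run_job(PurgeJobLogs, 0)
rescue ArgumentError
  # the failure is recorded, then re-raised so normal failure handling still applies
end

statuses = RUN_LOG.map { |entry| entry[:status] }
```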
Pausing Workers
signal       what happens                    when to use
quit         wait for child & exit           graceful shutdown
term / int   immediately kill child & exit   shutdown now
usr1         immediately kill child          stale child
usr2         don't start any new jobs
cont         start to process new jobs
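These signals are easy to mix up at 3am, so it can help to name them by intent. A minimal sketch: the signal names come from the table, while the action names and helpers are hypothetical.

```ruby
# Signals a Resque worker responds to, keyed by intent.
WORKER_SIGNALS = {
  :shutdown_gracefully => 'QUIT',
  :shutdown_now        => 'TERM',
  :kill_stale_child    => 'USR1',
  :pause               => 'USR2',
  :resume              => 'CONT'
}

# Looks up the signal name for an action, rejecting unknown actions.
def signal_for(action)
  WORKER_SIGNALS.fetch(action) do
    raise ArgumentError, "unknown action #{action.inspect}"
  end
end

# Sends the signal for +action+ to the worker process +pid+.
def signal_worker(action, pid)
  Process.kill(signal_for(action), pid)
end
```

For example, `signal_worker(:pause, pid)` before a deploy and `signal_worker(:resume, pid)` afterwards, with the pid read from the worker's pid file.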
Testing Worker
                   bdrb       resque
testing queue      mid-easy   resque_unit
testing command    -          yes
all workers same   -          yes
interface only     -          yes
Resque::Mailer.excluded_environments = [:test, :cucumber]
Extending with Hooks
resque hooks
around_enqueue   no
after_enqueue    yes
before_perform   yes
around_perform   yes / no
after_perform    yes
all plugins want to extend enqueue - not compatible
need to be able to alter arguments (e.g.: add id for meta plugins)
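Without a hook that can alter arguments, one workaround is to add the id at the call site, in a single wrapper that every enqueue goes through. The sketch below shows the idea only. `enqueue_with_id` and `RecordingBackend` are hypothetical, and this is not how resque-meta itself is implemented.

```ruby
require 'securerandom'

# Hypothetical backend with the same enqueue(klass, *args) shape as Resque.
class RecordingBackend
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(klass, *args)
    @jobs << [klass, args]
  end
end

# Generates a job id and prepends it to the arguments before enqueuing,
# then returns the id so the caller can look the job up later.
def enqueue_with_id(backend, klass, *args)
  job_id = SecureRandom.hex(8)
  backend.enqueue(klass, job_id, *args)
  job_id
end

class PurgeJobLogs
  # The job receives the id as its first argument.
  def self.perform(job_id, days)
    "#{job_id}: purged logs older than #{days} days"
  end
end

backend = RecordingBackend.new
job_id = enqueue_with_id(backend, PurgeJobLogs, 30)
klass, args = backend.jobs.first
message = klass.perform(*args)
```

The cost is that every job's `perform` has to accept the extra leading argument, which is exactly the kind of coupling a proper enqueue hook would avoid.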
Conclusion
- Boss got no pages in the first month of implementation
- no memory leaks, great uptime (don't need monit...)
- Fast
- generalized workers increase throughput (nightly vs 1 hour)
- minimal custom code
- still some intimidation
- Eating flavor of the month
References
- coders: @kbrock and @wpeterson
- great company: PatientsLikeMe (encouraged sharing this)
- resque_mailer
- resque-scheduler
- resque-meta
- monit, hoptoad, rpm_contrib