patella railsconf 2012
DESCRIPTION
This talk will feature: memcache, resque, a bit of metaprogramming, a look at caching in the wild and code that fixes some usual problems, and a fairly epic SQL query with some nice Postgres features you should know about.
TRANSCRIPT
PATELLA
MEMOIZATION INTO MEMCACHED DONE IN RESQUE
JEFF DWYER
PATIENTSLIKEME
@JDWYAH
TODAY
Engineers will never be successful if we are the brake on innovation.
Technique for innovating safely
Learn a bit about meta-programming
TODAY
Set up the problem
Sketch the solution
Nitty Gritty Details
1) NIH PRESENTATION IN 4 WEEKS!
Integrate clinicaltrials.gov into our site
Search by trial type
Search by trial phase
Search by trial conditions mapped from MeSH to MedDRA
Search by trial facility locations…
• Location search…
WE HAVE A CHOICE
WHAT IS RIGHT
PostGIS spatial database extensions for PostgreSQL
MongoDB's built-in support for two-dimensional geospatial indexes
AND WHAT IS EASY
sqrt(pow(69.1 * (clinical_trial_locations.lat - 40.948073), 2) + pow(53.0 * (clinical_trial_locations.lng - -90.36871), 2)) AS distance
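The same arithmetic in plain Ruby, as a rough sketch. The constants come from the SQL above: 69.1 is roughly the miles in one degree of latitude, and 53.0 is roughly the miles in one degree of longitude near 40 degrees north. The method name and the sample trial site are made up for illustration.

```ruby
# Flat-earth distance in miles, mirroring the SQL expression on the slide.
MILES_PER_DEGREE_LAT = 69.1
MILES_PER_DEGREE_LNG = 53.0 # only reasonable near 40 degrees latitude

# Hypothetical helper name; treats the map as a flat grid around the origin.
def approx_distance_miles(lat1, lng1, lat2, lng2)
  dy = MILES_PER_DEGREE_LAT * (lat1 - lat2)
  dx = MILES_PER_DEGREE_LNG * (lng1 - lng2)
  Math.sqrt(dx**2 + dy**2)
end

origin = [40.948073, -90.36871]     # the point used in the SQL
trial_site = [41.948073, -90.36871] # hypothetical site one degree north

distance = approx_distance_miles(*trial_site, *origin)
```

Good enough for "trials near me". It gets worse the further the data strays from the latitude the longitude constant was picked for.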
CHOOSE THE EASY!
Who knows if location is even important?
Who knows if this project is even important?
MongoDB requires dev setup, automated staging setup, production setup, monitoring.
BUT, OH GOD THE HUMANITY
<query plan pic>
2) PATIENT LIKE ME SEARCH
PATIENT SEARCH RANKING
Very basic search
Plus very complex ordering
Not as many great solutions in this space
An N^2 similarity matrix at 100k patients is about 4 TB
And did I mention it’s N^2?
Postgres is an amazingly viable solution.
ELEGANT CODE…
LOVELY SQL…
BUT IT’S JUST THIS SIDE OF ‘REAL-TIME’
One second queries just don’t fly.
And oh yeah, 16 people hitting it at the same time would clobber the servers.
3) A FORWARD LOOKING TIME MACHINE
Maybe those were aberrations?
Crazy right?
AND HERE’S MY CEO PROMISING IT AT TED
STEPPING BACK
Conflict
• Relational data is most easily queried relationally.
• Relational queries don’t necessarily scale and stay in the millisecond range
• Denormalized queries & special solutions scale
• But take longer to implement
• (note) This isn’t just SQL, I’m talking about anything slow
We want to experiment/fail fast
• But we don’t want…
DON’T WANT TO LET THIS:
TURN INTO: :-C
TODAY
Set up the problem
Sketch the solution
Nitty Gritty Details
WHAT WE WANT
Trivially easy way for developers to declare that some methods are not to be run without adult supervision.
Consistent framework so that ops doesn’t need to be afraid of new, sometimes expensive experiments.
SOLUTION SPACE
Doing it right all the time
• Too slow and expensive
• Slows innovation
SOLUTION SPACE
Memoization
• Brilliant
• Functional Programming Nirvana
• No cache-key shenanigans
• But also no expiry…
• There’s just one thing…
• It only works in a single request
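A minimal sketch of plain in-process memoization, to show both the appeal and the catch. The class, method names, and trial data are hypothetical stand-ins, and each new object stands in for a new web request.

```ruby
# Hypothetical class; each instance plays the role of one request.
class TrialSearch
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # ||= runs the expensive call once per object and reuses the value after that.
  # Note that ||= will not memoize nil or false results.
  def nearby_trials
    @nearby_trials ||= run_expensive_query
  end

  private

  # Stand-in for the slow SQL query.
  def run_expensive_query
    @query_count += 1
    ["trial-1", "trial-2"]
  end
end

first_request = TrialSearch.new
first_request.nearby_trials
first_request.nearby_trials  # served from the instance variable

second_request = TrialSearch.new
second_request.nearby_trials # a fresh object pays the full cost again
```

No cache keys and no expiry to think about, but the value lives and dies with the object that computed it.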
SOLUTION SPACE
Memcached
• Great
• Simple to set up.
• Could be simpler. Handmade cache keys feel wrong.
• But it doesn’t solve our :-C problem.
• The first request still slams the server.
• So you do some cache warming thing…
• But this is a PITA again.
WHAT COULD MAKE THIS SIMPLER?
Remove one constraint.
A basic Rails.cache.fetch guarantees you a result
• But no performance guarantee
Flip that deal around.
• Guarantee performance
• Don’t guarantee a result
• It’s ok not to know the answer!
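The two deals side by side, as a sketch. A Hash stands in for memcached and an Array stands in for the Resque queue; every name here is hypothetical and chosen only to show the trade.

```ruby
ANSWERS = {}   # stand-in for memcached
TODO_LIST = [] # stand-in for a Resque queue

# Stand-in for something slow.
def slow_answer(n)
  n * n
end

# The usual deal: always returns a result, however long the first call takes.
def fetch_blocking(key, n)
  ANSWERS[key] ||= slow_answer(n)
end

# The flipped deal: always returns right away, sometimes with nil.
def fetch_nonblocking(key, n)
  return ANSWERS[key] if ANSWERS.key?(key)
  TODO_LIST << [key, n] # somebody else will do the work later
  nil
end

# What a background worker would do with the queue.
def work_off_todo_list
  until TODO_LIST.empty?
    key, n = TODO_LIST.shift
    ANSWERS[key] = slow_answer(n)
  end
end

before = fetch_nonblocking("square:12", 12) # nil, but instant
work_off_todo_list
after = fetch_nonblocking("square:12", 12)  # the stored answer
```

The caller has to cope with nil, typically by rendering a "still calculating" state.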
BUT IT NEEDS A NAME!
TECHNOLOGIES!
Memoization into Memcached with everything calculated in Resque.
TODAY
Set up the problem
Sketch the solution
Nitty Gritty Details
PATELLA DEVELOPER INTERFACE
SEND LATER
Super easy way to just do something later while in the same context.
Most workers are real boring.
A single worker suffices for many background jobs.
Makes testing/development easier by bypassing Resque in configuration.
ActiveRecord extension. Coordinates logging / monitoring.
SENDLATER
User.send_later :expensive, arg1, arg2
SENDLATER RESQUE WORKER
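The worker code from this slide is not in the transcript, so what follows is only a sketch of the idea: one generic worker in the usual Resque shape (a @queue name and a class-level perform) that can run any class method by name. The SendLater module body and the PENDING array are assumptions made for illustration. PENDING replaces Resque so the example runs on its own, and real code would call Resque.enqueue(SendLaterWorker, ...) at that point.

```ruby
require "json"

# One boring worker for everything: look up the class, call the method.
class SendLaterWorker
  @queue = :send_later

  def self.perform(class_name, method_name, *args)
    Object.const_get(class_name).public_send(method_name, *args)
  end
end

# Hypothetical mixin; not the gem's actual implementation.
module SendLater
  PENDING = [] # stand-in for Resque, handy for tests and development

  def send_later(method_name, *args)
    # Resque stores job arguments as JSON, so only simple values survive.
    PENDING << JSON.generate([name, method_name.to_s, *args])
  end

  # Runs every queued job and returns the results.
  def self.run_pending
    PENDING.shift(PENDING.size).map do |payload|
      SendLaterWorker.perform(*JSON.parse(payload))
    end
  end
end

class User
  extend SendLater

  def self.expensive(arg1, arg2)
    "computed #{arg1} and #{arg2}"
  end
end

User.send_later :expensive, 1, "two"
first_results = SendLater.run_pending
```

An instance-level version would also need to pass the record's id so the worker can reload the record before calling the method.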
MEMOIZATION
SendLater gets things calculated in Resque, but that’s step 1.
We still need:
Memoization.
Stored in memcached.
THIS IS NOT A GOOD SLIDE
PATELLA RESULT
THE METHOD
WITH PATELLA
THE REPLACEMENT
THE ONE THAT DOES THE WORK
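The code on these slides is missing from the transcript. The sketch below shows the general metaprogramming move the titles describe: keep the original method under a new name, and put a replacement in its place that only ever reads the cache or schedules work. The background_cached macro, the _without_cache suffix, and the Hash and Array stand-ins are all hypothetical, not Patella's actual interface.

```ruby
# Hypothetical macro for illustration only.
module BackgroundCached
  STORE = {}   # stand-in for memcached
  PENDING = [] # stand-in for the Resque queue

  def background_cached(method_name)
    original = instance_method(method_name)
    worker_name = "#{method_name}_without_cache"

    # The one that does the work: the original body under a new name.
    define_method(worker_name, original)

    # The replacement: answers from the cache or schedules the work,
    # and never computes inline.
    define_method(method_name) do |*args|
      # Simplified key; a real one would also identify the record.
      key = [self.class.name, method_name, args].inspect
      next STORE[key] if STORE.key?(key)
      PENDING << -> { STORE[key] = public_send(worker_name, *args) }
      nil
    end
  end

  # Plays the part of the background worker.
  def self.run_pending
    PENDING.shift(PENDING.size).each(&:call)
  end
end

class Report
  extend BackgroundCached

  def total(a, b)
    a + b # pretend this is slow
  end
  background_cached :total
end

report = Report.new
cold = report.total(2, 3) # nil: nothing cached yet, work is queued
BackgroundCached.run_pending
warm = report.total(2, 3) # the cached result
```

Callers keep calling total as before; the only visible change is that it may return nil while the job is still pending.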
MAYBE IT’S BETTER NOW?
DOG PILE
THE REPLACEMENT
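The slide's code is not in the transcript, so this is one common way to avoid a dog pile rather than the talk's confirmed approach: use an add-if-absent write as a marker, so that only the first caller to miss the cache enqueues a job. Memcached's add command has that store-only-if-absent behavior. The FakeCache class and every other name here are hypothetical stand-ins.

```ruby
# Stand-in for a memcached client.
class FakeCache
  def initialize
    @data = {}
  end

  # Like memcached's add: stores only when the key is absent,
  # and reports whether it stored.
  def add(key, value)
    return false if @data.key?(key)
    @data[key] = value
    true
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

DOGPILE_CACHE = FakeCache.new
DOGPILE_JOBS = [] # stand-in for the Resque queue

def lookup_or_enqueue(key)
  cached = DOGPILE_CACHE.read(key)
  return cached unless cached.nil?
  # Only the caller that wins the add gets to enqueue the job.
  DOGPILE_JOBS << key if DOGPILE_CACHE.add("#{key}:loading", true)
  nil
end

# Five impatient requests arrive before the job has finished.
5.times { lookup_or_enqueue("similar_patients:42") }
```

In a real setup the marker would get a short expiry so a crashed job cannot block recalculation forever.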
LONG ARGUMENTS
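Memcached keys are limited to 250 bytes and cannot contain whitespace, so keys built from method arguments break once the arguments get long. The slide's code is missing from the transcript; a sketch of one usual fix is to fall back to a digest of the arguments. The helper name and key layout below are assumptions.

```ruby
require "digest"
require "json"

MAX_KEY_BYTES = 250 # memcached's key length limit

# Hypothetical helper: readable key when it fits, digest when it does not.
def cache_key_for(class_name, method_name, args)
  encoded = JSON.generate(args)
  readable = "#{class_name}.#{method_name}(#{encoded})"
  return readable if readable.bytesize <= MAX_KEY_BYTES && readable !~ /\s/
  "#{class_name}.#{method_name}/#{Digest::SHA1.hexdigest(encoded)}"
end

short_key = cache_key_for("User", :expensive, [1, 2])
long_key = cache_key_for("User", :expensive, (1..500).to_a)
```

Short keys stay readable for debugging, and long ones shrink to a fixed size while staying stable for the same arguments.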
SOFT EXPIRATION
Memcached is great, but it doesn’t tell you when something expires.
Our strategy was to add a ‘soft_expiry’
This gets stored along with the result.
Then recalculate if soft_expiry < now()
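A sketch of the soft expiry idea: store the timestamp next to the result, keep serving the old result, and flag it as stale once the timestamp has passed. The helper names, the Hash stand-in for memcached, and the JSON field names are assumptions.

```ruby
require "json"

SOFT_CACHE = {} # stand-in for memcached

# Stores the result together with the time after which it counts as stale.
def write_with_soft_expiry(key, result, ttl_seconds, now: Time.now)
  SOFT_CACHE[key] = JSON.generate(
    "result" => result,
    "soft_expiry" => now.to_i + ttl_seconds
  )
end

# Returns [result, stale]. A stale result is still usable; the caller
# should enqueue a recalculation and keep serving the old value meanwhile.
def read_with_soft_expiry(key, now: Time.now)
  raw = SOFT_CACHE[key]
  return [nil, true] if raw.nil?
  entry = JSON.parse(raw)
  [entry["result"], entry["soft_expiry"] < now.to_i]
end

stored_at = Time.at(1_000)
write_with_soft_expiry("score:42", 0.87, 60, now: stored_at)
fresh = read_with_soft_expiry("score:42", now: stored_at + 30)
stale = read_with_soft_expiry("score:42", now: stored_at + 61)
```

The hard memcached expiry, if any, would be set well beyond the soft one so that a stale answer is still there to serve.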
ABJ
ALWAYS BE JSON
Beware of putting anything that isn’t JSON in memcached
You really don’t want to know
META IS MAGIC
REAL LIFE
PRETTY BORING
Except that it works.
Round 1: Major Pain Points
Round 2: Magic Scaling Sprinkles
Super alpha gem here:
https://github.com/kbrock/patella
Alternative: https://github.com/csquared/rack-worker
Very REST-ish, request based.
JOE @JOERODRIGUEZ
AMY @AMYNEWELL
KEENAN @KBROCK
WINFIELD @WPETERSON