Building clipboard.com: Architecture, Practices, and Lessons
Gary Flake, Founder
Outline
• Introduction
• Architecture & Practices
• Lessons
• Q&A
Introduction
Demo
Backstory
• Founded by me (Gary Flake)
• Took ~$1.4M in angel investment in April 2011
• 6+ full-time employees (almost all devs):
  • Mark Dawson, Greg Pascale, Ken Perkins, Tommy Montgomery, Steve Courtney
• Investors include AH, Index, FRC, SVA, FCO, Betaworks, Crunchfund, … individual angels
• 12+ month runway left
• Looking to hire one more engineer
Scenarios
Near term:
• Individual saving
• Micro-blogging
• Curating
• Collaboration
• Shared curating
• Link aggregation
Long term:
• Application and service backup
• Personal data visualization
• Web search
• Advertising
• Clip platform
Overlapping Clip Spaces
Diagram: three overlapping clip spaces labeled Home, Shared, and Public. Its regions are:
• Your private stuff
• Your stuff, selectively shared
• Others’ stuff, selectively shared with you
• Your public stuff
• Other people’s public stuff
• Others’ public stuff, explicitly shared with you
• Your public stuff, explicitly shared
Why Clipboard?
• Fidelity and functionality preserved
• Heterogeneous objects
• Simple overlapping spaces
• Shareable in several ways:
  • 1→1: @mention, email, permalinks
  • 1→N: @mentions, Facebook
  • 1→∞: publish, Twitter, embed
• Tagging and search
Architecture & Practices
Architectural Goals
• Development efficiency: development speed and cost are critical for startups.
• Scalability: we want to support millions of users without rewriting our whole backend.
• Simplicity: little, clear code; few moving parts; painless operations.
In combination, progress on each goal helps the others.
Architecture
Server layout (diagram):
• Riak: riak-01, riak-02, riak-03, riak-04, riak-05
• Web: web-01, web-02, web-03, each running Node.js + Nginx
• Cache: cache-01, cache-02, cache-03
• Redis: redis-01, redis-02
• Other: thumb-01, thumb-02, job-01, admin-01
Other Infrastructure Parts
• Rackspace API for spinning VMs up and down
• AWS for thumbnail storage and CDN
• A few third-party components:
  • Mixpanel and Google for analytics
  • SendGrid for email
  • Papertrail for log aggregation
  • Scout for monitoring
Client – Single Page App
• All clip views use the same HTML page
• Dependencies on jQuery and a few plugins
• No fancy frameworks (sort of MVVMC)
• Express, EJS, and Less on the backend help
• Almost no server-side composition
• Backend code is essentially an API
Nginx
• Lost faith in Apache long ago
• Nginx is wicked fast
• Handles static content (obviously)
• Can act as a micro-cache for static and dynamic content (FTW!)
App Logic – Node.js
You’ve heard the arguments, but for us…
• We like JavaScript
• 1 dev can develop features end-to-end
• JavaScript + JSON ≈ Buttah!
• Easy to make stateless, so easy to scale out
• Well-suited for Riak
Redis
Lightning-fast in-memory store mapping keys to abstract data types (ADTs):
• Atomic operations for mutations, so no locks and no write contention
• Excellent complement to Riak
• Uses: top lists, session tokens, notifications, batch queue, invite tokens, promises, mutex
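The mutex use in the list above might look roughly like this. It is a sketch under stated assumptions: so it runs without a Redis server, the hypothetical `FakeRedis` class mimics only the semantics of Redis's `SET key value NX PX <ms>` (set only if absent, with an expiry), and the `mutex:` key prefix and 5-second default TTL are illustrative choices, not Clipboard's.

```javascript
// Hypothetical in-memory stand-in for the few Redis semantics used here.
// A real deployment would talk to Redis through a client library instead.
class FakeRedis {
  constructor(now = () => Date.now()) {
    this.now = now;        // injectable clock, so expiry can be tested
    this.data = new Map(); // key -> { value, expiresAt }
  }

  // Mimics SET key value NX PX ttlMs: succeeds only if the key is absent
  // or expired. Redis executes this as one atomic command.
  setNxPx(key, value, ttlMs) {
    const entry = this.data.get(key);
    if (entry && entry.expiresAt > this.now()) return false;
    this.data.set(key, { value, expiresAt: this.now() + ttlMs });
    return true;
  }

  get(key) {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt <= this.now()) return null;
    return entry.value;
  }

  del(key) {
    return this.data.delete(key);
  }
}

// Acquire the mutex `name` for the holder identified by `token`.
// The TTL keeps a crashed holder from blocking everyone forever.
function acquireMutex(redis, name, token, ttlMs = 5000) {
  return redis.setNxPx(`mutex:${name}`, token, ttlMs);
}

// Release only if we still hold the lock. Against real Redis this
// check-and-delete should be a single atomic Lua script, not two commands.
function releaseMutex(redis, name, token) {
  if (redis.get(`mutex:${name}`) !== token) return false;
  return redis.del(`mutex:${name}`);
}
```

The per-holder `token` keeps a process whose lock has already expired from deleting a lock that someone else has since acquired.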
Memcached
• Simple cache invalidation for K/V reads
• We make no attempt to do proper cache invalidation on the search cache
• Instead, we embrace eventual consistency as a way of life
• Translation: objects have type-specific TTLs that range from seconds to a few minutes
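A minimal sketch of type-specific TTLs with read-through caching. It assumes a `Map` in place of memcached and made-up TTL values within the "seconds to a few minutes" range above. Direct K/V entries can still be invalidated explicitly; search results simply age out.

```javascript
// Illustrative TTLs per object type, in seconds. These numbers are
// assumptions matching "seconds to a few minutes", not real values.
const TTL_SECONDS = {
  search: 10, // search-cache results: short-lived, never invalidated
  clip: 60,
  user: 300,
};

// A Map stands in for memcached so the sketch runs on its own.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.store = new Map(); // "type:key" -> { value, expiresAt }
  }

  set(type, key, value) {
    const ttl = TTL_SECONDS[type];
    if (ttl === undefined) throw new Error(`no TTL configured for type ${type}`);
    this.store.set(`${type}:${key}`, { value, expiresAt: this.now() + ttl * 1000 });
  }

  // Returns the cached value, or undefined once the TTL has lapsed.
  // Stale entries just age out; nothing invalidates them early.
  get(type, key) {
    const entry = this.store.get(`${type}:${key}`);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.store.delete(`${type}:${key}`);
      return undefined;
    }
    return entry.value;
  }

  // Simple K/V invalidation, used for direct reads (not for search).
  invalidate(type, key) {
    this.store.delete(`${type}:${key}`);
  }
}

// Read-through helper: on a miss, load from the backing store and cache it.
function cachedRead(cache, type, key, load) {
  const hit = cache.get(type, key);
  if (hit !== undefined) return hit;
  const value = load(key);
  cache.set(type, key, value);
  return value;
}
```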
Operations
• Hosted on VMs at Rackspace
• Staging and test clusters identical to production; dev on Vagrant
• Puppet for managing configurations
• Build and deployment done with homegrown tools:
  • Devdo: handles stuff on the dev box side
  • Manage: handles stuff on the cloud side
Riak
An awesome noSQL data store:
• Super easy to scale up AND down
• Fault tolerant: no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
Riak – Basics
• Data in Riak is grouped into buckets (effectively namespaces)
• Basic operations: get, save, delete, search, map, reduce
• Eventual consistency is managed through the N, R, and W bucket parameters: N is the number of replicas of each object, R the number of replicas that must respond to a read, and W the number that must acknowledge a write
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js Node library by Francisco Treacy
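The N, R, and W parameters interact through simple arithmetic. In classic Dynamo-style reasoning, a read that waits for R replicas and a write acknowledged by W replicas must share at least one of the N replicas when R + W > N. The helper names below are hypothetical. Riak's `quorum` setting means a majority of N, floor(N/2) + 1, so with N = 3 it is 2. Riak's sloppy quorums may still use fallback nodes during failures, which weakens the overlap guarantee and is part of why the store is only eventually consistent.

```javascript
// For N replicas, a read waiting on R replies and a write waiting on W
// acknowledgements are guaranteed to overlap on some replica iff R + W > N.
function quorumsOverlap(n, r, w) {
  if (![n, r, w].every(Number.isInteger) || r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("need integers with 1 <= r, w <= n");
  }
  return r + w > n;
}

// A majority of N replicas: floor(N / 2) + 1.
function majority(n) {
  return Math.floor(n / 2) + 1;
}
```

With N = 3 and R = W = majority(3) = 2, the overlap condition holds (2 + 2 > 3); with R = W = 1 it does not, so a read can miss the latest write.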
Data Model – Clips
Clip fields (diagram): annotation, title, author, ctime, tags, domain, mentions
Data Model – Clips
Clips are the gateway to all of our data. Diagram:
• Clip, key: abc
• Blob, key: abc: the clipped page (<html> … </html>)
• Comment Cache, key: abc: comments on clip ‘abc’ (“F1rst”, “Nice clip yo!”, “Saw this on Reddit…”)
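The clip, blob, and comment-cache objects in the diagram could be assembled as below. Only the clip field names come from the slides; the bucket names `clips`, `blobs`, and `comments`, the `{ bucket, key, value }` shape, and the value types are assumptions for illustration. The point is that all three objects share the clip's key, so each can be fetched directly without a search.

```javascript
// Hypothetical record shapes for one clip. Field names follow the slide;
// bucket names and value types are illustrative assumptions.
function makeClipRecords(key, fields) {
  const {
    title, author, annotation, domain, html,
    tags = [], mentions = [], ctime = Date.now(),
  } = fields;

  // Small, searchable metadata document.
  const clip = {
    bucket: "clips",
    key,
    value: { title, author, annotation, ctime, tags, domain, mentions },
  };

  // The clipped HTML lives in its own object under the same key, so
  // listing clips never has to load full blobs.
  const blob = { bucket: "blobs", key, value: { html } };

  // A cache of the clip's comments, also under the same key.
  const commentCache = { bucket: "comments", key, value: { comments: [] } };

  return { clip, blob, commentCache };
}
```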
Other Buckets
• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
Riak Search
• Gets things out of Riak by something other than the primary key.
• You specify a schema (the types of the fields within a JSON object).
• Works great, but with one big gotcha:
  – The index uses term-based partitioning instead of document-based partitioning
  – Implication: joins + sort + pagination sucks
  – We know how to work around this
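Why term-based partitioning hurts joins, sorting, and pagination shows up in a small in-memory model. With document-based partitioning, each shard could join, sort, and return only its own top page; with term-based partitioning, each term's full posting list lives on one partition, so the coordinator must fetch every list, intersect, and sort before it can discard anything. The index shape and `searchAndPage` are hypothetical; this is neither Riak Search's internal implementation nor the workaround the slide mentions.

```javascript
// Hypothetical term-partitioned index: term -> full posting list of
// { id, ctime }, each list living on a single partition.
function searchAndPage(index, termA, termB, page, pageSize) {
  // Each posting list must be fetched whole from its own partition.
  const a = index.get(termA) || [];
  const b = index.get(termB) || [];
  const idsInB = new Set(b.map((p) => p.id));

  // Join: intersect the two full lists on the coordinating node.
  const joined = a.filter((p) => idsInB.has(p.id));

  // Sort newest first, then paginate. Only now can results be dropped,
  // so every page pays for the full fetch, join, and sort.
  joined.sort((x, y) => y.ctime - x.ctime);
  const start = page * pageSize;
  return {
    results: joined.slice(start, start + pageSize).map((p) => p.id),
    postingsFetched: a.length + b.length,
  };
}
```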
Riak Search – Querying
• Query syntax based on Lucene
• Basic query: text:funny
• Compound query: login:greg OR (login:gary AND tags:riak)
• Range query: ctime:[98685879630026 TO 98686484430026]
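Hypothetical helpers for composing the query strings above; they are not part of riak-js or Riak. The range in the example spans 604,800,000 units. If `ctime` is in milliseconds, which the slide does not state, that is exactly one week, so the same query can be produced from an end time.

```javascript
// Hypothetical builders for Lucene-style query strings.
const term = (field, value) => `${field}:${value}`;
const and = (...qs) => `(${qs.join(" AND ")})`;
const or = (...qs) => qs.join(" OR ");
const range = (field, lo, hi) => `${field}:[${lo} TO ${hi}]`;

// One week in milliseconds (assumes ctime is a millisecond count).
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Range query covering the week that ends at `endMs`.
function lastWeek(field, endMs) {
  return range(field, endMs - WEEK_MS, endMs);
}
```

For example, `or(term("login", "greg"), and(term("login", "gary"), term("tags", "riak")))` yields the compound query shown above.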
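Clipboard App Flow (Client, node.js, Riak)
• Client: goes to clipboard.com/home
• node.js: searches the clips bucket with query = login:greg
• Riak returns the top 20 results to node.js, which returns the top 20 results to the client
• Client: starts rendering
• For each clip: the client makes an API request for the blob; node.js does a GET from the blobs bucket and returns the blob to the client; the client renders the blob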
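The flow can be approximated in one async function. This collapses client and server into a single function for brevity, and `searchClips`, `getBlob`, and `render` are hypothetical stand-ins for the node.js search call, the per-clip blob API request, and client-side rendering.

```javascript
// Hypothetical dependencies:
//   searchClips(query, limit) -> Promise of clip objects with a `key`
//   getBlob(key)              -> Promise of the blob object
//   render(key, obj)          -> draws clip metadata or a blob
async function loadHome(login, { searchClips, getBlob, render }) {
  // One search against the clips bucket for the top 20 results.
  const clips = await searchClips(`login:${login}`, 20);

  // Start rendering clip metadata right away...
  clips.forEach((clip) => render(clip.key, clip));

  // ...then fetch each blob with its own request and render it as it
  // arrives, so one slow blob does not hold up the others.
  await Promise.all(
    clips.map(async (clip) => render(clip.key, await getBlob(clip.key)))
  );
  return clips.length;
}
```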
Lessons
Web development doesn’t suck
We are all indebted to Google / Chrome for making web development better and more rewarding.
• “Edit, build, test” is the new REPL
• Good debugging within the client
• Fast runtime makes new apps possible
Bet on modules, not frameworks
• jQuery plugins are great working examples of modules that you can take à la carte.
• Frameworks are trickier because they permeate your entire code base.
• You can pick the wrong module and recover, but recovery from choosing the wrong framework is much harder.
• My advice: just use good code hygiene.
Open source and SaaS are critical
• Open source is like Lego for developers
• Paid SaaS is great too. I’ll happily pay for services when they:
  • Are better than what we could build
  • Are not part of our core offering
  • Free up a dev to do something that only we can do in house
Browsers and jQuery have bugs
We spent a lot of time tracking down bugs in surprising places:
• Chrome Google Apps break bookmarklets
• Safari layout can be corrupted by reading computed CSS
• jQuery mishandles position:relative on body
• IE8 and IE9: don’t even get me started
Node.js is ready for prime time
• This wasn’t the case a year ago.
• Callback style takes time to get used to.
• Common coding patterns are still ugly.
• The result is pretty phenomenal: a backend that is effectively non-blocking.
• It’s really great to work with the same JSON / JS objects on all 3 tiers.
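One of the "still ugly" patterns is running several error-first callbacks in parallel and continuing once all of them finish. A hand-rolled sketch of that bookkeeping follows; the `parallel` name is ours, and in practice modules such as `async` (with `async.parallel`) package the same idea.

```javascript
// Run error-first callback tasks in parallel. Calls done(err) on the first
// error, or done(null, results) after all succeed; results keep task order.
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let finished = false;
  if (pending === 0) return done(null, results);

  tasks.forEach((task, i) => {
    task((err, value) => {
      if (finished) return; // ignore late callbacks after an error
      if (err) {
        finished = true;
        return done(err);
      }
      results[i] = value;
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}
```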
Redis & Riak are yin & yang
Redis:
• Abstract data types
• In RAM, single node
• Fast and atomic operations
• Can easily handle write contention
Riak:
• Documents
• On disk, many nodes
• Slow and eventually consistent
• Have to think about write contention
Think in terms of write contention
• noSQL patterns will have you writing a lot of independent objects.
• Simple contention can be managed with a mutex, keeping code simple.
• Complex contention can be batched into a work queue.
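Batching complex contention into a work queue can be sketched as follows. Writers only append small increment events; a single worker later merges them and performs one read-modify-write per key. The `CountBatcher` class, the array used as the queue, and the `Map` standing in for a counts store are assumptions; the Redis uses listed earlier include a batch queue, which could play the queue's role.

```javascript
// Writers never touch the shared document; they only append events.
// An array stands in for the queue (hypothetically a Redis list).
class CountBatcher {
  constructor() {
    this.queue = [];
  }

  // Cheap and contention-free: just enqueue.
  increment(key, by = 1) {
    this.queue.push({ key, by });
  }

  // Run by a single worker: drain the queue, merge events per key, and
  // do one read-modify-write per key. Because only this worker writes
  // the counts, those writes never contend with each other.
  flush(store) {
    const events = this.queue.splice(0, this.queue.length);
    const totals = new Map();
    for (const { key, by } of events) {
      totals.set(key, (totals.get(key) || 0) + by);
    }
    for (const [key, delta] of totals) {
      store.set(key, (store.get(key) || 0) + delta);
    }
    return totals.size; // number of documents written
  }
}
```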
Cache, cache, cache
• There is more than one way to cache.
• Don’t get too clever (embrace noSQL and don’t worry about cache invalidation).
• Cache in multiple places and on multiple time scales.
Balance agility with process
• Devs do testing and deploying
• Code reviews are author-initiated
• Many small features are developed on branches off master (no “dev” branch)
• Bug fixes are done right on master
Recap
• We don’t have big data… yet. But we think we can handle it.
• Our stack, architecture, and practices allow us to move fast while also designing for scalability.
• It’s also a really fun stack to work on.
Questions?