HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir, StumbleUpon
TRANSCRIPT
Content Addressable Storages
for fun and profit.
Berk D. Demir
@bddemir
I break and fix things at @StumbleUpon.
Problem.
Serve lots of static assets with low latency
and high availability.
Understanding data.
A lot. Small. Frequent.
100 million. 19 kilobytes.
Updates.
BLOBs don’t change. They get replaced.
We want to keep all without duplicating.
Content Addressable
Store
Store immutable content and authoritatively address it with
a cryptographic hash.
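To make the idea concrete, here is a minimal in-memory sketch of a content addressable store. It is an illustration only, not the system described in the talk: the `ContentStore` class and its method names are hypothetical, and MD5 is used simply because the slides later name it as the key.

```python
import hashlib


class ContentStore:
    """Toy in-memory content addressable store (illustration only)."""

    def __init__(self):
        self._blobs = {}  # digest (bytes) -> content (bytes)

    def put(self, content: bytes) -> bytes:
        # The key is derived from the content itself, never chosen by the caller.
        key = hashlib.md5(content).digest()
        # Identical content hashes to the same key, so it is stored only once.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: bytes) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
k1 = store.put(b"logo v1")
k2 = store.put(b"logo v2")  # a "replacement" is just new content under a new key
k3 = store.put(b"logo v1")  # re-uploading old content adds nothing
```

Because a key never points at different content, stored entries are effectively immutable, which is what lets every version be kept without duplicates.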
We had ideas.
Very bad ideas.
Very bad. Shared Storage, i.e., NFS.
Bad ideas.
Bad. AWS S3, RS Cloud Files, ...
Distributed: AFS, Gluster
HDFS (Oh my!)
Bad ideas. Take 2
o_O
Write a distributed, fault tolerant, replicating, multi datacenter, fast, CAS for
BLOBs.
Reimplementing a lot of things is generally not a
good sign.
Reuse. Don’t reimplement.
HBase
Distributed, Fault tolerant, Replicating, Multi datacenter, Fast.
Immutable rows with compact keys, separated into different
column families based on their access patterns.
Row key: MD5, 16 bytes (SHA-1, 20 bytes)
m: Metadata, 9 bytes
d: BLOB, many bytes
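A quick back-of-the-envelope sketch of why the key size matters at this scale. It uses the figures from the slides (100 million items, 16-byte MD5 versus 20-byte SHA-1) and counts only the raw digest bytes; HBase's own per-cell overhead is deliberately ignored, and the helper name `raw_key_bytes` is hypothetical.

```python
import hashlib

ROWS = 100_000_000  # "100 million" from the slides

md5_len = hashlib.md5().digest_size    # digest length in bytes
sha1_len = hashlib.sha1().digest_size  # digest length in bytes


def raw_key_bytes(digest_len: int, rows: int = ROWS) -> int:
    # Counts only the digest bytes themselves, not HBase's per-cell overhead.
    return digest_len * rows


md5_total = raw_key_bytes(md5_len)
sha1_total = raw_key_bytes(sha1_len)
print(f"MD5 keys:   {md5_total / 1e9:.1f} GB")
print(f"SHA-1 keys: {sha1_total / 1e9:.1f} GB")
```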
One table to rule them all.
MAX_FILESIZE => 20G, VERSION => 1,
BLOCKCACHE => true, BLOOMFILTER => ROW
Pre-split into 512 regions at table creation time.
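The slides do not say how the split points were chosen. Since row keys are hash digests and therefore spread roughly uniformly over the key space, one plausible approach is evenly spaced key prefixes. The sketch below assumes exactly that; the function name `split_keys` and the 2-byte prefix are assumptions for illustration.

```python
REGIONS = 512


def split_keys(regions: int = REGIONS, prefix_bytes: int = 2) -> list:
    """Return regions - 1 evenly spaced key prefixes as big-endian bytes."""
    space = 256 ** prefix_bytes
    if space % regions != 0:
        raise ValueError("regions must divide the prefix space evenly")
    step = space // regions
    # N regions need N - 1 boundaries; the first region starts at the empty key.
    return [(i * step).to_bytes(prefix_bytes, "big") for i in range(1, regions)]


splits = split_keys()
print(len(splits), splits[0].hex(), splits[-1].hex())
```

With uniformly distributed keys, each of the 512 regions should receive a similar share of rows from the first write onward, instead of one region taking all the load until it splits.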
Scala, Finagle, asynchbase, Varnish
HTTP has a lot to offer.
Verbs
GET HEAD PUT DELETE
GET /KwIEec5utYGrKmzXYLgFzg HTTP/1.1
Host: b9.sustatic.com
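The 22-character path in the request above is the length you get from base64-encoding a 16-byte digest with the padding removed. The sketch below shows that mapping in both directions. The URL-safe alphabet and the stripped `=` padding are assumptions (the example path is consistent with them but does not prove them), and the helper names are hypothetical.

```python
import base64


def key_to_path(row_key: bytes) -> str:
    # URL-safe base64 with the "=" padding stripped: 16 bytes -> 22 characters.
    return "/" + base64.urlsafe_b64encode(row_key).rstrip(b"=").decode("ascii")


def path_to_key(path: str) -> bytes:
    token = path.lstrip("/")
    padding = "=" * (-len(token) % 4)  # restore the padding removed from the URL
    return base64.urlsafe_b64decode(token + padding)


row_key = path_to_key("/KwIEec5utYGrKmzXYLgFzg")
print(len(row_key), row_key.hex())
```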
Headers
Cache-Control: max-age=<1 year>
Last-Modified: <cell timestamp>
Content-MD5: <row key: base64>
Content-Disposition: attachment; filename=su.xpi
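Each of these headers can be derived from what is already stored in the row. A minimal sketch follows, under a few stated assumptions: "1 year" is taken as 365 days, the cell timestamp is in milliseconds since the epoch (the HBase default), and the function name `response_headers` is hypothetical.

```python
import base64
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumption: "1 year" means 365 days


def response_headers(row_key: bytes, cell_timestamp_ms: int, filename=None) -> dict:
    headers = {
        # Content under a given key never changes, so it can be cached for long.
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
        # Assumes millisecond timestamps; formatdate expects seconds.
        "Last-Modified": formatdate(cell_timestamp_ms / 1000, usegmt=True),
        # Content-MD5 is the base64 of the MD5 digest, which is the row key itself.
        "Content-MD5": base64.b64encode(row_key).decode("ascii"),
    }
    if filename is not None:
        headers["Content-Disposition"] = f"attachment; filename={filename}"
    return headers


headers = response_headers(bytes(16), 0, filename="su.xpi")
for name, value in headers.items():
    print(f"{name}: {value}")
```

Nothing here needs to be computed from the BLOB at request time: the digest is the key and the timestamp comes with the cell.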
HBase and HTTP are the perfect tools to build
simple, reliable, fast data services.
Get excited and build things!
Thanks.
Like the design of this slide deck?
Direct your positive feedback to Coda Hale (@coda)