HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir, StumbleUpon


Upload: cloudera-inc

Post on 20-Aug-2015



Content Addressable Storages
for fun and profit.


Berk D. Demir
@bddemir


I break and fix things at @StumbleUpon.


Problem.


Serve lots of static assets with low latency and high availability.


Understanding data.


A lot. 100 million.
Small. 19 kilobytes.
Frequent. Updates.


BLOBs don’t change. They get replaced.
We want to keep all without duplicating.


Content Addressable Store


Store immutable content and authoritatively address it with a cryptographic hash.
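
The definition above can be made concrete with a toy, in-memory store. This is only a sketch to illustrate the idea, not the service described in the talk: the class name `ContentAddressableStore` and its methods are hypothetical, and MD5 is used because a later slide lists it as the 16-byte row key.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: the key of a blob is the MD5 digest of its bytes."""

    def __init__(self):
        self._blobs = {}  # digest (16 bytes) -> content

    def put(self, content: bytes) -> bytes:
        key = hashlib.md5(content).digest()
        # Identical content always maps to the same key, so a repeated
        # put stores nothing new; existing entries are never overwritten.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: bytes) -> bytes:
        content = self._blobs[key]
        # The address doubles as an integrity check on read.
        if hashlib.md5(content).digest() != key:
            raise ValueError("stored content does not match its address")
        return content

    def __len__(self):
        return len(self._blobs)


store = ContentAddressableStore()
k1 = store.put(b"body { color: red; }")
k2 = store.put(b"body { color: red; }")   # same content, same key
k3 = store.put(b"body { color: blue; }")  # a "replacement" gets a new key
```

Because a changed asset gets a new address instead of overwriting the old one, every version is kept and identical uploads are stored once.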


We had ideas.


Very bad ideas.


Very bad.
Shared storage, i.e., NFS.


Bad ideas.


Bad.
AWS S3, RS Cloud Files, ...
Distributed: AFS, Gluster
HDFS (Oh my!)


Bad ideas.
Take 2


o_O

Write a distributed, fault tolerant, replicating, multi datacenter, fast CAS for BLOBs.


Reimplementing a lot of things is generally not a good sign.


Reuse.
Don’t reimplement.


HBase
Distributed,
Fault tolerant,
Replicating,
Multi datacenter,
Fast.


Immutable rows with compact keys, separated into different column families based on their access patterns.


One table to rule them all.

Row key: MD5, 16 bytes (SHA-1, 20 bytes)
Column family m: Metadata, 9 bytes
Column family d: BLOB, many bytes

MAX_FILESIZE => 20G, VERSION => 1,
BLOCKCACHE => true, BLOOMFILTER => ROW

Pre-split into 512 regions at table creation time.
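
The slide does not say how the 512 split points were chosen. One plausible approach, sketched here purely as an assumption, is to cut the 16-byte key space into equal ranges, since MD5 digests are spread roughly uniformly over it. The helper names `split_points` and `region_index` are hypothetical.

```python
import bisect

KEY_BYTES = 16      # MD5 digest length, from the slide
NUM_REGIONS = 512   # from the slide


def split_points(num_regions: int = NUM_REGIONS, key_bytes: int = KEY_BYTES):
    """Return num_regions - 1 boundaries that cut the key space evenly."""
    key_space = 1 << (8 * key_bytes)
    return [
        (i * key_space // num_regions).to_bytes(key_bytes, "big")
        for i in range(1, num_regions)
    ]


def region_index(row_key: bytes, splits) -> int:
    """Index of the region holding row_key; each region is [start, end)."""
    return bisect.bisect_right(splits, row_key)


splits = split_points()
```

With 512 being a power of two, each boundary is determined by the first nine bits of the key, so a region server can be picked from the leading two bytes of the digest alone.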


Scala,
Finagle,
asynchbase,
Varnish


HTTP has a lot to offer.


Verbs
GET HEAD PUT DELETE

GET /KwIEec5utYGrKmzXYLgFzg HTTP/1.1
Host: b9.sustatic.com
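
The 22-character path in the request above is the length a 16-byte key takes in base64 once the `==` padding is dropped. The sketch below shows one way to map between the two; the URL-safe alphabet and the function names `key_to_path` and `path_to_key` are assumptions, not details given in the talk.

```python
import base64


def key_to_path(row_key: bytes) -> str:
    """16-byte digest -> 22-character URL path segment (padding dropped)."""
    return "/" + base64.urlsafe_b64encode(row_key).rstrip(b"=").decode("ascii")


def path_to_key(path: str) -> bytes:
    """Inverse of key_to_path; rejects anything that is not a 16-byte key."""
    segment = path.lstrip("/")
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    row_key = base64.urlsafe_b64decode(padded)
    if len(row_key) != 16:
        raise ValueError("not an MD5-sized key")
    return row_key


key = path_to_key("/KwIEec5utYGrKmzXYLgFzg")
```

The URL is therefore just another spelling of the row key, so a GET needs no lookup table between paths and rows.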


Headers
Cache-Control: max-age=<1 year>
Last-Modified: <cell timestamp>
Content-MD5: <row key: base64>
Content-Disposition: attachment; filename=su.xpi
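
As a rough illustration of how these headers could be derived from a stored row, here is a minimal sketch. It assumes the cell timestamp is in milliseconds since the epoch (the HBase default) and that one year means 365 days; the function name `response_headers` and its parameters are hypothetical.

```python
import base64
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumption: "1 year" = 365 days


def response_headers(row_key: bytes, cell_timestamp_ms: int, filename=None):
    """Build the HTTP response headers listed on the slide for one blob."""
    headers = {
        # Content never changes under a given key, so it can be cached long.
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
        # HTTP dates are in GMT with one-second resolution.
        "Last-Modified": formatdate(cell_timestamp_ms / 1000, usegmt=True),
        # Content-MD5 carries the standard (padded) base64 of the digest.
        "Content-MD5": base64.b64encode(row_key).decode("ascii"),
    }
    if filename is not None:
        headers["Content-Disposition"] = f"attachment; filename={filename}"
    return headers
```

Since the row key already is the MD5 of the body, the `Content-MD5` value comes for free, and a client can verify the download against the URL it requested.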


HBase and HTTP are the perfect tools to build simple, reliable, fast data services.


Get excited and build things!


Thanks.


Like the design of this slide deck?

Direct your positive feedback to Coda Hale (@coda)