NoSql NOW! 2013: Delivering big content at NBC News with RavenDB

Upload: john-bennett

Post on 27-Jan-2015


DESCRIPTION

RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss the experience of developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US. We'll cover:

- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high availability.
- Deploying with no downtime.
- Handling huge traffic spikes.

TRANSCRIPT

Page 1: Delivering big content at NBC News with RavenDB

NoSql NOW! 2013

Delivering big content at NBC News with RavenDB

Page 2: Delivering big content at NBC News with RavenDB
Page 3: Delivering big content at NBC News with RavenDB

A quick tour

Page 4: Delivering big content at NBC News with RavenDB

•  Schema-less document database with RESTful API.

•  Fully ACID and all writes saved to disk (ESENT).

•  Indexing/queries executed with Lucene.NET.

•  Easily extended with custom logic using “bundles”.

•  Management UI provided in Silverlight.

•  Host as Windows Service, IIS app, or embedded in your app.

Raven server

Page 5: Delivering big content at NBC News with RavenDB

•  .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.

•  Wraps HTTP API.

•  Provides client-side caching, change notification, LINQ querying.

•  Easily extended with many, many hooks into almost all operations.

Raven client

Page 6: Delivering big content at NBC News with RavenDB

•  Open source: http://github.com/ravendb/ravendb

•  License is AGPL (free) or commercial (paid).

•  Exception: Your project can use any OSI-approved license and still use Raven for free.

•  Commercial licenses based on max parallelism and RAM.

•  Windows clustering support and storage compression/encryption available with Enterprise license only.

Raven licensing

Page 7: Delivering big content at NBC News with RavenDB

Demo

Page 8: Delivering big content at NBC News with RavenDB

Why RavenDB?

Page 9: Delivering big content at NBC News with RavenDB

•  Includes nbcnews.com, today.com and more.

•  1.2 billion pageviews/month.

•  140 million video streams/month.

•  58 million unique users/month.

•  Traffic spikes up to 100x normal when big news events happen.

NBC News Digital network

Page 10: Delivering big content at NBC News with RavenDB

•  Very fast page load required

•  “Instant” publish time required

•  6 to 8 code deployments each day

•  High availability: zero* downtime allowed

One of the largest US news sites

Page 11: Delivering big content at NBC News with RavenDB

High availability is when the answer to: “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.

Page 12: Delivering big content at NBC News with RavenDB

Credit: Mitch Canter @studionashvegas http://twitpic.com/z13bw

Page 13: Delivering big content at NBC News with RavenDB

•  Rolling deployments and rollbacks.

•  Apps and services decoupled physically and temporally.

•  Designed for both auto-failover/recovery and manual reconfiguration by ops.

•  Seamless scale out by adding instances of any process.

•  And more…

Some prerequisites for HA

Page 14: Delivering big content at NBC News with RavenDB

•  Data schema can evolve rapidly

•  Apps shouldn’t know where data is

•  Apps should talk to the closest data replica

•  Apps should automatically find a new replica if the closest becomes unavailable

•  Ops can add/remove replicas quickly and easily, without affecting any running apps

HA data: a private data cloud

Page 15: Delivering big content at NBC News with RavenDB

•  Schema-less document database allows rapid change.

•  Fully ACID model fit business needs.

•  Strong replication functionality supported HA needs.

•  Easily customizable on both client and server.

•  Easily deployed and managed.

•  First class .NET client.

Why we chose RavenDB

Page 16: Delivering big content at NBC News with RavenDB

•  Raven used behind:

•  NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, XBox, Roku.

•  Growing number of sections of nbcnews.com and today.com.

•  Raven usage stats:

•  ~10 million docs, +1000s of new docs/day.

•  10s of writes/sec.

•  100s of reads/sec (after 3 layers of caching).

Current* state of Raven usage

Page 17: Delivering big content at NBC News with RavenDB

The details

Page 18: Delivering big content at NBC News with RavenDB

•  Each doc cached as long as memory available.

•  Requests include If-Modified-Since header.

•  304 Not Modified response saves bandwidth.

•  Aggressive caching avoids the round-trip. Tunable by ops at runtime (custom).

Client-side caching
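
The caching behavior on this slide can be sketched as a small simulation. This is not the RavenDB client: `FakeServer`, `CachingClient`, the integer timestamps, and the injected `clock` are all hypothetical names invented for illustration. It only shows how a conditional request turns into a 304 with no body, and how an aggressive-caching window skips the round-trip entirely.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int               # 200 (full body) or 304 (not modified)
    body: object = None
    last_modified: int = 0


class FakeServer:
    """Hypothetical stand-in for a document server; timestamps are plain integers."""

    def __init__(self):
        self.docs = {}        # doc id -> (body, last_modified)
        self.tick = 0
        self.bytes_sent = 0

    def put(self, doc_id, body):
        self.tick += 1
        self.docs[doc_id] = (body, self.tick)

    def get(self, doc_id, if_modified_since=None):
        body, modified = self.docs[doc_id]
        if if_modified_since is not None and modified <= if_modified_since:
            return Response(304)              # nothing but headers on the wire
        self.bytes_sent += len(str(body))
        return Response(200, body, modified)


class CachingClient:
    def __init__(self, server, clock, aggressive_seconds=0):
        self.server = server
        self.clock = clock                    # callable returning the current time
        self.aggressive_seconds = aggressive_seconds
        self.cache = {}                       # doc id -> dict(body, modified, fresh_until)
        self.round_trips = 0

    def load(self, doc_id):
        entry = self.cache.get(doc_id)
        now = self.clock()
        if entry and now < entry["fresh_until"]:
            return entry["body"]              # aggressive caching: no request at all
        self.round_trips += 1
        since = entry["modified"] if entry else None
        resp = self.server.get(doc_id, if_modified_since=since)
        if resp.status == 304:
            entry["fresh_until"] = now + self.aggressive_seconds
            return entry["body"]
        self.cache[doc_id] = {
            "body": resp.body,
            "modified": resp.last_modified,
            "fresh_until": now + self.aggressive_seconds,
        }
        return resp.body
```

Making `aggressive_seconds` a plain attribute is what would let ops tune it at runtime; how the real deployment exposed that setting is not described in the slides.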

Page 19: Delivering big content at NBC News with RavenDB

•  You define sharding strategy – a method.

•  Raven manages storing each doc to the correct instance and fanning/merging queries.

•  No auto-rebalancing of shards if you change number of instances.

Raven sharding
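
A minimal sketch of the idea, assuming an in-memory store: the sharding strategy is just a function from a document to a shard name, writes are routed by it, and queries fan out to every shard and merge the results. `ShardedStore`, `by_site`, and the shard names are hypothetical and do not correspond to RavenDB's own sharding API.

```python
class ShardedStore:
    def __init__(self, shard_names, strategy):
        self.shards = {name: {} for name in shard_names}   # shard name -> {doc id: doc}
        self.strategy = strategy                           # function: doc -> shard name

    def store(self, doc_id, doc):
        # The strategy alone decides which instance holds the document.
        self.shards[self.strategy(doc)][doc_id] = doc

    def query(self, predicate, sort_key):
        hits = []
        for docs in self.shards.values():                  # fan out to every shard
            hits.extend(d for d in docs.values() if predicate(d))
        return sorted(hits, key=sort_key)                  # merge into one result list


def by_site(doc):
    """Hypothetical strategy: one shard per site family."""
    return "shard-today" if doc["site"] == "today.com" else "shard-news"
```

Because documents stay wherever the strategy put them at write time, changing the strategy or the shard list later leaves existing documents in place, which is the "no auto-rebalancing" caveat above.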

Page 20: Delivering big content at NBC News with RavenDB

•  All queries are performed against indexes.

•  Indexes can be predefined or auto-created.

•  Indexing/queries are executed in Lucene.NET.

•  Fielded.

•  Full text with built-in or custom analyzers.

•  Geo-spatial.

•  Map-reduce.

•  Result transformers can load other docs.

•  Query with LINQ or Lucene syntax.

•  Indexes may be stale. Can force wait for non-stale results. (Danger! Primarily for unit tests.)

•  Projections occur on server, reducing data on the wire.

•  Super-cool stuff: eval patching, index scripts.

Raven indexing and querying
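
To make the map-reduce bullet concrete, here is a rough sketch of what such an index computes, written in plain Python rather than as a Raven index definition. The `section` field, the function names, and the batch size are assumptions for illustration. The key property shown is that the reduce output has the same shape as the map output, so partial results can be reduced again.

```python
from collections import defaultdict


def map_article(doc):
    # Map: emit one entry per document.
    yield {"section": doc["section"], "count": 1}


def reduce_counts(results):
    # Reduce: group by section and sum counts; output shape matches map output.
    totals = defaultdict(int)
    for r in results:
        totals[r["section"]] += r["count"]
    return [{"section": s, "count": c} for s, c in sorted(totals.items())]


def build_index(docs, batch_size=2):
    # Reduce each batch, then reduce the partial results again.
    partial = []
    for i in range(0, len(docs), batch_size):
        mapped = [m for d in docs[i:i + batch_size] for m in map_article(d)]
        partial.extend(reduce_counts(mapped))
    return reduce_counts(partial)
```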

Page 21: Delivering big content at NBC News with RavenDB

•  Need indexes up to date before letting a client talk to a replica.

•  Indexes are created by the client app:

•  Static: CreateIndexes() at startup scans assemblies for index classes.

•  Dynamic: when client issues a query.

Indexing catch-22

Page 22: Delivering big content at NBC News with RavenDB

•  Define new index, with no code using it.

•  Deploy and allow new index to build.

•  Redeploy with code using the new index.

•  Redeploy after deleting old index definition.

•  Delete old index on each replica.

Updating a static index – a pain

Page 23: Delivering big content at NBC News with RavenDB

•  If you do it by Id, it is consistent (within a single Raven server)

•  Load() •  Store() •  Delete()

•  Queries are only eventually consistent (“eventually” is measured in milliseconds)

Consistency
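
The difference between the two access paths can be illustrated with a toy model, assuming indexing runs as a separate step after each write. `Database`, `run_indexer`, and the `tags` field are hypothetical; the real server indexes on background threads.

```python
class Database:
    def __init__(self):
        self.docs = {}        # doc id -> doc (the consistent store)
        self.pending = []     # ids written but not yet indexed
        self.index = {}       # tag -> set of doc ids (the eventually consistent index)

    def store(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.pending.append(doc_id)

    def load(self, doc_id):
        # By-id access reads the store directly, so it always sees the last write.
        return self.docs.get(doc_id)

    def run_indexer(self):
        # Stand-in for background indexing; called explicitly in this sketch.
        for doc_id in self.pending:
            for ids in self.index.values():
                ids.discard(doc_id)
            for tag in self.docs[doc_id]["tags"]:
                self.index.setdefault(tag, set()).add(doc_id)
        self.pending.clear()

    def query(self, tag, wait_for_non_stale=False):
        if wait_for_non_stale:
            self.run_indexer()
        is_stale = bool(self.pending)
        return sorted(self.index.get(tag, set())), is_stale
```

A query issued right after `store()` returns the old index contents and reports that the result is stale, while `load()` on the same id already returns the new document.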

Page 24: Delivering big content at NBC News with RavenDB

•  Eventual consistency – replication is async in background.

•  All replication is one-way and managed by source.

•  Can enable transitive replication – useful for new instances.

•  Set W value to ensure replication to minimum number of instances (v2.5). Or timeout.

•  Client will auto-failover to replication destinations, configurable to reads only or reads and writes.

Raven replication
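
Client failover along these lines can be sketched as follows. `Replica`, `FailoverClient`, and the `allow_write_failover` flag are hypothetical names standing in for the client's failover setting; the real client's conventions are not reproduced here.

```python
class Unavailable(Exception):
    pass


class Replica:
    def __init__(self, name, docs=None, up=True):
        self.name = name
        self.docs = dict(docs or {})
        self.up = up

    def load(self, doc_id):
        if not self.up:
            raise Unavailable(self.name)
        return self.docs.get(doc_id)

    def store(self, doc_id, body):
        if not self.up:
            raise Unavailable(self.name)
        self.docs[doc_id] = body


class FailoverClient:
    def __init__(self, primary, destinations, allow_write_failover=False):
        self.primary = primary
        self.destinations = list(destinations)   # replication destinations, in order
        self.allow_write_failover = allow_write_failover

    def load(self, doc_id):
        # Reads may fall through to any replication destination.
        for replica in [self.primary] + self.destinations:
            try:
                return replica.load(doc_id)
            except Unavailable:
                continue
        raise Unavailable("no replica reachable")

    def store(self, doc_id, body):
        # Writes fail over only when explicitly allowed.
        candidates = [self.primary]
        if self.allow_write_failover:
            candidates += self.destinations
        for replica in candidates:
            try:
                return replica.store(doc_id, body)
            except Unavailable:
                continue
        raise Unavailable("no writable replica reachable")
```

Restricting failover to reads keeps all writes on one instance, at the cost of rejecting writes while that instance is down.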

Page 25: Delivering big content at NBC News with RavenDB

•  Sequential guids.

•  Unique for every write to a database.

•  Used for caching in client, concurrency control, and replication.

Etags

Page 26: Delivering big content at NBC News with RavenDB

Source: What’s the last etag I replicated to you?

Destination: 42

Source: I’m up to 49, so here’s a POST with some docs in it.

Destination: Got ‘em.

Source: What’s the last etag I replicated to you?

Destination: 49

The replication conversation
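
The conversation above maps onto a short simulation. Etags are plain integers here, as on the slide, rather than the sequential guids Raven actually uses, and `Instance` with its methods is a hypothetical model, not the replication bundle's code.

```python
class Instance:
    def __init__(self, name):
        self.name = name
        self.docs = {}              # doc id -> (etag, body)
        self.last_etag = 0
        self.replicated_from = {}   # source name -> last source etag received

    def write(self, doc_id, body):
        # Every write gets a new, increasing etag.
        self.last_etag += 1
        self.docs[doc_id] = (self.last_etag, body)

    # Destination side
    def last_replicated_etag(self, source_name):
        return self.replicated_from.get(source_name, 0)

    def receive(self, source_name, docs, up_to_etag):
        for doc_id, body in docs:
            self.write(doc_id, body)             # destination assigns its own etags
        self.replicated_from[source_name] = up_to_etag

    # Source side
    def replicate_to(self, destination):
        # "What's the last etag I replicated to you?"
        since = destination.last_replicated_etag(self.name)
        changed = sorted(
            ((etag, doc_id, body) for doc_id, (etag, body) in self.docs.items() if etag > since),
            key=lambda item: item[0],
        )
        if not changed:
            return 0
        # "Here's a POST with some docs in it."
        batch = [(doc_id, body) for _, doc_id, body in changed]
        destination.receive(self.name, batch, changed[-1][0])
        return len(batch)
```

Because the destination remembers the last etag per source, the source can crash and resume without tracking any state of its own.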

Page 27: Delivering big content at NBC News with RavenDB

•  Replication from each instance to all other instances.

•  Any instance could receive writes.

•  Reduce replication conflicts by forcing writes to single “master”.

•  Handle conflicts in your app or with custom server bundle – in our case, “last in wins” bundle.

Multi-master replication
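
The slides do not show the "last in wins" bundle itself, so the following is only a guess at the rule it implies: when two instances hold conflicting versions of a document, keep the one modified most recently. `Version`, its fields, and the tie-break on instance name are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    body: str
    last_modified: float    # timestamp of the write, in any comparable unit
    instance: str           # name of the instance that accepted the write


def last_in_wins(conflicting):
    # Pick the most recent write; ties are broken by instance name so that
    # every replica resolving the same conflict picks the same winner.
    return max(conflicting, key=lambda v: (v.last_modified, v.instance))
```

This rule silently discards the losing write, which is one reason to also funnel writes through a single "master" so that conflicts stay rare.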

Page 28: Delivering big content at NBC News with RavenDB

•  Null Id and tag can be extracted: client generates with Hi-Lo

•  Null Id received at server: guid

•  Id ending in / received at server: append auto-increment integer.

•  Otherwise: use the value in the object.

•  Server prefix protects against edge-case failures.

Id generation
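
For the Hi-Lo case, a minimal sketch of the general algorithm, assuming a server that hands out "hi" values per tag: each client reserves a whole range with one round-trip and then generates ids locally. `HiLoServer`, `HiLoGenerator`, and the `capacity` parameter are hypothetical names, and the server prefix mentioned above is left out.

```python
class HiLoServer:
    """Hands out the next 'hi' value per tag; one round-trip reserves a whole range."""

    def __init__(self):
        self.next_hi = {}

    def reserve(self, tag):
        hi = self.next_hi.get(tag, 0) + 1
        self.next_hi[tag] = hi
        return hi


class HiLoGenerator:
    def __init__(self, server, tag, capacity=32):
        self.server = server
        self.tag = tag
        self.capacity = capacity    # ids per reserved range
        self.hi = 0
        self.lo = capacity          # start exhausted so the first call reserves a range

    def next_id(self):
        if self.lo >= self.capacity:
            self.hi = self.server.reserve(self.tag)
            self.lo = 0
        self.lo += 1
        return f"{self.tag}/{(self.hi - 1) * self.capacity + self.lo}"
```

Two clients sharing one server never collide because each works inside its own reserved range; the trade-off is gaps in the sequence when a client stops before using its whole range.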

Page 29: Delivering big content at NBC News with RavenDB

•  Control where reads and writes go. Implemented in a custom DocumentStore wrapper.

•  Control aggressive caching time.

•  Deploy new instances with replication.

•  Backup – but probably never restore in production.

•  Copy indexes.

•  Monitor with stats endpoints.

Raven operations tasks

Page 30: Delivering big content at NBC News with RavenDB

•  Modeling/versioning

•  Replication

•  Client failover

•  Consistency

•  Concurrency control

•  Indexing and updates

•  Id generation

•  Caching

Keep in mind…

Page 31: Delivering big content at NBC News with RavenDB

•  http://ravendb.net

•  GitHub: http://github.com/ravendb

•  Ayende’s blog: http://ayende.com

•  RavenDB Google group

•  @RavenDB on Twitter

•  Me: @jtbennett on Twitter

More info on Raven

Page 32: Delivering big content at NBC News with RavenDB

Questions?

Page 33: Delivering big content at NBC News with RavenDB

Many thanks to:

You.

NoSql NOW!

Huge.

Rhinos: @ayende, @synhershko.

Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.

Page 34: Delivering big content at NBC News with RavenDB

hugeinc.com [email protected] 45 Main St. #220 Brooklyn, NY 11201 +1 718 625 4843