lessons from highly scalable architectures at social networking sites

Software Engineering in a Cloud World: Lessons from highly-scalable architectures at social networking sites. Patrick Senti [email protected]

Upload: patrick-senti

Post on 06-May-2015

DESCRIPTION

What are the techniques and technologies used by popular social networking sites such as Facebook, Twitter, Tumblr, Pinterest or Instagram? How do they architect their systems to scale to hundreds of millions of visits per day?

TRANSCRIPT

Page 1: Lessons from Highly Scalable Architectures at Social Networking Sites

Software Engineering in a Cloud World

Lessons from highly-scalable architectures at social networking sites

Patrick Senti [email protected]

Social Networking – Trends 2012

more users ... higher share of time ... for longer

Source: State of Media: The Social Media Report 2012, Nielsen, http://is.gd/LYHmnm

User Adoption Faster for New Entrants

Source: author's compilation of company data, press statements, technical blogs & presentations

[Figure: User Growth (years since launch) for Facebook, Twitter, Tumblr, Instagram and Pinterest; x-axis: years, y-axis: million (logarithmic)]

Staggering Volumes

Page views: 500 million/day
Reads: ~40k requests/second
Writes: ~1 million/second
New data: ~3 TB/day
Servers: 1000
Engineers: 20

Likes (counter): 2.7 billion/day
Photos: 300 million/day
Queries: 70'000/day
New data: 500 TB/day
Servers: “tens of thousands”
Engineers: ~1700

Tweets (peak): ~25'000/second
Tweets (avg): ~250 million/day (~2'900/second)
API calls: 6 billion/day (70'000/second)
New data: ~8 TB/day (80 MB/second)
Engineers: 500 (of 1000 total employees)

Page views: 2.3 billion/month
Growth rate: 50% (visitors, March 2012)
Machinery: 150 web servers, 90 caching servers, 70 database instances, 35 logging, internal
Data size: 410 TB (user data)
Employees: ~65 (NB: until end of 2011: 12)

Sources: http://is.gd/mpdOPN, http://is.gd/1vJ1il, http://is.gd/58X8ns, http://is.gd/LGexI6, http://is.gd/tZfNPA, http://is.gd/bcpCJc, http://is.gd/kXVEEF

Methodology

● Author's synthesis
  ● Information collected 2010 – 2012
  ● Mostly secondary research conducted on the internet

● Sources of information
  ● Public presentations at industry conferences
  ● Engineering blogs by social network companies
  ● Research reports
  ● Technology documentation
  ● Author's data analysis

● Threats to validity
  ● Subjective selection of information sources
  ● Non-systematic analysis and synthesis of data gathered

Typical Scalability Approaches

● Load Balancing

● Static content on dedicated servers

● Caching

● Database Partitioning

● Replication (high availability)

● (How) Do these work at social-network scale?

Facebook

Source: Aditya Agarwal, Facebook Architecture, QCon 2008, London

Functionality
- Type of blog
- User profile with personal data
- Users 'friend' each other
- Post public or private messages

Data Center
- owned by Facebook

Software Architecture

Twitter

Software architecture
- Ruby on Rails, Erlang
- since 2009: JVM, Scala
- MySQL
- Memcached
- Unicorn (Mongrel) web server

Functionality
- 140-character messages
- Users follow each other
- Posts can contain pictures, media links etc.

Data Center
- dedicated data center (outsourced)

Source: Krikorian R., Twitter's Real Time Architecture, Qcon NYC 2012

tumblr

Software architecture
- PHP, Ruby, Scala
- Redis, HBase, MySQL
- Memcache
- Thrift

Functionality
- Microblogging
- Users follow each other
- Dashboard similar to a Facebook page

Data Center
- started at Rackspace
- co-located, dedicated

Source: Tumblr Architecture – 15 Billion Page Views a Month and Harder to Scale than Twitter, Highscalability Blog

Source: tumblr.com

Pinterest

Data Center
- Amazon EC2, EBS, S3

Functionality
- Photo sharing pinboards
- Categorize images, share with others
- mostly used by women (2012: 83%)

Software architecture
- Python
- Django

Source: pinterest.com

Source: Jackson B., Pinterest growth driven by Amazon cloud scalability, 04.2012, techworld.com

Instagram

Software architecture
- Python, Django
- PostgreSQL
- Redis
- Nginx
- Node.js
- Android

Functionality
- Smartphone photo sharing
- Post to other social networks
- Send messages

Data Center
- started with single small scale PC (up to 30+ million users)
- 100+ instances at Amazon (EC2, EBS, S3 for photos)

Employees
- 2010: 2 engineers, 2012: 5 engineers
- That's the total employee count

Source: Instagram, What Powers Instagram: Hundreds of Instances, Dozens of Technologies, Instagram Engineering Blog

Source: Wikipedia

Scalability Options

scale up (#CPUs, RAM, disk)
● transparent scalability
● scale 'out of the box'
● complex hardware (high cost)
● specialised knowledge
● more complex software (multi-core)

scale out (#machines)
● simple hardware (low cost)
● scale by numbers
● difficult to implement
● difficult to maintain (myth?)
● more complex software (expensive licenses)

either way
- scale by parallelization
- partition for fault tolerance
- replicate for reliability

this means:
- decouple components
- asynchronous processing
- monitor to operate

Caching

● Goal Reduce response times for web site & data access

● Product memcached (open source, initially developed 2003)

● Benefits All accesses (read & write) are O(1)

memcached

[Diagram: client, Load Balancer, two Web Servers and four memcached nodes holding Keys={1,2,3}, Keys={4,5,6}, Keys={7,8,9}, Keys={10,11,12}]

server = hash_f(key) % #servers

Features
● Remote-accessible in-memory key/value cache
● Least Recently Used (LRU) eviction
● Shared-nothing, distributed architecture

Implementation
● memcached nodes map to key ranges (client-side hashing – no SPOF)
● Multi-threaded, event-based async network I/O (200'000 requests/s at Facebook)
● Single-node fault tolerance by consistent hashing scheme

Source: memcached.org
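
The client-side rule `server = hash_f(key) % #servers` can be sketched as follows. This is a minimal illustration, not memcached's client code: the server names, the use of MD5 as `hash_f` and the helper names are assumptions made for the example. It also measures how many keys change server when one node disappears, which is the weakness that consistent hashing addresses.

```python
import hashlib

def stable_hash(key):
    """Deterministic stand-in for hash_f (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def pick_server(key, servers):
    """server = hash_f(key) % #servers, computed entirely on the client."""
    return servers[stable_hash(key) % len(servers)]

def moved_fraction(keys, before, after):
    """Share of keys that land on a different server after the server list changes."""
    moved = sum(1 for k in keys if pick_server(k, before) != pick_server(k, after))
    return moved / len(keys)

# Hypothetical cache nodes and keys, for illustration only.
servers = ["cache-1", "cache-2", "cache-3", "cache-4"]
keys = [f"user:{i}" for i in range(10_000)]

# Dropping one of four nodes changes the modulus, so most keys are remapped.
fraction = moved_fraction(keys, servers, servers[:3])
print(f"{fraction:.0%} of keys moved")
```

Because no central directory is involved, every client that uses the same hash function and server list reaches the same node for a given key.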

Consistent Hashing in a nutshell

server = min(s | s.location >= (hash_f(key) % #locations))

Consistent hashing: buckets are located on a ring and contain up to a pre-defined limit
=> at worst, only the keys of the failing node need to be re-mapped

'Traditional' hashing: buckets contain a pre-defined range
=> at worst, the full cache must be rebuilt; every node may be affected

[Diagram: two sets of four buckets, one with Keys={1,2,3}, Keys={3,4,5}, Keys={5,6,7}, Keys={8,9,10} and one with Keys={1,2,3}, Keys={1,2,3,4,5}, Keys={5,6,7}, Keys={8,9,10}]

Source: David Karger et al., Web caching with consistent hashing, Computer Networks, Vol. 31, 1999
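
A minimal sketch of the ring lookup, assuming MD5 as the hash function and hypothetical server names. Placing each server at several ring locations (`replicas`) is a common refinement that is not on the slide; it is included only to spread keys more evenly.

```python
import bisect
import hashlib

def ring_position(value, locations=2**32):
    """hash_f(value) % #locations, using MD5 as an assumed hash function."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % locations

class HashRing:
    def __init__(self, servers, replicas=100):
        # Sorted (location, server) pairs; each server owns several locations.
        self._points = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._locations = [location for location, _ in self._points]

    def server_for(self, key):
        """First server whose location is >= the key's location, wrapping around the ring."""
        idx = bisect.bisect_left(self._locations, ring_position(key))
        if idx == len(self._points):
            idx = 0
        return self._points[idx][1]

# Hypothetical nodes: compare the mapping before and after "cache-4" fails.
before = HashRing(["cache-1", "cache-2", "cache-3", "cache-4"])
after = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user:{i}" for i in range(2_000)]
moved = [k for k in keys if before.server_for(k) != after.server_for(k)]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

In this sketch every key that moves was previously stored on the failed node; keys on the surviving nodes keep their server.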

Memcached Results

● Results at Twitter

● 100s of servers

● 20TB of data covering >30 services

● 2 trillion queries/day (>23 million queries/second)

● Modified memcached, released as “Twemcache”

● Key objectives

● High Availability

● Predictable Performance

● Dynamic adaptation to size (grow/shrink)

● Monitoring of cache effectiveness

Source: Chris Aniszczyk, Caching with Twemcache, 07.2012, Twitter Engineering Blog

Shard your data

● Shards
  ● horizontal partitions (e.g. by user, time, ...)
  ● distributed to multiple physical nodes => parallelized data access
  ● data typically denormalized
  ● similar data is replicated to all shards – e.g. static data

[Diagram: Web Server with db-client in front of node1, node2, node3, node4, holding Userids={A, …, F}, Userids={G, …, L}, Userids={….}, Userids={….}]

node = hash_f(user_id) % #nodes
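
The routing rule `node = hash_f(user_id) % #nodes` could look like this inside a db-client. It is a sketch under assumptions: the node names follow the diagram, SHA-1 stands in for `hash_f`, and plain dictionaries stand in for the database nodes.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3", "node4"]

def node_for(user_id, nodes=NODES):
    """node = hash_f(user_id) % #nodes (SHA-1 is an assumed choice of hash_f)."""
    digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

class ShardedStore:
    """Hypothetical db-client: all rows of one user live on exactly one node."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        # One dict per node stands in for a real database instance.
        self.data = {node: defaultdict(list) for node in self.nodes}

    def add_post(self, user_id, post):
        self.data[node_for(user_id, self.nodes)][user_id].append(post)

    def posts_of(self, user_id):
        # Single-shard read: no other node has to be contacted.
        return list(self.data[node_for(user_id, self.nodes)].get(user_id, []))

store = ShardedStore()
store.add_post("alice", "hello")
store.add_post("alice", "again")
store.add_post("bob", "hi")
print(node_for("alice"), store.posts_of("alice"))
```

Sharding by user keeps a user's reads and writes on one node, which is why a single-instance failure affects only a subset of users.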

Sharding Results

● Impressive results at Facebook

  ● 1800 MySQL servers
  ● 4ms reads, 5ms writes
  ● 60M queries/second (peak)
  ● Growth 20x (overall data, over two years)

● What works
  ● Shard by user – group similar data into the same shard
  ● Linking across shards – store cross-references in both shards (two-way access)
  ● Fault tolerance: single-instance failure only affects a subset of users
  ● Consistent hashing

● What doesn't
  ● Joins across shards – cannot be done efficiently
  ● Sharding by time not helpful – one shard keeps running “hot”
  ● Sharding by function not helpful – non-uniform distribution, hot spots, unique access patterns
  ● Fixed hashing – nodes become unbalanced, difficult to grow or shrink

Source: Facebook Techtalks, MySQL & HBase, December 5, 2011

Managing shards

● Results at Tumblr
  ● 200 db servers
  ● Grouped into 5 global pools / 58 shard pools
  ● 28 TB
  ● 100 billion rows
  ● No DBAs - 2 engineers keep this running at 50% of their time

● Jetpants – DB management toolkit
  ● Clone slaves efficiently
  ● Split shards into new shards
  ● Master promotions
  ● Command line to work with topology

● Open sourced
  ● https://github.com/tumblr/jetpants

Source: Elias E., Managing Large Sharded Topologies with Jetpants, 12.2012, Percona Live MySQL Conference

Asynchronous & Distributed Work

● Problem Do more work in less time

● Solution Distributed, asynchronous processing; MapReduce

● Requirements

● Split a job into multiple pieces

● Distribute work

● Collect results

● Fault tolerant

● Technologies

● Message Queueing

● Gearman

● Hadoop / Pig
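
The split / distribute / collect requirements above can be illustrated with a small word count. This is a single-machine sketch using a thread pool from the Python standard library; it is not Hadoop or Gearman code, and it leaves out the fault-tolerance requirement (retrying a failed piece).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(items, pieces):
    """Split a job into at most `pieces` chunks of roughly equal size."""
    size = max(1, -(-len(items) // pieces))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_chunk(lines):
    """Map step: count the words of one chunk independently of all others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the partial results into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, workers=4):
    # Distribute the chunks to the workers, then collect and merge the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, split(lines, workers)))
    return reduce_counts(partials)

print(word_count(["a b a", "b c", "a"]))
```

The map step needs no shared state, which is what allows the chunks to be processed on separate machines in a real cluster.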

Asynchronous Work Example

● Instagram Push Notifications

● Image uploads

● All uploads go into a task-queue

● ~200 worker processes asynchronously process the images

● Gearman

● Open Source

● Framework to distribute work

● Load Balancing

● No SPOF

Source: gearman.org

Source: Instagram, What Powers Instagram: Hundreds of Instances, Dozens of Technologies, 2012, Instagram Engineering Blog
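
The task-queue pattern described above (uploads are queued, a pool of workers processes them asynchronously) can be sketched with the standard library. This is not Gearman's API: the in-process queue, the `make_thumbnail` handler and the file names are hypothetical stand-ins.

```python
import queue
import threading

def run_workers(tasks, handler, worker_count=4):
    """Push tasks onto a queue and let worker threads process them asynchronously."""
    task_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                task_queue.task_done()
                break
            try:
                outcome = handler(task)
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = exc
            with lock:
                results.append((task, outcome))
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)
    task_queue.join()
    for thread in threads:
        thread.join()
    return results

def make_thumbnail(filename):
    """Hypothetical stand-in for real image processing."""
    return f"thumb:{filename}"

results = run_workers(["a.jpg", "b.jpg", "c.jpg"], make_thumbnail, worker_count=2)
print(sorted(results))
```

The producer returns as soon as a task is queued, so slow image processing does not block the upload request itself.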

Apache Hadoop

● What it is

● Distributed MapReduce engine

● Fault tolerant

● Asynchronous job scheduling

● Scalable: e.g. a 4000-node cluster sorts 1 TB in 62 seconds

● Data storage

● HDFS – scalable to multiple PB

● Distributed storage

● Written in Java

● Data replicated among 3 nodes

● Block storage of 64MB/block

● No SPOF

● Apache Pig

● High-level query language

Sources: Apache Hadoop, Wikipedia, The Free Encyclopedia, accessed January 8, 2013

Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2012

Results

● NoSQL at Twitter

● Store 7TB of data/day

● HD speed: ~80MB/s => 24.3 hours

● Need to parallelize writes and reads

● Analysis using Pig

● Count all tweets

● 12 billion

● 5 minutes

Source: Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2012
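
The 24.3-hour figure follows from dividing the daily volume by the speed of one disk. The short calculation below reproduces it, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the variable names are chosen for this example.

```python
import math

TB = 10**12  # decimal units assumed
MB = 10**6

def hours_to_write(bytes_per_day, bytes_per_second):
    """Hours one disk needs to write a day's worth of data."""
    return bytes_per_day / bytes_per_second / 3600

single_disk_hours = hours_to_write(7 * TB, 80 * MB)

# One disk needs more than 24 hours for 24 hours of data, so it falls behind.
# The smallest number of disks writing in parallel that keeps up:
min_disks = math.ceil(single_disk_hours / 24)

print(f"{single_disk_hours:.1f} hours on one disk, at least {min_disks} disks in parallel")
```

This is the reason writes and reads have to be parallelized: the daily volume already exceeds what a single disk can absorb in a day.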

Simplified Queries

Source: Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2012

Service Oriented Architecture

“Onion-Style”

outer services
- public (e.g. REST)
- user interface
- typically scripted (Python, Ruby, JavaScript)

inner services
- private & highly efficient
- data access, calculation etc.
- workers to accomplish work in parallel
- mix of languages (Java, Scala, Python, C, ...)

fire hose
- highly available, scalable service bus
- distribute services as needed
- typically asynchronous

Tumblr Firehose

Apache Kafka
- O(1) persistent message queue
- x times 100K messages/s
- pub/sub interface

Apache Zookeeper (Cluster)
- distributed coordination
- highly available

finagle
- asynchronous RPC system
- JVM-hosted languages (Java, Scala, ...)
- Connection pools, failure detectors, failover, load-balancing, back-pressure ...

[Diagram: NewPost, finagle, three HTTP clients, public API (JSON), internal API (Thrift)]

Results
- 4 x CPUs @ 72GB RAM, 2 disks
- provide 1 week of streams
- ~400k messages/second
- 1 week of Tumblr posts

Source: Blake M., Tumblr Firehose - The Gory Details, 2012, Tumblr Engineering Blog
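
The idea behind a persistent pub/sub queue that can replay a week of posts is an append-only log in which each consumer tracks its own read offset. The sketch below is an in-memory illustration of that idea only; `TopicLog` and `Consumer` are hypothetical names and not the Kafka or Tumblr Firehose API.

```python
class TopicLog:
    """Append-only message log; publishing never depends on who is reading."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=100):
        """Return up to max_messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]

class Consumer:
    """Each consumer keeps its own offset, so it can lag behind or replay."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=100):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

log = TopicLog()
for i in range(5):
    log.publish(f"post-{i}")

slow = Consumer(log)            # starts at the beginning of the stream
late = Consumer(log, offset=3)  # joins later, or resumes after a restart
print(slow.poll(2), late.poll())
```

Because the log is retained rather than deleted on delivery, a client that disconnects can resume from its last offset instead of losing messages.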

SOA revisited – network efficiency

consumer
1. Serialize
2. Wait for response
3. Deserialize

provider
1. Deserialize
2. Provide response
3. Serialize

Interface: CORBA, HTTP/JSON, WSDL/XML/SOAP, ...

efficient?

Apache thrift – optimized wire protocol

● What it is

● Human-readable interface definition language (non-XML)

● Cross-language service implementation

● Code-generation engine (C++, Java, Python, JavaScript, …)

● Binary wire protocol

● Benefits

● Low-overhead serialization/de-serialization

● Native language bindings (no XML parsing or XSD)

● Efficient protocol implementation
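
To illustrate why a binary wire protocol carries less overhead than a text format, the sketch below encodes the same record once as JSON and once with a hand-rolled binary layout. The layout (4-byte integer, length-prefixed UTF-8 strings) is an assumption made for this example and is not Thrift's actual wire format; the field names mirror the `UserProfile` struct of the Thrift example.

```python
import json
import struct

def encode_binary(uid, name, blurb):
    """Illustrative layout: int32 uid, then two length-prefixed UTF-8 strings."""
    name_b = name.encode("utf-8")
    blurb_b = blurb.encode("utf-8")
    return struct.pack(
        f">iH{len(name_b)}sH{len(blurb_b)}s",
        uid, len(name_b), name_b, len(blurb_b), blurb_b,
    )

def decode_binary(payload):
    """Inverse of encode_binary: fixed offsets, no text parsing."""
    uid, name_len = struct.unpack_from(">iH", payload, 0)
    offset = 6
    name = payload[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (blurb_len,) = struct.unpack_from(">H", payload, offset)
    offset += 2
    blurb = payload[offset:offset + blurb_len].decode("utf-8")
    return uid, name, blurb

profile = {"uid": 1, "name": "Test User", "blurb": "Thrift is great"}
as_json = json.dumps(profile).encode("utf-8")
as_binary = encode_binary(profile["uid"], profile["name"], profile["blurb"])

print(len(as_json), "bytes as JSON,", len(as_binary), "bytes as binary")
```

The binary form omits field names and punctuation because both sides already know the layout from the shared interface definition, which is also what generated code relies on.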

thrift example

interface

struct UserProfile {
  1: i32 uid,
  2: string name,
  3: string blurb
}

service UserStorage {
  void store(1: UserProfile user),
  UserProfile retrieve(1: i32 uid)
}

client

# Thrift runtime imports; UserProfile and UserStorage come from the
# code that the Thrift compiler generates from the interface above
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol

# Make an object
up = UserProfile(uid=1, name="Test User", blurb="Thrift is great")

# Talk to a server via TCP sockets, binary protocol
transport = TSocket.TSocket("localhost", 9090)
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)

# Use the service we already defined
service = UserStorage.Client(protocol)
service.store(up)
up2 = service.retrieve(1)

Service implementation

class UserStorageHandler : virtual public UserStorageIf {
 public:
  UserStorageHandler() {
    // Your initialization goes here
  }

  void store(const UserProfile& user) {
    // Your implementation goes here
    printf("store\n");
  }

  void retrieve(UserProfile& _return, const int32_t uid) {
    // Your implementation goes here
    printf("retrieve\n");
  }
};

// main ... }

Source: thrift.apache.org

Serialization / Deserialization Performance

Serialization (Thrift: -66%)
Deserialization (Thrift: -92%)
Message size (Thrift: -19%)

Benchmark
- CPU Core i7 2.7GHz
- Serialization of a service message (media descriptor of a video)

Source: Author testing

redis: In-Memory DB

[Diagram: consumer and four redis instances (one master, three slaves, async replication), labelled Keys={1,2,3}, Keys={3,4,5}, Keys={5,6,7}, Keys={8,9,10}]

Problem: Require speed of cache, query semantics, persistence, fault-tolerance of DB cluster
Solution: redis.io – a distributed in-memory DB

Redis
● fast: O(1) access times - 100'000 writes/second, 80'000 reads/second
● fault-tolerant
● datatypes: strings, hashes, lists, sets, sorted sets
● complex queries: intersection, subset, sort, …
● more than just a DB: pub/sub channels

redis results

● tumblr

● >7500 notifications/second (well above MySQL max. concurrent limit)

● <5ms response time requirement

● Redis: 30'000 requests/second

Source: Blake M., Staircar: Redis-powered notifications, 07.2011, Tumblr Engineering Blog

Automate everything & Monitor

● If just two engineers

● run 100+ servers

● maintain dozens of databases

● Scale a system to 30+ million users

● … automation is like air to breathe …

● … monitoring is the lifeline

Dashboard @ Twitter

Source: Adams J., Scaling Twitter, 2010, Chirp Conference

Cell Architecture

● Cell Architecture

● Self-contained cells of data + logic

● Each cell itself made up of a cluster of nodes

● Cells provide internal failover

● Reliability

● Scalability

Cell

Application Server Cluster

Metadata store (HBase)

Discovery Service

Client

consistent hashing by user-id

Source: Malik P., Scaling the Messages Application Back End, 04.11, facebook Engineering's Notes

Summary

Scalability
● Cache
● Data Sharding
● In-Memory DB
● Efficient wire protocols

Flexibility
● SOA
  ● Decoupled
  ● Layered (outer, inner services)
  ● Asynchronous (firehose)
● Automation

Reliability
● Replication
● Cell Architecture

Take Away for Application Development

● Scalability => Distribution
  ● Loosely Coupled Components (accessible via APIs, services)
  ● Efficiency at every level
  ● Shared nothing

● Reliability => Replication
  ● Automation
  ● Monitoring
  ● Fast provisioning of replicas

● Flexibility => Simplification
  ● Build for simple use
  ● Abstract to simplify (e.g. Pig/Hadoop, Redis/in-Memory DB)
  ● API-everything

Paradigm Shift?

● New normal

● 100s of machines

● <5 engineers

● Distributed work load

● Horizontal scalability

● PBs of data

● Drivers

● Low barriers to entry – free or low-cost hosting

● Declining cost – CPU, storage, networking

● Web-scale ready open-source software

Q & A

Thank you

What we haven't covered

● CAP Theorem

● A/B Testing

● NoSQL Databases