scalable web architectures common patterns & approaches cal henderson

108
Scalable Web Architectures Common Patterns & Approaches Cal Henderson

Upload: liana-gatling

Post on 28-Mar-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

Scalable Web Architectures

Common Patterns & Approaches

Cal Henderson

Page 2: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 2

Hello

Page 3: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 3

Scalable Web Architectures?

What does scalable mean?

What’s an architecture?

Page 4: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 4

Scalability – myths and lies

• What is scalability?

Page 5: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 5

Scalability – myths and lies

• What is scalability not ?

Page 6: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 6

Scalability – myths and lies

• What is scalability not ?– Raw Speed / Performance– HA / BCP– Technology X– Protocol Y

Page 7: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 7

Scalability – myths and lies

• So what is scalability?

Page 8: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 8

Scalability – myths and lies

• So what is scalability?– Traffic growth– Dataset growth– Maintainability

Page 9: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 9

Scalability

• Two kinds:– Vertical (get bigger)– Horizontal (get more)

Page 10: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 10

Big Irons

Sunfire E20k

$450,000 - $2,500,00036x 1.8GHz processors

PowerEdge SC14252.8 GHz processor

Under $1,500

Page 11: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 11

Cost vs Cost

Page 12: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 12

Cost vs Cost

• But sometimes vertical scaling is right

• Buying a bigger box is quick (ish)

• Redesigning software is not

• Running out of MySQL performance?– Spend months on data federation– Or, Just buy a ton more RAM

Page 13: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 13

Cost vs Cost

• But let’s talk horizontal– Else this is going to be boring

Page 14: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 14

Architectures then?

• The way the bits fit together

• What grows where

• The trade offs between good/fast/cheap

Page 15: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 15

LAMP

• We’re talking about LAMP– Linux– Apache (or LightHTTPd)– MySQL (or Postgres)– PHP (or Perl, Python, Ruby)

• All open source• All well supported• All used in large operations

Page 16: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 16

Simple web apps

• A Web Application– Or “Web Site” in Web 1.0 terminology

Interwebnet App server Database

Page 17: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 17

App servers

• App servers scale in two ways:

Page 18: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 18

App servers

• App servers scale in two ways:

– Really well

Page 19: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 19

App servers

• App servers scale in two ways:

– Really well

– Quite badly

Page 20: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 20

App servers

• Sessions!– (State)

– Local sessions == bad• When they move == quite bad

– Central sessions == good

– No sessions at all == awesome!

Page 21: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 21

Local sessions

• Stored on disk– PHP sessions

• Stored in memory– Shared memory block

• Bad!– Can’t move users– Can’t avoid hotspots

Page 22: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 22

Mobile local sessions

• Custom built– Store last session location in cookie– If we hit a different server, pull our session

information across

• If your load balancer has sticky sessions, you can still get hotspots– Depends on volume – fewer heavier users

hurt more

Page 23: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 23

Remote centralized sessions

• Store in a central database– Or an in-memory cache

• No porting around of session data• No need for sticky sessions• No hot spots

• Need to be able to scale the data store– But we’ve pushed the issue down the stack

Page 24: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 24

No sessions

• Stash it all in a cookie!

• Sign it for safety– $data = $user_id . ‘-’ . $user_name;– $time = time();– $sig = sha1($secret . $time . $data);– $cookie = base64(“$sig-$time-$data”);

– Timestamp means it’s simple to expire it

Page 25: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 25

Super slim sessions

• If you need more than the cookie (login status, user id, username), then pull their account row from the DB– Or from the account cache

• None of the drawbacks of sessions• Avoids the overhead of a query per page

– Great for high-volume pages which need little personalization

– Turns out you can stick quite a lot in a cookie too– Pack with base64 and it’s easy to delimit fields

Page 26: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 26

App servers

• The Rasmus way– App server has ‘shared nothing’– Responsibility pushed down the stack

• Ooh, the stack

Page 27: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 27

Trifle

Page 28: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 28

Trifle

Sponge / Database

Jelly / Business Logic

Custard / Page Logic

Cream / Markup

Fruit / Presentation

Page 29: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 29

Trifle

Sponge / Database

Jelly / Business Logic

Custard / Page Logic

Cream / Markup

Fruit / Presentation

Page 30: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 30

App servers

Page 31: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 31

App servers

Page 32: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 32

App servers

Page 33: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 33

Well, that was easy

• Scaling the web app server part is easy

• The rest is the trickier part– Database– Serving static content– Storing static content

Page 34: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 34

The others

• Other services scale similarly to web apps– That is, horizontally

• The canonical examples:– Image conversion– Audio transcoding– Video transcoding– Web crawling

Page 35: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 35

Parallelizable == easy!

• If we can transcode/crawl in parallel, it’s easy– But think about queuing– And asynchronous systems– The web ain’t built for slow things– But still, a simple problem

Page 36: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 36

Asynchronous systems

Page 37: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 37

Asynchronous systems

Page 38: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 38

Helps with peak periods

Page 39: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 39

Asynchronous systems

Page 40: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 40

Asynchronous systems

Page 41: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 41

Asynchronous systems

Page 42: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 42

The big three

• Let’s talk about the big three then…

– Databases– Serving lots of static content– Storing lots of static content

Page 43: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 43

Databases

• Unless we’re doing a lot of file serving, the database is the toughest part to scale

• If we can, best to avoid the issue altogether and just buy bigger hardware

• Dual Opteron/Intel64 systems with 16GB of RAM can get you a long way

Page 44: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 44

More read power

• Web apps typically have a read/write ratio of somewhere between 80/20 and 90/10

• If we can scale read capacity, we can solve a lot of situations

• MySQL replication!

Page 45: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 45

Master-Slave Replication

Page 46: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 46

Master-Slave Replication

Reads and Writes

Reads

Page 47: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 47

Master-Slave Replication

Page 48: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 48

Master-Slave Replication

Page 49: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 49

Master-Slave Replication

Page 50: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 50

Master-Slave Replication

Page 51: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 51

Master-Slave Replication

Page 52: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 52

Master-Slave Replication

Page 53: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 53

Master-Slave Replication

Page 54: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 54

Master-Slave Replication

Page 55: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 55

Caching

• Caching avoids needing to scale!– Or makes it cheaper

• Simple stuff– mod_perl / shared memory – dumb– MySQL query cache - dumbish

Page 56: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 56

Caching

• Getting more complicated…– Write-through cache– Write-back cache– Sideline cache

Page 57: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 57

Write-through cache

Page 58: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 58

Write-back cache

Page 59: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 59

Sideline cache

Page 60: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 60

Sideline cache

• Easy to implement– Just add app logic

• Need to manually invalidate cache– Well designed code makes it easy

• Memcached– From Danga (LiveJournal)– http://www.danga.com/memcached/

Page 61: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 61

But what about HA?

Page 62: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 62

But what about HA?

Page 63: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 63

SPOF!

• The key to HA is avoiding SPOFs– Identify– Eliminate

• Some stuff is hard to solve– Fix it further up the tree

• Dual DCs solves Router/Switch SPOF

Page 64: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 64

Master-Master

Page 65: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 65

Master-Master

• Either hot/warm or hot/hot

• Writes can go to either– But avoid collisions– No auto-inc columns for hot/hot

• Bad for hot/warm too

– Design schema/access to avoid collisions• Hashing users to servers

Page 66: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 66

Rings

• Master-master is just a small ring– With 2 members

• Bigger rings are possible– But not a mesh!– Each slave may only have a single master– Unless you build some kind of manual

replication

Page 67: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 67

Rings

Page 68: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 68

Rings

Page 69: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 69

Dual trees

• Master-master is good for HA– But we can’t scale out the reads

• We often need to combine the read scaling with HA

• We can combine the two

Page 70: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 70

Dual trees

Page 71: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 71

Data federation

• At some point, you need more writes– This is tough– Each cluster of servers has limited write

capacity

• Just add more clusters!

Page 72: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 72

Data federation

• Split up large tables, organized by some primary object– Usually users

• Put all of a user’s data on one ‘cluster’– Or shard, or cell

• Have one central cluster for lookups

Page 73: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 73

Data federation

Page 74: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 74

Data federation

• Need more capacity?– Just add shards!– Don’t assign to shards based on user_id!

• For resource leveling as time goes on, we want to be able to move objects between shards– ‘Lockable’ objects

Page 75: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 75

Data federation

• Heterogeneous hardware is fine– Just give a larger/smaller proportion of objects

depending on hardware

• Bigger/faster hardware for paying users– A common approach

Page 76: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 76

Downsides

• Need to keep stuff in the right place

• App logic gets more complicated

• More clusters to manage– Backups, etc

• More database connections needed per page

• The dual table issue– Avoid walking the shards!

Page 77: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 77

Bottom line

Data federation is how large applications are

scaled

Page 78: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 78

Bottom line

• It’s hard, but not impossible

• Good software design makes it easier– Abstraction!

• Master-master pairs for shards give us HA

• Master-master trees work for central cluster (many reads, few writes)

Page 79: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 79

Multiple Datacenters

• Having multiple datacenters is hard– Not just with MySQL

• Hot/warm with MySQL slaved setup– But manual

• Hot/hot with master-master– But dangerous

• Hot/hot with sync/async manual replication– But tough

Page 80: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 80

Multiple Datacenters

Page 81: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 81

Serving lots of files

• Serving lots of files is not too tough– Just buy lots of machines and load balance!

• We’re IO bound – need more spindles!– But keeping many copies of data in sync is

hard– And sometimes we have other per-request

overhead (like auth)

Page 82: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 82

Reverse proxy

Page 83: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 83

Reverse proxy

• Serving out of memory is fast!– And our caching proxies can have disks too– Fast or otherwise

• More spindles is better• We stay in sync automatically

• We can parallelize it! – 50 cache servers gives us 50 times the serving rate of

the origin server– Assuming the working set is small enough to fit in

memory in the cache cluster

Page 84: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 84

Invalidation

• Dealing with invalidation is tricky

• We can prod the cache servers directly to clear stuff out– Scales badly – need to clear asset from every

server – doesn’t work well for 100 caches

Page 85: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 85

Invalidation

• We can change the URLs of modified resources– And let the old ones drop out cache naturally– Or prod them out, for sensitive data

• Good approach!– Avoids browser cache staleness– Hello akamai (and other CDNs)– Read more:

• http://www.thinkvitamin.com/features/webapps/serving-javascript-fast

Page 86: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 86

Reverse proxy

• Choices– L7 load balancer & Squid

• http://www.squid-cache.org/

– mod_proxy & mod_cache• http://www.apache.org/

– Perlbal and Memcache?• http://www.danga.com/

Page 87: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 87

High overhead serving

• What if you need to authenticate your asset serving– Private photos– Private data– Subscriber-only files

• Two main approaches

Page 88: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 88

Perlbal backhanding

• Perlbal can do redirection magic– Backend server sends header to Perbal– Perlbal goes to pick up the file from elsewhere– Transparent to user

Page 89: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 89

Perlbal backhanding

Page 90: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 90

Perlbal backhanding

• Doesn’t keep database around while serving

• Doesn’t keep app server around while serving

• User doesn’t find out how to access asset directly

Page 91: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 91

Permission URLs

• But why bother!?

• If we bake the auth into the URL then it saves the auth step

• We can do the auth on the web app servers when creating HTML

• Just need some magic to translate to paths

• We don’t want paths to be guessable

Page 92: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 92

Permission URLs

Page 93: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 93

Storing lots of files

• Storing files is easy!– Get a big disk– Get a bigger disk– Uh oh!

• Horizontal scaling is the key– Again

Page 94: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 94

Connecting to storage

• NFS– Stateful == Sucks– Hard mounts vs Soft mounts

• SMB / CIFS / Samba– Turn off MSRPC & WINS (NetBOIS NS)– Stateful but degrades gracefully

• HTTP– Stateless == yay!– Just use Apache

Page 95: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 95

Multiple volumes

• Volumes are limited in total size– Except under ZFS & others

• Sometimes we need multiple volumes for performance reasons– When use RAID with single/dual parity

• At some point, we need multiple volumes

Page 96: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 96

Multiple volumes

Page 97: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 97

Multiple hosts

• Further down the road, a single host will be too small

• Total throughput of machine becomes an issue

• Even physical space can start to matter

• So we need to be able to use multiple hosts

Page 98: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 98

Multiple hosts

Page 99: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 99

HA Storage

• HA is important for assets too– We can back stuff up– But we want it hot redundant

• RAID is good– RAID5 is cheap, RAID 10 is fast

Page 100: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 100

HA Storage

• But whole machines can fail

• So we stick assets on multiple machines

• In this case, we can ignore RAID– In failure case, we serve from alternative

source– But need to weigh up the rebuild time and

effort against the risk– Store more than 2 copies?

Page 101: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 101

HA Storage

Page 102: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 102

Self repairing systems

• When something fails, repairing can be a pain– RAID rebuilds by itself, but machine

replication doesn’t

• The big appliances self heal– NetApp, StorEdge, etc

• So does MogileFS

Page 103: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 103

Real world examples

• Flickr– Because I know it

• LiveJournal– Because everyone copies it

Page 104: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 104

FlickrArchitecture

Page 105: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 105

LiveJournalArchitecture

Page 106: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 106

Buy my book!

Page 107: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 107

The end!

Page 108: Scalable Web Architectures Common Patterns & Approaches Cal Henderson

SAM-SIG, 23rd August 2006 108

Awesome!

These slides are available online:

iamcal.com/talks/