architecture by accident

Architecture by accident Gleicon Moraes

Upload: gleicon-moraes

Post on 25-May-2015



DESCRIPTION

Presentation for 2nd Sao Paulo Perl Workshop on May 7th 2011 - notes and comments about architecture evolution on legacy systems

TRANSCRIPT

Page 1: Architecture by Accident

Architecture by accident

Gleicon Moraes

Page 2: Architecture by Accident

Required Listening

Page 3: Architecture by Accident

Agenda

• Architecture for data - even if you don’t want it
• Databases
• Message Queues
• Cache

Page 4: Architecture by Accident

Architecture

“Everyone has a plan until they get punched in the mouth” – Mike Tyson

Page 5: Architecture by Accident

Even if you don’t want it ...

• There is an innate architecture in everything
• You may end up with more data than you had planned for
• You may move away from your quick and dirty CRUD
• You are probably querying more than one database
• At some point you laugh when your boss asks you about 'Integrating Systems'
• Code turns into legacy - and so do architectures
• 'Scattered' is not the same as 'Distributed'

Page 6: Architecture by Accident

It usually starts like this

[Diagram: App Server, Database]

Page 7: Architecture by Accident

then

[Diagram: App Servers, Database]

Page 8: Architecture by Accident

it

[Diagram: App Servers, Master DB, Slave DB]

Page 9: Architecture by Accident

goes

[Diagram: App Servers, Master DB, Slave DB, Cache]

Page 10: Architecture by Accident

like

[Diagram: App Servers, Master DB, Slave DB, Cache, Indexing Service]

Page 11: Architecture by Accident

this

[Diagram: App Servers, Master DB, Slave DB, Cache, Indexing Service, API Servers]

Page 12: Architecture by Accident

and

[Diagram: App Servers, Master DB, Slave DB, Cache, Indexing Service, API Servers, Load Balancer/Reverse Proxy]

Page 13: Architecture by Accident

beyond

[Diagram: App Servers, Master DB, Slave DB, Cache, Indexing Service, API Servers, Load Balancer/Reverse Proxy, Auth Service]

Page 14: Architecture by Accident
Page 15: Architecture by Accident

Problem is...

An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.”

— Frederick P. Brooks, Jr. The Mythical Man-Month

Page 16: Architecture by Accident

Databases

Page 17: Architecture by Accident

Databases  

• Not an off-the-shelf architectural duct tape
• Not only relational, other paradigms
• Usually the last place sought for optimization
• Usually the first place to accommodate last minute changes
• Good ideas to try out: Sharding and Denormalization
• Some of your problems may require something other than a Relational Database

Page 18: Architecture by Accident
Page 19: Architecture by Accident

Relevant RDBMS Anti-Patterns

– Dynamic table creation
– Table as cache
– Table as queue
– Table as log file
– Distributed Global Locking
– Stoned Procedures
– Row Alignment
– Extreme JOINs
– Your ORM issues full queries for dataset iterations
– Throttle Control

Page 20: Architecture by Accident

Dynamic table creation

Problem: To avoid huge tables, a "dynamic schema" is created. For example, let's think about a document management company which is adding new facilities around the country. For each storage facility, a new table is created:

item_id  row  column  stuff
1        10   20      cat food
2        12   32      trout

Side Effect: "dynamic queries", which will probably query a "central storage" table and issue a huge join to check whether you have enough cat food across the country. This is different from Sharding.

Alternative:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
- Sharding properly
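The "facility as a document" alternative can be sketched as below. This is only an illustration using plain Python dicts as a stand-in for a document or key/value store; the facility names and the `total_stock` helper are made up for the example.

```python
from collections import Counter

# Each storage facility is one document instead of one table.
# Facility names and contents are hypothetical.
facilities = {
    "facility:sao_paulo": [
        {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food"},
        {"item_id": 2, "row": 12, "column": 32, "stuff": "trout"},
    ],
    "facility:rio": [
        {"item_id": 1, "row": 3, "column": 7, "stuff": "cat food"},
    ],
}


def total_stock(docs):
    """Count items per kind across every facility, with no dynamic JOIN."""
    counts = Counter()
    for items in docs.values():
        for item in items:
            counts[item["stuff"]] += 1
    return counts


stock = total_stock(facilities)
print(stock["cat food"])
```

Adding a facility here means adding one more key, not creating one more table.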

Page 21: Architecture by Accident

Table as cache

Problem: Complex queries demand that a result be stored in a separate table, so it can be queried quickly. Worse than views.

Alternative:
- Really?
- Memcached
- Redis + AOF + EXPIRE
- Denormalization
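What an expiring cache buys you over a cache table can be sketched in a few lines. This is an in-memory stand-in, not Memcached or Redis; the `ExpiringCache` class, the key name and the injectable clock are assumptions made so the example runs on its own.

```python
import time


class ExpiringCache:
    """In-memory stand-in for a cache with per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries clean themselves up
            return None
        return value


# A fake clock makes the expiration visible without sleeping.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("report:monthly", [("cat food", 2)], ttl=60)
fresh = cache.get("report:monthly")   # still cached
now[0] = 61.0
stale = cache.get("report:monthly")   # expired, so the query must run again
```

Nobody has to purge old rows: a key that is past its TTL simply stops being returned.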

Page 22: Architecture by Accident

Table as queue

Problem: A table which holds messages waiting to be processed. Worse, they must be sorted by date.

Alternative:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
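The LPUSH + RPOP pattern gives first-in, first-out ordering without any "sort by date". A rough sketch, using a `deque` as a stand-in for a Redis LIST (the `ListQueue` class is hypothetical and only mirrors the two command names):

```python
from collections import deque


class ListQueue:
    """Stand-in for a Redis LIST used as a queue: push left, pop right."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        self._items.appendleft(message)

    def rpop(self):
        # Returns None when the list is empty.
        return self._items.pop() if self._items else None


jobs = ListQueue()
for msg in ("first", "second", "third"):
    jobs.lpush(msg)

# Messages come out in the order they went in; the fourth pop finds nothing.
received = [jobs.rpop() for _ in range(4)]
```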

Page 23: Architecture by Accident

Table as log file

Problem: A table to which data gets written as if it were a log file. From time to time it needs to be purged. Truncating this table once a day is usually the first task assigned to trainee DBAs.

Alternative:
- MongoDB capped collection
- Redis, and an RRD pattern
- Riak
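The idea behind a capped collection or an RRD-style log is a fixed-size store that overwrites its oldest entries. A tiny sketch with a bounded `deque` standing in for the real thing (the size of 3 and the event names are arbitrary):

```python
from collections import deque

# A bounded log: once full, every append drops the oldest entry.
log = deque(maxlen=3)

for n in range(1, 6):
    log.append(f"event {n}")

# Only the three most recent events survive; no daily TRUNCATE needed.
recent = list(log)
```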

Page 24: Architecture by Accident

Distributed Global Locking

Problem: Someone learns Java and its synchronized keyword. A bit later, some genius decides that a distributed synchronized would be awesome. The proper place to do that would be the database, of course. You start with a reference counter in a table and end up with this:

> select COALESCE(GET_LOCK('my_lock',0 ),0 )

Plain and simple. You might find it embedded in a magic class called DistributedSynchronize or ClusterSemaphore. Locks, transactions and reference counters (which may act as soft locks) don't belong in the database.

Page 25: Architecture by Accident

Stoned procedures

Problem: Stored procedures hold most of your application's logic. Also, some triggers are used to - well - trigger important data events. SPs and triggers have the magic property of vanishing from our minds instantly, and of being impossible to keep versioned.

Alternative:
- Be careful not to use map/reduce as stoned procedures.
- Use your preferred language for business logic, and leave event handling to pub/sub or message queues.

Page 26: Architecture by Accident

Row Alignment

Problem: Extra columns are created but not used, just in case. Usually they are named a1, a2, a3, a4 and called padding. There's goodwill behind that, especially when version 1 of the software needed an extra column in a 150M-row database and it took 2 days to run an ALTER TABLE.

Alternative:
- Document-based databases such as MongoDB and CouchDB, where new attributes are local to the document. Also, having no schema helps
- Column-based databases may not be the best choice if column creation needs restarts/migrations

Page 27: Architecture by Accident

Extreme JOINs

Problem: Business rules modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs.

Alternative:
- Document storage, such as MongoDB
- Denormalization
- Serialized objects

Page 28: Architecture by Accident

Your ORM ...

Problem: Your ORM issues full queries for dataset iterations, your ORM maps and creates tables which mimic your classes, even the inheritance, and the performance is bad because the queries are huge, etc, etc.

Alternative: Apart from denormalization and good old common sense, ORMs are trying to bridge two things with distinct impedance. There is nothing in relational models which maps cleanly to classes and objects. Not even the basic unit, which is the domain (set) of each column. Black magic?

Page 29: Architecture by Accident

Throttle Control

Problem: A request tracker used to create a throttle control by IP address, login, operation or any other event, built on a relational database. Whether it is done with an update … select or with a lock/transaction block, a relational database is not the best place to do that.

Alternative: use memcached, redis or any other DHT which has expiration, by creating a key such as THROTTLE:<IP>:YYYYMMDDHH and incrementing it. At first glance it sounds the same, but the expiration will take care of cleaning up old entries. Also, search time is the same as looking up a key.
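The hourly-key idea can be sketched as follows. This is an in-memory stand-in rather than memcached or Redis; the `Throttle` class, its `limit`/`ttl` parameters and the injectable clock are assumptions made for the example.

```python
import time
from datetime import datetime, timezone


class Throttle:
    """Per-IP, per-hour counter with expiring keys (in-memory stand-in)."""

    def __init__(self, limit, ttl=3600, clock=time.time):
        self.limit = limit
        self.ttl = ttl
        self._clock = clock
        self._counters = {}  # key -> (count, expires_at)

    def _key(self, ip, now):
        stamp = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y%m%d%H")
        return f"THROTTLE:{ip}:{stamp}"

    def allow(self, ip):
        now = self._clock()
        # Expiration does the cleanup that a table would need a purge job for.
        self._counters = {k: v for k, v in self._counters.items() if v[1] > now}
        key = self._key(ip, now)
        count, expires_at = self._counters.get(key, (0, now + self.ttl))
        count += 1
        self._counters[key] = (count, expires_at)
        return count <= self.limit


now = [0.0]
throttle = Throttle(limit=2, clock=lambda: now[0])
same_hour = [throttle.allow("10.0.0.1") for _ in range(3)]
now[0] = 3600.0  # one hour later: new key, old one has expired
next_hour = throttle.allow("10.0.0.1")
```

With a real key/value store the increment and the expiration would be single commands on the server; the dict rebuild here only imitates that behaviour.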

Page 30: Architecture by Accident

No silver bullet

- Consider alternatives
- Think outside the norm
- Denormalize
- Simplify

Page 31: Architecture by Accident

Cycle of changes - Product A

1. There was the database model.
2. Then, the cache was needed. Performance was no good.
3. Cache key: query, value: resultset.
4. High or nonexistent expiration time [w00t]

(Now there's a turning point. Data didn't need to change often. Denormalization was a given with cache.)

5. The cache needs to be warmed or the app won't work.
6. Key/Value storage was a natural choice. No data on MySQL anymore.

Page 32: Architecture by Accident

Cycle of changes - Product B

1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter update.
5. Data model was simplified, the entire crawled doc was stored.

Page 33: Architecture by Accident

Stuff to think about

Consider whether the data you use isn't already denormalized (cached).

Most of the anti-patterns contain signs that a non-relational route (or at least a partial route) may help.

Are you dependent on cache? Does your application fail when there is no cache? Does it just slow down?

Are you ready to think more about your data?

Think about the way you put data into the database and get it back (be it SQL or NoSQL).

Page 34: Architecture by Accident

Extra - MongoDB and Redis

The next two slides are here to show what it is like to use MongoDB and Redis for the same task.

There is more to managing your data than stuffing it inside a database. You gotta plan ahead for searches and migrations.

This example is about storing books and searching among them. MongoDB makes it simpler: you just use its query language. Redis requires that you keep track of tags and ids and use SET operations to recover the books you want.

Check http://rediscookbook.org and http://cookbook.mongodb.org/ for recipes on data handling.

Page 35: Architecture by Accident

MongoDB/Redis recap - Books

MongoDB

{
  'id': 1,
  'title': 'Dive Into Python',
  'author': 'Mark Pilgrim',
  'tags': ['python', 'programming', 'computing']
}
{
  'id': 2,
  'title': 'Programming Erlang',
  'author': 'Joe Armstrong',
  'tags': ['erlang', 'programming', 'computing', 'distributedcomputing', 'FP']
}
{
  'id': 3,
  'title': 'Programming in Haskell',
  'author': 'Graham Hutton',
  'tags': ['haskell', 'programming', 'computing', 'FP']
}

Redis

SET book:1 "{'title': 'Dive Into Python', 'author': 'Mark Pilgrim'}"
SET book:2 "{'title': 'Programming Erlang', 'author': 'Joe Armstrong'}"
SET book:3 "{'title': 'Programming in Haskell', 'author': 'Graham Hutton'}"

SADD tag:python 1
SADD tag:erlang 2
SADD tag:haskell 3
SADD tag:programming 1 2 3
SADD tag:computing 1 2 3
SADD tag:distributedcomputing 2
SADD tag:FP 2 3

Page 36: Architecture by Accident

MongoDB/Redis recap - Books

MongoDB

Search tags for erlang or haskell:

db.books.find({"tags": { $in: ['erlang', 'haskell'] } })

Search tags for erlang AND haskell (no results):

db.books.find({"tags": { $all: ['erlang', 'haskell'] } })

This search yields 3 results:

db.books.find({"tags": { $all: ['programming', 'computing'] } })

Redis

SINTER 'tag:erlang' 'tag:haskell'
0 results

SINTER 'tag:programming' 'tag:computing'
3 results: 1, 2, 3

SUNION 'tag:erlang' 'tag:haskell'
2 results: 2 and 3

SDIFF 'tag:programming' 'tag:haskell'
2 results: 1 and 2 (haskell is excluded)
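Both sets of searches boil down to set algebra over a tag index, which can be checked with Python's built-in sets. This sketch touches neither MongoDB nor Redis; the `find_in` and `find_all` helpers are hypothetical names that only imitate what $in and $all select in this example.

```python
# Tag index equivalent to the SADD commands: tag -> set of book ids.
tags = {
    "python": {1},
    "erlang": {2},
    "haskell": {3},
    "programming": {1, 2, 3},
    "computing": {1, 2, 3},
    "distributedcomputing": {2},
    "FP": {2, 3},
}

both = tags["erlang"] & tags["haskell"]              # like SINTER
either = tags["erlang"] | tags["haskell"]            # like SUNION
not_haskell = tags["programming"] - tags["haskell"]  # like SDIFF


def find_in(index, wanted):
    """Ids of books carrying at least one of the wanted tags."""
    return set().union(*(index[t] for t in wanted))


def find_all(index, wanted):
    """Ids of books carrying every wanted tag (wanted must not be empty)."""
    return set.intersection(*(index[t] for t in wanted))
```

The trade-off is visible here: with the Redis approach the application maintains this index itself, while MongoDB derives the answer from the documents.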

Page 37: Architecture by Accident

Message Queues

Page 38: Architecture by Accident

Decoupling db writes with Message Queues

Page 39: Architecture by Accident

Coupled comment

Page 40: Architecture by Accident

Uncoupled comment - producer

Page 41: Architecture by Accident

Uncoupled comment - consumer
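The code from the three slides above did not survive in this transcript. As a rough idea of the producer/consumer split, here is a sketch using Python's standard `queue` module as a stand-in for a message broker; `post_comment`, `consume` and the `database` list are made-up names, not the code from the talk.

```python
import queue
import threading

comments_queue = queue.Queue()
database = []  # stand-in for the comments table


def post_comment(author, text):
    """Producer: enqueue and return immediately; no database write here."""
    comments_queue.put({"author": author, "text": text})


def consume():
    """Consumer: take messages off the queue and do the slow write."""
    while True:
        comment = comments_queue.get()
        if comment is None:  # sentinel: stop the worker
            break
        database.append(comment)


worker = threading.Thread(target=consume)
worker.start()
post_comment("alice", "first!")
post_comment("bob", "nice talk")
comments_queue.put(None)
worker.join()
```

The request handler only pays for the enqueue; the write can be retried, batched or slowed down without the user noticing.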

Page 42: Architecture by Accident

Async HTML Scraper

[Diagram: four "Fetch Page / 1st parse" workers, Consumer, Message Queue]

Page 43: Architecture by Accident

M/R

[Diagram: four "Fetch Data / Map(Fun)" workers, Reduce, Message Queue]

Page 44: Architecture by Accident

M/R – Wordcount (Map)

Page 45: Architecture by Accident

M/R – Wordcount (Reduce)
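The wordcount code from these two slides is missing from the transcript. A minimal sketch of the two steps, run in one process and without the message queue that sits between the workers in the talk (function names and sample documents are made up):

```python
from collections import Counter
from functools import reduce


def map_wordcount(text):
    """Map step: count the words of one document."""
    return Counter(text.lower().split())


def reduce_wordcount(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


documents = ["the cat and the trout", "the cat food"]
partials = [map_wordcount(doc) for doc in documents]
totals = reduce(reduce_wordcount, partials, Counter())
```

In the queued version each map worker would publish its partial count as a message and the reducer would merge them as they arrive.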

Page 46: Architecture by Accident
Page 47: Architecture by Accident

Cache

Page 48: Architecture by Accident

Cache

Page 49: Architecture by Accident

HTML processing - no cache

http://github.com/gleicon/vuvuzelr/proxy_no_cache.rb

Page 50: Architecture by Accident

HTML processing - Cached

http://github.com/gleicon/vuvuzelr/proxy.rb
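The difference between the two versions is the classic cache-aside pattern. The sketch below is not the vuvuzelr code; `process_html`, `cached_process` and the dict used as a cache are placeholders to show the shape of it.

```python
calls = []  # records every time the expensive step actually runs


def process_html(url):
    """Stand-in for the expensive fetch-and-process step."""
    calls.append(url)
    return f"<html>processed {url}</html>"


cache = {}


def cached_process(url):
    """Cache-aside: serve from the cache, fall back to processing on a miss."""
    if url not in cache:
        cache[url] = process_html(url)
    return cache[url]


first = cached_process("http://example.com/")
second = cached_process("http://example.com/")  # served from the cache
```

Note that the app still works with an empty cache, it is just slower, which is the property the "Stuff to think about" slide asks about.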

Page 51: Architecture by Accident
Page 52: Architecture by Accident

Conclusion

Page 53: Architecture by Accident

Thanks

• @gleicon
• http://www.7co.cc
• http://github.com/gleicon
• [email protected]