Real World Cassandra


Post on 04-Jul-2015






  • 1. Cassandra in Online Advertising: Real Time Bidding
      the prospect engine for brands.

  • 2. Who are we?
      Costa Sevdinoglou & Edward Capriolo

  • 3. Impressions look like

  • 4. A High Level look at RTB
      1. Browsers visit Publishers and create impressions.
      2. Publishers sell impressions via Exchanges.
      3. Exchanges serve as auction houses for the impressions.
      4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.

  • 5. Performance and Data
      • Billions and billions of bid requests a day
      • A single request can result in multiple Cassandra operations!
      • One cluster is just under 10 TB and growing
      • Low latency requirement: below 120 ms is typical
      • Limited data available to m6d via the exchange

  • 6. Segment Data
      Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand will be placed in the same segment.
      Segment Data is just one component of our overarching data model.
      Segments help to reduce the number of calculations we do in real time.

  • 7. Old Approach for Segment Data
      Diagram: Event Logs, Aggregation, Hadoop, MySQL Data Push, Application Nodes (Tomcat + MySQL)
      Limitations
      • Periodically updated.
      • Only a subsection of the data.
      • Cluster performance is affected during a data push.

  • 8. Cassandra Approach for Segment Data
      Diagram: Application Nodes (Tomcat + Less MySQL Usage), Cassandra
      Better!
      • Updating in real time now possible
      • Distributed, not duplicated
      • Less complexity to manage
      • Storing more information
      • We can now bid on users sooner!

  • 9. One Ring to rule them all

  • 10. Peer to Peer
      • Per-operation replication
      • Fail fast, self-healing
      • Each write goes to all natural endpoints
      • Hinted handoff if destination is down
      • Repair on Read
      • No more: STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;

  • 11. Multi Data Center
      • No designing and managing complex replication topologies
          create keyspace world
            with placement_strategy = org.apache.cassandra.locator.NetworkTopologyStrategy
            and strategy_options={1:3, 2:3, 3:3};
      • The same process as single data center
      • No log shipping, or separate processes to run

  • 12. Monitoring & Management
      • Many, many things to monitor with JMX
      • Nice command line tools
      • Most values can be tweaked at run time

  • 13. Capacity Planning
      • How many rows and columns
      • Size of average column
      • Latency requirements
      • Throughput: reads and writes per second

  • 14. Unit Tests FTW!

  • 15. Max 2 billion columns per row
      • Awesome
      • Unless you accidentally write 2 billion columns to a row key named null
      • Check maxRowSize in JMX
      • Watch logs for messages about compacting large rows

  • 16. Local (NYC) Meetups
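
The capacity-planning inputs listed on slide 13 (rows, columns, average column size, read and write throughput) can be combined into a rough back-of-envelope estimate. The sketch below is illustrative only and is not from the deck: the function names, the 15-byte per-column overhead, and the default of 3 replicas are assumptions, and the model ignores compaction headroom, indexes, and read-time consistency levels.

```python
def estimate_raw_bytes(rows, columns_per_row, avg_column_bytes,
                       per_column_overhead_bytes=15, replicas=3):
    """Rough on-disk size across the cluster, in bytes.

    per_column_overhead_bytes and replicas are assumed defaults,
    not figures taken from the slides.
    """
    bytes_per_row = columns_per_row * (avg_column_bytes + per_column_overhead_bytes)
    return rows * bytes_per_row * replicas


def ops_per_node(reads_per_sec, writes_per_sec, nodes, replicas=3):
    """Average operations per second each node handles.

    Assumes every write is applied on all replicas and every read
    touches a single replica, with load spread evenly over nodes.
    """
    return (reads_per_sec + writes_per_sec * replicas) / nodes


if __name__ == "__main__":
    # Hypothetical workload, chosen only to show the arithmetic.
    size = estimate_raw_bytes(rows=1_000_000, columns_per_row=10, avg_column_bytes=85)
    load = ops_per_node(reads_per_sec=1000, writes_per_sec=500, nodes=10)
    print(f"{size / 1e9:.1f} GB raw, {load:.0f} ops/s per node")
```

An estimate like this is only a starting point; the latency requirement from the same slide still has to be checked by measuring a real cluster under load.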
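
Slides 14 and 15 together suggest a simple defence against the "row key named null" accident: validate keys before writing, and cover the check with a unit test. The following is a minimal sketch under stated assumptions, not the presenters' code. `validate_row_key` and `safe_insert` are hypothetical helpers, and the store is a plain dictionary standing in for a Cassandra client.

```python
def validate_row_key(key):
    """Return the key unchanged, or raise ValueError for keys that
    usually indicate a missing value upstream (None, empty, "null")."""
    if key is None:
        raise ValueError("row key is None")
    text = str(key).strip()
    if text == "" or text.lower() == "null":
        raise ValueError(f"suspicious row key: {key!r}")
    return key


def safe_insert(store, key, column, value):
    """Write one column into an in-memory stand-in for a column family.

    `store` is a dict of dicts (hypothetical); a real client call
    would replace the dictionary update below.
    """
    validate_row_key(key)
    store.setdefault(key, {})[column] = value


if __name__ == "__main__":
    store = {}
    safe_insert(store, "user123", "segment:42", "1")
    try:
        # String concatenation of a missing id often yields the text "null".
        safe_insert(store, "null", "segment:42", "1")
    except ValueError as err:
        print("rejected:", err)
    print(store)
```

A guard like this does not replace the monitoring advice on slide 15; it only stops one common way of producing an oversized row.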

