Transcript
Page 1: Real World Cassandra


the prospect engine for brands.

Cassandra in Online Advertising: Real Time Bidding

Page 2: Real World Cassandra

Who are we?

Costa Sevdinoglou & Edward Capriolo

Page 3: Real World Cassandra

Impressions look like…

Page 4: Real World Cassandra

A High Level look at RTB

1. Browsers visit Publishers and create impressions.

2. Publishers sell impressions via Exchanges.

3. Exchanges serve as auction houses for the impressions.

4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.

Page 5: Real World Cassandra

Performance and Data

• Billions and billions of bid requests a day

• A single request can result in multiple Cassandra Operations!

• One cluster is just under 10TB and growing

• Low latency requirement: typically below 120 ms

• Limited data available to m6d via the exchange

Page 6: Real World Cassandra

Segment Data

Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand will be placed in the same segment.

Segment Data is just one component of our overarching data model.

Segments help to reduce the number of calculations we do in real time.
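As a rough illustration of why segments cut down real-time work, the sketch below precomputes a user-to-segments mapping so that bid-time eligibility becomes a set lookup. The class, function, and identifier names are hypothetical, and the in-memory dictionary only stands in for whatever store actually holds the data.

```python
from collections import defaultdict


class SegmentStore:
    """Toy in-memory stand-in for a user -> segments lookup (hypothetical)."""

    def __init__(self):
        self._segments_by_user = defaultdict(set)

    def add(self, user_id, segment_id):
        # Segment membership is assigned ahead of time, not at bid time.
        self._segments_by_user[user_id].add(segment_id)

    def segments_for(self, user_id):
        # Unknown users simply have no segments.
        return frozenset(self._segments_by_user.get(user_id, ()))


def eligible_campaigns(store, user_id, campaign_segments):
    """Return campaigns whose target segment the user belongs to.

    campaign_segments maps campaign id -> target segment id.
    """
    user_segments = store.segments_for(user_id)
    return sorted(c for c, seg in campaign_segments.items() if seg in user_segments)
```

At bid time the only work left is one lookup and a few membership checks, for example `eligible_campaigns(store, "u1", {"c1": "shoes", "c2": "cars"})`.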

Page 7: Real World Cassandra

Old Approach for Segment Data

Limitations

• Periodically updated.

• Only a subsection of the data.

• Cluster performance is affected during a data push.

Application Nodes (Tomcat + MySQL )

Event Logs

Hadoop Aggregation

MySQL Data Push

Page 8: Real World Cassandra

Cassandra Approach for Segment Data

Better!

• Updating in real time now possible

• Distributed not duplicated

• Less complexity to manage

• Storing more information

• We can now bid on users sooner!

Application Nodes (Tomcat + Less MySQL Usage)

Cassandra

Page 9: Real World Cassandra

One Ring to rule them all

http://askyyy.blog.163.com/blog/static/1234575992010428819399/

Page 10: Real World Cassandra

Peer to Peer per operation replication

Fail fast, self-healing

Each write goes to all natural endpoints

Hinted handoff if destination is down

Repair on Read

No more: STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
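The write path on this slide can be pictured with a toy simulation: a write goes to every replica, and a hint is kept for any replica that is down and replayed once it returns. This is a simplified sketch, not Cassandra's implementation. It leaves out timestamps, consistency levels, and read repair, and all names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value stored on this node
        self.hints = []  # (target_name, key, value) held for down replicas


def write(coordinator, replicas, key, value):
    """Send the write to every replica; keep a hint for each one that is down."""
    acked = 0
    for node in replicas:
        if node.up:
            node.data[key] = value
            acked += 1
        else:
            coordinator.hints.append((node.name, key, value))
    return acked


def replay_hints(coordinator, nodes_by_name):
    """Deliver stored hints to replicas that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        node = nodes_by_name[target]
        if node.up:
            node.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining
```

No operator step is needed to bring the recovered replica back in line, which is the contrast with the manual MySQL slave commands above.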

Page 11: Real World Cassandra

Multi Data Center

No designing and managing complex replication topologies

create keyspace world
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options={1:3, 2:3, 3:3};

The same process as single data center

No log shipping, or separate processes to run
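To give a feel for what per-data-center replica counts mean, here is a simplified sketch of placement: walk the ring clockwise from the key's token and take nodes until each data center has its quota. It ignores racks and other details of the real NetworkTopologyStrategy, and the ring layout and node names are made up for illustration.

```python
import bisect


def place_replicas(ring, key_token, replicas_per_dc):
    """Pick replica nodes for a key.

    ring: list of (token, node_name, dc) tuples sorted by token.
    replicas_per_dc: dict of dc -> replica count, like strategy_options.
    """
    tokens = [token for token, _, _ in ring]
    # The first node whose token is >= the key's token owns the key; wrap around.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    needed = dict(replicas_per_dc)
    chosen = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if needed.get(dc, 0) > 0:
            chosen.append(node)
            needed[dc] -= 1
        if all(count == 0 for count in needed.values()):
            break
    return chosen
```

The same walk serves one data center or several. Only the `replicas_per_dc` mapping changes, which mirrors the point that the process is the same as for a single data center.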

Page 12: Real World Cassandra

Monitoring & Management

Many, many things to monitor with JMX

Nice command line tools

Most values can be tweaked at run time

Page 13: Real World Cassandra

Capacity Planning

How many rows and columns

Size of average column

Latency requirements

Throughput: reads and writes per second
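The sizing inputs above combine into a back-of-the-envelope estimate. The sketch below is only a first approximation: it ignores per-column storage overhead, indexes, and compaction headroom, and the function names and any numbers fed into it are illustrative assumptions.

```python
def estimate_cluster_bytes(rows, columns_per_row, avg_column_bytes, replication_factor):
    """Raw data size across the cluster, counting every replica."""
    return rows * columns_per_row * avg_column_bytes * replication_factor


def nodes_needed(total_bytes, usable_bytes_per_node):
    """Smallest node count whose combined usable space holds total_bytes."""
    return -(-total_bytes // usable_bytes_per_node)  # ceiling division
```

For example, 1,000,000 rows of 10 columns at 100 bytes each with 3 replicas come to 3,000,000,000 bytes, which needs 3 nodes at 1,000,000,000 usable bytes per node. Latency and throughput targets may push the node count higher than the storage figure alone.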

Page 14: Real World Cassandra

Unit Tests FTW!

Page 15: Real World Cassandra

Max 2 billion columns per row

Awesome

Unless you accidentally write 2 billion columns to a row key named “null”

Check maxRowSize JMX

Watch logs for messages about compacting large rows
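One way to catch the "null" row key mistake before it reaches the cluster is a small guard on the write path, covered by a unit test. This is a hypothetical helper, not part of any Cassandra client, and the set of rejected keys is an assumption to adjust for your own data.

```python
import unittest


def validate_row_key(key):
    """Reject keys that usually mean an upstream value was missing."""
    if key is None:
        raise ValueError("row key is None")
    text = str(key).strip()
    if text == "" or text.lower() == "null":
        raise ValueError(f"suspicious row key: {key!r}")
    return text


class ValidateRowKeyTest(unittest.TestCase):
    def test_rejects_null_like_keys(self):
        for bad in (None, "", "null", "NULL", "  "):
            with self.assertRaises(ValueError):
                validate_row_key(bad)

    def test_accepts_normal_key(self):
        self.assertEqual(validate_row_key("user-123"), "user-123")
```

Calling `validate_row_key` just before each insert turns a silently growing row into an immediate, visible error.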

Page 16: Real World Cassandra

Local (NYC) Meetups

www.meetup.com/NYC-Cassandra-User-Group/

