Acunu Analytics: Simpler Real-Time Cassandra Apps

Tim Moreton, CTO, @timmoreton
Monday, 29 April 13

Upload: acunu

Post on 20-Jan-2015

DESCRIPTION

Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/ Cassandra's got some properties which make it an ideal fit for building real-time analytics applications -- but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, talks about how and why they built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.

TRANSCRIPT

Page 1: Acunu Analytics: Simpler Real-Time Cassandra Apps

Acunu Analytics: Simpler Real-Time Cassandra Apps

Tim Moreton, CTO, @timmoreton

Monday, 29 April 13

Page 2: Acunu Analytics: Simpler Real-Time Cassandra Apps

WE C*

• Scalable. No single point of {failure, bottleneck}
• Fast. Especially for writes
• Available. Effortless multi-DC support
• Maturing fast. Lots of production deployments

Page 3: Acunu Analytics: Simpler Real-Time Cassandra Apps

WE C*

• Virtual nodes
• CQL support

Page 4: Acunu Analytics: Simpler Real-Time Cassandra Apps

WE C*

Challenges remain, of course.

• Spartan queries
• Thrift (and CQL, a bit)
• Denormalization hurts agility
• Weak update semantics

Page 7: Acunu Analytics: Simpler Real-Time Cassandra Apps

C*: Two uses

Session storage

• Many more reads than writes
• Updates to existing records (ideally, transactionally)
• Probably fits in RAM: distribute for availability

Real-time analytics

• Many more writes than reads
• Almost all reads are to results
• Almost no writes are ‘updates’
• Distribute for availability, performance, capacity

[Slide graphic: a stream of access-log lines, each reading "02:44:02 241.24.41 0.0.1 GET /index.html"]

Page 9: Acunu Analytics: Simpler Real-Time Cassandra Apps

C* on

Supplement Cassandra with:

• Rich, SQL-like queries
• RESTful HTTP APIs, JSON-based
• Automated denormalization
• Update semantics (less critical for analytics)

Page 13: Acunu Analytics: Simpler Real-Time Cassandra Apps

Analytics: Two patterns

Exploratory Analytics (?)

• Unstructured Warehouses
• Data Mining
• Machine Learning
• Complex analysis, data variety
• Query richness

Operational Intelligence (!)

• Dashboards
• Real-time Decisions
• Alerting
• Data freshness, response time
• Query speed

Page 15: Acunu Analytics: Simpler Real-Time Cassandra Apps

[Architecture diagram. Labels: event stream, API, ingest processing, event store, roll-up cubes, dashboard queries, programmatic interface]

Page 16: Acunu Analytics: Simpler Real-Time Cassandra Apps

Who uses Acunu?

• Location Data
• Web and Visitor
• Market/Tick Data
• Infrastructure
• Sensor Data
• Social Media
• Social Gaming
• Smart Grid
• Production Line

Page 20: Acunu Analytics: Simpler Real-Time Cassandra Apps

[Architecture diagram. Labels: event stream, API, ingest processing, event store, roll-up cubes, dashboard queries, programmatic interface]

• Cassandra stores raw events and intermediate aggregates
• Acunu Analytics is a Cassandra client, mapping new events, queries and schema changes to aggregate reads and writes
• Acunu Dashboards provides embeddable, custom data visualization using the HTTP API

Page 21: Acunu Analytics: Simpler Real-Time Cassandra Apps

(Loosely) Define a schema

CREATE TABLE APICalls (
  time TIME('PST', HOUR, MIN, SEC),
  path PATH(/),
  useragent STRING,
  latitude DOUBLE(0.1, 0.01),
  longitude DOUBLE(0.1, 0.01)
);

CREATE CUBE SELECT COUNT, AVG(respTime) FROM APICalls WHERE time, path GROUP BY time, path;

CREATE CUBE SELECT COUNT FROM APICalls WHERE latitude, longitude GROUP BY latitude, longitude;

• Tables have an HTTP endpoint and map to a set of ColumnFamilys
• Dimensions map keys in events and allow hierarchical aggregation
• Cubes define the dimensions and aggregates to maintain
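A rough sketch of what "hierarchical aggregation" over a dimension could mean: each event value is expanded into every level that contains it. This assumes that TIME(..., HOUR, MIN, SEC) names the bucket granularities and that PATH(/) names the separator; that reading, the helper names time_levels and path_levels, and the sample event are all illustrative and not taken from Acunu's implementation. Time zone handling is left out.

```python
from datetime import datetime


def time_levels(ts: datetime) -> list[str]:
    """Return the hour, minute, and second buckets that contain ts."""
    return [
        ts.strftime("%Y-%m-%d %H:00:00"),
        ts.strftime("%Y-%m-%d %H:%M:00"),
        ts.strftime("%Y-%m-%d %H:%M:%S"),
    ]


def path_levels(path: str, sep: str = "/") -> list[str]:
    """Return every ancestor prefix of path, from the root down to path itself."""
    parts = [p for p in path.split(sep) if p]
    return [sep] + [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical event; the field names follow the APICalls table.
event = {"time": datetime(2013, 4, 29, 22, 2, 17), "path": "/usr/lib/x"}
print(time_levels(event["time"]))
print(path_levels(event["path"]))
```

Under this reading, one event contributes to one aggregate per level, which is why a hierarchical dimension multiplies the number of writes.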

Page 24: Acunu Analytics: Simpler Real-Time Cassandra Apps

Aggregation

CREATE CUBE SELECT SUM(a) FROM t WHERE x, y GROUP BY g, h, i;

[Diagram: a cube with axes x, y and (g, h, i). The cell selected by the new event holds the value v. New event {A: v’, X: x, Y: y, Z: z}: apply SUM(v, v’) on this cell]

• Hierarchical dimensions cause multiple writes per event (that’s OK: Cassandra’s good at writes)
• Most aggregates result in atomic counter increments
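The update step for a cube like the one above can be sketched in a few lines. This is an in-memory stand-in, not Acunu's code: a nested dictionary plays the part of a Cassandra row per (x, y) with one counter column per (g, h, i), and the event field names and values are made up for illustration.

```python
from collections import defaultdict

# cube[(x, y)][(g, h, i)] holds the running SUM(a).
# Assumption: one row per WHERE tuple, one column per GROUP BY tuple.
cube = defaultdict(lambda: defaultdict(int))


def apply_event(event: dict) -> None:
    """Fold one event into the cell it selects: v becomes SUM(v, v')."""
    row_key = (event["x"], event["y"])
    column_key = (event["g"], event["h"], event["i"])
    cube[row_key][column_key] += event["a"]


# Hypothetical events; two land in the same cell, one in a different row.
apply_event({"a": 5, "x": "x1", "y": "y1", "g": "g1", "h": "h1", "i": "i1"})
apply_event({"a": 7, "x": "x1", "y": "y1", "g": "g1", "h": "h1", "i": "i1"})
apply_event({"a": 2, "x": "x1", "y": "y2", "g": "g1", "h": "h1", "i": "i1"})

print(cube[("x1", "y1")][("g1", "h1", "i1")])
```

Because the update is a pure increment, it needs no read before the write, which fits the counter-increment point on the slide.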

Page 26: Acunu Analytics: Simpler Real-Time Cassandra Apps

Queries

SELECT SUM(a) FROM t WHERE x = .. and y = .. GROUP BY g, h, i;

• WHEREs map to a Cassandra row and GROUP BY to a compound column key in that row (very roughly)

New query:

• Locate slice that matches WHERE
• Return all mappings from GROUP BY tuples to cell values

[Diagram: the cube with axes x, y and (g, h, i); the slice at the queried x and y holds the cell values v]
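A minimal sketch of the read path described above, assuming the same "row per WHERE tuple, column per GROUP BY tuple" layout. The dictionary stands in for Cassandra, and the function name query and the sample values are hypothetical.

```python
# One dict per (x, y) row; columns are keyed by the compound (g, h, i) tuple.
cube = {
    ("x1", "y1"): {("g1", "h1", "i1"): 12, ("g1", "h2", "i1"): 3},
    ("x1", "y2"): {("g1", "h1", "i1"): 2},
}


def query(x, y):
    """Locate the row matching WHERE x, y and return its GROUP BY tuple -> value mappings."""
    row = cube.get((x, y), {})
    # Sort by column key to mimic reading a slice of ordered columns.
    return dict(sorted(row.items()))


print(query("x1", "y1"))
```

The query does no aggregation of its own; it reads back cells that were maintained at write time.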

Page 27: Acunu Analytics: Simpler Real-Time Cassandra Apps

A concrete example

21:00       all→1345    :00→45      :01→62      :02→87     ...
22:00       all→3221    :00→22      :01→19      :02→104    ...
...         ...
UK          all→228     user01→1    user14→12   user99→7   ...
US          all→354     user01→4    user04→8    user56→17  ...
...
UK, 22:00   all→1904    ...
∅           all→87314   UK→238      US→354      ...

Page 30: Acunu Analytics: Simpler Real-Time Cassandra Apps

A concrete example

Each event updates multiple aggregates:

{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}

21:00       all→1345    :00→45      :01→62      :02→87     ...
22:00       all→3222    :00→22      :01→19      :02→105    ...
...         ...
UK          all→229     user01→2    user14→12   user99→7   ...
US          all→354     user01→4    user04→8    user56→17  ...
...
UK, 22:00   all→1905    ...
∅           all→87315   UK→239      US→354      ...

WHERE time IN (22:00, 23:00) GROUP BY minute

WHERE geography=US GROUP BY user
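The fan-out from one event to several counters can be replayed in a short sketch. The starting values are the pre-event counts from the slides for the rows this event touches; the row and column naming, the EMPTY constant, and apply_event are illustrative choices, and a dictionary of counters stands in for Cassandra.

```python
from collections import defaultdict

EMPTY = "\u2205"  # the empty-set row key shown on the slide

# counters[row][column] -> count
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":02": 104})
counters["UK"].update({"all": 228, "user01": 1})
counters["UK, 22:00"].update({"all": 1904})
counters[EMPTY].update({"all": 87314, "UK": 238})


def apply_event(event):
    """Increment every (row, column) counter that this event contributes to."""
    hour = event["time"][:2] + ":00"   # "22:02" -> "22:00"
    minute = ":" + event["time"][3:]   # "22:02" -> ":02"
    geo, user = event["geography"], event["cust_id"]
    increments = [
        (hour, "all"), (hour, minute),
        (geo, "all"), (geo, user),
        (f"{geo}, {hour}", "all"),
        (EMPTY, "all"), (EMPTY, geo),
    ]
    for row, column in increments:
        counters[row][column] += 1
    return increments


touched = apply_event({
    "cust_id": "user01", "session_id": 102,
    "geography": "UK", "browser": "IE", "time": "22:02",
})
print(len(touched), "counters incremented")
```

One event turns into seven increments here, all of them blind writes with no prior read.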

Page 31: Acunu Analytics: Simpler Real-Time Cassandra Apps

Rich queries

Arithmetic expressions
SELECT SUM(x)/(MAX(y) - MIN(y) + 0.5) AS 'spread' FROM ...

Fast inner joins
SELECT a - b AS lbound, a + b AS ubound FROM (SELECT AVG(score) AS a FROM scores WHERE year = 2012) JOIN (SELECT STDDEV(score) AS b FROM scores) USING (school)

Time zone support
SELECT COUNT UNIQUE (visitors) GROUP BY time(DAY('US/Pacific'))

Hierarchical aggregation
SELECT SUM(size) FROM .. WHERE path MATCHES /usr/*

Drill down to raw events
SELECT DRILL FROM errors WHERE category IN ("warn", "error")

Limits
SELECT COUNT (items) FROM .. GROUP BY category LIMIT 3, country

Query-time filtering
... HAVING AVG(rating) < 2.0 AND COUNT >= 10
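Query-time filtering such as the HAVING clause above can be pictured as a pass over groups that have already been aggregated. The sketch below assumes that AVG is derived from a stored sum and count; that assumption, the sample categories, and the function name having_low_rating are illustrative only.

```python
# Hypothetical per-category aggregates, as a cube read might return them.
aggregates = {
    "books": {"count": 25, "rating_sum": 40.0},
    "games": {"count": 4, "rating_sum": 6.0},
    "music": {"count": 12, "rating_sum": 42.0},
}


def having_low_rating(groups, max_avg=2.0, min_count=10):
    """Keep groups whose AVG(rating) < max_avg and COUNT >= min_count."""
    kept = {}
    for key, agg in groups.items():
        avg = agg["rating_sum"] / agg["count"]
        if avg < max_avg and agg["count"] >= min_count:
            kept[key] = round(avg, 2)
    return kept


print(having_low_rating(aggregates))
```

The filter touches only the aggregated groups, so its cost depends on the number of groups rather than the number of raw events.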

Page 33: Acunu Analytics: Simpler Real-Time Cassandra Apps

Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.

Thank You.

Tim Moreton, CTO, @timmoreton