sfpug lightning talk

Retroactive Analytics With Distributed PostgreSQL Dan Robinson Heap

Upload: dan-robinson

Post on 23-Dec-2014


DESCRIPTION

SFPUG Lightning Talk

TRANSCRIPT

Page 1: SFPUG Lightning Talk

Retroactive Analytics With Distributed PostgreSQL

Dan Robinson, Heap

Page 2: SFPUG Lightning Talk

● Heap: web/iOS analytics that captures everything

Page 3: SFPUG Lightning Talk
Page 4: SFPUG Lightning Talk
Page 5: SFPUG Lightning Talk
Page 6: SFPUG Lightning Talk
Page 7: SFPUG Lightning Talk

● Heap: web/iOS analytics that captures everything

● Making this interactive is hard!

Page 8: SFPUG Lightning Talk

app_id  user_id  properties HSTORE                             events HSTORE[]
12345   102756   email=>'[email protected]', ab_test_grp=>'A'  ...
12345   300732   ab_test_grp=>'B'                              ...
67890   628537   ...
49964   368868   utm_campaign=>'social'                        ...

Page 9: SFPUG Lightning Talk

users, split into users_001, users_002, …

users_001

app_id  user_id  properties HSTORE                             events HSTORE[]
12345   102756   email=>'[email protected]', ab_test_grp=>'A'  ...
12345   300732   ab_test_grp=>'B'                              ...
67890   628537   ...
49964   368868   utm_campaign=>'social'                        ...

users_002

app_id  user_id  properties HSTORE  events HSTORE[]
75632   257186   ...                ...
75632   120554   ...                ...

……

app_id  user_id  properties HSTORE  events HSTORE[]

Page 10: SFPUG Lightning Talk

SELECT COUNT(*)
FROM users
WHERE app_id = 12345
GROUP BY events[1]->'path'

Page 11: SFPUG Lightning Talk

users

SELECT COUNT(*)
FROM users
WHERE app_id = 12345
GROUP BY events[1]->'path'

users_001

SELECT COUNT(*)
FROM users_001
WHERE app_id = 12345
GROUP BY events[1]->'path'
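
The rewrite on this slide, where the query against users becomes the same query against users_001, can be sketched roughly as below. This is only an illustration of the idea, not how CitusDB actually plans queries: the shard list, the helper names `rewrite_for_shard` and `merge_counts`, and the shape of the per-shard results are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical shard names; the slides show users_001, users_002, and more.
SHARDS = ["users_001", "users_002"]

QUERY = (
    "SELECT COUNT(*) FROM users "
    "WHERE app_id = 12345 "
    "GROUP BY events[1]->'path'"
)


def rewrite_for_shard(query, table, shard):
    """Replace the logical table name with one shard's table name."""
    return re.sub(r"\b%s\b" % re.escape(table), shard, query)


def merge_counts(partials):
    """Combine per-shard results by summing the counts for each group.

    Assumption: each shard returns a {group_key: count} mapping. The query
    on the slide selects only COUNT(*), so a real merge step would also
    need the group key from every shard.
    """
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# One query per shard, each identical except for the table name.
shard_queries = [rewrite_for_shard(QUERY, "users", s) for s in SHARDS]
```

Each shard can run its query independently, and only the small per-group counts have to be combined afterwards.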

Page 12: SFPUG Lightning Talk

● Denormalized → fast, no joins.

● Subqueries are just Postgres.

● Add UDFs for more expressiveness.

Page 13: SFPUG Lightning Talk

funnel_events(events hstore[], pattern_array text[]) RETURNS int[]
-- Returns an array with 1s corresponding to steps completed
-- in the funnel, 0s in the other positions

Page 14: SFPUG Lightning Talk

SELECT sum(
  funnel_events(
    events,
    ARRAY['"path"=>"/","object"=>"pageview"',
          '"type"=>"submit","hierarchy"=>like "%@form;#signup;%"']
  )
) AS "funnel_results"
FROM users
WHERE app_id = 12345
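
To make the funnel query concrete, here is a minimal sketch of what funnel_events and the surrounding sum might compute. It is not Heap's implementation. It assumes that funnel steps must be completed in order, that events and patterns are plain key/value mappings, and that a pattern matches only on exact equality (the `like` matching shown on the slide is not reproduced). The helper `sum_arrays` and the sample data are hypothetical.

```python
def funnel_events(events, patterns):
    """Return a list with 1 for each funnel step completed, 0 otherwise.

    Assumptions: steps must be completed in order, and a pattern matches an
    event when every key/value pair in the pattern equals the event's value.
    """
    result = [0] * len(patterns)
    step = 0
    for event in events:
        if step == len(patterns):
            break  # every step is already completed
        pattern = patterns[step]
        if all(event.get(key) == value for key, value in pattern.items()):
            result[step] = 1
            step += 1
    return result


def sum_arrays(arrays):
    """Element-wise sum, standing in for sum(...) over int[] in the query."""
    return [sum(column) for column in zip(*arrays)]


# Hypothetical two-step funnel: view the home page, then submit a form.
patterns = [{"path": "/", "object": "pageview"}, {"type": "submit"}]

# Hypothetical event histories for three users.
all_user_events = [
    [{"path": "/", "object": "pageview"}, {"type": "submit"}],
    [{"path": "/", "object": "pageview"}],
    [{"type": "submit"}],
]

funnel_results = sum_arrays(funnel_events(e, patterns) for e in all_user_events)
```

Under these assumptions, each position in `funnel_results` is the number of users who reached that step, which is the shape of result the query's "funnel_results" column suggests.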

Page 15: SFPUG Lightning Talk

● Denormalized schema. (No joins.)

● CitusDB to distribute queries.

● Express any analysis with UDFs.

Page 16: SFPUG Lightning Talk

Questions?