qingpeng zhang 0711

Introducing VenmoPlus.com - Explore your Venmo network! Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Post on 07-Apr-2017

TRANSCRIPT

Page 1: Qingpeng zhang 0711

Introducing VenmoPlus.com - Explore your Venmo network!

Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Page 2: Qingpeng zhang 0711
Page 3: Qingpeng zhang 0711

Features - VenmoPlus.com

● fuzzy searching of user names, with friend lists to help tell apart users with the same name
● labeling the relationship between the payer and receiver
● friend recommendation
● searching transactions in friend circle
● listing friends of the user

Page 4: Qingpeng zhang 0711


Page 5: Qingpeng zhang 0711

Demo: VenmoPlus.com

Page 6: Qingpeng zhang 0711

Challenge:

● Find the distance between nodes in a dynamic graph in real time

Page 7: Qingpeng zhang 0711

Solutions

● Two databases
  ○ Redis and Elasticsearch
● Algorithm design
  ○ BFS -> Bidirectional Search
  ○ Query relationship of a past transaction
● Query/search optimizations

Page 8: Qingpeng zhang 0711


Page 9: Qingpeng zhang 0711

Historical transactions

Real time transactions

A Tale of Two Databases

API

Page 10: Qingpeng zhang 0711

Redis for graph structure

420890 Graham Hadley

1630476 Leon Tang

810029 Harminder Toor

1371353 Ephraim Park

562884 Paul Min

420890 set(14935158, 562884)

1630476 set(1371353)

810029 set(190230,14935158)

1371353 set(810029,971156)

562884 set(196371,1371353)

35 million edges

6 million nodes
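The layout on this slide (one set of friend IDs per user ID) can be sketched in plain Python. This is only an in-memory stand-in, not the deck's actual Redis code: the `names` and `friends` dictionaries and the `friend_names` helper are hypothetical. In Redis the same shape would presumably be one set per user key, written with `SADD` and read with `SMEMBERS`. Whether each edge is stored in both directions is not stated on the slide, so the sample sets are copied as shown.

```python
# In-memory stand-in for the slide's Redis layout (hypothetical names).
# user id -> display name, copied from the slide
names = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}

# user id -> set of friend ids, copied from the slide
friends = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def friend_names(user_id):
    """Return the sorted names of a user's friends whose names are known."""
    ids = friends.get(user_id, set())
    return sorted(names[i] for i in ids if i in names)
```

For example, `friend_names(420890)` looks up the two friend IDs of Graham Hadley and returns only the one that appears in the name table.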

Page 11: Qingpeng zhang 0711

Elasticsearch for everything

Page 12: Qingpeng zhang 0711

Redis

Elasticsearch

Page 13: Qingpeng zhang 0711

Redis + Elasticsearch => search transactions in friend circle
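One way to read this slide: the friend list comes from the graph store, and it is then passed to Elasticsearch as a filter on the text search. The sketch below only builds the query body as a Python dictionary. The field names `message`, `payer_id`, and `receiver_id` and the function name are assumptions for illustration; the deck does not show its index mapping or its actual query.

```python
def friend_circle_query(friend_ids, text):
    """Build an Elasticsearch query body that matches `text` in the
    transaction message and keeps only transactions where the payer or
    the receiver is one of `friend_ids` (hypothetical field names)."""
    ids = sorted(friend_ids)  # a list is needed for JSON serialization
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"terms": {"payer_id": ids}},
                                {"terms": {"receiver_id": ids}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }
```

The friend IDs would come from the user's set in the graph store, and the resulting dictionary would be sent as the body of a search request.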

Page 14: Qingpeng zhang 0711

VenmoPlus.com

m4.xlarge

m4.large

m4.xlarge

m4.large

t2.micro

$29.11/day

Page 15: Qingpeng zhang 0711

Qingpeng “Q.P.” Zhang

● Postdoc
  ○ Lawrence Berkeley National Lab
● PhD in Computer Science
  ○ Michigan State University

What I learned from Insight:

● Thinking as a data engineer
● Open source tools
  ○ Redis, Elasticsearch, Kafka, Spark Streaming, Flask, AngularJS, etc.

Page 16: Qingpeng zhang 0711

Elasticsearch for everything

Page 17: Qingpeng zhang 0711

Breadth First Search -> Bidirectional Search

Shortest distance -> intersection of sets (friend lists)

● A’s 1st degree friends ∩ B’s 1st degree friends
● A’s 2nd degree friends ∩ B’s 1st degree friends

O(N^2) -> O(2*N)

O(N^3) -> O(N + N^2)
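The intersection idea can be sketched as follows. This is a minimal illustration, not the deck's implementation: it assumes the graph is a dictionary of friend sets with every edge stored in both directions, and the function name `distance_up_to_3` is hypothetical. Distances above 3 are reported as `None`.

```python
def distance_up_to_3(graph, a, b):
    """Return 0, 1, 2, or 3 as the distance between a and b, or None if
    it is greater than 3. `graph` maps a user id to a set of friend ids
    and is assumed to store every edge in both directions."""
    if a == b:
        return 0
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    # Distance 2: A and B share a 1st degree friend.
    if friends_a & friends_b:
        return 2
    # Distance 3: one of A's 2nd degree friends is a 1st degree friend of B.
    second_a = set()
    for f in friends_a:
        second_a |= graph.get(f, set())
    second_a.discard(a)
    if second_a & friends_b:
        return 3
    return None
```

On a chain 1-2-3-4-5, the function returns 1 for (1, 2), 2 for (1, 3), 3 for (1, 4), and `None` for (1, 5).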

Page 18: Qingpeng zhang 0711

Query relationship of a past transaction

Page 19: Qingpeng zhang 0711

Query relationship of a past transaction

Query the distance between vertices at a historical moment in a constantly changing graph (because we don’t pre-calculate the distance)

● If there are transactions before that one, distance = 1
● If the transaction is new: distance > 1
  ○ Remove the influence of that specific transaction temporarily
  ○ Check distance from graph (2, 3, or >3)
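The two cases above can be sketched like this. It is a rough illustration under stated assumptions: the graph is a dictionary of friend sets with edges stored in both directions, `earlier_count` (the number of earlier transactions between the pair) is supplied by the caller, and both function names are hypothetical. The result is 1, 2, 3, or `None` for "greater than 3".

```python
def _indirect_distance(graph, a, b):
    """Return 2 or 3 via friend-set intersections, or None if > 3.
    Assumes a and b are not directly connected in `graph`."""
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if friends_a & friends_b:
        return 2
    second_a = set()
    for f in friends_a:
        second_a |= graph.get(f, set())
    second_a.discard(a)
    return 3 if second_a & friends_b else None


def relationship_at_transaction(graph, payer, receiver, earlier_count):
    """Distance between payer and receiver just before a transaction."""
    if earlier_count > 0:
        return 1  # they had already transacted with each other
    # Temporarily remove the edge created by this transaction.
    had_pr = receiver in graph.get(payer, set())
    had_rp = payer in graph.get(receiver, set())
    if had_pr:
        graph[payer].discard(receiver)
    if had_rp:
        graph[receiver].discard(payer)
    try:
        return _indirect_distance(graph, payer, receiver)
    finally:
        # Restore the edge so the live graph is left unchanged.
        if had_pr:
            graph[payer].add(receiver)
        if had_rp:
            graph[receiver].add(payer)
```

The `try`/`finally` keeps the removal temporary even if the lookup raises, which matters when the same graph also serves live queries.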

Page 20: Qingpeng zhang 0711
Page 21: Qingpeng zhang 0711
Page 22: Qingpeng zhang 0711

Pipeline, raw data, in a distributed way

Page 23: Qingpeng zhang 0711

Query/Search Optimizations

1. Remove aggregation for better performance… (trade-off)
2. Friend recommender:
   a. Using Counter to get only 5 users with the most common friends
3. Search message in friend circle
   a. Combine query of Elasticsearch and Redis
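The friend recommender in item 2 can be sketched with `collections.Counter`. This assumes the graph is a dictionary of friend sets; the function name `recommend_friends` and the exclusion of existing friends are assumptions, since the slide only says that a Counter is used to keep the 5 users with the most common friends.

```python
from collections import Counter


def recommend_friends(graph, user, k=5):
    """Return up to k user ids that share the most friends with `user`,
    skipping the user and anyone who is already a direct friend."""
    direct = graph.get(user, set())
    common = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                common[candidate] += 1  # one more shared friend
    return [candidate for candidate, _ in common.most_common(k)]
```

`most_common(k)` returns only the top k entries, which matches the slide's point of fetching just 5 users.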

Page 24: Qingpeng zhang 0711

More optimization

● Only store necessary info in Elasticsearch
● Labeling the distance of historical transactions can be done in a batch job, reducing the number of real-time queries
● Adjust AWS instances to reduce cost

Page 25: Qingpeng zhang 0711

Historical transactions

Real time transactions

Pipeline

API