Machine Learning and Big Data at Foursquare

Blake Shaw, PhD, Data Scientist @ Foursquare (@metablake)

Upload: foursquarehq

Post on 16-Oct-2014


DESCRIPTION

At foursquare, we believe there is a huge opportunity to apply machine learning algorithms to the collective movement patterns of millions of people and build new services which help people better understand and connect with places.

Foursquare is now aware of 25 million places worldwide, each of which can be described by unique signals about who is coming to these places, when, and for how long. We employ a variety of machine learning algorithms at foursquare to distill these signals into useful data for our app and our platform.

In the slides below, we talk briefly about the data at foursquare and some interesting applications of machine learning. Enjoy!

TRANSCRIPT

Page 1: Machine Learning and Big Data at Foursquare

Machine Learning and Big Data at Foursquare

Blake Shaw, PhD

Data Scientist @ Foursquare

@metablake

Page 2: Machine Learning and Big Data at Foursquare

At foursquare, we think there is a great opportunity to leverage massive amounts of location data to help people better understand and connect to places

Page 3: Machine Learning and Big Data at Foursquare

What is foursquare?

An app that helps you explore your city and connect with friends

A platform for location-based services and data

So, what is foursquare? It’s an app that helps you explore your city and connect with friends.

It’s also a platform for people to build location-based services and to collect and share location data.

Page 4: Machine Learning and Big Data at Foursquare

What is foursquare?

People use foursquare to:

• check in to places
• discover new places
• share w/ friends
• get tips about places
• get deals
• earn points and badges
• keep track of visits

People on foursquare “check in” on their phones when they arrive at a place, to find out more about it, to tell friends that they are there, and so on.

Page 5: Machine Learning and Big Data at Foursquare

What is foursquare?

[Diagram labels: Mobile, Social, Local]

Foursquare is in a unique position, sitting at the intersection of mobile, social, and geo.

Page 6: Machine Learning and Big Data at Foursquare

Stats

10,000,000+ people

25,000,000+ places

1,000,000,000+ check-ins

10,000+ actions/second

Foursquare generates a large volume of data: every second, 35 people check in to a location.

This data offers an unprecedented view into the behavior of millions of people worldwide, as they move around cities.

Page 7: Machine Learning and Big Data at Foursquare

Growth

Here we see the growth of the service over the last two years, since it started in mid-2009.

Page 8: Machine Learning and Big Data at Foursquare

Growth

Page 9: Machine Learning and Big Data at Foursquare

Growth

Foursquare now has data on over 25 million places all over the world.

Page 10: Machine Learning and Big Data at Foursquare

Learning with location data

• Check-ins are a rich source of data that describe human behavior

• We apply machine learning algorithms to the collective movement patterns of millions of people to build exciting new services

Check-ins are a rich source of information describing human behavior.

We apply machine learning algorithms to the collective movement patterns of millions of people to build exciting new services.

We use a variety of ML algorithms, including collaborative filtering, PageRank, clustering, classification, and regression.

Page 11: Machine Learning and Big Data at Foursquare

Recommendation engine

foursquare explore provides realtime recommendations using:

• location
• time of day
• check-in history
• friends' preferences
• venue similarities

For example, last year we launched foursquare explore, a recommendation engine that uses a variety of signals to recommend places in real time that a user might be interested in.

Explore uses a variety of machine learning models to rank venues. We combine many signals, including:

• the location of the user and the time of day
• the person's past check-in history
• the places their friends check in
• the similarities between different venues
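To make the idea of combining signals concrete, here is a minimal sketch of a weighted ranking score. It is only an illustration: the talk does not describe the actual models behind Explore, and the `Candidate` fields, the `WEIGHTS` values, and the feature transforms below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-venue signals for one user at one moment."""
    name: str
    distance_km: float      # distance from the user's current location
    hour_popularity: float  # share of the venue's check-ins at this hour, 0..1
    user_visits: int        # the user's own past check-ins at the venue
    friend_visits: int      # check-ins at the venue by the user's friends
    similarity: float       # similarity to venues the user already visits, 0..1


# Hypothetical weights; a real system would learn these from data.
WEIGHTS = {
    "proximity": 2.0,
    "time": 1.0,
    "history": 0.5,
    "friends": 0.8,
    "similarity": 1.5,
}


def score(c: Candidate) -> float:
    """Combine the signals into one number; higher means more relevant."""
    features = {
        "proximity": math.exp(-c.distance_km),   # decays with distance
        "time": c.hour_popularity,
        "history": math.log1p(c.user_visits),    # dampen heavy repeat visits
        "friends": math.log1p(c.friend_visits),
        "similarity": c.similarity,
    }
    return sum(WEIGHTS[key] * value for key, value in features.items())


def rank(candidates):
    """Return candidates sorted from most to least relevant."""
    return sorted(candidates, key=score, reverse=True)
```

A linear combination is used here only because it is easy to read; the same feature dictionary could feed any classifier or regression model.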

Page 12: Machine Learning and Big Data at Foursquare

Signals about places

Consider these signals about places. Each place has a different signature based on who is coming to the place, when, and for how long.

This plot shows 3 different places:

Gorilla Coffee, Gray’s Papaya, Amorino (a restaurant)

Notice that Gorilla Coffee is busier in the morning, whereas Amorino is busier in the evening.

Gray’s Papaya clearly has a strong lunch crowd, but also a late night peak on the weekends.

How can we use machine learning to learn from these signals which places are similar?
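One simple way to compare such signatures, shown here only as a sketch and not as the method used at foursquare, is to turn each place's check-in times into a normalized hour-of-day histogram and compare the histograms with cosine similarity. The function names and the 24-bin choice are assumptions made for this example.

```python
import math
from collections import Counter


def signature(checkin_hours, bins=24):
    """Normalized histogram of check-in hours (0..bins-1) for one place."""
    counts = Counter(hour % bins for hour in checkin_hours)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * bins
    return [counts.get(hour, 0) / total for hour in range(bins)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up check-in hours for three hypothetical places.
morning_cafe = signature([7, 8, 8, 9])
other_cafe = signature([8, 8, 9, 10])
dinner_spot = signature([19, 20, 20, 21])

# Places with overlapping busy hours score higher than places with none.
print(cosine(morning_cafe, other_cafe) > cosine(morning_cafe, dinner_spot))
```

A fuller signature could use hour-of-week bins (168 instead of 24) to capture effects like a late-night weekend peak.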

Page 13: Machine Learning and Big Data at Foursquare

Networks of people

We also have unique signals that describe people: which people are friends, who is checking in together, and so on.

From check-ins we can build a large colocation network that can be used to better understand how people interact with each other in the real world.
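As a rough sketch of how a colocation network might be built, the code below links two people whenever they check in at the same venue within a short time window, and counts how often that happens. The `(user, venue, timestamp)` record layout and the one-hour window are assumptions for illustration; the talk does not specify how colocation is defined.

```python
from collections import defaultdict
from itertools import combinations


def colocation_graph(checkins, window_s=3600):
    """Build weighted colocation edges from (user, venue, unix_ts) records.

    Returns a dict mapping a sorted (user_a, user_b) pair to the number of
    times the two users checked in at the same venue within window_s seconds.
    """
    by_venue = defaultdict(list)
    for user, venue, ts in checkins:
        by_venue[venue].append((ts, user))

    weights = defaultdict(int)
    for visits in by_venue.values():
        visits.sort()  # order by timestamp so t2 >= t1 below
        # Quadratic in visits per venue; fine for a sketch, not for production.
        for (t1, u1), (t2, u2) in combinations(visits, 2):
            if u1 != u2 and t2 - t1 <= window_s:
                weights[tuple(sorted((u1, u2)))] += 1
    return dict(weights)


# Made-up records: ann and bob overlap twice, cat arrives much later.
example = [
    ("ann", "cafe", 0),
    ("bob", "cafe", 600),
    ("cat", "cafe", 10_000),
    ("ann", "bar", 50_000),
    ("bob", "bar", 50_100),
]
print(colocation_graph(example))
```

The edge weights can then serve as the pairwise relationship strengths that a graph embedding tries to preserve.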

Here we see an example of graph embedding applied to the foursquare employee network. People are placed near each other in 2D if they often colocate at similar places.

Page 14: Machine Learning and Big Data at Foursquare

Networks of people

[Plot labels: Brooklyn, SF, Manhattan, Australia]

Different parts of this map correspond to the different places in the world where foursquare employees live.

This plot was made by applying minimum volume embedding, a nonlinear, graph-based dimensionality reduction algorithm, to the foursquare employee network.

Each person on this map can be described by thousands of numbers, indicating how often they visit different places. The goal is to reduce the dimensionality of this space to 2D while preserving the strong pairwise relationships.
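The sketch below illustrates the general idea of reducing per-person visit counts to 2D. Note that it does not implement minimum volume embedding: it substitutes plain PCA (a linear method), computed with power iteration on the centered Gram matrix, purely because that fits in a few lines of standard-library Python. The function names and the toy data are assumptions for this example.

```python
import math


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _top_eigenpair(matrix, iters=200):
    """Approximate the dominant eigenpair of a symmetric PSD matrix."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed start for reproducibility
    for _ in range(iters):
        w = _matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(matrix, v)))
    return eigenvalue, v


def embed_2d(visits):
    """Project each person's visit-count vector to 2D with PCA.

    visits[i][j] is how often person i visited place j.
    """
    n, d = len(visits), len(visits[0])
    means = [sum(row[j] for row in visits) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in visits]
    # Gram matrix of the centered data (n x n).
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]

    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        lam, v = _top_eigenpair(gram)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate: remove the component just found before the next axis.
        gram = [[gram[i][k] - lam * v[i] * v[k] for k in range(n)]
                for i in range(n)]
    return coords


# Toy data: persons 0 and 1 share places, persons 2 and 3 share other places.
toy = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
for point in embed_2d(toy):
    print(point)
```

People with similar visit patterns land close together in the output. A nonlinear, graph-based method such as the one named in the talk would instead aim to preserve only the strong pairwise relationships in the network.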

Page 15: Machine Learning and Big Data at Foursquare

Open questions

• How to measure similarity between people and places?
• How to determine influence in large networks of people and places?
• What statistics can we use to describe people’s behavior in the real world?
• How do we predict what information will be timely and relevant to a user?

We are constantly considering the best ways to address many of these questions...

Page 16: Machine Learning and Big Data at Foursquare

Our data stack

• MongoDB
• Amazon S3, Elastic MapReduce
• Hadoop
• Hive
• Flume
• R and MATLAB

All of this is possible because of our world-class data stack. Amazon S3 and EC2 give us on-demand access to huge computational resources.
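To give a feel for the kind of batch job a Hadoop-style stack runs, here is a small in-memory imitation of the map, shuffle, and reduce steps that counts check-ins per venue and hour of day. It is a sketch under stated assumptions: the tab-separated `user_id`, `venue_id`, `unix_ts` log format is hypothetical, and nothing here reflects foursquare's actual jobs or schemas.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mapper(line):
    """Emit ((venue_id, hour_of_day_utc), 1) for one hypothetical log line."""
    user_id, venue_id, ts = line.rstrip("\n").split("\t")
    hour = datetime.fromtimestamp(int(ts), tz=timezone.utc).hour
    yield (venue_id, hour), 1


def reducer(key, values):
    """Sum the counts emitted for one (venue_id, hour) key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Made-up log lines: two check-ins at v1 around 09:00 UTC, one at v2 at 00:00.
log = ["u1\tv1\t32400", "u2\tv1\t32500", "u1\tv2\t0"]
print(run_job(log))
```

On a real cluster the grouping step is handled by the framework, and the per-venue hourly counts that come out could feed the place signatures discussed earlier in the talk.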

Page 17: Machine Learning and Big Data at Foursquare

Join us!

foursquare is hiring! 85+ people and growing

foursquare.com/jobs

Blake Shaw, @metablake, [email protected]

Thanks so much.

Foursquare is hiring. If these projects seem interesting to you, please contact us at foursquare.com/jobs.