zomato crawler & recommender

Upload: shoaib-khan

Post on 15-Apr-2017

CONTENT & LOCATION AWARE

RESTAURANT RECOMMENDATIONS

USING URBAN REVIEW NETWORKS

PROJECT REPORT

Submitted By

Jayant Jaiswal, Roll No-12600112104, Regn No-121260110042

Shoaib Khan, Roll No-12600112163, Regn No-121260110101

Rohan Agarwal, Roll No-12600112143, Regn No-121260110081

Under the Supervision of

Asst. Prof. Partha Basuchowdhuri

Computer Science & Engineering

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

In

COMPUTER SCIENCE & ENGINEERING

HERITAGE INSTITUTE OF TECHNOLOGY, KOLKATA

MAULANA ABUL KALAM AZAD

UNIVERSITY OF TECHNOLOGY

Acknowledgements

We would like to take this opportunity to thank Dr. P. Chaudhuri, Principal, Heritage Institute of Technology, for giving us the golden opportunity of working on this project and for providing us with all the necessary facilities and resources to work towards its completion.

We are thankful to Asst. Prof. Partha Basuchowdhuri, our advisor and guide, for his continuous support, advice and words of encouragement, without which we could not have seen this project through to completion. He is not just an advisor but a patient teacher who has always been there to resolve our doubts, no matter how trivial, and to provide us with valuable insights which helped us in every way possible. We also owe our sincere gratitude to Dr. Subhashis Majumder, the Head of the Department, for his enriching discussions, novel ideas and valuable feedback.

We would also like to thank our teachers, faculty members and laboratory assistants at the Heritage Institute of Technology for playing a pivotal and decisive role during the development of the project. Last but not least, we thank all our friends for their cooperation and encouragement.

Jayant Jaiswal

Shoaib Khan

Rohan Agarwal

HERITAGE INSTITUTE OF TECHNOLOGY

MAULANA ABUL KALAM AZAD UNIVERSITY OF TECHNOLOGY

BONAFIDE CERTIFICATE

Certified that this Project Report: "CONTENT & LOCATION AWARE RESTAURANT RECOMMENDATIONS USING URBAN REVIEW NETWORKS" is the bonafide work of "Jayant Jaiswal, Shoaib Khan and Rohan Agarwal" who carried out this project work under my supervision.

SIGNATURE
Dr. Subhashis Majumder
Head of the Department
Computer Science & Engineering
East Kolkata Township, Chowbaga Road, Anandapur,
West Bengal - 700107.

SIGNATURE
Asst. Prof. Partha Basuchowdhuri
Project Guide
Computer Science & Engineering
East Kolkata Township, Chowbaga Road, Anandapur,
West Bengal - 700107.

SIGNATURE

EXAMINER

Abstract

A restaurant recommendation system is a very popular service whose sophistication keeps increasing every day. In this paper we present a personalised restaurant recommendation system which has two parts. The first part recommends restaurants to users based on their restaurant review history. The second part recommends to business owners the places best suited to opening a restaurant with a particular cuisine, where the owner would get the best traffic for the restaurant. Using Zomato data, we built a restaurant recommendation system for individuals and business owners. For each user in our data we find the cuisine preferences and other restrictions such as services offered, ambience, average rating, etc., and recommend restaurants accordingly. We propose a metric that takes into account the popularity as well as the sentiment of opinions about food items in user-generated reviews, as opposed to other systems which consider only the features mentioned above to recommend restaurants.

Contents

1 Introduction
    1.1 Road Map
    1.2 What are Recommendation Systems?
        1.2.1 Collaborative Filtering
        1.2.2 Content Based Filtering
        1.2.3 Hybrid Approach
        1.2.4 Applications
    1.3 Motivation for Restaurant Recommendations

2 Literature Review

3 Problem Definition

4 Data Analysis
    4.1 Data Collection
    4.2 Data Handling

5 Methodology
    5.1 Location Aware
    5.2 Content Based

6 Conclusion

7 Future Works

8 References

List of Figures

5.1 Live Map of Kolkata sorted on the basis of ratings

5.2 The road network stored in PostgreSQL

5.3 Map of Kolkata showing the important intersections to set up a new restaurant based upon a cuisine (North Indian)

5.4 The system taking user id as input to generate recommendations for that user

5.5 Top 5 restaurants recommended by the system to the user for each food item

Chapter 1

Introduction

1.1 Road Map

In Chapter 1, we provide a broad description of the types of recommendation systems and their applications in today's customer-centric e-commerce market, coupled with basic background on recommendation systems. In Chapter 2 we give a brief overview of prior work in the field of restaurant recommendation. Chapter 3 presents the problem definition and the related terminology, such as content-based and location-based recommendation. In Chapter 4 we discuss the methods used to fetch the data and the preprocessing done to suit the system and produce good recommendations. Chapter 5 discusses the methodology and gives a detailed study of our system. The results of our system on content-specific and location-specific recommendation are summarized in Chapter 6. Scope for improvement and future ideas are given in Chapter 7 as future works.

1.2 What are Recommendation Systems?

Recommender systems have changed the way people find products, information, and even other people. The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them. Such systems have changed the way inanimate websites communicate with their users. Rather than providing a static experience in which users search for and potentially buy products, recommender systems increase interaction to provide a richer experience. The systems identify recommendations autonomously for individual users based on past purchases and searches, and on other users' behavior. They study patterns of behavior to predict what someone will prefer from among a collection of things he has never experienced. The technology behind recommender systems has evolved over the past 20 years into a rich collection of tools that enable the practitioner or researcher to develop effective recommenders.

1.2.1 Collaborative Filtering

Collaborative filtering methods are based on collecting and analyzing a large amount of information on users' behaviors, activities or preferences and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content and is therefore capable of accurately recommending complex items such as movies without requiring an understanding of the item itself. Many algorithms have been used to measure user similarity or item similarity in recommender systems, for example the k-nearest neighbor (k-NN) approach and the Pearson correlation.
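As a rough illustration of the similarity measures mentioned above, the sketch below computes the Pearson correlation between two users over the restaurants both have rated and then picks the k most similar users. The user names, restaurant ids and ratings are made up for the example, and the function names are our own; this is not code from the system described in this report.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / sqrt(var_a * var_b)


def nearest_users(target, all_ratings, k=2):
    """Return the k users most similar to `target` (a simple k-NN step)."""
    scored = [(pearson(all_ratings[target], ratings), user)
              for user, ratings in all_ratings.items() if user != target]
    scored.sort(reverse=True)
    return [user for _, user in scored[:k]]


# Hypothetical ratings: user -> {restaurant id: rating}
ratings = {
    "alice": {"r1": 5, "r2": 3, "r3": 4},
    "bob": {"r1": 4, "r2": 2, "r3": 3},
    "carol": {"r1": 1, "r2": 5, "r3": 2},
}
neighbours = nearest_users("alice", ratings, k=1)
```

Here bob rates every restaurant exactly one point below alice, so their correlation is 1 and he is her nearest neighbour, while carol's tastes run in the opposite direction.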

1.2.2 Content Based Filtering

Content-based filtering methods are based on a description of the item and a profile of the user's preferences. In a content-based recommendation system, keywords are used to describe the items; in addition, a user profile is built to indicate the type of item this user likes. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended. This approach has its roots in information retrieval and information filtering research.
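The keyword matching described above can be sketched in a few lines. The example below assumes each restaurant is described by a set of keywords and the user profile is the set of keywords the user has liked; Jaccard overlap is used as the similarity measure purely for illustration, and the restaurant names and keywords are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def recommend(profile, items, top_n=2):
    """Rank items by similarity to the user profile (ties broken by name)."""
    ranked = sorted(items, key=lambda name: (-jaccard(profile, items[name]), name))
    return ranked[:top_n]


# Hypothetical item descriptions: restaurant -> keywords
items = {
    "Spice Hub": {"north indian", "biryani", "ac"},
    "Noodle Bar": {"chinese", "noodles", "ac"},
    "Kebab Corner": {"north indian", "kebab", "takeaway"},
}
profile = {"north indian", "biryani", "kebab"}  # built from the user's past likes
suggestions = recommend(profile, items)
```

Both North Indian restaurants share two of four keywords with the profile (similarity 0.5) and are suggested, while the Chinese restaurant shares none.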

1.2.3 Hybrid Approach

Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering, can be more effective in some cases. Hybrid approaches can be implemented in several ways: by making content-based and collaborative predictions separately and then combining them, by adding content-based capabilities to a collaborative approach (and vice versa), or by unifying the approaches into one model. Several studies empirically compare the performance of the hybrid with the pure collaborative and content-based methods and demonstrate that the hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommendation systems, such as cold start and the sparsity problem. Netflix is a good example of a hybrid system. It makes recommendations by comparing the watching and searching habits of similar users (i.e. collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).
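The first of these options, making the two predictions separately and then combining them, might look like the sketch below. The weighted average and the weight `alpha` are assumptions chosen for illustration; the report does not prescribe a particular combination rule, and the scores are invented.

```python
def rank_hybrid(collab_scores, content_scores, alpha=0.5):
    """Blend two score dictionaries; alpha weights the collaborative score."""
    items = set(collab_scores) | set(content_scores)
    combined = {
        item: alpha * collab_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }
    # Highest combined score first; ties broken by item id for determinism.
    return sorted(combined, key=lambda item: (-combined[item], item))


# Hypothetical scores in [0, 1] from the two separate recommenders
collab = {"r1": 0.9, "r2": 0.2}
content = {"r2": 0.8, "r3": 0.6}
order = rank_hybrid(collab, content)
```

An item missing from one recommender simply contributes 0 from that side, which is one simple way to soften the cold-start problem: r3 has no collaborative score yet but can still be ranked.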

1.2.4 Applications

1) Facebook uses a recommender system to suggest Facebook users you may know offline. The system is trained on personal data such as mutual friends, where you went to school, places of work and mutual networks (pages, groups, etc.) to learn who might be in your online and offline network.

2) When you fill out your Taste Preferences or rate movies and TV shows, you're helping Netflix to filter through the thousands of selections to get a better idea of what you might like to watch. Factors that the Netflix algorithm uses to make such recommendations include:

a) The genre of movies and TV shows available.

b) Your streaming history, and previous ratings you've made.

c) The combined ratings of all Netflix members who have similar tastes in titles to you.

3) The "Jobs You May Be Interested In" feature shows jobs posted on LinkedIn that match your profile in some way. These recommendations are shown based on the titles and descriptions in your previous experience, and the skills other users have endorsed.

4) Amazon's algorithm crunches data on all of its millions of customer baskets to figure out which items are frequently bought together. This can lead to huge returns: for example, if you're buying an electrical item and see a recommendation for the cables or batteries it requires beneath it, you're very likely to purchase both the core product and the accessories from Amazon.

1.3 Motivation for Restaurant Recommendations

Obtaining recommendations from trusted sources is a critical component of the natural process of human decision making. With burgeoning consumerism buoyed by the emergence of the web, buyers are being presented with an increasing range of choices while sellers are being faced with the challenge of personalizing their advertising efforts. In parallel, it has become common for enterprises to collect large volumes of transactional data that allow for deeper analysis of how a customer base interacts with the space of product offerings. Recommender systems have evolved to fulfill the natural dual need of buyers and sellers by automating the generation of recommendations based on data analysis.

There are many recommendation systems available for problems like shopping, online video entertainment, games, etc. Restaurants and dining is one area where there is a big opportunity to recommend dining options to users based on their preferences as well as historical data. Zomato is a very good source of such data, with not only restaurant reviews but also user-level information on preferred restaurants. This report describes the work done to recommend restaurants to a given Zomato user based on their history or their cuisine preferences. It also covers the task of recommending suitable cuisine-specific locations to newcomers in the restaurant business.

Chapter 2

Literature Review

In this section we highlight a few previous works in the field of restaurant recommendation. Recommender systems seek to predict the 'rating' or 'preference' that a user would give to an item. Recommender systems typically produce a list of recommendations in one of two ways: through collaborative or content-based filtering. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. These approaches are often combined to form hybrid recommender systems.

Traditional recommendation systems use the user profile to analyse and find similar users, and recommend restaurants to users based on the result of that analysis. However, these systems lack consideration of user mobility and environment. Other recommendation systems provide their service by finding restaurants and presenting restaurant information through a web site; such systems are closer to search systems than to recommendation systems. Recent research on context information uses the user's location to serve advertisements, sales and event information. One such system analyses user preference through the user profile and finds restaurants that satisfy the user's preference and are close to the user's location. It consists of two sections, one which works online and another which processes data offline. When the user is in motion, i.e. his geo-position changes notably, the system goes online and the recommendation module becomes active, retrieving nearby restaurants and ranking them, based on their properties, according to the scores generated offline. The offline part generally remains in a non-functional mode when the user is stationary. The work of the offline system is to generate a user interest profile using a machine learning algorithm. The drawback of the offline feature is that the interest profile is generated based on the user's check-ins to restaurants. It does not take into account the user's taste, habits and the cuisines he favors. Thus the offline recommendation can be considered a shallow approach, lacking the user's detailed interaction with each restaurant, which can be obtained in the form of reviews.

Chapter 3

Problem Definition

Create an innovative recommendation system that provides content-based recommendations to restaurant-goers and owners, and location-based recommendations to restaurant owners, using the Zomato Restaurant Review Network.

Suggest the best-suited places to new entrants in the restaurant business for setting up a new restaurant, so as to fill the cuisine void and garner high traffic.

Suggest restaurants to users based on their previous review activity on Zomato, by creating a recommendation system using all reviews from all restaurants in a city.

Chapter 4

Data Analysis

4.1 Data Collection

Zomato and Yelp are two popular restaurant search, discovery and review services. While both are popular globally, Zomato has an edge over Yelp in India. Since we are based in India, we decided to choose Zomato as our "Restaurant Review Network". Also, Zomato provides more carefully curated content, which is enough to satiate appetites even outside its native land. Users can find restaurants, leave reviews, rate a restaurant, and keep their own restaurant diary to share with friends. Zomato has built a highly coherent and focused experience that puts the emphasis on being a comprehensive network for food-lovers. Very little on the site is superfluous. In a survey, users were impressed with the amount of attention to detail evident in the site's content and, unlike on Yelp, several testers actually used Zomato's curated lists and suggestions to find restaurants. Hence, Zomato was chosen as our "Restaurant Review Network" over other popular services due to its detailed yet simple data.

4.2 Data Handling

We first crawled the restaurant data of Delhi from Zomato. The crawling was done using data crawlers built in Python which specifically crawl restaurant data. The data comprised all the features listed on Zomato, like "Dine-in or Takeaway", "AC or Non-AC", etc. We then switched to Kolkata as our sample city for data analysis, for a number of reasons. Firstly, Kolkata had around 2000 restaurants compared to Delhi's 10000+ restaurants. Also, we were based in Kolkata and knew the city inside out. Thus, we could analyze the results better.

After the first crawl of restaurant data, we crawled the restaurant reviews for these 2000 restaurants. This crawl operation generated over two hundred thousand reviews. Each review also comprised the restaurant name, the reviewer id and name, and the details of the review. All the data was later stored in MongoDB, a NoSQL database, for easy fetching and manipulation. This is the data that is used by our system.

Chapter 5

Methodology

5.1 Location Aware

The primary motive of the location-aware part of our project is to recommend locations to people who want to set up a new restaurant business, as explained earlier in our problem statement. We assess the road map to identify concentrations of restaurant clusters in a city. These clusters can be defined as restaurant hotspots. We have generated a live map of our sample city (Kolkata) with all the restaurants marked on it. Each node is given a particular color according to the rating range into which it falls, and clicking on a node gives the details of the restaurant the node represents. This helps provide real-time recommendations to users based on ratings and locations. Below is a snapshot of the map.

Figure 5.1: Live Map of Kolkata sorted on the basis of ratings

The road network of our sample city was generated using OpenStreetMap and PostgreSQL. It is a graph with road intersections as nodes and roads as edges. We could not use Google Maps for this as it came at a premium. We had the coordinates of all restaurants in the city. We added the restaurants as pendant nodes to their nearest intersection using the k-NN (k-nearest neighbor) algorithm. Thus, we get a complete road network with all restaurants and road intersections as nodes and the roads as edges.
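Attaching each restaurant to its nearest intersection can be sketched as follows. This is a minimal in-memory version that assumes latitude/longitude pairs, uses the haversine great-circle distance, and does a brute-force search with k = 1; the report's system works on the PostgreSQL road network, and the coordinates and ids below are made up.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def attach_restaurants(intersections, restaurants):
    """Return pendant edges (restaurant id, nearest intersection id, distance in km)."""
    edges = []
    for rid, rpos in restaurants.items():
        nearest = min(intersections, key=lambda iid: haversine_km(rpos, intersections[iid]))
        edges.append((rid, nearest, haversine_km(rpos, intersections[nearest])))
    return edges


# Hypothetical coordinates in Kolkata
intersections = {"A": (22.5726, 88.3639), "B": (22.5200, 88.3500)}
restaurants = {"r1": (22.5730, 88.3640), "r2": (22.5210, 88.3510)}
edges = attach_restaurants(intersections, restaurants)
```

A brute-force scan is fine for a few thousand restaurants; for a larger city a spatial index would be the natural replacement.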

Figure 5.2: The road network stored in PostgreSQL

For every node in the road network that is an intersection, we store, as node attributes, the distance to the nearest restaurant for each and every cuisine. We then create a vector for each road intersection which stores the X/Y ratios for the top 10 cuisines, where

X = Avg. rating of the nearest restaurant

Y = The distance to the nearest restaurant

The lower this ratio (X/Y), the better the intersection is suited to opening a new restaurant for that cuisine, since a low ratio means that the nearest restaurant offering the cuisine is poorly rated, far away, or both. Running the PageRank algorithm on the graph also gives us the intersections with higher importance and traffic. Combining the PageRank probabilities with our ratio, we define a new ratio R as follows:

\[ R = \frac{\text{PageRank probability}}{X / Y} \]

The intersection for which this ratio R is maximum is the one best suited to setting up a new restaurant for that cuisine.
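The computation of R can be illustrated with a small self-contained sketch. It uses a plain power-iteration PageRank on a toy graph of four intersections (restaurant pendant nodes are left out for brevity) and then divides each PageRank value by X/Y for one cuisine. The graph, the ratings, the distances, the damping factor of 0.85 and the iteration count are all assumptions made for the example.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on an adjacency dict {node: [neighbours]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = adj[v] or nodes  # a dangling node spreads its rank evenly
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def suitability(rank, avg_rating, distance_km):
    """R = PageRank probability / (X / Y); assumes rating and distance are positive."""
    return rank / (avg_rating / distance_km)


# Toy road graph: intersection A is a hub connected to B, C and D.
roads = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
# Hypothetical (X, Y) per intersection for one cuisine:
# rating of the nearest restaurant and its distance in km.
nearest = {"A": (4.5, 0.2), "B": (3.0, 1.5), "C": (4.0, 0.5), "D": (3.5, 1.0)}

ranks = pagerank(roads)
scores = {node: suitability(ranks[node], *nearest[node]) for node in roads}
best = max(scores, key=scores.get)
```

In this toy example the hub A has the highest PageRank, but a well-rated competitor sits only 0.2 km away, so R favours intersection B, where the nearest competitor is both farther away and rated lower.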

Figure 5.3: Map of Kolkata showing the important intersections to set up a new restaurant based upon a cuisine (North Indian)

The map shows, in red, the most important intersections favourable to setting up a new restaurant offering North Indian cuisine.

5.2 Content Based

We have crawled the restaurants of our sample city (Kolkata) for their features (i.e. ratings, cuisines, bar, AC/non-AC, veg/non-veg, etc.) and the reviews given by visiting users. These reviews give us insights into a user's degree of liking for a particular restaurant. The review data for each restaurant in Kolkata consisted of user id, user name, restaurant id, restaurant name and the review. The reviews for each restaurant, taken individually, were passed to the "Intellexer Sentiment Analyzer" module. Applying Intellexer to the restaurant reviews, we get

1) Opinion Holders which are food items.

2) Each opinion holder (food item) having multiple opinions.

3) Sentiment values which can be either positive or negative for each opinion.

Intellexer Sentiment Analyzer is a powerful and efficient solution that automatically extracts sentiments (positivity/negativity), opinion objects and emotions (liking, anger, disgust, etc.) from unstructured text. From these sentiment values we found the best food items available in a restaurant by sorting the items in descending order of their sentiments. The sentiment values are calculated using the metric in equation 5.1. The calculation for getting the best food item is done in the following way. For each opinion holder we had a list of n 3-tuples. The values in each tuple were

1) Food tags

2) Average Sentiment

3) Opinion Count

The food tags are the best-suited spellings for particular food items of North Indian cuisine. These tags are sufficient to identify North Indian dishes and their reviews among all the user reviews. Examples of these North Indian food tags are biryani, kebab, qorma, tikka, etc. Since users give multiple possible spellings for each unique food item holder, we attempt to replace them all with one best-suited name for the holder. For example, biryani has multiple forms like biriyani, beryani, beeryani, etc. We clubbed the sentiments of similar food items with different spellings using fuzzywuzzy. Fuzzy string matching, also called approximate string matching, is the process of finding strings that approximately match a given pattern. The closeness of a match is often measured in terms of edit distance, which is the number of primitive operations necessary to convert the string into an exact match. We took each word in our repository and computed its partial ratio, using fuzz.partial_ratio(), against the holders extracted by Intellexer. We took a threshold value of 67, i.e. if the ratio is greater than or equal to 67 we considered the items to be similar. Using this we clustered all the similar items and found the average sentiment of each tag in our repository from the sentiments of the opinions given by Intellexer. Once we had the average sentiment and opinion count of all the tags, we used a metric to rank the tags in that restaurant. Here are some of the terms to be known:

max_cnt = Maximum of the opinion count value of all food tags

min_cnt = Minimum of the opinion count value of all food tags

The metric is:

\[ \text{Sentiment of tag} = \text{Avg. sentiment of tag} \times \frac{\text{opinion count of tag} - \text{min\_cnt} + 1}{\text{max\_cnt} - \text{min\_cnt} + 1} \tag{5.1} \]

This metric is applied to each restaurant and the tags obtained for it. Using this we get a normalized sentiment for each tag, and sorting the tags on this value in descending order gives us the tags from highest ranked to lowest ranked. The normalization is done within a particular restaurant and is not related to the other restaurants.
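The spelling clean-up and the ranking by equation 5.1 can be sketched end to end for a single restaurant. To keep the example free of third-party dependencies, Python's standard difflib similarity ratio is used here as a stand-in for fuzzywuzzy's fuzz.partial_ratio(); the two scores are not identical, so the threshold of 67 only carries over approximately. The tag list comes from the examples in the text, while the opinions (holder, sentiment with +1 for positive and -1 for negative) are invented.

```python
from difflib import SequenceMatcher

FOOD_TAGS = ["biryani", "kebab", "qorma", "tikka"]  # example tags from the text
THRESHOLD = 67  # similarity threshold on a 0-100 scale


def similarity(a, b):
    """0-100 similarity; difflib stands in for fuzz.partial_ratio() here."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def canonical_tag(holder):
    """Map an opinion holder to its closest food tag, or None below the threshold."""
    best = max(FOOD_TAGS, key=lambda tag: similarity(holder, tag))
    return best if similarity(holder, best) >= THRESHOLD else None


def aggregate(opinions):
    """Group (holder, sentiment) pairs by tag -> (average sentiment, opinion count)."""
    grouped = {}
    for holder, sentiment in opinions:
        tag = canonical_tag(holder)
        if tag is not None:
            grouped.setdefault(tag, []).append(sentiment)
    return {tag: (sum(v) / len(v), len(v)) for tag, v in grouped.items()}


def rank_tags(stats):
    """Apply equation 5.1 within one restaurant and sort the tags best-first."""
    counts = [count for _, count in stats.values()]
    min_cnt, max_cnt = min(counts), max(counts)
    scored = {
        tag: avg * (count - min_cnt + 1) / (max_cnt - min_cnt + 1)
        for tag, (avg, count) in stats.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical opinions extracted from one restaurant's reviews
opinions = [("biriyani", 1), ("beryani", 1), ("biryani", -1),
            ("beeryani", 1), ("kabab", 1), ("tikka", -1), ("pasta", 1)]
ranking = rank_tags(aggregate(opinions))
```

The four biryani spellings are merged into one tag with average sentiment 0.5 over 4 opinions, which outranks kebab (average 1.0 but only a single opinion). This shows how the count term in equation 5.1 rewards dishes that are mentioned often, not just praised once.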

The data of all the restaurants was inserted into MongoDB along with the opinion counts. Thus, for each food tag in our repository, we found the restaurants that serve that food. The same metric based on max_cnt and min_cnt is then applied again to each individual food tag in our repository, this time across all restaurants. Sorting the result gives the top restaurants for a particular food tag. This result is stored in the database for each individual tag. Now, for any user, his reviews are fed into Intellexer and his opinion holders (food items) are generated. These opinion holders indicate his food preferences. The number of opinion holders should be greater than one. Now comes the easy part: recommending to the user the top 5 restaurants already stored in the database for each tag. Thus, the user gets recommendations based on his own reviews using our system. The results are shown in the screenshots below.

Figure 5.4: The system taking user id as input to generate recommendations for that user.

Figure 5.5: Top 5 restaurants recommended by the system to the user for each food item
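The per-tag lookup behind these recommendations can be sketched with plain dictionaries standing in for the MongoDB collections. For each food tag, the same count-normalized sentiment is computed across restaurants, the top n restaurants are kept, and a user's tags are then simply looked up. The restaurant ids, sentiments and counts are made up, and the data layout is an assumption rather than the schema actually used.

```python
def top_restaurants(tag_stats, n=5):
    """tag_stats: {tag: {restaurant: (avg sentiment, opinion count)}} -> top n per tag."""
    top = {}
    for tag, per_restaurant in tag_stats.items():
        counts = [count for _, count in per_restaurant.values()]
        min_cnt, max_cnt = min(counts), max(counts)
        scored = sorted(
            ((avg * (count - min_cnt + 1) / (max_cnt - min_cnt + 1), name)
             for name, (avg, count) in per_restaurant.items()),
            reverse=True,
        )
        top[tag] = [name for _, name in scored[:n]]
    return top


def recommend_for_user(user_tags, top):
    """Look up the stored top restaurants for each food tag the user wrote about."""
    return {tag: top[tag] for tag in user_tags if tag in top}


# Hypothetical per-tag statistics aggregated over all restaurants
tag_stats = {
    "biryani": {"R1": (0.9, 40), "R2": (0.8, 10), "R3": (0.2, 50)},
    "kebab": {"R2": (0.7, 5), "R4": (0.9, 20)},
}
top = top_restaurants(tag_stats)
user_recs = recommend_for_user(["biryani", "momo"], top)
```

Because the rankings are computed once and stored, serving a user only requires extracting the tags from his reviews and reading the stored lists; tags with no stored ranking (here "momo") are skipped.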

Chapter 6

Conclusion

Our results show the busiest roads in the city, which are ideal for setting up new restaurants; for foodies, taking such a route is a delight in itself. This can also have a positive influence on current businesses. Our results can reveal the cuisines lacking in a particular area, which can be exploited by current businesses. Previous works on location-based recommendation focused on providing results to users only, whereas our system focuses on restaurant owners.

Our system implements content-based filtering to provide restaurant recommendations to users based on their previous reviews. In our tests, the recommendations were largely accurate. Our system can easily be extended to other cities and cuisines. It is multipurpose, as it can come in handy for businesses as well as for the average user. The field of restaurant recommendations is still largely uncharted territory, and our system is a small step into it.

Chapter 7

Future Works

We plan on building our own sentiment analyzer pertaining to restaurants rather than relying on the Intellexer Sentiment Analyzer module. This will help in getting correct and accurate sentiments for tags like services, food and other features of the restaurants, and will resolve our sentiment ambiguity. Sentiments are not purely positive or negative; in fact, there are various levels at which sentiments and their effect in a statement can be identified. These can be incorporated in detail into the system for more accurate results.

We also plan on adding the collaborative filtering method to our system. Such systems do not use any information regarding the actual content of the items (as opposed to content filtering). They are based on the usage or preference patterns of other users. Selection (or filtering) of items is done in a manner similar to individuals collaborating to make recommendations for each other, i.e. if some tags are similar between users, then the users are termed similar and similar recommendations are provided.

Chapter 8

References

1. Anant Gupta and Kuldeep Singh. Location Based Personalized Restaurant Recommendation System for Mobile Environments.

2. Sumit Negi. Single Document Keyphrase Extraction Using Label Information.

3. Liu, J., Shang, J., Wang, C., Ren, X., Han, J., 2015. Mining Quality Phrases from Massive Text Corpora. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, pp. 1729-1744.

4. Mariana Romanyshyn. Rule-Based Sentiment Analysis of Ukrainian Reviews. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 4, July 2013.

5. El-Kishky, A., Song, Y., Wang, C., Voss, C.R., Han, J., 2014. Scalable Topical Phrase Mining from Text Corpora. Proceedings of the VLDB Endowment 8, 305-316.

6. Burusothman Ahiladas, Paraneetharan Saravanaperumal, Sanjith Balachandran, Thamayanthy Sripalan and Surangika Ranathunga. Ruchi: Rating Individual Food Items in Restaurant Reviews.
