Upload: micheal-strait

Post on 29-Mar-2015


Page 1:

Foodie

Marc Greenberg – mgreenberg@cs.usfca.edu

A study in collecting and parsing recipes…

Page 2:

I’m Hungry!...

• Enter the ingredients at your disposal
• Foodie lists recipe options
• Rate recipes
• It learns what you like, and your eating habits… (that’s another presentation)

Page 3:

But We Need To Populate The Device

• Food and recipe database needed
• Collect and parse recipes instead of manual entry
• Recipe collection from different sources
  – Predictable vs. non-predictable URLs
  – Regular vs. irregular recipe format

Page 5:

Collecting Recipes

• Two types of crawlers (written in Python)
  – URL Substitution:
    • Epicurious.com
      http://www.epicurious.com/recipes/recipe_views/printer_friendly/11311
  – Link Crawler:
    • RecipeSource.com (serving, title, minute, hour, .6)
      http://www.recipesource.com/fgv/rice/03/rec0362.html
    • FoodNetwork.com (recipe, serving, yield, time, print, minute, .8)
      http://www.foodnetwork.com/food/recipes/recipe/0,,FOOD_9936_17273,00.html
• Need to identify good and bad pages
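
The two crawler styles and the good/bad page check might look roughly like the sketch below. It is not the author's code. The function and class names, the ID range, and the scoring rule are all hypothetical. In particular, it assumes the parenthesized word lists are keywords to look for and that the trailing .6 / .8 is the minimum fraction of keywords a page must contain, which the slides do not confirm. Fetching pages is left out, so nothing here touches the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Pattern taken from the Epicurious example URL; only the trailing ID varies.
EPICURIOUS = "http://www.epicurious.com/recipes/recipe_views/printer_friendly/{}"


def substituted_urls(template, first_id, last_id):
    """URL substitution: yield one candidate URL per numeric ID."""
    for recipe_id in range(first_id, last_id + 1):
        yield template.format(recipe_id)


class LinkCollector(HTMLParser):
    """Link crawler helper: collect absolute hrefs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(base_url, html):
    """Return every link found in html, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def keyword_score(text, keywords):
    """Fraction of keywords that appear in the text (assumed metric)."""
    lowered = text.lower()
    return sum(1 for word in keywords if word in lowered) / len(keywords)


def looks_like_recipe(text, keywords, threshold):
    """Treat a page as 'good' when enough keywords are present."""
    return keyword_score(text, keywords) >= threshold
```

Under these assumptions, a RecipeSource page would be checked with `looks_like_recipe(page_text, ["serving", "title", "minute", "hour"], 0.6)`. Substring matching is used so that "minutes" counts for "minute"; a stricter crawler might match whole words instead.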

Page 6:

Finding the Ingredients

• Induction wrappers

• Layout
• Character and grammar structure
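
As a small illustration of the "character structure" cue, the sketch below locates an ingredient block by looking for consecutive lines that begin with a quantity. This is a hand-written heuristic, not wrapper induction (which would learn extraction rules from labeled example pages), and the pattern and function name are hypothetical.

```python
import re

# A line "looks like" an ingredient when it starts with a quantity:
# an integer, a decimal, a fraction, or a mixed number such as "1 1/2".
QUANTITY_START = re.compile(r"^\s*\d+(?:\.\d+)?(?:\s+\d+/\d+|/\d+)?\s+\S")


def find_ingredient_block(lines):
    """Return the longest run of consecutive quantity-led lines."""
    best, current = [], []
    for line in lines:
        if QUANTITY_START.match(line):
            current.append(line.strip())
        else:
            if len(current) > len(best):
                best = current
            current = []
    # The page may end while a run is still open.
    if len(current) > len(best):
        best = current
    return best
```

Numbered direction steps such as "1. Mix everything." are not matched, because the pattern requires whitespace directly after the quantity.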

Page 7:

Parsing

• Recipe metadata
  – Title, summary, serving size, prep time, etc.
• Ingredient list
  – Amount, unit, food item
• Directions
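
The amount / unit / food item split for a single ingredient line can be sketched as follows. The unit vocabulary, the regular expression, and the function names are assumptions made for illustration; the slides do not describe how Foodie actually performs this step.

```python
import re
from fractions import Fraction

# Small, illustrative unit vocabulary; a real list would be much longer.
UNITS = {
    "cup", "cups", "tsp", "tbsp", "teaspoon", "teaspoons",
    "tablespoon", "tablespoons", "oz", "ounce", "ounces",
    "lb", "pound", "pounds", "g", "kg", "ml", "l",
    "clove", "cloves", "pinch",
}

# Leading amount: mixed number, fraction, or integer/decimal.
INGREDIENT = re.compile(
    r"^\s*(?P<amount>\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s+(?P<rest>.+?)\s*$"
)


def parse_amount(text):
    """Convert '2', '0.5', '1/2', or '1 1/2' to a Fraction."""
    return sum(Fraction(part) for part in text.split())


def parse_ingredient(line):
    """Split a line into (amount, unit, food item); unit may be None."""
    match = INGREDIENT.match(line)
    if match is None:
        return None
    amount = parse_amount(match.group("amount"))
    words = match.group("rest").split()
    unit = None
    # Only treat the first word as a unit if a food item follows it.
    if len(words) > 1 and words[0].lower().rstrip(".") in UNITS:
        unit = words[0].lower().rstrip(".")
        words = words[1:]
    return amount, unit, " ".join(words)
```

Amounts are kept as `Fraction` values so that "1 1/2" stays exact rather than becoming a rounded float. Lines without a leading quantity, such as "Salt to taste", return `None` and would need separate handling.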

Page 8:

Existing Software

• MasterCook™, leading software product
• Manual import features
• Slow full-text search
• Starting database has just over 8000 recipes

Page 9:

Questions?