Foodie
Marc Greenberg – [email protected]
A study in collecting and parsing...
I’m Hungry!...
• Enter ingredients at your disposal
• Foodie lists recipe options
• Rate recipes
• It learns what you like and your eating habits… (that’s another presentation)
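The slides do not say how Foodie turns the ingredients you enter into recipe options. One simple approach, offered only as a sketch under that assumption, is an inverted index from each food item to the recipes that use it, with candidates ranked by the fraction of their ingredients you already have. The recipe table and every function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy recipe table: recipe title -> set of normalized food items.
RECIPES = {
    "Rice pilaf": {"rice", "onion", "butter", "broth"},
    "Fried rice": {"rice", "egg", "onion", "soy sauce"},
    "Omelet": {"egg", "butter", "cheese"},
}


def build_index(recipes):
    """Map each food item to the set of recipe titles that use it."""
    index = defaultdict(set)
    for title, items in recipes.items():
        for item in items:
            index[item].add(title)
    return index


def recipe_options(pantry, recipes, index):
    """Rank recipes by the fraction of their ingredients found in `pantry` (a set)."""
    # Only recipes sharing at least one item with the pantry are considered.
    candidates = set()
    for item in pantry:
        candidates |= index.get(item, set())
    scored = []
    for title in candidates:
        needed = recipes[title]
        coverage = len(needed & pantry) / len(needed)
        scored.append((coverage, title))
    # Highest coverage first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored
```

With `{"rice", "onion", "egg"}` on hand, this ranks "Fried rice" (3 of 4 ingredients) ahead of "Rice pilaf" (2 of 4) and "Omelet" (1 of 3). The index keeps lookups proportional to the items entered rather than to the size of the whole recipe database.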
But We Need to Populate the Device
• Food and recipe database needed
• Collect and parse recipes instead of manual entry
• Recipe collection from different sources
  – Predictable vs. non-predictable URLs
  – Regular vs. irregular recipe format
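When a site's URLs are predictable, collection can be as simple as substituting consecutive IDs into a URL template. The sketch below assumes the Epicurious printer-friendly URL from the slides ends in a numeric recipe ID; the ID range, the one-second delay, and the function names are illustrative assumptions, and the site's URL scheme has very likely changed since.

```python
import time
import urllib.request

# Template built from the Epicurious printer-friendly URL on the slides; the
# trailing recipe ID is the predictable part that gets substituted.
TEMPLATE = "http://www.epicurious.com/recipes/recipe_views/printer_friendly/{recipe_id}"


def candidate_urls(start_id, end_id, template=TEMPLATE):
    """Yield one URL per recipe ID in the half-open range [start_id, end_id)."""
    for recipe_id in range(start_id, end_id):
        yield template.format(recipe_id=recipe_id)


def fetch(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None


def crawl_by_substitution(start_id, end_id, fetcher=fetch, delay=1.0):
    """Fetch every candidate URL and keep the pages that came back."""
    pages = {}
    for url in candidate_urls(start_id, end_id):
        html = fetcher(url)
        if html is not None:
            pages[url] = html
        time.sleep(delay)  # be polite to the server between requests
    return pages
```

For example, `crawl_by_substitution(11311, 11321)` would try ten consecutive IDs. IDs that raise an HTTP error are skipped, but a site may also answer missing IDs with a normal-looking error page, so the collected pages still need the good/bad page check that link crawling requires.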
Collecting Recipes
• Two types of crawlers (written in Python)
  – URL Substitution:
    • Epicurious.com
      http://www.epicurious.com/recipes/recipe_views/printer_friendly/11311
  – Link Crawler:
    • RecipeSource.com (serving, title, minute, hour, .6)
      http://www.recipesource.com/fgv/rice/03/rec0362.html
    • FoodNetwork.com (recipe, serving, yield, time, print, minute, .8)
      http://www.foodnetwork.com/food/recipes/recipe/0,,FOOD_9936_17273,00.html
• Need to identify good and bad pages
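The parenthesized lists next to each link-crawled site read like per-site keywords plus a threshold. Assuming that reading (the slides do not spell it out), a page could count as "good" when at least that fraction of the keywords appear in it. The sketch below pairs that test with a breadth-first crawl that stays on one site; `SITE_RULES`, `keyword_score`, and `link_crawl` are hypothetical names, and `fetcher` is any function that returns a page's HTML or `None`.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed reading of the slide: per-site keywords plus the fraction of them
# that must appear before a page is treated as a recipe ("good") page.
SITE_RULES = {
    "recipesource.com": (["serving", "title", "minute", "hour"], 0.6),
    "foodnetwork.com": (["recipe", "serving", "yield", "time", "print", "minute"], 0.8),
}


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def keyword_score(html, keywords):
    """Fraction of keywords that start a word somewhere in the page (case-insensitive)."""
    text = html.lower()
    hits = sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw), text))
    return hits / len(keywords)


def link_crawl(seed_url, site, fetcher, max_pages=100):
    """Breadth-first crawl within `site`, keeping pages that pass the keyword test."""
    keywords, threshold = SITE_RULES[site]
    seen, queue, recipes = {seed_url}, deque([seed_url]), {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetcher(url)
        fetched += 1
        if html is None:
            continue
        if keyword_score(html, keywords) >= threshold:
            recipes[url] = html
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            # Stay on the same site and never enqueue a URL twice.
            if urlparse(link).netloc.endswith(site) and link not in seen:
                seen.add(link)
                queue.append(link)
    return recipes
```

Non-recipe pages (indexes, category listings) are still followed for their links; they just are not kept. A page mentioning "Servings" and "20 minutes" but neither "title" nor "hour" scores 0.5 and would fall below the 0.6 cutoff assumed for RecipeSource.com.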
Finding the Ingredients
• Wrapper induction
• Layout
• Character and grammar structure
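Wrapper induction learns extraction rules from a few labeled example pages instead of hand-writing a parser per site. The slides do not say which wrapper class Foodie used; as an illustration of the layout-based idea, here is a minimal sketch of Kushmerick's LR wrapper, which learns one left and one right delimiter string that surround every labeled value. The function names are hypothetical, and the sketch assumes each labeled value first appears in the page where it is meant to be extracted.

```python
import os


def _common_prefix(strings):
    # os.path.commonprefix compares character by character, so it works on any strings.
    return os.path.commonprefix(strings)


def _common_suffix(strings):
    return _common_prefix([s[::-1] for s in strings])[::-1]


def learn_lr_wrapper(examples):
    """Learn (left, right) delimiters from (page, labeled_values) pairs."""
    lefts, rights = [], []
    for page, values in examples:
        cursor = 0
        for value in values:
            start = page.index(value, cursor)
            end = start + len(value)
            lefts.append(page[:start])   # everything before the value
            rights.append(page[end:])    # everything after the value
            cursor = end
    # The left delimiter must end every left context; the right one must start every right context.
    left, right = _common_suffix(lefts), _common_prefix(rights)
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def apply_lr_wrapper(page, wrapper):
    """Extract every string that sits between the learned delimiters."""
    left, right = wrapper
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        # Resume at `end`, not past `right`: the delimiters may overlap (e.g. "</li><li>").
        pos = end
    return values
```

Trained on `<ul><li>1 cup rice</li><li>2 cups water</li></ul>`, it learns the delimiters `><li>` and `</li><`, which then pull `3 eggs`, `1 tbsp butter`, and `salt` out of a second page with the same layout. A plain LR wrapper would also grab `<li>` items in a directions list; the HLRT variant adds head and tail delimiters to confine extraction to the ingredient section.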
Parsing
• Recipe metadata
  – Title, summary, serving size, prep time, etc.
• Ingredient list
  – Amount, unit, food item
• Directions
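Splitting an ingredient line into amount, unit, and food item can be sketched with a regular expression for the amount and a lookup table for the unit. This is an illustration, not the presentation's parser: the unit vocabulary is a small hypothetical sample, amounts are kept as exact fractions, and lines such as "one 15-oz can tomatoes" are not handled.

```python
import re
from fractions import Fraction

# Hypothetical unit vocabulary mapping spellings to a canonical unit.
UNITS = {
    "cup": "cup", "cups": "cup",
    "tablespoon": "tbsp", "tablespoons": "tbsp", "tbsp": "tbsp",
    "teaspoon": "tsp", "teaspoons": "tsp", "tsp": "tsp",
    "pound": "lb", "pounds": "lb", "lb": "lb", "lbs": "lb",
    "ounce": "oz", "ounces": "oz", "oz": "oz",
    "clove": "clove", "cloves": "clove",
}

# Amount: a mixed number ("1 1/2"), a fraction ("1/2"), or a whole/decimal number.
AMOUNT_RE = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*")


def parse_amount(text):
    """Turn '1 1/2', '1/2', '2' or '0.5' into an exact Fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))


def parse_ingredient(line):
    """Split e.g. '1 1/2 cups long-grain rice' into (amount, unit, food item)."""
    amount, unit, rest = None, None, line.strip()
    match = AMOUNT_RE.match(rest)
    if match:
        amount = parse_amount(match.group(1))
        rest = rest[match.end():]
    words = rest.split(maxsplit=1)
    # Treat the first word as a unit only if it is in the vocabulary ("tsp." allowed).
    if words and words[0].lower().rstrip(".") in UNITS:
        unit = UNITS[words[0].lower().rstrip(".")]
        rest = words[1] if len(words) > 1 else ""
    return amount, unit, rest.strip()
```

For instance, `parse_ingredient("1 1/2 cups long-grain rice")` returns `(Fraction(3, 2), 'cup', 'long-grain rice')`, while `"2 eggs"` yields no unit and `"Salt to taste"` yields neither amount nor unit, leaving the whole line as the food item.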
Existing Software
• MasterCook™, the leading recipe software product
• Manual import features
• Slow full-text search
• Starting database has just over 8,000 recipes
Questions?