Machine Learning Approaches to Brewing Beer

Operation Brewster Draft – Gregg Barrett


Posted on 14-Apr-2017



Requirement

- Classify a style based on ingredients and preparation. The brewer has some ingredients at hand and/or options in mind: what style do these ingredients and preparation options fit with?

- For a given style, suggest ingredients, ingredient amounts and preparation to create a new recipe:

- Typical: complementary ingredients, with suggested amounts

- Non-typical: non-complementary ingredients, with suggested amounts

Outline

- Section 1: The data set

- Section 2: Classification

- Section 3: Ingredient combination

- Section 4: Ingredient amount

- Section 5: Other considerations

- Conclusion

- Reference

Section 1

The data set

The data set

- Data is randomly sampled into:

- Training set containing 60% of the data

- Validation set containing 20% of the data

- Test set containing the remaining 20% of the data

- Training set used to train the model

- Validation set used to assess model performance on unseen data and choose between models and their tuning parameters

- Test set used to assess how our final selected model performs on unseen data
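As a minimal sketch, the 60/20/20 split could look like this (the `recipes` records are hypothetical placeholders for the real data set):

```python
import random

random.seed(42)

# Hypothetical recipe records standing in for the real data set.
recipes = [{"id": i, "style": random.choice(["IPA", "Stout", "Lager"])} for i in range(100)]

random.shuffle(recipes)

# 60/20/20 split into training, validation and test sets.
n = len(recipes)
train = recipes[: int(0.6 * n)]
validation = recipes[int(0.6 * n): int(0.8 * n)]
test = recipes[int(0.8 * n):]
```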

The data set

Approach 1:

- Training set

- Validation set

- Test set

If there are large imbalances in the styles:

Approach 2:

- Training set subsampled to have an equal number of recipes of each style

- Validation set – same set as used in approach 1

- Test set – same set as used in approach 1

Use both approaches and see which performs best on the validation and test data.
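A balanced training set for approach 2 could be subsampled along these lines (the style counts here are invented for illustration):

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical training set with a heavy style imbalance (70/20/10).
training = ([{"style": "IPA"}] * 70
            + [{"style": "Stout"}] * 20
            + [{"style": "Lager"}] * 10)

# Group recipes by style.
by_style = defaultdict(list)
for recipe in training:
    by_style[recipe["style"]].append(recipe)

# Subsample every style down to the size of the rarest style.
smallest = min(len(group) for group in by_style.values())
balanced = []
for group in by_style.values():
    balanced.extend(random.sample(group, smallest))

balanced_counts = {style: sum(1 for r in balanced if r["style"] == style)
                   for style in by_style}
```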

The data set

- A full Exploratory Data Analysis (EDA) is needed before moving forward with any modelling effort

- The EDA will assist in identifying the scope of the data cleansing requirements

- Transformation of features can be explored

- Feature engineering (creating new features from existing features) can be explored

- If there are missing values, decisions will need to be made:

- Removing recipes with missing values from the data set

- Imputing values (mean/median/mode)

- Building models to predict the missing values

- Based on the requirement provided it appears that there are no missing values: “The variables contained in the dataset are sufficient for the brewer to replicate the beer in question”. One would therefore assume that features/variables that contain no values are not relevant to the beer recipe in question.

- The data set provided does not contain instructions. It does contain information on boiling time, but no sequence of actions. The assumption is therefore that instructions are not needed for this requirement – the person brewing the beer only needs ingredient information and boiling time – and knows how to put it all together.

- The data set provided does not contain a data dictionary, so one would need to be compiled
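The mean/median/mode imputation options above can be sketched as follows, using hypothetical boil times with `None` marking a missing value:

```python
import statistics

# Hypothetical boil times (minutes); None marks a missing value.
boil_times = [60, 90, None, 60, 75, None, 60]

observed = [t for t in boil_times if t is not None]
mean_fill = statistics.mean(observed)      # 69
median_fill = statistics.median(observed)  # 60
mode_fill = statistics.mode(observed)      # 60

# Impute with the median (robust to outliers); the other fills work the same way.
imputed = [t if t is not None else median_fill for t in boil_times]
```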

Section 2

Classification

Classification

- Classification problem

- Supervised learning

- Specifically, by treating recipes as instances, ingredients and preparation (like boiling time) as features, and style as class labels, the aim is to build a classifier model to predict the styles of recipes.

- Using unsupervised learning may also be helpful

- Visual depiction of the similarities and differences between the styles

- Possibly provide some insight into which features are useful in defining styles

Classification

- A range of classification techniques should be considered to see what works best.

- Supervised:

- Logistic Regression

- Linear Discriminant Analysis (LDA)*

- Quadratic Discriminant Analysis (QDA)*

- Generalised Additive Model (GAM)

- Random Forest

- Gradient Boosting

- Support Vector Machine (SVM)

- Neural Networks

- K-Nearest Neighbours (KNN)

- Unsupervised:

- Principal Component Analysis (PCA) - derive variables for use in supervised learning

- K-Means Clustering

- Hierarchical Clustering

- Ensemble consisting of any number of the above

* Strictly speaking, LDA and QDA should not be used with qualitative predictors, but in practice they often are if the goal is simply to find a good predictive model.

Classification

- Neural Networks and Gradient Boosting are powerful techniques, but with no prior knowledge or experience in the domain of brewing recipes and the data set in question, what grounds are there, aside from their good performance in general, to assume these will be the best techniques for the challenge at hand? There is no harm in trying a variety of techniques.

- The best solution may be an ensemble of techniques.

Classification

Assessing classification performance between the various techniques on the validation data:

- A plethora of measures of performance

- Initial thoughts are to compare Cohen’s kappa for the various (supervised) techniques on the validation data. (Ben-David, 2007)

- It may be worth investigating other measures of performance (Valverde-Albacete, Peláez-Moreno, 2010).
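As a sketch of the proposed comparison metric, Cohen's kappa can be computed directly from validation labels; the style labels below are hypothetical:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between predicted and true labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick each class independently.
    expected = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical validation labels for three styles.
y_true = ["IPA", "IPA", "Stout", "Lager", "Stout", "IPA"]
y_pred = ["IPA", "Stout", "Stout", "Lager", "Stout", "IPA"]
kappa = cohen_kappa(y_true, y_pred)
```

Kappa would be computed this way for each candidate model on the validation set, and the models ranked by it.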

Section 3

Ingredient combination

Ingredient combination

For a given style, suggest ingredients, where "suggest" is a tuneable parameter:

- Typical - uses complementary ingredients

- Non-typical - uses non-complementary ingredients

Calculation of complementary ingredients:

- Pairing - Pairwise Bayesian probabilities

- Ingredient Network

Or

Learn a generative probabilistic model from the ingredient data, then randomly sample it and observe the resulting ingredient combinations:

- Deep Belief Network (DBN)

Or

Creation of ingredient clusters:

- Principal Component Analysis (PCA)

Pairing within a style:

Calculate pairwise probabilities of ingredients from the training data by counting how many times each pair of ingredients appears in the set of recipes within a style.

It would be ideal to maximize the probability over the entire subset. However, this would entail a large search space.

The approach could therefore be to start with a set specified by the brewer and iteratively add new ingredients to the set by taking the most feasible (in the case of a “typical” recipe) from the remaining ingredients using the joint probabilities of the new ingredient with only the last added one.

Stop adding new ingredients once the probability of adding a new one goes below a certain threshold.

(Naik, Polamreddi, 2015)
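The greedy pairwise procedure described above might be sketched as follows, with a toy set of recipes standing in for a single style's training data (ingredient names and the threshold are invented):

```python
from itertools import combinations
from collections import Counter

# Toy recipes for one style, each a set of ingredients.
recipes = [
    {"pale malt", "cascade", "ale yeast"},
    {"pale malt", "cascade", "ale yeast", "crystal malt"},
    {"pale malt", "citra", "ale yeast"},
    {"pilsner malt", "saaz", "lager yeast"},
]

# Count pairwise co-occurrence within the style.
pair_counts = Counter()
for recipe in recipes:
    for a, b in combinations(sorted(recipe), 2):
        pair_counts[(a, b)] += 1

def pair_prob(a, b):
    """Empirical probability that a and b co-occur in a recipe of this style."""
    key = tuple(sorted((a, b)))
    return pair_counts[key] / len(recipes)

def grow_recipe(start, candidates, threshold=0.3):
    """Greedily add the candidate most probable given the last-added ingredient,
    stopping once that probability drops below the threshold."""
    chosen = list(start)
    remaining = set(candidates) - set(chosen)
    while remaining:
        best = max(remaining, key=lambda c: pair_prob(chosen[-1], c))
        if pair_prob(chosen[-1], best) < threshold:
            break
        chosen.append(best)
        remaining.discard(best)
    return chosen

all_ingredients = set().union(*recipes)
suggestion = grow_recipe(["pale malt"], all_ingredients)
```

Raising the threshold yields a more "typical" recipe; lowering it (or minimising the probability instead) pushes towards non-typical combinations.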

Ingredient combination

Ingredient network within a style:

Another approach could be to use an ingredient network in which two nodes (ingredients) are connected if they appear together in at least one recipe. The weight of each link represents the number of shared recipes, turning the ingredient network into a weighted network. (This is the one-mode projection of the bipartite recipe-ingredient network.)

The approach could be to start with a set specified by the brewer and iteratively extend it by taking (in the case of a "typical" recipe) the remaining ingredient with the highest link weight to the last-added ingredient, moving along the network and tracking the highest weight for each new ingredient.

Stop adding new ingredients once the weight falls below a certain threshold.

(Ahn, Ahnert, Bagrow, Barabasi, 2011)
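A minimal sketch of the weighted-network walk, assuming toy recipes and an invented weight threshold:

```python
from collections import defaultdict
from itertools import combinations

# Toy recipes for one style.
recipes = [
    {"pale malt", "cascade", "ale yeast"},
    {"pale malt", "cascade", "ale yeast", "crystal malt"},
    {"pale malt", "citra", "ale yeast"},
]

# Weighted ingredient network: edge weight = number of recipes the pair shares.
network = defaultdict(lambda: defaultdict(int))
for recipe in recipes:
    for a, b in combinations(sorted(recipe), 2):
        network[a][b] += 1
        network[b][a] += 1

def walk(start, threshold=2):
    """Follow the heaviest edge from the last-added ingredient until weights drop
    below the threshold."""
    path = [start]
    visited = {start}
    while True:
        neighbours = {n: w for n, w in network[path[-1]].items() if n not in visited}
        if not neighbours:
            break
        best = max(neighbours, key=neighbours.get)
        if neighbours[best] < threshold:
            break
        path.append(best)
        visited.add(best)
    return path

suggestion = walk("cascade")
```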

Ingredient combination

Deep Belief Network within a style:

A Deep Belief Network (DBN) could be used to learn generative models of ingredient distributions within each style.

We could then randomly sample it and observe the resulting ingredient combinations.

Changing the parameters of the DBN (the network shape) could lead to different results, giving new combinations of ingredients, varying ingredient lists, etc.

(Nedovic, 2013)
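As a rough sketch, a single restricted Boltzmann machine (the building block of a DBN) can be fitted with scikit-learn and sampled via Gibbs steps; the binary ingredient matrix here is random placeholder data, and the layer sizes are arbitrary:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)

# Placeholder binary ingredient matrix: rows = recipes, columns = ingredient present/absent.
X = rng.binomial(1, 0.3, size=(200, 12))

# A single RBM stands in for the full DBN stack in this sketch.
rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)

# Sample a new ingredient combination by running Gibbs steps from a random start.
v = rng.binomial(1, 0.5, size=(1, 12)).astype(float)
for _ in range(100):
    v = rbm.gibbs(v)
new_combination = v.astype(int)
```

Varying `n_components` (the network shape) changes the learned distribution and hence the sampled combinations.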

Ingredient combination

Principal Component Analysis within a style:

Ingredients can be clustered based on their use in recipes. Within a cluster, ingredients can be suggested and selected by the brewer.

(De Clercq, 2014)
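A sketch of the clustering idea, assuming a hypothetical ingredient-by-recipe usage matrix: project ingredients with PCA, then cluster them in the reduced space with k-means (component and cluster counts are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Placeholder usage matrix: rows = ingredients, columns = recipes (1 = used).
usage = rng.binomial(1, 0.4, size=(30, 100)).astype(float)

# Project ingredients into a low-dimensional space, then cluster them there.
components = PCA(n_components=5, random_state=0).fit_transform(usage)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)

# Ingredients sharing a cluster label are candidates to suggest together.
clusters = {k: np.where(labels == k)[0].tolist() for k in range(4)}
```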

Section 4

Ingredient amount

Ingredient amount

In particular, given n ingredients and n − 1 amounts, the brewer wants to find the nth amount:

- Clustering

- Dimension reduction

- Regression

(Safreno, Deng, 2013)
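A regression sketch for the nth amount, using synthetic amounts in which the third ingredient roughly tracks the first two:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# Synthetic amounts (kg) for three ingredients across 50 recipes of one style;
# the third amount is roughly a linear function of the first two.
known = rng.uniform(0.5, 5.0, size=(50, 2))
target = 0.4 * known[:, 0] + 0.2 * known[:, 1] + rng.normal(0, 0.05, size=50)

# Regress the missing nth amount on the n - 1 known amounts.
model = LinearRegression().fit(known, target)

# The brewer supplies two amounts; the model suggests the third.
suggested = model.predict([[3.0, 1.5]])[0]
```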

Section 5

Other considerations

Other considerations

Ingredient-instruction dependency tree representation

Simplified Ingredient Merging Map in Recipes (SIMMR)

SIMMR represents a recipe as a dependency tree whose leaves (terminal nodes) are the recipe ingredients and whose internal nodes are the recipe instructions. The SIMMR representation captures the high-level flow of ingredients without modelling the semantics of each individual instruction.

(Jermsurawong, Habash, 2015)

Other considerations

Once the style, ingredients and amounts have been selected, generate instructions using pairwise Bayesian probabilities:

Instructions for a recipe are a sequence of actions, each of which is a tuple of verb and ingredient.

Action-Ingredient-Verb Probabilities:

First choose an ingredient to work on, given the previous action performed. Then, a verb is predicted conditioned on both the previous complete action and the new ingredient chosen. Thus, this model assumes a logical ordering of ingredients that we work on during a particular preparation and a logical set of verbs that can possibly be performed on a given ingredient.

(Naik, Polamreddi, 2015)
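The action-ingredient-verb model could be sketched as a pair of conditional count tables; the instruction sequences below are invented, and the generator greedily takes the most likely choice at each step rather than sampling:

```python
from collections import defaultdict, Counter

# Invented instruction sequences: each action is a (verb, ingredient) tuple.
sequences = [
    [("mill", "pale malt"), ("mash", "pale malt"), ("boil", "cascade"), ("pitch", "ale yeast")],
    [("mill", "pale malt"), ("mash", "pale malt"), ("boil", "citra"), ("pitch", "ale yeast")],
]

# Counts for P(next ingredient | previous action) and P(verb | previous action, ingredient).
next_ingredient = defaultdict(Counter)
verb_given = defaultdict(Counter)
for seq in sequences:
    for prev, (verb, ing) in zip(seq, seq[1:]):
        next_ingredient[prev][ing] += 1
        verb_given[(prev, ing)][verb] += 1

def generate(start, steps=3):
    """First pick the most likely ingredient given the previous action,
    then the most likely verb given that action and ingredient."""
    actions = [start]
    for _ in range(steps):
        prev = actions[-1]
        if not next_ingredient[prev]:
            break
        ing = next_ingredient[prev].most_common(1)[0][0]
        verb = verb_given[(prev, ing)].most_common(1)[0][0]
        actions.append((verb, ing))
    return actions

plan = generate(("mill", "pale malt"))
```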

Other considerations

Data as a graph:

If the data is modelled as graphs, we could use the subgraph mining algorithm FSG (frequent subgraph discovery) and then compute a recipe similarity measure. Using this method, the brewer can perform similarity search over the graph structure, shared characteristics, and distinct characteristics of each recipe.

(Wang, Li, Li, Dong, Yang, 2008)

Other considerations

Visual mapping with t-SNE:

To visualize the data and obtain some insight into its structure, t-Distributed Stochastic Neighbor Embedding (t-SNE) can be used.

Also of consideration is a parametric version of t-SNE that allows for generalization to held-out validation data by using the t-SNE objective function to train a neural network that provides an explicit mapping to a low-dimensional space.

(van der Maaten, Hinton, 2008)
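A minimal t-SNE sketch with scikit-learn, on random placeholder features (the perplexity value is arbitrary, subject only to being below the sample count):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)

# Placeholder recipe feature matrix: 60 recipes, 10 numeric features.
X = rng.normal(size=(60, 10))

# Embed into 2-D for plotting; perplexity must be less than the number of samples.
embedding = TSNE(n_components=2, perplexity=15, init="pca",
                 random_state=0).fit_transform(X)
```

The 2-D `embedding` can then be scatter-plotted, coloured by style, to inspect how well the styles separate.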

Other considerations

Ingredient complement network:

Construct an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients. The PMI compares the probability that two ingredients occur together against the probability that they would co-occur by chance if they were independent.

(Teng, Lin, Adamic, 2012)
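The PMI computation can be sketched directly from co-occurrence counts over a toy recipe list (positive PMI suggests complementary ingredients, negative PMI the opposite):

```python
import math
from itertools import combinations
from collections import Counter

# Toy recipes, each a set of ingredients.
recipes = [
    {"pale malt", "cascade", "ale yeast"},
    {"pale malt", "cascade", "crystal malt"},
    {"pilsner malt", "saaz"},
    {"pale malt", "saaz"},
]

n = len(recipes)
single = Counter(i for r in recipes for i in r)
pair = Counter(tuple(sorted(p)) for r in recipes for p in combinations(r, 2))

def pmi(a, b):
    """PMI: log of the joint co-occurrence probability over the product of the marginals."""
    key = tuple(sorted((a, b)))
    p_ab = pair[key] / n
    p_a, p_b = single[a] / n, single[b] / n
    return math.log(p_ab / (p_a * p_b))

score = pmi("pale malt", "cascade")  # positive: the pair co-occurs more than chance
```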

Conclusion

Conclusion

If time and resources were severely limited:

- Requirement: Classify a style based on ingredients and preparation

- Pursue Boosting and Neural Networks for this requirement

- Requirement: For a given style suggest ingredients, ingredient amounts and preparation to create a new recipe

- Use a Deep Belief Network (DBN) and randomly sample it to derive ingredient combinations

Reference

Ahn, Y., Ahnert, S., Bagrow, J., Barabasi, A. (2011). Flavor network and the principles of food pairing. [pdf]. Retrieved from http://www.nature.com

Ben-David, A. (2007). A lot of randomness is hiding in accuracy. [pdf]. Retrieved from http://www.sciencedirect.com

De Clercq, M. (2014). Prediction of Ingredient Combinations using Machine Learning Techniques. [pdf]. Retrieved from http://lib.ugent.be/fulltxt/RUG01/002/166/653/RUG01-002166653_2014_0001_AC.pdf

Jermsurawong, J., Habash, N. (2015). Predicting the Structure of Cooking Recipes. [pdf]. Retrieved from http://www.aclweb.org/anthology/D15-1090

Naik, J., Polamreddi, V. (2015). Cuisine Classification and Recipe Generation. [pdf]. Retrieved from http://cs229.stanford.edu/proj2015/233_report.pdf

Nedovic, V. (2013). Learning recipe ingredient space using generative probabilistic models. [pdf]. Retrieved from http://liris.cnrs.fr/cwc/papers/cwc2013_submission_2.pdf

Safreno, D., Deng, Y. (2013). The Recipe Learner. [pdf]. Retrieved from http://cs229.stanford.edu/proj2013/DengSafreno-TheRecipeLearner.pdf

Teng, C., Lin, Y., Adamic, L. (2012). Recipe recommendation using ingredient networks. [pdf]. Retrieved from https://arxiv.org/pdf/1111.3919.pdf

Valverde-Albacete, F., Peláez-Moreno, C. (2010). Two information-theoretic tools to assess the performance of multi-class classifiers. [pdf]. Retrieved from http://www.sciencedirect.com

van der Maaten, L., Hinton, G. (2008). Visualizing Data using t-SNE. [pdf]. Retrieved from http://www.jmlr.org/papers/v9/vandermaaten08a.html

Wang, L., Li, Q., Li, N., Dong, G., Yang, Y. (2008). Substructure Similarity Measurement in Chinese Recipes. [pdf]. Retrieved from http://wwwconference.org/www2008/papers/pdf/p979-wang.pdf
