TRANSCRIPT
Decision-making Bias in Instance Matching Model Selection
Mayank Kejriwal, Daniel P. Miranker
Acknowledgements: US National Science Foundation, Microsoft Research
Instance Matching
A 50+ year-old Artificial Intelligence problem
When do two instances refer to the same underlying entity?
“Record linkage: making maximum use of the discriminating power of
identifying information.” Newcombe and Kennedy (1962)
Numerous surveys by Winkler (2006), Rahm et al. (2010), and others
Machine learning
Classifier example: feedforward multilayer perceptron (MLP)
“Machine Learning: an artificial intelligence approach.” Michalski,
Carbonell and Mitchell (2013)
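As an illustration (not part of the talk), here is a minimal sketch of such a classifier, assuming scikit-learn and synthetic stand-in data; the feature interpretation is hypothetical:

# A minimal sketch (assuming scikit-learn; data is synthetic and
# hypothetical) of a feedforward MLP used as a match classifier.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Each row stands in for similarity features computed over a record
# pair (e.g. name similarity, address similarity); the label says
# whether the pair refers to the same underlying entity.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)                  # training fits the edge weights
print(clf.score(X, y))         # training accuracy on the pair labels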
Supervised machine learning
▪ Requires a (manually) labeled set for both training and
validation
▪ Typically acquired by sampling a ground truth
▪ Training: classifier parameters (e.g. the edge weights of an MLP)
▪ Validation: classifier hyperparameters (e.g. the number of
layers, nodes, learning rate...)
▪ Also requires model selection decisions:
▪ Which training algorithm?
▪ What sampling technique?
▪ How to split the data for training/validation?
▪ The answers are not obvious (see the sketch below)
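To make the parameter/hyperparameter distinction concrete, a minimal sketch (scikit-learn assumed; the 90/10 split and the candidate layer sizes are illustrative choices, not the paper's):

# A minimal sketch (scikit-learn assumed; split ratio and candidate
# hyperparameters are illustrative) of training vs. validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Model selection decision: how to split the labeled data (here 90/10).
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1,
                                            random_state=0)

best_score, best_hidden = -1.0, None
for hidden in [(16,), (32,), (64,)]:        # hyperparameter candidates
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                        random_state=0)
    clf.fit(X_tr, y_tr)                     # training: fit edge weights
    score = clf.score(X_val, y_val)         # validation: pick hyperparameters
    if score > best_score:
        best_score, best_hidden = score, hidden
print(best_hidden, best_score)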
Model Selection Exercise
What percentage of labeled data should I
use for training and what percentage for
validation?
What do other people do?
The most common approach in the literature is a ten-fold
split (and, less often, a two-fold split)
What if I care more about one performance metric (say,
recall over precision) within reasonable constraints?
What if I have sampled and labeled a lot of data (say,
90% of the estimated ground truth)?
Should answers to these questions (and others) bias my
decision?
“Semi-supervised instance matching using boosted classifiers.” Kejriwal
and Miranker (2015)
Let’s do an experiment
Split      Labeled data (% of ground truth)   Precision   Recall
Ten-fold   10%                                54.13%      25.77%
Ten-fold   50%                                61.51%      28.77%
Ten-fold   90%                                73.27%      27.69%
Two-fold   10%                                45.47%      35.64%
Two-fold   50%                                55.50%      34.92%
Two-fold   90%                                66.67%      36.92%
Results for the Amazon-GoogleProducts benchmark, using an MLP
Consistent results across two other benchmarks and
several experimental controls
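To make the shape of this experiment concrete, a minimal sketch (not the paper's pipeline; the data and classifier are synthetic stand-ins, and the benchmark itself is not used here):

# A minimal sketch (not the paper's pipeline; data and classifier are
# synthetic stand-ins) of the experiment's shape: vary the labeled
# fraction and the train/validation split, then record precision/recall.
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Imbalanced labels, as in instance matching (most pairs are non-matches).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9], random_state=0)

for labeled_frac in (0.1, 0.5, 0.9):                  # % of "ground truth"
    for val_frac, name in ((0.1, "ten-fold"), (0.5, "two-fold")):
        X_lab, X_rest, y_lab, y_rest = train_test_split(
            X, y, train_size=labeled_frac, random_state=0)
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_lab, y_lab, test_size=val_frac, random_state=0)
        # (X_val/y_val would drive hyperparameter tuning; omitted here.)
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                            random_state=0).fit(X_tr, y_tr)
        y_pred = clf.predict(X_rest)                  # held-out evaluation
        print(name, labeled_frac,
              round(precision_score(y_rest, y_pred, zero_division=0), 4),
              round(recall_score(y_rest, y_pred, zero_division=0), 4))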
Concluding the exercise
What if I care more about recall than precision?
I should choose a two-fold split (contrary to what the
literature would suggest)
What if I have sampled and labeled a lot of data (say,
90% of the estimated ground truth)?
An irrelevant concern, once the metric is specified
Takeaway: Some model selection decisions can bias other
model selection decisions, not always in an obvious way
How do we make informed model
selection decisions?
Decision-making and Model
Selection
Cognitive psychology has shown (empirically) that
human beings are neither logical nor rational
Wason Selection Task
Prospect Theory (recognized by the 2002 Nobel Memorial Prize
in Economics)
“Reasoning about a rule.” Wason (1968)
“The logic of social exchange: Has natural selection shaped how humans
reason? Studies with the Wason selection task.” Cosmides (1989)
“Prospect theory: an analysis of decision under risk.” Kahneman and
Tversky (1979)
One systematic method is to
start by...
Visualizing decision-making biases through capturing
influences between decisions
[Figure: influence diagram over the nodes "Labeling budget",
"Computational resources", "Training/Validation split", and
"Performance metric", with a legend marking decision nodes]
Concise approach: bipartite graphs
“Bipartite graphs and their applications.” Asratian et al. (1998)
[Figure: the same nodes arranged as a bipartite graph, with
decisions on one side and nodes of influence on the other]
The interpretation of the nodes and edges is
abstract (we don’t impose strict requirements)
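As one concrete rendering, a minimal sketch (assuming networkx; the assignment of nodes to the two sides is one plausible reading of the figure, not prescribed by the paper):

# A minimal sketch (assuming networkx; the node assignment to sides is
# one plausible reading of the slide, not prescribed by the paper).
import networkx as nx

G = nx.Graph()
decisions = ["Training/Validation split"]                    # decision side
influences = ["Labeling budget", "Computational resources",
              "Performance metric"]                          # influence side
G.add_nodes_from(decisions, bipartite=0)
G.add_nodes_from(influences, bipartite=1)

# Edges capture hypothesized influence on the decision.
G.add_edges_from(("Training/Validation split", inf) for inf in influences)

assert nx.is_bipartite(G)
print(sorted(G.edges()))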
Hypothesizing about biases
The art in model selection: are there edges we should
consider removing/adding?
In the paper, we form at least four hypotheses that
directly translate to recommendations
[Figure: the bipartite influence graph repeated from the previous slide]
Experimental platform
Collected over 25 GB of data on the Microsoft Azure ML platform
Used three publicly available benchmarks
Efficiency Recommendation 1
Validation is usually much faster than training,
especially for expressive classifiers
Run-time reductions of almost 70%, with proportionally
smaller losses in effectiveness
Recommendation: consider favoring more validation over
training if speed is an important concern
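To see why, a minimal sketch (timings are machine-dependent and purely illustrative):

# A minimal sketch (timings are machine-dependent and illustrative) of
# why validation is cheaper: scoring is a forward pass, training is not.
import time
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)

t0 = time.perf_counter()
clf.fit(X, y)                          # training: many gradient steps
t1 = time.perf_counter()
clf.predict(X)                         # validation-style scoring: one pass
t2 = time.perf_counter()
print(f"train: {t1 - t0:.2f}s   score: {t2 - t1:.4f}s")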
Efficiency Recommendation 2
Grid search is no more effective than random search for
default hyperparameter values
Mean difference less than 0.99% and not statistically
significant
Recommendation: Favor random search in your
hyperparameter optimization as it is much faster (over 90%
run-time decrease)
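In scikit-learn terms (the search space here is illustrative, not the paper's), the contrast looks like this:

# A minimal sketch (scikit-learn assumed; the grid is illustrative, not
# the paper's search space) contrasting grid search with random search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
space = {"hidden_layer_sizes": [(16,), (32,), (64,)],
         "learning_rate_init": [1e-3, 1e-2, 1e-1]}

grid = GridSearchCV(MLPClassifier(max_iter=300), space, cv=2)
# grid.fit(X, y) would evaluate all 9 configurations exhaustively.
rand = RandomizedSearchCV(MLPClassifier(max_iter=300), space,
                          n_iter=3, cv=2, random_state=0)
rand.fit(X, y)      # random search tries only 3 sampled configurations
print(rand.best_params_)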
Concluding notes
Hard problems (e.g. instance matching) require an
ingenious combination of heuristics, biases and models
Understanding decision-making biases can help us do
better model selection
Can also help to identify experimental confounds!
There are many proposals to visualize decision-making,
but not decision-making bias
We proposed a bipartite graph as a good candidate
The visualization is not just a pedantic exercise
About 25 GB of data shows that it can also be useful
Many future directions!
kejriwalresearch.azurewebsites.net
https://sites.google.com/a/utexas.edu/mayank-kejriwal/projects/semantics-and-model-selection
What biases go into your
model selection process?