PLOTCON 2016 visualization talk by Alexandra Johnson

Visualizing Abstract Concepts in Machine Learning

Alexandra Johnson

Software Engineer @ SigOpt

#MachineLearning #MLViz



What is Machine Learning?

[Figure labels: Versicolor, Setosa, Virginica]

Training Data + Model -> Labels (Classification) or Numbers (Regression)

Why is this so Intimidating?


In-browser deep neural net from playground.tensorflow.org

Hyperparameters = your model's magic numbers

Examples: learning rate, ratio of train to test data, number of hidden layers, neurons per hidden layer

Hyperparameter values must be set before training

Solution: Hyperparameter Optimization

And four visualization challenges


Values you choose for your hyperparameters have a direct effect on the performance of your model

Hard to capture interactions of 20 hyperparameters

20 Dimensional Math is Hard


[Figure: Accuracy (0.2 to 1) vs. log_C (−15 to 5)]



First try: graph model performance vs. hyperparameter value, for every hyperparameter

Good for understanding individual hyperparameters, bad for understanding interactions
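The "first try" above can be sketched as a one-at-a-time sweep: vary a single hyperparameter, hold the others fixed, and record performance at each value. This is only a sketch under stated assumptions. `toy_accuracy`, `sweep`, the `gamma` hyperparameter, and the baseline values are hypothetical stand-ins, not from the talk, and the scores are synthetic rather than measured.

```python
import math

def toy_accuracy(log_c: float, gamma: float) -> float:
    """Hypothetical stand-in for training and scoring a real model."""
    # Synthetic surface that peaks at log_c = -2, gamma = 0.5.
    return math.exp(-((log_c + 2.0) ** 2) / 20.0 - (gamma - 0.5) ** 2)

def sweep(name, values, baseline):
    """Vary one hyperparameter while holding the others at baseline."""
    results = []
    for v in values:
        params = dict(baseline, **{name: v})
        results.append((v, toy_accuracy(**params)))
    return results

baseline = {"log_c": 0.0, "gamma": 0.5}
log_c_curve = sweep("log_c", [-15, -10, -5, 0, 5], baseline)

# Each (value, accuracy) pair is one point on a per-hyperparameter plot.
for value, acc in log_c_curve:
    print(f"log_c={value:>4}  accuracy={acc:.3f}")
```

Because `gamma` never moves during the `log_c` sweep, the resulting curve says nothing about how the two interact, which is the limitation the slide points out.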

[Figure: plot with an Accuracy scale from 0.3 to 0.9]



Graph up to 4 dimensions at once: x, y, z axis + color

Hard to visualize 4 dimensions at once, imagine 20!

Maybe you want to use an algorithm to handle hyperparameter optimization


Hyperparameter Optimization Strategies are Different

Grid Search Random Search Bayesian Optimization
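As a rough illustration of how two of these strategies differ, here is a minimal sketch of grid search and random search over two hyperparameters. The `objective` function and the value ranges are hypothetical, chosen only so the example runs. Bayesian optimization is left out because it needs a surrogate model that does not fit in a short example.

```python
import itertools
import random

def objective(learning_rate, hidden_layers):
    """Hypothetical score to maximize; stands in for model accuracy."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (hidden_layers - 3) ** 2

def grid_search(lr_values, layer_values):
    """Evaluate every combination on a fixed grid and keep the best."""
    candidates = itertools.product(lr_values, layer_values)
    return max(candidates, key=lambda p: objective(*p))

def random_search(lr_range, layer_range, budget, seed=0):
    """Evaluate `budget` points sampled uniformly at random."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(*lr_range), rng.randint(*layer_range))
        for _ in range(budget)
    ]
    return max(candidates, key=lambda p: objective(*p))

# Both strategies get the same budget of 9 evaluations.
best_grid = grid_search([0.01, 0.1, 1.0], [1, 3, 5])
best_random = random_search((0.01, 1.0), (1, 5), budget=9)
print(best_grid, best_random)
```

Grid search is deterministic, while random search depends on its seed, so a single run of random search is only one draw from a distribution of possible outcomes.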

Some Strategies Produce Better Results

[Figure: Distribution of Best Found Values over Experiments of 25 Iterations. x-axis: Maximum Accuracy; y-axis: Experiments]


Experiment = optimizing the hyperparameters of your model, which results in some maximum performance

Some hyperparameter optimization strategies are stochastic, so you can't just look at one experiment

Look at the distribution of maximum performance over many experiments optimizing the hyperparameters of the same model
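The idea of a distribution over experiments can be sketched as follows: repeat a stochastic optimization many times and collect the best value each run finds. Everything here is an assumption made for illustration. `noisy_accuracy` is a synthetic score, random search stands in for the strategy, and 20 experiments is an arbitrary count; only the 25 iterations per experiment comes from the figure title.

```python
import random
from collections import Counter

def noisy_accuracy(x, rng):
    """Hypothetical model score for a hyperparameter x in [0, 1]."""
    score = 1.0 - (x - 0.3) ** 2 + rng.gauss(0, 0.01)
    return max(0.0, min(1.0, score))

def run_experiment(iterations, rng):
    """One random-search experiment; returns its best found accuracy."""
    return max(noisy_accuracy(rng.random(), rng) for _ in range(iterations))

rng = random.Random(42)
best_values = [run_experiment(25, rng) for _ in range(20)]

# Bucket the best values to two decimals to form a text histogram.
histogram = Counter(round(v, 2) for v in best_values)
for bucket in sorted(histogram):
    print(f"{bucket:.2f} {'#' * histogram[bucket]}")
```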

Some Strategies Produce Better Results

[Figure: Distribution of Best Found Values over Experiments of 25 Iterations. x-axis: Maximum Accuracy; y-axis: Experiments. Legend: Random Search, Grid Search, Bayesian Optimization]


Use the Mann-Whitney U Test to compare distributions of maximum performance
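For readers unfamiliar with the test, here is a minimal sketch of the Mann-Whitney U statistic computed from rank sums, with tied values given their average rank. The function name and the two samples are hypothetical and not results from the talk. The sketch stops at the statistic; in practice a library routine such as `scipy.stats.mannwhitneyu` would also supply a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), using average ranks for tied values."""
    pooled = sorted(
        [(v, "a") for v in sample_a] + [(v, "b") for v in sample_b],
        key=lambda pair: pair[0],
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == "a")
        rank_sum_a += average_rank * count_a
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical best-found accuracies from two strategies.
strategy_x = [0.97, 0.98, 0.99]
strategy_y = [0.93, 0.95, 0.96]
print(mann_whitney_u(strategy_x, strategy_y))
```

`U_a` counts the pairs in which a value from the first sample beats a value from the second, with ties counted as one half, so a `U_a` near `n_a * n_b` suggests the first strategy tends to reach higher maximum performance.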

Some Strategies Produce Better Results, Faster

[Figure: Best Seen Trace. x-axis: Timestep; y-axis: Best Seen Accuracy]


How much time do you have for optimization?

Strategies that reliably produce better results faster can optimize the hyperparameters of your model in less time
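A best seen trace is the running maximum of the accuracies observed so far, one value per timestep. A minimal sketch, assuming a hypothetical list of per-iteration accuracies:

```python
from itertools import accumulate

def best_seen_trace(accuracies):
    """Running maximum: the best accuracy observed up to each timestep."""
    return list(accumulate(accuracies, max))

# Hypothetical accuracies from six consecutive optimization iterations.
observed = [0.62, 0.58, 0.71, 0.69, 0.84, 0.80]
trace = best_seen_trace(observed)
print(trace)
```

The trace never decreases, so reading it at timestep t shows the performance you would have if optimization stopped there.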


[Figure: Interquartile Range of Best Seen Traces. x-axis: Timestep; y-axis: Best Seen Accuracy]


Again, consider a distribution of optimization experiments

25th to 75th percentile of performance our model could achieve if we stopped early
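One way to compute such a band is to take the 25th and 75th percentiles across experiments at every timestep. The sketch below assumes each experiment has already been reduced to a best seen trace of equal length. The traces are hypothetical, and the "inclusive" quantile method is an arbitrary choice, since the talk does not say which definition was used.

```python
from statistics import quantiles

def interquartile_band(traces):
    """Return per-timestep (25th, 75th) percentiles across traces."""
    band = []
    for values_at_t in zip(*traces):
        q1, _median, q3 = quantiles(values_at_t, n=4, method="inclusive")
        band.append((q1, q3))
    return band

# Hypothetical best seen traces from five experiments, three timesteps each.
traces = [
    [0.5, 0.6, 0.7],
    [0.4, 0.7, 0.8],
    [0.6, 0.6, 0.9],
    [0.3, 0.5, 0.6],
    [0.7, 0.8, 0.8],
]
band = interquartile_band(traces)
for t, (q1, q3) in enumerate(band):
    print(f"timestep {t}: {q1:.2f} to {q3:.2f}")
```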


[Figure: Interquartile Ranges of Best Seen Traces. x-axis: Timestep; y-axis: Best Seen Accuracy. Legend: Grid Search, Random Search, Bayesian Optimization]


Compare the area under the curve of different strategies

Further reading at sigopt.com/research
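The area comparison can be sketched with the trapezoidal rule applied to a best seen trace sampled at unit timesteps. The two traces are hypothetical, and the trapezoidal rule is an assumption here; the talk does not specify how the area was computed.

```python
def area_under_trace(trace):
    """Trapezoidal area under a trace sampled at unit timesteps."""
    return sum((a + b) / 2 for a, b in zip(trace, trace[1:]))

# Two hypothetical traces that end at the same accuracy.
fast = [0.5, 0.9, 0.9, 0.9]
slow = [0.5, 0.6, 0.7, 0.9]

print(area_under_trace(fast), area_under_trace(slow))
```

Both traces reach the same final accuracy, but the one that improves earlier has the larger area, which is why the area summarizes "better results, faster" in a single number.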

Takeaways


Hyperparameter optimization is an invaluable part of any modern machine learning pipeline

Concepts like comparing hyperparameter optimization strategies are extremely abstract and difficult to understand

Visualizations are in their infancy, but are an important part of explaining these ideas

Thank You!


Email: [email protected]

Twitter: @alexandraj777

www.sigopt.com