Sense - Data Driven NYC // June 2014

This I Believe (Data Science Edition) | Tristan Zajonc, Cofounder of Sense | [email protected] | @tristanzajonc


DESCRIPTION

Sense founder and CEO Tristan Zajonc presented at June's edition of Data Driven NYC. Sense allows you to collaborate on, scale, and deploy data analysis and advanced analytics projects radically faster.

TRANSCRIPT

Page 1: Sense - Data Driven NYC // June 2014

This I Believe (Data Science Edition)

[email protected] | @tristanzajonc

Tristan Zajonc, Cofounder of Sense

Page 2: Sense - Data Driven NYC // June 2014
Page 3: Sense - Data Driven NYC // June 2014
Page 4: Sense - Data Driven NYC // June 2014
Page 5: Sense - Data Driven NYC // June 2014

Business Intelligence Tools: Tableau, Cognos, ChartIO, GoodData, Birst, Platfora, Alpine Data Labs, Datameer, Looker, DataHero, Plotly, DataPad, Mode

Data Infrastructure: Hadoop, Oracle, AWS, GCE, Azure, Cloudera, Hortonworks, Mortar Data, Qubole, Treasure Data, AltiScale, TempoDB, Keen.IO, Cloudant, 1010Data, MongoDB

Data Providers: Bloomberg, Thomson Reuters, BlueKai, Acxiom, comScore

Analytical Apps: Google Analytics, Mixpanel, Sift Science, Custora, YieldStar

Analysts / Business Users

Developers / Ops Engineers

Page 6: Sense - Data Driven NYC // June 2014

Business Intelligence Tools: Tableau, Cognos, ChartIO, GoodData, Birst, Platfora, Alpine Data Labs, Datameer, Looker, DataHero, Plotly, DataPad, Mode

Data Infrastructure: Hadoop, Oracle, AWS, GCE, Azure, Cloudera, Hortonworks, Mortar Data, Qubole, Treasure Data, AltiScale, TempoDB, Keen.IO, Cloudant, 1010Data, MongoDB

Data Providers: Bloomberg, Thomson Reuters, BlueKai, Acxiom, comScore

Analytical Apps: Google Analytics, Mixpanel, Sift Science, Custora, YieldStar

Analysts / Business Users

Then a Miracle Occurs: Data Scientists

Developers / Ops Engineers

Page 7: Sense - Data Driven NYC // June 2014

How do we make miracles occur?

Page 8: Sense - Data Driven NYC // June 2014
Page 9: Sense - Data Driven NYC // June 2014

The Productivity Challenge: Delivering data science is painful, slow, and costly.

Page 10: Sense - Data Driven NYC // June 2014

There’s hope…

Page 11: Sense - Data Driven NYC // June 2014

Data Science Productivity Platforms (to be self-indulgent)

Page 12: Sense - Data Driven NYC // June 2014

https://senseplatform.com Also: Yhat, Domino Data Labs

Sense | Agile Data Science

Page 13: Sense - Data Driven NYC // June 2014

The Power Challenge: Our methods are not yet miraculous.

Page 14: Sense - Data Driven NYC // June 2014

There’s hope…

Page 15: Sense - Data Driven NYC // June 2014

Probabilistic Programming

Page 16: Sense - Data Driven NYC // June 2014
Page 17: Sense - Data Driven NYC // June 2014

The Promise of Probabilistic Programming Languages

Probabilistic programming could empower domain experts and ML experts.

• Shorter: Reduce LOC by 100x for machine learning applications
  • Seismic Monitoring: 28K LOC in C vs. 25 LOC in BLOG
  • Microsoft MatchBox: 15K LOC in C# vs. 300 LOC in Fun
• Faster: Reduce development time by 100x
  • Seismic Monitoring: Several years vs. 1 hour
  • Microsoft TrueSkill: Six months for a competent developer vs. 2 hours with Infer.NET
  • Enable quick exploration of many models
• More Informative: Develop models that are 10x more sophisticated
  • Enable surprising new applications
  • Incorporate rich domain knowledge
  • Produce more accurate answers
  • Require less data
  • Increase robustness with respect to noise
  • Increase ability to cope with contradiction
• With less expertise: Enable 100x more programmers
  • Separate the model (the program) from the solvers (the compiler), enabling domain experts without machine learning PhDs to write applications (a minimal sketch of this separation follows below)

Sources:
• Bayesian Data Analysis, Gelman, 2003
• Pattern Recognition and Machine Learning, Bishop, 2007
• Science, Tenenbaum et al., 2011


Source: http://www.darpa.mil/Our_Work/I2O/Programs/Probabilistic_Programming_for_Advanced_Machine_Learning_(PPAML).aspx
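To make that last bullet concrete, here is a minimal sketch (not from the deck or from any particular PPAML system) of the model/solver separation: the model is just a log-probability function a domain expert could write, and inference is a generic Metropolis sampler that knows nothing about the model. The prior, toy data, and step size below are illustrative assumptions.

```python
import numpy as np

# The "program": a model written as a log-probability function.
# Prior: mu ~ Normal(0, 10).  Likelihood: each observation ~ Normal(mu, 1).
def log_prob(mu, data):
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik

# The "solver": a generic Metropolis sampler that works for any log_prob.
def metropolis(log_prob, data, n_samples=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mu, samples = 0.0, []
    for _ in range(n_samples):
        proposal = mu + step * rng.normal()
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < log_prob(proposal, data) - log_prob(mu, data):
            mu = proposal
        samples.append(mu)
    return np.array(samples)

data = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # toy observations
posterior = metropolis(log_prob, data)
print("posterior mean of mu:", posterior.mean())
```

Swapping in a smarter solver (Hamiltonian Monte Carlo, variational inference) would not change the model at all, which is the separation the slide credits with opening the field to non-experts.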

Page 18: Sense - Data Driven NYC // June 2014

Deep Reinforcement Learning

Page 19: Sense - Data Driven NYC // June 2014
Page 20: Sense - Data Driven NYC // June 2014

[Figure 2 plots: Average Reward on Breakout, Average Reward on Seaquest, Average Q on Breakout, Average Q on Seaquest. Each plots average reward per episode or average action value (Q) against training epochs (0-100).]

Figure 2: The two plots on the left show average reward per episode on Breakout and Seaquest respectively during training. The statistics were computed by running an ε-greedy policy with ε = 0.05 for 10000 steps. The two plots on the right show the average maximum predicted action-value of a held-out set of states on Breakout and Seaquest respectively. One epoch corresponds to 50000 minibatch weight updates or roughly 30 minutes of training time.

Figure 3: The leftmost plot shows the predicted value function for a 30 frame segment of the game Seaquest. The three screenshots correspond to the frames labeled by A, B, and C respectively.

5.2 Visualizing the Value Function

Figure 3 shows a visualization of the learned value function on the game Seaquest. The figure shows that the predicted value jumps after an enemy appears on the left of the screen (point A). The agent then fires a torpedo at the enemy and the predicted value peaks as the torpedo is about to hit the enemy (point B). Finally, the value falls to roughly its original value after the enemy disappears (point C). Figure 3 demonstrates that our method is able to learn how the value function evolves for a reasonably complex sequence of events.

5.3 Main Evaluation

We compare our results with the best performing methods from the RL literature [3, 4]. The method labeled Sarsa used the Sarsa algorithm to learn linear policies on several different feature sets hand-engineered for the Atari task and we report the score for the best performing feature set [3]. Contingency used the same basic approach as Sarsa but augmented the feature sets with a learned representation of the parts of the screen that are under the agent's control [4]. Note that both of these methods incorporate significant prior knowledge about the visual problem by using background subtraction and treating each of the 128 colors as a separate channel. Since many of the Atari games use one distinct color for each type of object, treating each color as a separate channel can be similar to producing a separate binary map encoding the presence of each object type. In contrast, our agents only receive the raw RGB screenshots as input and must learn to detect objects on their own.

In addition to the learned agents, we also report scores for an expert human game player and a policy that selects actions uniformly at random. The human performance is the median reward achieved after around two hours of playing each game. Note that our reported human scores are much higher than the ones in Bellemare et al. [3]. For the learned methods, we follow the evaluation strategy used in Bellemare et al. [3, 5] and report the average score obtained by running an ε-greedy policy with ε = 0.05 for a fixed number of steps. The first five rows of table 1 show the per-game average scores on all games. Our approach (labeled DQN) outperforms the other learning methods by a substantial margin on all seven games despite incorporating almost no prior knowledge about the inputs.

We also include a comparison to the evolutionary policy search approach from [8] in the last three rows of table 1. We report two sets of results for this method. The HNeat Best score reflects the results obtained by using a hand-engineered object detector algorithm that outputs the locations and types of objects on the Atari screen.


Figure 1: Screen shots from five Atari 2600 games: (left-to-right) Pong, Breakout, Space Invaders, Seaquest, Beam Rider.

an experience replay mechanism [13] which randomly samples previous transitions, and thereby smooths the training distribution over many past behaviors.
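As a rough sketch of what such an experience replay memory involves (an illustration, not the paper's actual implementation; the capacity and batch size are assumed values):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state, done) transitions and
    returns uniformly random minibatches, which breaks up correlations and
    smooths the training distribution over many past behaviors."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)
```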

We apply our approach to a range of Atari 2600 games implemented in The Arcade Learning Environment (ALE) [3]. Atari 2600 is a challenging RL testbed that presents agents with a high dimensional visual input (210 × 160 RGB video at 60Hz) and a diverse and interesting set of tasks that were designed to be difficult for human players. Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible. The network was not provided with any game-specific information or hand-designed visual features, and was not privy to the internal state of the emulator; it learned from nothing but the video input, the reward and terminal signals, and the set of possible actions, just as a human player would. Furthermore the network architecture and all hyperparameters used for training were kept constant across the games. So far the network has outperformed all previous RL algorithms on six of the seven games we have attempted and surpassed an expert human player on three of them. Figure 1 provides sample screenshots from five of the games used for training.

2 Background

We consider tasks in which an agent interacts with an environment E, in this case the Atari emulator, in a sequence of actions, observations and rewards. At each time-step the agent selects an action a_t from the set of legal game actions, A = {1, ..., K}. The action is passed to the emulator and modifies its internal state and the game score. In general E may be stochastic. The emulator's internal state is not observed by the agent; instead it observes an image x_t ∈ R^d from the emulator, which is a vector of raw pixel values representing the current screen. In addition it receives a reward r_t representing the change in game score. Note that in general the game score may depend on the whole prior sequence of actions and observations; feedback about an action may only be received after many thousands of time-steps have elapsed.

Since the agent only observes images of the current screen, the task is partially observed and many emulator states are perceptually aliased, i.e. it is impossible to fully understand the current situation from only the current screen x_t. We therefore consider sequences of actions and observations, s_t = x_1, a_1, x_2, ..., a_{t-1}, x_t, and learn game strategies that depend upon these sequences. All sequences in the emulator are assumed to terminate in a finite number of time-steps. This formalism gives rise to a large but finite Markov decision process (MDP) in which each sequence is a distinct state. As a result, we can apply standard reinforcement learning methods for MDPs, simply by using the complete sequence s_t as the state representation at time t.

The goal of the agent is to interact with the emulator by selecting actions in a way that maximises future rewards. We make the standard assumption that future rewards are discounted by a factor of γ per time-step, and define the future discounted return at time t as R_t = Σ_{t'=t}^{T} γ^(t'-t) r_{t'}, where T is the time-step at which the game terminates. We define the optimal action-value function Q*(s, a) as the maximum expected return achievable by following any strategy, after seeing some sequence s and then taking some action a, Q*(s, a) = max_π E[R_t | s_t = s, a_t = a, π], where π is a policy mapping sequences to actions (or distributions over actions).
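As a small worked example of the discounted return defined above (the reward sequence and γ = 0.99 are made up):

```python
# R_t = sum over t' from t to T of gamma**(t' - t) * r_{t'}
def discounted_return(rewards, gamma=0.99):
    total = 0.0
    for k, r in enumerate(rewards):    # k = t' - t
        total += (gamma ** k) * r
    return total

print(discounted_return([0, 0, 1, 0, 4]))   # rewards observed from time t until the game ends
```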

The optimal action-value function obeys an important identity known as the Bellman equation. This is based on the following intuition: if the optimal value Q*(s', a') of the sequence s' at the next time-step was known for all possible actions a', then the optimal strategy is to select the action a' maximising the expected value of r + γ Q*(s', a'), i.e. Q*(s, a) = E_{s'}[ r + γ max_{a'} Q*(s', a') | s, a ].
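Value-based methods turn this identity into an update rule. Below is a minimal tabular Q-learning sketch on a hypothetical MDP with a handful of states; it is not the paper's deep Q-network, but it uses the same r + γ max_a' Q(s', a') target and the same kind of ε-greedy behavior policy described above.

```python
import numpy as np

n_states, n_actions = 5, 2                 # toy MDP dimensions (assumed)
Q = np.zeros((n_states, n_actions))        # tabular stand-in for the Q-network
gamma, alpha, epsilon = 0.99, 0.1, 0.05
rng = np.random.default_rng(0)

def epsilon_greedy(state):
    # With probability epsilon pick a random action, otherwise act greedily on Q.
    if rng.uniform() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state, done):
    # Bellman target: r + gamma * max_a' Q(s', a'); just r at terminal states.
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

DQN keeps exactly this target but replaces the table with a convolutional network over stacked screen frames and draws its (s, a, r, s') updates from a replay memory like the one sketched earlier.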

Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

Page 21: Sense - Data Driven NYC // June 2014

To realize the potential of data science, we must radically increase both the power and productivity of data scientists and data-driven organizations.

Page 22: Sense - Data Driven NYC // June 2014

To realize the potential of data science, we must radically increase both the power and productivity of data scientists and data-driven organizations.

Thankfully, the future is bright!