sds podcast episode 345: machine learning at twitter · 2020-03-04 · kirill eremenko: and now, on...

SDS PODCAST

EPISODE 345:

MACHINE LEARNING AT TWITTER

http://www.superdatascience.com/345

Kirill Eremenko: This is episode number 345, with Senior Machine

Learning Engineer at Twitter Cortex, Dan Shiebler.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name

is Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur. And each week we bring you inspiring

people and ideas to help you build your successful

career in Data Science. Thanks for being here today.

And now, let's make the complex, simple.

Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies

and gentlemen and everybody. I'm super pumped for

today's episode. But before we proceed, I've got a

question for you: Are you an advanced data scientist?

Have you been listening to this podcast for a while?

Well, maybe it's time to come meet in person. Do you

know that at DataScienceGO in the USA in October

23, 24, 25, in Los Angeles, which we are running in

the UCLA Campus, we're going to have a special track,

a unique track dedicated to advanced practitioners in

data science?

Kirill Eremenko: If you come to this track, you will hear from people like

Dan Shiebler, who you'll meet on today's podcast and

who works at Twitter Cortex, that'll already be an

exciting thing and he'll be talking about task

engineering. Dan has presented at DataScienceGO

twice already, this will be his third, and this time

he will be focused on advanced practitioners. So you'll

learn about task engineering. And what is that? Well,

that's a very interesting thing, you'll hear about it

more on this podcast. Basically it's you designing and

understanding what you will do with your model, what

happens to your model when it actually affects the


underlying population which is used to train that

model. Very interesting, kind of like an inception type

of scenario but it happens and it happens in Twitter

Cortex. You will find out what you can do, a very

hands on workshop for advanced practitioners.

Kirill Eremenko: Also, Morgan Mendis will be flying in all the way from

Haiti, to talk about Airflow and how to build data

science pipelines. Sinan Ozdemir will be coming to talk

about Flask, Django, Docker containers and

Kubernetes, or, we're still deciding which topic, or it

will be BERT, Transformers and Architecture and use

cases for NLP and BERT. So as you can see already

just from these three examples, I'm personally hand-

picking the advanced practitioners to come talk with

you. So if you want to skyrocket your data

science career as an advanced practitioner, head on

over to datasciencego.com and get your ticket today.

Kirill Eremenko: And now, on to today's episode. So the guest for

today's episode is Dan Shiebler. It's been over two

years since he came on the podcast last time in June

2017. Since then, a lot has changed in his career. He's

left TrueMotion and now he works at Twitter Cortex

and there is going to be a lot of interesting components

to this podcast both on the technical side and the

career side. So definitely you'll learn about Dan's

career, how he makes his choices of moving and selects

the companies he works for and why; why he's, in

parallel to his career, doing PhD research, how that

affects the lens through which he sees things in his

career. Very interesting conversation, and his PhD,


by the way, is on category theoretic machine learning. Boom.

Blows your mind. Blew my mind for sure.

Kirill Eremenko: You'll learn about his work at Twitter Cortex, in fact,

you'll learn about it quite intimately, at least the things

we could of course discuss. You'll learn about vectors,

embeddings, nearest neighbors techniques and

methodologies that he uses in his work. So in a

nutshell a podcast packed with value. Can't wait for

you to hear it, so without further ado, let's welcome

to the show Dan Shiebler, Senior Machine Learning

Engineer at Twitter Cortex.

Kirill Eremenko: Welcome back everybody to the SuperDataScience

podcast, very excited to have you on the show,

because today we've got a very special guest, returning

for the second time, Dan Shiebler calling in from New

York.

Kirill Eremenko: Dan, how you going today?

Dan Shiebler: I'm doing great Kirill. How are you?

Kirill Eremenko: Very good as well. And it's crazy, like you said just

now, that there was no snow in New York the whole

winter.

Dan Shiebler: There's a week when I was away and I heard that there

was snow during that week, but I haven't seen any at

all the past few months.

Kirill Eremenko: That's insane. Hopefully things get better. This global

warming thing is getting quite out of hand, it was

really hot in Australia the whole summer. Strange.

Anyway Dan, it's been a while. I looked, our previous

podcast was June 2017, that's two and a half years


ago. And last time we caught up in person was at

DataScienceGO 2018, which was one and a half years

ago. How have you been since then?

Dan Shiebler: Doing great. Lots of really interesting projects I've been

working on, but great things happening.

Kirill Eremenko: And you changed jobs, congrats. You've been at

Twitter, so you were at TrueMotion before, now you're

at Twitter Cortex, you've been there for over two years.

That's awesome, congratulations man.

Dan Shiebler: Thank you. It's been great.

Kirill Eremenko: What's the best thing about Twitter?

Dan Shiebler: Well, it's a really fast-growing platform, but it also has

a really established niche. There are a lot of people who

consider Twitter their favorite social network. It

balances real high quality information and

lighthearted commentary. I think that it's in a real

sweet spot as far as social networks go. I'm really

proud to be working here.

Kirill Eremenko: And it's interesting that it's managed to stay that way

for what is it? A decade now, or more. I remember I

was at university and we had one lecturer, this was

around 2012, or maybe 2011, we had one lecturer, he

was teaching about an unrelated topic, was about

finance. But he was talking about how, looking at

some theories of how things change in the world. And

how there are cycles of certain phenomena. And he

was giving an example of social media, saying,

Facebook, Twitter. I remember his comments about

Twitter specifically. He said that Twitter is not going to


be around in four years, that was the prediction for

2015. Because, simply, naturally, we see that with a

lot of things, with a lot of other platforms that used to

be popular. Facebook was very popular back in the

day, but now there's so many alternatives. I haven't

been on Facebook for years now.

Kirill Eremenko: But, it's very impressive that Twitter has managed to

adapt and stay afloat. What are your comments on

that? Why do you think it's so successful in that way?

Dan Shiebler: I think we have a good understanding of our user

base. And we have a willingness to change, but also a

really deep understanding of what are the things that

made Twitter popular in the first place. I think

knowing our strengths and knowing our users, gives

us a real advantage.

Kirill Eremenko: Yeah. Okay. What is, you'd say, is a way that Twitter

has changed, even since you've joined. You've been

there what, just over two years. What's a way you've

seen Twitter adapt to the changes in user needs over

that time?

Dan Shiebler: We recently launched a topics product, that allows

users to follow individual topics, rather than just

following accounts. While following accounts is an

excellent way to consume information for many Twitter

users, some users have difficulty finding all of the

information that they want on topics that they're

interested in. And topics is designed to better serve

those users. I think this initiative and this product and

family of products, really came out of an

understanding of what are the challenges that some


users have with the platform. And how can we better

serve the users who we're not serving as well right

now. And bring them to see all the things that are

great about Twitter.

Kirill Eremenko: Mm-hmm (affirmative). Fantastic. That's a really cool

thing. I use Google Alerts, I don't use them that often,

but I've set up a Google Alert for data science. And

once a week I get an update on the most trending data

science topics from Google. So, something like that,

right?

Dan Shiebler: Yeah. It's very similar. We're still, every day, trying to

make it better. And really understand how we can best

serve users on the product surface.

Kirill Eremenko: Okay. Got you. And apart from Twitter, which we'll

definitely get back to, you've got so much going on in

your life. When I looked at your LinkedIn, it was so

exciting to see. You finished up that research at Brown

University on deep learning, but you seem to never

allow yourself to slack off or get bored. You started a

PhD, that's so cool, man. That's awesome.

Dan Shiebler: Thank you.

Kirill Eremenko: Why'd you decide to start one?

Dan Shiebler: I really enjoyed my research at Brown. I think that we

really got a lot of high quality research and it was very

exciting to me to have a pursuit, separate from my

work, that allowed me to see things from a more

academic mindset. And with more academic

incentives. And I feel like that has really helped me

grow. And doing this PhD is really something that lets


me continue to do that at a more formal, and more

intense pace. I'm very excited about it, really enjoy it.

Kirill Eremenko: What's the topic of your PhD? Well, I see it on

LinkedIn, but if you can share for our listeners.

Dan Shiebler: The topic is on category theory and machine learning.

It's really about defining category theoretic constructions

for discussing and researching and understanding the

links between different kinds of machine learning. And

different kinds of fields that are closely related to

machine learning. Category theory is a branch of

mathematics that has shown a lot of applications in

unifying previously disparate areas of mathematics.

And there's been a good push recently in applied

category theory in taking category theoretic ideas and

trying to apply the same kind of unification powers to

more applied fields, like game theory, or biology, or

physics. There's been some really great research on

quantum physics, that's come from category theoretic

perspectives. And I'm trying to utilize these same tools

to increase our understanding of machine learning.

Kirill Eremenko: Wow, really cool. What's an example you can give of

unification in mathematics through category theory?

Dan Shiebler: There are some aspects of algebraic topology that were

previously separate from similar concepts in analysis.

And similar concepts in set theory that when we take

a category theoretic standpoint and zoom out, and

look at these things at a higher level of abstraction, we

can see these individual constructions as particular

instantiations of a higher order structure. It allows

us to say, oh, these different kinds of transformations,


they all satisfy these key properties that make it this

particular kind of category theoretic transformation.

Dan Shiebler: And then, when we're operating at that level of

abstraction, we can simultaneously prove theorems

about each of these different subareas of mathematics.

We're talking about things at this higher level. And

that lets us get many theorems for free, which is one of the key

tenets of category theory. That's similar to how a

programmer might structure their programs so that

core components only need to be implemented once,

rather than multiple times. Category theory allows us

to have that same degree of abstraction on multiple

fields of mathematics, or ideally, applied fields as well.
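
The programming analogy Dan draws here can be sketched in a few lines of Python. This illustrates only the "implement the core component once" idea, not category theory itself; the helper name `combine_all` is hypothetical. It assumes each operation passed in is associative and has an identity element, which is what lets one function serve numbers, strings and lists alike.

```python
from functools import reduce

def combine_all(items, combine, identity):
    """Fold a sequence with any associative operation that has an identity element."""
    return reduce(combine, items, identity)

# The same function, written once, works across unrelated kinds of data.
total = combine_all([1, 2, 3, 4], lambda a, b: a + b, 0)
sentence = combine_all(["theorems ", "for ", "free"], lambda a, b: a + b, "")
merged = combine_all([[1], [2, 3], []], lambda a, b: a + b, [])

print(total)     # 10
print(sentence)  # theorems for free
print(merged)    # [1, 2, 3]
```

Anything proven about `combine_all` under those two assumptions holds for every data type that satisfies them, which is the sense in which results come "for free".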

Kirill Eremenko: Wow. Fantastic. I love your tagline, theorems for free.

You should put that in your subtitle on LinkedIn or

something.

Dan Shiebler: I'll consider it. I didn't come up with it, but it does do a

good job of describing what it is.

Kirill Eremenko: When you were talking about that, a few light bulbs

went off in my head, because it's been a while, but in my

Bachelor Degree, I had algebraic topology back in my

high school. And I had set theory in my degree. And I

remember feeling these two are very similar, this set

theory stuff. I've seen it somewhere, it really looks a lot

like what we did in high school and stuff like that.

Again, it's been a while, so I wouldn't be able to come

up with examples, but I can see the value in that.

Kirill Eremenko: And that's a really cool, very abstract though, thing to

be doing. I'm just curious, it's not really linked to your

work at Twitter, is it?


Dan Shiebler: There are links, but the links I would say are at a very

high level, birds-eye view. And I would say the fact that

the day-to-day work is very different, was largely by

design. It's very challenging to have a full day at a job,

even a job that you love, and then go home and do lots

of other work that feels very similar, with similar

frustrations and similar problems.

Dan Shiebler: One of the benefits of focusing my PhD research on

something so different, and so much more theoretical

from the concerns that I focus on at work, is that

when I do one, it doesn't exhaust me for the other.

Each one allows me to work a different part of my

mind, that will allow other parts to rest.

Dan Shiebler: That really is a nice way to balance things in a more

holistic fashion.

Kirill Eremenko: Love it. And that's similar to what you did back when

you were at TrueMotion and you were doing the

research in Brown University. At TrueMotion you were

doing machine learning, but at Brown University you

were doing research on deep learning, as far as I

remember.

Dan Shiebler: Yes. My research at Brown was far more focused on

standard deep learning for image modeling, whereas

my work at TrueMotion was much more focused on a

signal processing perspective on machine learning.

And machine learning for signal processing

applications. It was very similar in that the two main

pushes had overlap, but felt very different.

Kirill Eremenko: And it's not like you have to do this research. It feels

like you're doing it more as a hobby. Is there any other


motive for doing research? Or maybe, somebody

listening to this podcast might think, oh, wow, maybe I

should get into research too. What's the reason for

continuously doing research? First at Brown, now

you're doing your PhD at Oxford. Any comments on

that?

Dan Shiebler: I think there's a lot of reasons. I would say that for me,

I genuinely enjoy it. I enjoy it at a very deep level. And

I don't think it would be the right thing for someone to

do, who didn't really genuinely enjoy it. But I think

that there are a lot of significant benefits beyond just

my own enjoyment. Really giving myself the

opportunity to look at research from the perspective of

a researcher, as opposed to the perspective of a

practitioner, allows me to see ideas on a deeper level

than just, is this the right tool for me to use for this

job right now? And more to think about things in

terms of their deeper applications and other work that

they might open up, other avenues of exploration.

Dan Shiebler: That kind of perspective, I think, it makes me smarter,

it makes me more creative and it really gives me the

ability to learn new things much more easily. It's far

easier for me to pick up ideas on some other team at

Twitter who I haven't worked with. Really understand

what they're doing. And understand the kinds of

problems that they face, because I've drilled myself to

be able to understand really complex topics, really

quickly, through my research experience.

Kirill Eremenko: Wow. Very distinct explanation. There you go. If

somebody listening to this, I agree, is passionate

about a topic, maybe consider research in that space-


Dan Shiebler: Absolutely.

Kirill Eremenko: It has its advantages like that. Let's get to your work

at Twitter. I read on LinkedIn the description, how you

describe your role, very cool description. You develop

systems and models to improve the performance and

efficiency of machine learning at Twitter. It's like you're

doing machine learning on machine learning, almost.

Tell us about that.

Dan Shiebler: I think in order to really give it context, I can define

the role of Cortex in general, then how I fit into that.

Twitter's engineering is split across a large number of

product teams, serving different kinds of product

surfaces. The timeline, the advertisement products, the

notifications products, or email products. Cortex is an

organization within Twitter that develops machine

learning systems and components of machine learning

models that are incorporated into the modeling

pipelines of each of these different things.

Dan Shiebler: And our work is at a level of abstraction, similar to the

notion of category theory, where we are developing

things that fit into multiple different products. Often

what we will do is, we will assemble a couple of

different product surfaces that seem like they can be

solved with a particular modeling approach. They seem

to share similar restrictions on their current

performance. And identify different ways that we can

develop models that would serve each of these product

surfaces. My team in particular focuses on models that

utilize embeddings and nearest neighbors to serve

products where we need to match users or other

things, mainly users with large sets of possible


candidate content. Like a large set of Tweets, or large

set of potential notifications.

Kirill Eremenko: Interesting. So embeddings and nearest neighbors,

let’s talk about nearest neighbors for a second. For

nearest neighbors, or for any kind of categorical

machine learning, I would expect you need a range of

fields or a range of columns to be working with. What

kind of columns are you working with at Twitter?

Because, there's mostly just the Tweets that people

have.

Dan Shiebler: In this case, if we're serving nearest neighbors, what

we would be doing is first, the nearest neighbors are

defined over an embedding space. The columns in this

case are the embedding dimensions.

Kirill Eremenko: What is embedding? Sorry, can you get me up to speed

please, what is an embedding space?

Dan Shiebler: An embedding in this context is just a vector

representation of some kind of entity. For instance, it

could be a 300 dimensional vector, or 1000

dimensional vector, that represents a user, or a Tweet.

The embedding plus nearest neighbors approach, for

recommending content, involves constructing

embeddings for users and constructing embeddings for

content. Such that, the distance, or angle between two

embeddings is indicative of some notion of affinity, or

similarity where users will be close in embedding

space to Tweets that they might like. We can utilize

this to then embed these users and all of these Tweets

and then find nearest neighbors in the space to

recommend content.
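
A minimal sketch of the embedding plus nearest neighbors idea described above, under stated assumptions: the three-dimensional vectors, the Tweet ids and the helper names `cosine_similarity` and `recommend` are all made up for illustration, and cosine similarity stands in for whatever notion of distance or angle a real system would learn.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(user_vec, tweet_vecs, k=2):
    """Return the ids of the k Tweets whose embeddings are closest to the user."""
    ranked = sorted(
        tweet_vecs,
        key=lambda tweet_id: cosine_similarity(user_vec, tweet_vecs[tweet_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional embeddings (real ones would have hundreds of dimensions).
user = [0.9, 0.1, 0.0]
tweets = {
    "ml_paper": [0.8, 0.2, 0.1],
    "cat_photo": [0.0, 0.1, 0.9],
    "ml_meme": [0.7, 0.0, 0.3],
}

top = recommend(user, tweets, k=2)
print(top)  # ['ml_paper', 'ml_meme']
```

The user vector points in nearly the same direction as the two machine learning Tweets, so they rank ahead of the unrelated one.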


Kirill Eremenko: Got you. And then other teams can use the

embeddings you've created to run their machine

learning.

Dan Shiebler: Indeed. They could use the embeddings we create as

features, or they can use the pipelines that we build to

create the embeddings in the first place to create new

embeddings that are optimized for their surface. And

sometimes these new embeddings are constructed on

top of other embeddings and everything will feed into

each other.

Kirill Eremenko: Wow, that's so crazy. If you're able to share, because

I'm sure there's parts which you can’t disclose due to

proprietary information, but if you're able to share, do

the embeddings that you create for users that are

indicative of... If two of these vectors, 1000 dimension

vectors are close, have a very small angle between each

other, then that means the users are close in their

behavior or in their characteristics. If two vectors for

content are close, that means maybe somebody who

will like this content, will like that content as well. I

can see the implications of that. What goes into the

embeddings in the first place? And going back to the

question of what are the original features? Apart from

their Twitter text, the messages they send or people

they follow, there's no ready-made transactional data, like

on Netflix or Amazon, showing that this person

purchased these items. Basically, these are their

specific interests. Is it all to do with NLP? I'm just

curious. What goes into the embeddings in the first

place?


Dan Shiebler: Great question. To start, I would say that NLP while

very important for many use cases, for the purpose of

recommendation, is actually a very myopic view on the

structure of a Tweet. And the reason for this is that

Jack Dorsey Tweeting a single-word Tweet, like hello,

or something like that, has a very different set of users

who I might want to recommend that to than if I

Tweeted hello, or something like that.

Dan Shiebler: Often the most useful information, when we

look at a Tweet from the Tweet perspective, can be the

people who have interacted with the Tweet. And the

dynamics of the author of the Tweet. There's a huge

amount of information on Twitter that's represented in

terms of the engagement graph and the follow graph.

The follow graph is just the relationship between all of

the users, based on who follows who. The engagement

graph is the relationship between users and Tweets, as

well as users and users, based on users choosing to

like Tweets.

Dan Shiebler: Users choosing to like, or reply to, or retweet Tweets.

And these kinds of behaviors incorporate an enormous

amount of information. A Tweet that has 100 likes, all

from machine learning focused people, really gives a

very strong indication that the Tweet is about machine

learning. And we can drill down very deeply into

content utilizing this kind of social or contextual

information, we often refer to as [inaudible 00:26:38]

to collaborative filtering.
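
The point about likes carrying information can be sketched as follows. This is an illustration under assumptions, not a description of Twitter's pipeline: the users, their interest labels, the like lists and the helper `like_profile` are all hypothetical. It describes each Tweet purely by who liked it, without reading the text.

```python
from collections import Counter

# Hypothetical data: each user's main interest, and which users liked each Tweet.
user_interest = {
    "alice": "machine learning",
    "bob": "machine learning",
    "carol": "machine learning",
    "dave": "basketball",
}
likes = {
    "tweet_1": ["alice", "bob", "carol"],
    "tweet_2": ["dave", "alice"],
}

def like_profile(tweet_id):
    """Fraction of a Tweet's likes coming from each interest community."""
    counts = Counter(user_interest[user] for user in likes[tweet_id])
    total = sum(counts.values())
    return {interest: n / total for interest, n in counts.items()}

profile_1 = like_profile("tweet_1")
profile_2 = like_profile("tweet_2")
print(profile_1)
print(profile_2)
```

Here `tweet_1` is liked only by machine learning people, which is a strong hint about its topic, while `tweet_2` has a mixed profile and is harder to place from likes alone.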

Kirill Eremenko: Wow. Okay. Got you. Basically, you can even extract

information that this Tweet is about machine learning,

based on the likes it's getting, from whom it's getting


those likes, without even digging into the processing of

the text within that Tweet.

Dan Shiebler: Yeah. There are of course limitations to this.

Sometimes a Tweet might be about a social issue that

is popular among machine learning researchers. Or, a

personal issue related to a popular personality

in machine learning, may end up with a very similar like

profile to one that is about machine learning at a more

core level. But often for recommendation use cases,

understanding which communities of people are

interested in something and representing something in

terms of that perspective, can be the most rich way to

get the kinds of information that the model needs to

know. There is a lot more information that can be

driven out of the Tweet text itself and we do of course

extract this and utilize this. But, in general, if you had

to choose between just the collaborative information,

or just the Tweet text information, the collaborative

information would win without a doubt.

Kirill Eremenko: Oh, very interesting. And what kind of tools do you use

for this?

Dan Shiebler: We have stacks constructed in Scala and Python at

the language level. Our modeling is almost entirely

done in TensorFlow, from the perspective of all neural

networks and such. We do have a number of in-house

matrix-factorization-style tooling that's written in

Hadoop or Spark that's used for some applications. We

do have a very big deployment that uses a piece of

software that we open-sourced, called Scalding, which

is Scala-based Hadoop tooling. That works quite


well for constructing really large Hadoop jobs that can

operate at Twitter scale.

Kirill Eremenko: Okay. Is that a good description of what a machine

learning engineer does, is that you prepare machine

learning tools for other people and departments to

use?

Dan Shiebler: I would not say that it's just a construction of tools.

Machine learning engineers at Twitter fall into a couple

of different categories. Myself, I would not say that my

job would be described in that way. My work is more

around the construction of instantiations of the

embedding pipeline. My team often partners with

product teams. And we have our own set of tooling and

our own set of systems that are really designed for

constructing these kinds of embedding nearest

neighbor pipelines. We will actually construct the

models and help other teams construct these models

in a more consultancy fashion.

Dan Shiebler: But there are engineers within Cortex whose role is to

create the deep learning model deployments or the

platform tooling for analyzing data or scheduling

model reruns. There's a spectrum of these more core

engineering tasks and more direct modeling and

machine learning model creation tasks.

Kirill Eremenko: Mm, okay. It's like a variety of things. Got you. And

you mentioned team a couple of times, how big is your

team? And which part of the team are you working in?

Dan Shiebler: Cortex as a whole, is about 150 people, of which, my

team is in the sub-organization focused largely on

platform. And our team is about 10 people. My role is


really more focused on model construction and

understanding of the relationship between models and

business value. My team has some people who are

more focused on the optimization of our nearest

neighbor pipelines, which are highly optimized and

state of the art. And some people, who are more

focused on the core software development as well.

Kirill Eremenko: Got you. When you say nearest neighbor, does that

mean you went and consciously selected for your

characterization, the nearest neighbor algorithm or, is

that just a broad way of saying we're finding the

nearest neighbors? Because there's other methods of

clustering that could be used to group users into

groups. Or finding, as you said, doing this

collaborative filtering. Just a question around that.

Dan Shiebler: We actually use an approximate nearest neighbors

system. And the reason why we selected that, is based

on scale. The reason is because we're not simply

grouping users together, but we're trying to find the

nearest content for each user. If we're in a situation

where we have 300 million users and half a billion

Tweets in a particular day, when we're trying to match

for each user, the best Tweets. Exhaustively looking at

each user-Tweet pair is completely not scalable. 300

million times 500 million operations and many

standard strategies would require utilizing something

like that. Our approximate nearest neighbor systems

allow us to dramatically optimize this, by constructing

these graphs of Tweets that the user's embedding

essentially traverses. That's a whole topic that's very

interesting, the construction and optimization of these


algorithms for really allowing the user to content

pairing process to scale.

Dan Shiebler: There are other solutions of course, that can solve the

same thing, and clustering is one that you mentioned.

This one is nice because of its connection to the

amount of flexibility that we have in the construction

of the embedding. The embedding itself can be

constructed such that the distance relationships are a

model output. And any kind of machine learning

technique can be utilized there. And deployed,

essentially, at scale for free and that's a very attractive

aspect of that family [crosstalk 00:34:32].
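
To make the scale argument concrete, the sketch below first computes the number of user-Tweet pairs an exhaustive comparison would need, then shows a toy graph walk of the kind Dan alludes to. Everything in the second part is an assumption for illustration: the random two-dimensional points, the helper names `build_neighbor_graph` and `greedy_search`, and the simple greedy rule are a simplified stand-in, not Twitter's actual approximate nearest neighbor system.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_neighbor_graph(points, num_neighbors=5):
    """Link each point to its closest points (brute force; fine for a toy index)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: distance(p, points[j]),
        )
        graph[i] = others[:num_neighbors]
    return graph

def greedy_search(query, points, graph, start=0):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops at a local optimum, so the answer is approximate.
    Returns (index of the point found, number of distance computations).
    """
    current = start
    current_dist = distance(query, points[current])
    evaluations = 1
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            d = distance(query, points[j])
            evaluations += 1
            if d < best_dist:
                best, best_dist = j, d
        if best == current:
            return current, evaluations
        current, current_dist = best, best_dist

# Every user against every Tweet, using the figures from the conversation.
brute_force_pairs = 300_000_000 * 500_000_000
print(f"{brute_force_pairs:.1e}")  # 1.5e+17

# Toy index: 200 hypothetical Tweet embeddings in two dimensions.
random.seed(0)
tweets = [[random.random(), random.random()] for _ in range(200)]
graph = build_neighbor_graph(tweets)
query = [0.5, 0.5]  # a hypothetical user embedding
found, evaluations = greedy_search(query, tweets, graph)
print(found, evaluations)
```

The walk only computes distances along the path it takes, so `evaluations` can be compared against the 200 computations a full scan of this toy index would need. The price is that the result may be a nearby Tweet rather than the single nearest one.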

Kirill Eremenko: Wow, very cool. I was just thinking, when you gave the

example of 300 million times 500 million operations,

do you think if quantum computing picks up, that

you'd be able to solve it completely differently? Just

look at all the pairs and find the-

Dan Shiebler: I think there's all kinds of optimizations that we do

right now, that would be unnecessary if we had access

to quantum computers. And in the deployment of

machine learning models, certainly, but in the training

as well. There's many things we're not able to do

because of scale restrictions in terms of data

collection, pipelines and such, that would be

completely overhauled in the presence of really

effective quantum computers.

Kirill Eremenko: That's very cool. Have you looked into quantum

computers quite a bit?

Dan Shiebler: Not as much as I'd like. There's a research group at

Oxford that does a lot of interesting research in the


intersection of category theory and quantum

computers. Utilizing category theory to make some

quantum computing ideas much simpler and easier to

build on top of. But, I can't say that I'm familiar with it

more than on a surface level.

Kirill Eremenko: Okay. Wow. It's a very exciting topic and I can't wait to

see what happens when the quantum computers

come. Thanks for such an interesting description of

your role at Twitter. It's very exciting and I can see

how you're super pumped to go to work every day.

[crosstalk 00:36:23] and come back and do your

research. That is really cool.

Kirill Eremenko: I wanted to shift gears a little bit. For our listeners, we

have an exciting announcement. Dan was at

DataScienceGO 2018 and you are coming back this

year, very excited to have you back. How are you

feeling about that?

Dan Shiebler: I'm feeling great, looking forward to it.

Kirill Eremenko: And as we discussed, this time, we're considering

aiming for a more advanced practitioner talk for Dan.

Out of all the things that you've talked about, if you

had to pick a topic for your talk right now, what's the

first thing that comes to mind? In a hands on type of

workshop, what is it that you would be passionate to

share with the audience?

Dan Shiebler: Absolutely task engineering. The process of creating a

machine learning task where a model that does well on

that task, will actually drive business value. The

creation of models that can tie closely to core value, I

think is something that is a real science that I've


continued to learn about. And I think is one of the

most important areas in machine learning and data

science for people to understand at a deep level.

Kirill Eremenko: Interesting. Can you give an example? What's an

example of task engineering in business?

Dan Shiebler: At Twitter, for example, when we train our models on

Tweets or users, or any sort of data, we need to be very

careful about how the models that we deploy affect the

data that we're training on. A model that's already

trying to show users content that it thinks they'll like

is corrupting the quality of the training data that feeds

back into the model, in that the distribution is

shifting. The task that we are constructing for the

model, when we retrain it, is now worse than it was

originally. The awareness of these kinds of issues, and

the construction of the model task and the pipelines

that support it in a way such that increased model

performance will continue to increase business metrics

is a really deep science that has enormous

applications.

Kirill Eremenko: Wow, that's very cool. That's a very good description,

because the data you're dealing with, you're dealing

with users. And now I think about it, that would be

applied across most business cases. The only

situations where that wouldn't be relevant is when, for

instance the data you're analyzing is the national

cohort, or the global cohort. Like a massive sample of

people, you're analyzing census data or even daily

data, but in a much greater ecosystem than your own

company. And then you're applying it to your

company. Then, what happens in your user base, with


your company, doesn't really affect the world that

much. But, for instance, an example, if you're

analyzing stock prices and then you go and buy some

Tesla stock or you sell something else, Apple stock,

that's not going to affect the world. You can keep

analyzing the same way you were analyzing before. But

in your case, you're directly impacting the whole user

base with your model.

Dan Shiebler: Absolutely. These kinds of decisions and how these

decisions can be the difference in user consumption, is

really critical. Things like if we start sending bad

notifications to users, and users start opting out of the

notifications, then we're in a situation where we no

longer have data coming from users who really didn't

like their notifications. And a model that now starts

performing well on this new setting and this new

world, where we don't have this data from the users

who didn't like notifications, is not actually the best

model. And the understanding of that, as we construct

a task, such that the best model on that task is

actually the best model for deployment, is really

critical.
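
The opt-out effect Dan describes can be illustrated with a small sketch. Everything here is a hypothetical toy: the `User` fields, the ten-user population, and the `always_send` model are assumptions made for illustration, not anything from Twitter's systems. The point is only that a model scored on the users who remain looks better than it does on the full population.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    likes_notifications: bool  # hypothetical ground truth, unknown in practice
    opted_out: bool            # opted-out users stop producing data


def accuracy(predict, users):
    """Fraction of users whose predicted preference matches the ground truth."""
    users = list(users)
    return sum(predict(u) == u.likes_notifications for u in users) / len(users)


# Toy population (an assumption): 6 users like notifications, 4 do not,
# and 3 of those 4 have already opted out.
population = (
    [User(i, True, False) for i in range(6)]
    + [User(6, False, False)]
    + [User(i, False, True) for i in range(7, 10)]
)


def always_send(user):
    """A naive model that predicts every user likes notifications."""
    return True


# Only users who have not opted out still generate evaluation data.
observed = [u for u in population if not u.opted_out]

acc_observed = accuracy(always_send, observed)  # 6 of 7 correct
acc_full = accuracy(always_send, population)    # 6 of 10 correct

print(f"accuracy on remaining users: {acc_observed:.2f}")
print(f"accuracy on full population: {acc_full:.2f}")
```

On the remaining users the naive model appears much stronger than it is on everyone, which is the gap between "best on the task" and "best for deployment" that the answer above points at.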

Kirill Eremenko: Wow, that's such a cool teaser. I want to come to this

workshop now, this is exciting. Awesome. Thank you

very much. This is going to pique people's interest in

the event and also specifically your talk. If you want to

learn about task engineering, check out Dan's talk at

DataScienceGO 2020, 23rd, 24th, 25th October.

Kirill Eremenko: And now, I wanted to jump into something really cool.

I'm not sure if you saw, but 24 hours ago, I posted a

question on LinkedIn, I was about to say Twitter, on


LinkedIn. That's where I hang out more for some

reason, it's happened that way. And I posted a

question for our followers or our audience to post

questions for you and see what they want to ask you

on the podcast. We're going to go through these.

Kirill Eremenko: Are you ready for some rapid fire questions?

Dan Shiebler: All right. See what I can do.

Kirill Eremenko: All right. Here we go. Deepa asks, how are

unsupervised models improved over time and what are

the metrics you track to measure them?

Dan Shiebler: Great question. They've improved over time in terms of

scale, certainly. But, in terms of our understanding of

them, the development of them, there's many kinds of

really deep unsupervised models of course that have

come a very long way in the face of improved

computation. I think that tracking the performance of

an unsupervised model is something that's extremely

application dependent. If we're training a feature

extractor, then the performance of the model that is

utilizing those features would be the sort of thing that

we would be tracking. If we're tracking something

that's going to be used for visualization, some sort of

clustering or generative model, then it's much trickier.

There are heuristics we might be able to apply, but we

may actually need human evaluation in order to really

effectively compare models.

Kirill Eremenko: Okay. And does that change between unsupervised

and supervised models?


Dan Shiebler: Supervised models tend to have a more built-in

performance metric in that there's a goal in mind,

some sort of prediction goal that we've constructed.

Classification might have how well is this model

actually completing this classification task. But, of

course as I mentioned a moment ago, with task

engineering, this problem is not automatically solved

for supervised models because we have these

situations where this task we're training our model on

is not actually what we're interested in having it do.

Kirill Eremenko: And over time things might change as well. Previously,

in one company, I had the situation where a

classification model was built maybe, I don't know, 18

months before I joined. And everything was great, but

then the population behavior changed, because of I

don't know, the aging of population. And sometimes

behaviors of consumers, especially in retail, change.

And the model was no longer working even though

originally it had that supervised training.

Dan Shiebler: Absolutely. I think that's a problem that many

companies face. That's certainly a problem that we

grapple with.
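
One common way to notice the kind of population change described here is to compare the distribution of a feature at training time with its distribution today. The sketch below uses the Population Stability Index, which is a technique chosen for illustration and not something either speaker names; the function name, the bin edges, and the `eps` smoothing value are all assumptions.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI between two samples of one feature.

    `bin_edges` must be sorted ascending; values below the first edge fall
    in bin 0 and values at or above the last edge fall in the final bin.
    `eps` replaces empty-bin fractions so the logarithm stays defined.
    """
    def fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Number of edges at or below v is the index of v's bin.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    # Each term is non-negative, so PSI is 0 only when the binned
    # distributions match and grows as they diverge.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the feature at training time vs. 18 months later.
train_sample = [i / 100 for i in range(100)]         # spread over [0, 1)
later_sample = [0.5 + i / 200 for i in range(100)]   # shifted to [0.5, 1)
edges = [0.25, 0.5, 0.75]

print(population_stability_index(train_sample, train_sample, edges))
print(population_stability_index(train_sample, later_sample, edges))
```

What score counts as "too much drift" would need to be chosen per application. A check like this only detects that inputs have moved; it does not say whether the model's task is still the right one.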

Kirill Eremenko: And how do you deal with it?

Dan Shiebler: Regular retraining is one of the basic hygiene

techniques that we utilize, but of course, when we're in

situations where our own model is corrupting the data

stream, even that alone is not enough. Things like

setting aside certain populations for deploying different

models on different groups of users and trying to avoid


these kinds of self-contamination effects, can go a long

way.
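
Setting aside a population of users can be sketched as a deterministic hash-based split. This is a minimal illustration under stated assumptions, not Twitter's implementation: the function names `in_holdout` and `uncontaminated_rows`, the 5% fraction, the salt string, and the `user_id` row field are all hypothetical.

```python
import hashlib


def in_holdout(user_id: str, holdout_fraction: float = 0.05,
               salt: str = "holdout-v1") -> bool:
    """Deterministically place a stable fraction of users in a holdout group.

    Hashing the salted user ID gives the same answer on every call, so a
    user never drifts between groups as long as the salt is unchanged.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < holdout_fraction


def uncontaminated_rows(logged_rows, holdout_fraction: float = 0.05):
    """Keep only rows from holdout users.

    Assumes holdout users were served by some policy other than the
    production model, so their logged behaviour was not shaped by it.
    """
    return [row for row in logged_rows
            if in_holdout(row["user_id"], holdout_fraction)]


# Hypothetical interaction log.
logs = [{"user_id": f"user-{i}", "clicked": i % 2} for i in range(1000)]
clean = uncontaminated_rows(logs)
print(f"{len(clean)} of {len(logs)} rows come from holdout users")
```

The trade-off in a design like this is that holdout users get a different (possibly worse) experience, so the fraction is usually kept small.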

Kirill Eremenko: Got you. Next question is from Linda. What emergent

technology should we be paying attention to and which

industries will they impact the most?

Dan Shiebler: I think that improvements in hardware have really

come a long way in terms of the types of machine

learning models that can be used. The kinds of

applications that we build on top of it. And I think that

one of the reasons why compute hardware, in terms of

things like GPUs and TPUs and, up until a few years ago,

improved CPUs, has become so important in terms of what

gets built is that it's a feedback effect. When a new kind of

hardware is shown to be really powerful for a

particular application, more things get built utilizing

that hardware and for that application, which then

spurs additional research into that kind of hardware.

Dan Shiebler: One of the reasons why machine learning conferences

are so completely swamped right now with super deep

networks, rather than more rule-based or symbolic

kinds of approaches, is that the sorts of hardware that

we have access to, the best most powerful kinds of

hardware, is really well suited for deep networks.

Dan Shiebler: And that's a result of the self-supporting process of

deep networks encouraging more research on these

kinds of hardware, which then encourages more

research and better results from deep networks.

Kirill Eremenko: Got you. Would you agree with, I've seen in the news

recently, or the past half year or so, that Moore's Law

is dead. That we've come to a limit in terms of how


small our integrated circuits can be and how many

transistors can fit on them and that's it. From here,

our exponential amazing benefit that we were getting is

over and now it's all going to flatten out.

Dan Shiebler: In some ways, I definitely think that's true. I can't say

that I'm an expert in transistors or the necessary

limitations on how small we can make them. But I can

say that, improvements in our ability to parallelize

computations and improvements in the construction of

specialized hardware, have allowed us to maintain

exponential growth in terms of the computations we're

capable of. Certainly these effects seem like they have

limits and ceilings that are much lower than the

seemingly unbounded limitations of Moore's Law. But

it's certainly possible, that as innovations continue,

we'll be able to find out new ways to utilize other kinds

of tricks to continue to improve computation. I don't

think that the speed of computation is necessarily

never going to be able to increase with an exponential

rate simply because we can't make transistors smaller

right now.

Kirill Eremenko: I agree. I completely agree. I think we'll find a way. It's

been so good. The next one is from Oscar, who is

asking about some insights into how Twitter is using

machine learning to detect bots or bot accounts or bot

farms. And, what are scalable solutions that are being

implemented for cyber security and or fraudulent

account detection? Anything you can share on that?

Dan Shiebler: I can't talk about specifics on this, also because I don't

work on those teams and so I don't have an intimate

understanding of the specifics. But I will say that


there are multidisciplinary teams combining machine

learning techniques, heuristics and really rigorous

research and understanding of the sorts of adversaries

in the field and the user behaviors, the diversity of all

kinds of healthy user behaviors as well. That's

understood not only at an engineering machine

level, but also a very human level to combat these

kinds of issues.

Kirill Eremenko: Okay. Perfect. Next one is from Nikhill who's saying,

how much time is realistically spent on data to get it

ready for model development?

Dan Shiebler: I think it really depends on what state the data is

beginning in and the expectations of the model. Of

course, it's very easy to go to scikit-learn and train a

logistic regression on the Iris data set. There's really

not much data [inaudible 00:50:07] at all. But,

accessing data, for example, if you're a data scientist

who works at the Federal Reserve, it may take you

years to be able to complete all of the necessary

documentation. And track down all of the data and all

of the different places under each of the different

permission walls. And then, process it into a form that

will realistically work. I'd say somewhere between 10

seconds and multiple years, depending on your

application. Realistically, for a more useful answer, I'd

say in general probably at least 80% of modeling time

would be spent on some sort of data related task.

Kirill Eremenko: Yeah, like out of the whole, right? The modeling would

be 20% of your time spent on the whole thing.

Dan Shiebler: Yeah.


Kirill Eremenko: For instance, at Twitter, when you're developing some

new model or something, I assume you already have

some data pipelines prepared. But, if you were to

create a new data pipeline, how long would that take

you?

Dan Shiebler: Even for creating new data pipelines, a lot of our

tooling is very well developed for exactly that purpose.

For the process of creating new data pipelines and for

the process of maintaining the data pipelines that we

already have. I think the most time consuming

problems at Twitter, are really understanding model

behavior and understanding how a new source of data

will allow us to construct better models, and less

about the actual engineering work itself. Or, the

modeling work, both of which are very well supported by

tools. It's the decision making and analysis and

understanding that can often take most of the time.

Kirill Eremenko: Isn't that amazing, you don't need to process data.

This is one of the rarest cases in data science where

you just have the luxury of, all right, I'm going to think

about creative stuff all day long. Well, of course,

there's some more mundane tasks I'm sure, but you've

created an environment where it is such that you can

just do the fun stuff all the time. It's so exciting.

Dan Shiebler: I think that it's supported by the really serious

investment of Twitter into making modeling easier and

making modeling more scalable. I will say that there's

of course tradeoffs to having so much of the pipeline

already exist and already be buildable and adaptable

in that when we want to build modeling strategies that

break some of the abstractions that are in place, it can


be very challenging to understand the pipelines that

have been built up over years, by many different

teams. And there's a very real learning curve to the

depth of Twitter's infrastructure and Twitter's

modeling pipelines. That I think can be intimidating

for people who start.

Kirill Eremenko: Was it already in place when you started two years

ago?

Dan Shiebler: It certainly changed very significantly. But a very

serious amount of this infrastructure was definitely in

place. I remember having difficulty in the beginning

really wrapping my head around the pure scale of

what exists. Very common for me, at the beginning,

was to build 80% of a solution. Only to find that some

other team in London, or Boston had a solution that

was far better than mine. That they'd spent the last

several years on, that really completely obviated the

need for any of my work in the first place. Often

understanding what's been done previously in a space,

really at a deep level, and what can be exploited from

the work that's previously done can be more valuable

than trying to write a half-baked solution. Even if it

can be more fun to write a half-baked solution.

Kirill Eremenko: Got you. And it's interesting, because from this it

sounds like it's a big investment and a big bet for

Twitter to bring you, or someone, on board to spend a

few months getting their head around these things.

They're investing their time, their efforts into this new

person that's joining the team. They want to be sure

that you're going to stay for long enough to create

some stuff of your own and bring your ideas to the team.


Kirill Eremenko: We didn't speak about this, but I'm curious, how did

your interview with Twitter go? Was it very clear at the

start, okay, this is a perfect match? Or you were still

thinking, or they were thinking? How did they know

that you are the right person? It's only a 10 person

team, by adding you to this team, they're going to

bring a lot of value to the company.

Dan Shiebler: Well the team that I'm a part of now, didn't really exist

when I started. But when I started Cortex, the entire

organization was only about 15 people. Like I said, it's

almost 10 times larger now. I don't think that when

they hired me they were thinking about the way things

would be right now, in this position. I think they were

more considering the possible ways that Cortex might

develop and Twitter might develop and how I could

help and fit into these different possible developments.

And I think one of the reasons why Twitter has

managed to remain relevant and be really an

important social network in the world, is because

there's a lot of attention paid to the kinds of people

that we hire.

Kirill Eremenko: Got you. And another question popped to my mind.

The team, Cortex, has grown 10X, from 15 to 150,

you've been there two and a half years. Any thoughts,

is it in your plans to become a data science manager,

or you prefer to do the hands on work and develop

your skills there?

Dan Shiebler: I definitely feel that I've grown significantly as a leader

over the course of my time at Twitter. I've been tech

lead for a number of projects and I'm continuing to

lead various sorts of initiatives. I do think at some


point, perhaps in the not so distant future, I will

switch to management. Because I do really enjoy

leadership and thinking about things from a higher

level. At the moment I'm invested in making the

technical projects that I'm a part of be successful.

Whether that's through direct technical involvement,

mentorship, or leadership on a more macro scale.

Kirill Eremenko: Got you. Okay. Let's do one more question. There's a

lot more. But this one got the most votes, people

actually voted for the questions.

Dan Shiebler: Oh, cool.

Kirill Eremenko: Here we go. This is from Oren. Oren asks how much of

computer science topics, like algorithms and data

structures, does a non-computer science data scientist

need to master in order to advance from a build a

model and present your report type of data scientist to

a machine learning engineer that normally deals with

production processes type of data scientist?

Dan Shiebler: I would say that there's a couple of ways of looking at

that. In one sense, I do think that it's quite possible

to really advance as a serious engineer

without really ever thinking super deeply about some

of the core data structures and algorithms. But I do

think that somebody who does that, is at a

disadvantage, because there are many concepts that

are critical in terms of the structure of different sorts

of systems and the interplay of different kinds of

components. And the elegance of different sorts of

techniques that feel very unified and clear. And easy to

understand when you understand these key topics to


begin with. But can feel more jagged or harder to wrap

your mind around, or harder to have that sort of

solution be your first attempt, if you're coming at

things from learning each fact individually, rather than

really developing an understanding of these kinds of

fundamentals.

Dan Shiebler: That said, I will say that there are situations even

when these fundamentals themselves can help

directly. Not too long ago, I found myself in a situation

where a suffix tree, which is a classic intro to data

structures and algorithms data structure, was exactly

what I needed in order to build a feature importance

algorithm that would run efficiently. And implementing

it yielded a 10X speed up over the next best

solution. And I certainly never would have come to

that had I not taken a data structures and algorithms

class back in the day. But the fact that this is a single

anecdote from six months ago, and I certainly can't

think of another one in the past year, I think probably

says that the knowledge itself is not incredibly

important.
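
For readers unfamiliar with the data structure mentioned here, the idea can be sketched with a naive suffix trie. This is a simplified stand-in: a true suffix tree compresses chains of single-child nodes and can be built in linear time, whereas this version takes quadratic time and space. It is not the feature importance algorithm from the anecdote, whose details are not given, and the class and method names are illustrative.

```python
class _Node:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # maps a character to the child node
        self.count = 0      # number of suffixes passing through this node


class SuffixTrie:
    """Naive (uncompressed) suffix trie over a single string."""

    def __init__(self, text: str):
        self.root = _Node()
        # Insert every suffix of the text, one character at a time.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.children.setdefault(ch, _Node())
                node.count += 1

    def _walk(self, pattern: str):
        """Follow the pattern from the root; return None if it falls off."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, pattern: str) -> bool:
        """True if the pattern occurs anywhere in the text."""
        return self._walk(pattern) is not None

    def count(self, pattern: str) -> int:
        """Number of (possibly overlapping) occurrences of a non-empty pattern."""
        node = self._walk(pattern)
        return 0 if node is None else node.count


trie = SuffixTrie("banana")
print(trie.contains("nan"))  # True
print(trie.count("ana"))     # 2 (overlapping matches at positions 1 and 3)
```

Once the structure is built, each lookup costs time proportional to the pattern length rather than the text length, which is where speed-ups over repeated scanning come from.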

Kirill Eremenko: Focusing on fundamentals and structures of systems,

you gave that one example of the suffix tree, which I'd

be curious to learn more about, but I'll do that in my

own time. What's another example? Not like of an

application of a system, but how thinking about the

fundamentals can help somebody advance their

career?

Dan Shiebler: There's a lot of times when the construction of a

system can take different roles in terms of its

interaction with different interfaces. There's a degree of


abstraction that comes in, in the creation of software

systems. The assembling of pipelines that deal with

different sorts of data sources, different kinds of

modeling infrastructure. The different ways that we

can structure the sorts of software pipelines that touch

on each of these different kinds of systems. When

they're well-structured in ways that make bugs

difficult to introduce, make systems easy to adapt and

add to and redesign, this can yield enormous

improvements in model quality and pipeline quality

over time. Especially when operating as part of a team.

I think that one of the largest applications is in the

construction of data generation pipelines, and the

model training code as it interfaces with these

pipelines, and having those constructed in a principled

way is really valuable.

Kirill Eremenko: Okay. In a nutshell, the answer would be, rather than

going for quantity of topics in computer science, go for

the fundamentals and structure of systems. And think

things through holistically. Then the follow-up

question I would have is, how does one go about

learning this kind of stuff? Do you have any books you

can recommend, or sources online? Just even specific

topics to look into for somebody who's serious to follow

this pathway, but just doesn't know where to get

started.

Dan Shiebler: Yeah, absolutely. I do think that there's value in going

through core algorithms, data structures textbooks,

for the purpose of understanding these concepts. I

personally like Algorithms by Dasgupta for that. But I


would say that would be more of a second order

strategy.

Dan Shiebler: I think that the first order strategy in terms of the

fastest way to really develop this intuition on a deep

level, is to simply be part of large software projects. For

somebody working at home, this would mean

contributing to open-source projects, ideally in a

way that you would be able to get feedback on the code

that you write through code reviews. Or, through a

community of people who are contributing to a large

project, or for somebody who's working as a data

scientist in a company, trying to get an understanding

of the kinds of systems that software engineers are

working on. And if you could even be part of one of

those projects for a little while and understand these

things from the perspective of the software engineers,

who write code that gets reviewed by multiple people.

And is part of really large, complex, multi-tenant

infrastructures. And the kinds of concerns involved

there, there's really no better way to learn these sorts

of issues than by simply working on them on a day to

day basis.

Kirill Eremenko: And if you're stuck at home, you don't have access to

something like this at work, or you're still learning and

things like that, you can just go to Github, open a

recent development in machine learning or deep

learning, whatever you're interested in and read

through how it developed. What is version one, what is

version two, what was fixed, what was changed, what

bugs came up, what bugs were removed. What

features were added, what were the user complaints


and so on. And just by doing that you can understand

better the intuition, as Dan here pointed out, the

intuition that went into all this. And the motives that

were driving these changes.

Dan Shiebler: Yeah, absolutely. As you become more comfortable,

being part of it and contributing to it yourself and feeling

the pain of these bugs, I think is a really exceptional

way to grow.

Kirill Eremenko: Mm-hmm (affirmative). Got you. Well on that note,

we're coming close to the end of this podcast, been

super exciting. How did you enjoy your second

appearance on this show so far?

Dan Shiebler: It was great. Excellent. Lots of fun.

Kirill Eremenko: Great. I loved chatting to you. Great insights. Any

parting thoughts? Any things you'd like to wish our

audience on their way to becoming machine learning

engineers and data scientists?

Dan Shiebler: I would say to really keep your mind open with respect

to learning things. That it could be very easy to fall

into the trap of only reading about the very latest,

highest scoring on benchmarks sorts of architectures

and really focusing on that. The really deep

understanding of how machine learning got to where it

is, understanding what was machine learning like in

1990? What were the people then thinking? I think

going at things from a temporal perspective is an

excellent way to develop the kinds of intuitions that

make somebody an exceptional machine learning

engineer and machine learning researcher. I would


encourage people to really think about how to develop

that understanding as deep as possible.

Kirill Eremenko: Fantastic. Great advice. Well on that note, Dan, what

are the best ways for our listeners to get in touch with

you, or follow you, contact you? Just see how your

career develops from here.

Dan Shiebler: My LinkedIn, Dan Shiebler, works. Also my email, if

anyone has any questions for me. I'm happy to answer

it, just [email protected] or

[email protected], if it's Twitter related.

Kirill Eremenko: Mm-hmm (affirmative). Got you. Fantastic. Well, once

again, thanks so much. And you mentioned one book,

and before I let you go, I wanted to see, do you have

any other books that you can recommend that have

impacted your career personally?

Dan Shiebler: Absolutely. I have two books actually that I'll

recommend. The first one is something I read very

early on. It was probably the first actual textbook that

had anything to do with programming and it's Coding

the Matrix by Philip Klein. And it's actually a book on

linear algebra, and I'd recommend it for somebody who

is either a data analyst or a software engineer who

doesn't necessarily feel that they're super comfortable

with linear algebra. Because the ideas introduced in

this book, there's many of them that ended up being

really pivotal in my understanding of machine

learning. And I think it's just written from a great

perspective of somebody who wants to understand

how each of these different algorithms, that deal with

matrices and deal with vectors, play together in a way


that makes sense to someone who's used to

programming.

Kirill Eremenko: Got you. It's interesting, Klein, for a second I thought

it was the Klein that developed that abstract

mathematical concept. What was it called? The bottle

of Klein, or something like that, but obviously not.

Probably not, it's a more recent guy.

Dan Shiebler: It is. He is a little bit more recent. But he's also a very

abstract mathematician, who does some very

interesting abstract research on graph theory.

Kirill Eremenko: All right. And then the second book?

Dan Shiebler: The second book is something I read more recently. It's

An Introduction to Computational Learning Theory by

Michael Kearns. This is definitely a far less applied

book. And not necessarily one that I'd recommend to

someone who's looking for a book that will immediately

change their career. But it's written from the

perspective of the state of the art of machine learning

and the theory behind machine learning in 1994. And

it introduces a lot of fundamental ideas, some of which

have really gone on to take off. And some of which

were largely forgotten, but understanding things from

that perspective and in a theoretical framework that's

discussed in it, I think, has given me a lot of context

in learning new things about machine learning. And

understanding which ideas last and which ideas end

up disappearing.

Kirill Eremenko: Fantastic. Exactly what you mentioned before.

Dan Shiebler: Yeah.


Kirill Eremenko: Study the history of something. Yeah. Very cool. It's

interesting you mentioned it, because in the

FiveMinuteFriday episodes that I do in the podcast,

literally as this episode is going to go live, there's going

to be five episodes about the history of data science. It

doesn't go into the details of algorithms and things like

that, but historically, how the field of data science has

been progressing. Because I was also curious, I had

the same thought. In fact, actually, the team suggested

this. And I was like, wow, this is a really cool idea.

Knowing the history of something allows you to

understand better, what the future will be like.

Dan Shiebler: Absolutely agree.

Kirill Eremenko: Yeah at least the fundamentals, right?

Dan Shiebler: Yeah. I totally agree. That sounds great.

Kirill Eremenko: Awesome. Well, once again Dan, thanks so much for

coming on the show. Looking forward to seeing you at

DataScienceGO 2020. Can't wait for your talk, it's

going to be epic.

Dan Shiebler: Absolutely. Looking forward to it. Thank you.

Kirill Eremenko: So there you have it everybody, that was Dan Shiebler,

senior machine learning engineer at Twitter Cortex.

What was your favorite part of the discussion? For me

it was definitely the whole talk about Dan's PhD, this

whole conversation about theoretic machine learning

and algebraic topology brought back memories rushing

from my university years. So it was really good fun

listening to that, but I'm sure you had your personal

favorite of this talk. If you would like to meet Dan in


person and be part of that advanced practitioner

workshop exclusive track for advanced practitioners,

make sure to secure your seat today. Head on over to

datasciencego.com, click the option for Los Angeles, he

will be there. So we are running in two cities this year,

in Berlin and Los Angeles. You want the Los Angeles

option for 23rd, 24th, 25th October. Get your ticket

today and you'll be part of that advanced practitioner

group, you'll learn from Dan in a hands-on workshop,

personally from him. So once again, the website is

datasciencego.com.

Kirill Eremenko: And as usual, you can get all of the show notes and

materials mentioned in this episode at

superdatascience.com/345. You'll get the transcript

there plus any links, materials we mentioned,

including the URL to Dan's LinkedIn, where you can

connect with him, and any other places on social media

where you can follow him as well. So, that is at

superdatascience.com/345. That's also how you can

share this episode with your friends and colleagues.

Just send them the link superdatascience.com/345 so

they can get up to speed with all the amazing

topics we talked about today, including vectors,

embeddings, nearest neighbors, different techniques and

methodologies Dan uses in his work plus how to think

about your career and why to maybe even do a PhD in

parallel.

Kirill Eremenko: So there we go, hope you enjoyed this episode. Can't

wait to see you on the next one and until then my

friends, happy analyzing.

