sds podcast episode 345: machine learning at twitter · 2020-03-04 · kirill eremenko: and now, on...
TRANSCRIPT
Kirill Eremenko: This is episode number 345, with Senior Machine
Learning Engineer at Twitter Cortex, Dan Shiebler.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur. And each week we bring you inspiring
people and ideas to help you build your successful
career in Data Science. Thanks for being here today.
And now, let's make the complex, simple.
Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies
and gentlemen and everybody. I'm super pumped for
today's episode. But before we proceed, I've got a
question for you: Are you an advanced data scientist?
Have you been listening to this podcast for a while?
Well, maybe it's time to come meet in person. Do you
know that at DataScienceGO in the USA on October
23, 24, 25, in Los Angeles, which we are running on
the UCLA campus, we're going to have a special track,
a unique track dedicated to advanced practitioners in
data science?
Kirill Eremenko: If you come to this track, you will hear from people like
Dan Shiebler, who you'll meet on today's podcast and
who works at Twitter Cortex, that'll already be an
exciting thing and he'll be talking about task
engineering. Dan has presented at DataScienceGO
twice already, this will be his third time, and this time
he will be focused on advanced practitioners. So you'll
learn about task engineering. And what is that? Well,
that's a very interesting thing, you'll hear about it
more on this podcast. Basically it's you designing and
understanding what you will do with your model, what
happens to your model when it actually affects the
underlying population which is used to train that
model. Very interesting, kind of like an inception type
of scenario but it happens and it happens in Twitter
Cortex. You will find out what you can do, a very
hands on workshop for advanced practitioners.
Kirill Eremenko: Also, Morgan Mendis will be flying in all the way from
Haiti, to talk about Airflow and how to build data
science pipelines. Sinan Ozdemir will be coming to talk
about Flask, Django, Docker containers and
Kubernetes, or, we're still deciding which topic, or it
will be BERT, Transformers and Architecture and use
cases for NLP and BERT. So as you can see already
just from these three examples, I'm personally hand-
picking the advanced practitioners to come talk with
you. So if you want to skyrocket your data
science career as an advanced practitioner, head on
over to datasciencego.com and get your ticket today.
Kirill Eremenko: And now, on to today's episode. So the guest for
today's episode is Dan Shiebler. It's been over two
years since he came on the podcast last time in June
2017. Since then, a lot has changed in his career. He's
left TrueMotion and now he works at Twitter Cortex
and there is going to be a lot of interesting components
to this podcast both on the technical side and the
career side. So definitely you'll learn about Dan's
career, how he makes his choices about moving and
selects the companies he works for and why; why he's, in
parallel to his career, doing PhD research, how that
affects the lens through which he sees things in his
career. Very interesting conversation, and his PhD,
by the way, is on category theoretic machine learning. Boom.
Blows your mind. Blew my mind for sure.
Kirill Eremenko: You'll learn about his work at Twitter Cortex, in fact,
you'll learn about it quite intimately, the things that
we could discuss, of course. You'll learn about vectors,
embeddings, nearest neighbors techniques and
methodologies that he uses in his work. So in a
nutshell, a podcast packed with value. Can't wait for
you to hear it, so without further ado, let's welcome
to the show Dan Shiebler, Senior Machine Learning
Engineer at Twitter Cortex.
Kirill Eremenko: Welcome back everybody to the SuperDataScience
podcast, very excited to have you on the show,
because today we've got a very special guest, returning
for the second time, Dan Shiebler calling in from New
York.
Kirill Eremenko: Dan, how you going today?
Dan Shiebler: I'm doing great Kirill. How are you?
Kirill Eremenko: Very good as well. And it's crazy, like you said just
now, that there was no snow in New York the whole
winter.
Dan Shiebler: There was a week when I was away and I heard that there
was snow during that week, but I haven't seen any at
all the past few months.
Kirill Eremenko: That's insane. Hopefully things get better. This global
warming thing is getting quite out of hand, it was
really hot in Australia the whole summer. Strange.
Anyway Dan, it's been a while. I looked, our previous
podcast was June 2017, that's two and a half years
ago. And last time we caught up in person was at
DataScienceGO 2018, which was one and a half years
ago. How have you been since then?
Dan Shiebler: Doing great. Lots of really interesting projects I've been
working on, but great things happening.
Kirill Eremenko: And you changed jobs, congrats. You've been at
Twitter, so you were at TrueMotion before, now you're
at Twitter Cortex, you've been there for over two years.
That's awesome, congratulations man.
Dan Shiebler: Thank you. It's been great.
Kirill Eremenko: What's the best thing about Twitter?
Dan Shiebler: Well, it's a really fast-growing platform, but it also has
a really established niche. There's a lot of people who
consider Twitter their favorite social network. It
balances real high quality information and
lighthearted commentary. I think that it's in a real
sweet spot as far as social networks go. I'm really
proud to be working here.
Kirill Eremenko: And it's interesting that it's managed to stay that way
for what is it? A decade now, or more. I remember I
was at university and we had one lecturer, this was
around 2012, or maybe 2011. He was teaching an
unrelated topic, it was about finance. But he was
looking at some theories of how things change in the
world, and how there are cycles of certain phenomena.
And he was giving an example of social media, saying,
Facebook, Twitter. I remember his comments about
Twitter specifically. He said that Twitter is not going to
be around in four years, that was the prediction for
2015. Because, simply, naturally, we see that with a
lot of things, with a lot of other platforms that used to
be popular. Facebook was very popular back in the
day, but now there's so many alternatives. I haven't
been on Facebook for years now.
Kirill Eremenko: But, it's very impressive that Twitter has managed to
adapt and stay afloat. What are your comments on
that? Why do you think it's so successful in that way?
Dan Shiebler: I think we have a good understanding of our user
base. And we have a willingness to change, but also a
really deep understanding of what are the things that
made Twitter popular in the first place. I think
knowing our strengths and knowing our users, gives
us a real advantage.
Kirill Eremenko: Yeah. Okay. What, would you say, is a way that Twitter
has changed, even since you've joined? You've been
there what, just over two years? What's a way you've
seen Twitter adapt to the changes in user needs over
that time?
Dan Shiebler: We recently launched a topics product, that allows
users to follow individual topics, rather than just
following accounts. While following accounts is an
excellent way to consume information for many Twitter
users, some users have difficulty finding all of the
information that they want on topics that they're
interested in. And topics is designed to better serve
those users. I think this initiative and this product and
family of products, really came out of an
understanding of what are the challenges that some
users have with the platform. And how can we better
serve the users who we're not serving as well right
now. And bring them to see all the things that are
great about Twitter.
Kirill Eremenko: Mm-hmm (affirmative). Fantastic. That's a really cool
thing. I use Google Alerts, I don't use them that often,
but I've set up a Google Alert for data science. And
once a week I get an update on the most trending data
science topics from Google. So, something like that,
right?
Dan Shiebler: Yeah. It's very similar. We're still, every day, trying to
make it better. And really understand how we can best
serve users on the product surface.
Kirill Eremenko: Okay. Got you. And apart from Twitter, which we'll
definitely get back to, you've got so much going on in
your life. When I looked at your LinkedIn, it was so
exciting to see. You finished up that research at Brown
University on deep learning, but you seem to never
allow yourself to slack off or get bored. You started a
PhD, that's so cool, man. That's awesome.
Dan Shiebler: Thank you.
Kirill Eremenko: Why'd you decide to start one?
Dan Shiebler: I really enjoyed my research at Brown. I think that we
really got a lot of high quality research and it was very
exciting to me to have a pursuit, separate from my
work, that allowed me to see things from a more
academic mindset. And with more academic
incentives. And I feel like that has really helped me
grow. And doing this PhD is really something that lets
me continue to do that at a more formal, and more
intense pace. I'm very excited about it, really enjoy it.
Kirill Eremenko: What's the topic of your PhD? Well, I see it on
LinkedIn, but if you can share for our listeners.
Dan Shiebler: The topic is on category theory and machine learning.
It's really about defining category theoretic constructions
for discussing and researching and understanding the
links between different kinds of machine learning. And
different kinds of fields that are closely related to
machine learning. Category theory is a branch of
mathematics that has shown a lot of applications in
unifying previously disparate areas of mathematics.
And there's been a good push recently in applied
category theory in taking category theoretic ideas and
trying to apply the same kind of unification powers to
more applied fields. Like game theory, or biology, or
physics. There's been some really great research on
quantum physics, that's come from category theoretic
perspectives. And I'm trying to utilize these same tools
to increase our understanding of machine learning.
Kirill Eremenko: Wow, really cool. What's an example you can give of
unification in mathematics through category theory?
Dan Shiebler: There's some aspects of algebraic topology that were
previously separate from similar concepts in analysis,
and similar concepts in set theory. When we take
a category theoretic standpoint and zoom out, and
look at these things at a higher level of abstraction, we
can see these individual constructions as particular
instantiations of a higher order structure. It allows
us to say, oh, these different kinds of transformations,
they all satisfy these key properties that make it this
particular kind of category theoretic transformation.
Dan Shiebler: And then, when we're operating at that level of
abstraction, we can simultaneously prove theorems
about each of these different subareas of mathematics.
We're talking about things at this higher level. And
that lets us get many theorems for free, which is one of the key
tenets of category theory. That's similar to how a
programmer might structure their programs so that
core components only need to be implemented once,
rather than multiple times. Category theory allows us
to have that same degree of abstraction on multiple
fields of mathematics, or ideally, applied fields as well.
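The programming analogy above can be sketched in a few lines. This is only an illustration, assuming a monoid (an associative operation with an identity element) as the shared structure; the names `Monoid` and `fold` are hypothetical and do not come from the conversation. The point is that `fold` is written once and then works for every instance.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An associative combine operation together with its identity element."""
    empty: T
    combine: Callable[[T, T], T]


def fold(monoid: Monoid[T], items: Iterable[T]) -> T:
    """Implemented once; reusable for any structure that satisfies the monoid laws."""
    return reduce(monoid.combine, items, monoid.empty)


# Three structures that look unrelated but share the same abstract shape.
addition = Monoid(0, lambda a, b: a + b)
concatenation = Monoid("", lambda a, b: a + b)
set_union = Monoid(frozenset(), lambda a, b: a | b)

total = fold(addition, [1, 2, 3, 4])
text = fold(concatenation, ["theorems", " for", " free"])
merged = fold(set_union, [frozenset({1}), frozenset({2, 3})])
```

Anything proved about `fold` using only associativity and the identity law holds for all three instances at once, which is the small-scale version of getting theorems for free.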
Kirill Eremenko: Wow. Fantastic. I love your tagline, theorems for free.
You should put that in your subtitle on LinkedIn or
something.
Dan Shiebler: I'll consider it. I didn't come up with it, but it does do a
good job of describing what it is.
Kirill Eremenko: When you were talking about that, a few light bulbs
went off in my head, because it's been a while, but in my
Bachelor Degree, I had algebraic topology back in my
high school. And I had set theory in my degree. And I
remember feeling these two are very similar, this set
theory stuff. I've seen it somewhere, it really looks a lot
like what we did in high school and stuff like that.
Again, it's been a while, so I wouldn't be able to come
up with examples, but I can see the value in that.
Kirill Eremenko: And that's a really cool, very abstract though, thing to
be doing. I'm just curious, it's not really linked to your
work at Twitter, is it?
Dan Shiebler: There are links, but the links I would say are at a very
high level, birds-eye view. And I would say the fact that
the day-to-day work is very different, was largely by
design. It's very challenging to have a full day at a job,
even a job that you love, and then go home and do lots
of other work that feels very similar, with similar
frustrations and similar problems.
Dan Shiebler: One of the benefits of focusing my PhD research on
something so different, and so much more theoretical
than the concerns that I focus on at work, is that
when I do one, it doesn't exhaust me for the other.
Each one allows me to work a different part of my
mind, that will allow other parts to rest.
Dan Shiebler: That really is a nice way to balance things in a more
holistic fashion.
Kirill Eremenko: Love it. And that's similar to what you did back when
you were at TrueMotion and you were doing the
research in Brown University. At TrueMotion you were
doing machine learning, but at Brown University you
were doing research on deep learning, as far as I
remember.
Dan Shiebler: Yes. My research at Brown was far more focused on
standard deep learning for image modeling, whereas
my work at TrueMotion was much more focused on a
signal processing perspective on machine learning.
And machine learning for signal processing
applications. It was very similar in that the two main
pushes had overlap, but felt very different.
Kirill Eremenko: And it's not like you have to do this research. It feels
like you're doing it more as a hobby. Is there any other
motive for doing research? Or maybe, somebody
listening to this podcast might think, oh, wow, maybe I
should get into research too. What's the reason to
continuously do research? First at Brown, now
you're doing your PhD at Oxford. Any comments on
that?
Dan Shiebler: I think there's a lot of reasons. I would say that for me,
I genuinely enjoy it. I enjoy it at a very deep level. And
I don't think it would be the right thing for someone to
do, who didn't really genuinely enjoy it. But I think
that there are a lot of significant benefits beyond just
my own enjoyment. Really giving myself the
opportunity to look at research from the perspective of
a researcher, as opposed to the perspective of a
practitioner, allows me to see ideas on a deeper level
than just, is this the right tool for me to use for this
job right now? And more to think about things in
terms of their deeper applications and other work that
they might open up, other avenues of exploration.
Dan Shiebler: That kind of perspective, I think, it makes me smarter,
it makes me more creative and it really gives me the
ability to learn new things much more easily. It's far
easier for me to pick up ideas on some other team at
Twitter who I haven't worked with. Really understand
what they're doing. And understand the kinds of
problems that they face, because I've drilled myself to
be able to understand really complex topics, really
quickly, through my research experience.
Kirill Eremenko: Wow. Very distinct explanation. There you go. If
somebody listening to this, I agree, is passionate
about a topic, maybe consider research in that space-
Dan Shiebler: Absolutely.
Kirill Eremenko: It has its advantages like that. Let's get to your work
at Twitter. I read on LinkedIn the description, how you
describe your role, very cool description. You develop
systems and models to improve the performance and
efficiency of machine learning at Twitter. It's like you're
doing machine learning on machine learning, almost.
Tell us about that.
Dan Shiebler: I think in order to really give it context, I can define
the role of Cortex in general, then how I fit into that.
Twitter's engineering is split across a large number of
product teams, serving different kinds of product
surfaces: the timeline, the advertisement products, the
notifications products, or email products. Cortex is an
organization within Twitter that develops machine
learning systems and components of machine learning
models that are incorporated into the modeling
pipelines of each of these different things.
Dan Shiebler: And our work is at a level of abstraction, similar to the
notion of category theory, where we are developing
things that fit into multiple different products. Often
what we will do is, we will assemble a couple of
different product surfaces that seem like they can be
solved with a particular modeling approach. They seem
to share similar restrictions on their current
performance. And identify different ways that we can
develop models that would serve each of these product
surfaces. My team in particular focuses on models that
utilize embeddings and nearest neighbors to serve
products where we need to match users or other
things, mainly users with large sets of possible
candidate content. Like a large set of Tweets, or large
set of potential notifications.
Kirill Eremenko: Interesting. So embeddings and nearest neighbors,
let’s talk about nearest neighbors for a second. For
nearest neighbors, or for any kind of categorical
machine learning, I would expect you need a range of
fields or a range of columns to be working with. What
kind of columns are you working with at Twitter?
Because, there's mostly just the Tweets that people
have.
Dan Shiebler: In this case, if we're serving nearest neighbors, what
we would be doing is first, the nearest neighbors are
defined over an embedding space. The columns in this
case are the embedding dimensions.
Kirill Eremenko: What is embedding? Sorry, can you get me up to speed
please, what is an embedding space?
Dan Shiebler: An embedding in this context is just a vector
representation of some kind of entity. For instance, it
could be a 300-dimensional vector, or a 1000-dimensional
vector, that represents a user, or a Tweet.
The embedding plus nearest neighbors approach, for
recommending content, involves constructing
embeddings for users and constructing embeddings for
content. Such that, the distance, or angle between two
embeddings is indicative of some notion of affinity, or
similarity where users will be close in embedding
space to Tweets that they might like. We can utilize
this to then embed these users and all of these Tweets
and then find nearest neighbors in the space to
recommend content.
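As a rough illustration of the embedding plus nearest neighbors idea described above, here is a minimal sketch. The three-dimensional vectors, the Tweet names, and the function names are all made up for the example; real embeddings would have hundreds of dimensions and would come from a trained model. Affinity is measured here with cosine similarity, which depends only on the angle between two vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_tweets(user_vec, tweet_vecs, k=2):
    """Return the ids of the k Tweets whose embeddings are closest in angle to the user."""
    ranked = sorted(
        tweet_vecs,
        key=lambda tweet_id: cosine_similarity(user_vec, tweet_vecs[tweet_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings: the first axis loosely means "machine learning".
user = [0.9, 0.1, 0.0]
tweets = {
    "ml_paper": [1.0, 0.0, 0.1],
    "cooking": [0.0, 1.0, 0.0],
    "ml_meme": [0.7, 0.3, 0.0],
}

recommended = nearest_tweets(user, tweets, k=2)  # ["ml_paper", "ml_meme"]
```

This version compares the user against every Tweet, which is fine for three Tweets and is exactly the exhaustive approach that stops scaling at hundreds of millions of items.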
Kirill Eremenko: Got you. And then other teams can use the
embeddings you've created to run their machine
learning.
Dan Shiebler: Indeed. They could use the embeddings we create as
features, or they can use the pipelines that we build to
create the embeddings in the first place to create new
embeddings that are optimized for their surface. And
sometimes these new embeddings are constructed on
top of other embeddings and everything will feed into
each other.
Kirill Eremenko: Wow, that's so crazy. If you're able to share, because
I'm sure there's parts which you can’t disclose due to
proprietary information, but if you're able to share, do
the embeddings that you create for users that are
indicative of... If two of these vectors, 1000-dimensional
vectors, are close, have a very small angle between each
other, then that means the users are close in their
behavior or in their characteristics. If two vectors for
content are close, that means maybe somebody who
will like this content, will like that content as well. I
can see the implications of that. What goes into the
embeddings in the first place? And going back to the
question of what are the original features? Apart from
their Twitter text, the messages they send or people
they follow, there's no existing transactional data, like
on Netflix or Amazon, saying that this person
purchased these items. Basically, these are their
specific interests. Is it all to do with NLP? I'm just
curious. What goes into the embeddings in the first
place?
Dan Shiebler: Great question. To start, I would say that NLP while
very important for many use cases, for the purpose of
recommendation, is actually a very myopic view on the
structure of a Tweet. And the reason for this is that
Jack Dorsey Tweeting a single-word Tweet, like hello,
or something like that, has a very different set of users
who I might want to recommend that to than if I
Tweeted hello, or something like that.
Dan Shiebler: Often the most useful information when we
look at a Tweet, from the Tweet perspective, can be the
people who have interacted with the Tweet. And the
dynamics of the author of the Tweet. There's a huge
amount of information on Twitter that's represented in
terms of the engagement graph and the follow graph.
The follow graph is just the relationship between all of
the users, based on who follows who. The engagement
graph is the relationship between users and Tweets, as
well as users and users, based on users choosing to
like, or reply to, or retweet Tweets.
Dan Shiebler: These kinds of behaviors incorporate an enormous
amount of information. A Tweet that has 100 likes, all
from machine learning focused people, really gives a
very strong indication that the Tweet is about machine
learning. And we can drill down very deeply into
content utilizing this kind of social or contextual
information, we often refer to as [inaudible 00:26:38]
to collaborative filtering.
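The point about likes carrying topic information can be shown with a toy engagement graph. This is a hedged sketch with invented users, Tweets, and community labels; it does not reflect how Twitter stores or processes engagements. Each Tweet is described purely by who liked it, with no text processing at all.

```python
from collections import Counter

# Hypothetical engagement graph: Tweet id -> set of users who liked it.
likes = {
    "tweet_a": {"ann", "bob", "cat"},
    "tweet_b": {"bob", "cat", "dan"},
    "tweet_c": {"eve", "fay"},
}

# Hypothetical community label for each user (for example, derived from the follow graph).
community = {
    "ann": "ml", "bob": "ml", "cat": "ml", "dan": "ml",
    "eve": "cooking", "fay": "cooking",
}


def community_profile(tweet_id):
    """Count which communities the likers of a Tweet belong to."""
    return Counter(community[user] for user in likes[tweet_id])


def jaccard(tweet_x, tweet_y):
    """Overlap between the audiences of two Tweets, from 0.0 (disjoint) to 1.0 (identical)."""
    shared = likes[tweet_x] & likes[tweet_y]
    combined = likes[tweet_x] | likes[tweet_y]
    return len(shared) / len(combined)


profile_a = community_profile("tweet_a")        # every liker is in the "ml" community
overlap_ab = jaccard("tweet_a", "tweet_b")      # 2 shared likers out of 4 distinct
overlap_ac = jaccard("tweet_a", "tweet_c")      # no shared likers
```

Two Tweets with heavily overlapping audiences end up looking similar even if their text shares no words, which is the collaborative signal discussed above.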
Kirill Eremenko: Wow. Okay. Got you. Basically, you can even extract
information that this Tweet is about machine learning,
based on the likes it's getting, from whom it's getting
those likes, without even digging into the processing of
the text within that Tweet.
Dan Shiebler: Yeah. There are of course limitations to this.
Sometimes a Tweet might be about a social issue that
is popular among machine learning researchers. Or, a
personal issue related to a popular personality in
machine learning, may end up with a very similar like
profile to one that is about machine learning at a more
core level. But often for recommendation use cases,
understanding which communities of people are
interested in something and representing something in
terms of that perspective, can be the most rich way to
get the kinds of information that the model needs to
know. There is a lot more information that can be
driven out of the Tweet text itself and we do of course
extract this and utilize this. But, in general, if you had
to choose between just the collaborative information,
or just the Tweet text information, the collaborative
information would win without a doubt.
Kirill Eremenko: Oh, very interesting. And what kind of tools do you use
for this?
Dan Shiebler: We have stacks constructed in Scala and Python at
the language level. Our modeling is almost entirely
done in TensorFlow, from the perspective of all neural
networks and such. We do have a number of in-house
matrix factorization-style tools written in
Hadoop or Spark that are used for some applications. We
do have a very big deployment that uses a piece of
software that we open-sourced, called Scalding, which
is a Scala-based Hadoop tooling. That works quite
well for constructing really large Hadoop jobs that can
operate at Twitter scale.
Kirill Eremenko: Okay. Is that a good description of what a machine
learning engineer does, is that you prepare machine
learning tools for other people and departments to
use?
Dan Shiebler: I would not say that it's just a construction of tools.
Machine learning engineers at Twitter fall into a couple
of different categories. Myself, I would not say that my
job would be described in that way. My work is more
around the construction of instantiations of the
embedding pipeline. My team often partners with
product teams. And we have our own set of tooling and
our own set of systems that are really designed for
constructing these kinds of embedding nearest
neighbor pipelines. We will actually construct the
models and help other teams construct these models
in a more consultancy fashion.
Dan Shiebler: But there are engineers within Cortex whose role is to
create the deep learning model deployments or the
platform tooling for analyzing data or scheduling
model reruns. There's a spectrum of these more core
engineering tasks and more direct modeling and
machine learning model creation tasks.
Kirill Eremenko: Mm, okay. It's like a variety of things. Got you. And
you mentioned team a couple of times, how big is your
team? And which part of the team are you working in?
Dan Shiebler: Cortex as a whole, is about 150 people, of which, my
team is in the sub-organization focused largely on
platform. And our team is about 10 people. My role is
really more focused on model construction and
understanding of the relationship between models and
business value. My team has some people who are
more focused on the optimization of our nearest
neighbor pipelines, which are highly optimized and
state of the art. And some people, who are more
focused on the core software development as well.
Kirill Eremenko: Got you. When you say nearest neighbor, does that
mean you went and consciously selected for your
characterization, the nearest neighbor algorithm or, is
that just a broad way of saying we're finding the
nearest neighbors? Because there's other methods of
clustering that could be used to group users into
groups. Or finding, as you said, doing this
collaborative filtering. Just a question around that.
Dan Shiebler: We actually use an approximate nearest neighbors
system. And the reason why we selected that, is based
on scale. The reason is because we're not simply
grouping users together, but we're trying to find the
nearest content for each user. If we're in a situation
where we have 300 million users and half a billion
Tweets in a particular day, and we're trying to match
each user with the best Tweets, exhaustively looking at
each user-Tweet pair is completely not scalable. That's 300
million times 500 million operations, and many
standard strategies would require utilizing something
like that. Our approximate nearest neighbor systems
allow us to dramatically optimize this, by constructing
these graphs of Tweets that the user's embedding
essentially traverses. That's a whole topic that's very
interesting, the construction and optimization of these
algorithms for really allowing the user-to-content
pairing process to scale.
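To put numbers on the scale argument: 300 million users times 500 million Tweets is 1.5 × 10^17 user-Tweet pairs. The sketch below shows the general flavor of a graph-based approximate search, assuming a toy neighbor graph built by hand; it is a simplified greedy walk for illustration, not the system described in the conversation, and the names `greedy_search`, `vectors`, and `graph` are hypothetical.

```python
import math

# Exhaustive matching cost quoted in the conversation.
exhaustive_pairs = 300_000_000 * 500_000_000  # 150,000,000,000,000,000 pairs


def distance(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_search(query, vectors, graph, entry):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops when no neighbor improves on the current node. Returns the node
    found and how many distance evaluations were needed.
    """
    current = entry
    current_dist = distance(query, vectors[current])
    evaluated = 1
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = distance(query, vectors[neighbor])
            evaluated += 1
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            return current, evaluated
        current, current_dist = best, best_dist


# Toy index: 1000 items on a line, each linked to close neighbors (+/-1)
# and to a few far ones (+/-30) so the walk can take long jumps.
vectors = {i: (float(i), 0.0) for i in range(1000)}
graph = {
    i: [j for j in (i - 1, i + 1, i - 30, i + 30) if 0 <= j < 1000]
    for i in range(1000)
}

found, evaluated = greedy_search((612.3, 0.4), vectors, graph, entry=0)
```

On this toy data the walk reaches the true nearest item after a small fraction of the 1000 distance computations a brute-force scan would need. In general a greedy walk like this can stop at a local optimum, which is why such methods are called approximate.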
Dan Shiebler: There are other solutions, of course, that can solve the
same thing; clustering is one that you mentioned.
This one is nice because of its connection to the
amount of flexibility that we have in the construction
of the embedding. The embedding itself can be
constructed such that the distance relationships are a
model output. And any kind of machine learning
technique can be utilized there. And deployed,
essentially, at scale for free and that's a very attractive
aspect of that family [crosstalk 00:34:32].
Kirill Eremenko: Wow, very cool. I was just thinking, when you gave the
example of 300 million times 500 million operations,
do you think if quantum computing picks up, that
you'd be able to solve it completely differently? Just
look at all the pairs and find the-
Dan Shiebler: I think there's all kinds of optimizations that we do
right now, that would be unnecessary if we had access
to quantum computers. And in the deployment of
machine learning models, certainly, but in the training
as well. There's many things we're not able to do
because of scale restrictions in terms of data
collection, pipelines and such, that would be
completely overhauled in the presence of really
effective quantum computers.
Kirill Eremenko: That's very cool. Have you looked into quantum
computers quite a bit?
Dan Shiebler: Not as much as I'd like. There's a research group at
Oxford that does a lot of interesting research in the
intersection of category theory and quantum
computers. Utilizing category theory to make some
quantum computing ideas much simpler and easier to
build on top of. But, I can't say that I'm familiar with it
more than on a surface level.
Kirill Eremenko: Okay. Wow. It's a very exciting topic and I can't wait to
see what happens when the quantum computers
come. Thanks for such an interesting description of
your role at Twitter. It's very exciting and I can see
how you're super pumped to go to work every day.
[crosstalk 00:36:23] and come back and do your
research. That is really cool.
Kirill Eremenko: I wanted to shift gears a little bit. For our listeners, we
have an exciting announcement. Dan was at
DataScienceGO 2018 and you are coming back this
year, very excited to have you back. How are you
feeling about that?
Dan Shiebler: I'm feeling great, looking forward to it.
Kirill Eremenko: And as we discussed, this time, we're considering
aiming for a more advanced practitioner talk for Dan.
Out of all the things that you've talked about, if you
had to pick a topic for your talk right now, what's the
first thing that comes to mind? In a hands on type of
workshop, what is it that you would be passionate to
share with the audience?
Dan Shiebler: Absolutely task engineering. The process of creating a
machine learning task where a model that does well on
that task, will actually drive business value. The
creation of models that can tie closely to core value, I
think is something that is a real science that I've
continued to learn about. And I think is one of the
most important areas in machine learning and data
science for people to understand at a deep level.
Kirill Eremenko: Interesting. Can you give an example? What's an
example of task engineering in business?
Dan Shiebler: At Twitter, for example, when we train our models on
Tweets or users, or any sort of data, we need to be very
careful about how the models that we deploy affect the
data that we're training on. A model that's already
trying to show users content that it thinks they'll like
is corrupting the quality of the training data that feeds
back into the model, in that the distribution is
shifting. The task that we are constructing for the
model, when we retrain it, is now worse than it was
originally. The awareness of these kinds of issues, and
the construction of the model task and the pipelines
that support it in a way such that increased model
performance will continue to increase business metrics
is a really deep science that has enormous
applications.
Kirill Eremenko: Wow, that's very cool. That's a very good description,
because the data you're dealing with, you're dealing
with users. And now I think about it, that would be
applied across most business cases. The only
situations where that wouldn't be relevant is when, for
instance the data you're analyzing is the national
cohort, or the global cohort. Like a massive sample of
people, you're analyzing census data or even daily
data, but in a much greater ecosystem than your own
company. And then you're applying it to your
company. Then, what happens in your user base, with
your company, doesn't really affect the world that
much. But, for instance, an example, if you're
analyzing stock prices and then you go and buy some
Tesla stock or you sell something else, Apple stock,
that's not going to affect the world. You can keep
analyzing the same way you were analyzing before. But
in your case, you're directly impacting the whole user
base with your model.
Dan Shiebler: Absolutely. These kinds of decisions and how these
decisions can be the difference in user consumption, is
really critical. Things like if we start sending bad
notifications to users, and users start opting out of the
notifications, then we're in a situation where we no
longer have data coming from users who really didn't
like their notifications. And a model that now starts
performing well on this new setting and this new
world, where we don't have this data from the users
who didn't like notifications, is not actually the best
model. And the understanding of that, as we construct
a task, such that the best model on that task is
actually the best model for deployment, is really
critical.
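The opt-out effect Dan describes can be illustrated with a toy calculation. This is a minimal sketch under invented assumptions, not Twitter's data or method: the `User` class, the 40/60 population split, and the `click_rate` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    likes_notifications: bool  # hypothetical ground truth, not observable in practice


def click_rate(users):
    """Fraction of the given users who would engage with frequent notifications."""
    if not users:
        return 0.0
    return sum(u.likes_notifications for u in users) / len(users)


# Hypothetical population: 2 in every 5 users like frequent notifications.
population = [User(i, likes_notifications=(i % 5 < 2)) for i in range(1000)]

# After a round of bad notifications, the users who disliked them opt out,
# so they stop producing any notification data at all.
still_opted_in = [u for u in population if u.likes_notifications]

rate_on_logged_data = click_rate(still_opted_in)  # what the retrained model is scored on
rate_on_everyone = click_rate(population)         # what actually matters for deployment
```

Scored only on the users who remain opted in, an aggressive notification policy looks perfect, while on the full population it serves fewer than half of the users well. The gap between the two numbers is the gap between the task the model is trained on and the task it is deployed for.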
Kirill Eremenko: Wow, that's such a cool teaser. I want to come to this
workshop now, this is exciting. Awesome. Thank you
very much. This is going to pique people's interest in
the event and also specifically your talk. If you want to
learn about task engineering, check out Dan's talk at
DataScienceGO 2020, 23rd, 24th, 25th October.
Kirill Eremenko: And now, I wanted to jump into something really cool.
I'm not sure if you saw, but 24 hours ago, I posted a
question on LinkedIn, I was about to say Twitter, on
LinkedIn. That's where I hang out more for some
reason, it's happened that way. And I posted a
question for our followers or our audience to post
questions for you and see what they want to ask you
on the podcast. We're going to go through these.
Kirill Eremenko: Are you ready for some rapid fire questions?
Dan Shiebler: All right. See what I can do.
Kirill Eremenko: All right. Here we go. Deepa asks, how are
unsupervised models improved over time and what are
the metrics you track to measure them?
Dan Shiebler: Great question. They've improved over time in terms of
scale, certainly. But, in terms of our understanding of
them, the development of them, there's many kinds of
really deep unsupervised models of course that have
come a very long way in the face of improved
computation. I think that tracking the performance of
an unsupervised model is something that's extremely
application dependent. If we're training a feature
extractor, then the performance of the model that is
utilizing those features would be the sort of thing that
we would be tracking. If we're tracking something
that's going to be used for visualization, some sort of
clustering or generative model, then it's much trickier.
There are heuristics we might be able to apply, but we
may actually need human evaluation in order to really
effectively compare models.
Kirill Eremenko: Okay. And does that change between unsupervised
and supervised models?
Dan Shiebler: Supervised models tend to have a more built-in
performance metric in that there's a goal in mind,
some sort of prediction goal that we've constructed.
Classification might have how well is this model
actually completing this classification task. But, of
course as I mentioned a moment ago, with task
engineering, this problem is not automatically solved
for supervised models because we have these
situations where this task we're training our model on
is not actually what we're interested in having it do.
Kirill Eremenko: And over time things might change as well. Previously,
in one company, I had the situation where a
classification model was built maybe, I don't know, 18
months before I joined. And everything was great, but
then the population behavior changed, because of I
don't know, the aging of population. And sometimes
behaviors of consumers, especially in retail, change.
And the model was no longer working even though
originally it had that supervised training.
Dan Shiebler: Absolutely. I think that's a problem that many
companies face. That's certainly a problem that we
grapple with.
Kirill Eremenko: And how do you deal with it?
Dan Shiebler: Regular retraining is one of the basic hygiene
techniques that we utilize, but of course, when we're in
situations where our own model is corrupting the data
stream, even that alone is not enough. Things like
setting aside certain populations for deploying different
models on different groups of users and trying to avoid
these kinds of self-contamination effects, can go a long
way.
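One way to picture "setting aside certain populations" is a stable holdout group whose experience is never shaped by the production model, so that its logged interactions stay uncontaminated. The sketch below is an assumption about how such a split could be done, not a description of Twitter's systems: the function names, the salt string, and the 5% default are all hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str = "holdout-v1", buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_holdout(user_id: str, holdout_percent: int = 5) -> bool:
    """Holdout users are served by a baseline policy, not the production model."""
    return bucket(user_id) < holdout_percent


def training_rows(rows):
    """Keep only interactions logged from holdout users."""
    return [row for row in rows if in_holdout(row["user_id"])]
```

Hashing the user ID with a salt keeps each user in the same group across sessions without storing an assignment table, and changing the salt reshuffles the groups. The trade-off is that holdout users get a weaker experience, so the holdout is usually kept small.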
Kirill Eremenko: Got you. Next question is from Linda. What emergent
technology should we be paying attention to and which
industries will they impact the most?
Dan Shiebler: I think that improvements in hardware have really
come a long way in terms of the types of machine
learning models that can be used and the kinds of
applications that we build on top of them. And I think
that one of the reasons why compute hardware, things
like GPUs and TPUs and, up until a few years ago,
improved CPUs, has become so important in terms of
what gets built is that it's a feedback effect. When a new kind of
hardware is shown to be really powerful for a
particular application, more things get built utilizing
that hardware and for that application, which then
spurs additional research into that kind of hardware.
Dan Shiebler: One of the reasons why machine learning conferences
are so completely swamped right now with super deep
networks, rather than more rule-based or symbolic
kinds of approaches, is that the sorts of hardware that
we have access to, the best most powerful kinds of
hardware, is really well suited for deep networks.
Dan Shiebler: And that's a result of the self-supporting process of
deep networks encouraging more research on these
kinds of hardware, which then encourages more
research and better results from deep networks.
Kirill Eremenko: Got you. Would you agree with, I've seen in the news
recently, or the past half year or so, that Moore's Law
is dead. That we've come to a limit in terms of how
small our integrated circuits can be and how many
transistors can fit on them and that's it. From here,
our exponential amazing benefit that we were getting is
over and now it's all going to flatten out.
Dan Shiebler: In some ways, I definitely think that's true. I can't say
that I'm an expert in transistors or the necessary
limitations on how small we can make them. But I can
say that, improvements in our ability to parallelize
computations and improvements in the construction of
specialized hardware, have allowed us to maintain
exponential growth in terms of the computations we're
capable of. Certainly these effects seem like they have
limits and ceilings that are much lower than the
seemingly unbounded limitations of Moore's Law. But
it's certainly possible, that as innovations continue,
we'll be able to find out new ways to utilize other kinds
of tricks to continue to improve computation. I don't
think that the speed of computation is necessarily
never going to be able to increase with an exponential
rate simply because we can't make transistors smaller
right now.
Kirill Eremenko: I agree. I completely agree. I think we'll find a way. It's
been so good. The next one is from Oscar, who is
asking about some insights into how Twitter is using
machine learning to detect bots or bot accounts or bot
farms. And, what are scalable solutions that are being
implemented for cyber security and or fraudulent
account detection? Anything you can share on that?
Dan Shiebler: I can't talk about specifics on this, also because I don't
work on those teams and so I don't have an intimate
understanding of the specifics. But I will say that
there are multidisciplinary teams combining machine
learning techniques, heuristics and really rigorous
research and understanding of the sorts of adversaries
in fields and the user behaviors, the diversity of all
kinds of healthy user behaviors as well. That's
understood not only at an engineering machine
level, but also a very human level to combat these
kinds of issues.
Kirill Eremenko: Okay. Perfect. Next one is from Nikhill who's saying,
how much time is realistically spent on data to get it
ready for model development?
Dan Shiebler: I think it really depends on what state the data is
beginning in and the expectations of the model. Of
course, it's very easy to go to scikit-learn and train a
logistic regression on the Iris data set. There's really
not much data [inaudible 00:50:07] at all. But,
accessing data, for example, if you're a data scientist
who works at the Federal Reserve, it may take you
years to be able to complete all of the necessary
documentation. And track down all of the data and all
of the different places under each of the different
permission walls. And then, process it into a form that
will realistically work. I'd say somewhere between 10
seconds and multiple years, depending on your
application. Realistically, for a more useful answer, I'd
say in general probably at least 80% of modeling time
would be spent on some sort of data related task.
Kirill Eremenko: Yeah, like out of the whole, right? The modeling would
be 20% of your time spent on the whole thing.
Dan Shiebler: Yeah.
Kirill Eremenko: For instance, at Twitter, when you're developing some
new model or something, I assume you already have
some data pipelines prepared. But, if you were to
create a new data pipeline, how long would that take
you?
Dan Shiebler: Even for creating new data pipelines, a lot of our
tooling is very well developed for exactly that purpose.
For the process of creating new data pipelines and for
the process of maintaining the data pipelines that we
already have. I think the most time consuming
problems at Twitter, are really understanding model
behavior and understanding how a new source of data
will allow us to construct better models, and less
about the actual engineering work itself. Or, the
modeling work, both of which are very supported on
tools. It's the decision making and analysis and
understanding that can often take most of the time.
Kirill Eremenko: Isn't that amazing, you don't need to process data.
This is one of the rarest cases in data science where
you just have the luxury of, all right, I'm going to think
about creative stuff all day long. Well, of course,
there's some more mundane tasks I'm sure, but you've
created an environment where it is such that you can
just do the fun stuff all the time. It's so exciting.
Dan Shiebler: I think that it's supported by the really serious
investment of Twitter into making modeling easier and
making modeling more scalable. I will say that there's
of course tradeoffs to having so much of the pipeline
already exist and already be buildable and adaptable
in that when we want to build modeling strategies that
break some of the abstractions that are in place, it can
be very challenging to understand the pipelines that
have been built up over years, by many different
teams. And there's a very real learning curve to the
depth of Twitter's infrastructure and Twitter's
modeling pipelines. That I think can be intimidating
for people who start.
Kirill Eremenko: Was it already in place when you started two years
ago?
Dan Shiebler: It certainly changed very significantly. But a very
serious amount of this infrastructure was definitely in
place. I remember having difficulty in the beginning
really wrapping my head around the pure scale of
what exists. Very common for me, at the beginning,
was to build 80% of a solution. Only to find that some
other team in London, or Boston had a solution that
was far better than mine. That they'd spent the last
several years on, that really completely obviated the
need for any of my work in the first place. Often
understanding what's been done previously in a space,
really at a deep level, and what can be exploited from
the work that's previously done can be more valuable
than trying to write a half-baked solution, even if it
can be more fun to write a half-baked solution.
Kirill Eremenko: Got you. And it's interesting, because from this it
sounds like it's a big investment and a big bet for
Twitter to bring you, or someone, on board to spend a
few months getting their head around these things.
They're investing their time, their efforts into this new
person that's joining the team. They want to be sure
that you're going to stay for long enough to create
some stuff of your own and bring your ideas to the team.
Kirill Eremenko: We didn't speak about this, but I'm curious, how did
your interview with Twitter go? Was it very clear at the
start, okay, this is a perfect match? Or you were still
thinking, or they were thinking? How did they know
that you are the right person? It's only a 10 person
team, by adding you to this team, they're going to
bring a lot of value to the company.
Dan Shiebler: Well the team that I'm a part of now, didn't really exist
when I started. But when I started Cortex, the entire
organization was only about 15 people. Like I said, it's
almost 10 times larger now. I don't think that when
they hired me they were thinking about the way things
would be right now, in this position. I think they were
more considering the possible ways that Cortex might
develop and Twitter might develop and how I could
help and fit into these different possible developments.
And I think one of the reasons why Twitter has
managed to remain relevant and be really an
important social network in the world, is because
there's a lot of attention paid to the kinds of people
that we hire.
Kirill Eremenko: Got you. And another question popped to my mind.
The team, Cortex, has grown 10X, from 15 to 150,
you've been there two and a half years. Any thoughts,
is it in your plans to become a data science manager,
or you prefer to do the hands on work and develop
your skills there?
Dan Shiebler: I definitely feel that I've grown significantly as a leader
over the course of my time at Twitter. I've been tech
lead for a number of projects and I'm continuing to
lead various sorts of initiatives. I do think at some
point, perhaps in the not so distant future, I will
switch to management. Because I do really enjoy
leadership and thinking about things from a higher
level. At the moment I'm invested in making the
technical projects that I'm a part of be successful.
Whether that's through direct technical involvement,
mentorship, or leadership on a more macro scale.
Kirill Eremenko: Got you. Okay. Let's do one more question. There's a
lot more. But this one got the most votes, people
actually voted for the questions.
Dan Shiebler: Oh, cool.
Kirill Eremenko: Here we go. This is from Oren. Oren asks how much of
computer science topics, like algorithms and data
structures, does a non-computer science data scientist
need to master in order to advance from a build a
model and present your report type of data scientist to
a machine learning engineer that normally deals with
production processes type of data scientist?
Dan Shiebler: I would say that there's a couple of ways of looking at
that. In one sense, I do think that it's quite possible
to really advance as a serious engineering engineer
without really ever thinking super deeply about some
of the core data structures and algorithms. But I do
think that somebody who does that, is at a
disadvantage, because there are many concepts that
are critical in terms of the structure of different sorts
of systems and the interplay of different kinds of
components. And the elegance of different sorts of
techniques that feel very unified and clear. And easy to
understand when you understand these key topics to
begin with. But can feel more jagged or harder to wrap
your mind around, or harder to have that sort of
solution be your first attempt, if you're coming at
things from learning each fact individually, rather than
really developing an understanding of these kinds of
fundamentals.
Dan Shiebler: That said, I will say that there are situations even
when these fundamentals themselves can help
directly. Not too long ago, I found myself in a situation
where a suffix tree, which is a classic, the intro to data
structures and algorithms data structure, was exactly
what I needed in order to build a feature importance
algorithm that would run efficiently. And implementing
it yielded a 10X speedup over the next best
solution. And I certainly never would have come to
that had I not taken a data structures and algorithms
class back in the day. But the fact that this is a single
anecdote from six months ago, and I certainly can't
think of another one in the past year, I think probably
says that the knowledge itself is not incredibly
important.
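For listeners unfamiliar with the idea, the sketch below shows a suffix trie, a simpler, uncompressed relative of the suffix tree Dan mentions. It is not his feature importance algorithm and not a true suffix tree: construction here takes quadratic time and space, whereas a real suffix tree compresses single-child chains and can be built in linear time. The class and method names are illustrative.

```python
class _Node:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # number of suffixes whose path passes through this node


class SuffixTrie:
    """Uncompressed suffix trie: every suffix of the text is a path from the root."""

    def __init__(self, text: str):
        self.root = _Node()
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                nxt = node.children.get(ch)
                if nxt is None:
                    nxt = node.children[ch] = _Node()
                nxt.count += 1
                node = nxt

    def count(self, pattern: str) -> int:
        """Number of (possibly overlapping) occurrences of a non-empty pattern."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count


trie = SuffixTrie("banana")
occurrences = trie.count("ana")
```

Once the structure is built, counting the occurrences of a pattern costs time proportional to the pattern's length rather than the text's length, which is the kind of saving that can produce a large speedup when the same text is queried many times.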
Kirill Eremenko: Focusing on fundamentals and structures of systems,
you gave that one example of the suffix tree, which I'd
be curious to learn more about, but I'll do that in my
own time. What's another example? Not like of an
application of a system, but how thinking about the
fundamentals can help somebody advance their
career?
Dan Shiebler: There's a lot of times when the construction of a
system can take different roles in terms of its
interaction with different interfaces. There's a degree of
abstraction that comes in, in the creation of software
systems. The assembling of pipelines that deal with
different sorts of data sources, different kinds of
modeling infrastructure. The different ways that we
can structure the sorts of software pipelines that touch
on each of these different kinds of systems. When
they're well-structured in ways that make bugs
difficult to introduce, make systems easy to adapt and
add to and redesign, this can yield enormous
improvements in model quality and pipeline quality
over time. Especially when operating as part of a team.
I think that one of the largest applications is in the
construction of data generation pipelines, and the
model training code as it interfaces with these
pipelines, and having those constructed in a principled
way is really valuable.
Kirill Eremenko: Okay. In a nutshell, the answer would be, rather than
going for quantity of topics in computer science, go for
the fundamentals and structure of systems. And think
things through holistically. Then the follow-up
question I would have is, how does one go about
learning this kind of stuff? Do you have any books you
can recommend, or sources online? Just even specific
topics to look into for somebody who's serious to follow
this pathway, but just doesn't know where to get
started.
Dan Shiebler: Yeah, absolutely. I do think that there's value in going
through core algorithms, data structures textbooks,
for the purpose of understanding these concepts. I
personally like Algorithms by Dasgupta for that. But I
would say that would be more of a second order
strategy.
Dan Shiebler: I think that the first order strategy in terms of the
fastest way to really develop this intuition on a deep
level, is to simply be part of large software projects. For
somebody working at home, this would mean
contributing to open-source projects, ideally in a
way that you would be able to get feedback on the code
that you write through code reviews. Or, through a
community of people who are contributing to a large
project, or for somebody who's working as a data
scientist in a company, trying to get an understanding
of the kinds of systems that software engineers are
working on. And if you could even be part of one of
those projects for a little while and understand these
things from the perspective of the software engineers,
who write code that gets reviewed by multiple people.
And is part of really large, complex, multi-tenant
infrastructures. And the kinds of concerns involved
there, there's really no better way to learn these sorts
of issues than by simply working on them on a day to
day basis.
Kirill Eremenko: And if you're stuck at home, you don't have access to
something like this at work, or you're still learning and
things like that, you can just go to Github, open a
recent development in machine learning or deep
learning, whatever you're interested in and read
through how it developed. What is version one, what is
version two, what was fixed, what was changed, what
bugs came up, what bugs were removed. What
features were added, what were the user complaints
and so on. And just by doing that you can understand
better the intuition, as Dan here pointed out, the
intuition that went into all this. And the motives that
were driving these changes.
Dan Shiebler: Yeah, absolutely. As you become more comfortable,
being part of it and contributing to it yourself and feeling
the pain of these bugs, I think is a really exceptional
way to grow.
Kirill Eremenko: Mm-hmm (affirmative). Got you. Well on that note,
we're coming close to the end of this podcast, been
super exciting. How did you enjoy your second
appearance on this show so far?
Dan Shiebler: It was great. Excellent. Lots of fun.
Kirill Eremenko: Great. I loved chatting to you. Great insights. Any
parting thoughts? Any things you'd like to wish our
audience on their way to becoming machine learning
engineers and data scientists?
Dan Shiebler: I would say to really keep your mind open with respect
to learning things. That it could be very easy to fall
into the trap of only reading about the very latest,
highest scoring on benchmarks sorts of architectures
and really focusing on that. The really deep
understanding of how machine learning got to where it
is, understanding what was machine learning like in
1990? What were the people then thinking? I think
going at things from a temporal perspective is an
excellent way to develop the kinds of intuitions that
makes somebody an exceptional machine learning
engineer and machine learning researcher. I would
encourage people to really think about how to develop
that understanding as deep as possible.
Kirill Eremenko: Fantastic. Great advice. Well on that note, Dan, what
are the best ways for our listeners to get in touch with
you, or follow you, contact you? Just see how your
career develops from here.
Dan Shiebler: My LinkedIn, Dan Shiebler, works. Also my email, if
anyone has any questions for me. I'm happy to answer
it, just [email protected] or
[email protected], if it's Twitter related.
Kirill Eremenko: Mm-hmm (affirmative). Got you. Fantastic. Well, once
again, thanks so much. And you mentioned one book,
and before I let you go. I wanted to see, do you have
any other books that you can recommend that have
impacted your career personally?
Dan Shiebler: Absolutely. I have two books actually that I'll
recommend. The first one is something I read very
early on. It was probably the first actual textbook that
had anything to do with programming and it's Coding
the Matrix by Philip Klein. And it's actually a book on
linear algebra, and I'd recommend it for somebody who
is either a data analyst or a software engineer who
doesn't necessarily feel that they're super comfortable
with linear algebra. Because the ideas introduced in
this book, there's many of them that ended up being
really pivotal in my understanding of machine
learning. And I think it's just written from a great
perspective of somebody who wants to understand
how each of these different algorithms, that deal with
matrices and deal with vectors, play together in a way
that makes sense to someone who's used to
programming.
Kirill Eremenko: Got you. It's interesting, Klein, for a second I thought
it was the Klein that developed that abstract
mathematical concept. What was it called? The bottle
of Klein, or something like that, but obviously not.
Probably not, it's a more recent guy.
Dan Shiebler: It is. He is a little bit more recent. But he's also a very
abstract mathematician, who does some very
interesting abstract research on graph theory.
Kirill Eremenko: All right. And then the second book?
Dan Shiebler: The second book is something I read more recently. It's
An Introduction to Computational Learning Theory by
Michael Kearns. This is definitely a far less applied
book, and not necessarily one that I'd recommend to
someone who's looking for a book that will immediately
change their career. But it's written from the
perspective of the state of the art of machine learning
and the theory behind machine learning in 1994. And
it introduces a lot of fundamental ideas, some of which
have really gone on to take off. And some of which
were largely forgotten, but understanding things from
that perspective and in a theoretical framework that's
discussed in it, I think, has given me a lot of context
in learning new things about machine learning. And
understanding which ideas last and which ideas end
up disappearing.
Kirill Eremenko: Fantastic. Exactly what you mentioned before.
Dan Shiebler: Yeah.
Kirill Eremenko: Study the history of something. Yeah. Very cool. It's
interesting you mentioned it, because in the
FiveMinuteFriday episodes that I do in the podcast,
literally as this episode is going to go live, there's going
to be five episodes about the history of data science. It
doesn't go into the details of algorithms and things like
that, but historically, how the field of data science has
been progressing. Because I was also curious, I had
the same thought. In fact, actually, the team suggested
this. And I was like, wow, this is a really cool idea.
Knowing the history of something allows you to
understand better, what the future will be like.
Dan Shiebler: Absolutely agree.
Kirill Eremenko: Yeah at least the fundamentals, right?
Dan Shiebler: Yeah. I totally agree. That sounds great.
Kirill Eremenko: Awesome. Well, once again Dan, thanks so much for
coming on the show. Looking forward to seeing you at
DataScienceGO 2020. Can't wait for your talk, it's
going to be epic.
Dan Shiebler: Absolutely. Looking forward to it. Thank you.
Kirill Eremenko: So there you have it everybody, that was Dan Shiebler,
senior machine learning engineer at Twitter Cortex.
What was your favorite part of the discussion? For me
it was definitely the whole talk about Dan's PhD, this
whole conversation about theoretic machine learning
and algebraic topology brought back memories rushing
from my university years. So it was really good fun
listening to that, but I'm sure you had your personal
favorite of this talk. If you would like to meet Dan in
person and be part of that advanced practitioner
workshop exclusive track for advanced practitioners,
make sure to secure your seat today. Head on over to
datasciencego.com, click the option for Los Angeles, he
will be there. So we are running in two cities this year,
in Berlin and Los Angeles. You want the Los Angeles
option for 23rd, 24th, 25th October. Get your ticket
today and you'll be part of that advanced practitioner
group, you'll learn from Dan in a hands-on workshop,
personally from him. So once again, the website is
datasciencego.com.
Kirill Eremenko: And as usual, you can get all of the show notes and
materials mentioned in this episode at
superdatascience.com/345. You'll get the transcript
there plus any links, materials we mentioned,
including link or the URL to Dan's LinkedIn where you
can connect with him and follow him at any other
places on social media where you can catch up and
follow him there as well. So, that is at
superdatascience.com/345. That's also how you can
share this episode with your friends and colleagues.
Just send them the link superdatascience.com/345 so
they can get up to speed with all the amazing
topics we talked about today, including vectors,
embedding nearest neighbors, different techniques and
methodologies Dan uses in his work plus how to think
about your career and why to maybe even do a PhD in
parallel.
Kirill Eremenko: So there we go, hope you enjoyed this episode. Can't
wait to see you on the next one and until then my
friends, happy analyzing.