Machine Learning for Large Scale Code Analysis
This is source{d}.
Vadim Markovtsev
Slides: vmarkovtsev.github.io/SUPCOM.pdf
Who I am
Vadim Markovtsev
● Machine Learning team leader
● Joined source{d} in mid-2016
● Worked in 5 software companies, e.g. Samsung Research, Mail.Ru
● Spoke 30+ times at IT conferences, from meetups to AAA
● Master in Applied Mathematics (Moscow University of Physics and Technology, 2012)
How we work
Who we are
● 35+ employees
● 70% are engineers
● 15% are management
● Major cultures: Hispanic, Slavic, French
Timeline
● Early 2015: appear as Tyba, seed investments
● Late 2015: pivot, become source{d} (new CTO's side project)
● Early 2016: run out of money, some employees leave
● Mid 2016: raise €5mm, salvation
● Late 2016: pivot, stop making money and dismiss 50%
● 2017: let's clone all Git repositories, parse them, and do MLonCode
● Late 2017: remote first
● 2018: assisted code review with MLonCode
● Early 2019: engineering observability… with MLonCode
● Mid 2019: first revenue and raise €Xmm
Human resources
[Figure: chart with axes "Number of skilled engineers" and "Time". Labels: "High-risk, high-profit, innovation"; "Low-risk, no extra profit, optimization"; marker "source{d} right now".]
Offices
● HQ: Madrid and San Francisco
● 60% are remote
● Flexible everything
Perks
● 23 days of vacation
● Open Source Day
● Research Day
● Conferences
● Papers
Teams
● Applications
● Solutions
● Machine Learning, applied
● Machine Learning, research
● Data Processing
● Data Retrieval
● Language Analysis
● Developer Operations
● Developer Relations
● Quality Assurance
● Product management
● Business Intelligence
● Management
Structure
Management (CEO, CTO, etc.) VP of engineering
Team Lead
Coordinator Engineer
Product Manager
Hiring
● Remote coding challenge
● Remote Machine Learning challenge*
● Personal with CTO or VP of Engineering
● Personal with CEO (sometimes)
● Design interview
● Machine Learning interview*
● Q&A interview
● Logical Thinking interview
Open discussion with veto-ing and overriding by the team lead
github.com/src-d/guide
Culture
● 4 full-time engineers left since mid-2016
● 1 joined back
  ○ He became our VP of engineering
Technologies
Programming languages
1. Go
2. Python 3
3. Scala
4. JavaScript, TypeScript
5. C/C++
6. CUDA
Our engineers play with
● Rust
● Elm
● D
● Haskell
Tools
● Git, GitHub
● Linux, macOS
● Visual Studio Code
● PyCharm, GoLand, CLion
● vim
● Ghost
● Gimp, Inkscape
● shwr.me
● Slack
● Google Docs
● Zoom, appear.in
● Octobox
If we go deeper
● Indent with spaces
● Switched to Go modules, no vendoring
● go-git
● Python scientific stack
● TensorFlow and PyTorch
● Apache Spark
● React
● Kubernetes, Docker
Clouds
● Amazon
● Azure (Microsoft)
● Google
● etc.
➖ Expensive
➖ Support sucks
➕ X × $10,000 for startups
➕ Awesome quality: reliable, performant, etc.
Google
GitHub
Computer science stuff we did use
● Graph traversal
● Connected components, shortest path, etc.
● Linear Programming: bipartite matching, network flow, and many others
● Convex optimization
● Grammar parsers
● Complexity theory
● String algorithms, e.g. LCS with suffix arrays
● Compression theory
● Disjoint sets
● Merkle tree
● Dynamic Programming
● Max-SAT
● ...
Force yourself to study the theory: this is your competitive advantage.
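To illustrate two of the listed items together, connected components can be computed with a disjoint-set (union-find) structure. This is a minimal Python sketch, not source{d} code: the `DisjointSet` class, the `connected_components` helper, and the example edges are all hypothetical names chosen for illustration.

```python
class DisjointSet:
    """Union-find with path halving and union by size (illustrative only)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Attach the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


def connected_components(edges):
    """Group the endpoints of undirected edges into connected components."""
    ds = DisjointSet()
    for a, b in edges:
        ds.union(a, b)
    groups = {}
    for item in list(ds.parent):
        groups.setdefault(ds.find(item), set()).add(item)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical example: pairs of items known to be related.
    print(connected_components([("a", "b"), ("b", "c"), ("d", "e")]))
```

Each `union` and `find` runs in near-constant amortized time, so the grouping stays cheap even with many edges.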
Machine Learning
Natural Language Processing
● Working with word distributions
● Pipelines
● word2vec; Swivel
● Transformers
Classics
● Linear Regression
● Random Tree Forest
● Production Rules
● GBDT: xgboost, catboost
● Hyperparameter optimization
Clustering
● K-Means
● t-SNE
● UMAP
● k-NN, e.g. KD Tree, VP Tree
● aNN, e.g. hnsw
● MinHash, Weighted MinHash
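As a rough illustration of the last item, plain (unweighted) MinHash estimates the Jaccard similarity of two token sets from short signatures. This is a minimal Python sketch under stated assumptions: the function names, the choice of salted BLAKE2b from the standard library as the hash family, and the toy tokens are all hypothetical and do not reflect how source{d} implemented it.

```python
import hashlib


def minhash_signature(tokens, num_hashes=128):
    """Return one minimum hash value per salted hash function.

    `tokens` must be a non-empty iterable of strings.
    """
    unique = set(tokens)
    signature = []
    for seed in range(num_hashes):
        # Each seed acts as a different hash function via the BLAKE2b salt.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for t in unique
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical identifier bags from two source files.
    file_a = ["read", "parse", "token", "emit"]
    file_b = ["read", "parse", "token", "write"]
    print(estimate_jaccard(minhash_signature(file_a), minhash_signature(file_b)))
```

More hash functions lower the variance of the estimate at the cost of longer signatures. Weighted MinHash extends the idea to sets whose elements carry weights and is not covered by this sketch.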
Deep Learning
● Char-level CNN
● Transformers, Inception
● LSTM, GRU
● Gated Graph Neural Networks
What we do
Code as Data
● "Code Lake"
● Dashboards
● Advanced Insights
Engineering Observability
Machine Learning on Source Code
● Automatic Program Repair
● Code Naturalness
Assisted Code Review
Challenges
Remote communication
Talent
Transparency
Pioneering
Academia
Sales
Scaling
Machine Learning for Large Scale Code Analysis
sourced.tech · github.com/src-d · @sourcedtech · blog.sourced.tech