
The Liip Data Science Stack: Insights from building and maintaining it

Zürich, 08.05.2018


About me - Quick facts

Dr. Thomas Ebermann

- Diploma in Computer Science at the Universities of Mannheim and Waterloo.

- PhD in Computational Social Science, predicting information flow on Twitter.

- Working for Liip as a data scientist since 2016.

- Love Ruby and Python.


LIIP PRINCIPLES

- Purpose over profits
- Trust over control
- Practice over theory
- Risk over safety
- Flexibility over strength
- Open over closed
- Compasses over maps


The Data Science Stack


History


- All GitHub stars
- All my bookmarks (mobile and Mac)
- Email / newsletters
- Internal company Slack

- Collect all the data science tools that I use on a regular basis or that have emerged on my horizon.
- Finally sort out the mess in my head.


The Stack Idea

In web development we use stacks in various areas to describe systems that build on top of each other and work well together, for example the

LAMP stack (Linux, Apache, MySQL, PHP)

Why not have a data science stack of tools that work well together?

Instead of pointing to only one tool, let's point to whole families of similar tools.


The Data Science Stack


- Data Sources: Where does the data come from?
- Data Processing: How can we clean and transform it?
- Database: How can we efficiently store/retrieve/search it?
- Analysis: How can we analyse it?
- Visualisation: How can we visualise it?
- Business Intelligence: Are there solutions that can do it all in one?


But wait, what about the Gartner reports?

- Very high level

- Only big players

- Very few open source solutions

- No small tools

- Have to sell your soul to get into these magic quadrants


2017 Version


The 2017 PDF Poster

- 250 tools in one poster
- Provides orientation like a map
- Discover your white spots on the map
- Over 30’000 visitors
- Over 4’300 downloads worldwide
- Over 300 mail sign-ups to be notified about version 2

Quite a success, but it was out of date the day we created it!


Insights


Insights: Data Sources

- Scrapers (7): Lots of tools and variety, very open-source friendly (PhantomJS + Capybara)
- Website Analytics (37): Surprisingly, there are many more tools out there than Google Analytics (Google Analytics)
- Tag Management (6): A lot of competition has emerged since Google Tag Manager (Google Tag Manager)
- Heatmaps (5): Controversial but insightful (Hotjar)
- Mobile Analytics (18): Many specialized tools (Google Analytics)
- Social Media (12): Due to exclusive contracts and harmonization/acquisitions, there are only a few big cross-platform data providers out there (Brandwatch)
- IoT (8): Plays only a marginal role for us as a data source right now (Ubidots)


Insights: Data Processing

- ETL (10): Tools for very large scale or data lakes (Talend)
- Data Cleaning (3): User-friendly tools exist that target not only the data scientist (Trifacta)
- Alerting & Logging (7): Excellent open-source, production-ready solutions are changing the way logs are consumed these days (Graylog)
- Message Queues (20): Pub/sub (Kafka); real-time processing on the fly is the new paradigm (Flink); the Apache Software Foundation is very active here


Insights: Databases

- Databases (43): There is much more than MySQL vs. NoSQL: graph databases (Neo4j), time-series databases (TimescaleDB), key-value stores (Redis), column-oriented databases (Vertica, VoltDB, Exasol)
- Search (20): Many good alternatives to Solr exist nowadays (Elastic), and SaaS is very popular (Algolia)
- Hadoop Ecosystem (13): The whole zoo of tools is maturely integrated, yet remains complex (Spark)


Insights: Analysis

- Deep Learning (21): Huge momentum; lots of different frameworks and applications are popping up (TensorFlow/Keras)
- Statistical software packages (11): The old monoliths are slowly being surpassed by open-source solutions (R, RapidMiner, Orange)
- General ML libraries (24): A myriad of choices for every programming language, yet Python subjectively remains the most active one (scikit-learn)
- Computer Vision (9): All of the big five offer SaaS solutions, but open source is strong (OpenCV)
- NLP/Speech recognition (23): Same picture here (Wit.ai)
- Assistants/Chatbots (15): A lot of promising solutions and frameworks have emerged quickly (Chatfuel)


Insights: Visualization

- General Visualisation (32): Huge number of tools; stable candidates for Python (seaborn) and R (Shiny)
- JS visualisation (28): New JS libraries are popping up every week (D3) :)
- Dashboards (17): The line between BI and dashboards is blurring; not many open-source solutions are available (Plotly)


Business Intelligence

- Business Intelligence (46): I thought I knew a couple of alternatives, but the options are vast and highly competitive. Most solutions are commercial (Tableau), but good open-source solutions are available (Kibana). Ask Gartner :)
- BI on Hadoop (5): Hard to see where the solutions begin and the architecture ends (Datameer)
- Data Science Platforms (23): The new BI. A combination of the freedom of IPython notebooks and solid infrastructure, plus automated ML (DataRobot)
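The category counts on the insight slides can be rolled up per layer to see where the landscape is most crowded. The Python sketch below simply transcribes those counts; the dictionary layout and the helper names `layer_totals` and `largest_category` are illustrative, not part of the stack website. Since a tool could in principle sit in more than one category, the sums count category entries rather than distinct tools.

```python
# Category counts as printed on the "Insights" slides, grouped by stack layer.
# Sums are category entries, not necessarily distinct tools.
CATEGORY_COUNTS = {
    "Data Sources": {
        "Scrapers": 7, "Website Analytics": 37, "Tag Management": 6,
        "Heatmaps": 5, "Mobile Analytics": 18, "Social Media": 12, "IoT": 8,
    },
    "Data Processing": {
        "ETL": 10, "Data Cleaning": 3, "Alerting & Logging": 7, "Message Queues": 20,
    },
    "Databases": {"Databases": 43, "Search": 20, "Hadoop Ecosystem": 13},
    "Analysis": {
        "Deep Learning": 21, "Statistical software packages": 11,
        "General ML libraries": 24, "Computer Vision": 9,
        "NLP/Speech recognition": 23, "Assistants/Chatbots": 15,
    },
    "Visualization": {
        "General Visualisation": 32, "JS visualisation": 28, "Dashboards": 17,
    },
    "Business Intelligence": {
        "Business Intelligence": 46, "BI on Hadoop": 5, "Data Science Platforms": 23,
    },
}


def layer_totals(counts):
    """Sum the category counts of each layer, largest layer first."""
    totals = {layer: sum(cats.values()) for layer, cats in counts.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def largest_category(counts):
    """Return (layer, category, count) for the single biggest category."""
    return max(
        ((layer, cat, n) for layer, cats in counts.items() for cat, n in cats.items()),
        key=lambda triple: triple[2],
    )


if __name__ == "__main__":
    for layer, total in layer_totals(CATEGORY_COUNTS):
        print(f"{layer:<22} {total:>4}")
    print(largest_category(CATEGORY_COUNTS))
```

Run as a script, it ranks Analysis (103) first and Data Processing (40) last, and reports Business Intelligence (46) as the largest single category.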


2018 Version


From PDF to Website


http://datasciencestack.liip.ch


Features I

- You can add tools too!
- Search


Features II

- Internal Liip technology DB integration (Zebra)

- Quarterly mailing list (keeps busy decision-makers up to date)

- JSON Export
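A sketch of how a JSON export might be consumed in Python. The slides do not describe the export's format, so the schema used here (a list of objects with `name`, `layer` and `category` keys) is an assumption, as is the helper `group_by_layer`; the sample entries reuse tools and categories named on the insight slides.

```python
import json
from collections import defaultdict

# HYPOTHETICAL export format: the slides only say a JSON export exists.
# Assumed: a list of objects with "name", "layer" and "category" keys.
SAMPLE_EXPORT = """
[
  {"name": "scikit-learn", "layer": "Analysis", "category": "General ML libraries"},
  {"name": "Keras", "layer": "Analysis", "category": "Deep Learning"},
  {"name": "Trifacta", "layer": "Data Processing", "category": "Data Cleaning"},
  {"name": "Plotly", "layer": "Visualization", "category": "Dashboards"}
]
"""


def group_by_layer(export_text):
    """Parse the (assumed) export and map each layer to a sorted list of tool names."""
    tools = json.loads(export_text)
    grouped = defaultdict(list)
    for tool in tools:
        grouped[tool["layer"]].append(tool["name"])
    return {layer: sorted(names) for layer, names in grouped.items()}


if __name__ == "__main__":
    for layer, names in group_by_layer(SAMPLE_EXPORT).items():
        print(layer, "->", ", ".join(names))
```

If the real export uses different keys or nesting, the lookups in `group_by_layer` would need to change accordingly.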


Insights 2018


Outlook


What's next?

Assessment of Tools

- Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.
- Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
- Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
- Hold: Proceed with caution.
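The four rings form an ordered scale, which suggests modelling them as an ordered enum; the ring names match those of the ThoughtWorks Technology Radar. In the Python sketch below, `Ring`, `at_least` and the tool names `tool-a` to `tool-d` are hypothetical placeholders; no actual Liip assessments are implied.

```python
from enum import IntEnum


class Ring(IntEnum):
    """Assessment rings, ordered from most to least recommended."""
    ADOPT = 1   # use when appropriate on projects
    TRIAL = 2   # try on a project that can handle the risk
    ASSESS = 3  # explore to understand the impact on your enterprise
    HOLD = 4    # proceed with caution


# Hypothetical placeholder assessments; not Liip's actual ratings.
SAMPLE = {
    "tool-a": Ring.TRIAL,
    "tool-b": Ring.ADOPT,
    "tool-c": Ring.HOLD,
    "tool-d": Ring.ASSESS,
}


def at_least(assessments, ring):
    """Tools rated `ring` or better, most recommended first (ties broken by name)."""
    chosen = sorted((r, name) for name, r in assessments.items() if r <= ring)
    return [name for _, name in chosen]


if __name__ == "__main__":
    print(at_least(SAMPLE, Ring.TRIAL))
```

`at_least(SAMPLE, Ring.TRIAL)` returns `['tool-b', 'tool-a']`. Using `IntEnum` keeps the rings comparable with `<=`, so "this ring or better" is a single comparison.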


Solid rucksack

Data Sources: Google Analytics

Processing: Trifacta

Analysis: Scikit-Learn

Visualization: Highcharts (JS), Shiny (R), seaborn (Python)

Business Intelligence: KNIME


Trendy rucksack

Sources: Chartbeat or Snowplow

Processing: Fluentd

Analysis: Keras

Visualization: Plotly

Business Intelligence: DataRobot or Dataiku
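The two rucksacks can be read as mappings from stack layer to tools and checked against the six layers of the stack. In this Python sketch the dictionaries transcribe the two slides (reading "Sources" and "Processing" as the Data Sources and Data Processing layers, and using one spelling of Visualization throughout); `missing_layers` is a hypothetical helper.

```python
# The six layers of the stack, as named on the "Data Science Stack" slide
# (spelling of "Visualization" normalized).
LAYERS = {
    "Data Sources", "Data Processing", "Database",
    "Analysis", "Visualization", "Business Intelligence",
}

# Transcription of the "Solid rucksack" slide: layer -> tools.
SOLID = {
    "Data Sources": ["Google Analytics"],
    "Data Processing": ["Trifacta"],
    "Analysis": ["scikit-learn"],
    "Visualization": ["Highcharts", "Shiny", "seaborn"],
    "Business Intelligence": ["KNIME"],
}

# Transcription of the "Trendy rucksack" slide ("or" alternatives listed together).
TRENDY = {
    "Data Sources": ["Chartbeat", "Snowplow"],
    "Data Processing": ["Fluentd"],
    "Analysis": ["Keras"],
    "Visualization": ["Plotly"],
    "Business Intelligence": ["DataRobot", "Dataiku"],
}


def missing_layers(rucksack):
    """Return the stack layers for which the rucksack names no tool."""
    covered = {layer for layer, tools in rucksack.items() if tools}
    return LAYERS - covered


if __name__ == "__main__":
    print("solid:", missing_layers(SOLID))
    print("trendy:", missing_layers(TRENDY))
```

For both rucksacks the check returns `{'Database'}`: neither slide names a tool for the Database layer.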


152 employees, 5 locations, 1 vision


St. Gallen, Zürich, Bern, Fribourg, Lausanne


Data Services @ Liip


Virtual Assistants

• Chatbots and Assistants

Data Solutions

• Recommender Systems

• Computer Vision

• Speech Recognition

• Integrated ML Models

• Whole Web-apps / apps

Data Science / Consulting

• From Data to Insights

• Data Analysis

• Network Analysis (SNA)

• Social Graph

• Time Series

• Machine Learning

Data Visualization

• Data Visualization

• Geo Visualization

• Data-Modeling

• Real-Time Dashboarding

Big Data

• Storage (Hadoop)

• And Analysis (Spark)

• Data Streams (Kafka)

Open Data

• Infrastructure (CKAN)

• Linked Data

Data-Driven User Experiences

• Data Interfaces

• Conversational Design

Mobile AI

• CoreML


Thank you! Excited to hear your questions.

Dr. Thomas Ebermann

Data Scientist

[email protected]


[Figure: Insights Data Sources, with Scrapers (7), Website Analytics (37), Social Media (12), Tag Management (6), Mobile Analytics (18), Heatmaps (5), IoT (8)]