data scientist - meetupfiles.meetup.com/11097942/so you want to be a data...some data is big – but...
TRANSCRIPT
Presented by:
DATA
Big Data & Predictive Analytics
SO YOU WANT TO BE A DATA SCIENTIST
© Data-Magnum 2016
Four Perspectives
• Data
• Tools
• Data Science Skills
• Business / Employer
Why Start with Data?
Data work is about 80% of a CRISP-DM project
All the Hoopla over Hadoop
A Little History (timeline axis: 2002, 2004, 2006, 2008, 2009)
• Google develops a proprietary search indexing tool based on Big Table and MapReduce.
• Doug Cutting is working on an open source version of the same, “Nutch”.
• Google releases research papers 10/03 and 12/04, read by Cutting and others.
• Cutting at Yahoo. Renamed Hadoop. First prototype launched 2006.
• Yahoo is the first commercial implementation, 2008.
• Hadoop becomes open source at the Apache Software Foundation.
• First Hadoop Developers Conference.
• Facebook, Twitter, eBay adopt.
• Multiple startups spin off to commercialize, incl. Hortonworks, Cloudera, MapR.
Some Data is Big – But Not Very Often
1,220 Respondents 72 countries
Rexer Analytics
Respondents reported that their ‘typical’ data set size was:
• 90% typically < 1 to 100 Million records
• 60% typically < 100,000 to 1 Million records
How NoSQL Changed Data Science
• Structured: RDBMS
• Semi-Structured: Key Value, Document, Column, Graph
• Unstructured: Key Value
• Insights are: Specific, Directional
• Also on the diagram: Predictive Modeling, Natural Language Processing, Recommenders, IOT, Deep Learning, Data Lakes, Reinforcement Learning
The Tools Perspective
All Those Algorithms Answer Only 5 Questions
1. Is this A or B?
2. Is this weird?
3. How much – or – How many?
4. How is this organized?
5. What should I do next?
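One common way to read the five questions is as shorthand for five families of algorithms. The slide does not spell out that mapping, so the lookup below is an assumption added for illustration; the names `QUESTION_TO_FAMILY` and `family_for` are hypothetical.

```python
# Hypothetical lookup: question -> algorithm family.
# The pairing is a common interpretation, not something stated on the slide.
QUESTION_TO_FAMILY = {
    "Is this A or B?": "classification",
    "Is this weird?": "anomaly detection",
    "How much or how many?": "regression",
    "How is this organized?": "clustering",
    "What should I do next?": "reinforcement learning",
}


def family_for(question: str) -> str:
    """Return the algorithm family for a question, or 'unknown' if it is not listed."""
    return QUESTION_TO_FAMILY.get(question, "unknown")


if __name__ == "__main__":
    for question, family in QUESTION_TO_FAMILY.items():
        print(f"{question:<26} -> {family}")
```

Framing a business problem as one of these questions first tends to narrow the choice of algorithm considerably.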
Three Types of Machine Learning
Supervised
• Have Data
• Data Has Labels
• Learn by Example
Reinforcement
• No Data
• Learn by Trial and Error
Unsupervised
• Have Data
• No Labels
• Learn by Example
• See If There’s a Pattern in There
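The difference between having labels and not having them can be sketched in a few lines. The example below is only an illustration on made-up one-dimensional numbers: a 1-nearest-neighbor lookup stands in for learning from labeled examples, and a two-cluster k-means stands in for looking for a pattern without labels. The function names and the data are hypothetical.

```python
# Made-up 1-D data: three small values and three large ones.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the labeled case


def nearest_neighbor_predict(train_x, train_y, x):
    """Have data, data has labels: copy the label of the closest known example."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def two_means(xs, iterations=10):
    """Have data, no labels: split into two groups with k-means (k=2).

    Assumes xs holds at least two distinct values, so neither group is empty.
    """
    c0, c1 = min(xs), max(xs)  # start the two centroids at the extremes
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)


if __name__ == "__main__":
    print(nearest_neighbor_predict(points, labels, 4.5))
    print(two_means(points))
```

The labeled case returns a name for a new point; the unlabeled case can only report that two groups exist and leaves naming them to the analyst.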
Three Types of Machine Learning
Supervised
• Decision trees / Random Forest
• Naïve Bayes classification
• Least squares regression
• Logistic regression
• Support vector machines
• Ensemble methods – Bagging, Boosting, Super Learners
• Neural Networks
• Linear Genetic Programs
Reinforcement
• Q-Learning
• PyBrain
• Mostly Custom Agents
Unsupervised
• Clustering
– Centroid-based algorithms
– Connectivity-based algorithms
– Density-based algorithms
– Probabilistic
• Dimensionality Reduction
• Neural networks / Deep Learning
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis
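Of the three groups, reinforcement learning is probably the least familiar, so here is a minimal sketch of tabular Q-learning. It is a toy under stated assumptions, not the deck's method: the environment (a five-cell corridor with a reward at the right end), the constants, and all names are hypothetical, and it uses only the standard library rather than PyBrain.

```python
import random

N_STATES = 5           # cells 0..4 in a row; reaching cell 4 ends the episode
ACTIONS = (-1, +1)     # action index 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (arbitrary choices)


def train(episodes=500, seed=0):
    """Learn action values by trial and error; no data set is given up front."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)  # explore at random (Q-learning is off-policy)
            nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else 0.0
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy policy for the non-terminal cells: index of the best action in each.
policy = [row.index(max(row)) for row in q_table[:-1]]

if __name__ == "__main__":
    print(policy)
```

After training, the greedy policy steps right in every non-terminal cell, which is the shortest route to the reward.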
2015 Algorithm Usage
1,220 Respondents 72 countries
Rexer Analytics
R versus Python versus SAS
Which do you prefer to use? Most data scientists use multiple languages, but everyone has a favorite.
Burtch Works www.burtchworks.com/2015/05/21/2015-sas-vs-r-survey-results
The Data Scientist’s Perspective
Data Wrangler
Model Jockey
Data Scientist
What We Do
250 respondents internationally “Analyzing the Analyzers” by Harris, Murphy, and Vaisman, 2013. http://www.oreilly.com/data/free/analyzing-the-analyzers.csp.
Types of Data Scientists – Self Described
“Analyzing the Analyzers” by Harris, Murphy, and Vaisman, 2013. http://www.oreilly.com/data/free/analyzing-the-analyzers.csp.
• Leader, Businessperson, Entrepreneur
• Jack of all trades, Artist, Hacker
• Developer, Engineer
• Researcher, Scientist, Statistician
What You Need to Know
• Foundational Statistical Theory
– Probability, statistical analysis, sampling theory, hypothesis testing, statistical distributions, correlation, standard deviation, basic regression
• Foundational Programming Skills
– R, SAS, Python, SQL
• Machine Learning
– Supervised and Unsupervised (leave Reinforcement Learning for later)
• Big Data Toolbox
– Hadoop, Spark, how to operationalize predictive models to create business value
Amy Gershkoff, Chief Data Officer, Zynga
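As a taste of the foundational statistics on this list, the sketch below computes a standard deviation, a correlation, and a basic least squares regression by hand. The numbers are made up for illustration, and the helper names `pearson_r` and `least_squares` are hypothetical; only `mean` and `stdev` come from Python's standard `statistics` module.

```python
from statistics import mean, stdev

# Made-up numbers for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 61.0, 63.0, 70.0]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation scaled by each variable's spread."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    slope, intercept = least_squares(hours, scores)
    print(f"sample std dev of hours: {stdev(hours):.3f}")
    print(f"correlation: {pearson_r(hours, scores):.3f}")
    print(f"fit: score = {slope:.2f} * hours + {intercept:.2f}")
```

Working through these by hand once makes it easier to sanity-check what R, SAS, or a Python library reports later.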
The Business or Employer’s Perspective
Two Markets
The Big Web Developers Market
The Core Data Science Market
• Banking, Insurance, Mortgage Lending, Brokerage, Telecom
• Healthcare, e-commerce, B&M Retail, Utilities, Manufacturing
• Transportation, Education, Government Services
Salary Increases as Experience & Responsibility Increase
Median Base $112,000
The Opportunity – Good News / Bad News
2nd Best Work/Life Balance and Plenty of Openings Going Unfilled
Market Penetration – 12% in 2012 (Gartner) – Guesstimating Maybe 20% to 25% Today.
Citizen Data Scientists and Fully Automated DS
Summing It Up
• Should you specialize?
• Build 3 competencies (Your Focus)
– Industry
– Business Process (e.g. customer acquisition, fraud detection)
– Tool Sets (languages, analytic platforms, data platforms)
• Have a life. Join a team. Decide where you want to live.
Some additional references
How to Become a Data Scientist http://www.datasciencecentral.com/profiles/blogs/how-to-become-a-data-scientist
So You Want to be a Data Scientist http://www.datasciencecentral.com/profiles/blogs/so-you-want-to-be-a-data-scientist
The New Rules for Becoming a Data Scientist http://www.datasciencecentral.com/profiles/blogs/the-new-rules-for-becoming-a-data-scientist
Become a member (for free) of DataScienceCentral.com. Use the search feature and search for “how to become a data scientist” http://www.datasciencecentral.com/page/search
Join some Meet Ups – Westlake Village Data Science Meet Up 2nd Tuesday of each month at 5:30
Practice on some Kaggle competitions https://www.kaggle.com/
Other Blogs by Bill Vorhies http://www.datasciencecentral.com/profiles/blog/list?user=0h5qapp2gbuf8
Contact Information
Bill Vorhies
President & Chief Data Scientist
Data-Magnum
www.Data-Magnum.com
818.257.2035
“I shall find a way or make one.” Admiral Robert Peary