clare corthell: learning data science online

THE OPEN SOURCE DATA SCIENCE MASTERS (THE DIY DATA SCIENTIST) Clare Corthell Data Scientist at Mattermark @clarecorthell www.datasciencemasters.org

Upload: sfdatascience

Post on 08-Jul-2015


DESCRIPTION

Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources. http://datasciencemasters.org/

TRANSCRIPT

Page 1: Clare Corthell: Learning Data Science Online

THE OPEN SOURCE DATA SCIENCE MASTERS (THE DIY DATA SCIENTIST)

Clare Corthell
Data Scientist at Mattermark

@clarecorthell
www.datasciencemasters.org

Page 2: Clare Corthell: Learning Data Science Online

Deal Intelligence Platform
interface to live data about private companies

Page 3: Clare Corthell: Learning Data Science Online

TODAY

• What a Data Scientist does

• Paths to becoming a Data Scientist

• Where to start

• Navigating a path

• Why you should run toward hard things

Page 4: Clare Corthell: Learning Data Science Online

WHAT DOES A DATA SCIENTIST DO?

Data Scientists turn data into knowledge by answering the right questions

Which is also predicated on asking the right questions

Page 5: Clare Corthell: Learning Data Science Online

HOW DO I BECOME A DATA SCIENTIST?
the answer you don’t want…

There’s no paved road, no one way

Page 6: Clare Corthell: Learning Data Science Online

PATHS

1. Get a Classic Masters from an accredited University
<Warning> I have yet to see one that’s better than the OSDSM

2. Attend a Bootcamp or Academy
• Zipfian Academy (SF)
• Insight Data Science Fellows (Palo Alto, NYC)
• Data Science Retreat (Berlin)

3. Self-Taught
• The Open Source Data Science Masters

Page 7: Clare Corthell: Learning Data Science Online

THEORY & APPLICATION
or, why universities haven’t figured this out yet

Universities don’t focus on “Data Science” because it’s tightly bound to application.

Universities develop theory. Businesses develop applications.

The two exist symbiotically - they do need each other.

The goals are simply very different.

Page 8: Clare Corthell: Learning Data Science Online

• Math
• Computing

• Algorithms
• Distributed Computing
• Databases
• Data Mining
• Machine Learning
• Graph Theory
• Natural Language Processing

• Analysis
• Visualization

• Python (language & libraries)

The Open Source Data Science Masters
bit.ly/dsmasters

The internet helps me curate - hence Open Source

Page 9: Clare Corthell: Learning Data Science Online

(that’s a lot)

Page 10: Clare Corthell: Learning Data Science Online

CLARE’S PATH
Previously Product Designer, front end dev

Transcript bit.ly/corthelldata

6 months of study

Data Scientist & Machine Learning Developer at Mattermark

My team builds domain-specific systems for classification, recommendation, prediction, crawling, fact extraction, and more

languages: Python, SQL

machine learning: Scikit Learn

data manipulation: Pandas, Numpy

matplotlib, NLTK

design: html/css/js

Page 11: Clare Corthell: Learning Data Science Online

1. Get a goal
2. Get a plan
3. Get mentorship
4. Get a project

Page 12: Clare Corthell: Learning Data Science Online

1. Get a goal

What kind of “Data Scientist” do you want to be?

Explore the different roles

Pick something that sparks your interest

Find out what those people do on a daily basis

Page 13: Clare Corthell: Learning Data Science Online

Rachel Schutt, Doing Data Science

Page 14: Clare Corthell: Learning Data Science Online

Analyzing the Analyzers, O’Reilly

Page 15: Clare Corthell: Learning Data Science Online

2. Get a plan

Figure out what skills you need to be minimally effective

Design a Curriculum (fork the OSDSM!)

Plan a schedule of study

Page 16: Clare Corthell: Learning Data Science Online

Dave Holtz, Airbnb

Page 17: Clare Corthell: Learning Data Science Online

3. Get mentorship

Talk to people on twitter

Ask to buy them coffee (with a specific need or question in hand)

Get informational interviews (a lost art; they can turn into real interviews, but are low-pressure)

Page 18: Clare Corthell: Learning Data Science Online

4. Get a question

Project

Use real-world data to answer a question: Who do iguana owners connect to on twitter?

Work on a real business problem: What channels of marketing are working for us?

Help a non-profit* with data they don’t understand

*Orgs that coordinate working with NGOs: Bayes Impact, DataKind

(make it a small question - don’t set yourself up for failure)

Page 19: Clare Corthell: Learning Data Science Online

Let’s talk about where this perfect plan gets really incredibly difficult

(Let’s start with a tautology)

Page 20: Clare Corthell: Learning Data Science Online

HARD THINGS ARE HARD

Hard things are hard because there are no easy answers or recipes.

They are hard because your emotions are at odds with your logic.

They are hard because you don’t know the answer and you cannot ask for help without showing weakness.

Ben Horowitz, The Hard Thing About Hard Things

Page 21: Clare Corthell: Learning Data Science Online

When something scares you, run like hell right into it.

The hardest things are the things people avoid the most. That’s your marginal advantage.

Maybe that’s why there aren’t enough Data Scientists.

You will figure it out. It’s about ego management and problem solving.

Page 22: Clare Corthell: Learning Data Science Online

RUN TOWARD HARD THINGS

Choosing what you want to do and what to work on

Not knowing everything

Being overwhelmed

Time Management

Math

Coding

Page 23: Clare Corthell: Learning Data Science Online

Not knowing everything Being overwhelmed

There are a million things you could learn and work on. That’s overwhelming. But you can’t afford to get overwhelmed.

You won’t know everything. It’s impractical and impossible to know everything.

Learn to say “I don’t know.”

FYI Programmers don’t read books. They reference them as needed.

Page 24: Clare Corthell: Learning Data Science Online

Time Management

How do I do all of this in a reasonable amount of time?
- You don’t.
- Be rigorous.

Ask yourself: Will this directly help me achieve my goal?

Refine your goals, focus your work.

Don’t switch tasks. Focus on one thing at a time.

Page 25: Clare Corthell: Learning Data Science Online

Why is time management so hard?

We’re used to other people telling us what to do:

• Teachers
• Managers
• Parents

Page 26: Clare Corthell: Learning Data Science Online

CODING IS HARD.

Page 27: Clare Corthell: Learning Data Science Online

a hint for those new to programming

google

stackoverflow + problem

Page 28: Clare Corthell: Learning Data Science Online

why code?

Page 29: Clare Corthell: Learning Data Science Online

HUMANS SHOULD BE HUMANS AND COMPUTERS SHOULD BE COMPUTERS.

You must code.

Because automation.
And no, there is no shortcut.

Page 30: Clare Corthell: Learning Data Science Online

YOUR ADVANTAGE

Self-study in Data Science is hard.

But what you spend in energy and commitment to self-teaching is returned to you in:

• Choice of professional focus
• Respect from potential employers for managing yourself. You want to work with people who will respect and recognize that.
• Skills that are tough to get from a university or employer
• A path with no gatekeepers - no one will stop you.

Page 31: Clare Corthell: Learning Data Science Online

Take the first step.

Page 32: Clare Corthell: Learning Data Science Online

1. Learn to code in Python.
2. Take Intro to Data Science (UW)
3. Go get a coffee
4. Ask one question

Page 33: Clare Corthell: Learning Data Science Online

i ♥ questions
datasciencemasters.org

[email protected]

@clarecorthell