Three Perspectives on the Growth of Data Science
Todd
Thinking a bit lately about how the data science community has grown in the last four years…
As a Hiring Manager
As a Community Organizer
As a Teacher
As a Hiring Manager…
Started Trulia’s team four years ago
Colleagues asked: What’s a data scientist? And why would we need a team of them?
Trulia
Earthquake Threat
Crime Threat
Married People
Team Goals
1. Transform data into new content
2. Improve content relevance
3. Improve monetization
We received around 500 applicants for each data scientist opening
• Many computer scientists light on stats and machine learning
• Many physicists, mathematicians, biologists, and social scientists light on coding skills
Applicants
Who would you hire?
Hiring: Skills
Machine Learning • Hacking • Text Mining • Image Mining • Network Science • Data Visualization
Desired Team Skills
‘Data Scientist’ Skills on LinkedIn
Hiring: Academic Communities
AAAI
Hiring: Why so many applicants?
• Offered interesting data and data science problems
• Good brand and successful company
• Presence in the data science community
• Solid team
Team: How We Work
understand problem -> process data -> experiment -> productionize -> integrate
In line with an applied research lab
0-6 months
Team: How We Work
Everyone on the team should be able to code.
• Having the same individuals create and engineer a predictive model is a proven strategy
• At a minimum, the ability to code means familiarity with computational complexity, data structures, and programming language concepts
• We want to be able to use statistics-oriented languages such as R and Stata, but not be limited to them
Example Project: Deep Learning for Image Recognition
https://wiki.qut.edu.au/display/cyphy/Student+project+topics+proposed+by+Frederic+Maire
Example Project: Real Estate Knowledge Base
We compared the language used on Trulia to the language of Wikipedia, to build a model of the language of real estate.
vs.
Example Project: Real Estate Knowledge Base
In San Francisco, the language model shows that agents most commonly describe homes in terms of their proximity to a Whole Foods.
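The talk does not say how the Trulia-vs-Wikipedia comparison was computed. One common way to contrast a domain corpus with a general background corpus is to score each word by the smoothed log-ratio of its relative frequency in the two corpora, so words that are unusually frequent in real estate text rise to the top. The Python sketch below assumes that approach; the function names, the add-alpha smoothing, the crude tokenizer, and the toy sentences are all hypothetical illustrations, not Trulia's actual method or data.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: a frequency log-ratio contrast, not Trulia's actual method.

def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def domain_terms(domain_docs, background_docs, alpha=1.0, top_k=5):
    """Rank terms by the add-alpha smoothed log-ratio of their relative
    frequency in the domain corpus versus the background corpus."""
    dom = Counter(t for d in domain_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    vocab = set(dom) | set(bg)
    # Smoothed denominators so every vocabulary word gets nonzero probability.
    dom_total = sum(dom.values()) + alpha * len(vocab)
    bg_total = sum(bg.values()) + alpha * len(vocab)
    scores = {
        term: math.log((dom[term] + alpha) / dom_total)
        - math.log((bg[term] + alpha) / bg_total)
        for term in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Made-up toy corpora standing in for listing text and encyclopedia text.
listings = [
    "Charming condo steps from Whole Foods",
    "Sunny home near Whole Foods and transit",
    "Updated kitchen, walk to Whole Foods",
]
encyclopedia = [
    "San Francisco is a city in California",
    "The city has a foggy climate and many hills",
    "Transit in the city includes cable cars",
]

for term, score in domain_terms(listings, encyclopedia):
    print(f"{term}: {score:.2f}")
```

On the toy data, "whole" and "foods" score highest because they appear repeatedly in the listings and never in the background text; with real corpora, a minimum-count threshold would usually be added so that rare words do not dominate.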
Example Project: Real Estate Knowledge Base
Example Project: Location Graph
67 Neighborhoods in San Francisco
Example Project: Location Graph
Wealthy • Low Crime • Clean
Example Project: Location Graph
Near Work • Relatively Nice
Example Project: Location Graph
Edgy • Young • Cheap
Example Project: Location Graph
Fog • Cheapest
Example Project: Location Graph
Most universally appealing
Example Project: Voices Relevance
Two Topics in Trulia Voices (100+ topics total)
Example Project: Recommendations
Example Project: Search Ranking
We wanted to rank properties by the likelihood that a user would click on one and then send a lead for it.
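Ranking by the likelihood of a click followed by a lead can be read as ranking by the joint probability P(click) × P(lead | click). The sketch below assumes two separately estimated probabilities per listing; the `Listing` fields, the `lead_score` and `rank_listings` helpers, and the numbers are hypothetical, and the talk does not describe how Trulia's models produced such estimates.

```python
from dataclasses import dataclass

# Hypothetical sketch: assumes per-listing probabilities from some upstream models.

@dataclass
class Listing:
    listing_id: str
    p_click: float             # hypothetical estimate of P(click | listing shown)
    p_lead_given_click: float  # hypothetical estimate of P(lead | click)

def lead_score(listing: Listing) -> float:
    """Joint probability of a click followed by a lead: P(click) * P(lead | click)."""
    return listing.p_click * listing.p_lead_given_click

def rank_listings(listings: list[Listing]) -> list[Listing]:
    """Order listings from most to least likely to produce a click and then a lead."""
    return sorted(listings, key=lead_score, reverse=True)

# Made-up estimates for three search results.
results = [
    Listing("a", p_click=0.30, p_lead_given_click=0.05),
    Listing("b", p_click=0.10, p_lead_given_click=0.40),
    Listing("c", p_click=0.20, p_lead_given_click=0.10),
]
ranked = rank_listings(results)
print([listing.listing_id for listing in ranked])
```

Ranking by click probability alone would put "a" first; weighting by the chance that a click turns into a lead moves "b" to the top, which is the point of optimizing for the combined outcome rather than clicks.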
Constant
• Only one offer ever rejected, and only one person ever left the team (until me)
• Remained a centralized team situated in engineering and serving the entire org
• Hired only computer scientists turned data scientists
Change
• At some point, it became: “Of course we have a data science team!”
• Engineers throughout the company started dabbling in machine learning
Over four years…
As a Teacher…
Student Interests
Interest in studying data science, and machine learning in particular, is high…
Over four years…
Many new options for learning data science
Data Science Education: Local Organizations
Weeks to 1-2 Years
Institute for Computational & Mathematical Engineering
Data Science Education: Others
Hiring Manager Perspective
• Biased towards full-time graduate programs
• Cynical about bootcamps
• Skeptical of Coursera as evidence of skills
What would be your perspective?
Who would you hire at a tech company—a candidate with a 2 year MS from the Carnegie Mellon Machine Learning Department, or a 12-week data science fellow who was a biology PhD working as a lab assistant before that?
Teacher and Mentor Perspective
Now I appreciate the options for different types of students and what the diverse backgrounds bring to the table.
Berkeley MS Data Science
Masters in Data Science
Taught Online
Applied Machine Learning
Teaching in an Online Classroom
Biggest surprise: not much different online (other than a student occasionally attending in bed)
Classroom vs Self-Guided
No matter how old, students are students
• Wait until the last minute to do assignments and attend office hours
• Pay closer attention to materials tied to grades
Constant
• Benefits to external guidance and evaluation
Change
• Many more resources for learning data science
• Many transitional programs (PhD Biologist -> Data Scientist)
• Many more, and more diverse, people involved in data science
Over four years…
As a Community Organizer…
The Meetups
SF Data Mining
• 8100+ members
• 73 events
• Held two job fairs
Bay Area Data Visualization
• 5600+ members
• 48 events
Who shows up?
Wide assortment of attendees
• Newbies
• Senior scientists
• Recruiters
• Founders
• Designers
• Journalists
• …
Who shows up?
Constant
• Basic formats remain: speakers, workshops, hackathons
• Little collaboration across meetups
Change
• Many data science groups: SF Data Mining, SF Data Science, Women in Data Science SF, Analytics Club SF, SF Big Analytics, Kaggle SF, SF Bay Area Machine Learning, USF Series in Analytics, Data Science for Sustainability SF, …
• Embraced by tech companies and educational groups: we started out buying beer and begging for venues; now almost every tech company with space hosts meetups.
• Interest even from the White House
Over four years…
White House
An Evolved Perspective
Old: Skill-Based Definition of Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Old: The Myth of Membership in the Data Science Community
Many conversations about who belongs and how to ‘get in’…
New: The Resonance of ‘Data Science’
The fact is that ‘data science’—the name and the idea—have brought together interesting folks with diverse skills but similar fascinations. And that’s beautiful.
A student asked in the last class…
Is the demand for data scientists just hype?
I don’t believe so. But it’s not that there are empty factories waiting to be filled up with data scientists. Almost any business can benefit from data science, but many don’t know it yet. The jobs need to be made, and made by the diverse community excited about data science.