PyData: Past, Present, Future (PyData SV 2014 Keynote)

Post on 11-Aug-2014

DESCRIPTION

From the closing keynote: a look back at the last two years of PyData, a discussion of Python's role in the growing and changing data analytics landscape, and encouragement of ways to grow the community.

TRANSCRIPT

PyData: Past, Present, Future

Peter Wang @pwang

Continuum Analytics

PyData SV 2014

How did we get here?

“Python Data Workshop” March 3, 2012, Google HQ

“Guido, please help us convince core dev to work with us to solve the packaging problem!”

“Meh. Feel free to solve it yourselves.”

“What Packaging Problem?” “I just use….”

• pip & virtualenv
• homebrew
• rpm
• apt-get
• emerge
• tar -zxf
• double-click MSI
• configure ; make ; make install
• export PYTHONPATH=…

from python import technical_debt

This Packaging Problem

PyData: The First 2 Years

• Oct 2012: First PyData Conf, NYC
• March 2013: PyData SV (PyCon)
• July 2013: PyData Boston (Microsoft)
• Oct 2013: PyData NYC (JP Morgan)
• Feb 2014: PyData UK (Level39)
• May 2014: PyData SV (Facebook)
• July 2014: PyData Berlin (EuroPython)
• October 2014: NYC (Strata NYC)
• October 2014: NYC (YOUR COMPANY HERE)

PyData: The First 10 Years

• IPython Notebook: 2005-2011
• pandas: 2008-2009
• scikit-learn: 2007
• NumPy: 2006

PyData: The First 15 Years

• IPython Notebook: 2005-2011
• pandas: 2008-2009
• scikit-learn: 2007
• NumPy: 2006
• SciPy: 1999
• IPython: 2001
• matplotlib: 2002

http://numfocus.org/johnhunter.html

PyData: The First 20 Years

• Numarray: 2001
• Numeric: 1995
• Matrix Obj: 1994
• IPython Notebook: 2005-2011
• pandas: 2008-2009
• scikit-learn: 2007
• NumPy: 2006
• IPython: 2001
• matplotlib: 2002

Way Way Back

• python: 1989-1991
• v1.0: 1994
• “ABC, SETL… …That would appeal to UNIX/C hackers”

$ conda create -n py10 python=1.0

http://continuum.io/blog/python-1.0

It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK.

It is not meant to be a systems-programming language but is intended for teaching or prototyping.

“In June [1960] we were introduced to this tall college kid that always signed his name with lowercase letters. He was don knuth … don claimed that he could write the [Algol] compiler and a language manual all by himself during his three and a half month summer vacation.”

PyData NYC 2013 Keynote

http://tuulos.github.io/sf-python-meetup-sep-2013/#/

“One of the most exciting features in development is the Numba-based UDF compiler. Building UDFs for Impala currently requires writing C++ or Java code and registering them manually with the cluster. Writing C++/Java code is more difficult, time-consuming, and error-prone for many data analysts.”

http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/

http://grokbase.com/t/python/python-list/01az9hmtf1/python-development-practices

Glue 2.0

Python’s legacy as a powerful glue language:

• manipulate files
• call fast libraries

Next-gen Glue:

• Link data silos
• Link disjoint memory & compute
• Unify disparate runtime models
• Transcend legacy models of computers

Hard Problems in Data Science

Lots of data
Messy data
Noisy data

Lots of computers
Lots of tools
Lots of hacking

More questions
More data
More people

The Hype & The Opportunity

“Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition.

“Data Revolution” True Believer, 2014: Businesses that build data comprehension into their core will destroy their competition over the next 5-15 years.

(1993 == 2011?)

Soft Problems in Data Science

[Diagram, built up over several slides: Computers (EE); Applications (CS); Data, Insights (Math, Stats); Data Scientist]

2013 Data Science Salary Survey

http://www.oreilly.com/data/free/stratasurvey.csp

“Python is the second best language…”

...Because it blurs the lines between “user” and “maker”.

We stand on the shoulders of Users who became Makers.

Some people say: “R has a very strong user community.”

I want people to say that “Python has a strong maker community.”

Standing Tall

• Science: Standing on the shoulders of giants
• Programming: Standing on each other’s toes
• But in Python, we stand on each other’s shoulders - community that bootstraps itself

“For there is but one veritable problem - the problem of human relations…”

— Antoine de Saint-Exupéry

https://archive.org/details/Scipy2010-PeterWang-PythonEvangelism101

Participate

• Submit issues and pull requests
• Represent for the tools you love in social media conversations
• Start PyData meetups
• Come to PyData conferences and present
• Encourage diversity!

How did we get here?

• Hard Work
• By a community of people
• Who cared
• About code and people

Where do we go from here?

• More hard work
• More community
• More caring
• More code
• More people

Python is not just glue. Python and PyData are communities!
