pydata texas 2015 keynote

Post on 15-Jul-2015

State of the Py

Peter Wang Continuum Analytics @pwang

PyData Texas 2015

Looking Back

PyData Workshop, March 2012

PyData: The First 3 Years

• Oct 2012: First PyData Conf, NYC
• March 2013: Silicon Valley (PyCon)
• July 2013: Boston (Microsoft)
• Oct 2013: NYC (JP Morgan)
• Feb 2014: London (Level39)
• May 2014: Silicon Valley (Facebook)
• July 2014: Berlin (EuroPython)
• October 2014: NYC (Strata NYC)
• Feb 2015: San Jose (Strata)
• April 2015: Paris
• April 2015: Dallas

Coming up!

• May 2015: Berlin
• June 2015: London
• July 2015: Seattle

Data Science Challenges

• Data volume is growing exponentially within companies. Most don't know how to harvest its value or how to even compute on it.

• Growing mess of tools, databases, and products. New products increase integration headaches, instead of simplifying.

• New hardware & architectures are tempting, but are avoided or sit idle because of software challenges.

• Promises of the "Big Red Solve" button continue to disappoint. (If someone can give you this button, they are your competitor.)

Bush Pilots

Why Python? Why now?

PyData: The First 20 Years

• IPython Notebook: 2005-2011
• pandas: 2008-2009
• scikit-learn: 2007
• NumPy: 2006
• IPython: 2001
• matplotlib: 2002
• Numarray: 2001
• Numeric: 1995
• Matrix Obj: 1994

Broken PCs

Brief History of Computing

1946 - ENIAC (electronic, digital, general purpose)
1947 - Keyboard
1948 - Transistor
1954 - FORTRAN
1958 - Semiconductor IC
1958 - SAGE
1962 - Spacewar
1964 - Computer mouse
1969 - ARPAnet

1971 - Intel 4004
1972 - C programming language
1973 - Ethernet, UNIX
1974 - CP/M, which was the inspiration for MS-DOS
1976, 1977 - Apple I, ][, TRS-80
1978 - VisiCalc

1970s: Dawn of Modern Computing

Long Shadow of the 1970s

1978 - Intel 8088, 8086

1981 - IBM PC, MS-DOS

1983 - C++

1985 - Intel 80386, Windows

1991 - Linux, WWW, Python

1993 - Intel Pentium

1995 - Java, Ruby

Innovation or Churn?

• Cloud

• Mobile Web

• Big Data

• Machine Learning

• JVM language renaissance

• Javascript-the-language

• Javascript-the-compilation-target

• Browser as VM

• Browser as OS

• OS as VM

• VM as Zone (Joyent)

• VM as process (ZeroVM)

• VM as dev sandbox (Docker, ...)

• Datacenter as OS (AWS, Azure, OpenStack, ...)

• Datacenter as runtime (Salt, Ansible, ...)

• Datacenter as calculator (Hadoop, Spark, Disco)

• Database as a service (Dynamo, Firebase, ...)

• Message queues as a service

Blurred Lines

• Compile time, run time, JIT, asm.js

• Imperative code vs. configuration

• App, OS, lightweight virtualization, hardware, virtual hardware

• Dev, dev ops, ops

• Clouds: IaaS, PaaS, SaaS, DBaaS, AaaS...

Microcosm

• The schisms in Python land reflect the evolution of the technology space: Hardware -> Software -> Services

• Docker, pip, and "devops" tooling mostly exist to support folks who are not building software, but deploying services.

• The plight of software in recent times is due to changes in the underlying bedrock.

• "How we think about concurrency is slave to abstractions from the 1970s."

Back to the Future!

"Can any language top a 1950s behemoth?"

http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/

Let's move forward to the 1960s!

PyData NYC 2013

Glue 2.0

Python’s legacy as a powerful glue language

• manipulate files
• call fast libraries

Next-gen Glue:

• Link data silos
• Link disjoint memory & compute
• Unify disparate runtime models
• Transcend legacy models of computers

Blaze

Tasty Py

"The only problem with Microsoft is that they just have no taste."

Python's Spectrum of Users

Analyst

• Uses graphical tools
• Can call functions, cut & paste code
• Can change some variables

Gets paid for: Insight

Excel, VB, Tableau, Python

Analyst / Data Developer

• Builds simple apps & workflows
• Used to be "just an analyst"
• Likes coding to solve problems
• Doesn't want to be a "full-time programmer"

Gets paid (like a rock star) for: Code that produces insight

SAS, R, Matlab, Python

Programmer

• Creates frameworks & compilers
• Uses IDEs
• Degree in CompSci
• Knows multiple languages

Gets paid for: Code

C, C++, Java, JS, Python

Data Literacy

Just as typing and basic computer skills are now a necessity, we believe data exploration and analysis are going to be a new kind of literacy that will be required to do great work in any field.

Language is a human instinct and is a natural path to insight. We see this in our interaction with Python users, whose passion chiefly stems from this expressiveness and agility.

An analytical language is “thoughtware”, not “software”.

Continuum's Mission

We're on the cusp of a new era of ubiquitous data.

Traditional analytics software ecosystems are being disrupted, and much of it will be commoditized. New business models will emerge in this data-rich environment, requiring different assemblies of software+hardware+people.

In the chaos and churn, people will gravitate to a stable, trusted platform or brand that provides agility and compatibility, without lock-in.

We are building this open foundation. We're starting with Python.

empower scientists to explore their data, analyze their problems, share their results

Two Sides of Open Source

• Geek:

• Thinks it's about licenses

• Really means the community, ethos, culture

• Suit:

• Thinks it's about cost, value, ROI

• Really should be thinking about innovation

Peter: [email protected]

Twitter: http://twitter.com/pwang @pwang

LinkedIn: http://www.linkedin.com/in/pzwang/

Continuum is Hiring! http://continuum.io/jobs

Continuum is Selling! http://continuum.io

END

PyCon Takeaways

• As much love & enthusiasm as ever

• Many subcultures

• Web dev is newest, most visible, most Klout, most RESTful yet most restless about keeping up with Go, Node, etc.

• Sysadmin & ops

• Education

• Data & science & analysts

• Maker / Hacker / Raspberry Pi

Common Concerns & Interests

• Python 3

• Not compelling enough?
• Even a Matrix Multiply operator wasn't enough!
• Unicode is still sometimes broken?

• Docker
• Editing /etc is hard
• No really, editing *someone else's* /etc is hard

• Keeping processes from stomping all over each other
• That's what UNIX is for

• Filesystems are b0rken
• Since sockets are files, networking is also broken
• Since HTTP violates end-to-end principle, life is hell
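
For readers who have not seen it: the Matrix Multiply operator mentioned on this slide is presumably the `@` operator that PEP 465 added in Python 3.5. The sketch below is not from the talk. It uses a hypothetical, deliberately tiny `Matrix` class to show how a type opts in to `@` through `__matmul__`; array libraries such as NumPy provide the same hook for their own types.

```python
class Matrix:
    """Hypothetical minimal matrix type, used only to illustrate the @ operator."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Python calls this method to evaluate `self @ other` (Python 3.5+).
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __repr__(self):
        return f"Matrix({self.rows})"


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])

# Reads like the math, instead of a nested function call such as dot(a, b).
product = a @ b
print(product)
```

On Python 2 the `a @ b` line is a syntax error, which is why the operator counted as a Python 3 selling point.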

Big Data: The Fundamental Physics

Moving/copying data (and managing copies) is more expensive than computation.

True for various definitions of “expense”:

• Raw electrical & cooling power
• Time
• Human factors
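
The same trade-off can be seen at a much smaller scale inside a single Python process. The sketch below is an illustration only, not something shown in the talk; the buffer contents and variable names are made up. Slicing a `bytearray` copies the bytes, whereas slicing a `memoryview` of it refers to the original buffer, so no copy has to be made or kept in sync.

```python
# A made-up 10,000-byte buffer standing in for "the data".
data = bytearray(b"0123456789" * 1000)

# Slicing a bytearray allocates a new object and copies the 100 bytes.
copied = data[100:200]

# Slicing a memoryview creates a window onto the same underlying buffer.
view = memoryview(data)[100:200]

# Change the original data after both slices were taken.
data[100] = ord("X")

# The copy is now stale; the view sees the update.
print(copied[0] == ord("X"))
print(view[0] == ord("X"))
```

Keeping a copy means paying for the copy itself and then for managing its staleness; keeping a view means the computation comes to the data.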

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/