building data apps with python

Building Data Products with Python

District Data Labs

Links to various resources

Introduction to Python: http://bit.ly/1gJ73Tt

GitHub Repository: http://bit.ly/1eLBzki

About the Instructor

Benjamin Bengfort

Data Science:

● MS Computer Science from North Dakota State

● PhD Candidate in CS at the University of Maryland

● Data Scientist at Cobrain Company in Bethesda, MD

● Board member of Data Community DC

● Lecturer at Georgetown University

Python Programmer:

● Python developer for 7 years

● Open source contributor

● My work on GitHub: https://github.com/bbengfort

About the Instructor

Benjamin Bengfort

I am available to collaborate and answer questions for all of my students.

Twitter: twitter.com/bbengfort
LinkedIn: linkedin.com/in/bbengfort
GitHub: github.com/bbengfort
Email: benjamin@bengfort.com

About the Teaching Assistant

Keshav Magge

● MS Computer Science from the University of Houston

● Lead Data/Software Engineer at Cobrain Company in Bethesda, MD

Python Programmer:

● Python developer for 7 years

● Plone/Zope for 2 years, Django for 5 years

● My work on GitHub: https://github.com/keshavmagge

About the Teaching Assistant

Keshav Magge

Reach out to me to talk about all things Python/data, or just about life.

Twitter: twitter.com/keshavmagge
LinkedIn: linkedin.com/pub/keshav-magge/12/a2a/324/
GitHub: github.com/keshavmagge
Email: keshav@keshavmagge.com

Building Data Products

Hilary Mason

“A data product is a product that is based on the combination of data and algorithms.”

Mike Loukides

“A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.”

The Data Science Pipeline

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Data Ingestion

● There is a world of data out there. How do we get it? Web crawlers, APIs, sensors? Python and other web scripting languages are well suited to this task.

● The real question is how can we deal with such a giant volume and velocity of data?

● Big Data and Data Science often require ingestion specialists!
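As a minimal sketch of the ingestion step, the snippet below downloads a page with the standard library's `urllib.request` and stores the bytes exactly as received. The function names `fetch_url` and `ingest`, the `.raw` file extension, and the example URL are assumptions made for illustration, not part of the course code. The fetcher is passed in as a parameter so the demo can run offline with a stand-in.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=10):
    """Download a URL and return the response body as raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def ingest(url, raw_dir, fetch=fetch_url):
    """Fetch one URL and store the unmodified bytes under raw_dir.

    The file name is a hash of the URL (a hypothetical naming scheme),
    so ingesting the same URL twice overwrites the earlier copy.
    """
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    body = fetch(url)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".raw"
    target = raw_dir / name
    target.write_bytes(body)
    return target


# Offline demo: a stand-in fetcher replaces the real network call.
def fake_fetch(url):
    return b"<html><title>Example book page</title></html>"


with tempfile.TemporaryDirectory() as tmp:
    path = ingest("http://example.com/book/1", tmp, fetch=fake_fetch)
    print(path.name, path.stat().st_size, "bytes")
```

Keeping the stored bytes untouched leaves all parsing decisions to the wrangling stage, so the raw copy can be reprocessed later without downloading it again.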

Data Wrangling

● Warehousing the data means storing the data in as raw a form as possible.

● Extract, transform, and load operations move data to operational storage locations.

● Filtering, aggregation, normalization and denormalization all ensure data is in a form it can be computed on.

● Annotated training sets must be created for ML tasks.
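The extract, transform, and load idea on this slide can be sketched in a few lines. This example uses the standard library's `sqlite3` as a stand-in for the operational store (the course itself names SQLAlchemy for this role), and the records, table name, and column names are invented for illustration.

```python
import sqlite3

# Hypothetical raw records, kept close to the form they were ingested in.
RAW_RATINGS = [
    {"reader": " Alice ", "book": "Dune", "rating": "5"},
    {"reader": "bob", "book": "dune", "rating": "4"},
    {"reader": "alice", "book": "Emma", "rating": ""},  # missing rating
    {"reader": "Bob", "book": "Emma", "rating": "3"},
]


def transform(record):
    """Normalize one raw record, or return None if it cannot be used."""
    rating = record["rating"].strip()
    if not rating:
        return None  # filtering: drop records without a rating
    reader = record["reader"].strip().lower()
    book = record["book"].strip().title()
    return (reader, book, int(rating))


def load(records, conn):
    """Transform the records and insert the usable ones; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings "
        "(reader TEXT, book TEXT, rating INTEGER)"
    )
    rows = [row for row in map(transform, records) if row is not None]
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def average_ratings(conn):
    """Aggregation: mean rating per book, ordered by title."""
    cursor = conn.execute(
        "SELECT book, AVG(rating) FROM ratings GROUP BY book ORDER BY book"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
loaded = load(RAW_RATINGS, conn)
print(loaded, average_ratings(conn))
```

The raw list is never modified: the cleaned rows live only in the database, which mirrors the split between raw storage and computational storage.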

Computation and Analyses

● Hypothesis-driven computation includes the design and development of predictive models.

● Many models have to be trained or constrained into a computational form like a graph database, and this is time consuming.

● Other data products like indices, relations, classifications, and clusters may be computed.

Modeling and Application

This is the part we’re most familiar with: supervised classification, unsupervised clustering, and models such as Bayes, logistic regression, and decision trees.

This is also where the money is.

Reporting and Visualization

● Often overlooked, this part is crucial, even if we have data products.

● Humans recognize patterns better than machines. Human feedback is crucial in Active Learning and remodeling (error detection).

● Mashups and collaborations generate more data, and therefore more value!

Don’t forget feedback! (Active Learning for Data Products)
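One common way to put human feedback into the loop is uncertainty sampling: ask people about the items the model is least sure of. The sketch below is only an illustration of that idea, not the course's method. The function name `least_confident` and the probabilities are made up.

```python
def least_confident(predictions, k=2):
    """Return the k item ids whose predicted probability is closest to 0.5."""
    ranked = sorted(predictions, key=lambda item: abs(predictions[item] - 0.5))
    return ranked[:k]


# Hypothetical model output: probability that a reader will like each book.
predictions = {"Dune": 0.95, "Emma": 0.52, "Ulysses": 0.10, "Hamlet": 0.41}

# These are the books worth showing to a human reviewer first.
to_review = least_confident(predictions)
print(to_review)  # ['Emma', 'Hamlet']
```

The labels collected for those items would then be added to the training data before the model is refit.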

What we’re going to build today

SCIENCE BOOKCLUB!!

● A book club that chooses what to read via a recommender system.

● Uses GoodReads data to ingest and return feedback on books.

● The statistical model is a non-negative matrix factorization.

● Reporting using Jinja (almost a web app).
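To make the non-negative matrix factorization bullet concrete, here is a small numpy sketch that uses the well-known multiplicative update rules. It is an assumption that this matches the approach taken in the course code. The ratings matrix is invented, and zeros are treated as real ratings rather than missing values, which a real recommender would need to handle separately.

```python
import numpy as np


def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) into W (m x rank) and H (rank x n).

    Uses multiplicative updates that reduce the squared error ||V - WH||.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so W and H
        # never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical ratings: rows are readers, columns are books.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

W, H = nmf(ratings, rank=2)
approx = W @ H  # reconstructed scores, usable for ranking unread books
print(np.round(approx, 1))
```

Each row of `W` describes a reader in terms of `rank` latent tastes, and each column of `H` describes a book in the same terms. Their product gives a predicted score for every reader and book pair.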

Workflow

1. Setting up a Python skeleton

2. Creating and running tests

3. Wading in with a configuration

4. Ingestion with urllib and requests

5. Creating a command line admin with argparse

6. Wrangling with BeautifulSoup and SQLAlchemy

7. Modeling with numpy

8. Reporting with Jinja2
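As a rough sketch of the command line admin step, the parser below gives each pipeline stage its own `argparse` subcommand. The program name `octavo-admin`, the subcommand names, and every option are hypothetical choices for illustration, not the course's actual interface.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per pipeline stage."""
    parser = argparse.ArgumentParser(
        prog="octavo-admin",  # hypothetical program name
        description="Administrative commands for the book club pipeline.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    ingest = subparsers.add_parser("ingest", help="download raw data")
    ingest.add_argument("urls", nargs="+", help="one or more URLs to fetch")
    ingest.add_argument("--raw-dir", default="raw", help="where to store raw files")

    subparsers.add_parser("wrangle", help="parse raw files into the database")

    recommend = subparsers.add_parser("recommend", help="fit the model")
    recommend.add_argument("--rank", type=int, default=2, help="number of latent factors")

    report = subparsers.add_parser("report", help="render the report")
    report.add_argument("--output", default="report.html", help="output file")

    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(
    ["ingest", "http://example.com/book/1", "--raw-dir", "data/raw"]
)
print(args.command, args.urls, args.raw_dir)
```

A real entry point would call `build_parser().parse_args()` with no arguments and dispatch on `args.command`.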

Octavo Architecture (really clear DSP)

[Architecture diagram: components and the libraries shown with them]

● Ingestion Module: requests.py

● Raw Data Storage

● Wrangling Module: BeautifulSoup, SQLAlchemy

● Computational Data Storage

● Recommender Module

● Reporting Module: Jinja2, Matplotlib


How to tackle this course ...

Lean into it: absorb as much as possible, and don’t worry about falling behind. It will be in your head!

Then afterwards, let’s all digest it together (keep in touch).
