Command-Line Data Tools
Why?
• Sometimes you just want to sling data
• Text is still king: the lowest common denominator
• Machines are pretty honking big now
This Presentation
• List of some good collections of cmd-line tools
• Call out and describe a few in particular
• The PyDataTool of my desire
Sources
• From the author of “Data Science at the Command Line”: http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html (larger list at http://datascienceatthecommandline.com/)
• HN discussion: https://news.ycombinator.com/item?id=6412190
• https://github.com/bitly/data_hacks
Tools
• JSON:
  • jq: https://stedolan.github.io/jq/
  • RecordStream: https://github.com/benbernard/RecordStream
• csvkit: https://csvkit.readthedocs.io/en/1.0.2/
• dt: https://github.com/clarkgrubb/data-tools
• XMLStarlet: http://xmlstar.sourceforge.net/overview.php
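Several of these tools meet at the same seam: reshaping tabular text into JSON records so a JSON processor such as jq can filter them. A minimal Python stdlib sketch of that idea follows. It is not code from csvkit, jq, or any other tool above, and the function name `csv_to_jsonl` and the script name are hypothetical.

```python
import csv
import json
import sys
from typing import Iterable, Iterator


def csv_to_jsonl(lines: Iterable[str]) -> Iterator[str]:
    """Convert CSV lines (first line is the header) to one JSON object per line.

    Every value stays a string, because CSV itself carries no type information.
    """
    for row in csv.DictReader(lines):
        yield json.dumps(row)


if __name__ == "__main__":
    # Read CSV on stdin and write JSON Lines on stdout, so the script can sit
    # in a shell pipeline in front of a JSON tool such as jq.
    for record in csv_to_jsonl(sys.stdin):
        print(record)
```

Saved as a hypothetical `csv_to_jsonl.py`, it could be used as `python csv_to_jsonl.py < data.csv | jq 'select(.n == "1")'`. Emitting one object per line keeps the output streamable, since jq reads a sequence of JSON values rather than requiring a single array.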
Honorable Mentions
• Pythonic awk: https://github.com/alecthomas/pawk
• Google Crush Tools: https://github.com/google/crush-tools
• Xonsh: http://xon.sh/tutorial.html
The PyDataTool of My Desire
• Support for CSV, JSON, SQL, XLS, HDF5; image formats; network formats (pcap, etc.)
• Capability of:
  • csvkit, jq, dt, “cols” tool
  • unix tools: sed, sort, shuf, split, tr, tee, uniq, wc, head, tail, bc
  • netpbm, imagemagick for images
• Work in streaming mode (netcat, wget, curl)
• First-class support for dask, spark
• Basic plotting via gnuplot, mpl, bokeh
• Built-in SQLite for in-memory queries
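The last wish, built-in SQLite for in-memory queries, can be roughly approximated today with Python's standard library alone. A minimal sketch, assuming CSV input with a header row whose names contain no double quotes; `query_csv`, the default table name `t`, and the script name are hypothetical and not part of any tool listed above.

```python
import csv
import sqlite3
import sys
from typing import Iterable, List, Tuple


def query_csv(lines: Iterable[str], sql: str, table: str = "t") -> List[Tuple]:
    """Load CSV lines (header first) into an in-memory SQLite table, then run sql.

    Assumes header names contain no double quotes. Columns are declared
    without a type, so values are stored as the strings read from the CSV.
    """
    reader = csv.reader(lines)
    header = next(reader)
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        cols = ", ".join(f'"{name}"' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        marks = ", ".join("?" for _ in header)
        # executemany pulls rows from the csv reader one at a time
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: some_command | python query_csv.py 'SELECT ... FROM t'
    for row in query_csv(sys.stdin, sys.argv[1]):
        print("\t".join(str(value) for value in row))
```

Because the columns are untyped, numeric comparisons need an explicit cast, e.g. `SELECT name FROM t WHERE CAST(n AS INTEGER) > 1`. Reading from stdin keeps it usable behind `curl` or `netcat`, though unlike a true streaming tool it holds the whole table in memory.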
Continuum Is Hiring!
• Creators of Anaconda, conda, bokeh, blaze, dask, holoviews, numba, phosphorJS
• Maintainers/contributors to Jupyter, JupyterLab, Spyder, pandas, conda-forge, …
• 150+ ppl, 80 in Austin
• Venture backed
• Enterprise product, OSS community innovation, consulting, training
Continuum Is Hiring
• Enterprise Product Team:
• Dev Manager (reports to CTO, runs product engineering)
• QA Lead Engineer - creates test plans, coordinates with product mgmt, dev, and testing team
• Senior Python Developer - enterprise product development; backend, web tech; full stack preferred
• DevOps and Operations - enterprise product, anaconda.org, Anaconda build system
• Email [email protected]