Doing data science with Clojure
Design constraints
The analytics chasm
• < 2 min: ideal. Almost real-time; can be done during brainstorming without disrupting flow.
• < 20 min: squeeze in somewhere in the day.
• fail
• Project: roadmap ahoy!
Think in distributions, not numbers
No throwaways
Sharing results
• Have one canonical version that is always current.
• Concentrate discussion in one place and make it searchable and persistent.
• Include methodology (=code).
The environment
REPL vs. notebook
REPL vs. notebook+
#alderaan #sales #growth
• Code hidden, but can be expanded
• Questions, comments, & annotations
• Shareable
• Periodically re-run to keep it fresh
• Discoverability: #alderaan #sales #growth
Wishlist/TODO
• Better editor (shaunlebron.github.io/parinfer/ ?)
• Embedded REPL
• Better exception reporting
• Browsable data structures
(tried and miserably failed: org-babel)
The tools
Data frame
• Data tends to be heterogeneous
• Clojure excels in structure manipulation/encoding
github.com/sbelak/huri
• No data structures, just functions over collections
• Composable (even DSLs — no macros!)
• Reasonably fast (transducers <3)
• Do-what-I-mean (auto-sort, liberal with inputs, …)
• Minimal buy-in
• Support reaching into nested structures everywhere
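The "reasonably fast (transducers)" point can be illustrated with plain clojure.core, independent of Huri's actual API. This is a minimal sketch: the `visits` data and the names `paying-amounts`, `revenue`, and `amounts` are made up for illustration. A transducer describes the transformation once, without building intermediate sequences, and can then be reused with different consumers.

```clojure
;; Hypothetical sample data: a plain vector of maps.
(def visits
  [{:user "a" :paid? true  :amount 30}
   {:user "b" :paid? false :amount 0}
   {:user "c" :paid? true  :amount 12}])

;; One transformation, defined without reference to any input or output.
;; comp on transducers applies left to right: filter first, then map.
(def paying-amounts
  (comp (filter :paid?)
        (map :amount)))

;; Reduce in a single pass, with no intermediate lazy sequences.
(def revenue (transduce paying-amounts + 0 visits))

;; The same transducer reused to build a vector.
(def amounts (into [] paying-amounts visits))
```

Here `revenue` is 42 and `amounts` is `[30 12]`.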
• Composable data-structure-based DSLs
• ->> and partial friendly
• Support reaching into nested structures everywhere
• Vanilla vector of maps
• Interoperability
• Provide curried versions where possible
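The code on the original slide did not survive extraction. As a stand-in, here is a small sketch of the same ideas using only clojure.core rather than Huri's own functions: the data is a vanilla vector of maps, steps are chained with `->>`, `partial` fixes the leading argument, and `get-in` reaches into nested structure. The `orders` data and the `from-planet?` helper are hypothetical.

```clojure
;; Hypothetical sample data: a plain vector of (nested) maps.
(def orders
  [{:customer {:name "Leia" :planet "Alderaan"} :amount 120}
   {:customer {:name "Han"  :planet "Corellia"} :amount 40}
   {:customer {:name "Bail" :planet "Alderaan"} :amount 300}])

;; Takes the data last, so it works with partial and ->>.
(defn from-planet?
  "True if the order's nested customer planet equals `planet`."
  [planet row]
  (= planet (get-in row [:customer :planet])))

(def alderaan-total
  (->> orders
       (filter (partial from-planet? "Alderaan"))
       (map :amount)
       (reduce +)))
```

`alderaan-total` evaluates to 420. Because the input and every intermediate value are ordinary Clojure collections, any other library function can be dropped into the pipeline.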
Composability is key to quick iterating
• Provide curried versions where possible
• ->> and partial friendly
• encode computation in structure (comp, some-fn, every-pred, data structure based DSLs, …)
• consistent API
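To make "encode computation in structure" concrete, the sketch below builds predicates out of smaller ones with `comp`, `some-fn`, and `every-pred` from clojure.core. The `customers` data and the predicate names are illustrative assumptions, not part of Huri.

```clojure
;; Hypothetical sample data.
(def customers
  [{:name "a" :amount 120 :vip? false}
   {:name "b" :amount 15  :vip? true}
   {:name "c" :amount 40  :vip? false}])

;; comp applies right to left: look up :amount, then test it.
(def big? (comp #(> % 100) :amount))

;; some-fn: truthy if any predicate is truthy.
(def interesting? (some-fn big? :vip?))

;; every-pred: true only if all predicates hold.
(def big-non-vip? (every-pred big? (complement :vip?)))

(def interesting-names (mapv :name (filter interesting? customers)))
(def big-non-vip-names (mapv :name (filter big-non-vip? customers)))
```

`interesting-names` is `["a" "b"]` and `big-non-vip-names` is `["a"]`. The predicates are values, so they can be stored, passed around, and recombined while iterating on an analysis.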
Catching errors early ⇒ more context ⇒ easier debugging ⇒ faster iterating
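The slide does not say which mechanism is used to catch errors early. One option built into Clojure, shown here purely as an assumption, is a `:pre` condition that rejects bad input at the function boundary, where the offending argument is still in view. The `mean` function is a hypothetical example.

```clojure
(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  ;; Checked on entry; a failure throws AssertionError naming the failed check.
  {:pre [(seq xs) (every? number? xs)]}
  (/ (reduce + xs) (count xs)))

(def ok (mean [1 2 3]))

;; A stray string is rejected immediately instead of surfacing
;; later as a ClassCastException deep inside the addition.
(def rejected?
  (try
    (mean [1 "2" 3])
    false
    (catch AssertionError _ true)))
```

`ok` is 2 and `rejected?` is true (assuming assertions are enabled, which is the default).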
<3 Bret Victor
Q: What about machine learning?
A: farm it out to sklearn
huri.plot
• DSL on top of ggplot2 (via gg4clj)
• Targets Gorilla REPL
• Follows the rest of Huri’s design philosophy
• bar chart, scatter plot, line chart, box & violin plot, heatmap, histogram
Wishlist/TODO
• (even) better structure manipulation (via Specter?)
• Interactive plots
• More transducer-compatible (online) math functions
• Optimizing ->> (rewrite code on the fly to do more with transducer composition)
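As an illustration of what a transducer-compatible (online) math function could look like, here is a sketch of a running mean written as a reducing function for `transduce`. It is not taken from Huri; `online-mean` and the `samples` data are hypothetical names.

```clojure
(defn online-mean
  "Reducing function that computes the mean in one pass.
   The accumulator is a pair [count mean]."
  ([] [0 0.0])                        ; init
  ([[_ m]] m)                         ; completion: return just the mean
  ([[n m] x]                          ; step: incremental mean update
   (let [cnt (inc n)]
     [cnt (+ m (/ (- x m) cnt))])))

;; Hypothetical sample data.
(def samples [{:amount 10} {:amount 20} {:amount 60}])

;; Composes with any transducer; the data is traversed once.
(def mean-amount (transduce (map :amount) online-mean samples))
```

`mean-amount` is 30.0. Since the function never needs the whole collection at once, the same definition works on streams and large inputs.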
Projects worth keeping an eye on
github.com/thi-ng/geom
github.com/yieldbot/vizard
zeppelin-project.org
github.com/aphyr/tesser
github.com/nathanmarz/specter