![Page 1: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/1.jpg)
Using Split-Apply-Combine for Data Analysis in Clojure
Bay Area Clojure Group
June 6, 2013
Tom Faulhaber
twitter: @tomfaulhaber
github: tomfaulhaber
Saturday, June 8, 13
![Page 2: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/2.jpg)
![Page 3: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/3.jpg)
![Page 4: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/4.jpg)
![Page 5: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/5.jpg)
![Page 6: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/6.jpg)
![Page 7: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/7.jpg)
Data Structures for Data Analysis
![Page 8: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/8.jpg)
The Vector
[265.0 259.98 266.89 262.22 ...]
![Page 9: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/9.jpg)
The Vector
(mean [265.0 259.98 266.89 262.22 ...]) ➜ 263.697
![Page 10: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/10.jpg)
The Vector
(apply min [265.0 259.98 266.89 262.22 ...]) ➜ 257.21
![Page 11: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/11.jpg)
The Vector
(apply max [265.0 259.98 266.89 262.22 ...]) ➜ 269.75
![Page 12: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/12.jpg)
The Vector
(sd [265.0 259.98 266.89 262.22 ...]) ➜ 3.815
![Page 13: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/13.jpg)
The Vector
(quantile [265.0 259.98 266.89 262.22 ...]) ➜ [257.21 260.105 264.27 266.175 269.75]
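A minimal sketch of the same kind of vector summaries in plain Clojure, for readers without Incanter on the classpath. The `mean` and `sd` functions below are hypothetical stand-ins written for this example, not Incanter's implementations, and `closes` holds only the four values visible on the slides, so the results will not match the slide figures (which were computed over the full, elided vector).

```clojure
;; Plain-Clojure stand-ins for the summary functions used on the slides.
;; These definitions are illustrative assumptions, not Incanter's code.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sd
  "Sample standard deviation (divides by n - 1)."
  [xs]
  (let [m       (mean xs)
        squares (map #(let [d (- % m)] (* d d)) xs)]
    (Math/sqrt (/ (reduce + squares) (dec (count xs))))))

;; Only the four closing prices shown on the slide; the rest is elided there.
(def closes [265.0 259.98 266.89 262.22])

(mean closes)
(apply min closes)   ; => 259.98
(apply max closes)   ; => 266.89
(sd closes)
```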
![Page 14: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/14.jpg)
The Vector
[265.0 259.98 266.89 262.22 ...]
![Page 15: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/15.jpg)
The Vector
(histogram [265.0 259.98 266.89 262.22 ...]) ➜ [chart: histogram]
![Page 16: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/16.jpg)
The Vector
[265.0 259.98 266.89 262.22 ...]
![Page 17: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/17.jpg)
The Vector
(line-chart [265.0 259.98 266.89 262.22 ...]) ➜ [chart: line chart]
![Page 18: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/18.jpg)
The Matrix
![Page 19: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/19.jpg)
The Matrix
1 Dimension
[Figure: 1-dimensional array with index labels]
![Page 20: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/20.jpg)
The Matrix
1 Dimension
2 Dimensions
[Figure: 1-dimensional array next to a 2D matrix with axes 0 to 7 and 0 to 6]
![Page 21: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/21.jpg)
The Matrix
1 Dimension
2 Dimensions
3 Dimensions
[Figure: 1-dimensional array, 2D matrix (axes 0 to 7 and 0 to 6), and 3D matrix (third axis 0 to 3)]
![Page 22: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/22.jpg)
Key-Value Pairs
{"IBM" [205.18 203.79 202.79 201.02 ...], "MSFT" [27.93 27.44 27.5 27.34 ...], "AMZN" [265.0 259.98 266.89 262.22 ...]}
Key-value pairs can organize multiple data units (such as trials or customers) or collect parameter data.
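As a rough illustration of why the key-value layout is convenient, the sketch below applies one summary function to every value in the map. `map-vals` is a hypothetical helper defined here for the example, and the vectors contain only the values visible on the slide.

```clojure
;; The map from the slide, truncated to the values that are shown there.
(def closes-by-symbol
  {"IBM"  [205.18 203.79 202.79 201.02]
   "MSFT" [27.93 27.44 27.5 27.34]
   "AMZN" [265.0 259.98 266.89 262.22]})

;; Hypothetical helper: apply f to every value, keeping the keys.
(defn map-vals [f m]
  (into {} (for [[k v] m] [k (f v)])))

(map-vals #(apply max %) closes-by-symbol)
;; => {"IBM" 205.18, "MSFT" 27.93, "AMZN" 266.89}
```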
![Page 23: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/23.jpg)
The Dataset
| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-01 | AMZN | 268.93 | 268.93 | 262.80 | 265.00 | 6115000 | 265.00 |
| 2013-02-01 | IBM | 204.65 | 205.35 | 203.84 | 205.18 | 3370700 | 203.37 |
| 2013-02-01 | MSFT | 27.67 | 28.05 | 27.55 | 27.93 | 55565900 | 27.51 |
| 2013-02-04 | AMZN | 262.78 | 264.68 | 259.07 | 259.98 | 3723600 | 259.98 |
| 2013-02-04 | IBM | 204.19 | 205.02 | 203.57 | 203.79 | 3188800 | 201.99 |
| 2013-02-04 | MSFT | 27.87 | 28.02 | 27.42 | 27.44 | 50540000 | 27.03 |
| 2013-02-05 | AMZN | 262.00 | 268.03 | 261.46 | 266.89 | 4012900 | 266.89 |
| ... | | | | | | | |
![Page 24: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/24.jpg)
The Dataset
[Table: the stock dataset shown above]
Items in a column have the same type
![Page 25: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/25.jpg)
The Dataset
[Table: the stock dataset shown above]
Across a row, there may be different types
![Page 26: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/26.jpg)
The Dataset
[Table: the stock dataset shown above]
![Page 27: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/27.jpg)
The Dataset
[Table: the stock dataset shown above]
Identifiers
![Page 28: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/28.jpg)
The Dataset
[Table: the stock dataset shown above]
Identifiers
Measurements
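One way to picture the dataset in plain Clojure is as a vector of row maps. The sketch below assumes that `:Date` and `:Symbol` are the identifier columns and that the remaining columns are measurements; the slides label the two groups but the extracted text does not say which columns fall in each, so treat that split as an assumption. `split-row` is a hypothetical helper, and only a few columns and rows are included.

```clojure
;; A few rows of the stock data as plain maps (subset of columns).
(def stock-rows
  [{:Date "2013-02-01" :Symbol "AMZN" :Close 265.00 :Volume 6115000}
   {:Date "2013-02-01" :Symbol "IBM"  :Close 205.18 :Volume 3370700}
   {:Date "2013-02-04" :Symbol "AMZN" :Close 259.98 :Volume 3723600}])

;; Assumed identifier columns; everything else is treated as a measurement.
(def identifiers [:Date :Symbol])

(defn split-row
  "Separate a row into its identifier part and its measurement part."
  [row]
  {:id           (select-keys row identifiers)
   :measurements (apply dissoc row identifiers)})

(split-row (first stock-rows))
;; => {:id {:Date "2013-02-01", :Symbol "AMZN"},
;;     :measurements {:Close 265.0, :Volume 6115000}}

;; Within a column the values share a type; across a row they differ.
(set (map (comp type :Close) stock-rows))   ; a single type
```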
![Page 29: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/29.jpg)
Split-Apply-Combine
![Page 30: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/30.jpg)
Split-Apply-Combine
Pattern described by Hadley Wickham and implemented in the plyr library for R.
Home page: http://plyr.had.co.nz
JSS Journal of Statistical Software
April 2011, Volume 40, Issue 1. http://www.jstatsoft.org/
The Split-Apply-Combine Strategy for Data Analysis
Hadley Wickham
Rice University
Abstract
Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored.
The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.
Keywords: R, apply, split, data analysis.
1. Introduction
What do we do when we analyze data? What are common actions and what are common mistakes? Given the importance of this activity in statistics, there is remarkably little research on how data analysis happens. This paper attempts to remedy a very small part of that lack by describing one common data analysis pattern: Split-apply-combine. You see the split-apply-combine strategy whenever you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This crops up in all stages of an analysis:
During data preparation, when performing group-wise ranking, standardization, or normalization, or in general when creating new variables that are most easily calculated on a per-group basis.
When creating summaries for display or analysis, for example, when calculating marginal means, or conditioning a table of counts by dividing out group sums.
![Page 31: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/31.jpg)
Split
Apply
Combine
![Page 32: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/32.jpg)
Split the object based on dimension(s) or identifiers (yielding segments of the same type)
Apply
Combine
![Page 33: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/33.jpg)
Split the object based on dimension(s) or identifiers (yielding segments of the same type)
Apply a function to each segment, producing a new segment of the target type. The function can aggregate or transform the segment.
Combine
![Page 34: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/34.jpg)
Split the object based on dimension(s) or identifiers (yielding segments of the same type)
Apply a function to each segment, producing a new segment of the target type. The function can aggregate or transform the segment.
Combine the results into an output type (possibly of higher dimension)
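The three steps can be sketched with core Clojure functions alone. This is an illustrative reduction of the pattern, not the library presented later in the talk; `split-apply-combine` and `readings` are names made up for the example.

```clojure
;; Four rows with an identifier (:Symbol) and a measurement (:Close).
(def readings
  [{:Symbol "AMZN" :Close 265.00}
   {:Symbol "IBM"  :Close 205.18}
   {:Symbol "AMZN" :Close 259.98}
   {:Symbol "IBM"  :Close 203.79}])

(defn split-apply-combine
  "Group rows by key-fn, run f on each group, collect results in a map."
  [key-fn f rows]
  (->> rows
       (group-by key-fn)                          ; split
       (map (fn [[k segment]] [k (f segment)]))   ; apply
       (into {})))                                ; combine

(split-apply-combine :Symbol
                     #(apply max (map :Close %))
                     readings)
;; => {"AMZN" 265.0, "IBM" 205.18}
```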
![Page 35: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/35.jpg)
Variations based on interface
| Input \ Output | Array | Data.Frame | List | Discarded |
|---|---|---|---|---|
| Array | aaply | adply | alply | a_ply |
| Data.Frame | daply | ddply | dlply | d_ply |
| List | laply | ldply | llply | l_ply |

From: Wickham, The Split-Apply-Combine Strategy for Data Analysis
![Page 36: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/36.jpg)
Splitting Matrices - 2D
Split each element to a scalar
[Figure: 2D matrix with axes 0 to 7 and 0 to 6]
![Page 37: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/37.jpg)
Splitting Matrices - 2D
Split each column to a vector
[Figure: 2D matrix with axes 0 to 7 and 0 to 6]
![Page 38: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/38.jpg)
Splitting Matrices - 2D
Split each row to a vector
[Figure: 2D matrix with axes 0 to 7 and 0 to 6]
![Page 39: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/39.jpg)
Splitting Matrices - 2D
Split each element to a scalar
Split each column to a vector
Split each row to a vector
[Figure: the three ways to split the 2D matrix, side by side]
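The three 2D splits can be illustrated on a small matrix held as a vector of row vectors. This representation is an assumption chosen for the example (the talk's own matrix type is not shown in the extracted text), and the 2 × 3 matrix is invented.

```clojure
;; A 2 x 3 matrix represented as a vector of rows (illustrative data).
(def m [[1 2 3]
        [4 5 6]])

(def elements (apply concat m))        ; each element as a scalar
(def rows     m)                       ; each row as a vector
(def columns  (apply mapv vector m))   ; each column as a vector

elements   ; => (1 2 3 4 5 6)
rows       ; => [[1 2 3] [4 5 6]]
columns    ; => [[1 4] [2 5] [3 6]]

;; Applying a function to each column segment, e.g. a column sum:
(mapv #(reduce + %) columns)           ; => [5 7 9]
```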
![Page 40: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/40.jpg)
Splitting Matrices - 3D
Split each element to a scalar
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 41: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/41.jpg)
Splitting Matrices - 3D
Split each row x=c1, y=c2 to a vector
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 42: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/42.jpg)
Splitting Matrices - 3D
Split each row x=c1, z=c2 to a vector
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 43: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/43.jpg)
Splitting Matrices - 3D
Split each row y=c1, z=c2 to a vector
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 44: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/44.jpg)
Splitting Matrices - 3D
Split each slice x=c to a 2D matrix
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 45: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/45.jpg)
Splitting Matrices - 3D
Split each slice y=c to a 2D matrix
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 46: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/46.jpg)
Splitting Matrices - 3D
Split each slice z=c to a 2D matrix
[Figure: 3D matrix with axes 0 to 7, 0 to 6, and 0 to 3]
![Page 47: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/47.jpg)
Splitting Matrices - 3D
Split each element to a scalar
Split each row x=c1, y=c2 to a vector
Split each row x=c1, z=c2 to a vector
Split each row y=c1, z=c2 to a vector
Split each slice x=c to a 2D matrix
Split each slice y=c to a 2D matrix
Split each slice z=c to a 2D matrix
[Figure: the seven ways to split the 3D matrix, side by side]
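A small sketch of the 3D splits, assuming the cube is stored as nested vectors indexed `[x][y][z]`. The storage layout, the helper names, and the 2 × 2 × 2 data are all assumptions made for illustration.

```clojure
;; A 2 x 2 x 2 cube indexed as [x][y][z] (illustrative data).
(def cube
  [[[0 1] [2 3]]
   [[4 5] [6 7]]])

;; Slices: fix one coordinate, keep a 2D matrix.
(defn slice-x [c x] (nth c x))
(defn slice-y [c y] (mapv #(nth % y) c))
(defn slice-z [c z] (mapv (fn [plane] (mapv #(nth % z) plane)) c))

;; Rows: fix two coordinates, keep a vector along the third.
(defn row-xy [c x y] (get-in c [x y]))
(defn row-xz [c x z] (mapv #(nth % z) (nth c x)))
(defn row-yz [c y z] (mapv #(get-in % [y z]) c))

(slice-x cube 0)    ; => [[0 1] [2 3]]
(slice-y cube 0)    ; => [[0 1] [4 5]]
(slice-z cube 1)    ; => [[1 3] [5 7]]
(row-xy cube 1 0)   ; => [4 5]
(row-xz cube 0 1)   ; => [1 3]
(row-yz cube 1 0)   ; => [2 6]
```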
![Page 48: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/48.jpg)
Splitting a Dataset
Split by Symbol
[Table: the full stock dataset shown earlier, split into one dataset per symbol]

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-01 | AMZN | 268.93 | 268.93 | 262.80 | 265.00 | 6115000 | 265.00 |
| 2013-02-04 | AMZN | 262.78 | 264.68 | 259.07 | 259.98 | 3723600 | 259.98 |
| 2013-02-05 | AMZN | 262.00 | 268.03 | 261.46 | 266.89 | 4012900 | 266.89 |

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-01 | IBM | 204.65 | 205.35 | 203.84 | 205.18 | 3370700 | 203.37 |
| 2013-02-04 | IBM | 204.19 | 205.02 | 203.57 | 203.79 | 3188800 | 201.99 |

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-01 | MSFT | 27.67 | 28.05 | 27.55 | 27.93 | 55565900 | 27.51 |
| 2013-02-04 | MSFT | 27.87 | 28.02 | 27.42 | 27.44 | 50540000 | 27.03 |
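With rows held as plain maps, splitting by an identifier column is a one-liner with `group-by`. The sketch below is an assumption-laden stand-in for the dataset split (it keeps only three columns and uses core functions, not the talk's library), and shows that the same rows can be split by `:Symbol` or by `:Date`.

```clojure
;; The seven rows visible on the slide, reduced to three columns.
(def stock-rows
  [{:Date "2013-02-01" :Symbol "AMZN" :Close 265.00}
   {:Date "2013-02-01" :Symbol "IBM"  :Close 205.18}
   {:Date "2013-02-01" :Symbol "MSFT" :Close 27.93}
   {:Date "2013-02-04" :Symbol "AMZN" :Close 259.98}
   {:Date "2013-02-04" :Symbol "IBM"  :Close 203.79}
   {:Date "2013-02-04" :Symbol "MSFT" :Close 27.44}
   {:Date "2013-02-05" :Symbol "AMZN" :Close 266.89}])

;; Keywords are functions of maps, so they work directly as split keys.
(def by-symbol (group-by :Symbol stock-rows))
(def by-date   (group-by :Date stock-rows))

(count (by-symbol "AMZN"))            ; => 3
(count (by-symbol "IBM"))             ; => 2
(mapv :Symbol (by-date "2013-02-05")) ; => ["AMZN"]
```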
![Page 49: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/49.jpg)
Splitting a Dataset
Split by Date
[Table: the full stock dataset shown earlier, split into one dataset per date]

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-01 | AMZN | 268.93 | 268.93 | 262.80 | 265.00 | 6115000 | 265.00 |
| 2013-02-01 | IBM | 204.65 | 205.35 | 203.84 | 205.18 | 3370700 | 203.37 |
| 2013-02-01 | MSFT | 27.67 | 28.05 | 27.55 | 27.93 | 55565900 | 27.51 |

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-04 | AMZN | 262.78 | 264.68 | 259.07 | 259.98 | 3723600 | 259.98 |
| 2013-02-04 | IBM | 204.19 | 205.02 | 203.57 | 203.79 | 3188800 | 201.99 |
| 2013-02-04 | MSFT | 27.87 | 28.02 | 27.42 | 27.44 | 50540000 | 27.03 |

| Date | Symbol | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 2013-02-05 | AMZN | 262.00 | 268.03 | 261.46 | 266.89 | 4012900 | 266.89 |
![Page 50: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/50.jpg)
Splitting a Dataset
Split by Date
[Table: the same split by date as on the previous slide]
We’ll see more advanced splitting in the case study.
![Page 51: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/51.jpg)
Apply
![Page 52: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/52.jpg)
Apply
(func <segment>)
![Page 53: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/53.jpg)
Apply
(func <segment>) ➜ result
![Page 54: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/54.jpg)
Apply
(func <segment>) ➜ result
result must be appropriate for output type
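To make the two kinds of apply function concrete, here is a hedged sketch on a single vector segment: one function aggregates the segment to a scalar, the other transforms it into a segment of the same length. The function names and the "change from the first value" transform are invented for this example.

```clojure
;; One segment, as it might come out of the split step.
(def segment [265.0 259.98 266.89 262.22])

;; Aggregate: segment -> single value.
(defn aggregate [xs]
  (apply max xs))

;; Transform: segment -> segment of the same length
;; (here, each value's change from the first value).
(defn transform [xs]
  (let [base (first xs)]
    (mapv #(- % base) xs)))

(aggregate segment)          ; => 266.89
(count (transform segment))  ; => 4
(first (transform segment))  ; => 0.0
```

Which of the two is appropriate depends on the output type the combine step has to build.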
![Page 55: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/55.jpg)
Combine
Assemble apply results into output
[Figure: apply results assembled into the output matrix]
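The combine step decides what shape the per-group results end up in. The sketch below takes one set of `[key result]` pairs and assembles it four ways, loosely mirroring the array, data frame, list, and discarded output columns of the plyr table. The mapping to plyr's output types is an analogy, not an implementation of plyr, and the `:MaxClose` column name is made up.

```clojure
;; Per-group results as they might come out of the apply step.
(def results [["AMZN" 266.89] ["IBM" 205.18] ["MSFT" 27.93]])

;; Array-like output: just the values, in order.
(def as-vector (mapv second results))

;; List-like output: results keyed by group.
(def as-map (into {} results))

;; Dataset-like output: one row map per group.
(def as-rows (for [[k v] results] {:Symbol k :MaxClose v}))

;; Discarded output: run for side effects only.
(doseq [[k v] results]
  (println k v))

as-vector        ; => [266.89 205.18 27.93]
(as-map "IBM")   ; => 205.18
(first as-rows)  ; => {:Symbol "AMZN", :MaxClose 266.89}
```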
![Page 56: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/56.jpg)
Implementing ddply in Clojure
![Page 57: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/57.jpg)
Implementing ddply

    (ns split-apply-combine.ply
      "Implementation of the split-apply-combine functions, similar to R's plyr library."
      (:use [incanter.core :only [$data col-names conj-rows dataset]])
      (:require [split-apply-combine.core :as sac]))

    (defn fast-conj-rows
      "A simple version of conj-rows that runs much faster"
      [& datasets]
      (when (seq datasets)
        (dataset (col-names (first datasets))
                 (mapcat :rows datasets))))

    (defn expr-to-fn [expr]
      (let [row-param (gensym "row-")
            kw-map (sac/build-keyword-map expr)]
        `(fn [~row-param]
           (let [~@(apply concat
                          (for [[kw sym] kw-map]
                            [sym `(get ~row-param ~kw ~kw)]))]
             ~(sac/convert-keywords expr kw-map)))))

    (defn exprs-to-fns [group-by]
      (if (coll? group-by)
        (vec (for [item group-by]
               (if (and (coll? item)
                        (coll? (second item))
                        (not (#{'fn 'fn*} (first (second item)))))
                 [(first item) (expr-to-fn (second item))]
                 item)))
        group-by))

    (defn split-ds
      "Perform a split operation on data, which must be a dataset, using the
      group-by-fns to choose bins. group-by-fns can either be a single function
      or a collection of functions. In the latter case, the results will be
      combined to create a key for the bin. Returns a map of the group-by-fns
      results to datasets including all the rows that had the given result.

      Note that keyword column names are the most common functions to use for
      the group-by."
      [group-by-fns data]
      (let [cols (col-names data)
            group-by-fn (if (= 1 (count group-by-fns))
                          (first group-by-fns)
                          (apply juxt group-by-fns))]
        (loop [cur (:rows data)
               row-groups {}]
          (if (empty? cur)
            (for [[group rows] row-groups]
              [group (dataset cols rows)])
            (recur (next cur)
                   (let [row (first cur)
                         k (group-by-fn row)
                         a (row-groups k)]
                     (assoc row-groups k (if a (conj a row) [row]))))))))

    (defn apply-ds
      "Apply fun to each group in grouped-data, returning a sequence of pairs of
      the original group-keys and the result of applying the function to the
      dataset. See split-ds for information on the grouped-data data structure."
      [fun grouped-data]
      (for [[group split-data] grouped-data]
        [group (fun split-data)]))

    (defn combine-ds
      "Combine the datasets in grouped-data into a single dataset including the
      columns specified in the group-by argument as having the values found in
      the keys in the grouped data.

      If there are columns that are in both the key and the dataset, the values
      in the key have precedence."
      [group-by grouped-data]
      (let [group-by (if (coll? group-by) group-by [group-by])
            group-by-filter (complement (set group-by))]
        (apply fast-conj-rows
               (for [[group data] grouped-data]
                 (let [grouped-cols (zipmap group-by group)
                       union-cols (concat group-by
                                          (filter group-by-filter (col-names data)))]
                   (dataset union-cols
                            (map #(merge % grouped-cols) (:rows data))))))))

    (defn ddply*
      "Split-apply-combine from datasets to datasets.

      Splits data into the group of datasets specified by the group-by
      argument, applies fun to each of the resulting datasets and combines the
      results back into a single dataset.

      The group-by argument can be a keyword or collection of keywords which
      specify the columns to group by. It can also include pairs
      [keyword keyfn] where the function keyfn is applied to each row to
      generate the key for that row. When the groups are combined, keyword is
      used as the column name for the resulting column. The two types of
      group-by specifications can be mixed.

      The result of the apply function can contain the same column names as
      the original dataset or different ones. It can contain the same number
      of rows as the original, a different number, or a single row.

      If data is not specified, it defaults to the currently bound value of
      $data.

      Examples:

        (ddply* :Symbol (transform :Change = (diff0 :Close)) stock-data)

        (ddply* [[:Month #((juxt year month) (:timestamp %))]]
                (colwise :Volume sum) stock-data)"
      ([group-by fun] (ddply* group-by fun $data))
      ([group-by fun data]
       (let [group-by (if (coll? group-by) group-by [group-by])
             group-by (for [item group-by] (if (coll? item) item [item item]))]
         (->> data
              (split-ds (map second group-by))
              (apply-ds fun)
              (combine-ds (map first group-by))))))

    (defmacro ddply
      "Split-apply-combine from datasets to datasets. This macro is a wrapper
      on ddply* which provides translation of simple column-referencing
      expressions in the group-by argument.

      Splits data into the group of datasets specified by the group-by
      argument, applies fun to each of the resulting datasets and combines the
      results back into a single dataset.

      The group-by argument can be a keyword or collection of keywords which
      specify the columns to group by. It can also include pairs
      [keyword key-expr] where the expression key-expr is transformed to a
      function and keywords in the expression are expanded to accessors on
      rows. The resulting function is applied to each row to generate the key
      for that row. When the groups are combined, keyword is used as the
      column name for the resulting column. The two types of group-by
      specifications can be mixed.

      The result of the apply function can contain the same column names as
      the original dataset or different ones. It can contain the same number
      of rows as the original, a different number, or a single row.

      If data is not specified, it defaults to the currently bound value of
      $data.

      Examples:

        (ddply :Symbol (transform :Change = (diff0 :Close)) stock-data)

        (ddply [[:Month ((juxt year month) :timestamp)]]
               (colwise :Volume sum) stock-data)"
      ([group-by fun]
       `(ddply* ~(exprs-to-fns group-by) ~fun $data))
      ([group-by fun data]
       `(ddply* ~(exprs-to-fns group-by) ~fun ~data)))

    (defn d_ply*
      "Split-apply-combine from datasets to nothing. This version ignores the
      output of fun and is used for fun's side effects.

      Splits data into the group of datasets specified by the group-by
      argument, applies fun to each of the resulting datasets and then drops
      the result.

      The group-by argument can be a keyword or collection of keywords which
      specify the columns to group by. It can also include pairs
      [keyword keyfn] where the function keyfn is applied to each row to
      generate the key for that row. When the groups are combined, keyword is
      used as the column name for the resulting column. The two types of
      group-by specifications can be mixed.

      The result of the apply function can contain the same column names as
      the original dataset or different ones. It can contain the same number
      of rows as the original, a different number, or a single row.

      If data is not specified, it defaults to the currently bound value of
      $data.

      Example:

        (d_ply* :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
      ;; The slide's one-argument arity called ddply*; d_ply* matches the docstring.
      ([group-by fun] (d_ply* group-by fun $data))
      ([group-by fun data]
       (let [group-by (if (coll? group-by) group-by [group-by])
             group-by (for [item group-by] (if (coll? item) item [item item]))]
         (dorun (->> data
                     (split-ds (map second group-by))
                     (apply-ds fun))))))

    (defmacro d_ply
      "Split-apply-combine from datasets to nothing. This version ignores the
      output of fun and is used for fun's side effects. This macro is a
      wrapper on d_ply* which provides translation of simple
      column-referencing expressions in the group-by argument.

      Splits data into the group of datasets specified by the group-by
      argument, applies fun to each of the resulting datasets and then drops
      the result.

      The group-by argument can be a keyword or collection of keywords which
      specify the columns to group by. It can also include pairs
      [keyword keyfn] where the function keyfn is applied to each row to
      generate the key for that row. When the groups are combined, keyword is
      used as the column name for the resulting column. The two types of
      group-by specifications can be mixed.

      The result of the apply function can contain the same column names as
      the original dataset or different ones. It can contain the same number
      of rows as the original, a different number, or a single row.

      If data is not specified, it defaults to the currently bound value of
      $data.

      Example:

        (d_ply :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
      ([group-by fun]
       `(d_ply* ~(exprs-to-fns group-by) ~fun $data))
      ([group-by fun data]
       `(d_ply* ~(exprs-to-fns group-by) ~fun ~data)))
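For readers who want to try the shape of `ddply*` without Incanter, the following is a rough, dependency-free sketch over plain row maps. `mini-ddply` is a hypothetical name; it is not the talk's implementation. It always builds a vector key with `juxt`, returns a lazy sequence of maps in place of an Incanter dataset, and, like `combine-ds` above, lets the key columns take precedence when merging.

```clojure
(defn mini-ddply
  "Group rows (maps) by group-cols, apply f to each group of rows, and merge
  the group key columns back onto every row that f returns."
  [group-cols f rows]
  (let [key-fn (apply juxt group-cols)]
    (->> rows
         (group-by key-fn)                              ; split
         (map (fn [[k segment]] [k (f segment)]))       ; apply
         (mapcat (fn [[k result-rows]]                  ; combine
                   (let [key-cols (zipmap group-cols k)]
                     (map #(merge % key-cols) result-rows)))))))

;; A few rows from the stock data, reduced to three columns.
(def stock-rows
  [{:Date "2013-02-01" :Symbol "AMZN" :Close 265.00}
   {:Date "2013-02-04" :Symbol "AMZN" :Close 259.98}
   {:Date "2013-02-01" :Symbol "IBM"  :Close 205.18}
   {:Date "2013-02-04" :Symbol "IBM"  :Close 203.79}])

;; Aggregate each symbol's rows down to a single summary row.
(def max-close-by-symbol
  (mini-ddply [:Symbol]
              (fn [segment] [{:MaxClose (apply max (map :Close segment))}])
              stock-rows))

(set max-close-by-symbol)
;; => #{{:MaxClose 265.0, :Symbol "AMZN"} {:MaxClose 205.18, :Symbol "IBM"}}
```

Passing `identity` as the apply function returns the original rows regrouped, which is a quick way to check that the split and combine steps round-trip.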
![Page 58: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/58.jpg)
Implementing ddply - Split

(ns split-apply-combine.ply
  "Implementation of the split-apply-combine functions, similar to R's plyr library."
  (:use [incanter.core :only [$data col-names conj-rows dataset]])
  (:require [split-apply-combine.core :as sac]))

(defn fast-conj-rows
  "A simple version of conj-rows that runs much faster"
  [& datasets]
  (when (seq datasets)
    (dataset (col-names (first datasets))
             (mapcat :rows datasets))))

(defn expr-to-fn [expr]
  (let [row-param (gensym "row-")
        kw-map (sac/build-keyword-map expr)]
    `(fn [~row-param]
       (let [~@(apply concat
                      (for [[kw sym] kw-map]
                        [sym `(get ~row-param ~kw ~kw)]))]
         ~(sac/convert-keywords expr kw-map)))))

(defn exprs-to-fns [group-by]
  (if (coll? group-by)
    (vec (for [item group-by]
           (if (and (coll? item)
                    (coll? (second item))
                    (not (#{'fn 'fn*} (first (second item)))))
             [(first item) (expr-to-fn (second item))]
             item)))
    group-by))

(defn split-ds
  "Perform a split operation on data, which must be a dataset, using the group-by-fns to choose bins. group-by-fns is a collection of one or more functions. When there is more than one, their results are combined (with juxt) to create a key for the bin. Returns a sequence of [key dataset] pairs, one per distinct key, where each dataset includes all the rows that produced that key.
Note that keyword column names are the most common functions to use for the group-by."
  [group-by-fns data]
  (let [cols (col-names data)
        group-by-fn (if (= 1 (count group-by-fns))
                      (first group-by-fns)
                      (apply juxt group-by-fns))]
    (loop [cur (:rows data)
           row-groups {}]
      (if (empty? cur)
        (for [[group rows] row-groups]
          [group (dataset cols rows)])
        (recur (next cur)
               (let [row (first cur)
                     k (group-by-fn row)
                     a (row-groups k)]
                 (assoc row-groups k (if a (conj a row) [row]))))))))

(defn apply-ds
  "Apply fun to each group in grouped-data, returning a sequence of pairs of the original group keys and the result of applying the function to the dataset. See split-ds for information on the grouped-data data structure."
  [fun grouped-data]
  (for [[group split-data] grouped-data]
    [group (fun split-data)]))

(defn combine-ds
  "Combine the datasets in grouped-data into a single dataset, adding the columns specified in the group-by argument with the values found in the keys of the grouped data.
If there are columns that are in both the key and the dataset, the values in the key have precedence."
  [group-by grouped-data]
  (let [group-by (if (coll? group-by) group-by [group-by])
        group-by-filter (complement (set group-by))]
    (apply fast-conj-rows
           (for [[group data] grouped-data]
             (let [grouped-cols (zipmap group-by group)
                   union-cols (concat group-by (filter group-by-filter (col-names data)))]
               (dataset union-cols
                        (map #(merge % grouped-cols) (:rows data))))))))

(defn ddply*
  "Split-apply-combine from datasets to datasets.
Splits data into a group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and combines the result of that back into a single dataset.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword keyfn] where the function keyfn is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same column names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Examples:
(ddply* :Symbol (transform :Change = (diff0 :Close)) stock-data)
(ddply* [[:Month #((juxt year month) (:timestamp %))]] (colwise :Volume sum) stock-data)"
  ([group-by fun] (ddply* group-by fun $data))
  ([group-by fun data]
   (let [group-by (if (coll? group-by) group-by [group-by])
         group-by (for [item group-by]
                    (if (coll? item) item [item item]))]
     (->> data
          (split-ds (map second group-by))
          (apply-ds fun)
          (combine-ds (map first group-by))))))

(defmacro ddply
  "Split-apply-combine from datasets to datasets. This macro is a wrapper on ddply* which provides translation of simple column-referencing expressions in the group-by argument.
Splits data into a group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and combines the result of that back into a single dataset.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword key-expr] where the expression key-expr is transformed to a function and keywords in key-expr are expanded to accessors on rows. The resulting function is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same column names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Examples:
(ddply :Symbol (transform :Change = (diff0 :Close)) stock-data)
(ddply [[:Month ((juxt year month) :timestamp)]] (colwise :Volume sum) stock-data)"
  ([group-by fun] `(ddply* ~(exprs-to-fns group-by) ~fun $data))
  ([group-by fun data] `(ddply* ~(exprs-to-fns group-by) ~fun ~data)))

(defn d_ply*
  "Split-apply-combine from datasets to nothing. This version ignores the output of fun and is used for fun's side effects.
Splits data into a group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and then drops the result.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword keyfn] where the function keyfn is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same column names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Example:
(d_ply* :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
  ([group-by fun] (d_ply* group-by fun $data))
  ([group-by fun data]
   (let [group-by (if (coll? group-by) group-by [group-by])
         group-by (for [item group-by]
                    (if (coll? item) item [item item]))]
     (dorun (->> data
                 (split-ds (map second group-by))
                 (apply-ds fun))))))

(defmacro d_ply
  "Split-apply-combine from datasets to nothing. This version ignores the output of fun and is used for fun's side effects. This macro is a wrapper on d_ply* which provides translation of simple column-referencing expressions in the group-by argument.
Splits data into a group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and then drops the result.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword keyfn] where the function keyfn is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same column names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Example:
(d_ply :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
  ([group-by fun] `(d_ply* ~(exprs-to-fns group-by) ~fun $data))
  ([group-by fun data] `(d_ply* ~(exprs-to-fns group-by) ~fun ~data)))

(defn split-ds [group-by-fns data]
  (let [cols (col-names data)
        group-by-fn (if (= 1 (count group-by-fns))
                      (first group-by-fns)
                      (apply juxt group-by-fns))]
    (loop [cur (:rows data)
           row-groups {}]
      (if (empty? cur)
        (for [[group rows] row-groups]
          [group (dataset cols rows)])
        (recur (next cur)
               (let [row (first cur)
                     k (group-by-fn row)
                     a (row-groups k)]
                 (assoc row-groups k (if a (conj a row) [row]))))))))
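
The split step can be tried out without loading Incanter. The following is a minimal sketch on plain Clojure data, not the library's code: it assumes rows are ordinary maps, and the names split-rows, sample-rows and groups are hypothetical. Unlike split-ds, it always builds its key with juxt, so every key is a vector even when only one key function is given.

```clojure
;; Sketch only: a vector of maps stands in for an Incanter dataset.
(defn split-rows
  "Group rows by the combined result of key-fns.
  Returns a seq of [key rows] pairs; each key is a vector."
  [key-fns rows]
  (let [key-fn (apply juxt key-fns)]
    (for [[k group] (group-by key-fn rows)]
      [k group])))

;; Hypothetical sample data.
(def sample-rows
  [{:Symbol "AAPL" :Volume 10}
   {:Symbol "GOOG" :Volume 20}
   {:Symbol "AAPL" :Volume 30}])

;; One [key rows] pair per distinct :Symbol value.
(def groups (split-rows [:Symbol] sample-rows))
```

Keywords work as key functions here for the same reason they do in split-ds: a keyword applied to a row map looks up that column.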
![Page 59: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/59.jpg)
Implementing ddply - Apply

(defn apply-ds [fun grouped-data]
  (for [[group split-data] grouped-data]
    [group (fun split-data)]))
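
As a rough illustration of the apply step on plain data, the sketch below maps a summary function over [key rows] pairs. It is an assumption-laden stand-in rather than the library's code: apply-groups, grouped, total-volume and applied are hypothetical names, and vectors of maps take the place of datasets.

```clojure
;; Sketch only: vectors of maps stand in for Incanter datasets.
(defn apply-groups
  "Call f on the rows of each group, keeping the group key."
  [f grouped]
  (for [[k rows] grouped]
    [k (f rows)]))

;; Hypothetical output of a split step: [key rows] pairs.
(def grouped
  [[["AAPL"] [{:Volume 10} {:Volume 30}]]
   [["GOOG"] [{:Volume 20}]]])

(defn total-volume
  "Collapse a group's rows to a single row holding the summed :Volume."
  [rows]
  [{:Volume (reduce + (map :Volume rows))}])

(def applied (apply-groups total-volume grouped))
```

Because for is lazy, f is not called until the result is consumed. That is why d_ply*, which exists only for side effects, wraps its pipeline in dorun.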
![Page 60: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/60.jpg)
Implementing ddply - Combine

(defn combine-ds [group-by grouped-data]
  (let [group-by (if (coll? group-by) group-by [group-by])
        group-by-filter (complement (set group-by))]
    (apply fast-conj-rows
           (for [[group data] grouped-data]
             (let [grouped-cols (zipmap group-by group)
                   union-cols (concat group-by (filter group-by-filter (col-names data)))]
               (dataset union-cols
                        (map #(merge % grouped-cols) (:rows data))))))))
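
The combine step can be sketched on plain data as well. This is a simplified stand-in under stated assumptions, not the library's implementation: combine-groups, per-group-results and combined are hypothetical names, rows are plain maps, and group keys are assumed to be vectors with one value per key column.

```clojure
;; Sketch only: vectors of maps stand in for Incanter datasets.
(defn combine-groups
  "Flatten [key rows] pairs into one vector of rows, adding the key
  values back under key-cols. Key values win over row values."
  [key-cols grouped]
  (vec (for [[k rows] grouped
             row rows]
         (merge row (zipmap key-cols k)))))

;; Hypothetical output of an apply step.
(def per-group-results
  [[["AAPL"] [{:Volume 40}]]
   [["GOOG"] [{:Volume 20}]]])

(def combined (combine-groups [:Symbol] per-group-results))
```

Merging the key columns last mirrors the precedence rule in the combine-ds docstring: when a column appears in both the key and the row, the key's value is kept.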
![Page 61: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/61.jpg)
Implementing ddply - Putting it all together(ns split-apply-combine.ply "Implementation of the split-apply-combine functions, similar to R's plyr library." (:use [incanter.core :only [$data col-names conj-rows dataset]]) (:require [split-apply-combine.core :as sac]))
(defn fast-conj-rows "A simple version of conj-rows that runs much faster" [& datasets] (when (seq datasets) (dataset (col-names (first datasets)) (mapcat :rows datasets))))
(defn expr-to-fn [expr] (let [row-param (gensym "row-") kw-map (sac/build-keyword-map expr)] `(fn [~row-param] (let [~@(apply concat (for [[kw sym] kw-map] [sym `(get ~row-param ~kw ~kw)]))] ~(sac/convert-keywords expr kw-map)))))
(defn exprs-to-fns [group-by] (if (coll? group-by) (vec (for [item group-by] (if (and (coll? item) (coll? (second item)) (not (#{'fn 'fn*} (first (second item))))) [(first item) (expr-to-fn (second item))] item))) group-by))
(defn split-ds "Perform a split operation on data, which must be a dataset, using the group-by-fns to choose bins. group-by-fns can either be a single function or a collection of functions. In the latter case, the results will be combined to create a key for the bin. Returns a map of the group-by-fns results to datasets including all the rows that had the given result.
Note that keyword column names are the most common functions to use for the group-by." [group-by-fns data] (let [cols (col-names data) group-by-fn (if (= 1 (count group-by-fns)) (first group-by-fns) (apply juxt group-by-fns))] (loop [cur (:rows data) row-groups {}] (if (empty? cur) (for [[group rows] row-groups] [group (dataset cols rows)]) (recur (next cur) (let [row (first cur) k (group-by-fn row) a (row-groups k)] (assoc row-groups k (if a (conj a row) [row]))))))))
(defn apply-ds "Apply fun to each group in grouped-data returning a sequence of pairs of the original group-keys and the result of applying the function the dataset. See split-ds for information on the grouped-data data structure." [fun grouped-data] (for [[group split-data] grouped-data] [group (fun split-data)]))
(defn combine-ds "Combine the datasets in grouped-data into a single dataset including the columns specified in the group-by argument as having the values found in the keys in the grouped data.
If there are columns that are in both the key and the dataset, the values in the key have precedence." [group-by grouped-data] (let [group-by (if (coll? group-by) group-by [group-by]) group-by-filter (complement (set group-by))] (apply fast-conj-rows (for [[group data] grouped-data] (let [grouped-cols (zipmap group-by group) union-cols (concat group-by (filter group-by-filter (col-names data)))] (dataset union-cols (map #(merge % grouped-cols) (:rows data))))))))
(defn ddply* "Split-apply-combine from datasets to datasets.
Splits data into a the group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and combines the result of that back into a single dataset.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword keyfn] where the function keyfun is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same columns names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Examples:
(ddply* :Symbol (transform :Change = (diff0 :Close)) stock-data)
(ddply* [[:Month #((juxt year month) (:timestamp %)]] (colwise :Volume sum) stock-data)"
([group-by fun] (ddply* group-by fun $data)) ([group-by fun data] (let [group-by (if (coll? group-by) group-by [group-by]) group-by (for [item group-by] (if (coll? item) item [item item]))] (->> data (split-ds (map second group-by)) (apply-ds fun) (combine-ds (map first group-by))))))
(defmacro ddply "Split-apply-combine from datasets to datasets. This macro is a wrapper on ddply* which provides translation of simple column-referencing expressions in the group-by argument.
Splits data into a the group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and combines the result of that back into a single dataset.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword key-expr] where the exression key-expr is tranformed to a function and in expr are expanded to accessors on rows. The resulting function is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same columns names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Examples:
(ddply :Symbol (transform :Change = (diff0 :Close)) stock-data)
(ddply [[:Month ((juxt year month) :timestamp]]] (colwise :Volume sum) stock-data)" ([group-by fun] `(ddply* ~(exprs-to-fns group-by) ~fun $data)) ([group-by fun data] `(ddply* ~(exprs-to-fns group-by) ~fun ~data)))
(defn d_ply* "Split-apply-combine from datasets to nothing. This version ignores the output of fun and is used for fun's side effects.
Splits data into a the group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets and then drops the result.
The group-by argument can be a keyword or collection of keywords which specify the columns to group by. It can also include pairs [keyword keyfn] where the function keyfun is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same columns names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Example:
(d_ply* :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
  ([group-by fun]
   (d_ply* group-by fun $data))
  ([group-by fun data]
   (let [group-by (if (coll? group-by) group-by [group-by])
         group-by (for [item group-by] (if (coll? item) item [item item]))]
     (dorun
      (->> data
           (split-ds (map second group-by))
           (apply-ds fun))))))
(defmacro d_ply
  "Split-apply-combine from datasets to nothing. This version ignores the output of fun and is used for fun's side effects. This macro is a wrapper on d_ply* which provides translation of simple column-referencing expressions in the group-by argument.
Splits data into a group of datasets as specified by the group-by argument, applies fun to each of the resulting datasets, and then drops the result.
The group-by argument can be a keyword or a collection of keywords that specify the columns to group by. It can also include pairs [keyword keyfn], where the function keyfn is applied to each row to generate the key for that row. When the groups are combined, keyword is used as the column name for the resulting column. The two types of group-by specifications can be mixed.
The result of the apply function can contain the same column names as the original dataset or different ones. It can contain the same number of rows as the original, a different number, or a single row.
If data is not specified, it defaults to the currently bound value of $data.
Example:
(d_ply :Symbol #(view (bar-chart :Date :Volume :data %)) stock-data)"
  ([group-by fun]
   `(d_ply* ~(exprs-to-fns group-by) ~fun $data))
  ([group-by fun data]
   `(d_ply* ~(exprs-to-fns group-by) ~fun ~data)))
(defn ddply*
  ([group-by fun]
   (ddply* group-by fun $data))
  ([group-by fun data]
   (let [group-by (if (coll? group-by) group-by [group-by])
         group-by (for [item group-by] (if (coll? item) item [item item]))]
     (->> data
          (split-ds (map second group-by))
          (apply-ds fun)
          (combine-ds (map first group-by))))))
(defmacro ddply
  ([group-by fun]
   `(ddply* ~(exprs-to-fns group-by) ~fun $data))
  ([group-by fun data]
   `(ddply* ~(exprs-to-fns group-by) ~fun ~data)))
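The shape of the ddply* pipeline (normalize the group-by spec, split, apply, combine) can be sketched in plain Clojure without Incanter. This is a rough sketch under stated assumptions, not the talk's implementation: a dataset is modeled as a sequence of row maps, `normalize-spec` and `sac` are hypothetical names, and the real split-ds, apply-ds and combine-ds are not shown in this section.

```clojure
(defn normalize-spec
  "Turns :a or [:a [:b f]] into a seq of [column-name key-fn] pairs,
  mirroring the two let bindings in ddply* above."
  [spec]
  (for [item (if (coll? spec) spec [spec])]
    (if (coll? item) item [item item])))

(defn sac
  "Hypothetical split-apply-combine over a seq of row maps.
  f receives the rows of one group and returns a seq of result rows.
  Assumption: the group key columns are added onto each result row."
  [spec f rows]
  (let [pairs   (normalize-spec spec)
        names   (map first pairs)
        key-fns (map second pairs)]
    (for [[k group] (group-by (apply juxt key-fns) rows) ; split
          result    (f group)]                           ; apply
      (merge result (zipmap names k)))))                 ; combine

;; Illustrative rows only, not data from the talk.
(def sample-rows
  [{:Symbol "AAPL" :Volume 10}
   {:Symbol "AAPL" :Volume 20}
   {:Symbol "GOOG" :Volume 5}])

(defn total-volume [group]
  [{:Volume (reduce + (map :Volume group))}])

;; Group order is not guaranteed, so compare results as a set.
(set (sac :Symbol total-volume sample-rows))
;; => #{{:Volume 30, :Symbol "AAPL"} {:Volume 5, :Symbol "GOOG"}}
```

Because the apply function returns a sequence of rows, it can yield one summary row per group, as here, or any other number of rows, which matches the flexibility described in the docstring.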
![Page 62: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/62.jpg)
Support functions - colwise
(ddply :Symbol (colwise :num stats/mean) tech-stocks)
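As a rough illustration of what a column-wise helper does, here is a plain-Clojure sketch over a sequence of row maps. `colwise-rows` and `mean` are hypothetical stand-ins, not the talk's colwise or Incanter's stats/mean, and the reading of `:num` as "every column whose values are all numbers" is an assumption based only on the example above.

```clojure
(defn mean
  "Arithmetic mean as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn colwise-rows
  "Hypothetical stand-in for colwise: returns a function that applies f to
  each selected column of a group and yields a single summary row.
  Assumption: :num selects every column whose values are all numbers."
  [cols f]
  (fn [rows]
    (let [all-cols (keys (first rows))
          selected (cond
                     (= cols :num) (filter (fn [c] (every? number? (map c rows)))
                                           all-cols)
                     (coll? cols)  cols
                     :else         [cols])]
      [(into {} (for [c selected] [c (f (map c rows))]))])))

;; Illustrative rows only.
(def one-group
  [{:Symbol "AAPL" :Close 1 :Volume 10}
   {:Symbol "AAPL" :Close 3 :Volume 30}])

((colwise-rows :num mean) one-group)
;; => [{:Close 2.0, :Volume 20.0}]
```

The non-numeric :Symbol column is skipped, which is why a helper like this pairs naturally with a grouping step that adds the key column back afterwards.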
![Page 63: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/63.jpg)
Support functions - transform
(ddply :Symbol
       (transform :Change = (diff0 :Close)
                  :Date =* (time-format/parse (time-format/formatters :year-month-day) :Date))
       tech-stocks)
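The first clause of the example derives a new column from a whole existing column. A plain-Clojure sketch of that step follows. Everything here is an assumption made for illustration: this `diff0` is a guess at the behaviour (first differences with 0 in the first position, so the column keeps its length), `add-column` is a hypothetical helper, and rows are modeled as maps.

```clojure
(defn diff0
  "Assumed behaviour: first differences of xs, with 0 in the first position
  so that the result has the same length as the input."
  [xs]
  (if (seq xs)
    (cons 0 (map - (rest xs) xs))
    ()))

(defn add-column
  "Hypothetical helper: attaches col-vals to rows, one value per row, under k."
  [rows k col-vals]
  (mapv #(assoc %1 k %2) rows col-vals))

;; Illustrative rows only.
(def closes [{:Close 10} {:Close 12} {:Close 11}])

(add-column closes :Change (diff0 (map :Close closes)))
;; => [{:Close 10, :Change 0} {:Close 12, :Change 2} {:Close 11, :Change -1}]
```

Running this inside each :Symbol group, rather than over the whole dataset, is what keeps the first row of one stock from being differenced against the last row of another.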
![Page 64: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/64.jpg)
A Case Study
![Page 65: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/65.jpg)
A Case Study
“SpaceCurve delivers instantaneous intelligence for location-based services, commodities, defense, emergency services and other markets. The company is developing Big Data solutions that continuously store and immediately analyze massive amounts of multidimensional data.”
Performance analysis of large-scale geospatial-temporal ingest and query on the SpaceCurve multidimensional DB
![Page 66: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/66.jpg)
Our Sample Problem
[Diagram: six server systems (10.0.1.101, 10.0.1.102, 10.0.1.107, 10.0.1.109, 10.0.1.111, 10.0.1.112), each with 24 cores labeled cpu00 through cpu23, connected through a switch (10GB/s/channel) to external clients]
‣ CPU load data
‣ 6 systems
‣ 24 cores each
‣ 6 data points
‣ 1 sample/second
‣ ~38 minutes run time
Total of ~2 million data points
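The total can be checked with a little arithmetic. The sketch below assumes that the 6 data points are recorded per core on every sample and that the run lasted exactly 38 minutes; the slide gives the duration only approximately.

```clojure
(def systems 6)
(def cores-per-system 24)
(def data-points-per-core 6)   ; assumption: per core, per sample
(def run-seconds (* 38 60))    ; ~38 minutes at 1 sample/second

(* systems cores-per-system data-points-per-core run-seconds)
;; => 1969920
```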
Small subset of the overall SpaceCurve analysis
![Page 67: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/67.jpg)
Time to see it work...
![Page 74: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/74.jpg)
Where to?
• A full library implementation of Split-Apply-Combine and helpers
• Add to Incanter?
• Performance optimizations (mutable intermediate results, column-oriented datasets)
• Implementation based on reducers and parallelism
• Explore the continuum from data exploration tools (R, Incanter) to large-scale data analysis (Hadoop, Cascalog, SpaceCurve, etc.)
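One way the reducers idea could look is sketched below, using `clojure.core.reducers/fold` for the split step. This is a speculative illustration of the direction, not something from the talk: `parallel-split` and `parallel-sac` are hypothetical names, rows are modeled as maps, and the rows should be held in a vector so that fold can partition the work (small inputs are simply reduced serially).

```clojure
(require '[clojure.core.reducers :as r])

(defn parallel-split
  "Groups rows by key-fn with r/fold. Each partition builds its own map of
  key -> vector of rows; partial maps are merged by concatenating vectors."
  [key-fn rows]
  (r/fold
   (r/monoid (partial merge-with into) hash-map)
   (fn [groups row] (update groups (key-fn row) (fnil conj []) row))
   rows))

(defn parallel-sac
  "Hypothetical split-apply-combine: split with fold, then apply f to each
  group and collect the results in a map keyed by group."
  [key-fn f rows]
  (reduce-kv (fn [m k group] (assoc m k (f group)))
             {}
             (parallel-split key-fn rows)))

;; Illustrative rows only.
(def volume-rows
  [{:Symbol "AAPL" :Volume 10}
   {:Symbol "GOOG" :Volume 5}
   {:Symbol "AAPL" :Volume 20}])

(parallel-sac :Symbol #(reduce + (map :Volume %)) volume-rows)
;; => {"AAPL" 30, "GOOG" 5}
```

Only the split runs through fold here; the apply step is a plain reduce-kv, so a fuller version would also need to decide how to parallelize the per-group work.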
![Page 75: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/75.jpg)
Discussion
![Page 76: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/76.jpg)
References
• Source for this presentation: https://www.github.com/tomfaulhaber/split-apply-combine
• The R Project: http://www.r-project.org
• The plyr home page: http://plyr.had.co.nz
• Hadley Wickham, The Split-Apply-Combine Strategy for Data Analysis, Journal of Statistical Software, April 2011, Volume 40, Issue 1
• Incanter project: http://incanter.org
• Eric Rochester, The Clojure Data Analysis Cookbook, Packt Publishing, 2013
• Bruce Durling, Quick and Dirty Data Science with Incanter, talk from EuroClojure 2012, http://confreaks.com/videos/2071-euroclojure2012-quick-and-dirty-data-science-with-incanter
• Spacecurve: http://www.spacecurve.com
Tom Faulhaber
twitter: @tomfaulhaber
github: tomfaulhaber
![Page 77: Implementing the Split-Apply-Combine model in Clojure and Incanter](https://reader034.vdocuments.site/reader034/viewer/2022051610/5495c015ac7959342e8b4f46/html5/thumbnails/77.jpg)
Photo Credits
• Florida Home - anoldent on flickr (http://www.flickr.com/photos/anoldent/2405722434/)
• Midland Coal Mine - jasonwoodhead23 on flickr (http://www.flickr.com/photos/woodhead/8522679843/)
• Paradise - Antti Simonen on flickr (http://www.flickr.com/photos/anttisimonen/6041095682/)
• Traders on the Exchange - thetaxhaven on flickr (http://www.flickr.com/photos/83532250@N06/7651028854)
• Louvre - dynamosquito on flickr (http://www.flickr.com/photos/25182210@N07/2802458437/)
• Construction - Aapo Haapanen on flickr (http://www.flickr.com/photos/decade_null/214247988/)
• Server farm - from the Spacecurve website (http://www.spacecurve.com)
• Sailboat race - Ryk Van Toronto on flickr (http://www.flickr.com/photos/sydandsaskia/394507351)
• Arguing Philosophers - David Schroeter on flickr (http://www.flickr.com/photos/53477785@N00/92134612/)