df1 - r - natekin - improving daily analysis with data.table

Alex Natekin

Upload: moscowdatafest

Post on 16-Jan-2017


Page 1: DF1 - R - Natekin - Improving Daily Analysis with data.table

Improving daily analysis with data.table

a [brief] tutorial

Alex Natekin, Deloitte Analytics Institute

Page 2

Been there, done that

[email protected]
/natekin
linkedin.com/in/natekin
facebook.com/alex.natekin

Page 3

data.table

Page 4

Legend says

“The R god of number crunching”… and many others

Page 5

Legend says (2)

With great poweR of fasteR & richeR data crunching comes great Responsibility… to read the manual

Page 6

Choose your side

dplyr / sqldf / data.table

“Hadleyverse” Way of the warrior…

…each one is way different from data.frame

Page 7

Choose your side… wisely

from Matt Dowle’s recent meetup presentations

Page 8

Choose your side… wisely (2)

from Matt Dowle’s recent meetup presentations

…just search for “data.table benchmarks”

Page 9

data.table applicability

Solution:

Data extraction & checks → Data processing → Feature engineering → Models → Stories

…trying to find your place under the sun

Page 10

data.table applicability

Solution:

Data extraction & checks → Data processing → Feature engineering → Models → Stories

Naïve functionality

Most awesome functionality

Is closest to production code

(if applicable to R)

Page 11

Core functionality

1. Data reading & memory management

2. Data access & ordering

3. Grouping & aggregation

…feature engineering

More efficient:
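Items 2 and 3 of the list above (data access & ordering, grouping & aggregation) can be illustrated with a minimal sketch. The table, its columns `customer` and `amount`, and all values are hypothetical toy data, not taken from the talk; `order()` inside `[ ]`, `setorder()`, `.N` and `keyby` are standard data.table features.

```r
library(data.table)

# Hypothetical toy data: one row per customer event (illustrative only)
DT <- data.table(
  customer = c("a", "b", "a", "c", "b", "a"),
  amount   = c(10, 5, 7, 3, 8, 2)
)

# Data access & ordering: order() inside [ ] returns a reordered copy ...
by_amount <- DT[order(-amount)]

# ... while setorder() reorders DT itself by reference (no copy is made)
setorder(DT, customer, -amount)

# Grouping & aggregation: event count and total amount per customer;
# keyby = groups, sorts the result by customer and sets it as the key
per_customer <- DT[, .(n = .N, total = sum(amount)), keyby = customer]
print(per_customer)
```

Using `setorder()` instead of `DT <- DT[order(...)]` avoids allocating a second copy of the table, which is the "memory management" side of the same idea.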

Page 12

Core functionality (2)

1. Data reading & memory management

2. Data access & ordering

3. Grouping & aggregation

…feature engineering

More efficient:

…as data.frame extension (~100% compatible)

1. Reduce machine time

2. Reduce human programming time
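A minimal sketch of what "data.frame extension" means in practice, assuming a hypothetical toy `data.frame` called `df`: the converted object keeps `data.frame` as a parent class, so base R functions written for data frames still accept it. Differences from plain data.frame behaviour show up mostly inside `[ ]`, where data.table evaluates `j` as an expression over the columns.

```r
library(data.table)

# Hypothetical toy data frame (illustrative values)
df <- data.frame(id = 1:4, value = c(2.5, 4.0, 1.5, 3.0))

# Convert to a data.table (setDT(df) would convert in place, by reference)
DT <- as.data.table(df)

# The data.table class sits on top of data.frame ...
print(class(DT))
print(is.data.frame(DT))

# ... so base R functions written for data frames keep working
print(nrow(DT))
print(mean(DT$value))
print(summary(DT))
```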

Page 13

Core principle

DT[i, j, by]

1. Take DT
2. Subset rows by i
3. Calculate j
4. …grouped by by
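A minimal sketch of the `DT[i, j, by]` reading above, using a hypothetical sales table (`region`, `month` and `sales` are illustrative names, not from the talk). The SQL comparison in the comments follows the one used in the data.table introductory vignette.

```r
library(data.table)

# Hypothetical transactions table (illustrative data)
DT <- data.table(
  region = c("north", "south", "north", "south", "north"),
  month  = c(1, 1, 2, 2, 2),
  sales  = c(100, 80, 120, 90, 60)
)

# Take DT, subset rows by i, calculate j, grouped by by.
# Roughly: i ~ SQL WHERE, j ~ SELECT, by ~ GROUP BY
res <- DT[month == 2,               # i:  keep only month-2 rows
          .(total = sum(sales)),    # j:  compute total sales ...
          by = region]              # by: ... per region
print(res)
```

Groups appear in the order they are first seen in the subset (`north`, then `south`); use `keyby =` instead of `by =` if a sorted result is wanted.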

Page 14

Core principle (2)

from the data.table tutorial

Page 15

Example: churn

Sorry!

Laptop died last evening, so no interactive tutorial.

Screenshots are from the remaining files.

Page 16

Example

Page 17

Example

Page 18

Example

Page 19

Example (manual injection)

setkey(DT, colA, colB)

from yet another of Matt Dowle’s recent meetup presentations
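A minimal sketch of `setkey()`, reusing the slide's placeholder column names `colA` and `colB` on a hypothetical toy table (the `value` column and all data are illustrative). `setkey()` sorts the table by reference and records the key; subsetting with `.()` then matches the given values against the key columns in order, using binary search instead of scanning every row.

```r
library(data.table)

# Hypothetical toy table using the slide's placeholder names colA, colB
DT <- data.table(
  colA  = c("x", "y", "x", "y", "x"),
  colB  = c(2L, 1L, 1L, 2L, 2L),
  value = 1:5
)

# Sort DT by colA, then colB, in place, and mark both columns as the key
setkey(DT, colA, colB)
print(key(DT))

# Keyed lookup: values in .() are matched to colA and colB respectively
hits <- DT[.("x", 2L)]
print(hits)
```

Newer data.table code often uses ad hoc joins such as `DT[.("x", 2L), on = .(colA, colB)]` instead; a key pays off when the same columns are used for many lookups.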

Page 20

Example

Page 21

Example

Page 22

Example

Page 23

Example

Page 24

Example

Page 25

Example

Page 26

Example: churn

Page 27

Example: churn

Page 28

Example: churn

Page 29

Example: churn

Page 30

Example: churn

Page 31

Functionality: more

1. fread

2. Column updates

3. Set functions (set, setnames, …)

4. Special symbols (.SD, .I, …)

5. Joins

… next time
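The five items above can be sketched in one short script. Everything here is illustrative: the inline CSV text, the columns `id`, `group` and `x`, the new column names, and the lookup table `labels` are hypothetical; `fread()`, `:=`, `setnames()`, `set()`, `.SD`, `.SDcols`, `.I`, `.N` and joins via `on =` are existing data.table features.

```r
library(data.table)

# 1. fread: fast reader with automatic separator/type detection.
#    Inline text here; normally you would pass a file path instead.
DT <- fread(text = "id,group,x\n1,a,10\n2,b,20\n3,a,30\n4,b,40")

# 2. Column updates with := modify DT by reference (no table copy)
DT[, x2 := x * 2]                 # add a new (double) column
DT[group == "a", x := x + 1L]     # update only the rows selected by i

# 3. set* functions also work by reference
setnames(DT, "x2", "x_doubled")               # rename a column
set(DT, i = 1L, j = "x_doubled", value = 0)   # fast single-cell assignment

# 4. Special symbols: .SD = the group's subset of columns (chosen with
#    .SDcols), .I = row numbers in DT, .N = number of rows in the group
means    <- DT[, lapply(.SD, mean), by = group, .SDcols = c("x", "x_doubled")]
last_row <- DT[, .I[.N], by = group]   # row index of each group's last row

# 5. Joins: look up a label for every row of DT via the shared column
labels <- data.table(group = c("a", "b"), label = c("Alpha", "Beta"))
joined <- labels[DT, on = "group"]
print(joined)
```

`:=`, `setnames()` and `set()` change `DT` itself instead of returning a modified copy, so on large tables they avoid the repeated copying that typical data.frame code incurs.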

Page 32

More: resources

Page 33

SummaRy

1. data.table is helpful & awesome

2. go forth and use it

3. RTFM

Page 34

Thanks!

Alex Natekin
[email protected]
+7 915 070 45 74