Pipelines for data analysis in R (files.meetup.com/1406240/pipelines.pdf)
![Page 1: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/1.jpg)
Hadley Wickham @hadleywickham Chief Scientist, RStudio
Pipelines for data analysis in R
October 2015
![Page 2: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/2.jpg)
Data analysis is the process by which data becomes understanding, knowledge and insight
![Page 3: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/3.jpg)
![Page 4: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/4.jpg)
Import
Tidy: Consistent way of storing data
Transform: Create new variables & new summaries
Visualise: Surprises, but doesn't scale
Model: Scales, but doesn't (fundamentally) surprise
![Page 5: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/5.jpg)
Import: readr, readxl, haven, DBI, httr
Tidy: tidyr
Transform: dplyr
Visualise: ggplot2, ggvis
Model: broom
![Page 6: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/6.jpg)
Pipelines
![Page 7: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/7.jpg)
Think it
Describe it (precisely)
Do it
Cognitive
Computational
![Page 8: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/8.jpg)
Cognition time ≫ Computation time
http://www.flickr.com/photos/mutsmuts/4695658106
![Page 9: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/9.jpg)
magrittr::%>%
Inspirations: unix, F#, haskell, clojure, method chaining
![Page 10: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/10.jpg)
foo_foo <- little_bunny()

bop_on(
  scoop_up(
    hop_through(foo_foo, forest),
    field_mouse
  ),
  head
)

# vs
foo_foo %>%
  hop_through(forest) %>%
  scoop_up(field_mouse) %>%
  bop_on(head)
![Page 11: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/11.jpg)
x %>% f(y) # f(x, y)
x %>% f(z, .) # f(z, x)
x %>% f(y) %>% g(z) # g(f(x, y), z)
# Turns function composition (hard to read)
# into sequence (easy to read)
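The three rewriting rules above can be checked directly. This is a minimal sketch, assuming the magrittr package is installed; `f` and `g` are hypothetical helpers invented for illustration that simply record how they were called.

```r
library(magrittr)

# Hypothetical helpers (not from the talk): each returns a string
# describing the call it received.
f <- function(a, b) paste0("f(", a, ", ", b, ")")
g <- function(a, b) paste0("g(", a, ", ", b, ")")

x <- "x"
y <- "y"
z <- "z"

r1 <- x %>% f(y)           # piped value becomes the first argument
r2 <- x %>% f(z, .)        # the dot places the piped value explicitly
r3 <- x %>% f(y) %>% g(z)  # a chain nests from left to right

print(c(r1, r2, r3))
```

Reading the printed strings next to the comments on the slide shows that the pipe only changes how the calls are written, not what is evaluated.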
![Page 12: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/12.jpg)
# Any function can use it. Only needs a simple
# property: the type of the first argument
# needs to be the same as the type of the result.

# tidyr: pipelines for messy -> tidy data
# dplyr: pipelines for data manipulation
# ggvis: pipelines for visualisations
# rvest: pipelines for html
# purrr: pipelines for lists
# xml2: pipelines for xml
# stringr: pipelines for strings
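The property described above (first argument and result share a type) is easy to satisfy in your own functions. The sketch below uses two hypothetical helpers, `add_ratio()` and `keep_top()`, which do not come from any package; both take a data frame and return a data frame, so they chain with `%>%`.

```r
library(magrittr)

# Hypothetical helper: data frame in, data frame out.
add_ratio <- function(df, num, den) {
  df$ratio <- df[[num]] / df[[den]]
  df
}

# Hypothetical helper: data frame in, data frame out.
keep_top <- function(df, n) {
  ordered <- df[order(-df$ratio), , drop = FALSE]
  ordered[seq_len(min(n, nrow(ordered))), , drop = FALSE]
}

# Because each step returns the type the next step expects,
# the steps read as a sequence.
top3 <- mtcars %>%
  add_ratio("hp", "wt") %>%
  keep_top(3)

print(top3[, c("hp", "wt", "ratio")])
```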
![Page 13: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/13.jpg)
Tidy
![Page 14: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/14.jpg)
![Page 15: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/15.jpg)
Tidy data = data that makes data analysis easy

| Storage | Meaning |
|---|---|
| Table / File | Data set |
| Rows | Observations |
| Columns | Variables |
![Page 16: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/16.jpg)
Source: local data frame [5,769 x 22]
    iso2  year   m04  m514  m014 m1524 m2534 m3544 m4554 m5564   m65    mu   f04  f514
   (chr) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int)
 1    AD  1989    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 2    AD  1990    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 3    AD  1991    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 4    AD  1992    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 5    AD  1993    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 6    AD  1994    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 7    AD  1996    NA    NA     0     0     0     4     1     0     0    NA    NA    NA
 8    AD  1997    NA    NA     0     0     1     2     2     1     6    NA    NA    NA
 9    AD  1998    NA    NA     0     0     0     1     0     0     0    NA    NA    NA
10    AD  1999    NA    NA     0     0     0     1     1     0     0    NA    NA    NA
11    AD  2000    NA    NA     0     0     1     0     0     0     0    NA    NA    NA
12    AD  2001    NA    NA     0    NA    NA     2     1    NA    NA    NA    NA    NA
13    AD  2002    NA    NA     0     0     0     1     0     0     0    NA    NA    NA
14    AD  2003    NA    NA     0     0     0     1     2     0     0    NA    NA    NA
15    AD  2004    NA    NA     0     0     0     1     1     0     0    NA    NA    NA
16    AD  2005     0     0     0     0     1     1     0     0     0     0     0     0
..   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
Variables not shown: f014 (int), f1524 (int), f2534 (int), f3544 (int), f4554 (int), f5564 (int), f65 (int), fu (int)
What are the variables in this dataset? (Hint: f = female, u = unknown, 1524 = 15-24)
![Page 17: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/17.jpg)
# To convert this messy data into tidy data
# we need two verbs. First we need to gather
# together all the columns that aren't variables
tb2 <- tb %>%
  gather(demo, n, -iso2, -year, na.rm = TRUE)
tb2
![Page 18: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/18.jpg)
# Then separate the demographic variable into
# sex and age
tb3 <- tb2 %>%
  separate(demo, c("sex", "age"), 1)
tb3

# Many tidyr verbs come in pairs:
# spread vs. gather
# extract/separate vs. unite
# nest vs. unnest
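The two steps above can be tried without the full `tb` table. The sketch below assumes tidyr and magrittr are installed and uses `tb_small`, a hypothetical five-column miniature built only for illustration; its values are made up and are not the TB data from the slides.

```r
library(tidyr)
library(magrittr)

# Hypothetical miniature of `tb`: two id columns plus three
# sex/age-band count columns. Values are invented.
tb_small <- data.frame(
  iso2  = c("AD", "AD"),
  year  = c(1996L, 1997L),
  m1524 = c(0L, 0L),
  m3544 = c(4L, 2L),
  f1524 = c(1L, NA),
  stringsAsFactors = FALSE
)

tidy_tb <- tb_small %>%
  # Stack every non-id column into key/value pairs, dropping NAs
  gather(demo, n, -iso2, -year, na.rm = TRUE) %>%
  # Split "m1524" after the first character into "m" and "1524"
  separate(demo, c("sex", "age"), sep = 1)

print(tidy_tb)
```

In current tidyr releases `gather()` is marked as superseded by `pivot_longer()`; it is kept here to match the slides.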
![Page 19: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/19.jpg)
Google for “tidyr” & “tidy data”
![Page 20: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/20.jpg)
Transform
![Page 21: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/21.jpg)
![Page 22: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/22.jpg)
Think it
Describe it (precisely)
Do it
Cognitive
Computational
![Page 23: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/23.jpg)
One table verbs
• select: subset variables by name
• filter: subset observations by value
• mutate: add new variables
• summarise: reduce to a single obs
• arrange: re-order the observations
+ group by
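As a rough illustration of how the verbs listed above combine, here is a sketch on R's built-in `mtcars` data (not the data used in the talk's demo). It assumes dplyr is installed; the `kpl` column and the 0.425 conversion factor are illustrative choices.

```r
library(dplyr)

by_cyl <- mtcars %>%
  select(mpg, cyl, wt) %>%            # subset variables by name
  filter(wt < 5) %>%                  # subset observations by value
  mutate(kpl = mpg * 0.425) %>%       # add a new variable (approx. km per litre)
  group_by(cyl) %>%                   # later verbs now work per group
  summarise(mean_kpl = mean(kpl), n = n()) %>%  # one row per group
  arrange(desc(mean_kpl))             # re-order the observations

print(by_cyl)
```

Without `group_by()`, `summarise()` would collapse the whole table to a single row; with it, each verb is applied within each value of `cyl`.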
![Page 24: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/24.jpg)
Demo
![Page 25: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/25.jpg)
Mutating: inner_join(), left_join(), right_join(), full_join()
Filtering: semi_join(), anti_join()
Set: intersect(), union(), setdiff()
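The difference between the three families is easiest to see on tiny tables. The sketch below assumes dplyr is installed; `bands` and `instruments` are hypothetical data frames made up for illustration.

```r
library(dplyr)

# Hypothetical example tables sharing a `name` key.
bands <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
instruments <- data.frame(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# Mutating joins add columns from the second table.
inner <- inner_join(bands, instruments, by = "name")  # keys in both
left  <- left_join(bands, instruments, by = "name")   # all rows of bands
right <- right_join(bands, instruments, by = "name")  # all rows of instruments
full  <- full_join(bands, instruments, by = "name")   # all keys from either

# Filtering joins keep or drop rows of the first table only.
semi <- semi_join(bands, instruments, by = "name")    # rows with a match
anti <- anti_join(bands, instruments, by = "name")    # rows without a match

# Set operations treat whole rows as elements; columns must match.
first_two <- head(bands, 2)
last_two  <- tail(bands, 2)
both   <- intersect(first_two, last_two)
either <- union(first_two, last_two)
only_first <- setdiff(first_two, last_two)

print(full)
```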
![Page 26: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/26.jpg)
dplyr sources
• Local data frame (C++)
• Local data table
• Local data cube (experimental)
• RDBMS: Postgres, MySQL, SQLite, Oracle, MS SQL, JDBC, Impala
• MonetDB, BigQuery
![Page 27: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/27.jpg)
Google for “dplyr”
![Page 28: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/28.jpg)
Visualise
![Page 29: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/29.jpg)
![Page 30: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/30.jpg)
What is ggvis?
• A grammar of graphics (like ggplot2)
• Reactive (interactive & dynamic) (like shiny)
• A pipeline (a la dplyr)
• Of the web (drawn with vega)
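To make the "pipeline" point concrete, here is a minimal sketch of a ggvis plot built with `%>%`. It assumes the ggvis package is installed and uses the built-in `mtcars` data rather than anything from the talk's demo files. The plot object is only constructed here; printing it would open it in a viewer or browser.

```r
library(ggvis)

# Start from a data frame, declare the visual properties,
# then pipe into a layer, in the same style as a dplyr chain.
p <- mtcars %>%
  ggvis(x = ~wt, y = ~mpg) %>%
  layer_points()

# print(p)  # uncomment in an interactive session to render the plot
```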
![Page 31: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/31.jpg)
Demo
4-ggvis.R 4-ggvis.Rmd
![Page 32: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/32.jpg)
Google for “ggvis”
![Page 33: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/33.jpg)
Model
with broom, by David Robinson
![Page 34: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/34.jpg)
![Page 35: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/35.jpg)
[Figure: log(sales) vs. date, 1990 to 2015]
46 TX cities, ~25 years of data
What makes it hard to see the long term trend?
![Page 36: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/36.jpg)
# Models are useful as a tool for removing
# known patterns
tx <- tx %>%
  group_by(city) %>%
  mutate(
    resid = lm(
      log(sales) ~ factor(month),
      na.action = na.exclude
    ) %>% resid()
  )
![Page 37: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/37.jpg)
[Figure: resid vs. date, 1990 to 2015]
![Page 38: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/38.jpg)
# Models are also useful in their own right
models <- tx %>%
  group_by(city) %>%
  do(mod = lm(
    log(sales) ~ factor(month),
    data = .,
    na.action = na.exclude
  ))
![Page 39: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/39.jpg)
Model summaries
• Model level: one row per model
• Coefficient level: one row per coefficient (per model)
• Observation level: one row per observation (per model)
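The three levels above correspond to three broom functions: `glance()`, `tidy()` and `augment()`. The slides do not name them, so treat the mapping as this sketch's reading of the broom package. The example fits a single model on the built-in `mtcars` data (an illustrative choice, not the Texas housing data) and assumes broom is installed.

```r
library(broom)

mod <- lm(mpg ~ wt, data = mtcars)

model_level <- glance(mod)   # one row for the model (fit statistics)
coef_level  <- tidy(mod)     # one row per coefficient
obs_level   <- augment(mod)  # one row per observation (fitted values, residuals)

print(dim(model_level))
print(dim(coef_level))
print(dim(obs_level))
```

Because every result is a data frame, the summaries can be fed straight back into dplyr or a plotting pipeline.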
![Page 40: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/40.jpg)
Demo
5-broom.R
![Page 41: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/41.jpg)
Google for “broom r”
![Page 42: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/42.jpg)
Big data and R
![Page 43: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/43.jpg)
Big Can’t fit in memory on one computer: >5 TB
Medium Fits in memory on a server: 10 GB-5 TB
Small Fits in memory on a laptop: <10 GB
R is great at this!
![Page 44: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/44.jpg)
R
• R provides an excellent environment for rapid interactive exploration of small data.
• There is no technical reason why it can’t also work well with medium size data. (But the work mostly hasn’t been done)
• What about big data?
![Page 45: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/45.jpg)
1. Can be reduced to a small data problem with subsetting/sampling/summarising (90%)
2. Can be reduced to a very large number of small data problems (9%)
3. Is irreducibly big (1%)
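The first two cases can be sketched in a few lines. The example below is only an illustration under stated assumptions: `big` is a small, made-up stand-in for a table that would really live in a database or on a cluster, and `slice_sample()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up stand-in for a large table.
big <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 2500),
  value = seq_len(10000),
  stringsAsFactors = FALSE
)

# Case 1: reduce to a small data problem.
subset_a <- big %>% filter(group == "a")        # subsetting
set.seed(1)
sampled <- big %>% slice_sample(n = 100)        # sampling
summary_tbl <- big %>%                          # summarising
  group_by(group) %>%
  summarise(mean_value = mean(value), n = n())

# Case 2: many independent small problems, one per group.
# Each piece could be sent to a separate worker.
pieces <- split(big, big$group)
per_group <- lapply(pieces, function(d) mean(d$value))

print(summary_tbl)
```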
![Page 46: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/46.jpg)
The right small data
• Rapid iteration essential
• dplyr supports this activity by avoiding cognitive costs of switching between languages.
![Page 47: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/47.jpg)
Lots of small problems
• Embarrassingly parallel (e.g. Hadoop)
• R wrappers like foreach, rhipe, rhadoop
• Challenge is matching architecture of computing to data storage
![Page 48: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/48.jpg)
Irreducibly big
• Computation must be performed by a specialised system.
• Typically C/C++, Fortran, Scala.
• R needs to be able to talk to those systems.
![Page 49: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/49.jpg)
Future work
![Page 50: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/50.jpg)
End game
Provide a fluent interface where you spend your mental energy on the specific data problem, not the general data analysis process.
The best tools become invisible with time!
Still a lot of work to do, especially on the connection between modelling and visualisation.
![Page 51: Pipelines for data analysis in Rfiles.meetup.com/1406240/pipelines.pdf · Consistent way Create new variables & new summaries of storing data. Transform Visualise Model tidyr dplyr](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eaa9a60e92e42cf0c3301/html5/thumbnails/51.jpg)