Data Visualization with R (II)
Dr. Jieh-Shan George YEH
jsyeh@pu.edu.tw
Outlines
• Data Visualization with R
• Visualizing Different Types of Data
– Univariate
– Univariate Categorical
– Bivariate Categorical
– Bivariate Continuous vs Categorical
– Bivariate Continuous vs Continuous
– Bivariate: Continuous vs Time
Data Visualization with R
• Both anecdotally and per Google Trends, R is the language and tool most closely associated with creating data visualizations.
– http://www.google.com/trends/explore?hl=en-US#q=R%20language,%20Data%20Visualization,%20D3.js,%20Processing.js&cmpt=q
Google Trends on R & Data Visualization
GRAPH FOR DATA MINING
Hierarchical Clustering
• hc <- hclust(dist(mtcars))
• plot(hc)
• rect.hclust(hc, k=4)
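The dendrogram shows the four groups visually. To see which car falls into which group, a minimal sketch using base R's cutree() (not part of the original slide) could look like this. It assumes the same default settings as the slide's hclust() call and the same k = 4.

```r
# Hierarchical clustering of the built-in mtcars data (base R only).
# Defaults: Euclidean distances from dist(), complete linkage in hclust().
hc <- hclust(dist(mtcars))

# Cut the tree into 4 groups, the same k used by rect.hclust(hc, k = 4).
groups <- cutree(hc, k = 4)

# Number of cars per cluster, and the cars assigned to cluster 1.
print(table(groups))
print(names(groups)[groups == 1])
```

cutree() returns a named integer vector, so the cluster labels can be attached back to the data frame if needed.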
Decision Tree
require(rpart)
require(rpart.plot)
rp1 <- rpart(factor(cyl) ~ mpg, data=mtcars)
prp(rp1)
OTHERS
Financial Time Series
Quantitative Financial Modeling Framework
• require(quantmod)
• getSymbols("YHOO", src="google") # from Google Finance
• getSymbols("YHOO", from="2014-01-01")
• chartSeries(YHOO)
• barChart(YHOO)
• candleChart(YHOO, multi.col=TRUE, theme="white")
• chartSeries(to.weekly(YHOO), up.col='white', dn.col='blue')
GGPLOT2
ggplot2
• The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots.
• Originally based on Leland Wilkinson's The Grammar of Graphics, ggplot2 allows you to create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner.
• Grouping can be represented by color, symbol, size, and transparency. The creation of trellis plots (i.e., conditioning) is relatively simple.
• qplot() (for quick plot) hides much of this complexity when creating standard graphs.
qplot()
• The qplot() function can be used to create the most common graph types. While it does not expose ggplot2's full power, it can create a very wide range of useful plots. The format is:
qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)
Notes:
• At present, ggplot2 cannot be used to create 3D graphs or mosaic plots.
• Use I(value) to indicate a specific value. For example, size=z makes the size of the plotted points or lines proportional to the values of a variable z. In contrast, size=I(3) sets every point or line to the fixed size 3.
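The difference between mapping a variable and setting a fixed value can be sketched with ggplot() as well. The example below is an illustration, not taken from the slides. It assumes the built-in mtcars data, with wt standing in for the variable z. Placing size inside aes() maps it to data, while placing it outside aes() plays the role of I(3) in qplot().

```r
library(ggplot2)

# Mapping: size sits inside aes(), so point size varies with wt
# and a size legend is drawn.
p_mapped <- ggplot(mtcars, aes(x = hp, y = mpg, size = wt)) +
  geom_point()

# Setting: size sits outside aes(), so every point gets the constant
# size 3 and no legend is drawn (the counterpart of size = I(3)).
p_fixed <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(size = 3)

print(p_mapped)
print(p_fixed)
```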
Customizing ggplot2 Graphs
• Unlike base R graphs, ggplot2 graphs are not affected by many of the options set in the par() function.
• They can be modified using the theme() function, and by adding graphic parameters within the qplot() function.
• For greater control, use ggplot() and other functions provided by the package.
• ggplot2 functions can be chained with "+" signs to generate the final plot.
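As a rough illustration of chaining with "+", the sketch below builds a boxplot layer by layer and then adjusts its appearance with theme(). The variable choices, labels, and theme settings are assumptions made for this example and do not come from the slides.

```r
library(ggplot2)

# Start from the data and aesthetic mappings, then add one piece per "+".
p <- ggplot(mtcars, aes(x = factor(gear), y = mpg, fill = factor(gear))) +
  geom_boxplot() +                                  # the geometric layer
  labs(x = "Number of Gears", y = "Miles per Gallon",
       fill = "Gears", title = "Mileage by Gear Number") +
  theme_bw() +                                      # a complete built-in theme
  theme(legend.position = "bottom",                 # fine-tune single elements
        plot.title = element_text(face = "bold"))

print(p)
```

Because each "+" returns a plot object, intermediate results can be stored and extended later, for example p + coord_flip().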
Example
# ggplot2 examples
library(ggplot2)

# create factors with value labels
mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5),
   labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am, levels=c(0,1),
   labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8),
   labels=c("4cyl","6cyl","8cyl"))
# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
   main="Distribution of Gas Mileage", xlab="Miles Per Gallon",
   ylab="Density")
# Scatterplot of mpg vs. hp for each combination of gears and cylinders
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data=mtcars, shape=am, color=am,
   facets=gear~cyl, size=I(3),
   xlab="Horsepower", ylab="Miles per Gallon")
# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),
   method="lm", formula=y~x, color=cyl,
   xlab="Weight", ylab="Miles per Gallon",
   main="Regression of MPG on Weight")
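The same figure can also be sketched with ggplot() directly, which may help if the installed ggplot2 version does not pass method= and formula= through qplot(). This is an illustrative equivalent rather than the slide's own code. It uses factor(cyl) inside aes() so that it runs on an unmodified mtcars.

```r
library(ggplot2)

# One point layer plus one linear-regression layer; because color is
# mapped to cylinder count, geom_smooth() fits a separate line per group.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight", y = "Miles per Gallon", color = "Cylinders",
       title = "Regression of MPG on Weight")

print(p)
```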
# Boxplots of mpg by number of gears
# observations (points) are overlaid and jittered
qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),
   fill=gear, main="Mileage by Gear Number",
   xlab="", ylab="Miles per Gallon")
• To learn more, see the ggplot2 reference site
– http://docs.ggplot2.org/current/index.html