session 5 - bootstrappers.umassmed.edu
TRANSCRIPT
Session 5
Nick Hathaway; [email protected]
Contents
Adding Text To Plots
  Line graph
  Bar graph
Dividing Data into Quantiles
Part 1 Exercises
RMarkdown
  R Code Chunks
Adding Text To Plots
Line graph
Reading and processing data
library(tidyverse)
# read the wide table and reshape it into long format
# (X1 is the name older readr versions give an unnamed first column; newer versions call it ...1)
ts_longFormat = read_tsv("time.series.data.txt") %>%
  rename(gene = X1) %>%
  gather(Condition, expression, 2:ncol(.)) %>%
  separate(Condition, c("exposure", "time")) %>%
  mutate(time = as.numeric(gsub("h", "", time)))
# you can also use the %in% operator that R offers
ts_longFormat_SOD2_CD74 = ts_longFormat %>%
  filter(gene %in% c("SOD2", "CD74"))
# create a grouping variable to make plotting easier
ts_longFormat_SOD2_CD74 = ts_longFormat_SOD2_CD74 %>%
  mutate(grouping = paste0(gene, "-", exposure))
Here is a plot from the last Session
# using group = grouping to separate out the different genes and the exposure but still color by exposure
geneLinetypes = c("dotted", "solid")
names(geneLinetypes) = c("CD74", "SOD2")
# make the points larger, the value given to size is a relative number
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, color = exposure, group = grouping)) +
  geom_point(aes(shape = gene), size = 3) +
  geom_line(aes(linetype = gene)) +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 3)) +
  scale_linetype_manual(values = geneLinetypes)
[Figure: line graph of expression against time for CD74 and SOD2; color shows exposure (Ctrl, Ifnb, Lps, R848), point shape and line type show gene]
Now imagine we want to add text to each line so we have a label for what each line represents. This can be accomplished in several ways; one way is to first create a data frame that holds, for each value of the grouping variable, the data point at its maximum time point.
ts_longFormat_SOD2_CD74_summary = ts_longFormat_SOD2_CD74 %>%
  filter("Ctrl" != exposure) %>%
  group_by(grouping) %>%
  mutate(maxTime = max(time)) %>%
  filter(time == maxTime)
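To make the "last point per group" step easier to follow, here is a minimal sketch on a small made-up table (the gene names and numbers are invented for illustration, not taken from the time series data). It keeps only the row with the largest time within each group.

```r
library(dplyr)

# toy data (made up for illustration): two lines, three time points each
toy = tibble(
  grouping   = rep(c("geneA-Lps", "geneB-Lps"), each = 3),
  time       = rep(c(0, 12, 24), times = 2),
  expression = c(10, 20, 30, 5, 15, 25)
)

# keep only the row at the largest time point within each group
toy_last = toy %>%
  group_by(grouping) %>%
  filter(time == max(time)) %>%
  ungroup()

toy_last
```

The summary data frame above gets the same result in two steps, by first storing the maximum in a maxTime column and then filtering on it.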
Once we have the summary data frame, we can use it to add a label at the furthest time point, which is at the end of each line. This works because any geom_[LAYER] layer can be given its own data frame to base its layout on by setting data=.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, color = exposure, group = grouping)) +
  geom_point(aes(shape = gene), size = 3) +
  geom_line(aes(linetype = gene)) +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 3)) +
  scale_linetype_manual(values = geneLinetypes) +
  geom_text(aes(label = grouping), data = ts_longFormat_SOD2_CD74_summary)
[Figure: the same line graph with a text label (CD74-Lps, SOD2-Lps, CD74-R848, SOD2-R848, CD74-Ifnb, SOD2-Ifnb) centered on the last point of each non-Ctrl line]
Here we used the geom_text layer, which adds text. It needs an x and a y (which were set in the aes of the top ggplot call) and a label variable for the text it is going to add to the plot. Notice how the text is centered on the last point, so we can’t see it very well. To change the text alignment we use hjust=: 0 = start at the x,y coordinates, 0.5 (default) = center on the x,y coordinates, and 1 = end at the x,y coordinates.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, color = exposure, group = grouping)) +
  geom_point(aes(shape = gene), size = 3) +
  geom_line(aes(linetype = gene)) +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 3)) +
  scale_linetype_manual(values = geneLinetypes) +
  geom_text(aes(label = grouping), hjust = 0, data = ts_longFormat_SOD2_CD74_summary)
[Figure: the same line graph with each label now starting at the last point of its line instead of being centered on it]
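To see the three hjust settings side by side, here is a small sketch on made-up data. It draws three points at the same x position and labels each one with a different hjust value; the data frame and its column names are invented for this illustration.

```r
library(ggplot2)

# toy data (made up): one anchor point per row, each labelled with a different hjust
hjust_demo = data.frame(
  x     = c(1, 1, 1),
  y     = c(3, 2, 1),
  hjust = c(0, 0.5, 1),
  label = c("hjust = 0", "hjust = 0.5", "hjust = 1")
)

p_hjust = ggplot(hjust_demo, aes(x = x, y = y)) +
  geom_point(color = "red") +
  # hjust can also be mapped as an aesthetic, so each label uses its own value
  geom_text(aes(label = label, hjust = hjust)) +
  xlim(0, 2)

p_hjust
```

The red point marks the x,y coordinates, so it is easy to see whether the text starts at, is centered on, or ends at the point.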
So now the text starts at the point, but it still overlaps the point, so let’s nudge it a little to the right by using nudge_x= to move it over by 1 x-axis unit.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, color = exposure, group = grouping)) +
  geom_point(aes(shape = gene), size = 3) +
  geom_line(aes(linetype = gene)) +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 3)) +
  scale_linetype_manual(values = geneLinetypes) +
  geom_text(aes(label = grouping), nudge_x = 1, hjust = 0, data = ts_longFormat_SOD2_CD74_summary)
[Figure: the same line graph with the labels nudged to the right of the last points; the labels run past the right edge of the plot]
Unfortunately, ggplot doesn’t take the width of the text into account when determining the axis limits, so we have to change the limits ourselves to be able to see the text.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, color = exposure, group = grouping)) +
  geom_point(aes(shape = gene), size = 3) +
  geom_line(aes(linetype = gene)) +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 3)) +
  scale_linetype_manual(values = geneLinetypes) +
  geom_text(aes(label = grouping), nudge_x = 1, hjust = 0, data = ts_longFormat_SOD2_CD74_summary) +
  xlim(0, 30)
[Figure: the same line graph with the x-axis extended to 30 so the labels at the end of each line are fully visible]
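An alternative to hard-coding xlim() is to ask the x scale for extra padding on the right. The sketch below uses made-up data and the expansion() helper (available in ggplot2 3.3.0 and later); the 30% padding is an arbitrary choice for this example and would need tuning for longer labels.

```r
library(ggplot2)

# toy data (made up): one short line with a label at its last point
toy_line = data.frame(time = c(0, 12, 24), expression = c(10, 20, 30))
toy_last = toy_line[toy_line$time == max(toy_line$time), ]
toy_last$label = "geneA-Lps"

p_expand = ggplot(toy_line, aes(x = time, y = expression)) +
  geom_line() +
  geom_text(aes(label = label), data = toy_last, hjust = 0, nudge_x = 1) +
  # add 5% padding on the left and 30% on the right of the x-axis
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.3)))

p_expand
```

Unlike xlim(), this keeps working if the time range of the data changes, because the padding is relative to the data range.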
Bar graph
Now let’s try adding text to a bar plot. Let’s define the time points as factors so we don’t have so much space between the bars.
ts_longFormat_SOD2_CD74 = read_tsv("time.series.data.txt") %>%
  rename(gene = X1) %>%
  gather(Condition, expression, 2:ncol(.)) %>%
  separate(Condition, c("exposure", "time")) %>%
  filter(gene %in% c("SOD2", "CD74")) %>%
  mutate(time = factor(time, levels = c("0h", "1h", "2h", "4h", "6h", "12h", "24h")))
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2")
[Figure: dodged bar plot of expression by time (0h to 24h), filled by gene (CD74, SOD2)]
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(label = expression))
[Figure: the same bar plot with every expression value printed at full precision (e.g. 12037.00875, 209.537083333333), so the labels overlap each other]
These labels have far too many digits, so let’s change them to show only 3 significant figures by using the signif() function.
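As a quick check of what signif() does before using it in the plot, here it is applied to three of the values from the labels above. round() is shown next to it only for contrast, since it counts decimal places rather than significant figures.

```r
# three of the expression values shown on the plot above
values = c(12037.00875, 209.537083333333, 1310.058)

# signif() keeps the given number of significant digits
signif(values, 3)
# 12000 210 1310

# round() instead keeps a given number of decimal places
round(values, 1)
```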
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(label = signif(expression, 3)))
[Figure: the same bar plot with labels shortened to 3 significant figures (e.g. 12000, 210), still scattered over the bars]
Also look how the numbers are all over the place; what’s happening? Well, we still haven’t taken into account the different exposures. We could handle this in a couple of ways, but let’s take advantage of the facet_wrap function in ggplot. By using the ~ symbol we tell facet_wrap which columns to use to create separate panels.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(label = signif(expression, 3))) +
  facet_wrap(~exposure)
[Figure: bar plot split into four panels (Ctrl, Ifnb, Lps, R848) that share the same x and y axes; filled by gene (CD74, SOD2), with a label for every bar]
Notice how the limits for each axis are the same across all panels. We can change this by setting scales= to "free" (different limits for each panel), "free_x" (different for just x), or "free_y" (different for just y).
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(label = signif(expression, 3)), position = "dodge") +
  facet_wrap(~exposure, scales = "free_x")
[Figure: the four-panel bar plot (Ctrl, Ifnb, Lps, R848) where each panel now has its own x-axis and only shows the time points it has data for]
Because the bar plot is dodged, we have to dodge the geom_text as well, and that is done with the position_dodge() function rather than just "dodge". The width of 0.9 matches the default width of the bars.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene, group = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(label = signif(expression, 3)), position = position_dodge(width = 0.9)) +
  facet_wrap(~exposure, scales = "free_x")
[Figure: the four-panel bar plot with each label dodged so that it sits at the top of its own bar]
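The dodging idea can be reduced to a very small example. The sketch below uses made-up data and gives the bars and the text the same position_dodge(width = 0.9), so each label lines up with its own bar; geom_col() is used here as a shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# toy data (made up): two genes measured at two time points
toy_bars = data.frame(
  time       = factor(c("0h", "0h", "1h", "1h")),
  gene       = c("geneA", "geneB", "geneA", "geneB"),
  expression = c(10, 4, 12, 7)
)

p_dodge = ggplot(toy_bars, aes(x = time, y = expression, fill = gene, group = gene)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # same dodge width as the bars, so each label sits over its own bar
  geom_text(aes(label = expression), position = position_dodge(width = 0.9), vjust = -0.5)

p_dodge
```

If the two widths differ, the labels drift away from the centers of the bars, which is a quick way to see what the width argument controls.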
Also, let’s raise the labels a bit above the bars by adding 1000 to the y value used for the text.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(y = expression + 1000, label = signif(expression, 3)), position = position_dodge(width = 0.9)) +
  facet_wrap(~exposure, scales = "free_x")
[Figure: the four-panel bar plot with the labels raised above the bars; the y-axis now extends to 20000]
Also let’s angle the text by setting angle = 45 to put the text at a slant to fit a bit better
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(y = expression + 1000, label = signif(expression, 3)), angle = 45, position = position_dodge(width = 0.9)) +
  facet_wrap(~exposure, scales = "free_x")
[Figure: the four-panel bar plot with the labels above the bars drawn at a 45 degree angle]
We can also make a bit more room by putting the legend on the bottom.
ggplot(ts_longFormat_SOD2_CD74, aes(x = time, y = expression, fill = gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(aes(y = expression + 1000, label = signif(expression, 3)), angle = 45, position = position_dodge(width = 0.9)) +
  facet_wrap(~exposure, scales = "free_x") +
  theme(legend.position = "bottom")
[Figure: the same four-panel bar plot with angled labels and the gene legend (CD74, SOD2) placed below the panels]
Another Example
Here is an example of adding text to a bar graph to indicate how big each group is when plotting relative proportions.
maln_protein_to_matrix_mat_pca_dat_samplesMeta = readr::read_tsv("maln_protein_to_matrix_mat_pca_dat_samplesMeta.tab.txt")
maln_protein_to_matrix_mat_pca_dat_samplesMeta
# A tibble: 2,398 x 6
   sample  country region  hdbcluster reads collection_year
   <chr>   <chr>   <chr>        <int> <chr>           <int>
 1 Ghana.~ Ghana   West A~          6 Ghan~            2013
 2 Ghana.~ Ghana   West A~          5 Ghan~            2013
 3 Guinea~ Guinea  West A~          5 Guin~            2011
 4 Malawi~ Malawi  East A~          5 Mala~            2011
 5 DRC.08~ DRC     Centra~          5 DRC.~            2013
 6 DRC.08~ DRC     Centra~          5 DRC.~            2013
 7 Ghana.~ Ghana   West A~          5 Ghan~            2013
 8 Gambia~ Gambia  West A~          5 Gamb~            2008
 9 Gambia~ Gambia  West A~          5 Gamb~            2008
10 Gambia~ Mali    West A~          5 Mali~            2010
# ... with 2,388 more rows
maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum = maln_protein_to_matrix_mat_pca_dat_samplesMeta %>%
  group_by(hdbcluster, collection_year) %>%
  summarise(n = n()) %>%
  group_by(hdbcluster) %>%
  mutate(clusterTotal = sum(n)) %>%
  mutate(clusterFrac = n/clusterTotal) %>%
  group_by(collection_year) %>%
  mutate(yearTotal = sum(n)) %>%
  mutate(yearFrac = n/yearTotal)
maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt = maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum %>%
  group_by() %>%
  filter("NA" != collection_year) %>%
  mutate(collection_year = as.integer(collection_year)) %>%
  filter(yearTotal > 10)
maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt_yearTotals = maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt %>%
  filter() %>%
  select(collection_year, yearTotal) %>%
  unique()
maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt_yearTotals
# A tibble: 9 x 2
  collection_year yearTotal
            <int>     <int>
1            2007        62
2            2008       104
3            2009       143
4            2010       287
5            2011       930
6            2012       499
7            2013       287
8            2002        15
9            2014        23
clusterColors = c("black", "#005AC8", "#AA0A3C", "#0AB45A", "#8214A0", "#FA7850", "#006E82", "#FA78FA", "black", "#005AC8", "#AA0A3C", "#0AB45A", "#8214A0", "#FA7850", "#005AC8", "#AA0A3C", "#0AB45A", "#8214A0", "#FA7850", "#14D2DC")
names(clusterColors) = c("0", "1", "2", "3", "4", "5", "6", "7", "Lab", "central_africa", "e_africa", "se_asia", "w_africa", "south_america", "Central Africa", "East Africa", "South East Asia", "West Africa", "South America", "India")
ggplot(maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt %>% filter(), aes(x = collection_year, y = yearFrac, fill = as.factor(hdbcluster))) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual("Cluster", values = clusterColors) +
  scale_x_continuous(breaks = seq(min(maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt$collection_year), max(maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt$collection_year))) +
  theme_bw() +
  ggtitle("") +
  theme(axis.text.x = element_text(family = "Helvetica", face = "plain", colour = "#000000", angle = 90, hjust = 1),
        axis.title = element_text(family = "Helvetica", face = "bold", colour = "#000000"),
        plot.title = element_text(family = "Helvetica", face = "bold", colour = "#000000", hjust = 0.5),
        panel.border = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.ticks.x = element_blank()) +
  geom_text(data = maln_protein_to_matrix_mat_pca_dat_samplesMeta_sum_filt_yearTotals,
            aes(x = collection_year, y = 1.07, label = paste0("n=", yearTotal), angle = 45),
            inherit.aes = F) +
  labs(x = "Collection Year", y = "Relative Proportions")
[Figure: stacked bar plot of relative proportions (y-axis) by collection year (x-axis, 2002 to 2014), filled by cluster (1 to 6), with each year's total (e.g. n=62 for 2007) printed at a 45 degree angle above its bar]
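The core of the summary code above is computing each group's share of its year's total. Here is a minimal sketch of just that step on made-up counts; the column names mirror the ones above, but the numbers are invented.

```r
library(dplyr)

# toy data (made up): counts of samples per cluster and year
toy_counts = tibble(
  year    = c(2010, 2010, 2011, 2011),
  cluster = c("1", "2", "1", "2"),
  n       = c(3, 1, 5, 5)
)

# fraction of each year's samples that falls in each cluster
toy_fracs = toy_counts %>%
  group_by(year) %>%
  mutate(yearTotal = sum(n), yearFrac = n / yearTotal) %>%
  ungroup()

toy_fracs
```

Because yearFrac sums to 1 within each year, every stacked bar has the same height, which is why the yearTotal labels are needed to show how many samples each bar represents.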
Dividing Data into Quantiles
Sometimes it might be helpful to split data into different bins by creating evenly sized quantiles. This can be done by using the ntile() function. Here we are creating quantiles based on the standard deviation (sd) of expression under the Lps exposure, to bin genes by how variable their expression is during the Lps time points.
ts_longFormat_lps_sum = ts_longFormat %>%
  filter(exposure == "Lps") %>%
  group_by(gene) %>%
  summarise(lps_sd = sd(expression))
ts_longFormat_lps_sum
# A tibble: 25,807 x 2
   gene       lps_sd
   <chr>       <dbl>
 1 A1BG       0.488
 2 A1BG-AS1   0.461
 3 A1CF       0.0353
 4 A2M      173.
 5 A2M-AS1    0.552
 6 A2ML1      0.0135
 7 A2MP1      0.0393
 8 A3GALT2    0.0261
 9 A4GALT     3.06
10 A4GNT      0.0519
# ... with 25,797 more rows
ntiles = 2000
ts_longFormat_lps_sum = ts_longFormat_lps_sum %>%
  mutate(lps_sd_quantile = ntile(lps_sd, ntiles))
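To see what ntile() returns, here is a minimal sketch on eight made-up values split into 4 bins (instead of 2000). The gene names and numbers are invented for illustration.

```r
library(dplyr)

# toy data (made up): standard deviations for eight genes
toy_sd = tibble(
  gene = paste0("gene", 1:8),
  sd   = c(0.1, 5, 0.3, 20, 1, 0.02, 8, 2)
)

# split the genes into 4 equally sized bins, ordered from least to most variable
toy_sd = toy_sd %>%
  mutate(sd_quantile = ntile(sd, 4))

toy_sd %>% arrange(sd)
```

Bin 1 holds the two smallest values and bin 4 the two largest, so the highest bin number always corresponds to the most variable genes.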
We can add the quantile information back to the original data set by using the left_join() function. It takes two data frames that share one or more columns, matches up their rows by the values in the shared columns, and adds to the first data frame whatever columns it doesn’t already have, filled in from the matching rows of the second.
ts_longFormat = ts_longFormat %>%
  left_join(ts_longFormat_lps_sum)
ts_longFormat
# A tibble: 490,333 x 6
   gene   exposure  time expression  lps_sd lps_sd_quantile
   <chr>  <chr>    <dbl>      <dbl>   <dbl>           <int>
 1 A1BG   Ctrl        0.     5.41   4.88e-1            1050
 2 A1BG-~ Ctrl        0.     1.72   4.61e-1            1040
 3 A1CF   Ctrl        0.     0.0504 3.53e-2             627
 4 A2M    Ctrl        0.   708.     1.73e+2            1977
 5 A2M-A~ Ctrl        0.     1.38   5.52e-1            1070
 6 A2ML1  Ctrl        0.     0.0275 1.35e-2             483
 7 A2MP1  Ctrl        0.     0.0329 3.93e-2             642
 8 A3GAL~ Ctrl        0.     0.0229 2.61e-2             578
 9 A4GALT Ctrl        0.     3.85   3.06e+0            1386
10 A4GNT  Ctrl        0.     0.120  5.19e-2             694
# ... with 490,323 more rows
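Here is a minimal sketch of the same kind of join on two small made-up tables. It names the shared column explicitly with by = "gene", which is optional (left_join() otherwise uses every column the two tables have in common and prints a message saying which ones it used).

```r
library(dplyr)

# toy data (made up): measurements plus a per-gene summary table
measurements = tibble(
  gene       = c("geneA", "geneA", "geneB", "geneC"),
  expression = c(10, 12, 3, 7)
)
gene_summary = tibble(
  gene     = c("geneA", "geneB"),
  quantile = c(2L, 1L)
)

# every row of measurements is kept; genes missing from gene_summary get NA
joined = measurements %>%
  left_join(gene_summary, by = "gene")

joined
```

Note that geneA's quantile is repeated on both of its rows, which is exactly how lps_sd and lps_sd_quantile end up on every time point of a gene in the table above.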
Let’s take the 2000th quantile (the most variable genes) and also the 1st quantile (the least variable genes).
top_ts_longFormat = ts_longFormat %>%
  filter(lps_sd_quantile == 2000)
bottom_ts_longFormat = ts_longFormat %>%
  filter(lps_sd_quantile == 1)
And plot all the genes in the top quantile by using facet_wrap to separate out the genes and allow their y-axes to differ between panels.
ggplot(top_ts_longFormat, aes(x = time, y = expression, color = exposure)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Dark2") +
  facet_wrap(~gene, scales = "free_y")
[Figure: line graphs of expression against time, one panel per gene (B2M, CCL18, CCL22, CCL3, CCL4, CD74, FTH1, FTL, IL1RN, LIPA, SOD2, TGM2) in 3 rows of 4 panels, each with its own y-axis; color shows exposure (Ctrl, Ifnb, Lps, R848)]
The facet_wrap function also allows you to set how many columns to have by using the ncol= argument.
ggplot(top_ts_longFormat, aes(x = time, y = expression, color = exposure)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Dark2") +
  facet_wrap(~gene, scales = "free_y", ncol = 3)
[Figure: the same per-gene line graphs arranged in 4 rows of 3 panels]
facet_grid is another faceting function. It lays the panels out in a grid pattern, which is better for showing relationships between two variables. facet_wrap just creates a panel for each level and places the panels in the order of the levels, whereas facet_grid lays out the panels in a grid with one variable defining the rows and the other the columns.
ggplot(top_ts_longFormat, aes(x = gene, y = expression, fill = gene)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_brewer(palette = "Paired") +
  facet_grid(time~exposure) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))
[Figure: bar plots of expression by gene in a grid of panels, one row per time point (0, 1, 2, 4, 6, 12, 24) and one column per exposure (Ctrl, Ifnb, Lps, R848); fill shows gene (B2M, CCL18, CCL22, CCL3, CCL4, CD74, FTH1, FTL, IL1RN, LIPA, SOD2, TGM2)]
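The formula given to facet_grid reads as rows ~ columns. Here is a minimal sketch on made-up data with two time points and three exposures, which gives a grid of 2 rows by 3 columns; the values are invented for illustration.

```r
library(ggplot2)

# toy data (made up): one value for each combination of gene, time and exposure
toy_grid = expand.grid(
  gene     = c("geneA", "geneB"),
  time     = c("0h", "24h"),
  exposure = c("Ctrl", "Lps", "R848")
)
toy_grid$expression = 1:12

p_grid = ggplot(toy_grid, aes(x = gene, y = expression, fill = gene)) +
  geom_col() +
  # rows ~ columns: one row of panels per time, one column per exposure
  facet_grid(time ~ exposure)

p_grid
```

Swapping the formula to exposure ~ time would transpose the grid.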
Also, the cowplot library is a great library for arranging completely different plots in custom-sized panels, like in a figure.
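As a rough idea of what that looks like, here is a minimal sketch using cowplot's plot_grid() function on two made-up plots. It assumes the cowplot package is installed; the data and the 2:1 width ratio are arbitrary choices for this example.

```r
library(ggplot2)
library(cowplot)

# toy data (made up) for two unrelated plots
toy = data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

p_line = ggplot(toy, aes(x = x, y = y)) + geom_line()
p_bar  = ggplot(toy, aes(x = factor(x), y = y)) + geom_col()

# place the plots side by side, making the first panel twice as wide as the second
combined = plot_grid(p_line, p_bar, ncol = 2, rel_widths = c(2, 1), labels = c("A", "B"))

combined
```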
Part 1 Exercises
Using the Temperature data frame from last session, Average Temperatures USA
1. Create a bar plot of temperatures for 1995 for Boston and put the temperatures on top of the bars, x-axis = month, y-axis = temperature
2. Create a line graph for all years in Boston and put the name of the year next to the line after December, x-axis = month, y-axis = temperature
3. Create a bar plot for all years in Boston but facet the plot so each year has its own panel, x-axis = month, y-axis = temperature
4. Create quantiles with 100 bins for mean temperatures over all years for each Station, take the 100th bin, and create a bar graph with x-axis = Station_Name, y-axis = temperature, using facet_grid to plot month by year
RMarkdown
Markdown is a way of writing plain text files with a certain syntax that, when given to a program, is rendered into a rich document, like an HTML document. Many different flavors of Markdown exist, but most follow similar rules. RMarkdown is a flavor of Markdown that allows R code to be inserted into the document; the code is run, and its output is captured and placed into the final document. This is a great way to create an informative document for your R code or to create R examples, and because the final output is an HTML document it can include interactive graphs and tables that R helps to create. In fact, all Session pages so far have been created by using RMarkdown; for example, here is the document that created this page itself: Session 6.
There are many features offered by RMarkdown. Here are a few cheatsheets that RStudio offers that are great reference guides: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf and https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf.
Below is an example of how text looks after rendering and how the syntax controls the output.
Figure 1:
Within RStudio you can create a new RMarkdown document by clicking the + symbol in the top left corner. You normally just pick HTML for the output. When you do this, RStudio will ask to install the libraries needed to create RMarkdown documents.
Below is the default RMarkdown document created when creating a new Document
R Code Chunks
Below is an example of an R code chunk
Figure 2:
Figure 3:
Figure 4:
Figure 5:
Important notes about R chunks
• The whole document is run in a brand new R session, and therefore libraries need to be loaded at the beginning of the document
• Every R chunk is run in the same R session, meaning all the R code is run as if you took all the R code, pasted it into one R script, and ran it
• When naming chunks, each name must be unique (the name above for this chunk is pressure and cannot be used again)
• Options given to the chunk are separated by commas
• The working directory of the R code executed is the directory where the RMarkdown document is located
• The resulting output document is in the same directory as the RMarkdown document.
Some important and commonly used chunk options
• echo - This will control if the R code itself is shown in the output document (by default it is)
• eval - This will control if the R code is executed; if this is set to FALSE the code will be shown but not executed (this is useful when you want to show R examples but don’t want the code to execute)
• fig.width - This will affect the width of the captured output of the code chunk, important for plots
• fig.height - This will affect the height of the captured output of the code chunk, important for plots
And there are many more options, see the reference/cheatsheets for examples.
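Besides setting these options in the header of an individual chunk, they can also be given default values for the whole document from R code, using knitr's opts_chunk$set(). Here is a minimal sketch; the particular values are arbitrary choices for this example, and the call would normally go in a chunk near the top of the RMarkdown document.

```r
library(knitr)

# set default options for every chunk that follows in the document
opts_chunk$set(
  echo       = TRUE,  # show the R code in the output document
  eval       = TRUE,  # run the R code
  fig.width  = 6,     # width of captured plots, in inches
  fig.height = 4      # height of captured plots, in inches
)

# look up a current default
opts_chunk$get("fig.width")
```

Options written in a chunk's own header still override these defaults for that chunk.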
Once you want to create the output document, you hit the knit button.
Part 2. Exercises
1. Create a directory, put the temperature dataset in it, then create a new RMarkdown document and save it in the same directory.
2. Take the code from Part 1 and put it in the RMarkdown document to create an HTML page of the plots you created, and add a header for each plot (by using the # symbol).
3. By looking at the cheatsheets, try to figure out how to add a table of contents to the document.