Stat. 651 ggplot2 (cox.csueastbay.edu/~esuess/classes/Statistics_651/...)



Posted on 18-Apr-2020


Stat. 651 ggplot2
Prof. Eric A. Suess

ggplot2 examples

library(tidyverse)
library(mdsr)

CIACountries

Make the base plot g and then add different layers onto it.

head(CIACountries)

##          country      pop    area oil_prod   gdp educ   roadways net_users
## 1    Afghanistan 32564342  652230        0  1900   NA 0.06462444       >5%
## 2        Albania  3029278   28748    20510 11900  3.3 0.62613051      >35%
## 3        Algeria 39542166 2381741  1420000 14500  4.3 0.04771929      >15%
## 4 American Samoa    54343     199        0 13000   NA 1.21105528      <NA>
## 5        Andorra    85580     468       NA 37200   NA 0.68376068      >60%
## 6         Angola 19625353 1246700  1742000  7300  3.5 0.04125211      >15%

# base plot g

g <- CIACountries %>% ggplot(aes(y = gdp, x = educ))

g + geom_point()

## Warning: Removed 64 rows containing missing values (geom_point).

[Figure: scatterplot of gdp (y axis) against educ (x axis)]
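The warning above means geom_point() dropped every row in which a mapped position variable (here gdp or educ) is missing. A minimal base-R sketch of that rule, on a hypothetical four-row data frame standing in for CIACountries (the values are invented for illustration):

```r
# Hypothetical toy rows standing in for CIACountries (values invented)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gdp     = c(1900, 11900, NA, 37200),
  educ    = c(NA, 3.3, 4.3, 5.0)
)

# A row can be drawn only if both mapped variables are present
keep <- complete.cases(toy[, c("gdp", "educ")])
n_dropped <- sum(!keep)          # rows geom_point() would remove with a warning
plottable <- toy[keep, ]

n_dropped                        # 2
as.character(plottable$country)  # "B" "D"
```

With the tidyverse loaded, the same filtering can be done before plotting with `CIACountries %>% filter(!is.na(gdp), !is.na(educ))`, which avoids the warning. Later the count rises from 64 to 66 once `size = roadways` is mapped, consistent with two further rows that have gdp and educ but lack roadways.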

g + geom_point(size = 3)

## Warning: Removed 64 rows containing missing values (geom_point).

[Figure: the same scatterplot with larger points]

g + geom_point(aes(color = net_users), size = 3)

## Warning: Removed 64 rows containing missing values (geom_point).

[Figure: gdp vs. educ, points colored by net_users (>0%, >5%, >15%, >35%, >60%, NA)]

# no geom_point() for the next picture: geom_text() draws each country's name in place of a point
g + geom_text(aes(label = country, color = net_users), size = 3)

## Warning: Removed 64 rows containing missing values (geom_text).

[Figure: gdp vs. educ with each country's name drawn as a text label, colored by net_users]

g + geom_point(aes(color = net_users, size = roadways))

## Warning: Removed 66 rows containing missing values (geom_point).

[Figure: gdp vs. educ, points colored by net_users and sized by roadways]

Change the scales

g + geom_point(aes(color = net_users, size = roadways)) +
  coord_trans(y = "log10")

## Warning: Removed 66 rows containing missing values (geom_point).

[Figure: the same plot with the gdp axis on a log10 coordinate scale]

g + geom_point(aes(color = net_users, size = roadways)) +
  scale_y_continuous(name = "Gross Domestic Product", trans = "log10")

## Warning: Removed 66 rows containing missing values (geom_point).

[Figure: the same plot with the y axis titled "Gross Domestic Product" on a log10 scale]
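coord_trans(y = "log10") and scale_y_continuous(trans = "log10") produce similar-looking axes but act at different stages. A small sketch, assuming ggplot2 and hypothetical toy data: layer_data() returns the data after scale transformations but before the coordinate system draws it, so the scale version already holds log10 values while the coord version still holds the raw gdp. scale_y_log10() is used here as shorthand for scale_y_continuous(trans = "log10").

```r
library(ggplot2)

# Hypothetical toy data whose gdp spans several orders of magnitude
toy <- data.frame(educ = c(2, 4, 6, 8), gdp = c(1e3, 1e4, 1e5, 1e5))

# Scale transformation: values are log10-transformed before any stat or geom sees them
p_scale <- ggplot(toy, aes(x = educ, y = gdp)) +
  geom_point() +
  scale_y_log10()

# Coordinate transformation: values stay on the original scale; only the drawing is warped
p_coord <- ggplot(toy, aes(x = educ, y = gdp)) +
  geom_point() +
  coord_trans(y = "log10")

y_scale <- layer_data(p_scale)$y   # 3 4 5 5
y_coord <- layer_data(p_coord)$y   # 1000 10000 100000 100000
```

The difference matters for layers that compute statistics: under the scale transformation a geom_smooth() is fitted to log10(gdp), while under coord_trans() it is fitted to gdp and only drawn on the log axis.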

Faceting

g + geom_point(alpha = 0.9, aes(size = roadways)) +
  coord_trans(y = "log10") +
  facet_wrap(~ net_users, nrow = 1) +
  theme(legend.position = "top")

## Warning: Removed 66 rows containing missing values (geom_point).

[Figure: gdp vs. educ faceted by net_users into panels >0%, >5%, >15%, >35%, >60%, NA; point size by roadways]
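facet_wrap() makes one panel per distinct value of the faceting variable, and missing values get a panel of their own, which is where the NA panel comes from. A sketch on hypothetical toy data, assuming ggplot2; the factor also has an unused level (">35%"), which facet_wrap() drops by default (drop = TRUE).

```r
library(ggplot2)

# Hypothetical toy data: net_users has an unused level (">35%") and one NA
toy <- data.frame(
  educ = c(2, 4, 6, 8, 10, 12),
  gdp  = c(1e3, 5e3, 1e4, 2e4, 5e4, 1e5),
  net_users = factor(c(">0%", ">5%", ">5%", NA, ">60%", ">15%"),
                     levels = c(">0%", ">5%", ">15%", ">35%", ">60%"))
)

p <- ggplot(toy, aes(x = educ, y = gdp)) +
  geom_point() +
  facet_wrap(~ net_users, nrow = 1)

# PANEL records which facet panel each point is drawn in
panel_of_point <- layer_data(p)$PANEL
n_panels <- length(unique(panel_of_point))
n_panels   # 5 panels: >0%, >5%, >15%, >60% and NA
```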

g + geom_point(alpha = 0.9, aes(size = roadways)) +
  # note: coord_trans() and the log10 scale below each apply log10, so y is transformed twice
  coord_trans(y = "log10") +
  scale_y_continuous(name = "Gross Domestic Product", trans = "log10") +
  facet_wrap(~ net_users, nrow = 1) +
  theme(legend.position = "top")

## Warning: Removed 66 rows containing missing values (geom_point).

[Figure: the same faceted plot with the y axis titled "Gross Domestic Product"]

Export the data and try in Tableau

getwd()

## [1] "/home/esuess/classes/2019-2020/01 - Fall 2019/Stat651/Presentations/02_ggplot2"

write_csv(CIACountries, "CIACountries.csv")

MedicareCharges

Check out the MEPS website for more real data.

head(MedicareCharges)

## Warning: Detecting old grouped_df format, replacing `vars` attribute by
## `groups`

## # A tibble: 6 x 4
## # Groups:   drg [1]
##   drg   stateProvider num_charges mean_charge
##   <chr> <fct>               <int>       <dbl>
## 1 039   AK                      1      34805.
## 2 039   AL                     23      32044.
## 3 039   AR                     16      27463.
## 4 039   AZ                     24      33443.
## 5 039   CA                     67      56095.
## 6 039   CO                     10      35252.

?MedicareCharges

NJCharges <- MedicareCharges %>% filter(stateProvider == "NJ")
NJCharges

## # A tibble: 100 x 4
## # Groups:   drg [100]
##    drg   stateProvider num_charges mean_charge
##    <chr> <fct>               <int>       <dbl>
##  1 039   NJ                     31      35104.
##  2 057   NJ                     55      45692.
##  3 064   NJ                     55      87042.
##  4 065   NJ                     59      59576.
##  5 066   NJ                     56      45819.
##  6 069   NJ                     61      41917.
##  7 074   NJ                     41      42993.
##  8 101   NJ                     58      42314.
##  9 149   NJ                     50      34916.
## 10 176   NJ                     36      58941.
## # ... with 90 more rows

p <- NJCharges %>%
  ggplot(aes(y = mean_charge, x = reorder(drg, mean_charge))) +
  geom_bar(fill = "grey", stat = "identity")
p

[Figure: bar chart of mean_charge by DRG for NJ, bars ordered by mean_charge]
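reorder(drg, mean_charge) is what sorts the bars: it converts drg to a factor whose levels are ordered by the mean of mean_charge within each level (a single value per DRG here). A base-R sketch on four DRGs, using the rounded NJ values printed in the output above:

```r
# Four-row stand-in for NJCharges, using rounded values from the printed output
toy <- data.frame(
  drg = c("039", "057", "064", "065"),
  mean_charge = c(35104, 45692, 87042, 59576)
)

# Levels sorted by mean(mean_charge) within each drg, ascending by default
asc  <- reorder(toy$drg, toy$mean_charge)
# Negating the sort key gives the most expensive procedure first
desc <- reorder(toy$drg, -toy$mean_charge)

levels(asc)    # "039" "057" "065" "064"
levels(desc)   # "064" "065" "057" "039"
```

Also, geom_col(fill = "grey") is ggplot2's shorthand for geom_bar(fill = "grey", stat = "identity"): both draw bar heights straight from mean_charge without counting rows.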

p <- p + ylab("Statewide Average Charges ($)") +
  xlab("Medical Procedure (DRG)")

p

[Figure: the same bar chart with axis titles "Medical Procedure (DRG)" and "Statewide Average Charges ($)"]

p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p

[Figure: the same bar chart with the DRG codes on the x axis rotated 90 degrees]

Now add the data for all states to the plot to compare with NJ.

p <- p + geom_point(data = MedicareCharges, size = 1, alpha = 0.3)
p

[Figure: the NJ bar chart with every state's mean_charge for each DRG overlaid as translucent points]

SAT

Here is the link to the College Board SAT website.

g <- SAT_2010 %>% ggplot(aes(x = math))

g + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: histogram of math with the default 30 bins; y axis count]

g + geom_histogram(binwidth = 10)

[Figure: histogram of math with binwidth = 10]
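With binwidth = 10 every bin spans 10 points of math score, instead of the 30 equal bins chosen by default. A sketch assuming ggplot2, with simulated scores for 50 states (not the real SAT_2010 values): layer_data() exposes the computed bins, so the widths and counts can be checked directly.

```r
library(ggplot2)

# Hypothetical math scores for 50 states (simulated, not SAT_2010)
set.seed(651)
toy <- data.frame(math = round(runif(50, min = 490, max = 610)))

p <- ggplot(toy, aes(x = math)) + geom_histogram(binwidth = 10)

# One row per bin, with xmin, xmax and count among the columns
bins <- layer_data(p)
widths <- bins$xmax - bins$xmin   # every bin is 10 wide
total  <- sum(bins$count)         # 50: each score lands in exactly one bin
```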

g + geom_density()

[Figure: density curve of math]

ggplot(data = head(SAT_2010, 10), aes(y = math, x = reorder(state, math))) +
  geom_bar(stat = "identity")

[Figure: bar chart of math for the first 10 rows of SAT_2010, states ordered by math score]

Scatterplot with trend lines

g <- SAT_2010 %>% ggplot(aes(x = expenditure, y = math)) +
  geom_point()
g

[Figure: scatterplot of math against expenditure]

g <- g + geom_smooth(method = "lm", se = FALSE) +
  xlab("Average expenditure per student ($100)") +
  ylab("Average score on math SAT")

g

[Figure: the scatterplot with a least-squares line; axes titled "Average expenditure per student ($100)" and "Average score on math SAT"]
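geom_smooth(method = "lm") fits an ordinary least-squares line, the same model lm() fits. A sketch on simulated data (hypothetical, assuming ggplot2) checks that the smooth layer's y values equal predict() from lm() on the same grid of x values. Passing formula = y ~ x only silences the message about which formula was used, and se = FALSE drops the confidence band.

```r
library(ggplot2)

# Hypothetical data standing in for SAT_2010 (simulated)
set.seed(651)
toy <- data.frame(expenditure = runif(50, 8, 20))
toy$math <- 600 - 4 * toy$expenditure + rnorm(50, sd = 15)

g <- ggplot(toy, aes(x = expenditure, y = math)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is the smooth: fitted values evaluated on a grid of x values
line <- layer_data(g, 2)

# The same least-squares fit done directly with lm()
fit  <- lm(math ~ expenditure, data = toy)
pred <- predict(fit, newdata = data.frame(expenditure = line$x))
```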

Add a trend line within each group defined by the rate of taking the test (sat_pct).

SAT_2010 <- SAT_2010 %>%
  mutate(SAT_rate = cut(sat_pct, breaks = c(0, 30, 60, 100),
                        labels = c("low", "medium", "high")))
g <- g %+% SAT_2010
g

[Figure: the same scatterplot and trend line, redrawn with the updated SAT_2010 data]
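cut() builds intervals that are closed on the right by default, so breaks = c(0, 30, 60, 100) gives (0, 30], (30, 60] and (60, 100]. A value of exactly 0 falls outside all three and becomes NA. A base-R sketch with hypothetical sat_pct values:

```r
# Hypothetical sat_pct values, chosen to sit on and around the break points
sat_pct <- c(0, 4, 30, 31, 60, 61, 100)

SAT_rate <- cut(sat_pct, breaks = c(0, 30, 60, 100),
                labels = c("low", "medium", "high"))
as.character(SAT_rate)
# NA "low" "low" "medium" "medium" "high" "high"

# include.lowest = TRUE closes the first interval on the left: [0, 30]
SAT_rate_incl <- cut(sat_pct, breaks = c(0, 30, 60, 100),
                     labels = c("low", "medium", "high"),
                     include.lowest = TRUE)
as.character(SAT_rate_incl)[1]   # "low"
```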

g + aes(color = SAT_rate)

[Figure: points and separate trend lines colored by SAT_rate (low, medium, high)]
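Two mechanisms are at work in the last steps: %+% replaces the default data of an existing plot while keeping its layers, and adding aes(color = SAT_rate) to the whole plot sets a global mapping that every layer inherits, so geom_smooth() fits one line per group. A sketch assuming ggplot2, with simulated toy data standing in for SAT_2010:

```r
library(ggplot2)

# Hypothetical data standing in for SAT_2010 (simulated)
set.seed(651)
old <- data.frame(expenditure = runif(30, 8, 20), math = runif(30, 480, 610))

g <- ggplot(old, aes(x = expenditure, y = math)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Add a grouping column, then swap it in as the plot's default data
new <- old
new$SAT_rate <- factor(rep(c("low", "medium", "high"), each = 10),
                       levels = c("low", "medium", "high"))
g <- g %+% new

# A plot-level color mapping is inherited by both layers,
# so the smooth layer is computed separately for each SAT_rate group
g_col  <- g + aes(color = SAT_rate)
points <- layer_data(g_col, 1)
lines  <- layer_data(g_col, 2)
length(unique(lines$group))   # 3
```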

g + facet_wrap(~ SAT_rate)

[Figure: the plot faceted into panels low, medium, high]

HELPrct

Here is the link to the NSDUH website.

HELPrct %>% ggplot(aes(x = homeless)) +
  geom_bar(aes(fill = substance), position = "fill") +
  coord_flip()

[Figure: horizontal bars for homeless and housed, each split into proportions (0 to 1) of substance: alcohol, cocaine, heroin]
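position = "fill" stacks the counts in each bar and rescales them to sum to 1, so each bar shows the proportions of substance within homeless or housed. The same numbers come from a row-wise prop.table(); a base-R sketch with a few hypothetical rows (not the real HELPrct data):

```r
# Hypothetical rows standing in for HELPrct (not the real data)
homeless  <- c("housed", "housed", "homeless", "homeless", "homeless", "housed")
substance <- c("alcohol", "cocaine", "alcohol", "heroin", "alcohol", "heroin")

counts <- table(homeless, substance)
# Divide each row by its total: the segment lengths of each filled bar
props <- prop.table(counts, margin = 1)
props
```

The horizontal axis keeps the default title "count" even though it now shows proportions; adding ylab("proportion") retitles it, since ylab() still labels the y aesthetic after coord_flip().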

NHANES

Here is the link to the NHANES website.

library(NHANES)

head(NHANES)

## # A tibble: 6 x 76
##      ID SurveyYr Gender   Age AgeDecade AgeMonths Race1 Race3 Education
##   <int> <fct>    <fct>  <int> <fct>         <int> <fct> <fct> <fct>
## 1 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 2 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 3 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 4 51625 2009_10  male       4 " 0-9"           49 Other <NA>  <NA>
## 5 51630 2009_10  female    49 " 40-49"        596 White <NA>  Some Col~
## 6 51638 2009_10  male       9 " 0-9"          115 White <NA>  <NA>
## # ... with 67 more variables: MaritalStatus <fct>, HHIncome <fct>,
## #   HHIncomeMid <int>, Poverty <dbl>, HomeRooms <int>, HomeOwn <fct>,
## #   Work <fct>, Weight <dbl>, Length <dbl>, HeadCirc <dbl>, Height <dbl>,
## #   BMI <dbl>, BMICatUnder20yrs <fct>, BMI_WHO <fct>, Pulse <int>,
## #   BPSysAve <int>, BPDiaAve <int>, BPSys1 <int>, BPDia1 <int>,
## #   BPSys2 <int>, BPDia2 <int>, BPSys3 <int>, BPDia3 <int>,
## #   Testosterone <dbl>, DirectChol <dbl>, TotChol <dbl>, UrineVol1 <int>,
## #   UrineFlow1 <dbl>, UrineVol2 <int>, UrineFlow2 <dbl>, Diabetes <fct>,
## #   DiabetesAge <int>, HealthGen <fct>, DaysPhysHlthBad <int>,
## #   DaysMentHlthBad <int>, LittleInterest <fct>, Depressed <fct>,
## #   nPregnancies <int>, nBabies <int>, Age1stBaby <int>,
## #   SleepHrsNight <int>, SleepTrouble <fct>, PhysActive <fct>,
## #   PhysActiveDays <int>, TVHrsDay <fct>, CompHrsDay <fct>,
## #   TVHrsDayChild <int>, CompHrsDayChild <int>, Alcohol12PlusYr <fct>,
## #   AlcoholDay <int>, AlcoholYear <int>, SmokeNow <fct>, Smoke100 <fct>,
## #   Smoke100n <fct>, SmokeAge <int>, Marijuana <fct>, AgeFirstMarij <int>,
## #   RegularMarij <fct>, AgeRegMarij <int>, HardDrugs <fct>, SexEver <fct>,
## #   SexAge <int>, SexNumPartnLife <int>, SexNumPartYear <int>,
## #   SameSex <fct>, SexOrientation <fct>, PregnantNow <fct>

Take a sample first and then make the plot.

sample_n(NHANES, size = 1000) %>%
  ggplot(aes(x = Age, y = Height, color = Gender)) +
  geom_point() +
  geom_smooth() +
  xlab("Age (years)") +
  ylab("Height (cm)")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## Warning: Removed 42 rows containing non-finite values (stat_smooth).

## Warning: Removed 42 rows containing missing values (geom_point).

[Figure: Height (cm) against Age (years) for the 1000 sampled people, colored by Gender (female, male), with a smooth for each]
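sample_n() draws a different random sample on each run, so this figure changes slightly every time the notes are knit. Setting a seed first makes the draw repeatable. A base-R sketch on a hypothetical 10,000-row stand-in for NHANES; the helper draw_sample() and the seed 651 are arbitrary choices for illustration:

```r
# Hypothetical stand-in for NHANES: 10,000 rows with an id column
pop <- data.frame(id = 1:10000)

# Hypothetical helper: fix the seed, then draw n rows without replacement
draw_sample <- function(data, n, seed) {
  set.seed(seed)
  data[sample(nrow(data), size = n), , drop = FALSE]
}

s1 <- draw_sample(pop, 1000, seed = 651)
s2 <- draw_sample(pop, 1000, seed = 651)
identical(s1, s2)       # TRUE: same seed, same rows
anyDuplicated(s1$id)    # 0: no row is drawn twice
```

In dplyr 1.0.0 and later, slice_sample() supersedes sample_n(), so `set.seed(651); NHANES %>% slice_sample(n = 1000)` is the equivalent tidyverse form.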

Here is an alternative plot using all the data. This is a hexbin plot.

NHANES %>% ggplot(aes(x = Age, y = Height, color = Gender)) +
  geom_hex() +
  geom_smooth() +
  xlab("Age (years)") +
  ylab("Height (cm)")

## Warning: Removed 353 rows containing non-finite values (stat_binhex).

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

## Warning: Removed 353 rows containing non-finite values (stat_smooth).

[Figure: hexbin plot of Height (cm) against Age (years), hexagons filled by count, with a smooth for each Gender (female, male)]

library(mosaic)

head(NHANES)

## # A tibble: 6 x 76
##      ID SurveyYr Gender   Age AgeDecade AgeMonths Race1 Race3 Education
##   <int> <fct>    <fct>  <int> <fct>         <int> <fct> <fct> <fct>
## 1 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 2 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 3 51624 2009_10  male      34 " 30-39"        409 White <NA>  High Sch~
## 4 51625 2009_10  male       4 " 0-9"           49 Other <NA>  <NA>
## 5 51630 2009_10  female    49 " 40-49"        596 White <NA>  Some Col~
## 6 51638 2009_10  male       9 " 0-9"          115 White <NA>  <NA>
## # ... with 67 more variables: MaritalStatus <fct>, HHIncome <fct>,
## #   HHIncomeMid <int>, Poverty <dbl>, HomeRooms <int>, HomeOwn <fct>,
## #   Work <fct>, Weight <dbl>, Length <dbl>, HeadCirc <dbl>, Height <dbl>,
## #   BMI <dbl>, BMICatUnder20yrs <fct>, BMI_WHO <fct>, Pulse <int>,
## #   BPSysAve <int>, BPDiaAve <int>, BPSys1 <int>, BPDia1 <int>,
## #   BPSys2 <int>, BPDia2 <int>, BPSys3 <int>, BPDia3 <int>,
## #   Testosterone <dbl>, DirectChol <dbl>, TotChol <dbl>, UrineVol1 <int>,
## #   UrineFlow1 <dbl>, UrineVol2 <int>, UrineFlow2 <dbl>, Diabetes <fct>,
## #   DiabetesAge <int>, HealthGen <fct>, DaysPhysHlthBad <int>,
## #   DaysMentHlthBad <int>, LittleInterest <fct>, Depressed <fct>,
## #   nPregnancies <int>, nBabies <int>, Age1stBaby <int>,
## #   SleepHrsNight <int>, SleepTrouble <fct>, PhysActive <fct>,
## #   PhysActiveDays <int>, TVHrsDay <fct>, CompHrsDay <fct>,
## #   TVHrsDayChild <int>, CompHrsDayChild <int>, Alcohol12PlusYr <fct>,
## #   AlcoholDay <int>, AlcoholYear <int>, SmokeNow <fct>, Smoke100 <fct>,
## #   Smoke100n <fct>, SmokeAge <int>, Marijuana <fct>, AgeFirstMarij <int>,
## #   RegularMarij <fct>, AgeRegMarij <int>, HardDrugs <fct>, SexEver <fct>,
## #   SexAge <int>, SexNumPartnLife <int>, SexNumPartYear <int>,
## #   SameSex <fct>, SexOrientation <fct>, PregnantNow <fct>

NHANES2 <- NHANES %>% select(AgeDecade, BMI_WHO)
head(NHANES2)

## # A tibble: 6 x 2
##   AgeDecade BMI_WHO
##   <fct>     <fct>
## 1 " 30-39"  30.0_plus
## 2 " 30-39"  30.0_plus
## 3 " 30-39"  30.0_plus
## 4 " 0-9"    12.0_18.5
## 5 " 40-49"  30.0_plus
## 6 " 0-9"    12.0_18.5

NHANES2_table <- table(NHANES2)
NHANES2_table

##          BMI_WHO
## AgeDecade 12.0_18.5 18.5_to_24.9 25.0_to_29.9 30.0_plus
##     0-9         873          193           28         7
##     10-19       280          664          244       172
##     20-29        49          526          349       418
##     30-39        10          394          433       495
##     40-49        26          371          475       506
##     50-59        15          314          487       477
##     60-69         8          199          321       373
##     70+           6          142          207       218

mosaicplot(NHANES2_table, color = TRUE)

[Figure: mosaic plot of NHANES2_table, with AgeDecade (0-9 to 70+) across and the BMI_WHO categories (12.0_18.5, 18.5_to_24.9, 25.0_to_29.9, 30.0_plus) within each column]
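mosaicplot() comes from base R's graphics package. With AgeDecade as the first dimension, each column's width is proportional to that age group's share of all counted people, and the tiles within a column show the BMI_WHO proportions conditional on age. A base-R sketch that rebuilds the printed table and computes both sets of proportions:

```r
# Counts copied from the NHANES2_table output above
counts <- matrix(c(
  873, 193,  28,   7,
  280, 664, 244, 172,
   49, 526, 349, 418,
   10, 394, 433, 495,
   26, 371, 475, 506,
   15, 314, 487, 477,
    8, 199, 321, 373,
    6, 142, 207, 218),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    AgeDecade = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"),
    BMI_WHO   = c("12.0_18.5", "18.5_to_24.9", "25.0_to_29.9", "30.0_plus")))
tab <- as.table(counts)

# Column widths: each AgeDecade's share of all counted people
widths  <- prop.table(margin.table(tab, 1))
# Tile heights within a column: BMI_WHO shares given AgeDecade
heights <- prop.table(tab, margin = 1)

sum(tab)                                # 9280
round(heights["0-9", "12.0_18.5"], 3)   # 0.793
```

table() leaves out rows where AgeDecade or BMI_WHO is missing, which is why the counts total 9280 rather than the 10,000 rows of NHANES. Calling mosaicplot(tab, color = TRUE) draws essentially the same picture as above.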

Weather

library(macleish)

## Loading required package: etl

head(whately_2015)

## # A tibble: 6 x 8
##   when                temperature wind_speed wind_dir rel_humidity pressure
##   <dttm>                    <dbl>      <dbl>    <dbl>        <dbl>    <int>
## 1 2015-01-01 00:00:00       -9.32       1.40     225.         54.6      985
## 2 2015-01-01 00:10:00       -9.46       1.51     248.         55.4      985
## 3 2015-01-01 00:20:00       -9.44       1.62     258.         56.2      985
## 4 2015-01-01 00:30:00       -9.3        1.14     244.         56.4      985
## 5 2015-01-01 00:40:00       -9.32       1.22     238.         56.9      984
## 6 2015-01-01 00:50:00       -9.34       1.09     242.         57.2      984
## # ... with 2 more variables: solar_radiation <dbl>, rainfall <int>

whately_2015 %>% ggplot(aes(x = when, y = temperature)) +
  geom_line(color = "darkgrey") +
  geom_smooth() +
  xlab(NULL) +
  ylab("Temperature (degrees Celsius)")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

[Figure: whately_2015 temperature from Jan 2015 to Jan 2016, drawn as a dark grey line with a smooth trend]

Here is the link to the choroplethr website.

library(choroplethr)

## Loading required package: acs

## Loading required package: XML

##
## Attaching package: 'acs'

## The following object is masked from 'package:dplyr':
##
##     combine

## The following object is masked from 'package:base':
##
##     apply

library(choroplethrMaps)
library(rUnemploymentData)

animated_state_unemployment_choropleth()

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## Warning: Column `region` joining character vector and factor, coercing into
## character vector

## [1] "All files will be written to the current working directory: /home/esuess/classes/2019-2020/01 - Fall 2019/Stat651/Presentations/02_ggplot2 . To change this use setwd()"
## [1] "Now writing individual choropleth files there as 'choropleth_1.png', 'choropleth_2.png', etc."

## Saving 6.5 x 4.5 in image

## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image
## Saving 6.5 x 4.5 in image

## [1] "Now writing code to animate all images in 'animated_choropleth.html'. Please open that file with a browser."


# animated_county_unemployment_choropleth()

Networks

Check out ggnet2.

Example 4.

library(GGally)

##
## Attaching package: 'GGally'

## The following object is masked from 'package:dplyr':
##
##     nasa

library(network)

## network: Classes for Relational Data
## Version 1.15 created on 2019-04-01.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
## For citation information, type citation("network").
## Type help("network-package") to get started.

library(sna)

## Loading required package: statnet.common

##
## Attaching package: 'statnet.common'

## The following object is masked from 'package:base':
##
##     order

## sna: Tools for Social Network Analysis
## Version 2.4 created on 2016-07-23.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.

library(ggplot2)

# root URL
r = "https://raw.githubusercontent.com/briatte/ggnet/master/"

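# read nodes
# stringsAsFactors = TRUE keeps Groupe a factor (the default before R 4.0), which levels(x) below relies on
v = read.csv(paste0(r, "inst/extdata/nodes.tsv"), sep = "\t", stringsAsFactors = TRUE)
names(v)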

## [1] "Sexe"                   "Prénom"
## [3] "Nom"                    "Groupe"
## [5] "Département.d.élection" "Num.circonscription"
## [7] "Commission.permanente"  "Twitter"

# read edges
e = read.csv(paste0(r, "inst/extdata/network.tsv"), sep = "\t")
names(e)

## [1] "Source" "Target"

# network object
net = network(e, directed = TRUE)

# party affiliation
x = data.frame(Twitter = network.vertex.names(net))
x = merge(x, v, by = "Twitter", sort = FALSE)$Groupe
net %v% "party" = as.character(x)

# color palette
y = RColorBrewer::brewer.pal(9, "Set1")[c(3, 1, 9, 6, 8, 5, 2)]
names(y) = levels(x)

# network plot
ggnet2(net, color = "party", palette = y, alpha = 0.75, size = 4, edge.alpha = 0.5)

[Figure: ggnet2 network plot with nodes colored by party (Ecolo, GDR, NI, RRDP, SRC, UDI, UMP)]

Review Table 3.3 on page 47 for the different kinds of plots that can be made for different kinds of x and y variables.

Continue with the Extended example: Historical baby names on page 48.
