
Feasibility of collection and analysis of microbiome data in a longitudinal randomized trial of community gardening

Running head: RCT community gardening and microbiome

Mireia Gascon (a,b,c), Kylie K. Harrall (d), Alyssa W. Beavers (e), Deborah H. Glueck (f), Maggie A. Stanislawski (d,g), Katherine Alaimo (h), Angel Villalobos (i), James R. Hebert (j), Kelsey Dexter (k), Kaigang Li (l), Jill Litt (a,i)*

Affiliations

(a) ISGlobal, Barcelona, Spain

(b) Universitat Pompeu Fabra (UPF), Barcelona, Spain

(c) CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain

(d) Lifecourse Epidemiology of Adiposity and Diabetes Center, Colorado School of Public Health, University of Colorado Denver, Aurora, Colorado, United States of America

(e) Department of Food Science and Human Nutrition, Michigan State University, East Lansing, Michigan, United States of America

(f) Department of Pediatrics, University of Colorado School of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, Colorado, United States of America

(g) Department of Epidemiology, University of Colorado School of Public Health, University of Colorado Denver, Anschutz Medical Campus, Aurora, Colorado, United States of America

(h) Department of Food Science and Human Nutrition, Michigan State University, Michigan, United States of America

(i) Environmental Studies, University of Colorado Boulder, Boulder, Colorado, United States of America

(j) Department of Epidemiology and Biostatistics and Cancer Prevention and Control Program, Arnold School of Public Health, University of South Carolina, Columbia, United States of America

(k) Department of Endocrinology, University of Colorado School of Public Health, University of Colorado Denver, Anschutz Medical Campus, Aurora, Colorado, United States of America

(l) Department of Health & Exercise Science, Colorado State University, Colorado, United States of America

Correspondence to:

*Jill Litt, [email protected]

4001 Discovery Drive, Boulder, Colorado 80303

303-735-4519


Author contributions

M. Gascon: formal analysis, writing, editing; K.K. Harrall: data curation, methodology, formal analysis, writing, editing; A.W. Beavers: data curation, methodology, formal analysis, writing, editing; D.H. Glueck: conceptualization, methodology, formal analysis, writing, editing; M.A. Stanislawski: formal analysis, writing, editing; K. Alaimo: conceptualization, funding acquisition, methodology, validation, writing, editing; A. Villalobos: methodology, validation, writing, editing; J.R. Hebert: methodology, validation, writing, editing; K. Dexter: methodology, validation, writing, editing; K. Li: methodology, validation, writing, editing; J. Litt: conceptualization, funding acquisition, investigation, methodology, supervision, writing, editing.

Acknowledgements

We would like to thank Robert Knight and his lab, particularly Daniel McDonald, for the microbiome analysis, and Allyson Masunaga Goto, Jessica Metcalf and Lara Fahnestock for their advice and contributions to the project.

Funding

This study was funded by the University of Colorado Boulder Population Center (CUPC; Litt, PI) through the National Institute of Child Health & Human Development of the National Institutes of Health under Award Number P2CHD066613-06, and by the Center for Microbiome Innovation at the University of California San Diego. We also received supplemental funding through the Clinical & Translational Research Center (CTRC) to cover all laboratory costs (Litt, PI). Mireia Gascon received a fellowship from the Societat Econòmica Barcelonesa d’Amics del País (SEBAP) in 2018, Barcelona (Catalonia), for her research stay at the University of Colorado to conduct the statistical analysis for this work. DHG was supported, in part, by R01GM121-81 and R25 GM111901.

Ethical conduct of research

The Denver Garden Environment and Microbiome (DGEM) study was approved by the University of Colorado Boulder Institutional Review Board Office (Protocol #: 16-0644).

Data sharing statement

The authors certify that this manuscript reports original clinical trial data. Individual, de-identified participant data that underlie the results reported in this article (text, tables, figures, and appendices) are available from the corresponding author (Jill Litt: [email protected]) following publication, including the clinical study report and study protocol.

Word count: 5817

Figure number: 5

Table number: 2


Appendices

The Denver Garden Environment and Microbiome (DGEM) feasibility study: a roadmap for the analysis of microbiome results for a randomized controlled trial of community gardening.

Mireia Gascon, Kylie K. Harrall, Alyssa W. Beavers, Deborah H. Glueck, Maggie A. Stanislawski, Katherine Alaimo, Angel Villalobos, James R. Hebert, Kelsey Dexter, Kaigang Li, Jill Litt

Appendix A. Add taxonomic information to the deblur reference-hit sequences.

Qiime2, Assign Taxonomy

Note: Qiime2 commands are case sensitive.

Convert a BIOM file to a tab-separated text file.

biom convert -i /media/sf_data/rarefied_denovo_FeatureTable.BIOM \
  -o /media/sf_data/rarefied_denovo_FeatureTable.txt \
  --to-tsv
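The text file written by this conversion can also be read outside SAS. The appendix code itself is Qiime2, SAS and R; the following is only a minimal Python sketch for illustration. It assumes the layout that `biom convert --to-tsv` typically produces: a `# Constructed from biom file` comment line, a `#OTU ID` header listing the sample IDs, then one row of counts per feature. The function name `read_biom_tsv` and the toy IDs are hypothetical.

```python
import csv
import io


def read_biom_tsv(text):
    """Parse text written by `biom convert --to-tsv` into nested dicts.

    Assumes a '#OTU ID' header line that lists the sample IDs; any other
    line starting with '#' (such as '# Constructed from biom file') is
    skipped. Returns {feature_id: {sample_id: count}}.
    """
    table = {}
    samples = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        if row[0].startswith("#OTU ID"):
            samples = row[1:]  # sample IDs from the header line
            continue
        if row[0].startswith("#"):
            continue  # other comment lines
        if samples is None:
            raise ValueError("no '#OTU ID' header line found")
        feature, counts = row[0], row[1:]
        table[feature] = {s: float(c) for s, c in zip(samples, counts)}
    return table


# Toy input with hypothetical feature and sample IDs.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "TACGGAGG\t10.0\t0.0\n"
    "TACGTAGG\t5.0\t7.0\n"
)
table = read_biom_tsv(example)
print(table["TACGTAGG"])  # {'S1': 5.0, 'S2': 7.0}
```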

Import the reference-hit sequences, downloaded from Qiita as a FASTA (.fa) file, into Qiime2 as FeatureData[Sequence].

qiime tools import \
  --input-path /media/sf_data/reference-hit.seqs.fa \
  --output-path /media/sf_data/reference-hit.seqs.qza \
  --type 'FeatureData[Sequence]'

Assign taxonomy to the reference-hit sequences.

Download the Greengenes classifier at the link below: https://chmi-sops.github.io/mydoc_qiime2.html

Visualization files are viewable at https://view.qiime2.org.

# --p-reads-per-batch 10000 allowed the classifier to run on a laptop
qiime feature-classifier classify-sklearn \
  --i-classifier /media/sf_data/gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads /media/sf_data/reference-hit.seqs.qza \
  --o-classification /media/sf_data/reference-hit.seqs.taxonomy.qza \
  --p-reads-per-batch 10000

qiime metadata tabulate \
  --m-input-file /media/sf_data/reference-hit.seqs.taxonomy.qza \
  --o-visualization /media/sf_data/reference-hit.seqs.taxonomy.qzv


Appendix B. Phylum Index Figures

SAS, Relative proportion phylum index figures

*Import the reference-hit frequency table and reference-hit sequences with taxonomic assignment;

*Frequency table;

PROC IMPORT OUT= WORK.Biomfile /* reference-hit frequency table, used below as Biomfile */

DATAFILE= "C:\Users\harrallk\Dropbox (ColoradoTeam)\Microbiome\Data\ID15926_fro_trimmed_deblur\AddTaxonomy\Deblur_biom_forSAS.xlsx"

DBMS=EXCEL REPLACE;

RANGE="Sheet1$"; GETNAMES=YES;

MIXED=NO;

SCANTEXT=YES; USEDATE=YES;

SCANTIME=YES;

RUN;

*Sequences;
*NOTE: taxonomic identifiers include Kingdom, Phylum, Class, Order, Family, Genus, Species in a single variable name. Prior to SAS import, Phylum was extracted into its own variable;

PROC IMPORT OUT= WORK.Biomtax /* sequences with taxonomy, used below as Biomtax */ DATAFILE= "~\reference-hit.seqs.wTax_Phylum.xlsx"

DBMS=EXCEL REPLACE;

RANGE="Sheet1$"; GETNAMES=YES;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES; SCANTIME=YES;

RUN;
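The note above says Phylum was pulled out of the combined taxonomy string before import, but that step is not shown. A minimal Python sketch of one way to do it, assuming Greengenes-style strings such as `k__Bacteria; p__Firmicutes; c__Clostridia` (the style used by the Greengenes classifier in Appendix A). The function name `extract_phylum` is hypothetical, and for convenience it also applies the two recodes that the SAS code performs later (a missing phylum becomes Other, [Thermi] becomes Thermi).

```python
def extract_phylum(taxon):
    """Return the phylum from a Greengenes-style taxonomy string.

    Example input: 'k__Bacteria; p__Firmicutes; c__Clostridia'.
    An absent or empty phylum is reported as 'Other', and '[Thermi]'
    as 'Thermi', matching the recodes in the SAS sums4 step.
    """
    name = ""
    for rank in taxon.split(";"):
        rank = rank.strip()  # drop the space that follows each ';'
        if rank.startswith("p__"):
            name = rank[len("p__"):].strip()
            break
    if name == "":
        return "Other"
    if name == "[Thermi]":
        return "Thermi"
    return name


print(extract_phylum("k__Bacteria; p__Firmicutes; c__Clostridia"))  # Firmicutes
```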

*Merge the count data and the sequences with taxonomic information; data MB01.Biomtax; /*Reference-hit sequences and taxonomic identification*/

set Biomtax;

run;

data MB01.Biomfile; /*Reference-hit frequency table*/ set Biomfile;

run;

*Align variable names between the sequence and frequency datasets;
data a;

set MB01.Biomtax;

keep Otu_ID Phylum; Otu_ID = Feature_ID;

run;

proc sort data = a;

by OTU_ID; run;

proc sort data = Biomfile;

by OTU_ID; run;

data deblur;

merge a BiomFile; by OTU_ID;


run; data MB01.deblur;

set deblur;

run;

*How many phyla does this dataset contain?;
proc freq data = Mb01.deblur;

tables phylum;

run; /* Remove OTU_ID */

data deblur;

set Mb01.deblur; drop OTU_ID;

run;

proc sort data = deblur;

by Phylum; run;

ods output summary = Sums;

proc means data = deblur sum; by Phylum;

run;

/* Remove the variable and labels that proc means added in as variable names */ data sums2;

set sums;

drop VName:;

run; data sums3;

set sums2;

drop Label:; run;
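The PROC MEANS step above sums every sample column within each phylum, giving one row per phylum; the file re-imported a few steps later as SUMS5 is described as the transposed version, with one row per sample. A minimal Python sketch of the same aggregation and transpose in one pass, assuming the feature table is held as nested dictionaries (feature, then sample, then count) and that features without a phylum fall into Other, as the SAS recode does. `phylum_totals` and the toy IDs are hypothetical.

```python
from collections import defaultdict


def phylum_totals(feature_counts, feature_phylum):
    """Sum feature counts within each phylum, separately for every sample.

    feature_counts: {feature_id: {sample_id: count}}
    feature_phylum: {feature_id: phylum}; unmapped features count as 'Other'.
    Returns {sample_id: {phylum: total}}: one row per sample, one column per
    phylum, i.e. the wide layout used from SUMS5 onward.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for feature, counts in feature_counts.items():
        phylum = feature_phylum.get(feature, "Other")
        for sample, count in counts.items():
            totals[sample][phylum] += count
    return {sample: dict(by_phylum) for sample, by_phylum in totals.items()}


# Toy data with hypothetical IDs.
counts = {"f1": {"S1": 10, "S2": 0}, "f2": {"S1": 5, "S2": 7}, "f3": {"S1": 1, "S2": 2}}
phyla = {"f1": "Firmicutes", "f2": "Firmicutes", "f3": "Bacteroidetes"}
print(phylum_totals(counts, phyla)["S1"])  # {'Firmicutes': 15.0, 'Bacteroidetes': 1.0}
```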

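/* Label a missing phylum name as Other, and rename [Thermi] to Thermi so it can serve as a SAS variable name */
data sums4;
set sums3; if Phylum = " " then Phylum = " Other";
if Phylum = " [Thermi]" then Phylum = " Thermi";
run;
/* Import sums4 after it was transposed outside SAS (one row per SampleID, one column per phylum) */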

PROC IMPORT OUT= WORK.SUMS5

DATAFILE= "C:\Users\harrallk\Dropbox (ColoradoTeam)\Microbiome\Data\ID15926_fro_trimmed_deblur\AddTaxonomy\sums4.xlsx"

DBMS=EXCEL REPLACE;

RANGE="Sheet1$";

GETNAMES=YES; MIXED=NO;

SCANTEXT=YES;

USEDATE=YES; SCANTIME=YES;

RUN;

/* Merge in the metadata info, like group and sample locations */ proc sort data = sums5;

by SampleID;

run;


proc sort data = Mb01.MbAnalysis; by SampleID;

run;

data Mb01.DeblurTax;

merge sums5 Mb01.Mbanalysis; by SampleID;

run;

* Create graphics. We are interested in the following phyla: Firmicutes, Bacteroidetes, Proteobacteria, Verrucomicrobia, Actinobacteria, Tenericutes, Cyanobacteria, Fusobacteria, Spirochaetes;

* Create a dataset with relative abundance; data RPh;

set Mb01.DeblurTax;

TotCount = (Other + Acidobacteria + Actinobacteria + Aquificae + Armatimonadetes
  + BHI80_139 + BRC1 + Bacteroidetes + Chlamydiae + Chlorobi + Chloroflexi
  + Crenarchaeota + Cyanobacteria + Deferribacteres + Elusimicrobia + Euryarchaeota
  + FBP + Fibrobacteres + Firmicutes + Fusobacteria + GN02 + Gemmatimonadetes
  + Lentisphaerae + MVP_21 + Nitrospirae + OD1 + OP3 + OP8 + OP9 + Planctomycetes
  + Proteobacteria + SR1 + Spirochaetes + Synergistetes + TM6 + TM7 + Tenericutes
  + Verrucomicrobia + WPS_2 + WS2 + WS3 + Thermi);

RA_Actinobacteria = (Actinobacteria/TotCount); RA_Bacteroidetes = (Bacteroidetes/TotCount);

RA_Cyanobacteria = (Cyanobacteria/TotCount);

RA_Firmicutes = (Firmicutes/TotCount); RA_Fusobacteria = (Fusobacteria/TotCount);

RA_Proteobacteria = (Proteobacteria/TotCount);

/*RA_Spirochaetes = (Spirochaetes/TotCount);*/ /*RA_Tenericutes = (Tenericutes/TotCount);*/

RA_Verrucomicrobia = (Verrucomicrobia/TotCount);

RA_Other = ((Other + Acidobacteria + Aquificae + Armatimonadetes + BHI80_139 + BRC1
  + Chlamydiae + Chlorobi + Chloroflexi + Crenarchaeota + Deferribacteres
  + Elusimicrobia + Euryarchaeota + FBP + Fibrobacteres + GN02 + Gemmatimonadetes
  + Lentisphaerae + MVP_21 + Nitrospirae + OD1 + OP3 + OP8 + OP9 + Planctomycetes
  + SR1 + Spirochaetes + Synergistetes + TM6 + TM7 + Tenericutes + WPS_2 + WS2
  + WS3 + Thermi)/TotCount);
run;
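The RPh data step divides each phylum column by TotCount, the sum of all 42 phylum columns, keeps seven phyla of interest (the Spirochaetes and Tenericutes lines are commented out), and pools every remaining column, including the Other category, into RA_Other, so the eight proportions for a sample add up to 1. A minimal Python sketch of that arithmetic for a single sample; `relative_abundance` and `FOCUS_PHYLA` are hypothetical names, and raising an error on an all-zero sample is a choice made here (SAS would instead return missing values).

```python
FOCUS_PHYLA = ("Actinobacteria", "Bacteroidetes", "Cyanobacteria", "Firmicutes",
               "Fusobacteria", "Proteobacteria", "Verrucomicrobia")


def relative_abundance(phylum_counts, focus=FOCUS_PHYLA):
    """Relative proportions for the focus phyla plus a pooled 'Other'.

    phylum_counts: {phylum: count} for one sample. The denominator is the
    total over every phylum (TotCount); phyla outside `focus`, including the
    'Other' category itself, are pooled into the 'Other' proportion.
    """
    total = sum(phylum_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")  # SAS would return missing
    ra = {p: phylum_counts.get(p, 0) / total for p in focus}
    ra["Other"] = sum(c for p, c in phylum_counts.items() if p not in focus) / total
    return ra


sample = {"Firmicutes": 50, "Bacteroidetes": 30, "Proteobacteria": 10, "TM7": 6, "Other": 4}
ra = relative_abundance(sample)
print(ra["Firmicutes"], ra["Other"])  # 0.5 0.1
```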

*Stack the data;

data RPH2;


keep PID

sample_type

sample_type_num

Timepoint Group

RA_Actinobacteria

RA_Bacteroidetes RA_Cyanobacteria

RA_Firmicutes

RA_Fusobacteria RA_Proteobacteria

RA_Verrucomicrobia

RA_Other;

set RPH; run;

proc sort data = RPH2;

by PID Timepoint; run;

* Verrucomicrobia;

data a; keep

PID

sample_type

sample_type_num Timepoint

Group

Phylum RA_Phylum;

set RPH2;

Phylum = "Verrucomicrobia";

RA_Phylum = RA_Verrucomicrobia; run;

* Bacteroidetes;

data b; keep

PID

sample_type sample_type_num

Timepoint

Group

Phylum RA_Phylum;

set RPH2;

Phylum = "Bacteroidetes"; RA_Phylum = RA_Bacteroidetes;

run;

data bi; merge a b;

by Phylum PID Timepoint;

run;


* Cyanobacteria; data c;

keep

PID

sample_type sample_type_num

Timepoint

Group Phylum

RA_Phylum;

set RPH2; Phylum = "Cyanobacteria";

RA_Phylum = RA_Cyanobacteria;

run;

data ci; merge bi c;

by Phylum PID Timepoint;

run; * Firmicutes;

data d;

keep PID

sample_type

sample_type_num

Timepoint Group

Phylum

RA_Phylum; set RPH2;

Phylum = "Firmicutes";

RA_Phylum = RA_Firmicutes;

run; data di;

merge ci d;

by Phylum PID Timepoint; run;

* Fusobacteria;

data e; keep

PID

sample_type

sample_type_num Timepoint

Group

Phylum RA_Phylum;

set RPH2;

Phylum = "Fusobacteria"; RA_Phylum = RA_Fusobacteria;

run;

data ei;


merge di e; by Phylum PID Timepoint;

run;

* Proteobacteria;

data f; keep

PID

sample_type sample_type_num

Timepoint

Group Phylum

RA_Phylum;

set RPH2;

Phylum = "Proteobacteria"; RA_Phylum = RA_Proteobacteria;

run;

data fi; merge ei f;

by Phylum PID Timepoint;

run; * Actinobacteria;

data i;

keep

PID sample_type

sample_type_num

Timepoint Group

Phylum

RA_Phylum;

set RPH2; Phylum = "Actinobacteria";

RA_Phylum = RA_Actinobacteria;

run; data ii;

merge fi i;

by Phylum PID Timepoint; run;

* Other;

data j;

keep PID

sample_type

sample_type_num Timepoint

Group

Phylum RA_Phylum;

set RPH2;

Phylum = "Other";


RA_Phylum = RA_Other; run;

data ji;

merge ii j;

by Phylum PID Timepoint; run;

data Rph3;

set ji; run;
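The chain of data steps above (a, b, c, ... merged into bi, ci, ..., ji) reshapes the wide RA_ columns into a long layout with one row per sample and phylum. Because each of those small data sets carries a different Phylum value, no BY groups match across them, so every MERGE simply interleaves the rows in Phylum, PID, Timepoint order. A minimal Python sketch of the same reshape; `stack_phyla`, `ID_KEYS` and the toy row are hypothetical.

```python
ID_KEYS = ("PID", "sample_type", "sample_type_num", "Timepoint", "Group")


def stack_phyla(rows, phyla):
    """Reshape one-row-per-sample records into one row per sample and phylum.

    Each input row holds the ID_KEYS fields plus RA_<phylum> columns. Each
    output row keeps the ID fields and adds Phylum and RA_Phylum, sorted by
    Phylum, PID and Timepoint like the SAS BY statement.
    """
    long_rows = []
    for phylum in phyla:
        for row in rows:
            record = {key: row[key] for key in ID_KEYS}
            record["Phylum"] = phylum
            record["RA_Phylum"] = row["RA_" + phylum]
            long_rows.append(record)
    long_rows.sort(key=lambda r: (r["Phylum"], r["PID"], r["Timepoint"]))
    return long_rows


rows = [{"PID": 1, "sample_type": "Forehead", "sample_type_num": 1, "Timepoint": 1,
         "Group": 1, "RA_Firmicutes": 0.6, "RA_Other": 0.4}]
for r in stack_phyla(rows, ["Other", "Firmicutes"]):
    print(r["Phylum"], r["RA_Phylum"])
# Firmicutes 0.6
# Other 0.4
```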

*Isolate data by gardening groups;

data Rph3_GroupZero; set ji;

where Group = 0;

run;

data Rph3_GroupOne; set ji;

where Group = 1;

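run; /*********************************************
Create a stacked bar chart
**********************************************/
*Average over all 6 time points: dividing each relative proportion by 6 means the bar segments that PROC SGPANEL sums for a participant add up to 1 when all 6 time points are present;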

data Rph5;

set Rph3;

RA_Phylum_6 = RA_Phylum/6; if Phylum = "Firmicutes" then PhylumCat = 9;

else if Phylum = "Bacteroidetes" then PhylumCat = 8;

else if Phylum = "Proteobacteria" then PhylumCat = 7; else if Phylum = "Verrucomicrobia" then PhylumCat = 6;

else if Phylum = "Actinobacteria" then PhylumCat = 5;

else if Phylum = "Cyanobacteria" then PhylumCat = 4;

else if Phylum = "Fusobacteria" then PhylumCat = 3; else if Phylum = "Other" then PhylumCat = 2;

run;
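PROC SGPANEL's VBAR statement sums the RESPONSE variable within each bar by default, so dividing RA_Phylum by 6 before plotting turns that sum into a mean over the six time points. A minimal Python sketch of the equivalence; `average_over_timepoints` and the toy rows are hypothetical. The second test case shows the caveat: a participant with a missing time point gets a stack that falls short of 1.

```python
def average_over_timepoints(long_rows, n_timepoints=6):
    """Mean relative proportion per participant, sample type and phylum.

    Mirrors RA_Phylum_6 = RA_Phylum/6 followed by the summing VBAR: adding
    up RA_Phylum / n_timepoints over a participant's time points gives the
    mean when all n_timepoints are present.
    """
    means = {}
    for r in long_rows:
        key = (r["PID"], r["sample_type"], r["Phylum"])
        means[key] = means.get(key, 0.0) + r["RA_Phylum"] / n_timepoints
    return means


# Hypothetical participant with two time points.
long_rows = [
    {"PID": 1, "sample_type": "Forehead", "Phylum": "Firmicutes", "RA_Phylum": 0.6},
    {"PID": 1, "sample_type": "Forehead", "Phylum": "Other", "RA_Phylum": 0.4},
    {"PID": 1, "sample_type": "Forehead", "Phylum": "Firmicutes", "RA_Phylum": 0.8},
    {"PID": 1, "sample_type": "Forehead", "Phylum": "Other", "RA_Phylum": 0.2},
]
means = average_over_timepoints(long_rows, n_timepoints=2)
print(round(means[(1, "Forehead", "Firmicutes")], 3))  # 0.7
```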

proc format; value Phylum 2 = "Other"

3 = "Fusobacteria"

4 = "Cyanobacteria" 5 = "Actinobacteria"

6 = "Verrucomicrobia"

7 = "Proteobacteria"

8 = "Bacteroidetes" 9 = "Firmicutes";

run;

*Graphic for gardeners; *Subset of gardeners;

data Rph5_GroupOne;

set Rph5; where Group = 1;

run;

ODS Graphics / Height = 6in Width = 4in;


proc sgpanel data = Rph5_GroupOne; title "The Relative Proportion of Phyla Represented in Gardeners' Microbiome Samples";

title2 "Summed Over 6 Timepoints";

panelby Sample_type / columns = 1 rows = 4 novarname;

vbar PID / response = RA_Phylum_6 group = PhylumCat

groupdisplay = stack

barwidth = 1; rowaxis label = "Relative Proportion of each Phylum";

colaxis label = "Participant";

keylegend / title = "Phylum"; format PhylumCat Phylum.;

run;

*Graphic for non-gardeners;

*Subset of non-gardeners; data Rph5_GroupZero;

set Rph5;

where Group = 0; run;

proc sgpanel data = Rph5_GroupZero;

title "The Relative Proportion of Phyla Represented in Non-Gardeners' Microbiome Samples"; title2 "Summed Over 6 Timepoints";

panelby Sample_type / columns = 1 rows = 4 novarname;

vbar PID / response = RA_Phylum_6

group = PhylumCat groupdisplay = stack

barwidth = 1;

rowaxis label = "Relative Proportion of each Phylum"; colaxis label = "Participant";

keylegend / title = "Phylum";

format PhylumCat Phylum.;

run; /***************************************

Graphic that sums over participants and not time

****************************************/ proc format;

value Phylum 2 = "Other"

3 = "Fusobacteria" 4 = "Cyanobacteria"

5 = "Actinobacteria"

6 = "Verrucomicrobia"

7 = "Proteobacteria" 8 = "Bacteroidetes"

9 = "Firmicutes";

run; ODS PDF DPI = 1200
file = "C:\Users\harrallk\Dropbox (ColoradoTeam)\Microbiome\Text\Deblur_Figures_SampleByTime_01.pdf" startpage = no
;

ODS Graphics / Height = 6in Width = 4in;


*Graphic for gardeners; *Subset gardeners;

data Rph6_GroupOne;

set Rph3;

where Group = 1; RA_Phylum_Gp = RA_Phylum/5;

if Phylum = "Firmicutes" then PhylumCat = 9;

else if Phylum = "Bacteroidetes" then PhylumCat = 8; else if Phylum = "Proteobacteria" then PhylumCat = 7;

else if Phylum = "Verrucomicrobia" then PhylumCat = 6;

else if Phylum = "Actinobacteria" then PhylumCat = 5; else if Phylum = "Cyanobacteria" then PhylumCat = 4;

else if Phylum = "Fusobacteria" then PhylumCat = 3;

else if Phylum = "Other" then PhylumCat = 2;

run; proc sgpanel data = Rph6_GroupOne;

title "The Relative Proportion of Phyla Represented in Gardeners' Microbiome Samples";

title2 "Over Time"; panelby Sample_type / columns = 1 rows = 4 novarname;

vbar Timepoint / response = RA_Phylum_Gp

group = PhylumCat groupdisplay = stack

barwidth = 1

transparency = 0.5;

rowaxis label = "Relative Proportion of each Phylum"; colaxis label = "Time";

styleattrs datacolors = (Cyan Magenta Yellow Black Orange Blue Gray Green)

datacontrastcolors = (Black); keylegend / title = "Phylum";

format PhylumCat Phylum.;

run;

*Graphic for non-gardeners; *Subset non-gardeners;

data Rph6_GroupZero;

set Rph3; where Group = 0;

RA_Phylum_Gp = RA_Phylum/6;

if Phylum = "Firmicutes" then PhylumCat = 9; else if Phylum = "Bacteroidetes" then PhylumCat = 8;

else if Phylum = "Proteobacteria" then PhylumCat = 7;

else if Phylum = "Verrucomicrobia" then PhylumCat = 6;

else if Phylum = "Actinobacteria" then PhylumCat = 5; else if Phylum = "Cyanobacteria" then PhylumCat = 4;

else if Phylum = "Fusobacteria" then PhylumCat = 3;

else if Phylum = "Other" then PhylumCat = 2; run;

proc sgpanel data = Rph6_GroupZero;

title "The Relative Proportion of Phyla Represented in Non-Gardeners' Microbiome Samples"; title2 "Over Time";

panelby Sample_type / columns = 1 rows = 4 novarname;

vbar Timepoint / response = RA_Phylum_Gp


group = PhylumCat groupdisplay = stack

barwidth = 1

transparency = 0.5;

rowaxis label = "Relative Proportion of each Phylum"; colaxis label = "Time";

styleattrs datacolors = (Cyan Magenta Yellow Black Orange Blue Gray Green)

datacontrastcolors = (Black); keylegend / title = "Phylum";

format PhylumCat Phylum.;

run; ods pdf close;

/***************************************

Graphic that shows all participant data over time

****************************************/ *Create a PID by time variable to indicate groups on x-axis;

data Rph4;

set Rph3; PIDTime = PID+(Timepoint*0.1) ;

if Phylum = "Firmicutes" then PhylumCat = 9;

else if Phylum = "Bacteroidetes" then PhylumCat = 8; else if Phylum = "Proteobacteria" then PhylumCat = 7;

else if Phylum = "Verrucomicrobia" then PhylumCat = 6;

else if Phylum = "Actinobacteria" then PhylumCat = 5;

else if Phylum = "Cyanobacteria" then PhylumCat = 4; else if Phylum = "Fusobacteria" then PhylumCat = 3;

else if Phylum = "Other" then PhylumCat = 2;

run; ODS PDF DPI = 1200
file = "C:\Users\harrallk\Dropbox (ColoradoTeam)\Microbiome\Text\Deblur_Figures_SampleByPIDByTime_01.pdf"
startpage = no ;

ODS Graphics / Height = 6in Width = 5in;

*graphic for gardeners; *subset gardeners;

data Rph4_one;

set Rph4; where Group = 1;

run;

proc sgpanel data = Rph4_One;

title "Relative Proportion of Phyla Present in the Microbiome of Gardeners"; panelby Sample_type / columns = 1 rows = 4 novarname;

vbar PIDTime / response = RA_Phylum

group = PhylumCat groupdisplay = stack

barwidth = 1

transparency = 0.5 ;

refline 5.1 7.1 12.1 13.1 / axis = x discreteoffset = -0.5 lineattrs = (color = Black thickness = 5);

rowaxis label = "Relative Proportion of each Phylum";


colaxis label = "Participant.Timepoint"; styleattrs datacolors = (Cyan Magenta Yellow Black Orange Blue Gray Green)

datacontrastcolors = (Black);

keylegend / title = "Phylum";

format PhylumCat Phylum.; run;

*graphic for non-gardeners;

*subset non-gardeners; data Rph4_zero;

set Rph4;

where Group = 0; run;

proc sgpanel data = Rph4_Zero;

title "Relative Proportion of Phyla Present in the Microbiome of Non-Gardeners";

panelby Sample_type / columns = 1 rows = 4 novarname; vbar PIDTime / response = RA_Phylum

group = PhylumCat

groupdisplay = stack barwidth = 1

transparency = 0.5

; refline 3.1 6.1 8.1 9.1 14.1 / axis = x discreteoffset = -0.5 lineattrs = (color = Black thickness = 4);

rowaxis label = "Relative Proportion of each Phylum";

colaxis label = "Participant.Timepoint";

styleattrs datacolors = (Cyan Magenta Yellow Black Orange Blue Gray Green) datacontrastcolors = (Black);

keylegend / title = "Phylum";

format PhylumCat Phylum.; run;

ods pdf close;


Appendix C. Test for differences in the abundance of sOTUs between gardeners and non-gardeners

R, ANCOM

## ANCOM code downloaded from https://sites.google.com/site/siddharthamandal1985/research

# Run the downloaded code for ANCOM, titled ANCOM_updated_code.R
library(exactRankTests)
library(nlme)
library(ggplot2)

# Separate, parallel models were run for each sample type. We present code for the forehead below.

# Restrict samples to forehead
Var_data_forehead <-
  var_data[which(var_data$sample_type_num>0.5&var_data$sample_type_num<1.5),]

# Forehead analysis accounting for within-subject variability
longitudinal_comparison_foreheadAdjRand=ANCOM.main(OTUdat=otu_data,
    Vardat=Var_data_forehead,
    adjusted=T, repeated=T,
    main.var="Group",
    adj.formula="sexF+age_1",
    repeat.var="NULL", longitudinal=FALSE,
    random.formula="~1|PID",
    multcorr=2, sig=0.05,
    prev.cut=0.90)
longitudinal_comparison_foreheadAdjRand$W.taxa
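ANCOM avoids testing raw abundances, which are compositional, and instead tests every pairwise log-ratio between taxa; a taxon's W statistic counts how many of its log-ratios differ between groups. The following Python sketch shows only that core idea in its basic, unadjusted two-group form. It is not the ANCOM.main code run above: it ignores the covariate adjustment, the random intercept per PID and the prevalence filter (prev.cut), uses a normal-approximation rank-sum test rather than the exact test the R code loads exactRankTests for, and applies a Benjamini-Hochberg correction within each taxon's tests, which is our reading of multcorr=2. All function names, the pseudocount of 1 and the toy data are assumptions.

```python
import math
from itertools import combinations


def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mean) / sd / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def ancom_w(counts, groups, alpha=0.05, pseudocount=1.0):
    """W statistic for each taxon (unadjusted, two-group ANCOM sketch).

    counts: one list of taxon counts per sample; groups: 0/1 label per sample.
    W[i] counts the other taxa j for which log(x_i / x_j) differs between the
    groups after a Benjamini-Hochberg correction within taxon i's tests.
    """
    n_taxa = len(counts[0])
    logs = [[math.log(c + pseudocount) for c in sample] for sample in counts]
    pvals = [[1.0] * n_taxa for _ in range(n_taxa)]
    for i, j in combinations(range(n_taxa), 2):
        ratio = [s[i] - s[j] for s in logs]
        x = [r for r, g in zip(ratio, groups) if g == 0]
        y = [r for r, g in zip(ratio, groups) if g == 1]
        pvals[i][j] = pvals[j][i] = rank_sum_p(x, y)
    w = []
    for i in range(n_taxa):
        row = [pvals[i][j] for j in range(n_taxa) if j != i]
        w.append(sum(p < alpha for p in bh_adjust(row)))
    return w


# Toy data: taxon 0 is far more abundant in group 1; taxa 1 and 2 are constant.
counts = [[c, 50, 50] for c in range(1, 11)] + [[c, 50, 50] for c in range(1000, 1010)]
groups = [0] * 10 + [1] * 10
print(ancom_w(counts, groups))  # [2, 1, 1]
```

Taxa 1 and 2 each pick up one significant log-ratio only because of taxon 0; ANCOM therefore judges a taxon by how large its W is relative to the number of taxa, not by whether W is nonzero.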


Appendix D – Alpha Diversity. Test for differences in Shannon diversity or Faith’s PD between gardeners and non-gardeners.

SAS, Linear Mixed Model

Models for Shannon diversity and Faith’s phylogenetic diversity are parallel.

The data were subset by sample location.

proc mixed data = mouth;

title "Mouth Shannon Diversity, Full Model";

class PID;

model Shannon_Deblur = group*timepoint*age_1*BMI_1

group*timepoint*age_1 group*timepoint*BMI_1

group*age_1*BMI_1

timepoint*age_1*BMI_1

group*timepoint

group*age_1

group*BMI_1 timepoint*age_1

timepoint*BMI_1

age_1*BMI_1

group

timepoint

sexF age_1

BMI_1

/ residual outp = mouthresout ddfm = kr; repeated PID;

run;

proc univariate data = mouthresout;

var studentResid;

title "Full Mouth Model";

histogram studentResid / normal;

run;
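The mixed models above take Shannon diversity (Shannon_Deblur) as an outcome computed upstream. For reference, the index is H = -Σ p_i log(p_i), where p_i is the share of reads assigned to feature i. A minimal Python sketch; the function name is hypothetical, and base 2 is only an assumed default that should be matched to whatever software produced the analysed values.

```python
import math


def shannon(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)), skipping zero counts.

    The logarithm base is a parameter; base 2 here is an assumption and
    should be matched to whatever produced the analysed Shannon values.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


print(round(shannon([10, 10, 10, 10]), 6))  # 2.0
```

Four equally abundant features give log2(4) = 2 bits; any imbalance lowers the value.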


Appendix E – Beta Diversity. Visualize variability within and between participant samples.

SAS, Variability within and between participants

/********************************************************************* Convert the UniFrac distance matrix into a list of sample pairs and their distances (stored in the variable Corr)
**********************************************************************/

proc iml; use UniFrac;

read all var "SampleID" into ColNames; /* get names of variables */

read all var (ColNames) into mCorr; /* matrix of UniFrac distances */

close UniFrac; numCols = ncol(mCorr); /* number of variables */

numPairs = numCols*(numCols-1) / 2;

length = 2*nleng(ColNames) + 5; /* max length of new ID variable */ Sample1 = j(NumPairs, 1, BlankStr(length));

i = 1;

do row= 2 to numCols; /* construct the pairwise names */

do col = 1 to row-1; Sample1[i] = ColNames[col];

i = i + 1;

end; end;

Sample2 = j(NumPairs, 1, BlankStr(length));

i = 1; do row= 2 to numCols; /* construct the pairwise names */

do col = 1 to row-1;

Sample2[i] = ColNames[row];

i = i + 1; end;

end;

lowerIdx = loc(row(mCorr) > col(mCorr)); /* indices of lower-triangular elements */ Corr = mCorr[ lowerIdx ];

create CorrPairs var {"Sample1" "Sample2" "Corr"};

append; close;

QUIT;
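The PROC IML step walks the lower triangle of the square distance matrix and writes each pair once, as (column sample, row sample, distance), giving n(n-1)/2 rows for n samples. A minimal Python sketch that emits the pairs in the same order as the IML loops; `matrix_to_pairs` and the toy matrix are hypothetical.

```python
def matrix_to_pairs(names, matrix):
    """List the lower-triangular entries of a square distance matrix.

    Follows the order of the PROC IML loops: for each row r (second row
    onward) and each column c < r, emit (names[c], names[r], matrix[r][c]).
    """
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match names")
    pairs = []
    for r in range(1, n):
        for c in range(r):
            pairs.append((names[c], names[r], matrix[r][c]))
    return pairs


names = ["A", "B", "C"]
matrix = [[0.0, 0.2, 0.5],
          [0.2, 0.0, 0.4],
          [0.5, 0.4, 0.0]]
for pair in matrix_to_pairs(names, matrix):
    print(pair)
# ('A', 'B', 0.2)
# ('A', 'C', 0.5)
# ('B', 'C', 0.4)
```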

/****************************************************************************

Merge in metadata.

This step introduces sample identification for the members of each correlation pair. This requires two

stages of merging: the first merge identifies the first column of pair members and the second merge identifies the second column of pair members.

****************************************************************************/

/**************************** Round 1 of merging

******************************/

proc sort data = CorrPairs; by Sample1;

run;

*Add SampleID to the pair dataset. SampleID is the name of the sample identification variable in the metadata;

data CorrPairs2; set CorrPairs;

SampleID = Sample1;

SampleID2 = Sample2;

run; *For SAS variable naming conventions, an x was added to the start of each numeric sample ID;

proc sort data = Mb01.MetaSampleX;

by SampleID; run;

data DistMeta;

merge CorrPairs2 Mb01.MetaSampleX; by SampleID;

run;

/***************************************************************************

Round 2 of merging Reduce Mb01.MetaSampleX so it only includes sample, time point, and group.

****************************************************************************/

data MetaReduce; keep SampleID2 timepointS2 groupS2 PIDS2 sample_type_numS2;

set Mb01.MetaSampleX;

SampleID2 = SampleID; timepointS2 = timepoint;

groupS2 = group;

PIDS2 = PID;

sample_type_numS2 = sample_type_num; run;

proc sort data = DistMeta;

by SampleID2; run;

proc sort data = MetaReduce;

by SampleID2;

run; data DistMeta2;

merge DistMeta MetaReduce;

by SampleID2; run;

*Remove any missing values of UniFrac;

data DistMeta3; set DistMeta2;

where corr NE .;

run;

/****************************************************************************
For each sample pair, define whether:
1. Members of the pair came from the same participant
2. Members of the pair came from the same sample type
3. Members of the pair are from the same intervention group
4. Members of the pair are from the same time point
5. Within variability - members of the pair are from the same participant and same sample type
6. Between variability - members of the pair are from different participants but the same sample type and time point
****************************************************************************/ data interactions;

set DistMeta3;

if PID = PIDS2 then PIDMatch = 1;

else PIDMatch = 0; if sample_type_num = sample_type_numS2 then SampleMatch = 1;

else SampleMatch = 0;

Match = (PIDMatch*SampleMatch); run;

*Code this so the between variation only considers differences between participants from the same day;

data graphicDTD; set interactions;

if timepoint = timepointS2 then TimeMatch = 1;

else TimeMatch = 0;

if PIDMatch = 1 and SampleMatch = 1 then Within = 1; else Within = 0;

if PIDMatch = 0 and SampleMatch = 1 and TimeMatch = 1 then Between = 1;

else Between = 0; if group = groups2 then GroupMatch = 1;

else GroupMatch = 0;

run; proc sort data = graphicDTD;

by Within;

run;
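The two rounds of merging attach metadata to both members of every pair, and the interactions and graphicDTD steps then flag Within pairs (same participant and sample type) and Between pairs (different participants, same sample type and time point). Some pairs are neither. A minimal Python sketch of the same classification using a dictionary lookup instead of merges, followed by the mean distance per sample type; the function names, field names and toy samples are hypothetical.

```python
def classify_pairs(pairs, meta):
    """Attach metadata to both members of each pair and flag Within/Between.

    pairs: iterable of (sample1, sample2, distance)
    meta: {sample_id: {"PID": ..., "sample_type": ..., "timepoint": ...}}
    Pairs with a missing distance (None) are dropped, as in DistMeta3.
    """
    out = []
    for s1, s2, dist in pairs:
        if dist is None:
            continue
        m1, m2 = meta[s1], meta[s2]
        same_type = m1["sample_type"] == m2["sample_type"]
        within = m1["PID"] == m2["PID"] and same_type
        between = (m1["PID"] != m2["PID"] and same_type
                   and m1["timepoint"] == m2["timepoint"])
        out.append({"sample_type": m1["sample_type"], "distance": dist,
                    "Within": int(within), "Between": int(between)})
    return out


def mean_distance(rows, flag):
    """Mean distance per sample type among rows where rows[flag] == 1."""
    sums, counts = {}, {}
    for r in rows:
        if r[flag]:
            t = r["sample_type"]
            sums[t] = sums.get(t, 0.0) + r["distance"]
            counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}


meta = {
    "a1": {"PID": 1, "sample_type": "Forehead", "timepoint": 1},
    "a2": {"PID": 1, "sample_type": "Forehead", "timepoint": 2},
    "b1": {"PID": 2, "sample_type": "Forehead", "timepoint": 1},
}
pairs = [("a1", "a2", 0.2), ("a1", "b1", 0.6), ("a2", "b1", 0.7), ("a1", "b1", None)]
rows = classify_pairs(pairs, meta)
print(mean_distance(rows, "Within"), mean_distance(rows, "Between"))
# {'Forehead': 0.2} {'Forehead': 0.6}
```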

/*************************************************************************** UniFrac graphics showing within and between variability over sample type

****************************************************************************/

proc format; value var 0 = "Variation between participants"

1 = "Variation within participants";

value varDTD 0 = "Variation between participants (day-to-day)"

1 = "Variation within participants"; run;

ods pdf file = "~\UniFrac_unweightedPlots.pdf"; proc sgplot data = graphicDTD;
where Within = 1 or Between = 1; /* plot only within-participant pairs and same-day, same-location between-participant pairs */
title "Variability Within and Between Participants";
vbar sample_type / response = corr group = Within groupdisplay = cluster stat = Mean limitstat = stderr limitattrs = (color = black);

format Within varDTD.;

yaxis label = "Unweighted UniFrac Distance";

xaxis label = "Sampling Location"; styleattrs datacolors = (Black White) datacontrastcolors = (Black Black);

keylegend / title = " ";

run; ods pdf close;


Appendix F – Beta Diversity. Test for differences in weighted UniFrac between gardeners and non-gardeners

R, Nested Permutation ANOVAs

# Import datasets
library(readr)
UniFrac_unweighted_denovo_181203 <- read_delim("~/UniFrac_unweighted_denovo.txt",
    "\t", escape_double = FALSE, trim_ws = TRUE)

# Import metadata so we can compare UniFrac by group, time, sex, age, and BMI.
library(haven)
mbanalysis <- read_sas("~/mbanalysis.sas7bdat", NULL)

#### Note: make sure that column and row names match after R imports this matrix. R likes to add an x
#### when a variable name starts with a number. I fixed this problem by adding x's to all column and
#### row names before importing into R.

# Format data for permutation ANOVAs.

# Make the first column into row names
UniFrac <- as.data.frame(UniFrac_unweighted_denovo_181203) # Shorten object name
a <- UniFrac[,1]
rownames(UniFrac) <- a
UniFrac <- UniFrac[,-1]
UniFrac[1:5, 1:5]

# Remove missing samples from the metadata.
meta <- mbanalysis[complete.cases(mbanalysis[ , "Shannon_CR_OTC"]),]
meta2 <- meta[,c("SampleID","Timepoint", "Group", "sample_type")]

# Distance matrix samples and metadata samples must be in the same order.
sampleOrder <- as.numeric(row.names(UniFrac))
sampleOrder2 <- as.data.frame(sampleOrder)
colnames(sampleOrder2) <- "SampleID"
orderMeta <- merge(sampleOrder2,meta2,by.x="SampleID", sort = FALSE)
GroupTime <- orderMeta[,c("Group", "Timepoint")]
GroupTime$Timepoint <- as.factor(GroupTime$Timepoint)
GroupTime$Group <- as.factor(GroupTime$Group)

# Sample ID must be listed in the first column of the data frame
pax <- as.data.frame(rownames(UniFrac))
colnames(pax) <- "SampleID"
Pax2 <- cbind(pax, UniFrac)

###################################################################
# The following code is parallel for all sample types. Thus, we only report forehead.
###################################################################

## Restrict meta samples to forehead and order same as UniFrac
foreheadMeta <- meta2[meta2$sample_type == "Forehead", ]
foreheadIDs <- foreheadMeta$SampleID
ForeheadUniFrac <- Pax2[rownames(Pax2) %in% foreheadIDs, colnames(Pax2) %in% foreheadIDs]

# Group by PID
ForeheadGroupPID <- meta[meta$SampleID %in% foreheadIDs, c("SampleID", "Group", "PID")]

## Order
ForeheadOrder <- as.numeric(row.names(ForeheadUniFrac))
ForeheadOrder2 <- as.data.frame(ForeheadOrder)
colnames(ForeheadOrder2) <- "SampleID"

# Nested Group by PID
ForeheadGroupPID2 <- merge(ForeheadOrder2, ForeheadGroupPID, by.x="SampleID", sort = FALSE)

### Remove SampleID from the metadata, and define covariates as factors
# Group by PID
ForeheadGroupPID3 <- ForeheadGroupPID2[,2:3]
ForeheadGroupPID3$Group <- as.factor(ForeheadGroupPID3$Group)
ForeheadGroupPID3$PID <- as.factor(ForeheadGroupPID3$PID)

# Forehead models; nested.npmanova() is provided by the BiodiversityR package
library(BiodiversityR)
nested.npmanova(ForeheadUniFrac~Group+PID, data = ForeheadGroupPID3, permutations=999,
    warnings=FALSE)
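nested.npmanova tests Group with participants (PID) nested within groups. Two ideas underlie it: a pseudo-F statistic computed directly from pairwise distances, and permutations that move whole participants rather than single samples, so a person's repeated samples never end up split across groups. The following Python sketch illustrates both in a simplified one-way form. Unlike the nested analysis, its denominator is the residual among samples rather than the participant-level term, so it should not be expected to reproduce the R output; the function names, seed and toy data are assumptions.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F computed from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p(dist, labels, blocks, permutations=999, seed=1):
    """Pseudo-F and permutation p-value, shuffling group labels between blocks.

    blocks gives the participant (PID) of each sample. Each participant keeps
    one group label, so a person's repeated samples always move together.
    """
    block_label = {}
    for b, g in zip(blocks, labels):
        if block_label.setdefault(b, g) != g:
            raise ValueError("each block must belong to a single group")
    block_ids = list(block_label)
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        shuffled = [block_label[b] for b in block_ids]
        rng.shuffle(shuffled)
        mapping = dict(zip(block_ids, shuffled))
        if pseudo_f(dist, [mapping[b] for b in blocks]) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy 1-D data: two tight clusters, with Euclidean distances standing in for UniFrac.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = ["A", "A", "B", "B"]
print(pseudo_f(dist, labels))  # 200.0
```

If the four toy samples came from only two participants, every participant-level relabelling would be as extreme as the observed one and the p-value would be 1. With few participants there are few distinct relabellings, which sets a floor under the smallest attainable p-value in a small trial.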