The NIDDK Repositories Adding value to shared Session 5... · The NIDDK Repositories – Adding value…


Post on 26-Aug-2018




<ul><li><p>The NIDDK Repositories: Adding value to shared resources</p><p>Rebekah Rasooly, NIDDK, NIH</p><p>May 2013</p></li>
<li><p>Central repository components</p><p>Contract funded since 2003.</p><p>Biosample repository (Fisher): archival storage of biological specimens</p><p>Database repository (RTI): maintain archival datasets, respond to queries about data and stored samples</p><p>Genetics repository (Rutgers Univ.): create immortalized cell lines, DNA extraction</p></li>
<li><p>The NIDDK Central Repositories holdings:</p><p>Samples and data stored from &gt;50 major multi-site clinical studies in diabetes, digestive, kidney, liver, and urologic diseases</p><p>Each study collects according to its own protocols</p><p>43 datasets available for sharing</p><p>23 GWAS datasets available for sharing through dbGaP</p><p>DNA and/or biosamples available for sharing from 28 studies</p>
<table><tr><th></th><th>Biosample Repository</th><th>Genetics Repository</th><th>Affiliated Repositories</th></tr>
<tr><td>Total samples</td><td>7,384,858</td><td>113,057</td><td>560,191</td></tr></table></li>
<li><p>Types of studies</p><p>Diabetes and Obesity Studies: DCCT/EDIC (The Type 1 Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications); DPP (Diabetes Prevention Program); DPPOS (The Diabetes Prevention Program Outcome Study); LookAHEAD (Action for Health in Diabetes); HEALTHY (Middle-School Based Primary Prevention Trial of Type 2 Diabetes); TrialNet - TN01 (Natural History Study of the Development of Type 1 Diabetes); TEDDY (The Environmental Determinants of Diabetes in the Young)</p><p>Kidney Studies: AASK Trial (The African American Study of Kidney Disease and Hypertension Study); CRIC (Chronic Renal Insufficiency Cohort Study); MDRD (The Modification of Diet in Renal Disease)</p><p>Liver Disease Studies: A2ALL (The Adult-to-Adult Living Donor Liver Transplantation Cohort Study); HALT-C (The Hepatitis C Antiviral Long-term Treatment against Cirrhosis); VIRAHEP-C (The Study of Viral Resistance to Antiviral Therapy of Chronic Hepatitis 
C)</p><p>Urology Studies: MTOPS (The Medical Therapy of Prostatic Symptoms); SISTEr (The Stress Incontinence Surgical Treatment Efficacy Trial)</p></li>
<li><p>Expensive resources have to be useful: the Repositories' sharing policies</p><p>NIDDK has custodianship of all samples and data transferred to the Repositories, and no IP protections are attached</p><p>The Steering Committee of each study or study group has control of the samples and data during a proprietary period (2 years after the end of the study or study increment)</p></li>
<li><p>Curating studies</p><p>Remove all identifiers, except some elements of dates (Limited data set)</p><p>Collect all forms, MOPs, key papers and analytic datasets from those papers</p><p>Reconcile sample list with phenotypic data</p><p>Carry out Dataset Integrity Check (DSIC) process</p></li>
<li><p>Curating studies: collect all forms, etc.</p></li>
<li><p>Curating studies: perform Dataset Integrity Check</p><p>Verify that published results from the study can be reproduced using the archived datasets</p><p>Perform a small number of analyses to duplicate published results</p><p>Intent is to provide confidence that the dataset distributed by the NIDDK repository is a true copy of the study data</p><p>Does not attempt to resolve minor or inconsequential discrepancies with published results</p></li>
<li><p>Curation for common variables</p><p>Using the caBIG Common Biorepository Model (CBM) of 30 variables</p><p>An additional 140 variables that are domain- or study-specific</p>
<table><tr><th>Index</th><th>Variable_Name</th><th>Description</th><th>CBM (curated for all studies) / Diabetes Domain / Kidney Domain / Liver Domain / Study_Specific* / Other_Common**</th></tr>
<tr><td>1</td><td>Ethnicity</td><td>Ethnicity</td><td>x x x x</td></tr>
<tr><td>2</td><td>Gender</td><td>Gender</td><td>x x x x</td></tr>
<tr><td>3</td><td>Race</td><td>Race</td><td>x x x x</td></tr>
<tr><td>4</td><td>ace_arb</td><td>Use of antihypertensives (ACE inhibitors, ARBs)</td><td>x x x</td></tr>
<tr><td>5</td><td>acr</td><td>Albumin to creatinine ratio</td><td>x</td></tr>
<tr><td>6</td><td>add_dx</td><td>Prior Addison's disease</td><td>x</td></tr>
<tr><td>7</td><td>aer</td><td>Albumin Excretion Rate</td><td>x</td></tr>
<tr><td>8</td><td>age</td><td>Age</td><td>x x x x</td></tr>
<tr><td>9</td><td>age_transplant</td><td>Age at transplant</td><td>x x</td></tr>
<tr><td>10</td><td>ageatonset</td><td>Age at IDDM onset</td><td>x</td></tr>
<tr><td>11</td><td>agegroup</td><td>Age group</td><td>x</td></tr>
<tr><td>12</td><td>aki</td><td>Acute kidney injury (aka ARF)</td><td>x x</td></tr>
<tr><td>13</td><td>alcohol</td><td>Frequent alcohol use</td><td>x x</td></tr>
<tr><td>14</td><td>assign</td><td>Treatment group</td><td>x</td></tr>
<tr><td>15</td><td>beckqaire</td><td>Severe anxiety (BECK)</td><td>x</td></tr></table>
<p>'What's in the NIDDK CDR?'--public query tools for the NIDDK central data repository. Pan H, Ardini MA, Bakalov V, et al., 2013, Database (Oxford).</p></li>
<li><p>Studies are searchable</p></li>
<li><p>Curation issues</p><p>No analytic dataset: impossible to recreate results in papers</p><p>No data dictionary for variables used in analysis: impossible to recreate results in paper</p><p>Errors in calculation</p><p>Poor or incomplete linkage of sample lists to phenotypic data</p><p>Sample labeling issues, including:</p><p>Labels applied incorrectly, so they cannot be read by a barcode scanner</p><p>Duplicate IDs</p><p>Empty or nearly empty vials</p><p>Incorrectly preserved samples</p></li>
<li><p>Using the Repository: requests for data and samples</p><p>Requests for Repository materials</p>
<table><tr><th>Year</th><th>Requests for biosamples</th><th>Requests for genetic samples</th><th>Total number of unique samples</th><th>Data requests</th></tr>
<tr><td>2004</td><td>0</td><td>7</td><td>1936</td><td>0</td></tr>
<tr><td>2005</td><td>15</td><td>7</td><td>3658</td><td>4</td></tr>
<tr><td>2006</td><td>47</td><td>12</td><td>9391</td><td>5</td></tr>
<tr><td>2007</td><td>49</td><td>6</td><td>6979</td><td>15</td></tr>
<tr><td>2008</td><td>50</td><td>24</td><td>29271</td><td>16</td></tr>
<tr><td>2009</td><td>64</td><td>45</td><td>48561</td><td>33</td></tr>
<tr><td>2010</td><td>98</td><td>34</td><td>64195</td><td>29</td></tr>
<tr><td>2011</td><td>149</td><td>14</td><td>44110</td><td>58</td></tr>
<tr><td>2012</td><td>109</td><td>16</td><td>73113</td><td>94</td></tr>
<tr><td>2013</td><td>55</td><td>6</td><td>10638</td><td>14</td></tr></table></li>
<li><p>Using the data and samples</p><p>91 publications by researchers who gained access to data and samples through the NIDDK Repository, including:</p><p>Papers based on the GWAS data sets in dbGaP</p><p>A paper that re-examined the data from a study of dialysis intensity (HEMO) and suggested a re-interpretation of the major study conclusion (Argyropoulos, C et al., 2009, J. Am. Soc. Nephrol., 20, 2034-2043). 
</p><p>A paper that re-analyzed the IBD Genetics GWAS data to identify additional loci (Elding, H et al., 2011, Am. J. Hum. Genet., 89, 798-805)</p><p>Publications on novel analytic methods or markers in NIDDK Repository-supplied samples</p></li>
<li><p>dbGaP and the NIDDK Repository: two different curatorial approaches</p>
<table><tr><th>NIDDK Repository</th><th>dbGaP</th></tr>
<tr><td>Manual curation</td><td>Automated curation</td></tr>
<tr><td>Elements of dates accepted</td><td>No elements of dates accepted</td></tr>
<tr><td>DSIC</td><td>No DSIC</td></tr>
<tr><td>Linkage to samples</td><td>No linkage to samples</td></tr>
<tr><td>Expensive: ~$1M/year for the Data Repository</td><td>Minimal costs: the NLM is bearing the costs of acquiring studies</td></tr>
<tr><td>Low volume</td><td>High volume</td></tr></table></li>
<li><p>dbGaP and the NIDDK Repository: two different curatorial approaches</p>
<table><tr><th rowspan="2">Year</th><th colspan="3">dbGaP</th><th colspan="2">NIDDK Repository</th></tr>
<tr><th>Approved</th><th># downloaded</th><th>Pct downloaded</th><th>Approved</th><th>Pct downloaded</th></tr>
<tr><td>2010</td><td>54</td><td>28</td><td>52%</td><td>29</td><td>100%</td></tr>
<tr><td>2011</td><td>91</td><td>52</td><td>57%</td><td>58</td><td>100%</td></tr>
<tr><td>2012</td><td>137</td><td>83</td><td>61%</td><td>94</td><td>100%</td></tr>
<tr><td>2013</td><td>53</td><td>17</td><td>32%</td><td>14</td><td>100%</td></tr></table></li>
<li><p>Lessons learned and cautionary notes</p><p>Curation is expensive</p><p>More familiarity = more sophisticated use of data</p><p>If investigators are not obliged to share their data, they can get by with poor documentation and processing/storage errors</p></li>
<li><p>NIDDK Repository Staff</p><p>Project Officers: Beena Akolkar, Paul Eggers, Bob Karp</p><p>Contracting Specialist: Rich Bailey</p><p>Repository Specialists: Sharon Kay Mobley Kris Moen</p><p>RTI, Data Repository: Phil Cooley, PI; Helen Pan, Sylvia Tan</p><p>ThermoFisher, Biosample Repository: Heather Higgins, PI</p><p>Rutgers Univ., Genetics Repository: Jay Tischfield, PI</p></li></ul>
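The slides list "Reconcile sample list with phenotypic data" as a curation step and name duplicate IDs and poor sample-to-phenotype linkage as recurring problems. The following is a minimal sketch of what such a reconciliation could look like, not the Repository's actual procedure. The function name `reconcile`, the `(sample_id, participant_id)` row layout, and all IDs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def reconcile(sample_rows, phenotype_ids):
    """Compare a sample inventory against a phenotypic dataset.

    sample_rows:   iterable of (sample_id, participant_id) tuples
                   (hypothetical layout, not the Repository's schema).
    phenotype_ids: iterable of participant IDs found in the phenotypic data.
    """
    sample_rows = list(sample_rows)
    phenotype_ids = set(phenotype_ids)

    # Sample IDs that appear on more than one inventory row.
    counts = Counter(sample_id for sample_id, _ in sample_rows)
    duplicate_sample_ids = sorted(s for s, n in counts.items() if n > 1)

    # Samples whose participant has no phenotypic record.
    unlinked_samples = sorted(
        {s for s, p in sample_rows if p not in phenotype_ids}
    )

    # Participants with phenotypic data but no stored sample.
    sampled_participants = {p for _, p in sample_rows}
    participants_without_samples = sorted(phenotype_ids - sampled_participants)

    return {
        "duplicate_sample_ids": duplicate_sample_ids,
        "unlinked_samples": unlinked_samples,
        "participants_without_samples": participants_without_samples,
    }


# Made-up example data.
samples = [("S001", "P01"), ("S002", "P02"), ("S002", "P02"), ("S003", "P09")]
phenotypes = ["P01", "P02", "P03"]
report = reconcile(samples, phenotypes)
print(report)
```

Each of the three lists in the report corresponds to one kind of issue a curator would follow up with the study before the samples are offered for sharing.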

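The Dataset Integrity Check (DSIC) in the slides re-runs a small number of analyses and compares them with published results, without chasing minor discrepancies. A rough sketch of that comparison follows, assuming a relative tolerance decides what counts as "minor". The function `integrity_check`, the 1% tolerance, and every statistic name and value are assumptions for illustration, not the DSIC as actually carried out.

```python
import math
from statistics import mean


def integrity_check(recomputed, published, rel_tol=0.01):
    """Label each published statistic by how the recomputed value compares.

    recomputed / published: dicts mapping a statistic name to a number.
    rel_tol: relative difference treated as a minor discrepancy
             (the 1% default is an assumption, not a DSIC rule).
    """
    report = {}
    for name, published_value in published.items():
        value = recomputed.get(name)
        if value is None:
            report[name] = "missing"  # cannot be recomputed from the archive
        elif math.isclose(value, published_value, rel_tol=rel_tol):
            report[name] = "reproduced"
        else:
            report[name] = "discrepancy"
    return report


# Made-up archived values and made-up "published" figures.
archived_ages = [50, 52, 54, 56]
recomputed = {
    "n": len(archived_ages),
    "mean_age": mean(archived_ages),
    "pct_female": 50.0,
}
published = {"n": 4, "mean_age": 53.2, "pct_female": 60.0, "mean_bmi": 31.0}

result = integrity_check(recomputed, published)
print(result)
```

A statistic reported as "missing" corresponds to the curation issue noted in the slides where no analytic dataset or data dictionary exists to recreate a published result.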
