the niddk repositories adding value to shared session 5... · the niddk repositories – adding...

Download The NIDDK Repositories Adding value to shared Session 5... · The NIDDK Repositories – Adding value

Post on 26-Aug-2018




0 download

Embed Size (px)


  • The NIDDK Repositories Adding value to shared resources

    Rebekah Rasooly NIDDK, NIH

    May, 2013

  • Central repository components

    Contract funded since 2003.

    Biosample repository (Fisher): archival storage of biological specimens Database repository (RTI): maintain archival datasets, respond to queries about data and stored samples Genetics repository (Rutgers Univ.): create immortalized cell lines, DNA extraction

  • Samples and data stored from >50 major multi-site clinical studies in diabetes, digestive, kidney, liver, and urologic diseases

    Each study collects according to its own protocols

    43 datasets available for sharing

    23 GWAS datasets available for sharing through dbGAP

    DNA and/or biosamples available for sharing from 28 studies

    The NIDDK Central Repositories holdings:

    Biosample Repository Genetics Repository Affiliated Repositories

    Total samples 7,384,858 113,057 560,191

  • Types of studies Diabetes and Obesity Studies DCCT/EDIC (The Type 1 Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications) DPP (Diabetes Prevention Program) DPPOS (The Diabetes Prevention Program Outcome Study) LookAHEAD (Action for Health in Diabetes) HEALTHY (Middle-School Based Primary Prevention Trial of Type 2 Diabetes) TrialNet - TN01 (NATURAL HISTORY STUDY OF THE DEVELOPMENT OF TYPE 1 DIABETES) TEDDY (The Environmental Determinants of Diabetes in the Young)

    Kidney Studies AASK Trial (The African American Study of Kidney Disease and Hypertension Study) CRIC (Chronic Renal Insufficiency Cohort Study) MDRD (The Modification of Diet in Renal Disease)

    Liver Disease Studies A2ALL (The Adult-to-Adult Living Donor Liver Transplantation Cohort Study) HALT-C (The Hepatitis C Antiviral Long-term Treatment against Cirrhosis) VIRAHEP-C (The Study of Viral Resistance to Antiviral Therapy of Chronic Hepatitis C)

    Urology Studies MTOPS (The Medical Therapy of Prostatic Symptoms) SISTEr (The Stress Incontinence Surgical Treatment Efficacy Trial)

  • NIDDK has custodianship of all samples and data transferred to the Repositories and no IP protections are attached

    The Steering Committee of each study or study group has control of the samples and data during a proprietary period (2 years after the end of the study or study increment)

    Expensive resources have to be useful: the Repositories sharing policies

  • Remove all identifiers, except some elements of dates (Limited data set)

    Collect all forms, MOPs, key papers and analytic datasets from those papers

    Reconcile sample list with phenotypic data

    Carry out Dataset Integrity Check (DSIC) process

    Curating studies

  • Curating studies collect all forms, etc.

  • Curating studies perform Dataset Integrity Check

    verify that published results from the study can be reproduced using the archived datasets

    perform a small number of analyses to duplicate published results intent is to provide confidence that the dataset distributed by the

    NIDDK repository is a true copy of the study data does not attempt to resolve minor or inconsequential discrepancies

    with published results

  • using the CaBIG Common Biorepository Model (CBM) of 30

    variables an additional 140 variables that are domain- or study-specific

    Curation for common variables

    Index Variable_Name Description CBM (curated for

    all studies)







    Study_Specific* Other_Common**

    1 Ethnicity Ethnicity x x x x

    2 Gender Gender x x x x

    3 Race Race x x x x

    4 ace_arb Use of antihypertensives (ACE inhibitors, ARBs) x x x

    5 acr Albumin to creatinine ratio x

    6 add_dx prior addisons disease x

    7 aer Albumin Excretion Rate x

    8 age Age x x x x

    9 age_transplant Age at transplant x x

    10 ageatonset Age at IDDM onset x

    11 agegroup Age group x

    12 aki Acute kidney injury (aka ARF) x x

    13 alcohol Frequent alcohol use x x

    14 assign Treatment group x

    15 beckqaire Severe anxiety(BECK) x

    'What's in the NIDDK CDR?'--public query tools for the NIDDK central data repository. Pan H, Ardini MA, Bakalov V, et al., 2013, Database (Oxford).

  • Studies are searchable

  • No analytic dataset impossible to recreate results in papers

    No data dictionary for variables used in analysis impossible to recreate results in paper

    Errors in calculation

    Poor or incomplete linkage of sample lists to phenotypic data

    Sample labeling issues, including:

    Labels applied incorrectly cannot be read by barcode scanner

    Duplicate ids

    Empty or nearly empty vials

    Incorrectly preserved samples

    Curation issues

  • Using the Repository requests for data and samples

    Requests for Repository materials


    requests for


    requests for

    genetic samples

    total number of unique

    samples data requests

    2004 0 7 1936 0

    2005 15 7 3658 4

    2006 47 12 9391 5

    2007 49 6 6979 15

    2008 50 24 29271 16

    2009 64 45 48561 33

    2010 98 34 64195 29

    2011 149 14 44110 58

    2012 109 16 73113 94

    2013 55 6 10638 14

  • Using the data and samples

    91 publications by researchers who gained access to data and samples through the NIDDK Repository, including:

    Papers based on the GWAS data sets in dbGAP

    A paper that re-examined the data from a study of dialysis intensity (HEMO) and suggested a re-interpretation the major study conclusion (Argyropoulos, C et al., 2009, J. Am. Soc. Nephrol., 20, 2034-2043).

    A paper that re-analyzed the IBD Genetics GWAS data to identify additional loci (Elding, H et al., 2011, Am J Hum Genet. 201