BioMedical Data Everywhere: Recent Developments in Data Management and Policy at NIH
Jerry Sheehan, Assistant Director for Policy Development
National Library of Medicine - National Institutes of Health
CASC Fall Meeting, September 8, 2011, Arlington, VA
National Library of Medicine: More than a Library
• World’s largest medical library
  – >12 million physical artifacts (books, journals, technical reports, photographs)
  – >22,000 print and electronic serial subscriptions
  – Historical collection of rare and old medical works
• Intramural research laboratories
  – Lister Hill National Center for Biomedical Communications
  – National Center for Biotechnology Information
• Extramural research and training
  – ~100 research projects per year, $36M
  – 18 funded research training sites, 250 trainees
• Health data standards and vocabularies
• Information resources and services
  – Publications and metadata
  – Genomic, chemical, clinical trial data
  – Environmental health and toxicology data
  – Disaster information services & systems
  – Medical images, analytical tools
www.nlm.nih.gov
NLM Information Resources
• Publications
  – Citations/metadata (PubMed)
  – Full-text articles (PubMed Central)
• Data
  – Genomic (GenBank, dbGaP, GEO, GeneTests)
  – Clinical trials (ClinicalTrials.gov)
  – Drug (RxNorm, DailyMed, Pillbox)
  – Chemical (PubChem)
  – Environmental & toxicology
• Images
  – Visible Human
  – Spine x-rays, cervical images
  – Historical photos
• Synthesized information
  – Evidence summaries
  – Guidelines
  – Consumer health information (MedlinePlus)
• Vocabulary resources
  – Unified Medical Language System
  – Standard clinical terms (SNOMED)
  – Health data interchange
  – Biomedical terms
• Software & Tools
  – APIs
  – Natural language processing
  – Image analysis
  – Mobile apps
http://www.pubmed.gov
QUALITY
Figure: Growth in Medline, the fully indexed subset of PubMed, which accounts for approximately 90% of all PubMed citations. Original graph: http://www.nlm.nih.gov/bsd/stats/cit_added.html
PubMed/Medline: Journal Citations
CONTENT
• 21+ million citations and abstracts
  – 700,000 added per year
  – 50%+ link to full text
• 5500+ journals
  – 120-130 added per year
USAGE (2010)
• 120+ million visitors
• 2 million searches per day
• 2.4 billion page views
• Google, Bing, others
• Content used by outside developers
• Mobile version
PubMed Central: Full-Text Articles
www.pubmedcentral.gov
+ 2.2 million full-text articles, 26 thousand more added per month
Typical weekday usage:
• 420,000 different users
• 740,000 articles retrieved
Annually:
• ~99% of articles downloaded at least once
• 28% downloaded more than 100 times
ClinicalTrials.gov
http://clinicaltrials.gov/
Registry and Results Database
• Federally and privately supported trials
• Conducted in the United States and 170+ countries
• Mandatory submission for some trials
Current content
• 100,000+ registered trials
• 330 new registrations/week
• 3,000+ results (summary) of approved products
  o Outcome measures
  o Statistical analyses
  o Adverse events
Usage (2010)
• 28,000 visitors per day
Figure: Studies Registered at ClinicalTrials.gov since May 1, 2005
08-SEP-2011 CASC Fall Meeting 8
Repository for NIH-funded GWA studies
As of Aug 2011:
• 161 studies
• 2045 data sets
• 2727 documents
• 5890 analyses
• 128190 variables
As of August 2011:
• 85 million deposited substance records
  o Representing more than 30 million chemically unique compounds
• 500 thousand bioassay records
  o Representing more than 130 million experimental bioactivity results
• Database of biological activities of small molecules
• Repository for data from NIH Molecular Libraries program
ToxMap: Environmental Health Maps
Almost 900, in English & Spanish
~40,000 links
~1,000 drugs, 100 supplements
>170 tutorials, >75 anatomy videos, >125 surgery videos
Since 2006, English & bilingual issues
>40 languages, >250 topics, >3,300 links
Over 100 directories of doctors, hospitals, clinics & libraries
~3,500 articles, >2,000 images
15-20 stories added daily
>1,200 links to ClinicalTrials.gov
MedlinePlus: Trusted Health Information
www.medlineplus.gov
MEDLINEPLUS CONNECT
Links from diagnosis, drug, and laboratory information in an EHR/PHR to relevant material in MedlinePlus.
MEDLINEPLUS MOBILE
Streamlined content tailored to the user's particular type of cell phone or tablet.
MEDLINEPLUS USAGE
150 million visitors in 2010; 420,000 visitors per day.
Figure: Map of 100+ Million visits in the United States in 2010, with per-state visit counts (labeled values: ME 298K, NH 270K, VT 240K, MA 2.2M, RI 307K, CT 834K, NJ 4.1M, DE 117K, MD 1.7M).
Genetic test means an analysis of human DNA, RNA, chromosomes, proteins, or metabolites, if the analysis detects genotypes, mutations, or chromosomal changes. Genetic test does not include an analysis of proteins or metabolites that is directly related to a manifested disease, disorder, or pathological condition.
NLM is Not Alone: Growing interest in data at NIH
“[High throughput technologies] provide us with the opportunity to ask questions that have the word ‘ALL’ in them. What are ALL the transcripts in a cell? What are ALL the protein interactions? ... Those kinds of questions are now approachable, especially if we do the right job of making really powerful databases publicly accessible to all those who need them and empower investigators in small labs as well as big labs to plunge into that kind of mindset.”
- Francis S. Collins, MD, PhD [Director, NIH]
http://report.nih.gov/UploadDocs/Biomed_Info_Resources_FY08_09.pdf
http://report.nih.gov/biennialreport/
http://report.nih.gov/UploadDocs/Biomed_Info_Resources_FY08_09.pdf
Select NIH Data Initiatives
• NDAR – National Database for Autism Research (NIMH)
  – Repository for NIH-funded autism studies and centers of excellence
  – Genomic, phenotypic, imaging data and associated information
• ADNI – Alzheimer’s Disease Neuroimaging Initiative (NIA)
  – Multisite study, public-private partnership, validated biomarkers
  – Centralized fMRI and PET data, linked clinical database
• NIDDK Data Repository
  – Archival datasets from NIDDK-funded studies (diabetes, digestive, kidney)
  – 29 datasets to date; more than 100 access requests in 2009-10
• BTRIS – Biomedical Translational Research Information System (CC)
  – Repository for data from NIH intramural clinical studies
  – Allows aggregation and analysis across multiple Institute studies
Data Sharing Policies
• NIH Public Access Policy (journal articles)
• NIH Data Sharing Policy (data sharing plan)
• NIH GWAS Policy
  – dbGaP
• Clinical Trials Info
  – ClinicalTrials.gov
• NIH Sequence Data Sharing Policy
  – GenBank
  – GEO
• IC or domain-specific policies
  – Autism Research – National Database for Autism Research
  – NIAAA Genetics of Alzheimer’s
  – Alzheimer’s Disease Neuroimaging Initiative (LONI Repository)
  – Others...
Recent Guidance for NIH Data Sharing Plans
http://grants.nih.gov/grants/sharing_key_elements_data_sharing_plan.pdf
NLM 175th Anniversary