A Tale of Two Data Catalogs
![Page 1: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/1.jpg)
DATA CATALOGS
Table of Contents
I. NIH Data Discovery Index
• Methodology
• Findings
• Questions raised
II. Institutional Data Interviews
• Methodology
• Findings
III. Outcomes
• Benefits to the library
By: Charles Dickens KEVIN READ
![Page 2: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/2.jpg)
It was the best of times…
![Page 3: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/3.jpg)
NIH Big Data to Knowledge (BD2K)
Facilitating Broad Use of Biomedical Big Data
![Page 4: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/4.jpg)
NIH Data Discovery Index
• Datasets are CITABLE
• Datasets are DISCOVERABLE
• Datasets are LINKED TO THE LITERATURE
• Datasets are PART OF THE RESEARCH ECOSYSTEM
![Page 5: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/5.jpg)
NIH Data Sharing Repositories
http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
![Page 6: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/6.jpg)
Searching for NIH-funded unidentified datasets in PubMed and PMC
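A search of this kind can be sketched against the NCBI E-utilities `esearch` endpoint. The snippet below is a minimal illustration that only builds a query URL and sends no request. The query string is an assumption (the slides do not give the actual search strategy), and the helper names `build_identified_dataset_query` and `build_esearch_url` are hypothetical. It relies on the PubMed field tags `[si]` (Secondary Source ID), `[mh]` (MeSH heading), and `[dp]` (date of publication).

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_identified_dataset_query(year, repositories):
    """Build a PubMed term for articles that already point at a dataset.

    Assumption: an article counts as "identified" if it has a repository
    name in the SI field or carries the Molecular Sequence Data MeSH heading.
    """
    si_clause = " OR ".join(f"{name}[si]" for name in repositories)
    return f'{year}[dp] AND ({si_clause} OR "Molecular Sequence Data"[mh])'


def build_esearch_url(term, db="pubmed", retmax=0):
    """Return an esearch URL; retmax=0 asks only for the hit count."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Illustrative repository names only; not the full list used in the project.
term = build_identified_dataset_query(2011, ["genbank", "pdb", "geo"])
url = build_esearch_url(term)
print(url)
```

Articles matching such a query would be the ones to exclude, leaving those whose datasets are not yet identified.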
![Page 7: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/7.jpg)
NIH-funded articles for 2011
[Funnel diagram. Counts shown, largest to smallest: 113,089; 88,592; 78,901; 75,441; 71,913; 71,680; 69,857. Exclusion steps labeled: Non-PMC Articles; Non-research Articles; Molecular Sequence Data MH; SI Field; XML; PMC Acknowledgements. Final label: Remaining articles with unidentified datasets.]
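The arithmetic behind the funnel can be sketched as follows. This assumes the seven counts on the slide are successive levels of a single exclusion funnel, ordered from largest to smallest. It does not assume which exclusion step produced which drop, because the slide text does not make that pairing recoverable.

```python
# Counts as they appear on the slide, sorted from largest to smallest.
# Assumption: each count is what remains after one more exclusion step.
counts = [113089, 88592, 78901, 75441, 71913, 71680, 69857]


def successive_drops(levels):
    """Number of articles removed between each pair of adjacent levels."""
    return [before - after for before, after in zip(levels, levels[1:])]


drops = successive_drops(counts)
total_excluded = counts[0] - counts[-1]
remaining_share = counts[-1] / counts[0]

for before, dropped in zip(counts, drops):
    print(f"{before:>7,} -> removed {dropped:>6,}")
print(f"Total excluded: {total_excluded:,}")
print(f"Share remaining: {remaining_share:.1%}")
```

Under that assumption, roughly 62% of the starting set remains after all exclusions.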
![Page 8: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/8.jpg)
SI Field
[Bar chart: excluded articles by repository named in the SI field. Repositories shown: ClinicalTrials.gov, PDB, GEO, GenBank, PubChem, RefSeq, ISRCTN, OMIM. Axis: 0 to 1,600.]
![Page 9: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/9.jpg)
PMC Acknowledgements
[Bar chart: excluded keywords found in PMC acknowledgements. Keywords shown: PDB, ClinicalTrials.gov, GenBank, GEO, IRD, MGI, DIP, Flybase, dbGaP, SRA, WormBase, MPD, NURSA, RGD, ICPSR, VectorBase. Axis: 0 to 800.]
![Page 10: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/10.jpg)
XML Keyword
[Bar chart: excluded keywords found in article XML. Keywords shown: GenBank, PDB, GEO, dbSNP, ClinicalTrials.gov, RGD, Flybase, SRA, DIP, dbGaP, WormBase, MGI, BioGRID, VectorBase, Multiple Keywords. Axis: 0 to 600.]
Examples of multiple keywords:
• FlyBase:GeneNetwork:Mouse Genome Informatics:Neuroscience Information Framework:Rat Genome Database:WormBase:Zebrafish Model Organism Database
• GenBank:PDB
![Page 11: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/11.jpg)
NIH-sponsored data repositories now added to PubMed and PMC search indexes
![Page 12: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/12.jpg)
383
• What category of dataset was used for the research described in the article?
• Were live human or animal subjects used in the collection of the data?
• What were the subject(s) of study (from which or whom the data was collected)?
• If new dataset(s) were created, what type(s) of data were collected?
• What existing dataset(s), if any, were used?
• How many datasets are there in each article?
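The annotation questions map naturally onto one record per article. The sketch below is illustrative only: the class `ArticleAnnotation`, its field names, and the sample records are all hypothetical and are not taken from the project's actual annotation tool or data.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ArticleAnnotation:
    """One annotator's answers for one article (hypothetical schema)."""
    article_id: str                    # placeholder identifier, e.g. a PMID
    dataset_category: str              # category of dataset used
    live_subjects: bool                # live human or animal subjects used?
    subjects_of_study: List[str] = field(default_factory=list)
    new_data_types: List[str] = field(default_factory=list)
    existing_datasets: List[str] = field(default_factory=list)
    dataset_count: int = 0             # datasets found in the article


def average_datasets_per_article(annotations):
    """Mean number of datasets across the annotated articles."""
    return mean(a.dataset_count for a in annotations)


# Made-up records for illustration; not data from the study.
sample = [
    ArticleAnnotation("A1", "new", True, ["mice"], ["Image"], [], 2),
    ArticleAnnotation("A2", "existing", False, [], [], ["GenBank"], 3),
    ArticleAnnotation("A3", "new", True, ["humans"], ["Clinical Measures"], [], 4),
]
print(average_datasets_per_article(sample))
```

Keeping the dataset count as an explicit field makes a per-article average straightforward to compute once annotators agree on what counts as one dataset.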
![Page 13: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/13.jpg)
Measuring blood pressure in mice
Measuring left hemisphere of brain for growth factor
Staining and imaging
Analysis of images using software
![Page 14: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/14.jpg)
Results
![Page 15: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/15.jpg)
Average number of datasets per article: 2.92
![Page 16: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/16.jpg)
% of datasets that use live subjects: 54%
• Human: 51%
• Animal: 49%
![Page 17: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/17.jpg)
% of new data: 87%
% of data created using pre-existing datasets: 13%
![Page 18: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/18.jpg)
It was the worst of times…
![Page 19: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/19.jpg)
Data Types
• Image
• Genetic or Genomic
• Chemical
• Biochemical
• Electrical (Electrophysiological)
• Optical – non-image
• Behavioral
• Computational Simulation or model
• Magnetic Resonance – non-image
• Structural
• Physiological
• Questionnaire/Survey
• Clinical Measures
• Geospatial
INSUFFICIENT
![Page 20: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/20.jpg)
Inter-rater Reliability: 43%
[Bar chart: total number of datasets found per 25 articles, comparing Total # of datasets (High) with Total # of datasets (Low). Axis: 0 to 800.]
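The slide does not say how the reliability figure was calculated. As an illustration only, one simple and common measure is percent agreement: the share of articles for which two annotators report the same number of datasets. The function name and the sample counts below are hypothetical and are not the study's data or necessarily its method.

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same value."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same items")
    if not rater_a:
        raise ValueError("At least one item is required")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Made-up dataset counts per article from two annotators.
annotator_1 = [2, 3, 1, 4, 2]
annotator_2 = [2, 5, 1, 3, 2]
agreement = percent_agreement(annotator_1, annotator_2)
print(f"{agreement:.0%}")
```

Percent agreement does not correct for chance agreement, so a chance-corrected statistic may be preferable when the range of possible counts is small.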
![Page 21: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/21.jpg)
How do we define a data set?
Dataset
![Page 22: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/22.jpg)
How do we define a data set?
Datasets
![Page 23: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/23.jpg)
How do we define a data set?
Datasets
![Page 24: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/24.jpg)
Where in the collection/processing pipeline should data be described?
![Page 25: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/25.jpg)
Book of the Second
Understanding institutional data challenges via interviews
![Page 26: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/26.jpg)
Institutional Data Catalog
• Organize and describe institutional research data
• Promote collaboration within the institution
• Promote a culture of sharing and transparency
![Page 27: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/27.jpg)
Methodology
• Literature review
• ID researchers/PIs using active grant system
• Analyzed datasets in researcher papers before interviews
– Used NIH Data Discovery Index method
![Page 28: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/28.jpg)
Understand your researchers
BASIC SCIENCE RESEARCHERS
CLINICAL RESEARCHERS
![Page 29: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/29.jpg)
Data Interviews
Challenges Organizing Data – Basic Science Researchers
• Postdoc or student leaves with data
• Lack of standards/procedures
• Size of data
• Messiness/Disconnect between datasets
• Too challenging
[Bar chart; axis 0 to 7.]
![Page 30: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/30.jpg)
Data Interviews
Challenges Preserving Data – Basic Science Researchers
• Storage expense
• Changes in software
• Lack of IT resources
• Lack of preservation procedures (readme, plans, postdoc etc.)
• Data in multiple storage locations
• Storage space
[Bar chart; axis 0 to 6.]
![Page 31: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/31.jpg)
Data Interviews
Challenges Organizing Data – Clinical Researchers
• Data quality
• Messiness/Disconnect between datasets
• Poor data output formats
• Can't search data
• Data loss
• Team miscommunication on who's using data
[Bar chart; axis 0 to 6.]
![Page 32: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/32.jpg)
Data Interviews
Experience with Data Sharing
• Collaboration only
• Unknown parties
• Data repository
• General public
• Primary results only
• Do not share
[Bar chart comparing Basic Science and Clinical researchers; axis 0 to 9.]
![Page 33: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/33.jpg)
Only the best of times…
How the library benefitted from this exercise
![Page 34: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/34.jpg)
Identified group to pilot institutional data catalog – Population Health
![Page 35: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/35.jpg)
Acquired new opportunities for teaching data management
![Page 36: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/36.jpg)
Developing a lab tool for basic scientists to manage metadata
![Page 37: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/37.jpg)
Developed a better understanding of researcher needs and challenges
![Page 38: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/38.jpg)
Acknowledgements
BD2K Project
• Lou Knecht, Jim Mork, Kathel Dunn, Betsy Humphreys, Jerry Sheehan, Mike Huerta, Dr. Donald Lindberg
Annotators
• Preeti Kochar, Helen Ochej, Susan Schmidt, Melissa Yorks, Shari Mohary, Olga Printseva, Janice Ward, Oleg Rodionov, Sally Davidson, Jennie Larkin, Peter Lyster, Matt McAuliffe, Greg Farber, Betsy Humphreys, Jerry Sheehan, Mike Huerta, Lou Knecht, Suzy Roy, Swapna Abhyankar, Olivier Bodenreider, Karen Gutzman, Dina Demner Fusman, Laritza Rodriguez, Sonya Shooshan, Samantha Tate, Matthew Simpson, Tracy Edinger, Olubumi Akiwumi, Mary Ann Hantakas, Corinn Sinnott
![Page 39: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/39.jpg)
![Page 40: A Tale of Two Data Catalogs](https://reader033.vdocuments.site/reader033/viewer/2022061105/53edd5bd8d7f7289708b5ec6/html5/thumbnails/40.jpg)
Images
• Ponderings for All Things Blog. 2010. Available from: http://ponderingsofallthings.blogspot.com/2010/05/tale-of-two-cities-charles-dickens.html
• Reading Charles Dickens Blog. Manette in Bastille. 2012. Available from: http://readingcharlesdickens.com/wp-content/uploads/2012/07/Manette-in-Bastille-253x300.jpg
• Grandma’s Graphics. Old Scrooge sat busy in his counting-house. 2000. Available from: http://www.grandmasgraphics.com/graphics/childrens/childrens379_2000.jpg
• Sungardas Blog. Apple to Orange. 2010. Available from: http://blog.sungardas.com/wp-content/uploads/Apple-to-Orange.jpg
• Patel R. Questions?. Flickr. 2007. Available from: https://www.flickr.com/photos/23679420@N00/545653437/
• Biomedical Engineering Laboratory. Wikimedia. 2012. Available from: http://upload.wikimedia.org/wikipedia/commons/a/a3/Biomedical_Engineering_Laboratory.jpg