Andrew Treloar - The life-sciences as a pathfinder in data-intensive research practice
The advent of the Internet is bringing about fundamental changes in the ways that research is performed and communicated. These changes have been driven in particular by the growing importance of data and by the tools available to work with it. This presentation will examine this shift, drawing on examples from the life-sciences, and try to make some predictions about the next five years. First presented at the 2014 Winter School in Mathematical and Computational Biology (http://bioinformatics.org.au/ws14/program/).
The life-sciences as a pathfinder in data-intensive research practice
Dr Andrew Treloar, Director of Technology
July 10, 2014 CC-BY-SA, @atreloar
Structure of presentation
§ Research Lifecycles
§ Functions of Scholarly Communication
§ Pointers to the future
§ Characterising the future
§ Pathfinder problems
§ Conclusions
So many lifecycles…
July 10, 2014 CC-BY-SA, @hvdsomp and @atreloar
Minimal Research Lifecycle
Think
Do
Share
Sharing: Scholarly Communication System and its Functions
§ Registration
§ Certification
§ Awareness
§ Archiving
(Roosendaal and Geurts, 1997)
System of Journals
§ Registration
  § submission of manuscript
§ Certification
  § peer-review (pre-publication)
  § commentary (post-publication)
§ Awareness
  § discovery services
§ Archiving
  § libraries (print)
  § publishers (electronic)
  § special purpose organisations (e.g. Portico)
Pointers to the future
“the future is already here – it’s just not very evenly distributed”
William Gibson, NPR interview
Registration: bioRxiv
Registration: GitHub
Registration: WikiPathways
Registration: NeuroLex
Registration: Nanopublications
Registration: some observations
§ Decoupling registration from certification
§ Timestamping, versioning
§ Registration of various types of objects
§ Machines as creators and contributors
Certification: PubMed Commons
Certification: PubPeer
Certification: Publons
Certification: some observations
§ Peer-review decoupled from publication process
§ Certification of various types of objects
§ Machines validating form
§ Social endorsement
Awareness: myExperiment
Awareness: eLabNotebook RSS
Awareness: Twitter
Awareness: some observations
§ Awareness for various types of objects
§ Real-time awareness
§ Awareness support targeted at machines
§ Awareness through social media
Archiving: PDB
Archiving: GenBank
Characterising the future
§ Research process: hidden → visible
§ Nature of object: fixed → varying
§ Process of making public: discrete → continuous
§ Speed of communication: delayed → instant
§ Atomicity of object: atomic → compound
§ Communicated object: publication + data proxies → publication + linked data + linked models
§ Nature of process: formal → informal
Fundamental changes
§ The research process (objects, social dimension) is becoming more exposed
§ Articles, books are no longer the only relevant objects for research communication
§ Objects are no longer static
§ Machines are joining humans as (co-)creators and consumers of research objects
Pathfinder problems
§ Integrity of the scholarly record
§ The three obsolescences
  § hardware
  § file format
  § software
System of Journals: Archiving
Web of Objects: Archiving?
Not just citation relationships
The problem of obsolescence
§ The life-sciences research environment can be viewed as undergoing a process of accelerated evolution
§ Other disciplines will hit these problems in time
Cambrian explosion
Hardware obsolescence: Roche 454
Software obsolescence: too much choice, not enough support
Abandonware
§ “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
§ Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16, 2004
File format obsolescence: Illumina
§ Probability of error in basecalling is encoded using ASCII codes to reduce file size
§ The meaning of the ASCII code changed along the life cycle, so data generated at different time points might encode quality differently
§ “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously…
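The offset problem on this slide can be illustrated with a minimal Python sketch. It assumes the usual Phred convention, where a quality score Q corresponds to an error probability of 10^(-Q/10) and each score is stored as the character with code Q + offset. The function names are hypothetical and not taken from any toolkit, and `guess_offset` is only a heuristic: a file of uniformly high-quality offset-33 reads would be misclassified.

```python
from typing import List

SANGER_OFFSET = 33    # "Sanger" convention named on the slide
ILLUMINA_OFFSET = 64  # older Illumina convention named on the slide


def decode_quality(quality: str, offset: int) -> List[int]:
    """Turn a FASTQ quality string into Phred scores for a given offset."""
    scores = [ord(ch) - offset for ch in quality]
    if min(scores) < 0:
        # This is the situation behind "Invalid quality score value":
        # the string was written with a smaller offset than assumed.
        raise ValueError(f"invalid quality score for offset {offset}")
    return scores


def error_probability(score: int) -> float:
    """Phred score Q -> probability that the base call is wrong."""
    return 10 ** (-score / 10)


def guess_offset(quality: str) -> int:
    """Heuristic (an assumption of this sketch): characters below ';'
    (code 59) should not occur in offset-64 data, so their presence
    suggests offset 33. Otherwise assume offset 64."""
    lowest = min(ord(ch) for ch in quality)
    return SANGER_OFFSET if lowest < 59 else ILLUMINA_OFFSET


if __name__ == "__main__":
    same_scores_a = decode_quality("IIII", SANGER_OFFSET)
    same_scores_b = decode_quality("hhhh", ILLUMINA_OFFSET)
    # Two different strings, one meaning: both decode to Phred 40.
    print(same_scores_a, same_scores_b)
    print(error_probability(40))
```

The same string read with the wrong offset either fails outright or, worse, decodes silently to scores that are 31 too high or too low, which is why the encoding has to be recorded alongside the data.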
Everett Rogers, Diffusion of Innovations, 1962
Conclusions
§ Need to move to a smaller number of standard file formats
§ Need to move to a more sustainable model of software development and maintenance
§ Need to encourage platform manufacturers to innovate around the hardware, not the software
§ NOTE: other disciplines are looking to life-sciences to work out how to solve some of these problems
On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
§ Source code available to reviewers
§ Software indexed, citable, available
§ Source code documented
§ Source code managed
§ Test libraries, sample data and dataset repositories available
Questions?
§ [email protected]
§ @atreloar
§ https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice