![Page 1: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/1.jpg)
Big Data Supporting Drug Discovery
Cautionary Tales from the World of Chemistry for Translational Informatics
Valery Tkachenko
RSC-CSIR/OSDD meeting
Pune, India
February 3rd 2014
![Page 2: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/2.jpg)
• Big Data
• Chemical Space
• Drug Discovery pipeline
• Machine learning
• Training sets
• RSC/ChemSpider platforms
• RSC/Archive
• Research data management
• Data quality, crowdsourcing and AltMetrics
• Building Global Chemistry Network
![Page 3: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/3.jpg)
![Page 5: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/5.jpg)
![Page 6: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/6.jpg)
Chemical space - 10^60
![Page 7: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/7.jpg)
Navigation in chemical space
![Page 8: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/8.jpg)
Navigation in chemical space
![Page 9: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/9.jpg)
![Page 10: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/10.jpg)
Structure-based Drug Design
![Page 11: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/11.jpg)
Structure-based Drug Design
![Page 12: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/12.jpg)
Ligand-based Drug Design
![Page 13: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/13.jpg)
Ligand-based Drug Design
![Page 14: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/14.jpg)
![Page 15: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/15.jpg)
Machine learning
![Page 16: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/16.jpg)
Applied machine learning
![Page 17: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/17.jpg)
![Page 18: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/18.jpg)
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching
![Page 19: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/19.jpg)
ChemSpider
![Page 20: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/20.jpg)
ChemSpider
![Page 21: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/21.jpg)
Properties - experimental
![Page 22: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/22.jpg)
Properties - ACDLabs
![Page 23: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/23.jpg)
Properties – EPI Suite
![Page 24: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/24.jpg)
Properties - ChemAxon
![Page 25: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/25.jpg)
Literature references
![Page 26: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/26.jpg)
Patent references
![Page 27: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/27.jpg)
Books
![Page 28: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/28.jpg)
Classification
![Page 29: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/29.jpg)
Chemical vendors and data sources
![Page 30: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/30.jpg)
Multimedia
![Page 31: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/31.jpg)
![Page 32: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/32.jpg)
ChemSpider Reactions
![Page 33: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/33.jpg)
ChemSpider Reactions
![Page 34: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/34.jpg)
ChemSpider Reactions
![Page 35: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/35.jpg)
ChemSpider Reactions
![Page 36: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/36.jpg)
ChemSpider Spectra
![Page 37: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/37.jpg)
ChemSpider Spectra
![Page 38: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/38.jpg)
ChemSpider Databases
ChemSpider Compounds
ChemSpider Reactions
ChemSpider Spectra
ChemSpider Crystals
ChemSpider Materials
ChemSpider Assays
ChemSpider Algorithms
![Page 39: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/39.jpg)
Research data inflow
Deposition channels: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc.; LabTrove and other templated data; Documents; API, FTP, etc.
Deposition Gateway: Compounds Module, Spectra Module, Reactions Module, Materials Module, Text-mining Module, … Module
Staging databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP
Data states: Raw data, Staging databases, Validated data
All databases are sliced by data source/data collection and have a simple security model in which each data slice/source is private, public, or embargoed.
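The slice-level security model on this slide (each data slice/source is private, public, or embargoed) could be sketched roughly as follows. This is only an illustration, not the RSC implementation: the `DataSlice` class, its fields, the `can_read` rule, and the assumption that an embargo lifts on a fixed date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    """One data source/collection within a database (hypothetical model)."""
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # assumed: embargo lifts on this date


def can_read(slice_: DataSlice, user: str, today: date) -> bool:
    """Return True if `user` may read records in this slice."""
    if user == slice_.owner:
        return True
    if slice_.visibility is Visibility.PUBLIC:
        return True
    if slice_.visibility is Visibility.EMBARGOED:
        # Assumption: an embargoed slice becomes readable once the date passes.
        return slice_.embargo_until is not None and today >= slice_.embargo_until
    return False  # PRIVATE: owner only


# Made-up slices for illustration.
slices = [
    DataSlice("journal-depositions", "rsc", Visibility.PUBLIC),
    DataSlice("lab-notebook", "alice", Visibility.PRIVATE),
    DataSlice("thesis-spectra", "bob", Visibility.EMBARGOED, date(2015, 1, 1)),
]
visible = [s.name for s in slices if can_read(s, "carol", date(2014, 2, 3))]
print(visible)
```

Attaching visibility to the slice rather than to individual records keeps the access check to a single lookup per data source, which is one plausible reading of "simple security model".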
![Page 40: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/40.jpg)
Research data outflow
Data tier: Compounds, Reactions, Spectra, Materials, Documents
Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc.)
![Page 41: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/41.jpg)
![Page 42: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/42.jpg)
RSC Archive – since 1841
![Page 43: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/43.jpg)
DERA - Digitally Enabling RSC Archive
![Page 44: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/44.jpg)
Semantic mark-up of articles
![Page 45: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/45.jpg)
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known pathways?
Working on now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
![Page 46: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/46.jpg)
Data quality issue and CVSP
– Robochemistry
– Proliferation of errors in public and private databases
– Automated quality control system
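An automated quality-control pass of the kind named on this slide can be approximated by a small set of rules run over every record. The sketch below is not CVSP and does not reproduce its rules. The record keys (`id`, `name`, `inchikey`) and both checks are assumptions chosen for illustration: one flags identifiers that do not match the standard InChIKey layout, the other flags a name attached to more than one structure, a common way errors spread between databases.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def check_records(records):
    """Return a list of (record_id, message) issues found in `records`.

    Each record is a dict with hypothetical keys: id, name, inchikey.
    """
    issues = []
    keys_by_name = defaultdict(set)
    for rec in records:
        key = rec.get("inchikey", "")
        if not INCHIKEY_RE.match(key):
            issues.append((rec["id"], "malformed InChIKey"))
        else:
            keys_by_name[rec["name"].strip().lower()].add(key)
    # The same name pointing at different structures suggests a propagated error.
    for rec in records:
        name = rec["name"].strip().lower()
        if len(keys_by_name[name]) > 1:
            issues.append((rec["id"], "name maps to multiple structures"))
    return issues


# Made-up records with deliberate errors: record 2 carries caffeine's key under
# the name "Aspirin", and record 3 has a truncated key.
records = [
    {"id": 1, "name": "aspirin", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"id": 2, "name": "Aspirin", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    {"id": 3, "name": "caffeine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA"},
]
issues = check_records(records)
for record_id, message in issues:
    print(record_id, message)
```

Rules like these only detect inconsistencies. Deciding which of two conflicting records is correct still needs curation, which is where the crowdsourcing discussed later in the deck comes in.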
![Page 47: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/47.jpg)
DrugBank dataset (6516 records)
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
DB06287
![Page 48: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/48.jpg)
![Page 49: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/49.jpg)
Research data management
University 1: Data Hub, Workstations
University 2: Data Hub, Workstations
Company 3: Data Hub, Workstations
Data Repository: indexed storage
Data Repository: provided data storage
Chemically intelligent services
Indexes
Data
External clients
Publishers
Scientists
Funding bodies
![Page 50: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/50.jpg)
![Page 51: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/51.jpg)
Crowdsourcing
![Page 52: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/52.jpg)
AltMetrics
![Page 53: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/53.jpg)
RSC/Rewards and Recognition
Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
![Page 54: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/54.jpg)
• Big Data
• Chemical Space
• Drug Discovery pipeline
• Machine learning
• Training sets
• RSC/ChemSpider platforms
• RSC/Archive
• Research data management
• Visualization and navigation
• Building Global Chemistry Network
![Page 55: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/55.jpg)
Visualization
![Page 56: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/56.jpg)
Visualization and navigation
![Page 58: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/58.jpg)
![Page 59: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/59.jpg)
We are a part of a larger world
![Page 60: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/60.jpg)
ChemSpider APIs
![Page 61: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/61.jpg)
National Chemistry Database
![Page 62: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/62.jpg)
http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses.
The Semantic Web is one of the cornerstones.
![Page 63: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/63.jpg)
![Page 64: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics](https://reader038.vdocuments.site/reader038/viewer/2022110118/554e7d92b4c9054a698b529a/html5/thumbnails/64.jpg)
OSDD