Bradley: Open Notebook Science, Georgia Tech OA Week
DESCRIPTION
Jean-Claude Bradley presents on Open Notebook Science: Transparency in Research on October 23, 2012 at Georgia Tech for Open Access Week. Topics include solubility, melting points, a recrystallization app, the Chemical Information Retrieval class at Drexel University and the Open Chemical Property Matrix (OCPM). YouTube recording here: http://youtu.be/XpRyfdNuMrQ
TRANSCRIPT
Open Notebook Science: Transparency in Research
Jean-Claude Bradley
October 23, 2012
Georgia Tech Library
Associate Professor of Chemistry, Drexel University
Open Access Week
Openness in Chemistry
WHY?
Dibenzalacetone derivatives docking against tubulin (paclitaxel site)
(Andrew Lang)
“Simple” aldol condensation synthesis
Top hit (no reports of synthesis)
In top ten (a few reports of synthesis)
(Andrew Lang)
What is the current standard for “sufficient information” in communicating organic chemistry?
By definition, all peer-reviewed published documentation has been approved as sufficient by authors, editors and reviewers.
Searching for aldol condensations of acetone in the Reaction Attempts database
(Andrew Lang)
Information from the literature on the target synthesis
A successful synthesis by avoiding water, dramatically increasing the amount of NaOH and using a long reaction time
An example of a failed experiment in an Open Notebook with useful information
A failed experiment reveals the importance of aldehyde solubility
Motivation: Faster Science, Better Science
An example of a successful experiment in an Open Notebook
Never having to leave the Google Spreadsheet dashboard for access to key info
(Andrew Lang and Rich Apodaca)
A click away from an interactive NMR display (using JCAMP-DX format and ChemDoodle)
(Andrew Lang)
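JCAMP-DX is a plain-text format in which header information is stored as labeled data records of the form ##LABEL=value. As a minimal sketch (not ChemDoodle's code or the viewer shown on the slide), the function below reads those records into a dictionary, normalizing labels the way the JCAMP-DX convention compares them: uppercase, ignoring spaces, dashes, slashes and underscores. It skips numeric data rows and does not decode compressed spectral data; the sample text is made up.

```python
def parse_jcamp_header(text: str) -> dict[str, str]:
    """Collect ##LABEL=value records from JCAMP-DX text into a dict."""
    records: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # numeric data rows and $$ comment lines are not records
        label, _, value = line[2:].partition("=")
        # Normalize the label: uppercase, drop spaces, dashes, slashes, underscores
        key = "".join(ch for ch in label.upper() if ch not in " -/_")
        records[key] = value.strip()
    return records


# Made-up example; a real file would come from an instrument or a database
sample = """##TITLE=example spectrum
##JCAMP-DX=5.01
##DATA TYPE=NMR SPECTRUM
##XYDATA=(X++(Y..Y))
1.0 10 12 15
##END="""
header = parse_jcamp_header(sample)
print(header["DATATYPE"])  # NMR SPECTRUM
```

A viewer would use records such as DATATYPE and XYDATA to decide how to read and plot the data block that follows.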
Contributing to Science while Teaching it: Chemical Information Retrieval Class
The Chemical Information Validation Sheet
567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
Discovering melting point outliers (standard deviation/average)
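The stdev/average measure can be sketched in a few lines of Python. This is a minimal illustration, not the course's spreadsheet logic. It assumes temperatures are converted to kelvin before dividing (an average near 0 C would otherwise make the ratio blow up or change sign), skips compounds with a single measurement, and uses a hypothetical flagging threshold of 0.05. The 4-benzyltoluene values are the ones listed later in this deck; the other entries are made up.

```python
from statistics import mean, stdev


def relative_spread(values_c: list[float]) -> float:
    """stdev/average of one compound's melting points.

    ASSUMPTION: values are converted from Celsius to kelvin first, so the
    average stays positive and away from zero.
    """
    kelvin = [v + 273.15 for v in values_c]
    return stdev(kelvin) / mean(kelvin)


def flag_outliers(data: dict[str, list[float]],
                  threshold: float = 0.05) -> list[tuple[str, float]]:
    """Return (compound, spread) pairs above a hypothetical threshold, worst first."""
    spreads = [(name, relative_spread(vals))
               for name, vals in data.items()
               if len(vals) >= 2]  # stdev needs at least two measurements
    flagged = [pair for pair in spreads if pair[1] > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


data = {
    "4-benzyltoluene": [5, -30, 125, 97.5, -30, 4.58],  # values listed later in the deck
    "hypothetical compound": [100.0, 101.0, 99.0],       # made up
    "single-source compound": [50.0],                    # made up; skipped
}
for name, spread in flag_outliers(data):
    print(f"{name}: {spread:.3f}")
```

Sorting worst-first puts the compounds most in need of a closer look at the top of the list.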
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer
(Andrew Lang)
Outliers
MDPI dataset
EPI (also donated all data to the public)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
American Petroleum Institute: 5 C
PHYSPROP: -30 C
PHYSPROP: 125 C
Peer-reviewed journal (2008): 97.5 C
Government database: -30 C
Government database: 4.58 C
What is the melting point of 4-benzyltoluene?
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 C
Open Lab Notebook page measuring the melting point of 4-benzyltoluene
Ruling out all melting points above -15 C?
Oops – 4-benzyltoluene freezes after 16 days at -15 C!
Measuring the melting point by slowly heating from -15 C gives 5 C
There are NO FACTS, only measurements embedded within assumptions
Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
Open Random Forest modeling of Open Melting Point data using CDK descriptors
(Andrew Lang)
R² = 0.78; TPSA and nHdon are the most important descriptors
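The slide does not say how the R² value was computed (for example on a held-out test set or on out-of-bag predictions). As a reference point, here is a minimal sketch of the usual definition, R² = 1 - SS_res/SS_tot, comparing measured and predicted melting points. The numbers in the usage lines are made up.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual error
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # spread of the data
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0]
print(r_squared(measured, [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r_squared(measured, [2.0, 2.0, 2.0]))  # 0.0: no better than predicting the mean
```

Under this definition an R² of 0.78 means the model's squared errors are 22% of the variance of the measured melting points.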
Melting point prediction service
Web services for summary data
(Andrew Lang)
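The slides do not show what the summary service returns. A hypothetical sketch, with made-up field names, of a per-compound summary that such a web service could serialize as JSON:

```python
import json
from statistics import mean, median, stdev


def summarize_melting_points(compound: str, values_c: list[float]) -> str:
    """Return a JSON summary (hypothetical field names) for one compound."""
    if not values_c:
        raise ValueError("no measurements")
    summary = {
        "compound": compound,
        "n": len(values_c),
        "mean_c": round(mean(values_c), 2),
        "median_c": median(values_c),  # robust to a single wild value
        "stdev_c": round(stdev(values_c), 2) if len(values_c) > 1 else None,
        "min_c": min(values_c),
        "max_c": max(values_c),
    }
    return json.dumps(summary)


print(summarize_melting_points("compound X", [1.0, 2.0, 3.0]))
```

Returning JSON keeps the output easy to consume from a spreadsheet script, a web page or a modeling pipeline.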
Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis
Calling Google Apps Scripts
(Andrew Lang and Rich Apodaca)
Google Apps Scripts for conveniently exploring melting point data
Straight chain carboxylic acids from 1 to 10 carbons
Straight chain alcohols from 1 to 10 carbons
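Homologous series like these are easy to enumerate as SMILES strings before looking up each member's melting point. The scripts in the talk are Google Apps Scripts; this Python sketch is only an illustration, and the function names are hypothetical.

```python
def alcohol_smiles(n_carbons: int) -> str:
    """Straight-chain primary alcohol with n carbons, e.g. CCO for ethanol."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    return "C" * n_carbons + "O"


def carboxylic_acid_smiles(n_carbons: int) -> str:
    """Straight-chain carboxylic acid with n carbons; the last one is the carboxyl carbon."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    return "C" * (n_carbons - 1) + "C(=O)O"


alcohols = [alcohol_smiles(n) for n in range(1, 11)]       # methanol ... 1-decanol
acids = [carboxylic_acid_smiles(n) for n in range(1, 11)]  # formic acid ... decanoic acid
print(alcohols[:3], acids[:2])
```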
Comparison of the model with triple-validated measurements
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation: only a single source available)
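A hypothetical sketch of the flagging rule implied by the slide: count the distinct sources reporting a compound's melting point and flag single-source entries such as cyclobutylamine. It only counts sources; a real check would also compare the values. The labels and source names are made up.

```python
def validation_status(sources: list[str]) -> str:
    """Classify a melting point by the number of distinct sources reporting it."""
    n = len(set(sources))  # the same source listed twice counts once
    if n == 0:
        return "no data"
    if n == 1:
        return "single source: flag for validation"
    if n == 2:
        return "two sources"
    return "triple validated"


print(validation_status(["source A"]))                          # single source: flag for validation
print(validation_status(["source A", "source B", "source C"]))  # triple validated
```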
Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
Google Apps Scripts web services
Chemistry Google Apps Scripts description sheet
(Andrew Lang and Rich Apodaca)
Integration of Multiple Web Services to Recommend Solvents for Reactions
(Andrew Lang)
The Recrystallization App
(Andrew Lang)
The importance of recrystallization
• Generally preferred if there is a known solvent that gives a good yield
• Scales much more easily and cheaply than chromatography
• However, for new compounds much trial and error may be needed
How does it work?
1. Look up the solvent boiling point
2. Look up the room temperature solubility or predict it from Abraham descriptors, which are themselves predicted by a model using the CDK
3. Look up the solute melting point or predict it via a model using the CDK
4. Use the melting point and the room temperature solubility to predict the solubility at the solvent's boiling point
5. Calculate the predicted recrystallization yield
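The five steps can be sketched in Python. This is not the app's code; it is a minimal illustration under stated assumptions. Step 2 uses the Abraham solvation equation in its water-to-solvent form, log10(C_solvent/C_water) = c + eE + sS + aA + bB + vV. Step 4 scales the room-temperature solubility by the ratio of ideal solubilities at the two temperatures, estimated from the melting point with Walden's rule (entropy of fusion of about 56.5 J/(mol K)), assuming the solvent's non-ideality does not change with temperature; applying a mole-fraction ratio to concentrations is itself an approximation. Step 5 takes the yield as the fraction recovered when a solution saturated at the boiling point is cooled to room temperature. All coefficients and inputs in the usage lines are made up.

```python
import math

R = 8.314      # gas constant, J/(mol*K)
DS_FUS = 56.5  # ASSUMED entropy of fusion, J/(mol*K) (Walden's rule)


def abraham_log_ratio(coeffs: dict[str, float], solute: dict[str, float]) -> float:
    """Step 2: log10(C_solvent / C_water) = c + e*E + s*S + a*A + b*B + v*V."""
    return (coeffs["c"] + coeffs["e"] * solute["E"] + coeffs["s"] * solute["S"]
            + coeffs["a"] * solute["A"] + coeffs["b"] * solute["B"]
            + coeffs["v"] * solute["V"])


def ideal_ln_x(t_k: float, mp_k: float) -> float:
    """ln of the ideal mole-fraction solubility; 0 (x = 1) at or above the melting point."""
    if t_k >= mp_k:
        return 0.0
    return -(DS_FUS / R) * (mp_k / t_k - 1.0)


def hot_solubility(s_room: float, room_k: float, bp_k: float, mp_k: float) -> float:
    """Step 4: scale room-temperature solubility to the solvent boiling point.

    ASSUMPTION: solvent-specific non-ideality is temperature-independent, so only
    the ideal-solubility term differs between the two temperatures.
    """
    return s_room * math.exp(ideal_ln_x(bp_k, mp_k) - ideal_ln_x(room_k, mp_k))


def recrystallization_yield(s_room: float, room_k: float, bp_k: float, mp_k: float) -> float:
    """Step 5: fraction recovered from a solution saturated at bp_k and cooled to room_k."""
    s_hot = hot_solubility(s_room, room_k, bp_k, mp_k)
    return (s_hot - s_room) / s_hot


# Usage with made-up inputs (not real data for any solvent or solute)
coeffs = {"c": 0.5, "e": 0.0, "s": 0.0, "a": 0.0, "b": 0.0, "v": 0.0}  # hypothetical solvent
solute = {"E": 0.7, "S": 0.9, "A": 0.6, "B": 0.4, "V": 0.9}            # hypothetical solute
s_water = 0.02                                               # mol/L, hypothetical
s_room = s_water * 10 ** abraham_log_ratio(coeffs, solute)   # step 2
y = recrystallization_yield(s_room, 298.15, 351.0, 395.0)    # bp (step 1), mp (step 3) in K
print(f"predicted yield: {y:.2f}")
```

With these assumptions a higher-boiling solvent always predicts a higher yield, and a solute that is already molten at room temperature predicts zero yield, which matches the intuition that recrystallization needs a large solubility difference between hot and cold.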
The Recrystallization App produces and uses Open Data:
• Open Solubility Collection and Models
• Open Melting Point Collection and Models
• Modeling depends mainly on CDK (Open Source Software with Open Descriptors)
• Open Notebook Science
What are good solvents to recrystallize benzoic acid?
(Andrew Lang)
Click on a solvent to see its temperature curve
(Andrew Lang)
Delivering melting point data via an app
(Andrew Lang)
Chemical Information Retrieval 2012 property assignment
Melting Point Outlier List
Melting Point Outlier example
Solubility Outlier List
Discrepancies in the solubility of benzoic acid in 1-octanol
Using ChemSpider to ensure all stereocenters are defined before searching for properties
Using the InChIKey to find single isomers
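A standard InChIKey has three hyphen-separated blocks: a 14-character hash of the connectivity, a 10-character block whose first 8 characters hash the remaining layers such as stereochemistry and isotopes (followed by the standard flag and version), and a single protonation character. Stereoisomers therefore share the first block, so a full-key search pins down one isomer while a first-block search returns all of them. A minimal sketch; the ethanol key is real, the other keys are made-up placeholders, and the function names are hypothetical.

```python
from collections import defaultdict

NO_EXTRA_LAYERS = "UHFFFAOY"  # second-block prefix when no stereo/isotope layers exist


def split_inchikey(key: str) -> tuple[str, str, str]:
    """Split an InChIKey into its 14-, 10- and 1-character blocks."""
    parts = key.split("-")
    if [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return parts[0], parts[1], parts[2]


def has_stereo_or_isotope_layer(key: str) -> bool:
    return split_inchikey(key)[1][:8] != NO_EXTRA_LAYERS


def group_by_skeleton(keys: list[str]) -> dict[str, list[str]]:
    """Group keys on the connectivity block; stereoisomers land together."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"       # real key for ethanol
skeleton = "A" * 14                           # made-up connectivity block
isomer_1 = skeleton + "-" + "B" * 8 + "SA-N"  # made-up stereoisomer keys
isomer_2 = skeleton + "-" + "C" * 8 + "SA-N"
groups = group_by_skeleton([ethanol, isomer_1, isomer_2])
print(has_stereo_or_isotope_layer(ethanol))  # False
print(len(groups[skeleton]))                 # 2
```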
Chemical Information Validation Sheet 2012
Each entry validated with an image
Avoiding redundant property data points with a single click within the validation sheet
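A minimal sketch of such a duplicate check, assuming (hypothetically) that each row of the validation sheet is identified by compound, property, value and source. The field names and values are made up.

```python
def add_if_new(sheet: list[dict], entry: dict) -> bool:
    """Append entry unless a row with the same compound, property, value and source exists."""
    fields = ("compound", "property", "value", "source")
    key = tuple(entry[f] for f in fields)
    if any(tuple(row[f] for f in fields) == key for row in sheet):
        return False  # redundant data point: skip it
    sheet.append(entry)
    return True


sheet: list[dict] = []
row = {"compound": "compound X", "property": "melting point", "value": 100.0, "source": "source A"}
print(add_if_new(sheet, row))        # True
print(add_if_new(sheet, dict(row)))  # False
```

The same value from a different source is still added, because independent sources are what later validation relies on.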
Open Chemical Property Matrix (OCPM)
logP
Abraham descriptors
Melting point
Aqueous solubility
Octanol solubility
Vapor pressure
Flash point
Boiling point
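One way to picture the OCPM is as a sparse compound-by-property matrix over the properties listed above. A hypothetical sketch that lists the empty cells for each compound and the fraction of compounds that have each property; the compound entries are made up.

```python
OCPM_PROPERTIES = ["logP", "Abraham descriptors", "Melting point", "Aqueous solubility",
                   "Octanol solubility", "Vapor pressure", "Flash point", "Boiling point"]


def missing_cells(matrix: dict[str, dict[str, object]]) -> dict[str, list[str]]:
    """For each compound, list the OCPM properties that have no value yet."""
    return {compound: [p for p in OCPM_PROPERTIES if p not in props]
            for compound, props in matrix.items()}


def coverage(matrix: dict[str, dict[str, object]]) -> dict[str, float]:
    """Fraction of compounds that have a value for each property."""
    n = len(matrix)
    return {p: sum(p in props for props in matrix.values()) / n for p in OCPM_PROPERTIES}


matrix = {  # made-up entries
    "compound X": {"Melting point": 100.0, "logP": 1.5},
    "compound Y": {"Melting point": 50.0},
}
print(missing_cells(matrix)["compound Y"])
print(coverage(matrix)["logP"])  # 0.5
```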
OCPM relationships
OCPM melting point sheet
Conclusions
More openness in chemistry can make science more efficient
Provide interfaces that make sense to the end users:
• Open Data, Open Models and Open Source Software for modelers
• Apps (smartphones, Google Apps Scripts, etc.) for chemists at the bench
Acknowledgements
Andrew Lang (code, modeling)
Bill Acree (modeling, solubility data contribution)
Antony Williams (ChemSpider services, mp data curation)
Matthew McBride and Rida Atif (recrystallization and synthesis)
Kayla Gogarty (OCPM)