
Open Notebook Science: Transparency in Research. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. October 23, 2012, Georgia Tech Library, Open Access Week.

Upload: jean-claude-bradley

Post on 22-Dec-2014


DESCRIPTION

Jean-Claude Bradley presents on Open Notebook Science: Transparency in Research on October 23, 2012 at Georgia Tech for Open Access Week. Topics include solubility, melting points, a recrystallization app, the Chemical Information Retrieval class at Drexel University and the Open Chemical Property Matrix (OCPM). YouTube recording here: http://youtu.be/XpRyfdNuMrQ

TRANSCRIPT

Page 1: Bradley Open Notebook Science Georgia Tech OA week

Open Notebook Science: Transparency in Research

Jean-Claude Bradley

October 23, 2012

Georgia Tech Library

Associate Professor of Chemistry, Drexel University

Open Access Week

Page 2: Bradley Open Notebook Science Georgia Tech OA week

Openness in Chemistry

WHY?

Page 3: Bradley Open Notebook Science Georgia Tech OA week

Dibenzalacetone derivatives docking against tubulin (paclitaxel site)

(Andrew Lang)

Page 4: Bradley Open Notebook Science Georgia Tech OA week

“Simple” aldol condensation synthesis

Top Hit (no reports of synthesis)

In top ten (a few reports of synthesis)

(Andrew Lang)

Page 5: Bradley Open Notebook Science Georgia Tech OA week

What is the current standard for “sufficient information” in communicating organic chemistry?

By definition, all peer-reviewed published documentation has been approved as sufficient by authors, editors and reviewers.

Page 6: Bradley Open Notebook Science Georgia Tech OA week

Searching for aldol condensations of acetone in the Reaction Attempts database

(Andrew Lang)

Page 7: Bradley Open Notebook Science Georgia Tech OA week

Information from the literature on the target synthesis

Page 8: Bradley Open Notebook Science Georgia Tech OA week

Information from the literature on the target synthesis

Page 9: Bradley Open Notebook Science Georgia Tech OA week

Information from the literature on the target synthesis

Page 10: Bradley Open Notebook Science Georgia Tech OA week

A successful synthesis by avoiding water, dramatically increasing the amount of NaOH and using a long reaction time

Page 11: Bradley Open Notebook Science Georgia Tech OA week

An example of a failed experiment in an Open Notebook with useful information

Page 12: Bradley Open Notebook Science Georgia Tech OA week

A failed experiment reveals the importance of aldehyde solubility

Page 13: Bradley Open Notebook Science Georgia Tech OA week

Motivation: Faster Science, Better Science

Page 14: Bradley Open Notebook Science Georgia Tech OA week

An example of a successful experiment in an Open Notebook

Page 15: Bradley Open Notebook Science Georgia Tech OA week

Never having to leave the Google Spreadsheet dashboard for access to key info

(Andrew Lang and Rich Apodaca)

Page 16: Bradley Open Notebook Science Georgia Tech OA week

A click away from an interactive NMR display (using JCAMP-DX format and ChemDoodle)

(Andrew Lang)

Page 17: Bradley Open Notebook Science Georgia Tech OA week

Contributing to Science while Teaching it:

Chemical Information Retrieval Class

Page 18: Bradley Open Notebook Science Georgia Tech OA week

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 19: Bradley Open Notebook Science Georgia Tech OA week

Discovering outliers for melting points (stdev/average)
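
The stdev/average screen named on this slide can be sketched as follows. This is a minimal illustration, not the class's actual spreadsheet: the function names, the 0.05 threshold and the "hypothetical compound X" entry are assumptions made for the example, and the conversion to kelvin is also an assumption, chosen so that the average cannot sit near zero. The six 4-benzyltoluene values are the ones quoted later in this talk.

```python
from statistics import mean, stdev

# Reported melting points in degrees Celsius, keyed by compound name.
reported_mp_c = {
    # Six values quoted for 4-benzyltoluene later in this talk.
    "4-benzyltoluene": [5.0, -30.0, 125.0, 97.5, -30.0, 4.58],
    # Hypothetical compound with consistent sources, for contrast only.
    "hypothetical compound X": [80.0, 81.0, 80.5],
}


def relative_spread(values_c):
    """Return stdev/average for a list of melting points, computed in kelvin."""
    kelvin = [v + 273.15 for v in values_c]
    return stdev(kelvin) / mean(kelvin)


def flag_outliers(data, threshold=0.05):
    """Return names of compounds whose stdev/average exceeds the threshold."""
    return [
        name
        for name, values in data.items()
        # stdev needs at least two measurements.
        if len(values) >= 2 and relative_spread(values) > threshold
    ]


flagged = flag_outliers(reported_mp_c)
print(flagged)
```

A compound flagged this way is only a candidate for checking the original sources; the ratio says the sources disagree, not which one is right.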

Page 20: Bradley Open Notebook Science Georgia Tech OA week

Investigating the m.p. inconsistencies of EGCG

Page 21: Bradley Open Notebook Science Georgia Tech OA week

Investigating the m.p. inconsistencies of cyclohexanone

Page 22: Bradley Open Notebook Science Georgia Tech OA week

Most popular data sources

Page 23: Bradley Open Notebook Science Georgia Tech OA week

Alfa Aesar donates melting points to the public

Page 24: Bradley Open Notebook Science Georgia Tech OA week

Open Melting Point Explorer

(Andrew Lang)

Page 25: Bradley Open Notebook Science Georgia Tech OA week

Outliers

MDPI dataset

EPI (also donated all data to the public)

Page 26: Bradley Open Notebook Science Georgia Tech OA week

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 27: Bradley Open Notebook Science Georgia Tech OA week

Inconsistencies and SMILES problems within MDPI dataset

Page 28: Bradley Open Notebook Science Georgia Tech OA week

MDPI Dataset labeled with High Trust Level

Page 29: Bradley Open Notebook Science Georgia Tech OA week

Open Melting Point Datasets

Currently 20,000 compounds with Open MPs

Page 30: Bradley Open Notebook Science Georgia Tech OA week

American Petroleum Institute: 5 C
PHYSPROP: -30 C
PHYSPROP: 125 C
peer reviewed journal (2008): 97.5 C
government database: -30 C
government database: 4.58 C

What is the melting point of 4-benzyltoluene?

Page 31: Bradley Open Notebook Science Georgia Tech OA week

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature, and can be frozen below -30 C

Page 32: Bradley Open Notebook Science Georgia Tech OA week

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 33: Bradley Open Notebook Science Georgia Tech OA week

Ruling out all melting points above -15C?

Page 34: Bradley Open Notebook Science Georgia Tech OA week

Oops – 4-benzyltoluene freezes after 16 days at -15C!

Page 35: Bradley Open Notebook Science Georgia Tech OA week

Measuring the melting point by slowly heating from -15 C gives 5 C

Page 36: Bradley Open Notebook Science Georgia Tech OA week

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 37: Bradley Open Notebook Science Georgia Tech OA week

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78; TPSA and nHdon are the most important descriptors

Page 38: Bradley Open Notebook Science Georgia Tech OA week

Melting point prediction service

Page 39: Bradley Open Notebook Science Georgia Tech OA week

Web services for summary data

(Andrew Lang)

Page 40: Bradley Open Notebook Science Georgia Tech OA week

Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis

Page 41: Bradley Open Notebook Science Georgia Tech OA week

Calling Google Apps Scripts

Page 42: Bradley Open Notebook Science Georgia Tech OA week

Calling Google Apps Scripts

(Andrew Lang and Rich Apodaca)

Page 43: Bradley Open Notebook Science Georgia Tech OA week

Google Apps Scripts for conveniently exploring melting point data

Page 44: Bradley Open Notebook Science Georgia Tech OA week

Straight chain carboxylic acids from 1 to 10 carbons

Straight chain alcohols from 1 to 10 carbons

Comparison of model with triple validated measurements

Page 45: Bradley Open Notebook Science Georgia Tech OA week

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Page 46: Bradley Open Notebook Science Georgia Tech OA week

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 47: Bradley Open Notebook Science Georgia Tech OA week

Google Apps Scripts web services

Page 48: Bradley Open Notebook Science Georgia Tech OA week

Chemistry Google Apps Scripts description sheet

(Andrew Lang and Rich Apodaca)

Page 49: Bradley Open Notebook Science Georgia Tech OA week

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 50: Bradley Open Notebook Science Georgia Tech OA week

The Recrystallization App

(Andrew Lang)

Page 51: Bradley Open Notebook Science Georgia Tech OA week

The importance of recrystallization

• Generally preferred if there is a known solvent that gives a good yield

• Scales much more easily and cheaply than chromatography

• However, for new compounds much trial and error may be needed

Page 52: Bradley Open Notebook Science Georgia Tech OA week

How does it work?

1. Look up the solvent boiling point

2. Look up the room temperature solubility or predict it via Abraham descriptors predicted from a model using the CDK

3. Look up the solute melting point or predict it via a model using the CDK

4. Use the melting point and the solubility at room temperature to predict the solubility at boiling

5. Calculate the predicted recrystallization yield
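
Steps 4 and 5 can be sketched as below. The slides do not state which equation the app uses, so this is an assumption, not the app's method: it uses the ideal-solubility temperature dependence with the enthalpy of fusion estimated from the melting point via Walden's rule (about 56.5 J/(mol K) entropy of fusion), and it treats the ratio of concentrations as equal to the ratio of mole fractions. The function names and the benzoic acid input values are illustrative only.

```python
import math

GAS_CONSTANT = 8.314    # J/(mol K)
WALDEN_ENTROPY = 56.5   # J/(mol K); Walden's rule estimate (assumption)


def solubility_at_boiling(sol_room, mp_k, bp_k, room_k=298.15):
    """Scale a room-temperature solubility up to the solvent boiling point.

    sol_room: solubility at room temperature (any concentration unit)
    mp_k:     solute melting point in kelvin
    bp_k:     solvent boiling point in kelvin
    """
    # Above its melting point the solute is a liquid, so cap the hot
    # temperature at the melting point.
    hot_k = min(bp_k, mp_k)
    # Walden's rule: enthalpy of fusion ~ entropy of fusion * melting point.
    dh_fus = WALDEN_ENTROPY * mp_k
    factor = math.exp(dh_fus / GAS_CONSTANT * (1.0 / room_k - 1.0 / hot_k))
    return sol_room * factor


def recrystallization_yield(sol_room, sol_boil):
    """Fraction recovered when a solution saturated at boiling cools to room temp."""
    return 1.0 - sol_room / sol_boil


# Illustrative inputs for benzoic acid in water (not measurements):
# melting point about 395.5 K, water boiling at 373.15 K.
sol_room = 0.03  # illustrative room-temperature solubility, mol/L
sol_boil = solubility_at_boiling(sol_room, mp_k=395.5, bp_k=373.15)
predicted_yield = recrystallization_yield(sol_room, sol_boil)
print(f"predicted yield: {predicted_yield:.0%}")
```

Under these assumptions the yield depends only on the ratio of the two solubilities, so the unit chosen for the room-temperature value cancels out.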

Page 53: Bradley Open Notebook Science Georgia Tech OA week

The Recrystallization App produces and uses Open Data:

• Open Solubility Collection and Models

• Open Melting Point Collection and Models

• Modeling depends mainly on CDK (Open Source Software with Open Descriptors)

• Open Notebook Science

Page 54: Bradley Open Notebook Science Georgia Tech OA week

What are good solvents to recrystallize benzoic acid?

(Andrew Lang)

Page 55: Bradley Open Notebook Science Georgia Tech OA week

Click on the solvent to see temp curve

(Andrew Lang)

Page 56: Bradley Open Notebook Science Georgia Tech OA week

Deliver melting point data via App

(Andrew Lang)

Page 57: Bradley Open Notebook Science Georgia Tech OA week

Chemical Information Retrieval 2012 property assignment

Page 58: Bradley Open Notebook Science Georgia Tech OA week

Melting Point Outlier List

Page 59: Bradley Open Notebook Science Georgia Tech OA week

Melting Point Outlier example

Page 60: Bradley Open Notebook Science Georgia Tech OA week

Solubility Outlier List

Page 61: Bradley Open Notebook Science Georgia Tech OA week

Solubility of benzoic acid in 1-octanol discrepancies

Page 62: Bradley Open Notebook Science Georgia Tech OA week

Using ChemSpider to ensure all stereocenters are defined before searching for properties

Page 63: Bradley Open Notebook Science Georgia Tech OA week

Using the InChIKey to find single isomers
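
As a rough sketch of why the InChIKey helps here: a standard InChIKey has three hyphen-separated blocks, and the first 14-character block encodes the connectivity, so stereoisomers share it while the second block differs. The helper names below are hypothetical and this is not the ChemSpider workflow shown on the slide. The alanine and ethanol keys are given to the best of my knowledge and should be checked against a lookup service before being relied on. Note that a second block other than the "empty" one can also indicate isotopic layers, not only stereochemistry.

```python
from collections import defaultdict

# Start of the second block when no stereo or isotopic layers are present.
NO_EXTRA_LAYERS_PREFIX = "UHFFFAOY"


def split_inchikey(key):
    """Split a standard InChIKey into its three blocks, checking lengths."""
    parts = key.split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return parts[0], parts[1], parts[2]


def group_by_skeleton(keys):
    """Group InChIKeys that share the same connectivity (first) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


def has_extra_layers(key):
    """True if the key carries stereo and/or isotopic information."""
    return not split_inchikey(key)[1].startswith(NO_EXTRA_LAYERS_PREFIX)


keys = [
    "QNAYBMKLOCPYGJ-REOHZLFVSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereochemistry undefined
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
]
groups = group_by_skeleton(keys)
for skeleton, members in groups.items():
    print(skeleton, [k for k in members if has_extra_layers(k)])
```

Searching on the full key returns one specific isomer, while searching on the first block alone returns every record with that skeleton.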

Page 64: Bradley Open Notebook Science Georgia Tech OA week

Chemical Information Validation Sheet 2012

Page 65: Bradley Open Notebook Science Georgia Tech OA week

Each entry validated with an image

Page 66: Bradley Open Notebook Science Georgia Tech OA week

Avoiding redundant property data points with a single click within the validation sheet

Page 67: Bradley Open Notebook Science Georgia Tech OA week

Open Chemical Property Matrix (OCPM)

logP

Abraham descriptors

Melting point

Aqueous solubility

Octanol solubility

Vapor pressure

Flash point

Boiling point

Page 68: Bradley Open Notebook Science Georgia Tech OA week

Open Chemical Property Matrix (OCPM)

Page 69: Bradley Open Notebook Science Georgia Tech OA week

OCPM relationships

Page 70: Bradley Open Notebook Science Georgia Tech OA week

OCPM melting point sheet

Page 71: Bradley Open Notebook Science Georgia Tech OA week

Conclusions

More openness in chemistry can make science more efficient

Provide interfaces that make sense to the end users:

Open Data, Open Models and Open Source Software to modelers

Apps (smartphones, Google Apps Scripts, etc.) for chemists at the bench

Acknowledgements

Andrew Lang (code, modeling)

Bill Acree (modeling, solubility data contribution)

Antony Williams (ChemSpider services, mp data curation)

Matthew McBride and Rida Atif (recrystallization and synthesis)

Kayla Gogarty (OCPM)