Bradley SLA talk on open melting point collections
DESCRIPTION
Jean-Claude Bradley presented at a panel on New Forms of Scholarly Communication in Science at the Special Libraries Association meeting on June 15, 2011. The talk covered the role of trust in science, with a focus on the validation of melting point data. Where the literature was unable to reconcile measurements, Open Notebook Science was used to clarify. The collection of an Open Dataset of melting point measurements for 20,000 compounds was described, as well as ongoing curation efforts and corresponding web services. (Collaborators: Andrew Lang and Antony Williams)
TRANSCRIPT
New Forms of Scholarly Communication in Science
The Role of Trust
June 15, 2011
Special Libraries Association
Jean-Claude Bradley
Department of Chemistry
Drexel University
Unknown Perils of the Past
Before online databases (early 90s), searching for properties like melting
points using ONE “trusted source” was practical
• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals
Known Perils of the Present
Today, many librarians discourage the use of new online sources (like Wikipedia) for the
searching of chemical data and recommend using only “trusted sources”
The problem is that the “trusted source” model is, and always was, fundamentally
flawed.
Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based
on “trusted sources”!
Promises for the Future
Using technology, we can begin to replace the “trusted source”
model with one based on transparency and provenance
The current state of transparency in scientific communication
Case study of melting point data
The Chemical Information Validation Sheet
567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points (stdev/average)
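The stdev/average screen named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the compound labels, and the temperatures are invented for the example, and the talk does not say which units or which standard deviation (sample or population) were used. The sketch uses the sample standard deviation and values in °C, so the ratio becomes unstable for compounds melting near 0 °C.

```python
from statistics import mean, stdev


def relative_spread(values):
    """Return stdev/average for a list of melting points, or None if undefined."""
    if len(values) < 2:
        return None  # a single measurement has no spread
    avg = mean(values)
    if avg == 0:
        return None  # ratio is undefined when the average is zero
    return stdev(values) / abs(avg)


def rank_by_spread(measurements):
    """Sort compounds from most to least inconsistent (largest ratio first)."""
    scored = []
    for name, values in measurements.items():
        spread = relative_spread(values)
        if spread is not None:
            scored.append((name, spread))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical measurements in degrees Celsius, invented for illustration.
    example = {
        "compound A": [100.0, 101.0, 99.0],
        "compound B": [50.0, 80.0, 20.0],
        "compound C": [10.0],
    }
    for name, spread in rank_by_spread(example):
        print(f"{name}: {spread:.3f}")
```

Compounds at the top of the ranking are the ones whose sources disagree most and are therefore the first candidates for manual checking.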
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer
(Andrew Lang)
Outliers
MDPI dataset
EPI (also donated all data to the public)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
Live curation on a public Google Spreadsheet of compounds with highest mp ranges
(collaboration with Andrew Lang and Antony Williams)
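Picking the compounds with the highest melting point ranges for curation could look like the following sketch. It assumes that "range" means the difference between the highest and lowest reported value for a compound; the function name, its parameters, and the input layout are hypothetical and are not taken from the actual spreadsheet.

```python
def widest_ranges(measurements, top_n=10):
    """Return up to top_n (compound, range) pairs, widest range first.

    `measurements` maps a compound name to a list of reported melting
    points. Compounds with fewer than two values are skipped because
    they have no range to compare.
    """
    ranges = []
    for name, values in measurements.items():
        if len(values) >= 2:
            ranges.append((name, max(values) - min(values)))
    ranges.sort(key=lambda item: item[1], reverse=True)
    return ranges[:top_n]
```

Sorting by absolute range rather than by a ratio keeps the ranking independent of how close the melting point is to zero.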
Some melting points can’t be resolved only with literature: 4-benzyltoluene
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temp
and can be frozen < -30 °C
The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating
but clearly remains a liquid at -15 °C for 2 days in freezer
Further investigation into the literature for the melting point of 4-benzyltoluene
Although a general description of the method is provided, the raw data are not
Because of broken provenance, errors cascade through the literature
Calculations in patent based on incorrect data
Open Random Forest modeling of Open Melting Point data using CDK descriptors
(Andrew Lang)
R² = 0.78, TPSA and nHdon most important
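For readers unfamiliar with the statistic, one common definition of the coefficient of determination is one minus the ratio of residual to total sum of squares. The sketch below implements that definition only. The slide does not state how its value was computed (for example on a test set or from out-of-bag predictions), so the function name and this choice of definition are assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot
```

Under this definition a value of 0.78 would mean the model accounts for roughly 78% of the variance in the measured melting points.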
Melting point prediction service
Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
Using melting point for temperature dependent solubility prediction
Motivation: Faster Science, Better Science
There are NO FACTS, only measurements embedded
within assumptions
Open Notebook Science maintains the integrity of data
provenance by making assumptions explicit
TRUST
PROOF
Crowdsourcing Solubility Data
Data provenance: From Wikipedia to…
…the lab notebook and raw data
Solubilities collected in a Google Spreadsheet
Web services for summary data
(Andrew Lang)
Web service calls from within a Google Spreadsheet for solubility measurement and
prediction
(Andrew Lang)
Integration of Multiple Web Services to Recommend Solvents for Reactions
(Andrew Lang)
Reaction Attempts Book
Reaction Attempts Book: Reactants listed Alphabetically
ONS Challenge Solubility Book cited for nanotechnology application
All ONS web services
For all Formats of ONS Projects
Conclusions
• For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
• Open Notebook Science offers an efficient way to make research transparent and discoverable