Bradley SLA Talk on Open Melting Point Collections

New Forms of Scholarly Communication in Science: The Role of Trust. June 15, 2011, Special Libraries Association. Jean-Claude Bradley, Department of Chemistry, Drexel University

Upload: jean-claude-bradley

Post on 11-May-2015

DESCRIPTION

Jean-Claude Bradley presented at a panel on New Forms of Scholarly Communication in Science at the Special Libraries Association meeting on June 15, 2011. The talk covered the role of trust in science, with a focus on the validation of melting point data. Where the literature was unable to reconcile measurements, Open Notebook Science was used to clarify. The collection of an Open Dataset of melting point measurements for 20,000 compounds was described as well as ongoing curation efforts and corresponding web services. (collaborators Andrew Lang and Antony Williams)

TRANSCRIPT

Page 1: Bradley SLA Talk on Open Melting Point Collections

New Forms of Scholarly Communication in Science

The Role of Trust

June 15, 2011

Special Libraries Association

Jean-Claude Bradley

Department of Chemistry, Drexel University

Page 2: Bradley SLA Talk on Open Melting Point Collections

Unknown Perils of the Past

Before online databases (early 1990s), searching for properties like melting points using ONE “trusted source” was practical

• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals

Page 3: Bradley SLA Talk on Open Melting Point Collections

Known Perils of the Present

Today, many librarians discourage the use of new online sources (like Wikipedia) for searching chemical data and recommend using only “trusted sources”

The problem is that the “trusted source” model is, and always was, fundamentally flawed.

Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based on “trusted sources”!

Page 4: Bradley SLA Talk on Open Melting Point Collections

Promises for the Future

Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance

Page 5: Bradley SLA Talk on Open Melting Point Collections

The current state of transparency in scientific communication

Case study of melting point data

Page 6: Bradley SLA Talk on Open Melting Point Collections

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 7: Bradley SLA Talk on Open Melting Point Collections

Discovering outliers for melting points (stdev/average)
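The stdev/average screen named on this slide can be sketched as follows. This is a minimal illustration, not the spreadsheet formula used in the course: the function names and the sample values are hypothetical, and the ratio is computed on Celsius values, so it becomes unstable for compounds whose average is near 0 °C.

```python
from statistics import mean, stdev

def relative_spread(values_c):
    """Return stdev/average for melting points given in degrees Celsius.

    Returns None when the ratio is undefined (fewer than two values,
    or an average of exactly zero).
    """
    if len(values_c) < 2:
        return None
    avg = mean(values_c)
    if avg == 0:
        return None
    return stdev(values_c) / abs(avg)

def rank_outliers(measurements):
    """Sort compounds by relative spread, largest first."""
    scored = []
    for compound, values in measurements.items():
        score = relative_spread(values)
        if score is not None:
            scored.append((compound, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical values, for illustration only (not from the validation sheet).
example = {
    "compound A": [120.0, 121.0, 119.5],
    "compound B": [80.0, 140.0, 82.0],
    "compound C": [55.0],
}
ranking = rank_outliers(example)
```

Compounds at the top of `ranking` are the ones whose sources disagree most and are candidates for a closer look at the original references.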

Page 8: Bradley SLA Talk on Open Melting Point Collections

Investigating the m.p. inconsistencies of EGCG

Page 9: Bradley SLA Talk on Open Melting Point Collections

Investigating the m.p. inconsistencies of cyclohexanone

Page 10: Bradley SLA Talk on Open Melting Point Collections

Most popular data sources

Page 11: Bradley SLA Talk on Open Melting Point Collections

Alfa Aesar donates melting points to the public

Page 12: Bradley SLA Talk on Open Melting Point Collections

Open Melting Point Explorer

(Andrew Lang)

Page 13: Bradley SLA Talk on Open Melting Point Collections

Outliers

MDPI dataset

EPI (donated all data to public also)

Page 14: Bradley SLA Talk on Open Melting Point Collections

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 15: Bradley SLA Talk on Open Melting Point Collections

Inconsistencies and SMILES problems within MDPI dataset

Page 16: Bradley SLA Talk on Open Melting Point Collections

MDPI Dataset labeled with High Trust Level

Page 17: Bradley SLA Talk on Open Melting Point Collections

Open Melting Point Datasets

Currently 20,000 compounds with Open MPs

Page 18: Bradley SLA Talk on Open Melting Point Collections

Live curation on a public Google Spreadsheet of compounds with highest mp ranges

(collaboration with Andrew Lang and Antony Williams)
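Picking the compounds with the highest mp ranges for curation can be sketched as below. This assumes the range is simply the highest reported value minus the lowest; the function names and sample data are hypothetical and do not reflect the actual spreadsheet layout.

```python
def mp_range(values_c):
    """Spread between the highest and lowest reported melting point (deg C)."""
    return max(values_c) - min(values_c)

def widest_ranges(measurements, top_n=10):
    """Return up to top_n (compound, range) pairs, widest range first.

    Compounds with a single reported value are skipped because they
    have no disagreement to curate.
    """
    ranges = {
        name: mp_range(values)
        for name, values in measurements.items()
        if len(values) >= 2
    }
    return sorted(ranges.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical values, for illustration only.
sample = {"X": [10.0, 12.0], "Y": [-30.0, 5.0], "Z": [100.0]}
queue = widest_ranges(sample)
```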

Page 19: Bradley SLA Talk on Open Melting Point Collections

Some melting points cannot be resolved from the literature alone: 4-benzyltoluene

Page 20: Bradley SLA Talk on Open Melting Point Collections

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C

Page 21: Bradley SLA Talk on Open Melting Point Collections

The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating, but it clearly remains a liquid at -15 °C for 2 days in the freezer

Page 22: Bradley SLA Talk on Open Melting Point Collections

Further investigation into the literature for the melting point of 4-benzyltoluene

Although a general description of the method is provided, the raw data are not

Page 23: Bradley SLA Talk on Open Melting Point Collections

Because of broken provenance, errors cascade through the literature

Calculations in patent based on incorrect data

Page 24: Bradley SLA Talk on Open Melting Point Collections

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78, TPSA and nHdon most important
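For readers unfamiliar with the statistic, a coefficient of determination can be computed from measured and predicted melting points as sketched below. The slide does not say how its value was obtained (for example, on out-of-bag predictions or on a held-out test set), so this shows only the common definition, and the function name is hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a value of 0.78 would mean that the model accounts for 78% of the variance in the measured melting points.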

Page 25: Bradley SLA Talk on Open Melting Point Collections

Melting point prediction service

Page 26: Bradley SLA Talk on Open Melting Point Collections

Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)

Page 27: Bradley SLA Talk on Open Melting Point Collections

Using melting point for temperature-dependent solubility prediction
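One textbook relation that ties solubility to melting point is the ideal solubility equation. It is shown here only as background and is not necessarily the model used in the talk. It assumes an ideal solution and neglects the heat capacity change on fusion; x is the mole-fraction solubility, T the temperature, T_m the melting point (both in kelvin), ΔH_fus the enthalpy of fusion and R the gas constant.

```latex
\ln x_{\mathrm{ideal}} = -\frac{\Delta H_{\mathrm{fus}}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right), \qquad T \le T_m
```

At T = T_m the right-hand side is zero, so x = 1: the solute is fully miscible at its melting point, which is why an accurate melting point matters for this kind of estimate.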

Page 28: Bradley SLA Talk on Open Melting Point Collections

Motivation: Faster Science, Better Science

Page 29: Bradley SLA Talk on Open Melting Point Collections

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 30: Bradley SLA Talk on Open Melting Point Collections

TRUST

PROOF

Page 31: Bradley SLA Talk on Open Melting Point Collections

Crowdsourcing Solubility Data

Page 32: Bradley SLA Talk on Open Melting Point Collections

Data provenance: From Wikipedia to…

Page 33: Bradley SLA Talk on Open Melting Point Collections

…the lab notebook and raw data

Page 34: Bradley SLA Talk on Open Melting Point Collections

Solubilities collected in a Google Spreadsheet

Page 35: Bradley SLA Talk on Open Melting Point Collections

Web services for summary data

(Andrew Lang)

Page 36: Bradley SLA Talk on Open Melting Point Collections

Web service calls from within a Google Spreadsheet for solubility measurement and prediction

(Andrew Lang)

Page 37: Bradley SLA Talk on Open Melting Point Collections

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 38: Bradley SLA Talk on Open Melting Point Collections
Page 39: Bradley SLA Talk on Open Melting Point Collections
Page 40: Bradley SLA Talk on Open Melting Point Collections
Page 41: Bradley SLA Talk on Open Melting Point Collections

Reaction Attempts Book

Page 42: Bradley SLA Talk on Open Melting Point Collections

Reaction Attempts Book: Reactants listed Alphabetically

Page 43: Bradley SLA Talk on Open Melting Point Collections

ONS Challenge Solubility Book cited for nanotechnology application

Page 44: Bradley SLA Talk on Open Melting Point Collections

All ONS web services

Page 45: Bradley SLA Talk on Open Melting Point Collections

For all Formats of ONS Projects

Page 46: Bradley SLA Talk on Open Melting Point Collections

For all Formats of ONS Projects

Page 47: Bradley SLA Talk on Open Melting Point Collections

Conclusions

• For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance

• Open Notebook Science offers an efficient way to make research transparent and discoverable