Building A Community Resource For The Life Sciences

Building A Community Platform to Support Chemistry and the Life Sciences

Upload: antony-williams-chemconnector-orcid-0000-0002-2668-4821

Post on 17-Jun-2015

DESCRIPTION

This presentation was given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21st 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and to validate and curate data. We also won the Bio-IT Best Practices Community Service Award that evening.

TRANSCRIPT

Page 1: Building A Community Resource For The Life Sciences

Building A Community Platform to Support Chemistry and the Life Sciences

Page 2: Building A Community Resource For The Life Sciences

Where Would You Look? What Do You Trust?

Page 3: Building A Community Resource For The Life Sciences

Chemistry on the Internet TODAY

Chemistry searches are generally limited to text-based searches across the internet

Data are dirty: sorting the wheat from the chaff. Who can you trust?

Too many searches required to resource data

Page 4: Building A Community Resource For The Life Sciences

Chemistry on the Internet TODAY

Chemistry searches are generally limited to text-based searches across the internet

Data are dirty: sorting the wheat from the chaff. Who can you trust?

Too many searches required to resource data

Page 5: Building A Community Resource For The Life Sciences
Page 6: Building A Community Resource For The Life Sciences
Page 7: Building A Community Resource For The Life Sciences

The Final Search Strategy

Page 8: Building A Community Resource For The Life Sciences

All Those Names, One Structure

A problem to solve…

Page 9: Building A Community Resource For The Life Sciences

Chemistry on the Internet TODAY

Chemistry searches are generally limited to text-based searches across the internet

Data are dirty: sorting the wheat from the chaff. Who can you trust?

Too many searches required to resource data

Page 10: Building A Community Resource For The Life Sciences

Trustworthy Chemistry?

Encyclopedic articles (Wikipedia)

Chemical vendor databases

Metabolic pathway databases

Property databases

Patents with chemical structures

Drug Discovery data

Scientific publications

Compound aggregators

Blogs/Wikis and Open Notebook Science

Page 11: Building A Community Resource For The Life Sciences

Where Would You Look? What Do You Trust?

Page 12: Building A Community Resource For The Life Sciences

Structural Data for Life Sciences

DailyMed

Page 13: Building A Community Resource For The Life Sciences

Lack of Stereochemistry

Page 14: Building A Community Resource For The Life Sciences

Incorrect Structures

Page 15: Building A Community Resource For The Life Sciences

Ugh…

Page 16: Building A Community Resource For The Life Sciences

Drugs are REALLY Messy

Page 17: Building A Community Resource For The Life Sciences

Vancomycin

Who will curate?

How would you clean such a large dataset?

Assertions!!!

Page 18: Building A Community Resource For The Life Sciences

The EXPERTS must get it right?!

Page 19: Building A Community Resource For The Life Sciences

Wikipedia, C&E News, PubChem C&E News (from ACS)

Page 20: Building A Community Resource For The Life Sciences

Chemistry on the Internet TODAY

Chemistry searches are generally limited to text-based searches across the internet

Data are dirty: sorting the wheat from the chaff. Who can you trust?

Too many searches required to resource data

Page 21: Building A Community Resource For The Life Sciences

Just “Public Compound” Databases

PubChem

Drugbank

ChEBI/ChEMBL

KEGG

LipidMAPs

ChemIDPlus

eMolecules

ZINC

Lots of chemical vendors

ChemSpider

Page 22: Building A Community Resource For The Life Sciences

media.obsessable.com

As few interfaces as possible

What do humans want?

Page 23: Building A Community Resource For The Life Sciences

A Pragmatic Vision

“Build a Structure Centric Community to Serve Chemists”

Integrate chemical structure data on the web

Create a “structure-based hub” to information and data

Provide access to structure-based “algorithms”

Let chemists contribute their own data

Allow the community to curate/correct data

Page 24: Building A Community Resource For The Life Sciences

Answer Questions

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 25: Building A Community Resource For The Life Sciences

ChemSpider Searches

Page 26: Building A Community Resource For The Life Sciences
Page 27: Building A Community Resource For The Life Sciences

Search “OEA”

Page 28: Building A Community Resource For The Life Sciences

Search OEA

Page 29: Building A Community Resource For The Life Sciences

Link Farm Connections

Page 30: Building A Community Resource For The Life Sciences

Link Farm Connections

Page 31: Building A Community Resource For The Life Sciences

Search OEA

Page 32: Building A Community Resource For The Life Sciences

Search OEA

Page 33: Building A Community Resource For The Life Sciences

Google Books

Page 34: Building A Community Resource For The Life Sciences

Google Scholar

Page 35: Building A Community Resource For The Life Sciences

Linked Patents for OEA

Page 36: Building A Community Resource For The Life Sciences
Page 37: Building A Community Resource For The Life Sciences

Google Patents

Page 38: Building A Community Resource For The Life Sciences

Microsoft Academic Search

Page 39: Building A Community Resource For The Life Sciences

RSC Journals

Page 40: Building A Community Resource For The Life Sciences

RSC Databases

Page 41: Building A Community Resource For The Life Sciences

Statistics for Today

Almost 25 million compounds from >350 data sources

About 7000 unique users per day and up to ½ million transactions per day

A crowdsourced deposition and curation platform

Grows daily – more depositions, more links, more data

Page 42: Building A Community Resource For The Life Sciences

Searching Chemistry on the Internet

How complete a result set will we get if we search for “chemicals” by name?

Is there a better way to link chemistry databases? Linking by “names” is dangerous

Chemists want structure and SUBstructure searching
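
The risk of linking by name can be illustrated with a minimal Python sketch. The two record lists, their field names, and the record IDs below are hypothetical; they stand in for two databases that hold the same compound under different names. The InChIKey shown is the standard key for aspirin.

```python
# Hypothetical records from two sources describing the same compound.
# The names differ, but the structure-derived InChIKey is identical.
source_a = [
    {"record_id": "A-001", "name": "aspirin",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
]
source_b = [
    {"record_id": "B-417", "name": "acetylsalicylic acid",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
]


def link(records_a, records_b, field):
    """Pair records from two sources whose `field` values match exactly."""
    index = {}
    for rec in records_b:
        index.setdefault(rec[field], []).append(rec)
    return [(a, b) for a in records_a for b in index.get(a[field], [])]


by_name = link(source_a, source_b, "name")
by_key = link(source_a, source_b, "inchikey")

print("linked by name:", len(by_name))      # 0: the synonyms do not match
print("linked by InChIKey:", len(by_key))   # 1: the structures match
```

Name matching could be improved with synonym lists, but a structure-derived identifier avoids the problem at the source.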

Page 43: Building A Community Resource For The Life Sciences

The InChI Identifier

Page 44: Building A Community Resource For The Life Sciences

Multiple Layers
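
As a rough illustration of the layered design, the Python sketch below splits an InChI string into its layers. It is a simplification, not a full parser: the function name and the layer labels are assumptions made for this example, and repeated prefixes (for instance inside an isotopic layer) would overwrite one another. The example string is the standard InChI for ethanol.

```python
# Human-readable labels for common single-letter layer prefixes (illustrative).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def parse_inchi(inchi):
    """Split an InChI string into a dict of layers (simplified sketch)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len(prefix):].split("/")
    layers = {"version": version}
    for segment in segments:
        if not segment:
            continue
        if segment[0].islower():
            # Prefixed layer such as "c1-2-3" or "h3H,2H2,1H3".
            layers[LAYER_NAMES.get(segment[0], segment[0])] = segment[1:]
        else:
            # The formula layer carries no prefix letter.
            layers["formula"] = segment
    return layers


for name, value in parse_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3").items():
    print(f"{name}: {value}")
```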

Page 45: Building A Community Resource For The Life Sciences

InChI Strings Hash to InChIKeys

Page 46: Building A Community Resource For The Life Sciences

Link the Internet with InChIKeys!

Taken from: Rafael Sidi’s Blog

Page 47: Building A Community Resource For The Life Sciences

Vancomycin – Search the Internet

Page 48: Building A Community Resource For The Life Sciences

Vancomycin

Search Molecular SKELETON

Search Full Molecule

Page 49: Building A Community Resource For The Life Sciences

Full Molecule Search: 4 Hits

Page 50: Building A Community Resource For The Life Sciences

Full Skeleton Search: 104 Hits
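
The difference between the two searches can be sketched in Python, assuming the skeleton search uses only the first block of a standard InChIKey and the full-molecule search uses the whole key. The function and field names are hypothetical, and the key used is the one that appears elsewhere in this transcript.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    skeleton: str     # first block, 14 letters: hash of the connectivity
    detail: str       # first 8 letters of block two: stereo, isotopes, etc.
    flags: str        # last 2 letters of block two: standard flag and version
    protonation: str  # final letter: protonation indicator


def split_inchikey(key):
    """Split a 27-character InChIKey into its blocks (format check only)."""
    blocks = key.split("-")
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey layout: {key!r}")
    if not all(b.isalpha() and b.isupper() for b in blocks):
        raise ValueError(f"InChIKey must be upper-case letters: {key!r}")
    first, second, last = blocks
    return InChIKeyParts(first, second[:8], second[8:], last)


def search_terms(key):
    """Return the text to search for in a skeleton and a full-molecule search."""
    parts = split_inchikey(key)
    return {"skeleton": parts.skeleton, "full_molecule": key}


terms = search_terms("RCINICONZNJXQF-MZXODVADSA-N")
print(terms["skeleton"])       # RCINICONZNJXQF
print(terms["full_molecule"])  # RCINICONZNJXQF-MZXODVADSA-N
```

Because stereoisomers and isotopically labelled forms share the first block, searching on the skeleton alone returns a broader set of hits than searching on the full key.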

Page 51: Building A Community Resource For The Life Sciences
Page 52: Building A Community Resource For The Life Sciences
Page 53: Building A Community Resource For The Life Sciences
Page 54: Building A Community Resource For The Life Sciences

Vancomycin

Page 55: Building A Community Resource For The Life Sciences

Vancomycin on ChemSpider

1 compound – 3 days

Page 56: Building A Community Resource For The Life Sciences

InChIKeys

RCINICONZNJXQF-MZXODVADSA-N

Make the internet searchable by adding InChIKeys

Publishers add InChIKeys to papers now…
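
Once InChIKeys appear in the text of a page or paper, an indexer can pick them out by their fixed layout. The Python sketch below is one way to do that with a regular expression; the function name and the sample sentence are made up for illustration, and the pattern checks the layout only, not whether a key corresponds to a real structure.

```python
import re

# 14 upper-case letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def find_inchikeys(text):
    """Return the distinct InChIKey-shaped strings in `text`, in order."""
    return list(dict.fromkeys(INCHIKEY_RE.findall(text)))


# Hypothetical sentence of the kind a publisher might embed in a paper.
sample = (
    "The title compound (InChIKey RCINICONZNJXQF-MZXODVADSA-N) was prepared "
    "as described; see also the entry for RCINICONZNJXQF-MZXODVADSA-N."
)
print(find_inchikeys(sample))  # ['RCINICONZNJXQF-MZXODVADSA-N']
```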

Page 57: Building A Community Resource For The Life Sciences

InChIKeys

RCINICONZNJXQF-MZXODVADSA-N

Make the internet searchable by adding InChIKeys

Publishers add InChIKeys to papers now…

is what???

Page 58: Building A Community Resource For The Life Sciences

The InChI “Resolver”

Page 59: Building A Community Resource For The Life Sciences

InChI Resolver to DOIs

Structure Search the Web

Page 60: Building A Community Resource For The Life Sciences

Most Chemistry is NOT Published

Only a fraction of chemistry is published

Only a tiny fraction of chemistry is patented

What of the “Lost Chemistry” that is never published and cannot be abstracted?

Reactions performed

Structures made and studied

Spectra acquired and then disposed of

Available chemicals never found

Page 61: Building A Community Resource For The Life Sciences

Crowd-sourcing Curation and Deposition

Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Page 62: Building A Community Resource For The Life Sciences

Building a Structure Centric Community for Chemists

Multi-level Curation and Approval

Page 63: Building A Community Resource For The Life Sciences

Semantic Markup: Project Prospect

Page 64: Building A Community Resource For The Life Sciences

Name-Structure Pairs

Page 65: Building A Community Resource For The Life Sciences

Semantic Linking of Structures

What would you want to link off a structure?

Chemical suppliers

Other publications

Analytical Data

Related Reactions

Wikipedia

Patents

“Everything”

Page 66: Building A Community Resource For The Life Sciences

Org Prep Daily (Blog)

Page 67: Building A Community Resource For The Life Sciences

ChemSpider SyntheticPages

Page 68: Building A Community Resource For The Life Sciences

Chemistry on the Internet FUTURE

The semantic web for chemistry is in place

Crowdsourced contributions are commonplace

Chemists will search by structure/substructure

Chemistry articles indexed and searchable

Reduced number of searches to find data

Data are integrated – compounds, vendors, syntheses, data, publications and patents

A world of Open Access and Open Data

Page 69: Building A Community Resource For The Life Sciences

ChemSpider Web Services

Page 70: Building A Community Resource For The Life Sciences
Page 71: Building A Community Resource For The Life Sciences

Thank you

[email protected]: ChemSpiderman

www.chemspider.com/blog

SLIDES: www.slideshare.net/AntonyWilliams