ChemModLab: A Web-based Cheminformatics Modeling Laboratory
S. Stanley Young + ECCR and ChemSpider Teams


Post on 01-Jan-2016


TRANSCRIPT


ChemSpider: A Web-based Chemical Informatics Resource

3

What is ChemSpider?

ChemSpider is a molecular structure-centric web service for chemists:
• Chemical structure drawing, manipulation, visualization, modeling & databasing
• Web location to deposit, curate and enhance data associated with chemical structures
• Web structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data

4

How do people generally use ChemSpider?

• Searching for chemical structures, in rank order, via:
  - Registry numbers, trade names and synonyms
  - Structure identifiers such as SMILES or InChI
  - Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  - Systematic names: IUPAC or CAS Index name
• Generation of physicochemical properties
• Text-based searching of Open Access articles
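As a rough illustration of the mass-based search mentioned on this slide, the sketch below computes a monoisotopic mass from a molecular formula and filters a small in-memory list by mass tolerance. The `records` list, the function names and the tolerance are hypothetical; this is not ChemSpider's actual search interface.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; a small subset only.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "F": 18.9984032}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C17H18F3NO'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def search_by_mass(records, target, tol=0.01):
    """Return names whose mass lies within tol of target, closest first."""
    hits = []
    for name, formula in records:
        error = abs(monoisotopic_mass(formula) - target)
        if error <= tol:
            hits.append((error, name))
    return [name for _, name in sorted(hits)]

# Hypothetical mini-database of (name, molecular formula) pairs.
records = [("fluoxetine", "C17H18F3NO"),
           ("caffeine", "C8H10N4O2"),
           ("aspirin", "C9H8O4")]

print(search_by_mass(records, 309.134))
```

A real service would index precomputed masses rather than recompute them per query; the linear scan here only keeps the example short.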

5

ChemSpider Status, August 2007

• Online database of over 16.5 million structures
• Systems in place for:
  - Single structure and data collection depositions
  - Association of analytical data with structures
  - Ability to curate data for each individual record
• Indexing of and integration to:
  - Over 70 individual databases
  - Patents from the US, European and Asian patent offices
• Text-based searching of over 50,000 Open Access articles
• Over a thousand unique users access ChemSpider per day

6

Flexible Boolean Searching

7

Predicted Properties Details: “Prozac”

8

Search result: 49 hits in 2.8 seconds

9

Integrated Visualization Tools

10

External Integrations - Wikipedia

The links between Wikipedia and ChemSpider are formed automatically

11

What is ChemModLab?

ChemModLab is a Web Service for building and evaluating QSAR models.

• Send your data: assay results and SD file.
• Use any or all of five descriptor types (2D). (Or use your own descriptors.)
• Use any or all of 16 statistical modeling methods.
• Predict potency of untested compounds.
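To make the size of the modeling space concrete, the sketch below enumerates every descriptor-set and method pairing, assuming each selected descriptor set is combined with each selected method. The names `descriptor_set_1` ... and `method_1` ... are placeholders, not ChemModLab's actual labels.

```python
from itertools import product

# Placeholder names; the slides give counts (five and 16), not a full naming.
descriptor_sets = [f"descriptor_set_{i}" for i in range(1, 6)]
methods = [f"method_{i}" for i in range(1, 17)]

def model_grid(descriptors, methods):
    """Every (descriptor set, method) pair is one candidate QSAR model."""
    return list(product(descriptors, methods))

grid = model_grid(descriptor_sets, methods)
print(len(grid))  # 5 descriptor sets x 16 methods
```

Selecting a subset of descriptor sets or methods shrinks the grid accordingly, for example two descriptor sets and three methods give six candidate models.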

12

Virtual Screening

[Figure labels: ChemSpider, ChemModLab]

13

ChemModLab Dialog (1)

Data Input

14

ChemModLab Dialog (2)

Five 2D Descriptor Sets

15

ChemModLab Dialog (3)

16 Modeling Methods

16

ChemModLab Modeling Methods

16 Statistical Modeling Methods
• Trees: RandomForest, rpart, tree
• Neural networks
• k-nearest neighbors
• Support vector machines
• Partial least squares
• Partial least squares with linear discriminant analysis
• Least angle regression
• Ridge regression
• Elastic net
• Principal components regression
• Family ensemble of k-nearest neighbors, using 70% selection
• Family ensemble of tree, using 70% selection
• Family ensemble of rpart, using 70% selection
• randomForest using 70% selection
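To make one of the listed methods concrete, here is a minimal sketch of k-nearest-neighbors potency prediction on binary 2D fingerprints. The use of Tanimoto similarity, the toy fingerprints and the potency values are illustrative assumptions; the slides do not state which similarity measure or settings ChemModLab uses.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(train, query, k=3):
    """Mean potency of the k training compounds most similar to the query."""
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query),
                    reverse=True)
    top = ranked[:k]
    return sum(potency for _, potency in top) / len(top)

# Hypothetical training set: (fingerprint bits, measured potency).
train = [
    ({1, 2, 3, 4}, 7.0),
    ({1, 2, 3, 5}, 6.0),
    ({1, 2, 6}, 5.0),
    ({7, 8, 9}, 2.0),
    ({8, 9, 10}, 1.0),
]

query = {1, 2, 3}  # an untested compound
print(knn_predict(train, query, k=3))
```

The three nearest neighbors of the query are the first three training compounds, so the prediction is the mean of their potencies.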

17

ECCR@NCSU + ChemSpider Plan

User submits data to ChemModLab to get QSAR Model(s).

Model is sent to ChemSpider.

ChemSpider computes a “virtual screen”.

The hit-list is clustered and sent to the user.
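The plan above can be sketched end to end in a few lines: score every database compound with a model, keep the best-scoring hits, then cluster the hit list. Everything here is a stand-in: `toy_model` replaces a real QSAR model, the `database` dictionary replaces ChemSpider, and leader clustering with a Tanimoto threshold is only one possible clustering choice, since the slides do not name the algorithm used.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def virtual_screen(model, database, n_hits):
    """Score every database compound and keep the n_hits best-scoring names."""
    scored = sorted(database.items(), key=lambda kv: model(kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:n_hits]]

def leader_cluster(names, database, threshold=0.5):
    """Put each hit in the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of names; element 0 is the leader
    for name in names:
        for cluster in clusters:
            if tanimoto(database[name], database[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar leader: start a new cluster
    return clusters

FAVORABLE = {1, 2, 3}  # hypothetical bits the toy model rewards

def toy_model(bits):
    """Stand-in for a QSAR model: fraction of favorable bits present."""
    return len(bits & FAVORABLE) / len(FAVORABLE)

database = {  # hypothetical compounds and fingerprint bits
    "cpd_a": {1, 2, 3, 4},
    "cpd_b": {1, 2, 3, 5},
    "cpd_c": {1, 9, 10, 11},
    "cpd_d": {20, 21},
    "cpd_e": {1, 9, 10, 12},
}

hits = virtual_screen(toy_model, database, n_hits=4)
clusters = leader_cluster(hits, database)
print(hits)
print(clusters)
```

Clustering the hit list lets the user pick a few representatives per chemical family instead of many near-duplicates.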

18

Accumulation Curves

Compare descriptor sets, given a method

19

Accumulation Curves

Compare modeling methods, given a descriptor set
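For readers unfamiliar with accumulation curves, the sketch below uses one common definition: rank compounds by predicted score, then count how many true actives appear among the top n for each n. The scores and activity labels are made up, and ChemModLab's own plots may differ in detail (for example, by using held-out predictions).

```python
def accumulation_curve(scores, is_active):
    """Cumulative number of actives found when testing in score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, found = [], 0
    for i in order:
        found += is_active[i]
        curve.append(found)
    return curve

# Hypothetical data: 1 marks an active compound, 0 an inactive one.
is_active = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7]  # ranks the actives first
model_b = [0.4, 0.9, 0.3, 0.8, 0.2, 0.5]  # ranks two inactives first

curve_a = accumulation_curve(model_a, is_active)
curve_b = accumulation_curve(model_b, is_active)
print(curve_a)
print(curve_b)
```

A curve that rises faster finds actives earlier, so plotting several curves together compares modeling methods (or descriptor sets) at a glance.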

20

Diversity Map

[Figure: diversity map; labels: Cluster, Active Compounds, Modeling Methods]

21

Continuous Response

22

Continuous Response

23

Continuous Response

24

Model Evaluation

Take detailed looks at which models?

AID348 (NCGC):
• KNN – Ph
• ENet – CAP
• RF – B#
• RF – CAP
• RF – FF
• Tree – CAP
• Tree – Ph
• Tree – FF
• PLS – CAP

25

Summary

1. ChemSpider is a web chemical informatics center.

2. ChemModLab is a free web service for QSAR.

3. Together they support sophisticated virtual screening.

* ChemModLab is supported by the NCI RoadMap project.

26

ECCR@NCSU Group and ChemSpider Group

ChemModLab Team

Jacqueline M. Hughes-Oliver
Atina D. Brooks
Gary W. Howell
Kirtesh Patil
Stan Young
Qianyi Zhang

ChemSpider Team

Antony Williams (project lead)

A rotating team of advisors and developers including many contributions from the Open Source community

eccr.stat.ncsu.edu www.chemspider.com