indigo knime nodes · i indigo toolkit api: c#, java, python i bingo for postgresql mikhail...

29
Open-source software Indigo KNIME nodes Mikhail Rybalkin KNIME Open Source Days October 6, 2011

Upload: others

Post on 25-Jul-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Open-source software

Indigo KNIME nodes

Mikhail Rybalkin

KNIME Open Source DaysOctober 6, 2011

Page 2: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

GGA overview

I Saint-Petersburg & Boston since 1995

I 380 employees:

development department: 150content department: 100QA department: 70

I Projects with ...

I http://ggasoftware.com

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 3: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

GGA projects

- cheminformatics toolkit- database search cartridge

Mass spectrometry

Molecular dynamics

Sequence processing

High-throughput screening

GPGPU, cluster computing

Image analysys and recognition

IndigoBingo

Open-sourceprojects

Proprietarycontract-basedprojects

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 4: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

History of Indigo projects

I Different algorithms

I Bingo for Oracle

I Bingo for SQL Server

I Ketcher - molecule sketcher:bingo-demo.ggasoftware.com/

I Indigo toolkit API:C#, Java, Python

I Bingo for PostgreSQL

O

O

C

C CCl

Cl

O

OCl

Cl

ON

N

C

O

NN

C

C

O

NN

C

NO

ON

N

N

ON

N

O

N

O

C

C

O

C

N

C

N O

ON

N

C O

O

O

N

NO

O

N

O

N N

NN

OO

OO

30M

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 5: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

History of Indigo projects

I Different algorithms

I Bingo for Oracle

I Bingo for SQL Server

I Ketcher - molecule sketcher:bingo-demo.ggasoftware.com/

I Indigo toolkit API:C#, Java, Python

I Bingo for PostgreSQL

O

O

C

C CCl

Cl

O

OCl

Cl

ON

N

C

O

NN

C

C

O

NN

C

NO

ON

N

N

ON

N

O

N

O

C

C

O

C

N

C

N O

ON

N

C O

O

O

N

NO

O

N

O

N N

NN

OO

OO

30M

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 6: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

History of Indigo projects

I Different algorithms

I Bingo for Oracle

I Bingo for SQL Server

I Ketcher - molecule sketcher:bingo-demo.ggasoftware.com/

I Indigo toolkit API:C#, Java, Python

I Bingo for PostgreSQL

O

O

C

C CCl

Cl

O

OCl

Cl

ON

N

C

O

NN

C

C

O

NN

C

NO

ON

N

N

ON

N

O

N

O

C

C

O

C

N

C

N O

ON

N

C O

O

O

N

NO

O

N

O

N N

NN

OO

OO

30M

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 7: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

History of Indigo projects

I Different algorithms

I Bingo for Oracle

I Bingo for SQL Server

I Ketcher - molecule sketcher:bingo-demo.ggasoftware.com/

I Indigo toolkit API:C#, Java, Python

I Bingo for PostgreSQL

O

O

C

C CCl

Cl

O

OCl

Cl

ON

N

C

O

NN

C

C

O

NN

C

NO

ON

N

N

ON

N

O

N

O

C

C

O

C

N

C

N O

ON

N

C O

O

O

N

NO

O

N

O

N N

NN

OO

OO

30M

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 8: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Why Open Source?

I Bussiness model

I Relationship with the scientific commnity

I Feedback from the commnityI SuggestionsI Testing

I Indigo modules are used in commercial projects

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 9: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Why Open Source?

I Bussiness model

I Relationship with the scientific commnity

I Feedback from the commnityI SuggestionsI Testing

I Indigo modules are used in commercial projects

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 10: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Why Open Source?

I Bussiness model

I Relationship with the scientific commnity

I Feedback from the commnityI SuggestionsI Testing

I Indigo modules are used in commercial projects

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 11: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Introduction

Indigo SDK is an open-source cheminformatics library.

Goals of Indigo

I Easy access to the library of GGA cheminformatics algorithms

I Portability across OS and programming languages

I Extensibility with plugins (incl. third-party)

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 12: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Capabilities

I Support of popular data formats:SMILES, SMARTS, Molfile, Rxnfile, SDF, RDF, GZip

I Portability over modern platforms and languages:Linux/Windows/Mac OS X, 32/64 bit, Java/Python/C#

I Outstanding performance:Original algorithms, fast C++ implementation

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 13: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Functionality

I Calculation of structure properties:Canonical SMILES, molecular weight, molecular formula

I Rendering of molecules and reactions:SVG, PNG, EMF, PDF, automatic layout, highlighting, ...

I Structure and reaction search:Exact, Substructure, Similarity, SMARTS, fingerprints

I Scaffold detection and R-Group decomposition:MCS of arbitrary amount of input structures

I Reaction atom-to-atom mapping

I Combinatorial chemistry:Stereo transformations, intramolecular and multistep reactions

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 14: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Functionality

I Calculation of structure properties:Canonical SMILES, molecular weight, molecular formula

I Rendering of molecules and reactions:SVG, PNG, EMF, PDF, automatic layout, highlighting, ...

I Structure and reaction search:Exact, Substructure, Similarity, SMARTS, fingerprints

I Scaffold detection and R-Group decomposition:MCS of arbitrary amount of input structures

I Reaction atom-to-atom mapping

I Combinatorial chemistry:Stereo transformations, intramolecular and multistep reactions

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 15: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Indigo Nodes for KNIME

I First release: May 2011

I Intensive communication with the community

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 16: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Substructure search

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 17: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Atom-to-atom mapping

1

2 3

4

5 6

O7

8

4 4

1

O7

1

O7

2

3

4

2

3

5

6

1

5

68

23

8

56

8

1

4 2

3

4

OH7

5

6

1

OH78

2

3

5

6

8

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 18: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

R-Group decomposition

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 19: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Combinatorial chemistryI Integrate Legio tool into KNIME

?

Reactionpattern

Monomers Products

Reaction pattern

Monomers

Products

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 20: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Technical issues and details

I Cross-platform:

I Linux/Windows/Mac OS X(Solaris is also supported by request)

I 32/64 bit

I Testing:I JenkinsI Python/IronPython/Jython

I C++ core code, plain C API

I Indigo API for Java/C#/PythonSWIG ?

I Hydrogens support.SMARTS: h notation is not supported on purpose.

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 21: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Technical issues and details

I Cross-platform:

I Linux/Windows/Mac OS X(Solaris is also supported by request)

I 32/64 bit

I Testing:I JenkinsI Python/IronPython/Jython

I C++ core code, plain C API

I Indigo API for Java/C#/PythonSWIG ?

I Hydrogens support.SMARTS: h notation is not supported on purpose.

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 22: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Technical issues and details

I Cross-platform:

I Linux/Windows/Mac OS X(Solaris is also supported by request)

I 32/64 bit

I Testing:I JenkinsI Python/IronPython/Jython

I C++ core code, plain C API

I Indigo API for Java/C#/PythonSWIG ?

I Hydrogens support.SMARTS: h notation is not supported on purpose.

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 23: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Design

Core API(C++)

QS_DEF(Molecule, mol); QS_DEF(Array<int>, gross);

MoleculeAutoLoader loader(molfile); loader.loadMolecule(mol);

MoleculeAromatizer::aromatizeBonds(mol); MangoExact::calculateHash(mol, _hash);

MoleculeFingerprintBuilder builder(mol, _context.fp_parameters); builder.process();

_fp.copy(builder.get(), _context.fp_parameters.fingerprintSize()); _fp_sim_bits_count = builder.countBits_Sim(); output.writeBinaryWord((word)_fp_sim_bits_count);

const byte *fp_sim_ptr = builder.getSim(); int fp_sim_size = _context.fp_parameters.fingerprintSizeSim();

// Save gross formula GrossFormula::collect(mol, gross); GrossFormula::toString(gross, _gross_str);

_counted_elems_str.clear();

ArrayOutput ce_output(_counted_elems_str);

for (int i = 0; i < (int)NELEM(counted_elements); i++) ce_output.printf(", %d", gross[counted_elements[i]]);

ce_output.writeByte(0);

// Calculate molecular mass MoleculeMass mass_calulator; mass_calulator.relative_atomic_mass_map = &_context.relative_atomic_mass_map; _molecular_mass = mass_calulator.molecularWeight(mol);

if (outfile_allscafs != 0) { int output = indigoWriteFile(outfile_allscafs); int allscafs = indigoAllScaffolds(scaffold); int item, iter = indigoIterateArray(allscafs);

printf("saving all obtained scaffolds (%d total) to %s\n", indigoArrayCount(allscafs), outfile_allscafs);

while ((item = indigoNext(iter))) { indigoSdfAppend(output, item); indigoFree(item); } indigoFree(iter); indigoFree(output); }

printf("decomposing the structures... "); deco = indigoDecomposeMolecules(scaffold, structures);

if (outfile_hl != 0) { int output = indigoWriteFile(outfile_hl); int item, iter = indigoIterateDecomposedMolecules(deco);

printf("saving the highlighted structures to %s\n", outfile_hl);

while ((item = indigoNext(iter))) { indigoSdfAppend(output, indigoDecomposedMoleculeHighlighted(item)); indigoFree(item); }

indigoFree(iter); indigoFree(output); }

Simplified API(plain C)

zlib

IndigoObject query = indigo.loadSmarts("[NX3;H2,H1;!$(NC=O)]"); for (IndigoObject item : indigo.iterateSDFile("thiazolidines.sdf")) { item.aromatize(); IndigoObject match = indigo.matchSubstructure(query, item);

if (match != null) System.out.println(match.matchHighlight().smiles()); else System.out.println("no match"); }

Java wrapper

C# wrapperIndigoObject rxn = indigo.loadReactionFromFile("test.rxn");

foreach (IndigoObject iter in rxn2.iterateReactants()){ System.Console.WriteLine(iter.smiles()); System.Console.WriteLine(iter.molecularWeight()); System.Console.WriteLine(iter.mostAbundantMass()); System.Console.WriteLine(iter.monoisotopicMass()); System.Console.WriteLine(iter.grossFormula());}

arr = indigo.createArray(); for item in indigo.iterateSDFile("thiazolidines.sdf"): item.aromatize(); item.cisTransClear(); arr.arrayAdd(item);

scaf = indigo.extractCommonScaffold(arr, "exact") deco = indigo.decomposeMolecules(scaf, arr) scaff = deco.decomposedMoleculeScaffold() for item in deco.iterateDecomposedMolecules(): print item.decomposedMoleculeHighlighted().smiles()

Pythonwrapper

RendererC++ plugin

cairo

Javawrapper

C# wrapper

Pythonwrapper

Open for third-party plugins(either C, C++, Java, C#, or Python)

Renderer C API

RenderParams& rp = renderer_self.renderParams

Array<int> mapping; if (obj->getBaseMolecule().isQueryMolecule()) rp.mol.reset(new QueryMolecule()); else rp.mol.reset(new Molecule()); rp.mol->clone(self.getObject(object)-> getBaseMolecule(), &mapping, 0); GraphHighlighting* hl = self.getObject(object)->getMoleculeHighlighting(); if (hl != 0 && hl->numVertices() > 0) { rp.molhl.init(*rp.mol.get()); rp.molhl.copy(*hl, mapping); } rp.rmode = RENDER_MOL; RenderParamInterface::render(rp);

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 24: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Indigo API

Indigo i n d i g o = new Indigo ( ) ;

IndigoObject r x n = i n d i g o . c r e a t e R e a c t i o n ( ) ;r x n . addReactant ( mol1 ) ;r x n . addReactant ( mol2 ) ;r x n . addProduct ( i n d i g o . l o a d M o l e c u l e ( ”ClC1CCCCC1” ) ) ;r x n . automap ( ” d i s c a r d ” ) ;

System . out . p r i n t l n ( r x n . s m i l e s ( ) ) ;fo r ( IndigoObject i t em : r x n . i t e r a t e R e a c t a n t s ( ) )

System . out . p r i n t l n ( i tem . m o l f i l e ( ) ) ;

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 25: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Plans

1. Add reaction support into KNIME:I automatic atom-to-atom mapping (already available)I substructure and exact seachI combinatorial chemistryI reaction-based transformations

2. Add reaction readers into KNIME core

3. Add other additional nodes (community requests)

4. Multithreading

5. Testing workflows

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 26: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Other open-source projects

Ketcher — web-based chemical structure editor

Imago — 2D chemical structure image recognition toolkit

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 27: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Acknowledgnents

I Indigo community

I KNIME community

I KNIME Open Source Days organizers

I Dmitry Pavlov

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 28: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Community

I Web Site:http://ggasoftware.com/opensource

I Google groups:http://groups.google.com/group/indigo-general

http://groups.google.com/group/indigo-dev

http://groups.google.com/group/indigo-bugs

I Complete source code on GitHub:http://github.com/ggasoftware/indigo

I Chemistry Toolkit Rosetta (code examples):http://ctr.wikia.com

I KNIME Community:http://tech.knime.org/forum/indigo

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011

Page 29: Indigo KNIME nodes · I Indigo toolkit API: C#, Java, Python I Bingo for PostgreSQL Mikhail Rybalkin Indigo KNIME nodes October 6, 2011. History of Indigo projects I Di erent algorithms

Suggestions to KNIME

I Pipes between knodes to avoid intermediate data

I Automatic node conversion to a newer version

I Universal domain-specific language for table processing

Mikhail Rybalkin Indigo KNIME nodes October 6, 2011