What’s new in JChem back-end and Markush storage, search and enumeration
Szabolcs Csepregi
Solutions for Cheminformatics
Contents
• ChemAxon chemical database tools
• Main features of JChem Base, Cartridge
• Example interfaces: JSP, ASP and AJAX
• Integration with other ChemAxon products
• Markush structure storage, search and enumeration
• Recent developments, plans
Chemical database products
JChem Base
– A library for adding chemical structures to relational database systems
– Available for Java, JSP and .NET
– An open-source web application example is available
JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and a chemical index
– SQL interface for ChemAxon functionality
Instant JChem
– An all-in-one desktop chemical database application
JChem Web Services
– SOAP interface to JChem Base
JC4XL
– Excel integration (coming)
Compatibility and integration
Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (V2000 and V3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.
Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
Available on all operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM), for Windows
• SQL (Cartridge)
Structure searching: features
• Search types: Substructure, Similarity, Full, Full fragment, etc.
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, Lone pairs
• Relative stereo
• Reaction search features
• Polymers
• Position variation
• Hit coloring ...
www.chemaxon.com/conf/Structural_Search.ppt
Structure searching: options
Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: “or aromatic”; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.
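To show how several of these options can interact, here is a minimal sketch of a generic search loop with an SQL-style pre-filter, a hit limit, a time limit and an inverse hit list. It is not the JChem API: the row layout, the `matches` callback and the name `run_search` are hypothetical. In the usage below, a substring check stands in for a structure matcher; it is not a real substructure search.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable, Optional


def run_search(
    rows: Iterable[tuple[int, str]],        # hypothetical (row_id, structure) pairs
    matches: Callable[[str], bool],         # stand-in for a structure matcher
    prefilter: Optional[set[int]] = None,   # IDs returned by an SQL SELECT pre-filter
    max_hits: Optional[int] = None,         # maximum number of hits
    max_seconds: Optional[float] = None,    # maximum search time
    inverse: bool = False,                  # inverse hit list: return non-matching rows
) -> list[int]:
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    hits: list[int] = []
    for row_id, structure in rows:
        if deadline is not None and time.monotonic() > deadline:
            break  # time limit reached: return the partial hit list
        if prefilter is not None and row_id not in prefilter:
            continue  # excluded by the pre-filter, never matched
        if matches(structure) != inverse:
            hits.append(row_id)
            if max_hits is not None and len(hits) >= max_hits:
                break  # hit limit reached
    return hits
```

For example, with rows `[(1, "c1ccccc1O"), (2, "CCO")]` and a matcher that tests for `"c1ccccc1"`, the plain search returns `[1]` and the inverse search returns `[2]`. Pre-filtering before matching means the expensive matcher only runs on rows the SQL condition already accepted.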
Structure search: performance
JChem Base 5.2.0, Intel Quad Q6600 2.4 GHz, 8 GB RAM; Oracle 10.2.0.3
Compound registration:
Number of compounds | Elapsed time, duplicates not checked | Elapsed time, duplicates checked
10,000 | 21 s | 26 s
100,000 | 2 min | 2 min 36 s
200,000 | 3 min 45 s | 5 min 5 s
Substructure search in PubChem (19.5 million compounds):
Number of hits | Search time
2 | 0.81 s
93 | 0.79 s
5,855 | 1.457 s
142,950 | 11.076 s
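The registration timings can be turned into throughput and the relative cost of duplicate checking with simple arithmetic. The values in `TIMINGS` are copied from the registration table (converted to seconds); nothing here is a new measurement, and the helper name is hypothetical.

```python
# Registration timings from the table, in seconds:
# number of compounds -> (duplicates not checked, duplicates checked)
TIMINGS = {
    10_000: (21, 26),
    100_000: (120, 156),
    200_000: (225, 305),
}


def throughput_report(timings: dict[int, tuple[int, int]]) -> dict[int, dict[str, float]]:
    """Compounds per second with and without duplicate checking, plus the relative overhead."""
    report = {}
    for n, (no_check, with_check) in timings.items():
        report[n] = {
            "per_s_no_check": n / no_check,
            "per_s_with_check": n / with_check,
            "dup_check_overhead": (with_check - no_check) / no_check,
        }
    return report


for n, r in throughput_report(TIMINGS).items():
    print(f"{n:>7,} compounds: {r['per_s_no_check']:.0f}/s vs {r['per_s_with_check']:.0f}/s, "
          f"duplicate checking adds {r['dup_check_overhead']:.0%}")
```

Throughput rises from about 476 to about 889 compounds per second without duplicate checking as the batch grows, while the duplicate-checking overhead grows from roughly 24% to 30% and 36%, which is consistent with each new compound being checked against a larger table.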
Table types
Table types control which chemical structures are allowed and which operations are available:
• Molecule
• Reaction
• Markush
• Query
• Any structure
Example web applications
Open-source JSP and ASP examples
– Marvin applets are used for query drawing and structure visualization
AJAX example
– The back end is JChem Web Services
– No Java is needed on the client for browsing
Demo
Integration
Integration with other ChemAxon tools:
– Custom, uniform chemical representation (Standardizer; see the separate presentation today)
– Automatically calculated properties via Chemical Terms calculated columns (Calculator plugins)
– Additional similarity calculations (Screen; JChem Base only)
– Tautomer handling:
  • Tautomer search
  • Tautomer duplicate filter table/index option
  • Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)
Using ChemAxon tools throughout provides a consistent interface and back end.
Integration
Additional Cartridge functionality:
– JChem index (for non-JChem tables)
– Communication with the Oracle optimizer
– Reaction-based enumeration (Reactor)
– Format conversions, including image generation
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)
Registration system
• A new registration system component is under development (API only)
• Main features:
– Customizable business logic
  • Multilevel duplication control
  • Customizable corporate registration ID
  • Handling of salts, batches, lots, samples and mixtures
– Identification, splitting and registration of salt and solvent structures
– Storage of input structures in their original format
– Mock registration (dry run)
– Pre-registration through a transitory area
– Basic, customizable implementation examples
  • Separate examples for chemists and registrars
• Web and Instant JChem interfaces will follow later
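A toy model of some of these features (two-level duplicate control, a customizable corporate ID, salt splitting and a dry-run mode) can make the ideas concrete. This is not the API of the component under development, which is not described here: the class, the ID pattern `CXN-{:06d}` and the salt list are all hypothetical, and structures are compared as raw SMILES strings rather than as standardized molecules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical salt/solvent dictionary as SMILES fragments. Raw string comparison is a
# simplification; a real system would compare standardized structures.
SALTS_AND_SOLVENTS = {"Cl", "Br", "[Na+]", "[Cl-]", "O"}


def split_parent(smiles: str) -> tuple[str, list[str]]:
    """Split a multi-component SMILES ('.'-separated) into parent and salt/solvent parts."""
    parts = smiles.split(".")
    salts = [p for p in parts if p in SALTS_AND_SOLVENTS]
    parent = ".".join(p for p in parts if p not in SALTS_AND_SOLVENTS)
    return parent, salts


@dataclass
class Registry:
    id_format: str = "CXN-{:06d}"  # hypothetical customizable corporate ID pattern
    exact: dict[str, str] = field(default_factory=dict)      # level 1: input form -> ID
    by_parent: dict[str, str] = field(default_factory=dict)  # level 2: parent -> ID
    next_number: int = 1

    def register(self, smiles: str, dry_run: bool = False) -> tuple[str, str]:
        """Return (corporate ID, status). With dry_run=True nothing is stored."""
        if smiles in self.exact:
            return self.exact[smiles], "duplicate"
        parent, _salts = split_parent(smiles)
        if parent in self.by_parent:
            # Same parent in a different salt form: reuse the parent's corporate ID.
            corp_id, status = self.by_parent[parent], "new form of existing parent"
        else:
            corp_id, status = self.id_format.format(self.next_number), "new compound"
        if not dry_run:
            self.exact[smiles] = corp_id
            if parent not in self.by_parent:
                self.by_parent[parent] = corp_id
                self.next_number += 1
        return corp_id, status
```

Registering `"CCN.Cl"` (ethylamine hydrochloride) creates `CXN-000001`; registering the free base `"CCN"` afterwards is recognized at the parent level and gets the same ID, and a dry run reports what would happen without consuming an ID.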
Handling of Markush structures
Markush structures
• Combinatorial Markush structure registration and search
• Features handled in search and enumeration:
– R-groups (nesting to any depth)
– Atom lists, bond lists
– Position variation bond
– Link nodes
– Repeating units
– Homology groups (aryl, alkyl, etc.)
  • Built-in
  • User-defined
• Compatible with the Markush enumeration plugin
Markush Enumeration
• Markush enumeration plugin
– Full enumeration
– Selected parts only
– Random enumeration
– Library size calculation: exact size of huge Markush libraries, with arbitrary precision or as an order of magnitude
– Scaffold alignment and coloring
– Markush code
– Optional example homology group enumeration
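The library-size and enumeration modes can be illustrated on a toy model. This sketch is not the Markush enumeration plugin or its algorithm: as an assumption, a structure is a SMILES-like text template with one `{}` slot per R-group position, each R-group lists alternative substituents, and substituents may contain nested R-groups. Library size is obtained by multiplying the choices per slot, so nothing is enumerated, and Python integers keep the result exact at any size.

```python
from __future__ import annotations

import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A scaffold or substituent: SMILES-like text with one '{}' slot per R-group position."""
    template: str
    rgroups: tuple[str, ...] = ()  # R-group label for each '{}' slot, left to right


def library_size(frag: Fragment, defs: dict[str, list[Fragment]]) -> int:
    """Exact number of library members, computed without enumerating them."""
    size = 1
    for label in frag.rgroups:
        # Slots are chosen independently; nested R-groups multiply inside a substituent.
        size *= sum(library_size(sub, defs) for sub in defs[label])
    return size


def enumerate_all(frag: Fragment, defs: dict[str, list[Fragment]]):
    """Full enumeration: yield every member of the library."""
    slot_choices = [
        [s for sub in defs[label] for s in enumerate_all(sub, defs)]
        for label in frag.rgroups
    ]
    for combo in itertools.product(*slot_choices):
        yield frag.template.format(*combo)


def random_member(frag: Fragment, defs: dict[str, list[Fragment]], rng: random.Random) -> str:
    """Random enumeration: each substituent is weighted by its sub-library size,
    so every library member is equally likely."""
    parts = []
    for label in frag.rgroups:
        subs = defs[label]
        sub = rng.choices(subs, weights=[library_size(s, defs) for s in subs])[0]
        parts.append(random_member(sub, defs, rng))
    return frag.template.format(*parts)


defs = {
    "R1": [Fragment("Cl"), Fragment("Br"), Fragment("C")],
    "R2": [Fragment("O"), Fragment("N"), Fragment("OC{}", ("R3",))],
    "R3": [Fragment("C"), Fragment("F")],  # nested R-group inside an R2 substituent
}
scaffold = Fragment("{}c1ccc({})cc1", ("R1", "R2"))
print(library_size(scaffold, defs))  # 12

# A toy library with 40 independent slots of 10 alternatives each: 10**40 members.
big_defs = {"R": [Fragment(a) for a in ("C", "N", "O", "F", "Cl", "Br", "I", "S", "P", "B")]}
big = Fragment("{}" * 40, ("R",) * 40)
print(len(str(library_size(big, big_defs))) - 1)  # order of magnitude: 40
```

Weighting each substituent by the size of its own sub-library is what makes the random mode uniform; picking substituents with equal probability would over-sample members whose substituents have small sub-libraries.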
Markush storage & search
• Available in JChem Base and Instant JChem
• No enumeration involved, so very complex Markush structures can be handled (tested up to 10^40 structures; no explicit limit is built in)
• Substructure and Full structure search
• Basic query features supported
• Substructure hit visualization: “Markush structure reduction”
Markush demo
What’s new
What’s new: JChem Base
5.1
– Position variation in queries
– New fast and reliable tautomer duplicate search
5.2
– .NET API
– Polymer storage and search
– New query options and features, including searching of attached data, group matching of undefined R-atoms and repeating units
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)
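For reference, the Tversky index generalizes Tanimoto and Dice. For fingerprint bit sets A (query) and B (target), S = |A∩B| / (|A∩B| + α|A−B| + β|B−A|); α = β = 1 gives Tanimoto and α = β = 0.5 gives Dice. The sketch below uses this textbook definition with Python integers as bit sets; it does not reproduce JChem's fingerprints, and which weight JChem applies to the query versus the target is an assumption here.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tversky(a: int, b: int, alpha: float, beta: float) -> float:
    """Tversky similarity of bit-set fingerprints a (query) and b (target)."""
    common = popcount(a & b)
    only_a = popcount(a & ~b)  # bits set in the query only
    only_b = popcount(b & ~a)  # bits set in the target only
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0  # convention chosen here for empty fingerprints


def tanimoto(a: int, b: int) -> float:
    return tversky(a, b, 1.0, 1.0)


def dice(a: int, b: int) -> float:
    return tversky(a, b, 0.5, 0.5)


query, target = 0b0011, 0b1111
print(tanimoto(query, target))           # 0.5
print(tversky(query, target, 1.0, 0.0))  # 1.0: every query bit is present in the target
```

The asymmetric setting (α = 1, β = 0) ignores bits that only the target has, so it scores how completely the query is contained in the target, which is why Tversky is useful for substructure-like similarity searches.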
What’s new: JChem Base
Polymer support details
• Polymer brackets and their properties (type, connectivity, etc.) are considered during search and registration
• Optional search of data attached to atoms, bonds or brackets
• Equivalence of source-based and structure-based representations is checked (can be switched off):
– Addition to a double bond, e.g. polystyrene
– Polymerization through elimination of water or HCl, e.g. polyesters, polyamides
What’s new: JChem Base
Polymer support details (cont.)
• Ladder type polymers
• Phase shifting for head-to-tail (ht) SRUs (can be switched off)
• End group matching:
– * atoms: unspecified end groups
– Search option to switch end group matching on/off
• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
• Polymer mixtures
• New search options
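Phase shifting means that a head-to-tail structural repeating unit (SRU) cut at a different backbone atom describes the same chain, e.g. poly(ethylene oxide) drawn as -[CH2-CH2-O]- or as -[O-CH2-CH2]-. A minimal sketch of that equivalence test, assuming an SRU is reduced to a tuple of backbone atom labels (side chains, bond orders, stereo and reading direction are ignored, and this is not how JChem represents polymers), compares canonical rotations:

```python
from __future__ import annotations


def canonical_sru(units: tuple[str, ...]) -> tuple[str, ...]:
    """Lexicographically smallest cyclic rotation (phase shift) of a head-to-tail SRU."""
    if not units:
        return units
    return min(units[i:] + units[:i] for i in range(len(units)))


def same_polymer(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """True if two SRUs differ only in where the repeating unit was cut."""
    return canonical_sru(a) == canonical_sru(b)


peo_1 = ("C", "C", "O")  # -[CH2-CH2-O]-n
peo_2 = ("O", "C", "C")  # -[O-CH2-CH2]-n: same chain, different phase
print(same_polymer(peo_1, peo_2))  # True
```

Choosing the smallest rotation as a canonical key lets equivalent SRUs be compared, or indexed, with a single equality test instead of trying every alignment at search time.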
What’s new: Cartridge-specific
5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)
5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)
What’s new: Markush
• New features
– Homology groups
  • 19 built-in groups
  • Customizable:
    – Examples (for built-in groups, enumeration only)
    – Fully user-defined homology groups, defined by R-group definitions
  • Marvin templates for easier sketching
– Import of reagent files as R-groups
– Position variation and repeating units
Plans
Plans: JChem Base & Cartridge
JChem Base
• Further speed improvements (SSS, similarity)
• New vague bond level options
• R-group decomposition integration
• Improved support for Screen molecular descriptors
Cartridge
• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fingerprint, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
• User-defined descriptor fingerprints
• Markush tables and search
• JChem Server, JChem cluster
Plans: Markush
– .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (number of atoms, branching points, number of heteroatoms, etc.)
– Conditions for Markush variables
Summary
• JChem Base and Cartridge are comprehensive and efficient
• Markush structure storage, search and enumeration are now approaching the feature coverage needed for patent Markush structures
• Continuous development, improvements in the pipeline
Find out more
• Product descriptions and links: www.chemaxon.com/products.html
• Forum: www.chemaxon.com/forum
• Presentations and posters: www.chemaxon.com/conf
• Download: www.chemaxon.com/download.html