g09-misc-emboss

Post on 28-Dec-2014

EMBOSS: New developments and extended data access (Peter Rice)

EBI is an Outstation of the European Molecular Biology Laboratory.

EMBOSS

European Molecular Biology Open Software Suite

Open-Bio Project Update 2011

Peter Rice pmr@ebi.ac.uk

Alan Bleasby, Jon Ison,

Mahmut Uludag, Michael Schuster

BOSC 2011: EMBOSS10.04.232

A quick introduction

• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 275+ applications
• 150+ third party applications in 15 associated packages

• MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
• Project started 1996 at Sanger and Daresbury/HGMP
• Now based at EBI
• Release 1.0.0 15th July 2000
• Release 6.4.0 15th July 2011
• Funded by UK-BBSRC and EMBL-EBI
• Originally funded by the Wellcome Trust
• Additional funds from UK-MRC

Who do we serve?

• Expert software developers
• Bioinformaticians
• Computer scientists

• Expert users
• Biology research community
• Industry

• Scientific users
• Biology research community
• Industry

EMBOSS command line interface

• EMBOSS applications run from the command line
• This is not the only interface

• There are over 100 interfaces and packaged systems available
• Web: wEMBOSS, Mobyle
• GUI: Jemboss
• Web Services: SoapLab
• Workflows: Galaxy, Taverna, Pipeline Pilot
• Windows: mEMBOSS

• All applications have a command definition file (.acd)
• Defines all inputs, outputs, and other options
• Read at startup
• Contains all command line options with descriptions
• Template for any other interface

EMBOSS Update

• Release 6.4.0 as usual on 15th July 2011
• New website: emboss.open-bio.org
• Three open source books: users, developers, admin

• Cambridge University Press

Data sources for EMBOSS

• Server definitions
• One server, 100+ databases
• server:dbname as the database name

• Data access methods
• Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
• EBI REST and SOAP services
• Data Resource Catalogue (DRCAT)

• emboss.standard file for all installations
• IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup

• New applications
• showserver, dbtell, servertell

New data types: input and output

• OBO ontology terms
• NCBI Taxonomy
• Data Resource Catalogue entries
• Text
• URL

• Cross-references:
• dbname and identifier
• data content

New query language

• SRS-like syntax
• id lists: dbname:{ida,idb,idc}
• or operator: dbname-{id:h* | des:hemoglobin}
• and operator: dbname-{id:h* & des:hemoglobin}
• eor operator: dbname-{id:h* ^ des:hemoglobin}

• Compressed (20-fold) b+tree indexes
• New indexing applications (obo, taxon, drcat)
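
The query forms listed above can be assembled mechanically. The following is a minimal Python sketch that only builds strings in the shapes shown on this slide; the helper names `id_list_query` and `field_query` are hypothetical and not part of EMBOSS, and the exact syntax accepted by a given EMBOSS release should be checked against its documentation.

```python
# Hypothetical helpers: they only reproduce the query shapes shown on the slide.

# Operator symbols as listed on the slide (or, and, eor).
OPERATORS = {"or": "|", "and": "&", "eor": "^"}


def id_list_query(dbname, ids):
    """Build an id-list query such as dbname:{ida,idb,idc}."""
    return dbname + ":{" + ",".join(ids) + "}"


def field_query(dbname, terms, op):
    """Combine (field, value) pairs with one operator.

    Example shape: dbname-{id:h* | des:hemoglobin}
    """
    symbol = OPERATORS[op]  # raises KeyError for an unknown operator name
    body = (" " + symbol + " ").join(
        field + ":" + value for field, value in terms
    )
    return dbname + "-{" + body + "}"


if __name__ == "__main__":
    print(id_list_query("dbname", ["ida", "idb", "idc"]))
    for name in OPERATORS:
        print(field_query("dbname", [("id", "h*"), ("des", "hemoglobin")], name))
```

On a shell command line, strings like these would normally need quoting, since `{`, `|`, `&`, `*` and `^` can be interpreted by the shell.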

EDAM ontology

• EDAM defines topic, operation, data, format, identifier
• ACD file application, inputs, outputs, parameters
• DRCAT resources, queries, identifiers
• SoapLab web services
• Redefined EMBOSS program groups

• OBO format ontology
• 2835 terms
• Available throughout EMBOSS as database EDAM:

• New applications
• EDAM namespace searches, relation queries
• OBO ontology applications
• GO, SO, and other OBO ontologies in release
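
To illustrate what an OBO format ontology looks like to a program, here is a minimal Python sketch that reads `[Term]` stanzas and collects the `id`, `name` and `is_a` tags. It is not the EMBOSS parser (which is written in C); the function name `parse_obo_terms` and the sample terms with the `EX:` prefix are made up for illustration and are not real EDAM entries.

```python
# Toy OBO text: the EX: identifiers and names are invented for this example.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example root

[Term]
id: EX:0000002
name: example child
is_a: EX:0000001 ! example root

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Return a list of dicts, one per [Term] stanza, with id, name and is_a."""
    terms = []
    current = None  # the [Term] stanza being filled, or None outside one
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"id": None, "name": None, "is_a": []}
                terms.append(current)
            else:
                current = None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that OBO allows after a value.
            value = value.split(" ! ", 1)[0].strip()
            if tag == "is_a":
                current["is_a"].append(value)
            elif tag in ("id", "name"):
                current[tag] = value
    return terms


if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE_OBO):
        print(term["id"], term["name"], term["is_a"])
```

The `is_a` links are what relation queries would walk; a fuller reader would also need to handle other relationship tags, synonyms and escaped characters.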

DRCAT Data Resource Catalogue

• Public Data Resources
• EDAM annotations
• UniProt and EMBL/GenBank/DDBJ cross-references
• Query prototypes
• Example identifiers for testing
• 662 entries
• Available in EMBOSS as database DRCAT:

• Applications:
• Search by EDAM annotation
• Search by 18 indexed fields

Ontologies: NCBI Taxonomy

• Parsers for “.dmp” files
• Indexed by dbxtax
• Navigation up, down, siblings (the usual suspects)
• Automatic cross references from sequence data

• EMBL source line
• UniProt OX lines
• BioMart mart name (organism name)
• etc.

• New applications
• Search and retrieve from taxon hierarchy
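
The up, down and sibling navigation mentioned above can be sketched in a few lines of Python. This is an illustration only, not the EMBOSS implementation: it assumes the NCBI `nodes.dmp` layout in which fields are separated by tab, pipe, tab and the first two fields are the taxon id and its parent id, with the root listed as its own parent. The function names and the toy ids (1, 10, 11, 12) are hypothetical.

```python
from collections import defaultdict

# Toy records in nodes.dmp layout; the ids and ranks are invented.
TOY_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tsuperkingdom\t|\n",
    "11\t|\t1\t|\tsuperkingdom\t|\n",
    "12\t|\t10\t|\tphylum\t|\n",
]


def parse_nodes(lines):
    """Return (parent, children) maps built from nodes.dmp-style lines."""
    parent = {}
    children = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t|\t")
        tax_id, parent_id = int(fields[0]), int(fields[1])
        parent[tax_id] = parent_id
        if tax_id != parent_id:  # the root is its own parent; skip that link
            children[parent_id].append(tax_id)
    return parent, children


def lineage(parent, tax_id):
    """Navigate up: the taxon followed by its ancestors, ending at the root."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def siblings(parent, children, tax_id):
    """Other taxa that share the same parent; the root has none."""
    if parent[tax_id] == tax_id:
        return []
    return [t for t in children[parent[tax_id]] if t != tax_id]


if __name__ == "__main__":
    parent, children = parse_nodes(TOY_NODES)
    print("up from 12:", lineage(parent, 12))
    print("down from 1:", children[1])
    print("siblings of 10:", siblings(parent, children, 10))
```

Holding both maps in memory is fine for a toy; for the full taxonomy an on-disk index, as the slide describes with dbxtax, avoids loading every node for each lookup.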

Installation

• Release size increased
• EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
• Associated packages

• AXIS2C (SOAP web service access)
• MySQL (Ensembl)
• PostgreSQL (FlyBase)

• mEMBOSS for Windows
• Enhanced QA testing

• Standard test set adapted for use on Windows and Unix

EMBOSS Interfaces and wrappers

• Two releases this year
• Too many for other projects to keep up

• So we are obliged to help, starting with:
• SoapLab2
• Jemboss
• Galaxy
• Mobyle
• … and anyone else who asks

• Interface generation should be automated
• Tested during development
• Changes highlighted before release

EMBOSS Future Plans

• Further development this year
• Mapped short reads
• Reference sequences
• Sequence variation
• Genome browser data format support

• Leaving EBI in December

• … into the unknown

• … still supporting EMBOSS and planning new developments

The EMBOSS Team

Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

Acknowledgements

• EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider

• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop

• LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold

• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley

• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina

• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas

• Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research

• Open-Bio Foundation, Sourceforge, ... And the British Antarctic Survey

http://emboss.open-bio.org

http://emboss.open-bio.org/wiki
