Page 1: Vincent Robert

CBS-KNAW & IMA . www.mycobank.org

MYCOBANK AND ASSOCIATED DATABASES AS WORKING TOOLS FOR TAXONOMISTS: OPPORTUNITIES AND CHALLENGES

Vincent Robert

CBS-KNAW, Utrecht, The Netherlands, [email protected]

Page 2: Vincent Robert

HISTORY & FACTS

•  Started in 2004: P.W. Crous, W. Gams, J.A. Stalpers, V. Robert and G. Stegehuis. 2004. MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50: 19–22
•  Originally based on a CBS Firebird database (J. Stalpers and G. Stegehuis) and an old version of BioloMICS software (V. Robert)
•  Started at CBS-KNAW but later transferred to the International Mycological Association (IMA)
•  IMA is now the owner of MycoBank
•  Created to centralize the deposit of new fungal taxa (species, genera, families, etc.)

Page 3: Vincent Robert

HISTORY & FACTS

•  Created to record what has been published so far and to provide an exhaustive, centralized list of fungal species
•  More and more data associated with taxa have been added to MycoBank
•  MB numbers are required by a majority of journals publishing new taxa
•  New version of MycoBank released in April 2012 with a number of new features:
    1.  Complete integration with BioloMICS software allows extensive development of new features
    2.  Simplified basic and advanced search engine
    3.  Simplified registration of new taxa

Page 4: Vincent Robert

HISTORY & FACTS

•  New version of MycoBank released in April 2012 with a number of new features:
    1.  Complete integration with BioloMICS software allows extensive development of new features
    2.  Simplified basic and advanced search engine
    3.  Simplified registration of new taxa
    4.  Link out to many websites:
        1. DBs containing fungi:
            a. Catalogue of Life (CoL)
            b. Encyclopedia of Life (EOL)
            c. Global Biodiversity Information Facility (GBIF)
            d. Index Fungorum (IF)
            e. Integrated Taxonomic Information System (ITIS)
        2. Bibliography links:
            a. Google Scholar
            b. PubMed
            c. Libri Fungorum
            d. Biblioteca Digital
            e. Biodiversity Heritage Library
        3. General links:
            a. Google
            b. Wikimedia
            c. Wikipedia
            d. Wikispecies
        4. Molecular links:
            a. BOLD Systems
            b. EMBL
            c. NCBI
        5. Specimens and strains links:
            a. All Russian Collection of Microorganisms (VKM)
            b. CBS collection
            c. StrainInfo
            d. WDCM will be made available as soon as ready

Page 5: Vincent Robert

HISTORY & FACTS: DISTRIBUTED FUNGAL DATABASES

[Diagram, labelled "Stage 1": linked databases and the figure shown next to each]

•  Aspergillus: 341
•  Penicillium: 430
•  GenBank: +++
•  Mycosphaerella: 117
•  Colletotrichum: 452
•  Calonectria: 160
•  Phytophthora: 77
•  MLST Fusarium: 1365
•  Phoma: 309
•  CBS: 26098
•  Clinical ITS: 488
•  FunBOL: 4366
•  BOLD
•  Yeasts: 1365
•  Indoor molds: 1646
•  Pasteur: ±5000
•  CDC
•  Ceratocystis: 92
•  Monilinia: 54
•  Stenocarpella: 32
•  Russula: 608
•  Dermatophytes: 378
•  Phaeoacremonium: 30
•  Medical: 315

Page 6: Vincent Robert

HISTORY & FACTS

•  New version of MycoBank released in April 2012 with a number of new features:
    1.  Complete integration with BioloMICS software allows extensive development of new features
    2.  Simplified basic and advanced search engine
    3.  Simplified registration of new taxa
    4.  Link out to many websites
    5.  Pairwise sequence alignments against a large number of websites including CBS and GenBank (BOLD soon as well)
    6.  Polyphasic identifications for a number of taxonomic groups
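Feature 5 rests on pairwise sequence alignment. As a rough illustration of what such a comparison computes, here is a minimal Python sketch of a global (Needleman-Wunsch) alignment that returns the fraction of identical positions. The slides do not say which alignment algorithm or scoring scheme MycoBank uses, so the function name, the scoring values (match +1, mismatch -1, gap -2) and the identity definition (matches divided by alignment length) are all assumptions.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return matches / alignment length.

    Illustrative only: scoring values are assumptions, not MycoBank's.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back one optimal path, counting matches and alignment columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 1.0


if __name__ == "__main__":
    # Toy sequences, not real ITS data.
    print(global_identity("ACGT", "ACGA"))
    print(global_identity("ACGT", "AGT"))
```

The dynamic-programming table makes the cost quadratic in sequence length, which is why comparing a query against very large reference sets becomes expensive.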

Page 7: Vincent Robert

HISTORY & FACTS

•  New version of MycoBank released in April 2012 with a number of new features:
    7.  Multilingual:
        1.  English, French, Chinese, German and Arabic
        2.  Spanish, Portuguese and Dutch are ready and will be available in November
        3.  Russian, Indonesian, Thai and Japanese coming soon
        4.  More if volunteers are ready to translate the system into their own languages

Page 8: Vincent Robert

HISTORY & FACTS

•  New version of MycoBank released in April 2012 with a number of new features:
    7.  Multilingual
    8.  Remote curation of MycoBank allows curators from all around the world to manage the system. It will be effectively available in November 2012 with a simplified version of the database/software.

Page 9: Vincent Robert

MULTIPLE REPOSITORIES SYNCHRONIZATION

<Taxon>
    <ID>1213321</ID>
    <MycoBank_Number>800624</MycoBank_Number>
    <Taxon_Name>Candida mycobankii</Taxon_Name>
    <Authors>J. Smith</Authors>
    <Rank>Species</Rank>
    <Higher_Rank_ID>4569</Higher_Rank_ID>
    <Higher_Rank_Name>Candida</Higher_Rank_Name>
    <Status>Legitimate</Status>
    <Published>True</Published>
    …
</Taxon>
<Taxon>
    <ID>1213322</ID>
    <MycoBank_Number>800625</MycoBank_Number>
    <Taxon_Name>Candida bertreensis</Taxon_Name>
    <Authors>J. Smith</Authors>
    <Rank>Species</Rank>
    <Higher_Rank_ID>4569</Higher_Rank_ID>
    <Higher_Rank_Name>Candida</Higher_Rank_Name>
    <Status>Legitimate</Status>
    <Published>True</Published>
    …
</Taxon>
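Records in this shape lend themselves to a simple keyed merge between repositories. The following is a minimal Python sketch, not MycoBank's actual synchronization protocol (which the slide does not describe): the wrapping `<Taxa>` root element, the function names `parse_taxa` and `merge_records`, and the choice of `MycoBank_Number` as the merge key are assumptions made for illustration. The fields elided with "…" on the slide are simply left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical export file: the <Taxa> root wrapper is an assumption;
# the fields inside each <Taxon> are copied from the slide.
SAMPLE = """<Taxa>
  <Taxon>
    <ID>1213321</ID>
    <MycoBank_Number>800624</MycoBank_Number>
    <Taxon_Name>Candida mycobankii</Taxon_Name>
    <Authors>J. Smith</Authors>
    <Rank>Species</Rank>
    <Higher_Rank_ID>4569</Higher_Rank_ID>
    <Higher_Rank_Name>Candida</Higher_Rank_Name>
    <Status>Legitimate</Status>
    <Published>True</Published>
  </Taxon>
  <Taxon>
    <ID>1213322</ID>
    <MycoBank_Number>800625</MycoBank_Number>
    <Taxon_Name>Candida bertreensis</Taxon_Name>
    <Authors>J. Smith</Authors>
    <Rank>Species</Rank>
    <Higher_Rank_ID>4569</Higher_Rank_ID>
    <Higher_Rank_Name>Candida</Higher_Rank_Name>
    <Status>Legitimate</Status>
    <Published>True</Published>
  </Taxon>
</Taxa>"""


def parse_taxa(xml_text):
    """Return one dict per <Taxon>, mapping child tag names to text values."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in taxon}
        for taxon in root.iter("Taxon")
    ]


def merge_records(local, incoming, key="MycoBank_Number"):
    """Insert or overwrite records in `local` (a dict keyed by `key`).

    Returns the keys whose stored record was added or changed.
    """
    changed = []
    for record in incoming:
        k = record[key]
        if local.get(k) != record:
            local[k] = record
            changed.append(k)
    return changed


if __name__ == "__main__":
    repository = {}
    print(merge_records(repository, parse_taxa(SAMPLE)))  # first sync
    print(merge_records(repository, parse_taxa(SAMPLE)))  # repeat sync
```

A strict XML parser rejects a record whose closing tag does not match its opening tag, so mirrored repositories would need well-formed exports for this kind of exchange to work.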

Page 10: Vincent Robert

REMOTE MYCOBANK

Page 11: Vincent Robert

DISCUSSION PANELS & ANNOTATIONS

Page 12: Vincent Robert

STATISTICS

•  Number of registered users in 2007 and 2008: 539 and 869
•  Number of registered users today: 4053*
•  Number of records: 463111*
•  Number of species: 365132*
•  Number of genera: 17722*

*Statistics are based on data present in MycoBank in May 2012
**2012 results: extrapolation based on the first 5 months

[Chart: yearly values for 2004 to 2012**; y-axis 0 to 5000]

Page 13: Vincent Robert

STATISTICS

[Chart: Percentage of new species having at least one associated sequence, per year from 2006 to 2010; y-axis 0 to 100. Series: 28S rDNA, 18S rDNA, ITS rDNA, Actin gene, Elongation factor, Any, RPB1, RPB2, ATP6]

[Chart: Percentages of new species published between 2006 and 2010 having sequences; y-axis 0 to 100]

•  28S rDNA: 20.7
•  18S rDNA: 20.43
•  ITS rDNA: 21.04
•  Actin gene: 2.65
•  Elongation factor: 4.79
•  Any: 28.64
•  RPB1: 0.74
•  RPB2: 2.63
•  ATP6: 0.08

Page 14: Vincent Robert

HOW TO IMPROVE STATISTICS?

[Diagram: new species (New sp.) linked to MycoBank, GenBank, iBOL, specialized strains and species databases, and others]

Force/suggest/help sequencing?

Page 15: Vincent Robert

All ascomycetous yeasts 830 species

ITS1+5.8S+ITS2 & LSU Pairwise sequence alignments

Page 16: Vincent Robert

Page 17: Vincent Robert

Page 18: Vincent Robert

[Chart: Publications of Candida species per year, 1923 to 2011; y-axis 0 to 90]

Annotated peaks:

•  S.A. Meyer & Yarrow 1978, Int. J. Syst. Bacteriol.: almost all new combinations based on physiology
•  C. Ramírez & A.E. González 1984, Mycopathologia: many new species

Page 19: Vincent Robert

Page 20: Vincent Robert

ITS RESULTS

ITS on Venice (Italy) museum specimens: 1999 specimens, 1000 species

Page 21: Vincent Robert

ITS RESULTS

ITS on Venice (Italy) museum specimens: genera and families

Page 22: Vincent Robert

THERMOREGULATION

Page 23: Vincent Robert

RESOLUTION LEVEL

Kingdom, Division, Class, Order, Family, Genus, Species, Strain, Gene/Function/Time/Space

Page 24: Vincent Robert

FUTURE DEVELOPMENTS: MICROBIOME ANALYSIS, METAGENOMICS

[Diagram: mixed samples, NGS genome sequencing, NGS results]

NGS results, Time 1:

•  Species 1: 59%
•  Species 2: 23%
•  Species 3: 10%
•  Species 4: 8%

[Chart: Species 1 to Species 5 plotted over Time 1 to Time 4; y-axis 0 to 70]

Page 25: Vincent Robert

MULTILEVEL CLUSTERING

Requires a similarity matrix between all sequences

Page 26: Vincent Robert

MULTILEVEL CLUSTERING

•  Only compare representative sequences of the species
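The idea of comparing only against representatives can be sketched in Python as follows. This is a simplified greedy illustration, not the multilevel clustering algorithm presented in the talk, whose details the slides do not give. The function names, the threshold value, the toy sequences, and the use of positional identity as a stand-in for a real alignment-based similarity are all assumptions.

```python
def positional_identity(a, b):
    """Fraction of identical positions; a toy stand-in for an alignment score."""
    if not a and not b:
        return 1.0
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))


def cluster_by_representatives(sequences, threshold):
    """Greedy clustering that compares each sequence to cluster representatives only.

    The first member of each cluster acts as its representative.
    Returns (clusters, number_of_similarity_computations).
    """
    clusters = []
    comparisons = 0
    for seq in sequences:
        for cluster in clusters:
            comparisons += 1
            if positional_identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append([seq])
    return clusters, comparisons


if __name__ == "__main__":
    # Toy data, not real ITS sequences.
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
    clusters, comparisons = cluster_by_representatives(toy, threshold=0.8)
    all_pairs = len(toy) * (len(toy) - 1) // 2
    print(len(clusters), "clusters;", comparisons, "comparisons vs", all_pairs, "for all pairs")
```

The saving comes from never filling in the full similarity matrix: the number of comparisons grows with the number of clusters rather than with the number of sequences squared.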

Page 27: Vincent Robert

MULTILEVEL CLUSTERING

(on one 64-bit dual-core CPU computer with 8 GB RAM)

Dataset                              Tool                      Time                                         Quality
91 families of 866 sequences         Transitivity clustering   seconds                                      0.9281
91 families of 866 sequences         Multilevel clustering     seconds                                      0.9378
1259 species of 4412 ITS sequences   Transitivity clustering   46m75s                                       0.855
1259 species of 4412 ITS sequences   Multilevel clustering     3m18s                                        0.861
475159 ITS sequences                 Transitivity clustering   156 days (assuming no memory constraints)    -
475159 ITS sequences                 Multilevel clustering     38 hours                                     -
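To put the largest dataset in perspective, the arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes only that an all-against-all comparison of n sequences needs n(n-1)/2 similarity computations, and it takes the reported 156 days and 38 hours at face value. The function name is illustrative.

```python
def all_pairs(n):
    """Number of distinct sequence pairs in a full similarity matrix."""
    return n * (n - 1) // 2


# Dataset sizes taken from the table.
pairs_small = all_pairs(4412)     # 1259 species of 4412 ITS sequences
pairs_large = all_pairs(475159)   # 475159 ITS sequences

# Reported run times for the largest dataset, converted to hours.
transitivity_hours = 156 * 24
multilevel_hours = 38
speedup = transitivity_hours / multilevel_hours

if __name__ == "__main__":
    print(f"{pairs_small:,} pairs for 4412 sequences")
    print(f"{pairs_large:,} pairs for 475159 sequences")
    print(f"reported speed-up: about {speedup:.1f}x")
```

The full matrix for 475159 sequences has on the order of 10^11 entries, which explains the "assuming no memory constraints" caveat on the transitivity clustering estimate.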

Page 28: Vincent Robert

MULTILEVEL CLUSTERING

Page 29: Vincent Robert

FUTURE DEVELOPMENTS: DATABASE SCALING NEEDED

Page 30: Vincent Robert

FUTURE DEVELOPMENTS: DATABASE SCALING NEEDED

NoSQL

Bioinformaticians

IT & Software developers

Page 31: Vincent Robert

THE TEAM

V. Robert D. Vu (G. Stegehuis) N. van de Wiele

A. Ben Hadj Amor S. Szoke B. Jabas E. Blom O. Chouchen M. Jaidane S. Ben Daoud

Utrecht, The Netherlands

Sousse, Tunisia Hannut, Belgium

F. Borges dos Santos Lea Vaas

MycoBank curation, Utrecht, The Netherlands

Joost Stalpers Arthur de Cock

Page 32: Vincent Robert

Conrad L. Schoch, Keith A. Seifert, Sabine Huhndorf, Vincent Robert, John L. Spouge, Elena Bolchacova, Kerstin Voigt, Wen Chen, Andrew N. Miller, Michael J. Wingfield, M. Catherine Aime, Kwang-Deuk An, Feng-Yan Bai, Robert W. Barreto, Dominik Begerow, Marie-Josée Bergeron, Meredith Blackwell, Teun Boekhout, Mesfin Bogale, Nattawut Boonyuen, Ana R. Burgaz, Bart Buyck, Lei Cai, G. Cardinali, Priscilla Chaverri, Brian J. Coppins, Ana Crespo, Pedro W. Crous, Paloma Cuibas, Ulrike Damm, Z. Wilhelm De Beer, G. Sybren De Hoog, Ruth Del-Prado, Bryn Dentinger, Javier Diéguez-Uribeondo, Pradeep K. Divakar, Brian Douglas, Margarita Dueñas, Duong Vu, Tuan A. Duong, Ursula Eberhardt, Mostafa S. Elshahed, Katerina Fliegerova, Miguel A. García, Zai-Wei Ge, Gareth W. Griffith, K. Griffiths, Johannes Z. Groenewald, Marizeth Groenewald, Martin Grube, Marieka Gryzenhout, Liang-Dong Guo, Ferry Hagen, Sarah Hambleton, Richard C. Hamelin, Karen Hansen, Paul Harrold, G. Heller, Kazuyuki Hirayama, Hsiao-Man Ho, Kerstin Hoffman, Valérie Hofstetter, Filip Högnabba, Peter M. Hollingsworth, Seung-Beom Hong, Kentaro Hosaka, Jos Houbraken, Karen Hughes, Seppo Huhtinen, Kevin D. Hyde, Timothy James, Peter R. Johnston, E.B. Gareth Jones, Laura J. Kelly, Paul M. Kirk, Dániel G. Knapp, Urmas Kõljalg, Gábor M. Kovács, Cletus P. Kurtzman, Sara Landvik, Steven D. Leavitt, André Levesque, Audra S. Liggenstoffer, Kare Liimatainen, Lorenzo Lombard, J. Jennifer Luangsa-ard, H. Thorsten Lumbsch, Harinad Maganti, Sajeewa S. N. Maharachchikumbura, María P. Martin, Tom W. May, Alistair R. McTaggert, Andrew S. Methven, Wieland Meyer, Jean-Marc Moncalvo, S. Mongkolsamrit, László G. Nagy, R. Henrik Nilsson, Tuula Niskanen, Ildikó Nyilasi, Gen Okada, Izumi Okane, Ibai Olariaga, J. Otte, Tamás Papp, Duckhul Park, Tamás Petkovits, Raquel Pino-Bodas, C. Qing, Willem Quaedvlieg, Huzefa A. Raja, Dirk Redecker, Tara Rintoul, Constantino Ruibal, J.M. Sarmiento-Ramírez, Imke Schmitt, Arthur Schüßler, Carol Shearer, Kozue Sotome, Frank O.P. Stefani, Soili Stenroos, Herbert Stockinger, Satinee Suetrong, Sung-Oui Suh, Gi-Ho Sung, Motofumi Suzuki, Kazuaki Tanaka, Leho Tedersoo, M. Teresa Telleria, Eric Tretter, Wendy A. Untereiner, Hector Urbina, Csaba Vágvölgyi, Agathe Vialle, Grit Walther, Qi-Ming Wang, Bevan S. Weir, Michael Weiß, Merlin M. White, Jianping Xu, Rebecca Yahr, Z.-L. Yang, Andrey Yurkov, Juan-Carlos Zamora, Ning Zhang, Wen-Ying Zhuang and David Schindel, Szaniszlo Szoke, Bernard Jabas, Ammar Amor, Oussema Chouchen, Samy Ben Daoud, Ahemd Dridi, Samrock, Matteo Garbelotto, David Yarrow, Joost Stalpers, Gerrit Stegehuis, Nathalie van de Wiele, Donald Hobern, Matt Branford, Frank Bisby, Peter Bonants, Harm Huttinga, Sloan Foundation, Embarc project, EU grants QBOL, FES project, NCB Project, i4Life project, CBS, My wife and kids

Thanks to all mycologists, developers, EU, NCB, FES, Embarc, Sloan Foundation, QBOL, q-Bank, & iBOL projects