vincent robert
CBS-KNAW & IMA . www.mycobank.org
MYCOBANK AND ASSOCIATED DATABASES AS WORKING TOOLS FOR TAXONOMISTS: OPPORTUNITIES AND CHALLENGES
Vincent Robert
CBS-KNAW, Utrecht, The Netherlands
HISTORY & FACTS
• Started in 2004.
  P.W. Crous, W. Gams, J.A. Stalpers, V. Robert and G. Stegehuis. 2004. MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50: 19–22
• Originally based on a CBS Firebird database (J. Stalpers and G. Stegehuis) and an old version of BioloMICS software (V. Robert)
• Started at CBS-KNAW but later transferred to the International Mycological Association (IMA)
• IMA is now the owner of MycoBank
• Created to centralize the deposit of new fungal taxa (species, genera, families, etc.)
HISTORY & FACTS
• Created to record what has been published so far and to provide an exhaustive, centralized list of fungal species
• More and more data associated with taxa have been added to MycoBank
• MB numbers are required by a majority of journals publishing new taxa
• New version of MycoBank released in April 2012 with a number of new features:
  1. Complete integration with BioloMICS software allows extensive development of new features
  2. Simplified basic and advanced search engine
  3. Simplified registration of new taxa
HISTORY & FACTS
• New version of MycoBank released in April 2012 with a number of new features (continued):
  4. Links out to many websites:
     1. Databases containing fungi:
        a. Catalogue of Life (CoL)
        b. Encyclopedia of Life (EOL)
        c. Global Biodiversity Information Facility (GBIF)
        d. Index Fungorum (IF)
        e. Integrated Taxonomic Information System (ITIS)
     2. Bibliography links:
        a. Google Scholar
        b. PubMed
        c. Libri Fungorum
        d. Biblioteca Digital
        e. Biodiversity Heritage Library
     3. General links:
        a. Google
        b. Wikimedia
        c. Wikipedia
        d. Wikispecies
     4. Molecular links:
        a. BOLD Systems
        b. EMBL
        c. NCBI
     5. Specimen and strain links:
        a. All-Russian Collection of Microorganisms (VKM)
        b. CBS collection
        c. StrainInfo
        d. WDCM (will be made available as soon as it is ready)
HISTORY & FACTS: DISTRIBUTED FUNGAL DATABASES
[Diagram: Stage 1. Databases shown, with the figure given for each on the slide]
Aspergillus 341; Penicillium 430; Mycosphaerella 117; Colletotrichum 452; Calonectria 160; Phytophthora 77; MLST Fusarium 1365; Phoma 309; CBS 26098; Clinical ITS 488; FunBOL 4366; Yeasts 1365; Indoor molds 1646; Pasteur ±5000; Ceratocystis 92; Monilinia 54; Stenocarpella 32; Russula 608; Dermatophytes 378; Phaeoacremonium 30; Medical 315; GenBank +++; BOLD; CDC
HISTORY & FACTS
• New version of MycoBank released in April 2012 with a number of new features (continued):
  5. Pairwise sequence alignments against a large number of websites including CBS and GenBank (BOLD soon as well)
  6. Polyphasic identifications for a number of taxonomic groups
HISTORY & FACTS
• New version of MycoBank released in April 2012 with a number of new features (continued):
  7. Multilingual:
     1. English, French, Chinese, German and Arabic
     2. Spanish, Portuguese and Dutch are ready and will be available in November
     3. Russian, Indonesian, Thai and Japanese coming soon
     4. More if volunteers are ready to translate the system into their own languages
HISTORY & FACTS
• New version of MycoBank released in April 2012 with a number of new features (continued):
  8. Remote curation of MycoBank allows curators from all around the world to manage the system. It will be effectively available in November 2012 with a simplified version of the database/software.
MULTIPLE REPOSITORIES SYNCHRONIZATION
<Taxon>
  <ID>1213321</ID>
  <MycoBank_Number>800624</MycoBank_Number>
  <Taxon_Name>Candida mycobankii</Taxon_Name>
  <Authors>J. Smith</Authors>
  <Rank>Species</Rank>
  <Higher_Rank_ID>4569</Higher_Rank_ID>
  <Higher_Rank_Name>Candida</Higher_Rank_Name>
  <Status>Legitimate</Status>
  <Published>True</Published>
  …
</Taxon>
<Taxon>
  <ID>1213322</ID>
  <MycoBank_Number>800625</MycoBank_Number>
  <Taxon_Name>Candida bertreensis</Taxon_Name>
  <Authors>J. Smith</Authors>
  <Rank>Species</Rank>
  <Higher_Rank_ID>4569</Higher_Rank_ID>
  <Higher_Rank_Name>Candida</Higher_Rank_Name>
  <Status>Legitimate</Status>
  <Published>True</Published>
  …
</Taxon>
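The slide does not describe the synchronization protocol itself, so the following is only a minimal sketch of how such records could be merged between two repositories. It assumes that records are exported inside a hypothetical `<Taxa>` root element (added here so the fragment parses as one XML document) and that the MycoBank number is the matching key. The function names `parse_taxa` and `synchronize` are illustrative, not part of MycoBank or BioloMICS.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: the <Taxa> wrapper is an assumption, added so that the
# records shown on the slide form a single well-formed XML document.
EXPORT = """
<Taxa>
  <Taxon>
    <ID>1213321</ID>
    <MycoBank_Number>800624</MycoBank_Number>
    <Taxon_Name>Candida mycobankii</Taxon_Name>
    <Rank>Species</Rank>
    <Status>Legitimate</Status>
  </Taxon>
  <Taxon>
    <ID>1213322</ID>
    <MycoBank_Number>800625</MycoBank_Number>
    <Taxon_Name>Candida bertreensis</Taxon_Name>
    <Rank>Species</Rank>
    <Status>Legitimate</Status>
  </Taxon>
</Taxa>
"""


def parse_taxa(xml_text):
    """Return {MycoBank number: {field name: text}} for every <Taxon>."""
    root = ET.fromstring(xml_text)
    records = {}
    for taxon in root.iter("Taxon"):
        fields = {child.tag: (child.text or "").strip() for child in taxon}
        records[fields["MycoBank_Number"]] = fields
    return records


def synchronize(local, incoming):
    """Copy new or changed records into `local`; return the numbers touched."""
    changed = []
    for number, fields in incoming.items():
        if local.get(number) != fields:
            local[number] = fields
            changed.append(number)
    return sorted(changed)


# A local repository that already holds one of the two records.
local_repo = {"800624": parse_taxa(EXPORT)["800624"]}
updated = synchronize(local_repo, parse_taxa(EXPORT))
print(updated)
```

Keying on the MycoBank number rather than the internal `ID` is a design choice of this sketch: the internal identifier may differ between repositories, whereas the registration number is meant to be shared.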
REMOTE MYCOBANK
DISCUSSION PANELS & ANNOTATIONS
STATISTICS
• Number of registered users in 2007 and 2008: 539 and 869
• Number of registered users today: 4053*
• Number of records: 463111*
• Number of species: 365132*
• Number of genera: 17722*
[Bar chart: yearly values from 2004 to 2012** on a scale of 0 to 5000]
* Statistics are based on data present in MycoBank in May 2012
** 2012 results are an extrapolation based on the first 5 months
STATISTICS
[Chart: Percentage of new species having at least one associated sequence, per year from 2006 to 2010, scale 0 to 100, for the loci listed below]
Percentages of new species published between 2006 and 2010 having sequences:
  Locus                 %
  28S rDNA              20.7
  18S rDNA              20.43
  ITS rDNA              21.04
  Actin gene            2.65
  Elongation factor     4.79
  Any                   28.64
  RPB1                  0.74
  RPB2                  2.63
  ATP6                  0.08
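The percentages above can be read as the share of new species with at least one sequence for a given locus, with "Any" counting species that have a sequence for at least one locus. A minimal sketch of that calculation follows; the toy data and the function name `sequence_coverage` are hypothetical and do not come from MycoBank.

```python
def sequence_coverage(species_loci, loci):
    """Percentage of species with at least one sequence per locus, plus 'Any'."""
    total = len(species_loci)
    if total == 0:
        return {}
    result = {}
    for locus in loci:
        with_locus = sum(1 for found in species_loci.values() if locus in found)
        result[locus] = round(100.0 * with_locus / total, 2)
    # "Any" counts species that have a sequence for at least one locus.
    with_any = sum(1 for found in species_loci.values() if found)
    result["Any"] = round(100.0 * with_any / total, 2)
    return result


# Hypothetical toy data: species name -> set of loci with a deposited sequence.
new_species = {
    "Species A": {"ITS rDNA", "28S rDNA"},
    "Species B": {"ITS rDNA"},
    "Species C": set(),
    "Species D": set(),
}
coverage = sequence_coverage(new_species, ["28S rDNA", "ITS rDNA", "RPB2"])
print(coverage)
```

By construction the "Any" value can never be lower than the value for a single locus, which is consistent with "Any" being the largest figure on the slide.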
HOW TO IMPROVE STATISTICS?
[Diagram linking new species (New sp.) to MycoBank, GenBank, iBOL, specialized strain and species databases, and others]
• Force/suggest/help sequencing?
All ascomycetous yeasts (830 species)
Pairwise sequence alignments of ITS1+5.8S+ITS2 & LSU
[Chart: Publications of Candida species by year, 1923 to 2011, scale 0 to 90]
• C. Ramírez & A.E. González 1984, Mycopathologia: many new species
• S.A. Meyer & Yarrow 1978, Int. J. Syst. Bacteriol.: almost all new combinations based on physiology
ITS RESULTS
ITS on Venice (Italy) museum specimens: 1999 specimens, 1000 species
ITS RESULTS
ITS on Venice (Italy) museum specimens: genera and families
THERMOREGULATION
RESOLUTION LEVEL
Kingdom, Division, Class, Order, Family, Genus, Species, Strain, Gene/Function/Time/Space
Future developments: Microbiome analysis, metagenomics
[Diagram: mixed samples analysed by NGS (genome)]
NGS results at Time 1: Species 1 59%, Species 2 23%, Species 3 10%, Species 4 8%
[Chart: Species 1 to Species 5 at Time 1 to Time 4, scale 0 to 70]
MULTILEVEL CLUSTERING
• Requires a similarity matrix between all sequences
MULTILEVEL CLUSTERING
• Only compare representative sequences of the species
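The idea of comparing only representative sequences can be illustrated with a simple greedy scheme. This is a sketch under stated assumptions, not the multilevel clustering algorithm presented in the talk: the first sequence of each cluster is taken as its representative, `difflib.SequenceMatcher` stands in for a real alignment-based similarity, and the 0.8 threshold and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real pipeline would use an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_representative(sequences, threshold):
    """Greedy clustering: compare each sequence only with cluster representatives."""
    clusters = []  # list of (representative, members)
    comparisons = 0
    for seq in sequences:
        for representative, members in clusters:
            comparisons += 1
            if similarity(seq, representative) >= threshold:
                members.append(seq)
                break
        else:
            # No representative is similar enough: seq starts a new cluster.
            clusters.append((seq, [seq]))
    return clusters, comparisons


# Hypothetical toy sequences forming two obvious groups.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC", "TTGGCCAATT", "TTGGCCAATA"]
clusters, comparisons = cluster_by_representative(toy, threshold=0.8)
all_pairs = len(toy) * (len(toy) - 1) // 2
print(len(clusters), comparisons, all_pairs)
```

On this toy input the representative-only scheme needs fewer comparisons than the all-against-all matrix, and the gap widens as clusters grow, since each new sequence is compared with one sequence per cluster rather than with every member.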
MULTILEVEL CLUSTERING
(on one 64-bit dual-core CPU computer with 8 GB RAM)

  Dataset                               Tool                      Time                                         Quality
  91 families of 866 sequences          Transitivity clustering   seconds                                      0.9281
  91 families of 866 sequences          Multilevel clustering     seconds                                      0.9378
  1259 species of 4412 ITS sequences    Transitivity clustering   46m75s                                       0.855
  1259 species of 4412 ITS sequences    Multilevel clustering     3m18s                                        0.861
  475159 ITS sequences                  Transitivity clustering   156 days (assuming no memory constraints)
  475159 ITS sequences                  Multilevel clustering     38 hours
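To give a sense of why an all-against-all similarity matrix becomes impractical at these dataset sizes, the following sketch counts the unordered sequence pairs for the three datasets in the table and derives the speed-up implied by the two reported times for the largest one. It assumes a symmetric similarity, so each pair is compared once; the function name `pair_count` is illustrative.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Dataset sizes taken from the table.
for n in (866, 4412, 475159):
    print(n, pair_count(n))

# Reported times for the 475159-sequence dataset, converted to hours.
transitivity_hours = 156 * 24
multilevel_hours = 38
speedup = transitivity_hours / multilevel_hours
print(round(speedup, 1))
```

For the 475159 ITS sequences this is on the order of 10^11 pairwise comparisons, and the reported times correspond to a speed-up of roughly two orders of magnitude.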
MULTILEVEL CLUSTERING
Future developments Database scaling needed
Future developments Database scaling needed: NoSQL
Bioinformaticians
IT & Software developers
THE TEAM
V. Robert, D. Vu, (G. Stegehuis), N. van de Wiele
A. Ben Hadj Amor, S. Szoke, B. Jabas, E. Blom, O. Chouchen, M. Jaidane, S. Ben Daoud
Utrecht, The Netherlands
Sousse, Tunisia; Hannut, Belgium
F. Borges dos Santos, Lea Vaas
MycoBank curation, Utrecht, The Netherlands
Joost Stalpers, Arthur de Cock
Conrad L. Schoch, Keith A. Seifert, Sabine Huhndorf, Vincent Robert, John L. Spouge, Elena Bolchacova, Kerstin Voigt, Wen Chen, Andrew N. Miller, Michael J. Wingfield, M. Catherine Aime, Kwang-Deuk An, Feng-Yan Bai, Robert W. Barreto, Dominik Begerow, Marie-Josée Bergeron, Meredith Blackwell, Teun Boekhout, Mesfin Bogale, Nattawut Boonyuen, Ana R. Burgaz, Bart Buyck, Lei Cai, G. Cardinali, Priscilla Chaverri, Brian J. Coppins, Ana Crespo, Pedro W. Crous, Paloma Cuibas, Ulrike Damm, Z. Wilhelm De Beer, G. Sybren De Hoog, Ruth Del-Prado, Bryn Dentinger, Javier Diéguez-Uribeondo, Pradeep K. Divakar, Brian Douglas, Margarita Dueñas, Duong Vu, Tuan A. Duong, Ursula Eberhardt, Mostafa S. Elshahed, Katerina Fliegerova, Miguel A. García, Zai-Wei Ge, Gareth W. Griffith, K. Griffiths, Johannes Z. Groenewald, Marizeth Groenewald, Martin Grube, Marieka Gryzenhout, Liang-Dong Guo, Ferry Hagen, Sarah Hambleton, Richard C. Hamelin, Karen Hansen, Paul Harrold, G. Heller, Kazuyuki Hirayama, Hsiao-Man Ho, Kerstin Hoffman, Valérie Hofstetter, Filip Högnabba, Peter M. Hollingsworth, Seung-Beom Hong, Kentaro Hosaka, Jos Houbraken, Karen Hughes, Seppo Huhtinen, Kevin D. Hyde, Timothy James, Peter R. Johnston, E.B. Gareth Jones, Laura J. Kelly, Paul M. Kirk, Dániel G. Knapp, Urmas Kõljalg, Gábor M. Kovács, Cletus P. Kurtzman, Sara Landvik, Steven D. Leavitt, André Levesque, Audra S. Liggenstoffer, Kare Liimatainen, Lorenzo Lombard, J. Jennifer Luangsa-ard, H. Thorsten Lumbsch, Harinad Maganti, Sajeewa S. N. Maharachchikumbura, María P. Martin, Tom W. May, Alistair R. McTaggert, Andrew S. Methven, Wieland Meyer, Jean-Marc Moncalvo, S. Mongkolsamrit, László G. Nagy, R. Henrik Nilsson, Tuula Niskanen, Ildikó Nyilasi, Gen Okada, Izumi Okane, Ibai Olariaga, J. Otte, Tamás Papp, Duckhul Park, Tamás Petkovits, Raquel Pino-Bodas, C. Qing, Willem Quaedvlieg, Huzefa A. Raja, Dirk Redecker, Tara Rintoul, Constantino Ruibal, J.M. Sarmiento-Ramírez, Imke Schmitt, Arthur Schüßler, Carol Shearer, Kozue Sotome, Frank O.P. Stefani, Soili Stenroos, Herbert Stockinger, Satinee Suetrong, Sung-Oui Suh, Gi-Ho Sung, Motofumi Suzuki, Kazuaki Tanaka, Leho Tedersoo, M. Teresa Telleria, Eric Tretter, Wendy A. Untereiner, Hector Urbina, Csaba Vágvölgyi, Agathe Vialle, Grit Walther, Qi-Ming Wang, Bevan S. Weir, Michael Weiß, Merlin M. White, Jianping Xu, Rebecca Yahr, Z.-L. Yang, Andrey Yurkov, Juan-Carlos Zamora, Ning Zhang, Wen-Ying Zhuang and David Schindel, Szaniszlo Szoke, Bernard Jabas, Ammar Amor, Oussema Chouchen, Samy Ben Daoud, Ahmed Dridi, Samrock, Matteo Garbelotto, David Yarrow, Joost Stalpers, Gerrit Stegehuis, Nathalie van de Wiele, Donald Hobern, Matt Branford, Frank Bisby, Peter Bonants, Harm Huttinga, Sloan Foundation, Embarc project, EU grants QBOL, FES project, NCB Project, i4Life project, CBS, My wife and kids
Thanks to all mycologists, developers, and the EU, NCB, FES, Embarc, Sloan Foundation, QBOL, q-Bank & iBOL projects