© 2014 The MITRE Corporation. ALL RIGHTS RESERVED. Text Mining for Metagenomics: A New BioCreative Task Lynette Hirschman, MITRE BioCreative Workshop BioCuration 2014 Toronto, April 6-9, 2014 Approved for Public Release Case No. 14-1214


© 2014 The MITRE Corporation. ALL RIGHTS RESERVED.

Text Mining for Metagenomics: A New BioCreative Task

Lynette Hirschman, MITRE

BioCreative Workshop

BioCuration 2014, Toronto, April 6-9, 2014

Approved for Public Release. Case No. 14-1214

Metagenomics and Metadata

Metagenomics approach
- Handles environmental (heterogeneous) samples
- Enables exploration of
  - Microbial communities that can't be cultured
  - Biodiversity in multiple environments, e.g., soil, ocean, toxic waste sites…
  - Human microbiome studies

Metagenomics data sets must preserve metadata
- Context is everything, including naming the environment, e.g., toxic sludge, whale fall, "a blade of grass from Raritan River, NJ, USA"

Text Mining and Metadata Capture

The Genomic Standards Consortium¹ (GSC) has been a leader in standards and Minimum Information checklists for metagenomics

To capture computable metadata, we need
- Minimal data standards (e.g., GSC's Minimum Information about any Sequence, or MIxS)²
- Controlled, structured vocabulary or ontologies: EnvO (Environmental Ontology)³
- Tools to extract metadata from free text and map it into a structured vocabulary
  - Prospective, for new (meta)genomics data
  - Retrospective, from published literature

¹ http://gensc.org
² www.nature.com/nbt/journal/v29/n5/full/nbt.1823.html
³ Buttigieg et al., Journal of Biomedical Semantics 2013, 4:43, http://www.jbiomedsem.com/content/4/1/43

Why BioCreative?

BioCreative does Critical Assessment of Information Extraction for Biology
- Metagenomics is an important new area of research

Focus of BioCreative is to
- Drive research towards real applications
- Supply real(istic) challenge tasks, including
  - Applications from biological database curators
  - Community & standards-based metrics
  - Reusable data and resources
  - Interoperability (e.g., BioC)

BioCreative V (2015) will include a task on Text Mining for Metagenomics

What Metadata to Capture?

Candidate metadata types
- Species
- Sample location
  - Environment
  - Geospatial location
- Phenotype
  - Morphological characteristics
  - Antibiotic resistance

The BioCreative Metagenomics Advisory Group identified capture of sample environment (isolation source) as a critical text mining task
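One way to make the candidate metadata types above computable is to hold them in a typed record. The sketch below is illustrative only: the class name `SampleMetadata`, its field names, and the `missing_fields` helper are assumptions introduced here, not part of MIxS or of any BioCreative specification. The example values reuse the megX sample description that appears in the training data slide of this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Hypothetical record for the candidate metadata types on the slide."""
    species: Optional[str] = None
    environment: Optional[str] = None      # isolation source, as free text
    latitude: Optional[float] = None       # geospatial location, decimal degrees
    longitude: Optional[float] = None
    morphology: List[str] = field(default_factory=list)
    antibiotic_resistance: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of scalar fields that have not been captured yet."""
        scalars = ("species", "environment", "latitude", "longitude")
        return [name for name in scalars if getattr(self, name) is None]


record = SampleMetadata(
    species="Crocosphaera watsonii WH0002",
    environment="subtropical Pacific Ocean waters",
)
print(record.missing_fields())
```

A record like this makes gaps explicit: here the environment is captured as free text, while the geospatial fields are still empty and would need to be filled from the submitter or from the literature.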

Constructing a BioCreative Task

Define the task
- Capture of environmental metadata
- Interactive (prospective) capture at data entry time

Define a target output or vocabulary
- EnvO (Environmental Ontology)¹, EnvO-Lite²

Identify sources of training data
- megX³ data annotated with EnvO-Lite terms
- ENVIRONMENTS⁴ project mapping Encyclopedia of Life to EnvO

Identify test data and interested end users
- GOLD, MG-RAST, BioProject, [your data here]

Recruit text mining teams to participate

¹ Buttigieg PL et al., J Biomed Semantics 2013 Dec 11;4(1):43, doi: 10.1186/2041-1480-4-43
² Hirschman L et al. (2008) Habitat-Lite: a GSC case study based on free text terms for environmental metadata. OMICS 12: 129–136
³ http://www.megx.net/habitats/habitats.html
⁴ http://environments.hcmr.gr

Defining a Terminology: EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participants' need
- Light-weight structured terminology to capture high-level environment metadata

EnvO-Lite: a "slim" from EnvO¹ (Environmental Ontology; Ashburner, Morrison et al.)

Extended by Glöckner's group at MPI Bremen (Buttigieg)

In use at
- megX, GOLD, MG-RAST, BioProject, Genomic Standards Consortium

¹ Buttigieg PL et al., J Biomed Semantics 2013 Dec 11;4(1):43, doi: 10.1186/2041-1480-4-43

Training Data: megX Experiment Data Tagged with EnvO-Lite Classes

http://www.megx.net/habitats/habitats.html

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat
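
A very simple baseline for this kind of tagging is keyword lookup: scan the sample description for words associated with each class. The sketch below is a minimal illustration under stated assumptions, not the megX annotation method. Only the "Marine Habitat" label comes from the slide; the other class names, the keyword lists, and the function name `tag_environment` are hypothetical.

```python
import re

# Illustrative keyword lists only. Apart from "Marine Habitat", which appears
# on the slide, the class names and keywords are assumptions and are not the
# real EnvO-Lite vocabulary.
KEYWORDS = {
    "Marine Habitat": ["ocean", "sea", "marine", "seawater"],
    "Soil (hypothetical class)": ["soil", "rhizosphere"],
    "Freshwater (hypothetical class)": ["river", "lake", "pond"],
}


def tag_environment(description):
    """Return every class whose keywords occur as whole words in the text."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return [label for label, words in KEYWORDS.items()
            if tokens.intersection(words)]


text = ("Crocosphaera watsonii WH0002 was isolated from the subtropical "
        "Pacific Ocean waters taken at a depth of 50 meters")
print(tag_environment(text))  # ['Marine Habitat']
```

Whole-word matching avoids false hits such as "sea" inside "research", but a lookup like this cannot handle descriptions that name no listed keyword, which is part of why a shared task with annotated training data is useful.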

Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)

http://environments.hcmr.gr

Environmental data from Encyclopedia of Life tagged with EnvO terms

Genomes On Line Database (GOLD): An Example Use Case

Environmental Metadata in GOLD

Isolation site: "a blade of grass from Raritan River, NJ, USA"

Mapping into EnvO-Lite

From: Genomes On Line Database (GOLD), D346–D354, Nucleic Acids Research, 2010, Vol. 38, Database issue. Published online 13 November 2009

The Plan

BioCreative IV (Bethesda, October 2013)
- Discussion of text mining needs of the metagenomics community

Metagenomics Advisory Group (ongoing teleconfs)
- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto, April 2014)
- Update on Metagenomics task for BioCreative

BioCreative V (Spain, 2015)
- Task for text mining for metagenomics

BioCreative Metagenomics Advisory Group

- Jim Cole, Michigan State
- George Garrity, Names for Life and Michigan State
- Folker Meyer, Argonne National Lab
- Nikos Kyrpides, Joint Genome Institute
- Evangelos Pafilis, Hellenic Centre for Marine Research
- Lynn Schriml, U Maryland Medical School
- Tatiana Tatusova, NCBI

Acknowledgements

- National Science Foundation for support of BioCreative¹
- National Science Foundation for support of MITRE's earlier activities on Mining Metadata for Metagenomics²
- Department of Energy for conference grant for the metagenomics and text mining work³

¹ NSF Grant DBI-0850319
² NSF Grants IIS-0746650 and IIS-0844419
³ Office of Science (BER) of the US Dept. of Energy. This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838.

Disclaimer: This presentation was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.


Back Up

MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)

© 2007 The MITRE Corporation. ALL RIGHTS RESERVED.

Information in Full Text

Full text article, Methods section¹
Further details for all methods used in this study are provided in Supplementary Information. O. algarvensis specimens were collected off Capo di Sant' Andrea, Elba, Italy.

Supplementary Material (pdf)²
Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Sant' Andrea, Elba, Italy (42°48'26"N, 010°08'28"E).

¹ Symbiosis insights through metagenomic analysis of a microbial consortium. Woyke T et al., Nature 443, 950-955 (26 October 2006), doi:10.1038/nature05192
² http://www.nature.com/nature/journal/v443/n7114/extref/nature05192-s1.pdf
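
Geospatial location is one of the easier items to normalize once it has been found in the text. The following sketch converts a degrees-minutes-seconds coordinate to decimal degrees. It assumes the coordinates in the supplementary material are written as 42°48'26"N and 010°08'28"E; the pattern `DMS` and the function name `dms_to_decimal` are introduced here for illustration only.

```python
import re

# Assumed notation: degrees, minutes ('), seconds ("), then N, S, E, or W.
DMS = re.compile(r"""(\d+)°(\d+)'(\d+)"([NSEW])""")


def dms_to_decimal(text):
    """Convert one degrees-minutes-seconds coordinate to signed decimal degrees."""
    match = DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


lat = dms_to_decimal("42°48'26\"N")
lon = dms_to_decimal("010°08'28\"E")
print(round(lat, 4), round(lon, 4))  # 42.8072 10.1411
```

Real articles write coordinates in many other ways (decimal degrees, missing seconds, different symbols), so a production extractor would need a wider set of patterns than this single one.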

Metadata in Reference

Faiz O, Colak A, Saglam N, Canakçi S, Beldüz AO.
J Biochem Mol Biol. 2007 Jul 31;40(4):588-94.

Information scattered throughout article, probably in reference

More specific information given in the conclusion

Critical Assessment of Information Extraction in Biology

BioCreative: 2004-05, 27 teams
- Organizers: MITRE, CNB (now CNIO¹), NCBI
- Curators: GOA (Camon, Lee, Apweiler)

BioCreative II: 2006-07, 44 teams
- Organizers: CNIO, MITRE, NCBI
- Curators: MINT, IntAct

BioCreative II.5: 2008-2009, 15 teams
- Organizers: CNIO, MINT, Elsevier, MITRE

BioCreative III: 2009-2010, 23 teams
- Organizers: U Delaware, NCBI, CTD², CNIO, MITRE, Colorado

BioCreative IV: 2013, 24 teams
- Organizers: U Delaware, NCBI, CNIO, MITRE, CTD, Colorado

BioCreative V: 2015, planning underway

BioCreative III/IV is funded by NSF grant DBI-0850319

¹ Spanish National Cancer Center
² Comparative Toxicogenomics Database

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al., Nat Biotechnol. 2011 May;29(5):415-20, doi: 10.1038/nbt.1823

megX EnvO-Lite Annotation

http://www.megx.net/habitats/habitats.html

  • Text Mining for Metagenomics: A New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative?
  • What Metadata to Capture?
  • Constructing a BioCreative Task
  • Defining a Terminology: EnvO and EnvO-Lite (aka Habitat-Lite)
  • Training Data: megX Experiment Data Tagged with EnvO-Lite Classes
  • Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)
  • Genomes On Line Database (GOLD): An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Metagenomics and Metadata

Metagenomics approach - Handles environmental (heterogeneous) samples - Enables exploration of Microbial communities that canrsquot be cultured Biodiversity in multiple environments eg soil

ocean toxic waste siteshellip Human microbiome studies

Metagenomics data sets must preserve metadata - Context is everything - including naming the

environment eg toxic sludge whalefall ldquoa blade of grass from Raritan River NJ USArdquo

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Genomic Standards Consortium1 (GSC) has been a leader in standards and Minimum Information checklists for Metagenomics

To capture computable metadata we need - Minimal data standards (eg GSCrsquos Minimum

Information about any sequence or MIxS)2

- Controlled structured vocabulary or ontologies EnvO (Environmental Ontology)3

- Tools to extract metadata from free text and map into structured vocabulary Prospective for new meta(genomics) data Retrospective from published literature

Text Mining and Metadata Capture

1httpgenscorg wwwnaturecomnbtjournalv29n5fullnbt1823html 3Buttigieg et al Journal of Biomedical Semantics 2013 443 httpwwwjbiomedsemcomcontent4143

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Why BioCreative

BioCreative does Critical Assessment of Information Extraction for Biology

- Metagenomics is an important new area of research Focus of BioCreative is to

- Drive research towards real applications - Supply real(istic) challenge tasks including Applications from biological database curators Community amp standards-based metrics Reusable data and resources Interoperability (eg BioC)

BioCreative V (2015) will include a task on Text Mining for Metagenomics

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

What Metadata to Capture Candidate metadata types

- Species - Sample location Environment Geospatial location

- Phenotype Morphological characteristics Antibiotic resistance

BioCreative Metagenomics Advisory Group identified capture of sample environment

(isolation source) as critical text mining task

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Constructing a BioCreative Task Define the task

- Capture of environmental metadata - Interactive (prospective) capture at data entry time

Define a target output or vocabulary - EnvO (Environmental Ontology)1 EnvO-Lite2

Identify sources of training data - megX3 data annotated with EnvO-Lite terms - ENVIRONMENTS4 project mapping Encyclopedia

of Life to EnvO Identify test data and interested end users

- GOLD MG-RAST BioProject [your data here] Recruit text mining teams to participate

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43 2HirschmanL et al (2008) Habitat-Lite a GSC case study based on free text terms for environmental metadata OMICS 12 129ndash136 3httpwwwmegxnethabitatshabitatshtml 4httpenvironmentshcmrgr

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

MegX (Marine Ecological GenomiX) (Pre-MIGSMIMSMIENS)

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Information in Full Text Full text article Methods section1 Further details for all methods used in this study are provided in Supplementary Information O algarvensis specimens were collected off Capo di Sant Andrea Elba Italy Supplementary Material (pdf) Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Santrsquo Andrea Elba Italy (42deg4826N 010deg0828E)

1Symbiosis insights through metagenomic analysis of a microbial consortium Woyke T et al Nature 443 950-955 (26 October 2006) doi101038nature05192 2httpwwwnaturecomnaturejournalv443n7114extrefnature05192-s1pdf

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Metadata in Reference

Faiz O Colak A Saglam N Canakccedili S Belduumlz AO

J Biochem Mol Biol 2007 Jul 3140(4)588-94

Information scattered throughout article probably in reference

More specific information given in the conclusion

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Critical Assessment of Information Extraction in Biology BioCreative 2004-05 27 teams - Organizers MITRE CNB (now CNIO1) NCBI - Curators GOA (Camon Lee Apweiler) BioCreative II 2006-07 44 teams - Organizers CNIO MITRE NCBI - Curators MINT IntAct BioCreative II5 2008-2009 15 teams - Organizers CNIO MINT Elsevier MITRE BioCreative III 2009-2010 23 teams - Organizers U Delaware NCBI CTD2 CNIO MITRE Colorado BioCreative IV 2013 24 teams - Organizers U Delaware NCBI CNIO MITRE CTD Colorado BioCreative V 2015 planning underway BioCreative IIIIV is funded by NSF grant DBI-0850319

1Spanish National Cancer Center 2Comparative Toxicogenomics Database

Presenter
Presentation Notes
CTD = Comparative Toxicogenomics Database

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al Nat Biotechnol 2011 May29(5)415-20 doi 101038nbt1823

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

megX EnvO-Lite Annotation

httpwwwmegxnethabitatshabitatshtml

  • Text Mining for MetagenomicsA New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology EnvO andEnvO-Lite (aka Habitat-Lite)
  • Training Data megx ExperimentData Tagged with EnvO-Lite Classes
  • Training Data ENVIRONMENTS Project(Pafilis Hellenic Centre for Marine Research
  • Genomes On Line Database (GOLD)An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX)(Pre-MIGSMIMSMIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Genomic Standards Consortium1 (GSC) has been a leader in standards and Minimum Information checklists for Metagenomics

To capture computable metadata we need - Minimal data standards (eg GSCrsquos Minimum

Information about any sequence or MIxS)2

- Controlled structured vocabulary or ontologies EnvO (Environmental Ontology)3

- Tools to extract metadata from free text and map into structured vocabulary Prospective for new meta(genomics) data Retrospective from published literature

Text Mining and Metadata Capture

1httpgenscorg wwwnaturecomnbtjournalv29n5fullnbt1823html 3Buttigieg et al Journal of Biomedical Semantics 2013 443 httpwwwjbiomedsemcomcontent4143

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Why BioCreative

BioCreative does Critical Assessment of Information Extraction for Biology

- Metagenomics is an important new area of research Focus of BioCreative is to

- Drive research towards real applications - Supply real(istic) challenge tasks including Applications from biological database curators Community amp standards-based metrics Reusable data and resources Interoperability (eg BioC)

BioCreative V (2015) will include a task on Text Mining for Metagenomics

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

What Metadata to Capture Candidate metadata types

- Species - Sample location Environment Geospatial location

- Phenotype Morphological characteristics Antibiotic resistance

BioCreative Metagenomics Advisory Group identified capture of sample environment

(isolation source) as critical text mining task

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Constructing a BioCreative Task Define the task

- Capture of environmental metadata - Interactive (prospective) capture at data entry time

Define a target output or vocabulary - EnvO (Environmental Ontology)1 EnvO-Lite2

Identify sources of training data - megX3 data annotated with EnvO-Lite terms - ENVIRONMENTS4 project mapping Encyclopedia

of Life to EnvO Identify test data and interested end users

- GOLD MG-RAST BioProject [your data here] Recruit text mining teams to participate

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43 2HirschmanL et al (2008) Habitat-Lite a GSC case study based on free text terms for environmental metadata OMICS 12 129ndash136 3httpwwwmegxnethabitatshabitatshtml 4httpenvironmentshcmrgr

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

MegX (Marine Ecological GenomiX) (Pre-MIGSMIMSMIENS)

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Information in Full Text Full text article Methods section1 Further details for all methods used in this study are provided in Supplementary Information O algarvensis specimens were collected off Capo di Sant Andrea Elba Italy Supplementary Material (pdf) Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Santrsquo Andrea Elba Italy (42deg4826N 010deg0828E)

1Symbiosis insights through metagenomic analysis of a microbial consortium Woyke T et al Nature 443 950-955 (26 October 2006) doi101038nature05192 2httpwwwnaturecomnaturejournalv443n7114extrefnature05192-s1pdf

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Metadata in Reference

Faiz O Colak A Saglam N Canakccedili S Belduumlz AO

J Biochem Mol Biol 2007 Jul 3140(4)588-94

Information scattered throughout article probably in reference

More specific information given in the conclusion

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Critical Assessment of Information Extraction in Biology BioCreative 2004-05 27 teams - Organizers MITRE CNB (now CNIO1) NCBI - Curators GOA (Camon Lee Apweiler) BioCreative II 2006-07 44 teams - Organizers CNIO MITRE NCBI - Curators MINT IntAct BioCreative II5 2008-2009 15 teams - Organizers CNIO MINT Elsevier MITRE BioCreative III 2009-2010 23 teams - Organizers U Delaware NCBI CTD2 CNIO MITRE Colorado BioCreative IV 2013 24 teams - Organizers U Delaware NCBI CNIO MITRE CTD Colorado BioCreative V 2015 planning underway BioCreative IIIIV is funded by NSF grant DBI-0850319

1Spanish National Cancer Center 2Comparative Toxicogenomics Database

Presenter
Presentation Notes
CTD = Comparative Toxicogenomics Database

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al Nat Biotechnol 2011 May29(5)415-20 doi 101038nbt1823

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

megX EnvO-Lite Annotation

httpwwwmegxnethabitatshabitatshtml

  • Text Mining for MetagenomicsA New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology EnvO andEnvO-Lite (aka Habitat-Lite)
  • Training Data megx ExperimentData Tagged with EnvO-Lite Classes
  • Training Data ENVIRONMENTS Project(Pafilis Hellenic Centre for Marine Research
  • Genomes On Line Database (GOLD)An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX)(Pre-MIGSMIMSMIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Why BioCreative

BioCreative does Critical Assessment of Information Extraction for Biology

- Metagenomics is an important new area of research Focus of BioCreative is to

- Drive research towards real applications - Supply real(istic) challenge tasks including Applications from biological database curators Community amp standards-based metrics Reusable data and resources Interoperability (eg BioC)

BioCreative V (2015) will include a task on Text Mining for Metagenomics

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

What Metadata to Capture Candidate metadata types

- Species - Sample location Environment Geospatial location

- Phenotype Morphological characteristics Antibiotic resistance

BioCreative Metagenomics Advisory Group identified capture of sample environment

(isolation source) as critical text mining task

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Constructing a BioCreative Task Define the task

- Capture of environmental metadata - Interactive (prospective) capture at data entry time

Define a target output or vocabulary - EnvO (Environmental Ontology)1 EnvO-Lite2

Identify sources of training data - megX3 data annotated with EnvO-Lite terms - ENVIRONMENTS4 project mapping Encyclopedia

of Life to EnvO Identify test data and interested end users

- GOLD MG-RAST BioProject [your data here] Recruit text mining teams to participate

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43 2HirschmanL et al (2008) Habitat-Lite a GSC case study based on free text terms for environmental metadata OMICS 12 129ndash136 3httpwwwmegxnethabitatshabitatshtml 4httpenvironmentshcmrgr

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Example text: "Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters"

EnvO-Lite class: Marine Habitat
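
A minimal sketch of how a free-text isolation description like the one above could be mapped to a habitat class. This is not the megX or EnvO-Lite annotation method: the keyword lexicon, the class labels other than "marine habitat", and the function name `tag_habitats` are all hypothetical, chosen only to illustrate simple keyword matching.

```python
import re

# Hypothetical lexicon, invented for illustration only.
# It is NOT the real EnvO-Lite term list.
LEXICON = {
    "marine habitat": ["ocean", "sea", "marine", "seawater"],
    "freshwater habitat": ["river", "lake", "stream", "pond"],
    "soil": ["soil", "rhizosphere"],
}


def tag_habitats(description):
    """Return every class label whose keywords occur as whole words in the text."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return [label for label, keywords in LEXICON.items() if tokens & set(keywords)]


example = (
    "Crocosphaera watsonii WH0002 was isolated from the subtropical "
    "Pacific Ocean waters taken at a depth of 50 meters"
)
print(tag_habitats(example))
```

Whole-word matching keeps the sketch simple, but it misses plurals and paraphrases, which is one reason a shared task with annotated training data is useful.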

Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)

http://environments.hcmr.gr

Environmental data from Encyclopedia of Life tagged with EnvO terms

Genomes On Line Database (GOLD): An Example Use Case

Environmental Metadata in GOLD

Isolation Site: "a blade of grass from Raritan River, NJ, USA"
Mapping into EnvO-Lite

From Genomes On Line Database (GOLD), Nucleic Acids Research 2010, Vol. 38, Database issue, D346–D354. Published online 13 November 2009

The Plan

BioCreative IV (Bethesda, October 2013)
- Discussion of text mining needs of metagenomics community

Metagenomics Advisory Group (ongoing teleconfs)
- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto, April 2014)
- Update on Metagenomics task for BioCreative

BioCreative V (Spain, 2015)
- Task for text mining for metagenomics

BioCreative Metagenomics Advisory Group

- Jim Cole, Michigan State
- George Garrity, Names for Life and Michigan State
- Folker Meyer, Argonne National Lab
- Nikos Kyrpides, Joint Genome Institute
- Evangelos Pafilis, Hellenic Centre for Marine Research
- Lynn Schriml, U Maryland Medical School
- Tatiana Tatusova, NCBI

Acknowledgements

- National Science Foundation for support of BioCreative1
- National Science Foundation for support of MITRE's earlier activities on Mining Metadata for Metagenomics2
- Department of Energy for conference grant for the metagenomics and text mining work3

1 NSF Grant DBI-0850319
2 NSF Grants IIS-0746650 and IIS-0844419
3 Office of Science (BER) of the US Dept. of Energy. This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838.

Disclaimer: This presentation was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Back Up

MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)

© 2007 The MITRE Corporation. ALL RIGHTS RESERVED.

Information in Full Text

Full text article, Methods section1:
"Further details for all methods used in this study are provided in Supplementary Information. O. algarvensis specimens were collected off Capo di Sant Andrea, Elba, Italy."

Supplementary Material (pdf)2:
"Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Sant' Andrea, Elba, Italy (42°48'26"N, 010°08'28"E)."

1 Symbiosis insights through metagenomic analysis of a microbial consortium. Woyke T et al. Nature 443, 950-955 (26 October 2006). doi:10.1038/nature05192
2 http://www.nature.com/nature/journal/v443/n7114/extref/nature05192-s1.pdf
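
Geospatial location is one of the candidate metadata types, and the supplementary text quoted above reports it in degrees, minutes and seconds. The sketch below shows one way such a string could be converted to decimal degrees. It assumes the coordinates read 42°48'26"N 010°08'28"E, and the function name `dms_to_decimal` and the regular expression are illustrative choices, not part of any tool named in the slides.

```python
import re

# Matches tokens such as 42°48'26"N; this simple pattern is an assumption
# and does not cover decimal seconds or other coordinate notations.
DMS = re.compile(r"(\d{1,3})°(\d{1,2})'(\d{1,2})\"([NSEW])")


def dms_to_decimal(text):
    """Return a decimal-degree value for every degrees/minutes/seconds token in text."""
    values = []
    for deg, minutes, seconds, hemisphere in DMS.findall(text):
        value = int(deg) + int(minutes) / 60 + int(seconds) / 3600
        if hemisphere in "SW":
            # Southern latitudes and western longitudes are negative.
            value = -value
        values.append(round(value, 6))
    return values


lat, lon = dms_to_decimal("42°48'26\"N 010°08'28\"E")
print(lat, lon)
```

Storing the decimal form alongside the original string keeps the metadata computable without losing what the authors actually wrote.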

Metadata in Reference

Faiz O, Colak A, Saglam N, Canakçi S, Beldüz AO.
J Biochem Mol Biol. 2007 Jul 31;40(4):588-94

Information scattered throughout article, probably in reference

More specific information given in the conclusion

Critical Assessment of Information Extraction in Biology

BioCreative, 2004-05: 27 teams
- Organizers: MITRE, CNB (now CNIO1), NCBI
- Curators: GOA (Camon, Lee, Apweiler)

BioCreative II, 2006-07: 44 teams
- Organizers: CNIO, MITRE, NCBI
- Curators: MINT, IntAct

BioCreative II.5, 2008-2009: 15 teams
- Organizers: CNIO, MINT, Elsevier, MITRE

BioCreative III, 2009-2010: 23 teams
- Organizers: U Delaware, NCBI, CTD2, CNIO, MITRE, Colorado

BioCreative IV, 2013: 24 teams
- Organizers: U Delaware, NCBI, CNIO, MITRE, CTD, Colorado

BioCreative V, 2015: planning underway

BioCreative III/IV is funded by NSF grant DBI-0850319

1 Spanish National Cancer Center
2 Comparative Toxicogenomics Database

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al. Nat Biotechnol. 2011 May;29(5):415-20. doi: 10.1038/nbt.1823

megX EnvO-Lite Annotation

http://www.megx.net/habitats/habitats.html

  • Text Mining for Metagenomics: A New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology: EnvO and EnvO-Lite (aka Habitat-Lite)
  • Training Data: megX Experiment Data Tagged with EnvO-Lite Classes
  • Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)
  • Genomes On Line Database (GOLD): An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

What Metadata to Capture Candidate metadata types

- Species - Sample location Environment Geospatial location

- Phenotype Morphological characteristics Antibiotic resistance

BioCreative Metagenomics Advisory Group identified capture of sample environment

(isolation source) as critical text mining task

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Constructing a BioCreative Task Define the task

- Capture of environmental metadata - Interactive (prospective) capture at data entry time

Define a target output or vocabulary - EnvO (Environmental Ontology)1 EnvO-Lite2

Identify sources of training data - megX3 data annotated with EnvO-Lite terms - ENVIRONMENTS4 project mapping Encyclopedia

of Life to EnvO Identify test data and interested end users

- GOLD MG-RAST BioProject [your data here] Recruit text mining teams to participate

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43 2HirschmanL et al (2008) Habitat-Lite a GSC case study based on free text terms for environmental metadata OMICS 12 129ndash136 3httpwwwmegxnethabitatshabitatshtml 4httpenvironmentshcmrgr

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

MegX (Marine Ecological GenomiX) (Pre-MIGSMIMSMIENS)

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Information in Full Text Full text article Methods section1 Further details for all methods used in this study are provided in Supplementary Information O algarvensis specimens were collected off Capo di Sant Andrea Elba Italy Supplementary Material (pdf) Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Santrsquo Andrea Elba Italy (42deg4826N 010deg0828E)

1Symbiosis insights through metagenomic analysis of a microbial consortium Woyke T et al Nature 443 950-955 (26 October 2006) doi101038nature05192 2httpwwwnaturecomnaturejournalv443n7114extrefnature05192-s1pdf

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Metadata in Reference

Faiz O Colak A Saglam N Canakccedili S Belduumlz AO

J Biochem Mol Biol 2007 Jul 3140(4)588-94

Information scattered throughout article probably in reference

More specific information given in the conclusion

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Critical Assessment of Information Extraction in Biology BioCreative 2004-05 27 teams - Organizers MITRE CNB (now CNIO1) NCBI - Curators GOA (Camon Lee Apweiler) BioCreative II 2006-07 44 teams - Organizers CNIO MITRE NCBI - Curators MINT IntAct BioCreative II5 2008-2009 15 teams - Organizers CNIO MINT Elsevier MITRE BioCreative III 2009-2010 23 teams - Organizers U Delaware NCBI CTD2 CNIO MITRE Colorado BioCreative IV 2013 24 teams - Organizers U Delaware NCBI CNIO MITRE CTD Colorado BioCreative V 2015 planning underway BioCreative IIIIV is funded by NSF grant DBI-0850319

1Spanish National Cancer Center 2Comparative Toxicogenomics Database

Presenter
Presentation Notes
CTD = Comparative Toxicogenomics Database

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al Nat Biotechnol 2011 May29(5)415-20 doi 101038nbt1823

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

megX EnvO-Lite Annotation

httpwwwmegxnethabitatshabitatshtml

  • Text Mining for MetagenomicsA New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology EnvO andEnvO-Lite (aka Habitat-Lite)
  • Training Data megx ExperimentData Tagged with EnvO-Lite Classes
  • Training Data ENVIRONMENTS Project(Pafilis Hellenic Centre for Marine Research
  • Genomes On Line Database (GOLD)An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX)(Pre-MIGSMIMSMIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Constructing a BioCreative Task Define the task

- Capture of environmental metadata - Interactive (prospective) capture at data entry time

Define a target output or vocabulary - EnvO (Environmental Ontology)1 EnvO-Lite2

Identify sources of training data - megX3 data annotated with EnvO-Lite terms - ENVIRONMENTS4 project mapping Encyclopedia

of Life to EnvO Identify test data and interested end users

- GOLD MG-RAST BioProject [your data here] Recruit text mining teams to participate

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43 2HirschmanL et al (2008) Habitat-Lite a GSC case study based on free text terms for environmental metadata OMICS 12 129ndash136 3httpwwwmegxnethabitatshabitatshtml 4httpenvironmentshcmrgr

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

MegX (Marine Ecological GenomiX) (Pre-MIGSMIMSMIENS)

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Information in Full Text Full text article Methods section1 Further details for all methods used in this study are provided in Supplementary Information O algarvensis specimens were collected off Capo di Sant Andrea Elba Italy Supplementary Material (pdf) Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Santrsquo Andrea Elba Italy (42deg4826N 010deg0828E)

1Symbiosis insights through metagenomic analysis of a microbial consortium Woyke T et al Nature 443 950-955 (26 October 2006) doi101038nature05192 2httpwwwnaturecomnaturejournalv443n7114extrefnature05192-s1pdf

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Metadata in Reference

Faiz O Colak A Saglam N Canakccedili S Belduumlz AO

J Biochem Mol Biol 2007 Jul 3140(4)588-94

Information scattered throughout article probably in reference

More specific information given in the conclusion

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Critical Assessment of Information Extraction in Biology BioCreative 2004-05 27 teams - Organizers MITRE CNB (now CNIO1) NCBI - Curators GOA (Camon Lee Apweiler) BioCreative II 2006-07 44 teams - Organizers CNIO MITRE NCBI - Curators MINT IntAct BioCreative II5 2008-2009 15 teams - Organizers CNIO MINT Elsevier MITRE BioCreative III 2009-2010 23 teams - Organizers U Delaware NCBI CTD2 CNIO MITRE Colorado BioCreative IV 2013 24 teams - Organizers U Delaware NCBI CNIO MITRE CTD Colorado BioCreative V 2015 planning underway BioCreative IIIIV is funded by NSF grant DBI-0850319

1Spanish National Cancer Center 2Comparative Toxicogenomics Database

Presenter
Presentation Notes
CTD = Comparative Toxicogenomics Database

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al Nat Biotechnol 2011 May29(5)415-20 doi 101038nbt1823

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

megX EnvO-Lite Annotation

httpwwwmegxnethabitatshabitatshtml

  • Text Mining for MetagenomicsA New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology EnvO andEnvO-Lite (aka Habitat-Lite)
  • Training Data megx ExperimentData Tagged with EnvO-Lite Classes
  • Training Data ENVIRONMENTS Project(Pafilis Hellenic Centre for Marine Research
  • Genomes On Line Database (GOLD)An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX)(Pre-MIGSMIMSMIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Defining a Terminology EnvO and EnvO-Lite (aka Habitat-Lite)

GSC participantsrsquo need - Light-weight structured

terminology to capture high level environment metadata

EnvO-Lite a ldquoslimrdquo from EnvO1 (Environmental Ontology Ashburner Morrison et al)

Extended by Gloumlcknerrsquos group at MPI Bremen (Buttigieg)

In use at - megX GOLD MG-RAST

BioProject Genomic Standards Consortium

1Buttigieg PL et al J Biomed Semantics 2013 Dec 114(1)43 doi 1011862041-1480-4-43

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

MegX (Marine Ecological GenomiX) (Pre-MIGSMIMSMIENS)

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Information in Full Text Full text article Methods section1 Further details for all methods used in this study are provided in Supplementary Information O algarvensis specimens were collected off Capo di Sant Andrea Elba Italy Supplementary Material (pdf) Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Santrsquo Andrea Elba Italy (42deg4826N 010deg0828E)

1Symbiosis insights through metagenomic analysis of a microbial consortium Woyke T et al Nature 443 950-955 (26 October 2006) doi101038nature05192 2httpwwwnaturecomnaturejournalv443n7114extrefnature05192-s1pdf

copy 2007 The MITRE Corporation ALL RIGHTS RESERVED

Metadata in Reference

Faiz O Colak A Saglam N Canakccedili S Belduumlz AO

J Biochem Mol Biol 2007 Jul 3140(4)588-94

Information scattered throughout article probably in reference

More specific information given in the conclusion

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Critical Assessment of Information Extraction in Biology BioCreative 2004-05 27 teams - Organizers MITRE CNB (now CNIO1) NCBI - Curators GOA (Camon Lee Apweiler) BioCreative II 2006-07 44 teams - Organizers CNIO MITRE NCBI - Curators MINT IntAct BioCreative II5 2008-2009 15 teams - Organizers CNIO MINT Elsevier MITRE BioCreative III 2009-2010 23 teams - Organizers U Delaware NCBI CTD2 CNIO MITRE Colorado BioCreative IV 2013 24 teams - Organizers U Delaware NCBI CNIO MITRE CTD Colorado BioCreative V 2015 planning underway BioCreative IIIIV is funded by NSF grant DBI-0850319

1Spanish National Cancer Center 2Comparative Toxicogenomics Database

Presenter
Presentation Notes
CTD = Comparative Toxicogenomics Database

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al Nat Biotechnol 2011 May29(5)415-20 doi 101038nbt1823

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

megX EnvO-Lite Annotation

httpwwwmegxnethabitatshabitatshtml

  • Text Mining for MetagenomicsA New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology EnvO andEnvO-Lite (aka Habitat-Lite)
  • Training Data megx ExperimentData Tagged with EnvO-Lite Classes
  • Training Data ENVIRONMENTS Project(Pafilis Hellenic Centre for Marine Research
  • Genomes On Line Database (GOLD)An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX)(Pre-MIGSMIMSMIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data megx Experiment Data Tagged with EnvO-Lite Classes

httpwwwmegxnethabitatshabitatshtml

Crocosphaera watsonii WH0002 was isolated from the subtropical Pacific Ocean waters taken at a depth of 50 meters

Marine Habitat

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Training Data ENVIRONMENTS Project (Pafilis Hellenic Centre for Marine Research

httpenvironmentshcmrgr

Environmental data from Encyclopedia of Life tagged

with EnvO terms

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Genomes On Line Database (GOLD) An Example Use Case

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Environmental Metadata in GOLD

Isolation Site a Blade of grass from Raritan River NJ USA Mapping into EnvO-Lite

From Genomes On Line Database (GOLD) D346ndashD354 Nucleic Acids Research 2010 Vol 38 Database issue Published online 13 November 2009

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

The Plan

BioCreative IV (Bethesda October 2014) - Discussion of text mining needs of metagenomics

community Metagenomics Advisory Group (ongoing teleconfs)

- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto April 2014) - Update on Metagenomics task for BioCreative

BioCreative V (Spain 2015) - Task for text mining for metagenomics

12

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

BioCreative Metagenomics Advisory Group

Jim Cole Michigan State George Garrity Names for Life and Michigan State Folker Meyer Argonne National Lab Nikos Kyrpides Joint Genome Institute Evangelos Pafilis Hellenic Centre for Marine

Research Lynn Schriml U Maryland Medical School Tatiana Tatusova NCBI

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Back Up


MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)

© 2007 The MITRE Corporation. ALL RIGHTS RESERVED.

Information in Full Text

Full text article, Methods section¹: "Further details for all methods used in this study are provided in Supplementary Information. O. algarvensis specimens were collected off Capo di Sant' Andrea, Elba, Italy."

Supplementary Material (pdf)²: "Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 5.6 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Sant' Andrea, Elba, Italy (42°48'26"N, 010°08'28"E)."

¹ Symbiosis insights through metagenomic analysis of a microbial consortium. Woyke T, et al. Nature 443, 950-955 (26 October 2006). doi:10.1038/nature05192
² http://www.nature.com/nature/journal/v443/n7114/extref/nature05192-s1.pdf
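The gap between the Methods sentence and the Supplementary sentence is the kind of detail a text mining tool would need to recover. As a minimal sketch (not part of the original slides), the following Python pulls degree-minute-second coordinates out of a sentence and converts them to decimal degrees. It assumes the coordinate string in its un-garbled form; the regular expression and the function names `dms_to_decimal` and `extract_coordinates` are illustrative assumptions, not an existing tool.

```python
import re

# Assumed pattern: degrees°minutes'seconds"hemisphere, e.g. 42°48'26"N
DMS = re.compile(r"""(\d{1,3})°\s*(\d{1,2})'\s*(\d{1,2})"\s*([NSEW])""")


def dms_to_decimal(deg, minutes, seconds, hemisphere):
    """Convert one degree-minute-second reading to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60 + int(seconds) / 3600
    # South and West are negative by convention
    return -value if hemisphere in "SW" else value


def extract_coordinates(text):
    """Return every DMS coordinate found in text as decimal degrees."""
    return [round(dms_to_decimal(*m.groups()), 4) for m in DMS.finditer(text)]


# Sentence fragment from the supplementary material quoted on the slide
SENTENCE = (
    "in a bay off Capo di Sant' Andrea, Elba, Italy "
    "(42°48'26\"N, 010°08'28\"E)"
)

print(extract_coordinates(SENTENCE))
```

A pattern this strict misses other coordinate styles (decimal degrees, missing seconds), which is one reason a shared evaluation task with annotated data is useful.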


Metadata in Reference

Faiz O, Colak A, Saglam N, Canakçi S, Beldüz AO. J Biochem Mol Biol. 2007 Jul 31;40(4):588-94.

Information scattered throughout article probably in reference

More specific information given in the conclusion


Critical Assessment of Information Extraction in Biology

BioCreative 2004-05: 27 teams
- Organizers: MITRE, CNB (now CNIO¹), NCBI
- Curators: GOA (Camon, Lee, Apweiler)

BioCreative II 2006-07: 44 teams
- Organizers: CNIO, MITRE, NCBI
- Curators: MINT, IntAct

BioCreative II.5 2008-2009: 15 teams
- Organizers: CNIO, MINT, Elsevier, MITRE

BioCreative III 2009-2010: 23 teams
- Organizers: U Delaware, NCBI, CTD², CNIO, MITRE, Colorado

BioCreative IV 2013: 24 teams
- Organizers: U Delaware, NCBI, CNIO, MITRE, CTD, Colorado

BioCreative V 2015: planning underway

BioCreative III/IV is funded by NSF grant DBI-0850319

¹ Spanish National Cancer Center
² Comparative Toxicogenomics Database


Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al. Nat Biotechnol. 2011 May;29(5):415-20. doi: 10.1038/nbt.1823


megX EnvO-Lite Annotation

http://www.megx.net/habitats/habitats.html

  • Text Mining for Metagenomics: A New BioCreative Task
  • Metagenomics and Metadata
  • Text Mining and Metadata Capture
  • Why BioCreative
  • What Metadata to Capture
  • Constructing a BioCreative Task
  • Defining a Terminology: EnvO and EnvO-Lite (aka Habitat-Lite)
  • Training Data: megx Experiment Data Tagged with EnvO-Lite Classes
  • Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)
  • Genomes On Line Database (GOLD): An Example Use Case
  • Environmental Metadata in GOLD
  • The Plan
  • BioCreative Metagenomics Advisory Group
  • Acknowledgements
  • Back Up
  • MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)
  • Information in Full Text
  • Metadata in Reference
  • Critical Assessment of Information Extraction in Biology
  • Minimum Information Checklists from Genomic Standards Consortium
  • megX EnvO-Lite Annotation


Training Data: ENVIRONMENTS Project (Pafilis, Hellenic Centre for Marine Research)

http://environments.hcmr.gr

Environmental data from Encyclopedia of Life tagged with EnvO terms


Genomes On Line Database (GOLD): An Example Use Case


Environmental Metadata in GOLD

Isolation Site: "a blade of grass from Raritan River, NJ, USA"

Mapping into EnvO-Lite

From: Genomes On Line Database (GOLD). D346–D354, Nucleic Acids Research, 2010, Vol. 38, Database issue. Published online 13 November 2009.
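To make the mapping step concrete, here is a minimal sketch of a keyword lookup that assigns coarse habitat classes to a free-text isolation site. It is not the method used by GOLD or by the task organizers. The `KEYWORDS` table and its class labels are hypothetical placeholders chosen for illustration, not actual EnvO-Lite terms, and `map_to_classes` is an assumed helper name.

```python
import re

# Hypothetical keyword-to-class table; labels are illustrative only,
# not taken from the real EnvO-Lite vocabulary.
KEYWORDS = {
    "river": "freshwater",
    "lake": "freshwater",
    "ocean": "marine",
    "sea": "marine",
    "soil": "soil",
    "sludge": "waste",
    "grass": "plant-associated",
}


def map_to_classes(description):
    """Return the habitat classes whose keywords appear in the description,
    in order of first appearance and without duplicates."""
    tokens = re.findall(r"[a-z]+", description.lower())
    found = []
    for token in tokens:
        label = KEYWORDS.get(token)
        if label and label not in found:
            found.append(label)
    return found


print(map_to_classes("a blade of grass from Raritan River NJ USA"))
```

A lookup like this returns two candidate classes for the example and cannot say which one describes the sample itself, which illustrates why curated training data and a shared evaluation are needed.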


The Plan

BioCreative IV (Bethesda, October 2013)
- Discussion of text mining needs of metagenomics community

Metagenomics Advisory Group (ongoing teleconfs)
- Soliciting datasets and organizers for a metagenomics task for BioCreative V (2015)

Biocuration Conference (Toronto, April 2014)
- Update on Metagenomics task for BioCreative

BioCreative V (Spain, 2015)
- Task for text mining for metagenomics

copy 2014 The MITRE Corporation ALL RIGHTS RESERVED

Acknowledgements National Science Foundation for support of

BioCreative1 National Science Foundation for support of MITRErsquos

earlier activities on Mining Metadata for Metagenomics2

Department of Energy for conference grant for the metagenomics and text mining work3

14

1NSF Grant DBI-0850319 2NSF Grants IIS-0746650 and IIS-0844419 3Office of Science (BER) of the US Dept of Energy This material is based upon work supported by the Department of Energy under Award Number DE-SC0010838 Disclaimer This presentation was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any agency thereof nor any of their employees makes any warranty express of implied or assumes any legal liability or responsibility for the accuracy completeness or usefulness of any information apparatus product or process disclosed or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product process or service by trade name trademark manufacturer or otherwise does not necessarily constitute or imply its endorsement recommendation or favoring by the United States Government or any agency thereof The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof

Back Up

MegX (Marine Ecological GenomiX) (Pre-MIGS/MIMS/MIENS)

© 2007 The MITRE Corporation. ALL RIGHTS RESERVED.

Information in Full Text

Full text article, Methods section¹:
"Further details for all methods used in this study are provided in Supplementary Information. O. algarvensis specimens were collected off Capo di Sant' Andrea, Elba, Italy."

Supplementary Material (pdf):
"Juvenile and adult Olavius algarvensis specimens were collected in May and September 2004 from 56 m water depth in silicate sediments around sea grass beds of Posidonia oceanica in a bay off Capo di Sant' Andrea, Elba, Italy (42°48'26"N, 010°08'28"E)."

¹ Woyke T. et al. Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950-955 (26 October 2006). doi:10.1038/nature05192
² http://www.nature.com/nature/journal/v443/n7114/extref/nature05192-s1.pdf
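As a rough illustration of the kind of metadata this excerpt carries, the sketch below pulls depth, collection dates, and a habitat phrase out of the supplementary-material sentence. It is not the BioCreative task's method: the pattern names, the `extract_metadata` function, and the output keys are hypothetical, the patterns are written only for this one sentence, and the values are taken as they appear in the extracted slide text.

```python
import re

# Sentence from the supplementary-material excerpt, as it appears in the slide text.
TEXT = (
    "Juvenile and adult Olavius algarvensis specimens were collected in May "
    "and September 2004 from 56 m water depth in silicate sediments around "
    "sea grass beds of Posidonia oceanica in a bay off Capo di Sant' Andrea "
    "Elba Italy"
)

# Hypothetical patterns, tuned to this single sentence (an assumption, not a general solution).
DEPTH = re.compile(r"(\d+(?:\.\d+)?)\s*m water depth")
DATES = re.compile(r"collected in (\w+) and (\w+) (\d{4})")
HABITAT = re.compile(r"depth in (.+?) around")


def extract_metadata(text: str) -> dict:
    """Return the fields the patterns find; fields with no match are omitted."""
    found = {}
    if (m := DEPTH.search(text)):
        found["depth_m"] = float(m.group(1))
    if (m := DATES.search(text)):
        found["collection_months"] = [m.group(1), m.group(2)]
        found["collection_year"] = int(m.group(3))
    if (m := HABITAT.search(text)):
        found["habitat_text"] = m.group(1)
    return found


if __name__ == "__main__":
    print(extract_metadata(TEXT))
```

Hand-written patterns like these break on almost any other phrasing. The free-text habitat phrase would also still need to be mapped to a controlled vocabulary term.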

Metadata in Reference

Faiz O, Colak A, Saglam N, Canakçi S, Beldüz AO.

J Biochem Mol Biol. 2007 Jul 31;40(4):588-94.

Information scattered throughout article probably in reference

More specific information given in the conclusion

Critical Assessment of Information Extraction in Biology

BioCreative 2004-05: 27 teams
- Organizers: MITRE, CNB (now CNIO¹), NCBI
- Curators: GOA (Camon, Lee, Apweiler)
BioCreative II 2006-07: 44 teams
- Organizers: CNIO, MITRE, NCBI
- Curators: MINT, IntAct
BioCreative II.5 2008-2009: 15 teams
- Organizers: CNIO, MINT, Elsevier, MITRE
BioCreative III 2009-2010: 23 teams
- Organizers: U Delaware, NCBI, CTD², CNIO, MITRE, Colorado
BioCreative IV 2013: 24 teams
- Organizers: U Delaware, NCBI, CNIO, MITRE, CTD, Colorado
BioCreative V 2015: planning underway

BioCreative III/IV is funded by NSF grant DBI-0850319

¹ Spanish National Cancer Center
² Comparative Toxicogenomics Database

Minimum Information Checklists from Genomic Standards Consortium

Yilmaz et al. Nat Biotechnol. 2011 May;29(5):415-20. doi: 10.1038/nbt.1823

megX EnvO-Lite Annotation

http://www.megx.net/habitats/habitats.html
