From genes, to genomes to networks, with community aided curation. Fiona Brinkman, Simon Fraser University. Biocuration conference, April 2013


Page 1: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

From genes, to genomes to networks, with community aided curation

Fiona Brinkman, Simon Fraser University

Biocuration conference, April 2013

Page 2: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

From genes, to genomes to networks, with community aided curation

with a little help from my friends …

Page 3: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

My Primary Research Interest

Developing more sustainable approaches for infectious disease control … using novel computational tools, integrated data and interdisciplinary approaches

Targeting major players resulting in infectious disease:

o Pathogen virulence → ID anti-infectives (don't kill the pathogen, disarm them)

o Host immune system failure/over-activity → Immune modulators that dampen damaging inflammation and boost "good" immune response

o Changes in environment/social factors → Integrating pathogen genome data with environment, microbiome, and social network data → Better identify source/cause of disease outbreaks

Page 4: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Some of our lab's tools…

o Pathogen virulence
   - PSORTb – Protein localization analysis (ID cell surface/secreted drug targets)
   - IslandViewer – Genomic island analysis, pathogen-associated genes
   - Ortholuge DB – Precomputed assessments of bacterial orthologs
   - Genus-specific DBs like the Pseudomonas Genome Database

o Host immune system failure/over-activity
   - InnateDB – Human/mouse interactome + curated innate immunity-associated interactions

o Changes in environment/social factors
   - Metagenomics projects
   - Integrated Rapid Infectious Disease Analysis Pipeline (IRIDA)


Page 6: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Research Philosophy

High-quality analyses are only as good as the robust data, effective data organization and accurate analysis methods used.

Want high accuracy – usually erring on the side of high precision at the expense of recall.

To attain high accuracy, biocuration is often KEY

Diagram labels: Robust data; Data organization; Accurate analysis methods; The Nexus
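
The preference for high precision at the expense of recall can be made concrete with a small calculation. The following is only an illustrative sketch: the function name `precision_recall`, the identifiers and both hypothetical annotation sets are assumptions made for the example, not data from the talk.

```python
def precision_recall(predicted: set, truth: set) -> tuple:
    """Return (precision, recall) of a predicted set against a reference set."""
    true_positives = len(predicted & truth)
    # Precision: fraction of what we asserted that is actually correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the correct items that we managed to assert.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical reference annotations (placeholder identifiers).
truth = {"A", "B", "C", "D"}

# A conservative approach asserts little, but everything it asserts is right.
conservative = {"A", "B"}
# A permissive approach finds more true items, but also asserts wrong ones.
permissive = {"A", "B", "C", "E", "F", "G"}

print(precision_recall(conservative, truth))
print(precision_recall(permissive, truth))
```

In this toy case the conservative set trades recall for perfect precision, which is the trade-off the slide describes.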

Page 7: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Overview

• Community-based → Community-aided gene/genome annotation
   • 1997 – present: Pseudomonas Genome Project and PseudoCAP (Pseudomonas Community Annotation Project)

• Community-aided → Multiple community-aided contextual curation of molecular interactions
   • 2006 – present: InnateDB project

• What we're doing next…

• Funding it all!

Page 8: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Goals

Critical and conservative genome annotation
Minimize project costs
Capitalize on large Pseudomonas aeruginosa research community

Solution

Community-based, Internet-based approach for (continually updated) genome annotation

"Crowdsourcing" in the 90's!

Page 9: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Initial PseudoCAP leading to genome publication (1997 – 2000)

61 researchers from 13 countries, 1741 annotations

Focus on conservative annotation

Need to capture researchers' excellent, diverse biological knowledge, NOT their diverse ways of annotating!

Page 10: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Initial PseudoCAP leading to genome publication (1997 – 2000)

Initial 1741 community-based annotations…
Annotations incorporated by 3 annotators through web-based tool
1st fully internet-based community annotation effort

Page 11: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Current PseudoCAP – continually updated annotation (2000 – present)

151 researchers, 2356 curated gene annotations (not incl. computational analyses)

Movement from gene-based → genes plus other genome features (2,590 other genome features added in the last year alone)

Found we needed to further modify our community-based approach…

Winsor et al 2011 PMID: 20929876
Winsor et al 2005 PMID: 15608211

Page 12: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Current PseudoCAP – continually updated annotation (2000 – present)

Annotations incorporated by one part-time project coordinator
Subject to review process (peer-reviewed paper or other peer review)

Increasing movement from Community-based → Community-aided
- Coordinator contacts researchers more to get input
- Capitalize on expertise most efficiently
- Coordinator ensures consistency

Coordinator and community collectively ensure quality

Page 13: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Pseudomonas Community Annotation Project

Challenges and Solutions

- Disputes between researchers regarding an annotation
   - Go with first published and have alternate annotations

- Researchers are busy!
   - Keep submission system/input process simple!
   - We now contact them more than they contact us
   - Have rounds of major annotation pushes

Future: Will try again the "paper carrot" for another annotation push – authorship on a NAR update paper (as a consortium) to encourage participation

Page 14: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

14

InnateDB: Curating molecular interactions, networks

Community-aided → "Multiple community-aided"
Highly contextual annotation

Page 15: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

InnateDB Developed to Aid Two Large International Systems Biology Projects

Mouse Model Datasets:
- Cerebral Malaria mouse model (IMR, Australia)
- Tuberculosis mouse model (AECM)
- Shigella xenograft model (Pasteur)

Human Clinical Datasets:
- Typhoid & Malaria Vietnam (OUCRU/Stanford/Sanger)
- Non Typhoidal Salmonella Malawi (Sanger)
- Chronic/Acute Helminth Ecuador (USF de Quito/Sanger)
- Dengue (OUCRU)

Modulating innate immune response via Host Defense peptides (Hancock lab, UBC)

Mouse KOs (Sanger)

+

Novel insight into host response and mechanism of peptides. Common pathways, networks and transcriptional regulation.

Thompson et al PNAS December 2009

Page 16: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Systems Biology & The Innate Immune Response:

  Many layers of complexity.

  Layers of regulation: transcriptional; post-transcriptional (miRNAs); post-translational (ubiquitination, phosphorylation)

  Host-pathogen interactions

  100s – 1000s DE genes

  Not simple pathways - networks of molecular interactions

Gardy*, Lynn*, Brinkman, Hancock (2009). Enabling a systems biology approach to immunology: focus on innate immunity. Trends in Immunology PMID: 19428301

Page 17: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… NAR (DB issue) PMID: 23180781

Page 18: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Manual Curation of Interaction Data From Literature to Database Greatly Enhances Coverage of Innate Immunity Interactome

Figure: InnateDB curated interactome. Legend: interactions also curated by the top 5 other interaction databases (BIND, INTACT, DIP, BIOGRID & MINT); interactions only curated by InnateDB.

Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158

Page 19: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Manual Curation of Interaction Data From Literature to Database – Enhancing coverage of Innate Immunity Interactome

Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… Nucleic Acids Research (Database issue)

The InnateDB curated interactome in July 2012. Red edges represent interactions that have been added in 2011 and 2012.

Page 20: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Contextually Curating Innate Immunity-Relevant Interactions

Annotated fields include:

Molecule type; organism; biological role; interaction detection method; the host system (in vitro, in vivo, ex vivo); host organism; interaction type; cell, cell-line and tissue types; cell status (primary/cell line); experimental role; participant identification method and sub-cellular localization, plus a variety of additional curator comments.
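
To make the idea of contextual annotation more tangible, the sketch below models a subset of the fields listed above as a small record type with controlled values. It is a hypothetical illustration only: the class name `CuratedInteraction`, the field names and the validation rules are assumptions, and it does not reflect InnateDB's actual schema or submission software. The only vocabularies enforced are the two spelled out on the slide (host system and cell status).

```python
from dataclasses import dataclass

# Controlled values taken from the slide text.
HOST_SYSTEMS = {"in vitro", "in vivo", "ex vivo"}
CELL_STATUSES = {"primary", "cell line"}


@dataclass(frozen=True)
class CuratedInteraction:
    """Hypothetical record holding part of an interaction's curated context."""
    participants: tuple       # molecule identifiers (placeholders here)
    molecule_types: tuple     # one entry per participant
    organism: str
    host_organism: str
    host_system: str          # must be one of HOST_SYSTEMS
    interaction_type: str
    detection_method: str
    cell_status: str          # must be one of CELL_STATUSES
    curator_comments: str = ""

    def __post_init__(self):
        # Reject free-text values where the slide implies a fixed vocabulary.
        if self.host_system not in HOST_SYSTEMS:
            raise ValueError(f"unknown host system: {self.host_system!r}")
        if self.cell_status not in CELL_STATUSES:
            raise ValueError(f"unknown cell status: {self.cell_status!r}")
        if len(self.participants) != len(self.molecule_types):
            raise ValueError("need exactly one molecule type per participant")


# Placeholder values, not a real curated interaction.
example = CuratedInteraction(
    participants=("MOLECULE_A", "MOLECULE_B"),
    molecule_types=("protein", "protein"),
    organism="human",
    host_organism="human",
    host_system="in vitro",
    interaction_type="example interaction type",
    detection_method="example detection method",
    cell_status="cell line",
)
print(example.host_system)
```

Validating at construction time is one simple way to keep many curators' submissions consistent, which is the concern the talk raises about diverse ways of annotating.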

Page 21: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Curating Innate Immunity-Relevant Interactions

  71% human, 22% mouse, 7% human-mouse

  ~80% interactions in innate immunity interactome not annotated by other major databases

  Protein (69%), DNA and RNA interactions

  Developed InnateDB submission system software to allow submission of interaction annotation in an OBO ontology-controlled and MIMIx & PSI-MI 2.5 compliant manner.

Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158

Page 22: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Which journals are curated?

  >4,400 journal articles curated to date

  Don't focus on specific journals - relevant articles are curated if they meet appropriate quality standards for the interaction evidence.

  Indeed, at least one protein has been curated from >200 different journals.

  More than 70% of curated articles have come from 20 journals.

  Note many journals in top 20 are not “immunology journals”, underscoring importance of not limiting curation efforts to journals perceived as “relevant”.

Page 23: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Curating Innate Immunity-Relevant Interactions – 4-pronged approach

• Curation primarily pathway-centric
   - Systematically review all literature describing interactions for a particular innate immunity pathway.
   - Curate all other interactors regardless of whether the interacting molecule is a member of the pathway or has any known role in innate immunity → expands network outside of known innate immunity players.

• Systematically curated pathways are scheduled for frequent re-curation as the field is moving quickly.

• Also, new publications on innate immunity assessed on a daily basis to identify novel interactions of interest.
   - Priority given to the most recent publications → incorporates new information on the most current research

• Immunology Community-aided:
   - Curators consult with researchers to confirm unclear literature data
   - Most common issue: Unclear what species the protein/DNA/RNA interactors come from

• Curation Community-aided:
   - InnateDB curators review each other's curations as an error check
   - IMEx consortium!

http://www.innatedb.com/doc/InnateDB_2010_curation_guide.pdf

Page 24: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

• InnateDB is a member of IMEx – an international consortium of interaction databases involved in curation

• Goal: Develop common standards, avoid too much redundancy in data collection/curation, central registry, single search interface

• Orchard et al Nature Methods 9:345-350 PMID: 22453911

• Stay tuned for Sandra Orchard's talk!

Page 25: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Going Beyond Innate Immunity – An Integrative Biology Resource

• >196,000 human and mouse interactions extracted & loaded from BIND, INTACT, DIP, BIOGRID & MINT DBs

• Cross-referenced genes to >3,000 pathways from KEGG, PID, BIOCARTA, INOH, NetPath & Reactome DBs
   - Visualize/analyze interactions associated with specific pathway
   - Pathway over-representation analysis

• Ensembl annotation provides details of all human & mouse genes/transcripts/proteins. UniProt, Entrez, Gene Ontology, etc. provide rich protein & gene annotation

• Transcription factor–DNA interactions experimentally confirmed from Transfac, TransCompel

• Robust orthology & gene synteny analysis facilitate human-mouse comparisons
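
Pathway over-representation analysis is commonly framed as a hypergeometric test: given a gene universe, a pathway and a list of genes of interest, how surprising is the observed overlap? The slides do not say which statistic InnateDB uses, so the sketch below is a generic illustration under that assumption; the function name `ora_p_value` and the toy numbers are hypothetical.

```python
from math import comb


def ora_p_value(n_universe: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: total number of genes considered
    n_pathway:  genes in the universe that belong to the pathway
    n_selected: genes of interest (e.g. differentially expressed)
    n_overlap:  genes of interest that belong to the pathway
    """
    total = comb(n_universe, n_selected)
    largest_possible = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 4 in the pathway, 3 genes of interest,
# all 3 of which fall in the pathway.
p = ora_p_value(n_universe=10, n_pathway=4, n_selected=3, n_overlap=3)
print(round(p, 4))
```

In practice such p-values would be computed for every pathway and then corrected for multiple testing, which this sketch leaves out.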

Page 26: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

InnateDB – Advanced Yet User-Friendly Searching – Find & Analyze Relevant Interactions, Pathways & Genes/Proteins.

Page 27: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

InnateDB – Facilitating Systems-Level Analyses of Gene Expression Data

Upload Your Own Gene Expression Data - Up to 10 conditions/timepoints at 1 time.

Overlay Gene Expression Data from Multiple Conditions on Networks/Pathways

Pathway, Gene Ontology & TF ORA tools – Find DE Pathways/Functionally Related Genes/TFs

Go Beyond Pathway Analysis – Differentially Expressed Sub-networks – New Pathways? How Are DE Genes Actually Inter-connected? Central Regulators (Network Hubs)
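
One simple way to look for candidate network hubs is to rank nodes by how many distinct interaction partners they have. The sketch below assumes an undirected interaction list with placeholder node names (`G1` to `G5`); the function name `hub_ranking` is hypothetical, and degree is only one of several centrality measures that could be used, not necessarily the one InnateDB applies.

```python
def hub_ranking(edges, top_n=3):
    """Rank nodes by number of distinct partners in an undirected network.

    Duplicate edges and self-interactions do not inflate the count.
    Returns a list of (node, degree) pairs, highest degree first,
    with ties broken alphabetically.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions for the purpose of hub ranking
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    ranked = sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))
    return [(node, len(neighbours[node])) for node in ranked[:top_n]]


# Placeholder network; ("G3", "G1") repeats ("G1", "G3") in the other direction.
edges = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G1", "G4"),
    ("G2", "G3"),
    ("G4", "G5"),
    ("G3", "G1"),
]
print(hub_ranking(edges, top_n=2))
```

Restricting `edges` to interactions among differentially expressed genes would give a rough view of how those genes are inter-connected.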

Page 28: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

InnateDB and curated data aided study of an immune modulator – host-directed adjunctive therapy coupled with anti-malarial

Page 29: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

What we're doing next…

Need to develop more ontologies and data standards to integrate microbial genomic data from a disease outbreak with epidemiological data.

Curating pathogen status for complete microbial genomes

Will try the "paper carrot" again for next Pseudomonas Genome Database curation project

InnateDB – expanding to Allergy and Asthma

Page 30: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Identify genes unique to/shared between strains, species, genera, any selected bacteria…
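
Finding genes shared between or unique to a selection of genomes reduces to set operations once genes have been grouped into comparable units. The sketch below assumes each genome is already represented as a set of ortholog-group identifiers; the function name `shared_and_unique`, the strain labels and the identifiers are all hypothetical, and how the actual tool performs this comparison is not described on the slide.

```python
def shared_and_unique(gene_sets):
    """Split ortholog groups into those shared by all genomes and those unique to one.

    gene_sets maps a genome name to the set of ortholog-group IDs it contains.
    Returns (shared, unique) where unique maps each genome name to the groups
    found in that genome and in no other.
    """
    if not gene_sets:
        return set(), {}
    shared = set.intersection(*gene_sets.values())
    unique = {}
    for name, genes in gene_sets.items():
        # Everything present in any of the other genomes.
        others = set().union(*(g for other, g in gene_sets.items() if other != name))
        unique[name] = genes - others
    return shared, unique


# Placeholder genomes and ortholog-group identifiers.
genomes = {
    "strainA": {"og1", "og2", "og3"},
    "strainB": {"og1", "og3", "og4"},
    "strainC": {"og1", "og5"},
}
shared, unique = shared_and_unique(genomes)
print(sorted(shared))
print({name: sorted(groups) for name, groups in unique.items()})
```

The same function works at any taxonomic level, since the grouping of genomes is decided by whoever builds the input dictionary.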

Page 31: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Funding! Grants!

One of the biggest challenges is to secure long-term, reliable funding.

We've found:

Need to target curation to specific bio projects (i.e. innate immunity, then to allergy and asthma; aiding a specific Pseudomonas analysis)

Limits what we can do, but good in the sense that curation benefits are more quickly felt as they are needed/used by others

Page 32: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Concluding comments

Using community-aided, expert curator-centered, approach for balancing consistency, reliability and maximizing knowledge. Degree of community involvement depends on nature of data.

Capitalize on both bio community and curation community – keep linked

Researchers are busy! Make it super easy for them to provide input. A little contribution can go a long way

Paper carrots!

Link curation to bio research to secure funding

Indoctrinate young minds! Get biocuration and its challenges into undergrad curriculums

Page 33: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Acknowledgements - InnateDB

InnateDB Principal Investigators: Fiona Brinkman (SFU), Bob Hancock (UBC), David Lynn (Teagasc)

InnateDB Development: Karin Breuer, Geoff Winsor, Matthew Laird, Calvin Chan, Amir Foroushani, Brian Meredith, Nathan Lawless, Nicolas Richard, Avinash Chikatamarla, Fiona Roche, Timothy Chan, Naisha Shah, Michael Acab

InnateDB Curation: Raymond Lo, Anastasia Sribnaia, Carol Chan, Misbah Naseer, Melissa Yau, Giselle Ring, Kathleen Wee, Jaimmie Que

Cerebral network visualizer: Aaron Barsky, Jennifer Gardy, Tamara Munzner

FNIH/GCGH Collaborators: Gordon Dougan (Sanger), Fernanda Schreiber (Sanger), Melita Gordon (U. Liverpool), Bill Jacobs (AECM), Dee Dao (AECM), Philip Cooper (St. Georges), Louis Schofield (WEHI), Sandra Pilat (WEHI), Sarah Dunstan (OUCRU), Brett Finlay (UBC)

www.innatedb.com

Page 34: Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation

Acknowledgements – PseudoCAP

Geoff Winsor, Ray Lo, Matt Laird, Bhav Dhillon, Matthew Whiteside

151 PseudoCAP participants

www.pseudomonas.com
