National Microbial Pathogen Data Resource Connecting Bioinformatics to the Bench Leslie Klis McNeil NCSA, University of Illinois, Urbana


National Microbial Pathogen Data Resource

Connecting Bioinformatics to the Bench

Leslie Klis McNeil
NCSA, University of Illinois, Urbana

www.nmpdr.org

NMPDR is a BRC

• NIAID Bioinformatics Resource Centers
    Common goals, different focus organisms
• Provide annotations and tools to develop diagnostics and therapeutics against Priority Pathogens
• NMPDR core organisms, all category B:
    Campylobacter jejuni
    Listeria monocytogenes
    Staphylococcus aureus
    Streptococcus pyogenes and pneumoniae
    Vibrio cholerae, vulnificus, parahaemolyticus

Sister BRCs focus on other priority pathogens

• Unified port of entry at

• Eight BRCs curate viruses, protozoa, and bacteria, or insect vectors of disease

Who is NMPDR

• Fellowship for Interpretation of Genomes
    Primary software developers
    Curators who do manual annotation
• Computation Institute at University of Chicago
    Software developers
    Hardware managers
• Argonne National Laboratory
    Software developers
• NCSA, University of Illinois at Urbana
    Education, outreach, training

What is NMPDR

• Genome database with value added
    Manual annotation in context of systems biology
    Comparative analysis tools
      • Bidirectional Best Hits: select and align
      • Functional clusters: genes with conserved proximity
      • Compare regions: adjust size of region, number of genomes
      • Pinned regions: phylogenetic comparison with all genomes
      • Signature genes: find genes in common or that distinguish user-selected groups of genomes; groups may contain one or many
    Essential genes page
    Drug target discovery and in silico screening
    Organism pages with phenotype information
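
The signature-genes comparison listed above can be pictured as set arithmetic over protein families. This is a minimal sketch, assuming each genome has already been reduced to a set of family identifiers; the function name, genome labels, and family IDs are hypothetical, and this is not NMPDR's actual implementation.

```python
from typing import Dict, Iterable, Set


def signature_genes(
    families: Dict[str, Set[str]],
    group_a: Iterable[str],
    group_b: Iterable[str],
) -> Dict[str, Set[str]]:
    """Compare two non-empty, user-selected groups of genomes.

    `families` maps a genome name to the set of protein-family IDs it
    contains (a simplifying assumption; real tools compare sequences).
    """
    a_sets = [families[g] for g in group_a]
    b_sets = [families[g] for g in group_b]
    in_all_a = set.intersection(*a_sets)  # present in every genome of A
    in_all_b = set.intersection(*b_sets)  # present in every genome of B
    in_any_a = set.union(*a_sets)         # present in at least one genome of A
    in_any_b = set.union(*b_sets)         # present in at least one genome of B
    return {
        "common": in_all_a & in_all_b,
        "distinguish_a": in_all_a - in_any_b,
        "distinguish_b": in_all_b - in_any_a,
    }


# Hypothetical family IDs, for illustration only.
demo = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam1", "fam2", "fam4"},
    "genome3": {"fam1", "fam5"},
}
result = signature_genes(demo, ["genome1", "genome2"], ["genome3"])
```

A group may hold a single genome, as in the second group here, which matches the slide's note that groups may contain one or many.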

Pathogen-specific gateways to data

Outreach services in the user interface

• User forum links to iLabs with Inquiry Units for teaching and training
• PathInfo: VBI’s PIML project, info about
    General info and strain descriptions
    Lab handling and safety
    Epidemiology

• Journals button opens most recent, relevant ASM articles

• Google news—RSS feed of popular press

• Links to resources such as strain collections

Annotation Status Table

• Immediate access to genes whose functions are known with some degree of certainty
    Named genes in subsystems
    Named genes not in subsystems
    Hypothetical genes in subsystems
• Gateway to genes about which nothing is known
    Hypothetical genes not in subsystems
• List of genes with links to NMPDR analysis tools
• Exploration in comparative framework is the first step to formulating working hypotheses about functions
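
The four categories of the annotation status table follow from two yes/no questions per gene: does it have a functional name, and is it in a subsystem? The sketch below illustrates that bookkeeping. The `Gene` record and the rule that a function containing "hypothetical" counts as unnamed are assumptions made for this example, not NMPDR's data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    function: str       # assigned functional name
    in_subsystem: bool  # True if a curator placed the gene in a subsystem


def annotation_status(gene: Gene) -> str:
    """Return one of the four annotation-status categories for a gene."""
    # Assumption: any function text containing "hypothetical" is unnamed.
    named = "hypothetical" not in gene.function.lower()
    kind = "Named" if named else "Hypothetical"
    place = "in subsystems" if gene.in_subsystem else "not in subsystems"
    return f"{kind} genes {place}"


# Hypothetical gene IDs; the function names are examples from the slides.
genes = [
    Gene("g1", "6-phosphofructokinase (EC 2.7.1.11)", True),
    Gene("g2", "cell division protein FtsZ", False),
    Gene("g3", "conserved hypothetical protein", True),
    Gene("g4", "hypothetical protein", False),
]
status_counts = Counter(annotation_status(g) for g in genes)
```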

Pathways to Data

• Start with keyword search for name of gene or protein
• Start with sequence of your gene or protein and BLAST against any complete genome
• Start by browsing an organism of interest
    View lists of proteins with/without functional names; included/not in biological subsystem. Choose one from the list to investigate with comparative tools.

• Start from subsystems tree to view the phylogenetic distribution of an interesting biological process

• Start from essential genes page to view essential genes in model organisms and to project essentiality to closely or distantly related organisms

• Start from virtual structural proteomes to investigate proteins about which structural information is available in PDB

Subsystems approach to genome annotation

• Subsystems annotation provides researchers with corrected functional annotations in a structured biological context
• Consistency across genomes achieved by vertical annotation of functions rather than horizontal focus on single genomes
• More than 500 distinct subsystems have been developed
    Metabolic pathways
    Complex structures
    Genotype-phenotype associations

• Subsystems integrate genomic and functional contexts of genes in metabolic reconstructions or populated subsystem spreadsheets

• Metabolic reconstructions summarize all subsystems in a given genome

• Populated subsystems compare all genomes in a given subsystem

What is a Subsystem?

• Subsystem is a generalization of pathway
    Collection of functional roles jointly involved in a biological process or complex
      • metabolic, signaling, regulatory, structural
• Functional Role is the abstract biological function of a gene product
    Atomic or fundamental; examples:
      • 6-phosphofructokinase (EC 2.7.1.11)
      • LSU ribosomal protein L31p
      • cell division protein FtsZ

Expert-Defined Subsystems

• Curator is researcher with first-hand knowledge of biological system
• Functional roles defined and grouped into subsystem and subsets by curator
    universal groups of roles include all organisms
    functional variants are subsets of roles found in a limited number of organisms
      • often represent alternative paths

Populated Subsystems

• Two-dimensional integration of functional roles with genomes
    universal groups of roles include all organisms
    functional variants are subsets of roles found in a limited number of organisms
• Spreadsheet
    Columns of functional roles
    Rows of organisms
    Cells of annotated genes
• Table of functional roles with GO terms
• Diagram

Simple Example: Histidine Degradation Subsystem

• Conversion of histidine to glutamate is organizing principle
• Functional roles defined in table:

Subsystem: Histidine Degradation

  1  HutH  Histidine ammonia-lyase (EC 4.3.1.3)
  2  HutU  Urocanate hydratase (EC 4.2.1.49)
  3  HutI  Imidazolonepropionase (EC 3.5.2.7)
  4  GluF  Glutamate formiminotransferase (EC 2.1.2.5)
  5  HutG  Formiminoglutamase (EC 3.5.3.8)
  6  NfoD  N-formylglutamate deformylase (EC 3.5.1.68)
  7  ForI  Formiminoglutamic iminohydrolase (EC 3.5.3.13)

Subsystem Diagram

• Three functional variants
• Universal subset has three roles, followed by three alternative paths from IV to VI

[Figure: subsystem diagram. HutH converts I to II (releasing NH3), HutU converts II to III, and HutI converts III to IV. Three alternative paths lead from IV to VI: GluF (tetrahydrofolate to formiminotetrahydrofolate), HutG (releasing formamide), or ForI (IV to V, releasing NH3) followed by NfoD.]

Subsystem Spreadsheet

• Column headers taken from table of functional roles
• Rows are selected genomes, or organisms
• Cells are populated with specific, annotated genes
• Shared background color indicates proximity of genes
• Functional variants defined by the annotated roles
• Variant code -1 indicates subsystem is not functional

Organism                      Variant  HutH        HutU        HutI        GluF        HutG     NfoD     ForI
Bacteroides thetaiotaomicron  1        Q8A4B3      Q8A4A9      Q8A4B1      Q8A4B0      -        -        -
Desulfotalea psychrophila     1        gi51246205  gi51246204  gi51246203  gi51246202  -        -        -
Halobacterium sp.             2        Q9HQD5      Q9HQD8      Q9HQD6      -           Q9HQD7   -        -
Deinococcus radiodurans       2        Q9RZ06      Q9RZ02      Q9RZ05      -           Q9RZ04   -        -
Bacillus subtilis             2        P10944      P25503      P42084      -           P42068   -        -
Caulobacter crescentus        3        P58082      Q9A9MI      P58079      -           -        Q9A9M0   -
Pseudomonas putida            3        Q88CZ7      Q88CZ6      Q88CZ9      -           -        Q88D00   -
Xanthomonas campestris        3        Q8PAA7      P58988      Q8PAA6      -           -        Q8PAA8   -
Listeria monocytogenes        -1       -           -           -           -           -        -        -
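
One way to picture how a spreadsheet row relates to its variant code is a mapping from functional role to gene ID, with a rule that reads the code off the populated roles. The sketch below assumes that variants 1, 2, and 3 are marked by GluF, HutG, and NfoD respectively, as the later slides suggest. The placement of gene IDs in role columns is likewise inferred, and the function is illustrative rather than how curators actually assign codes.

```python
from typing import Dict

# Roles shared by all functional variants (the "universal subset").
UNIVERSAL = ("HutH", "HutU", "HutI")


def variant_code(row: Dict[str, str]) -> int:
    """Return an assumed variant code for one row (role -> gene ID)."""
    if not all(role in row for role in UNIVERSAL):
        return -1  # subsystem not functional in this organism
    if "GluF" in row:
        return 1
    if "HutG" in row:
        return 2
    if "NfoD" in row:
        return 3
    return -1


# Gene IDs come from the slide; their assignment to columns is assumed.
spreadsheet = {
    "Bacteroides thetaiotaomicron": {
        "HutH": "Q8A4B3", "HutU": "Q8A4A9", "HutI": "Q8A4B1", "GluF": "Q8A4B0",
    },
    "Bacillus subtilis": {
        "HutH": "P10944", "HutU": "P25503", "HutI": "P42084", "HutG": "P42068",
    },
    "Pseudomonas putida": {
        "HutH": "Q88CZ7", "HutU": "Q88CZ6", "HutI": "Q88CZ9", "NfoD": "Q88D00",
    },
    "Listeria monocytogenes": {},
}
codes = {organism: variant_code(row) for organism, row in spreadsheet.items()}
```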

Missing Genes Noticed by Subsystems Annotation

• No genes were annotated “ForI (EC 3.5.3.13) Formiminoglutamic iminohydrolase” when the Histidine Degradation subsystem was populated

• Organisms missing ForI convert His to Glu
• Candidate genes that could perform the role “ForI” must be identified
• Strategy for finding genes is based on chromosomal clustering and occurrence profiling

Finding Genes that Cluster with NfoD

• Green gene is NfoD of Xanthomonas
• Blue genes within 10 kb of NfoD in at least four other species
• finds biggest clusters in other species
• fc-sc shows table of homologous pairs in other genomes
• displays homologous regions in other genomes
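
The "within 10 kb in at least four other species" criterion can be sketched as a conserved-neighborhood count. This is a rough illustration that assumes genes are already grouped into homology families and that distance is measured between start coordinates. The `Gene` tuple, family labels, and coordinates are hypothetical, and the real tool may define proximity differently.

```python
from typing import Dict, List, NamedTuple, Set


class Gene(NamedTuple):
    family: str  # homology family label (hypothetical)
    start: int   # start coordinate in base pairs


def neighbors(genes: List[Gene], focus: str, window: int) -> Set[str]:
    """Families with a member starting within `window` bp of a focus gene."""
    focus_starts = [g.start for g in genes if g.family == focus]
    return {
        g.family
        for g in genes
        if g.family != focus
        and any(abs(g.start - s) <= window for s in focus_starts)
    }


def conserved_neighbors(
    genomes: Dict[str, List[Gene]],
    reference: str,
    focus: str,
    window: int = 10_000,
    min_other: int = 4,
) -> Set[str]:
    """Families near the focus gene in the reference and in >= min_other genomes."""
    ref_near = neighbors(genomes[reference], focus, window)
    support = {family: 0 for family in ref_near}
    for name, genes in genomes.items():
        if name == reference:
            continue
        for family in neighbors(genes, focus, window) & ref_near:
            support[family] += 1
    return {family for family, n in support.items() if n >= min_other}


# Hypothetical coordinates, for illustration only.
genomes = {
    "ref": [Gene("NfoD", 50_000), Gene("famA", 52_000),
            Gene("famB", 58_000), Gene("famC", 90_000)],
    "g1": [Gene("NfoD", 10_000), Gene("famA", 12_000), Gene("famB", 15_000)],
    "g2": [Gene("NfoD", 10_000), Gene("famA", 8_000), Gene("famB", 300_000)],
    "g3": [Gene("NfoD", 70_000), Gene("famA", 75_000)],
    "g4": [Gene("NfoD", 5_000), Gene("famA", 9_000), Gene("famC", 6_000)],
}
clustered = conserved_neighbors(genomes, "ref", "NfoD")
```

In this toy data only `famA` stays close to NfoD in four other genomes, so it is the only family that would be drawn as a conserved neighbor.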

What are Pinned Regions?

• Focus gene is number 1, colored red
• Most frequently co-localized homolog numbered 2, colored green
• Homologous genes presented in the same color with the same numerical label
• Numerical labels correspond to rank ordered frequency of co-localization with the focus gene
    Focus gene labeled 1
    Gene 17 is homolog 16th most frequently co-localized with focus gene
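
The numbering scheme described above amounts to ranking families by how often they appear alongside the focus gene. A minimal sketch follows, assuming each displayed region is a list of family labels. The alphabetical tie-break and the `famX` label are choices made for this example, not documented NMPDR behavior.

```python
from collections import Counter
from typing import Dict, List


def pin_labels(regions: List[List[str]], focus: str) -> Dict[str, int]:
    """Label the focus gene 1 and other families by co-localization rank."""
    counts: Counter = Counter()
    for region in regions:
        # Count each family once per region that contains the focus gene.
        if focus in region:
            counts.update(set(region) - {focus})
    labels = {focus: 1}
    # Most frequent first; ties broken by name (an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for rank, (family, _) in enumerate(ranked, start=2):
        labels[family] = rank
    return labels


# Hypothetical regions around NfoD in four genomes.
regions = [
    ["NfoD", "HutH", "HutU", "famX"],
    ["NfoD", "HutH", "famX"],
    ["NfoD", "HutH", "HutU"],
    ["NfoD", "HutH"],
]
labels = pin_labels(regions, "NfoD")
```

With this scheme the family labeled 17 is the 16th most frequently co-localized, because the focus gene takes label 1.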

Candidate ForI in Context with NfoD

• Homologous regions around NfoD, red, center
• Same color indicates homology
    BLAST cutoff 1e-20
• HutH, the first functional role in the subsystem, is green, 2
• Candidate ForI is pink, 4, “conserved hypothetical”

Annotation of ForI EC 3.5.3.13

• Metabolic context proves need for role
    Organisms missing annotated ForI degrade His to Glu
• Chromosomal context points to candidate
    Clusters with NfoD and other genes in subsystem
• Occurrence context supports candidate
    Organisms containing NfoD lack GluF and HutG, required for functional variants 1 and 2, respectively
    Organisms containing candidate ForI also contain NfoD, indicating functional variant 3

• Phylogenetic trees of candidate ForI genes are coherent
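
The occurrence argument above can be checked mechanically once each organism is reduced to the set of roles it carries. The sketch below assumes such presence/absence profiles are available. The organism names and the label `candidate` are placeholders, and both checks are trivially true when no organism carries the role in question.

```python
from typing import Dict, Set


def occurrence_support(
    profiles: Dict[str, Set[str]], candidate: str
) -> Dict[str, bool]:
    """Test the two occurrence patterns that support the candidate gene."""
    with_candidate = [r for r in profiles.values() if candidate in r]
    with_nfod = [r for r in profiles.values() if "NfoD" in r]
    return {
        # Every organism with the candidate also has NfoD.
        "candidate_implies_NfoD": all("NfoD" in r for r in with_candidate),
        # No organism with NfoD has GluF or HutG (variants 1 and 2).
        "NfoD_excludes_GluF_HutG": all(
            not ({"GluF", "HutG"} & r) for r in with_nfod
        ),
    }


# Hypothetical occurrence profiles, for illustration only.
profiles = {
    "org1": {"HutH", "HutU", "HutI", "GluF"},
    "org2": {"HutH", "HutU", "HutI", "HutG"},
    "org3": {"HutH", "HutU", "HutI", "NfoD", "candidate"},
    "org4": {"HutH", "HutU", "HutI", "NfoD", "candidate"},
}
support = occurrence_support(profiles, "candidate")
```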

Conjectures archived in HOPS

• Hypotheses and Open Problems identified by Subsystems
    HOPS linked from NMPDR’s FAQ
• Subsystems point to missing or alternative genes
• Bioinformatic predictions need to be tested at the bench
• ForI candidate now verified experimentally
• Connections forged between bench and bioinformatics

Bioinformatics to Bench

• Essential genes page at NMPDR
    Click bar to search for essential genes
    Follow NMPDR link to compare with other genomes

Candidate Drug Targets

• First-draft table (manually derived) links to biochemical data in BRENDA or TCDB
• Candidate proteins
    essential in at least one of the NMPDR pathogens
    included in subsystems by our curators
    orthologs in the Protein Data Bank
    orthologs in a substantial number of bacterial priority pathogens curated in the BRC system
• Second-draft table to be automatically generated
    annotations include essential for growth or virulence
    PDB and pathogen orthologs
    No good hit in host
    targets without crystallized orthologs suggested to HTS project at Argonne National Laboratory
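
The criteria above read naturally as a filter over per-protein annotations, which is presumably what an automatically generated second-draft table would apply. The sketch below makes that concrete under stated assumptions: the `Protein` fields, the example names, and the threshold of five pathogens standing in for "a substantial number" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protein:
    name: str
    essential_in: int        # NMPDR pathogens in which the gene is essential
    in_subsystem: bool       # placed in a subsystem by a curator
    has_pdb_ortholog: bool   # an ortholog has a structure in PDB
    pathogen_orthologs: int  # priority pathogens carrying an ortholog
    good_host_hit: bool      # close homolog found in the host


def passes_common_criteria(p: Protein, min_pathogens: int) -> bool:
    """Criteria shared by both lists, plus the 'no good hit in host' rule."""
    return (
        p.essential_in >= 1
        and p.in_subsystem
        and p.pathogen_orthologs >= min_pathogens
        and not p.good_host_hit
    )


def candidate_targets(proteins: List[Protein], min_pathogens: int = 5) -> List[str]:
    """Targets that already have a crystallized ortholog."""
    return [p.name for p in proteins
            if passes_common_criteria(p, min_pathogens) and p.has_pdb_ortholog]


def structure_suggestions(proteins: List[Protein], min_pathogens: int = 5) -> List[str]:
    """Targets lacking a crystallized ortholog, to suggest for structure work."""
    return [p.name for p in proteins
            if passes_common_criteria(p, min_pathogens) and not p.has_pdb_ortholog]


# Hypothetical proteins, for illustration only.
proteins = [
    Protein("protA", 2, True, True, 9, False),
    Protein("protB", 1, True, False, 7, False),
    Protein("protC", 1, True, True, 8, True),   # close host homolog
    Protein("protD", 0, True, True, 9, False),  # not essential
]
targets = candidate_targets(proteins)
to_structure = structure_suggestions(proteins)
```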

NMPDR efforts feed into high-throughput structure project at Argonne

In Silico Screening

• Targets docked with 10K random compounds as training set
• Neural network program tracks 9 properties of compounds to learn characteristics of those that bind and those that do not

• ZINC compound db screened to find 10K likely binders predicted to be ligands

• Targets docked against 10K predicted ligands on BlueGene with Dock5

• Top 1000 docked compounds soon to be linked to NMPDR
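
The screening steps above form a two-stage funnel: a cheap predictor narrows a large library, then the expensive docking step ranks the survivors. The sketch below shows only that control flow. It does not implement the neural network or the docking program; `predict_binding` and `dock_energy` are placeholder callables, and treating a lower docking energy as better is an assumption.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

Compound = TypeVar("Compound")


def two_stage_screen(
    library: Iterable[Compound],
    predict_binding: Callable[[Compound], float],
    dock_energy: Callable[[Compound], float],
    n_predicted: int = 10_000,
    n_top: int = 1_000,
) -> List[Compound]:
    """Keep the n_predicted likeliest binders, then the n_top best docked."""
    # Stage 1: cheap score from a trained model (higher = more likely to bind).
    predicted = heapq.nlargest(n_predicted, library, key=predict_binding)
    # Stage 2: expensive docking (assumed: lower energy = better pose).
    return heapq.nsmallest(n_top, predicted, key=dock_energy)


# Toy stand-ins: integers as compounds, arithmetic in place of real scoring.
hits = two_stage_screen(
    range(100),
    predict_binding=lambda c: c % 10,
    dock_energy=lambda c: -c,
    n_predicted=20,
    n_top=3,
)
```

Keeping the two scoring functions as parameters lets the cheap filter and the docking step be swapped independently.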

IBM BlueGene Supercomputer

World’s fastest supercomputer
280 TeraFLOPS

Live Demo of NMPDR

• From essential genes, click H. pylori, then click NMPDR for first protein
• Show compare regions
    Possible to increase/decrease size of region
    Possible to “walk” chromosome
    Possible to include more genomes: type in 10 and click resubmit
• Click on the homologous gene 1 in the second genome, Campylobacter
• Ask: is this function also essential in Campy, and is this a good drug target?
• Investigate the Campy homolog by using Pins, Compare Regions, find best clusters (CL)
• What is the pathway or biological system that this protein is essential for?
    If not included in a subsystem by NMPDR curators, follow alias link to KEGG
• Pathway is lysine biosynthesis. Ask:
    Does this protein catalyze the rate-limiting step?
    Is this the best function in this pathway to target for inhibition by a drug?
    Does this protein have a close structural/functional homolog in human or PDB? Use BLAST to find homologs.
    Is this a broad or narrow spectrum target? Show all homologs using Bidirectional Best Hits button.
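
For readers unfamiliar with the term, a bidirectional best hit is a pair of genes, one from each genome, that are each other's top-scoring match. The sketch below illustrates the definition, assuming similarity scores (for example, BLAST bit scores) have already been computed in both directions. The gene names and scores are hypothetical, and this is not the code behind the NMPDR button.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genome A and genome B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 90.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 198.0, ("b1", "a3"): 118.0, ("b2", "a2"): 88.0}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a3` also matches `b1`, but `b1` prefers `a1`, so `a3` has no bidirectional best hit in genome B.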