
Page 1: Bioinformatics approaches for… Teresa K Attwood Faculty of Life Sciences & School of Computer Science University of Manchester, Oxford Road Manchester

Bioinformatics approaches for…

Teresa K Attwood

Faculty of Life Sciences & School of Computer Science

University of Manchester, Oxford Road

Manchester M13 9PT, UK

http://www.bioinf.man.ac.uk/dbbrowser/

Page 2:

…analysing GPCRs…

Page 3:

…which craft is best?

Page 4:

Overview

• What are GPCRs?
  – why they’re interesting & important
  – why bioinformatics approaches are important
• In silico function prediction – a reality check
• Family-based methods for characterising GPCRs
• Understanding the tools
  – problems with pair-wise & family-based approaches
  – estimating (biological) significance
• Seeking deeper functional insights
• Conclusions

Page 5:

What are GPCRs?
G protein-coupled receptors

[Figure labels: GDP, GTP]

• A functionally diverse family of cell-surface 7TM proteins
• Functional diversity achieved via
  – interaction with a variety of ligands
  – stimulation of various intracellular pathways via coupling to different G proteins

Page 6:

Why are GPCRs interesting?
Attwood, TK & Flower, DR (2002) Trawling the genome for G protein-coupled receptors: the importance of integrating bioinformatic approaches. In Drug Design – Cutting Edge Approaches, pp.60-71.

• They are ubiquitous
  – >800 GPCR genes in the human genome, from 3 major superfamilies
    • rhodopsin-, secretin- & metabotropic glutamate receptor-like
• Share almost no sequence similarity
  – but are united by common 7TM architecture
• Constitute a complex multi-gene family
  – populated by >50 families & >350 subtypes

Page 7:

Isn’t just stamp collecting!
Attwood, TK & Flower, DR (2002) Trawling the genome for G protein-coupled receptors: the importance of integrating bioinformatic approaches. In Drug Design – Cutting Edge Approaches, pp.60-71.

• GPCRs are of profound biomedical importance
  – targets for >50% of prescription drugs
  – yield sales >$16 billion/annum
    • they’re big business!
• Given their importance, we need to
  – characterise the ones we know about
  – identify new ones
    • & discover what they do!
  – e.g., as potential new drug targets

Page 8:

Why studying GPCRs is difficult

• Only 2 crystal structures available
  – bovine rhodopsin (2000) & human β2-adrenergic receptor (2007)
• Many GPCRs haven’t been characterised experimentally
  – remain ‘orphans’, with unknown ligand specificity
• With >800 human GPCRs, this isn’t much to go on!

Page 9:

Why use bioinformatics approaches?

• Computational approaches are important
  – can be used to help identify, characterise & model novel receptors
    • usually by similarity & extrapolation of known characteristics
• Bioinformatics thus offers complementary tools for elucidating the structures & functions of receptors
• But the task is non-trivial
  – GPCRs exhibit rich relationships & complex molecular interactions
    • present many challenges for in silico analysis
  – in trying to derive meaningful functional insights, traditional methods are likely to be limited

Page 10:

[Figure: GPCR signalling network. Labels: ligands (biogenic amines, amino acids, ions, lipids, peptides, proteins, light, others); GPCR; G protein subunits (αs, αi, αo, αq, β, γ); GDP, GTP; cAMP, PKA, EPAC, Ca2+, PLCβ, PKC, PI3Kγ, PYK2, RTK, Src, Shc, Grb2, Sos, RasGRF, Ras, Rap, Raf1, B-Raf, MEK, MAPK; regulation of gene expression; nucleus]

We’ve been using biology-unaware search tools to analyse such complex systems.
How far can we truly expect to understand cellular function with such naïve approaches…?

Page 11:

In silico function prediction
…a reality check

• What is the function of this structure?
• What is the function of this sequence?
• What is the function of this motif?
  – the fold provides a scaffold, which can be decorated in different ways by different sequences to confer different functions; knowing the fold & function allows us to rationalise how the structure effects its function at the molecular level

Page 12:

“A test case for structural genomics: Structure-based assignment of the biochemical function of hypothetical protein mj0577” (Zarembinski et al., PNAS 95, 1998)

Although the structure co-crystallised with ATP, the biochemical function of the protein is unknown

Page 13:

What’s in a sequence?

Page 14:

Methods for family analysis
Attwood, TK (2000). The quest to deduce protein function from sequence: the role of pattern databases. Int. J. Biochem. Cell Biol., 32(2), 139–155.

• Single motif methods
  – Fuzzy regex (eMOTIF)
  – Exact regex (PROSITE)
• Full domain alignment methods
  – Profiles (Profile Library)
  – HMMs (Pfam)
• Multiple motif methods
  – Identity matrices (PRINTS)
  – Weight matrices (Blocks)

Page 15:

The challenge of family analysis

• highly divergent family with single function?
• superfamily with many diverse functional families?
  – must distinguish if function analysis done in silico
  – a tough challenge!

Page 16:

In the beginning was PROSITE

[GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-X(2)-[LIVMNQGA]-X(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R

TM domain
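To make the pattern syntax concrete, here is a minimal sketch that translates a PROSITE pattern into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex` and the test peptide are hypothetical, the pattern used is the full PS00237 family 1 signature, and only the syntax that pattern needs is handled (residue sets, `{}` exclusions, `x` wildcards and repeat counts; no `<`/`>` anchors).

```python
import re

# Full PS00237 signature (G-protein coupled receptors family 1).
PS00237 = (
    "[GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-"
    "[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]."
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset of) PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core.lower() == "x":
            token = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # [ABC] set or a single residue
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)

regex = prosite_to_regex(PS00237)
peptide = "ASALLLAALSADRYAAV"   # hypothetical peptide built to satisfy the pattern
print(bool(re.search(regex, peptide)))
```

A single substitution at a fixed position (for example, the invariant R) is enough to turn a match into a non-match, which is one reason exact patterns miss genuine family members.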

Page 17:

Diagnostic limitations of PROSITE

ID   G_PROTEIN_RECEP_F1_1; PATTERN.
AC   PS00237;
DT   APR-1990 (CREATED); NOV-1997 (DATA UPDATE); SEP-2004 (INFO UPDATE).
DE   G-protein coupled receptors family 1 signature.
PA   [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-
PA   [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].
NR   /RELEASE=44.6,159201;
NR   /TOTAL=1622(1621); /POSITIVE=1530(1529); /UNKNOWN=0(0);
NR   /FALSE_POS=92(92); /FALSE_NEG=261; /PARTIAL=61;

• This represents an apparent 22% error rate
  – the actual rate is probably higher
• Thus, a match to a pattern is not necessarily true
  – & a mis-match is not necessarily false!
• False-negatives are a fundamental limitation to this type of pattern matching
  – if you don’t know what you’re looking for, you’ll never know you missed it!
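The 22% figure can be reproduced from the NR lines of the entry. The sketch below assumes one plausible reading, namely that false positives and false negatives are both counted as errors relative to the total number of hits; the slide does not state the formula, so treat this as an interpretation rather than the author's calculation.

```python
# Counts taken from the NR lines of the PS00237 entry (release 44.6).
total = 1622       # /TOTAL
false_pos = 92     # /FALSE_POS: matches that are not family members
false_neg = 261    # /FALSE_NEG: family members the pattern misses

# Assumed reading: errors = false positives + false negatives, over total hits.
error_rate = (false_pos + false_neg) / total
print(f"apparent error rate: {error_rate:.1%}")
```

Because false negatives can only be counted once someone has recognised them, this rate is a lower bound, which is the point of the second sub-bullet.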

Page 18:

Where do motifs (fingerprints) fit in?
(fingerprints are hierarchical)

[Figure labels: TM domain, loop region, TM domain]

Page 19:

Rhodopsin-like superfamily, family & subtype GPCRs in PRINTS

Attwood, TK (2001) A compendium of specific motifs for diagnosing GPCR subtypes. TiPS, 22(4), 162-165.

Page 20:

Searching PRINTS - FingerPRINTScan
Scordis, P, Flower, DR & Attwood, TK (1999) FingerPRINTScan: intelligent searching of the PRINTS motif database. Bioinformatics, 15, 523-524.

• GPCR fingerprints are embedded in PRINTS
  – allows diagnosis of GPCR mosaics
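The idea of scanning a sequence with several motifs at once, and reporting partial matches, can be sketched as follows. This is a toy illustration only: it is not the FingerPRINTScan algorithm, the motifs and sequences are invented, the function names are hypothetical, and the 80% score threshold is an arbitrary assumption. It also ignores motif order and spacing, which a real fingerprint search would take into account.

```python
from collections import Counter

def frequency_matrix(aligned_motif):
    """Per-column residue counts from an ungapped motif alignment."""
    return [Counter(column) for column in zip(*aligned_motif)]

def best_window(matrix, seq):
    """Slide the matrix along seq; return (score, start) of the first best window."""
    width = len(matrix)
    best_score, best_start = 0, None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[res] for col, res in zip(matrix, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

def scan_fingerprint(motifs, seq, min_fraction=0.8):
    """Return (motif number, start) for each motif scoring >= min_fraction of its maximum."""
    matched = []
    for number, motif in enumerate(motifs, start=1):
        matrix = frequency_matrix(motif)
        max_score = sum(max(col.values()) for col in matrix)
        score, start = best_window(matrix, seq)
        if score >= min_fraction * max_score:
            matched.append((number, start))
    return matched

# Hypothetical two-motif fingerprint and two hypothetical query sequences.
fingerprint = [["GNLLV", "GNALV", "GNVLV"], ["DRYLA", "DRYVA", "ERYLA"]]
print(scan_fingerprint(fingerprint, "MKTGNLLVAAAPPPDRYLAKK"))  # both motifs found
print(scan_fingerprint(fingerprint, "MKTGNLLVAAAPPPKKKKK"))    # partial match: motif 1 only
```

Reporting which motifs matched, rather than a single yes/no, is what lets a fingerprint approach flag partial or mosaic matches that a single exact pattern would simply miss.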

Page 21:

Page 22:

Visualising fingerprints
Attwood, TK & Findlay, JBC (1993) Design of a discriminating fingerprint for G-protein-coupled receptors. Protein Eng., 6(2), 167–176.

[Figure labels: N, C]

Page 23:

Visualising fingerprints
Attwood, TK & Findlay, JBC (1993) Design of a discriminating fingerprint for G-protein-coupled receptors. Protein Eng., 6(2), 167–176.

[Figure labels: N, C]

Page 24:

Diagnosing partial matches

• Missed by PROSITE
  – wasn’t annotated as a FN

Page 25:

An integrated approach
Mulder, NJ, Apweiler, R, Attwood, TK, Bairoch, A et al. (2007) New developments in InterPro. NAR, 35, D224-8.

• To simplify sequence analysis, the family dbs were integrated within a unified annotation resource – InterPro
  – initial partners were PRINTS, PROSITE, profiles & Pfam
    • now many more partners
  – linked to its satellite dbs
    • but lags behind their coverage
  – by Oct 2007, it had 14,768 entries & covered 76% of UniProtKB
    • major role in fly & human genome annotation

Page 26:

InterPro – method comparison

Page 27:

Where has this got us?

Page 28:

Understanding the tools
…estimating significance

• How do we know what to believe?
• Let’s explore some of the difficulties that arise when pair-wise search tools (BLAST & FastA) & family-based methods are used naïvely
  – these examples caution us to think about what the results actually mean in biological terms…

Page 29:

Identifying sequence similarity

• GPCRs present many challenges for in silico functional analysis
• Several signature-based methods now available
  – with different areas of optimum application
• Yet naïve, pair-wise similarity searching has been the mainstay of functional annotation efforts
  – it allows us to identify/quantify relationships between sequences
• But quantifying similarity between sequences is not the same as identifying their functions

Page 30:

Problems with pairwise similarity tools
Gaulton, A & Attwood, TK (2003) Bioinformatics approaches for the classification of G protein-coupled receptors. Current Opinion in Pharmacology, 3, 114-120.

• For identifying precise families to which receptors belong & the ligands they bind, pair-wise tools are limited
  – at what level of seq ID is ligand specificity conserved?
    • some GPCRs with 25% ID share a common ligand;
    • others, with greater levels, don’t…
• It may be impossible to tell from BLAST if an orphan belongs to a known family (the top hit), or if it will bind a novel ligand
  – e.g., for the now de-orphaned UR2R, BLAST indicates most similarity to the type 4 SSRs, yet it is known to bind a different (related) ligand

Page 31:

When is a GPCR not an SSR?

Query length: 389 AA
Date run: 2002-10-18 09:08:29 UTC+0100 on sib-blast.unil.ch
Taxon: Homo sapiens
Database: XXswissprot
  120,412 sequences; 45,523,583 total letters
  SWISS-PROT Release 40.29 of 10-Oct-2002

Db AC     Description                                                    Score E-value
sp Q9UKP6 Q9UKP6 Orphan receptor [Homo sapiens...                          782 0.0
sp P31391 SSR4_HUMAN Somatostatin receptor type 4 (SS4R) [SSTR4]...        167 3e-41
sp O43603 GALS_HUMAN Galanin receptor type 2 (GAL2-R) (GALR2) [G...        147 4e-35
sp P30872 SSR1_HUMAN Somatostatin receptor type 1 (SS1R) (SRIF-2...        144 3e-34
sp P32745 SSR3_HUMAN Somatostatin receptor type 3 (SS3R) (SSR-28...        140 3e-33
sp P35346 SSR5_HUMAN Somatostatin receptor type 5 (SS5R) (SSTR5)...        140 6e-33
sp P30874 SPLICE ISOFORM B of P30874 [SSTR2] [Homo sapiens...              134 3e-31
sp P30874 SSR2_HUMAN Somatostatin receptor type 2 (SS2R) (SRIF-1...        134 3e-31
sp P48145 GPR7_HUMAN Neuropeptides B/W receptor type 1 (G protei...        133 7e-31
sp O60755 GALT_HUMAN Galanin receptor type 3 (GAL3-R) (GALR3) [G...        132 2e-30
sp P41143 OPRD_HUMAN Delta-type opioid receptor (DOR-1) [OPRD1] ...        128 2e-29
sp P35372 SPLICE ISOFORM 1A of P35372 [OPRM1] [Homo sapien...              125 1e-28
sp P35372 OPRM_HUMAN Mu-type opioid receptor (MOR-1) [OPRM1] [Ho...        125 1e-28

Page 32:

When is a GPCR not an SSR?
…when it’s a UR2R

Query length: 389 AA
Date run: 2002-10-18 09:08:29 UTC+0100 on sib-blast.unil.ch
Taxon: Homo sapiens
Database: XXswissprot
  120,412 sequences; 45,523,583 total letters
  SWISS-PROT Release 40.29 of 10-Oct-2002

Db AC     Description                                                    Score E-value
sp Q9UKP6 UR2R_HUMAN Urotensin II receptor (UR-II-R) [GPR14] [Ho...        782 0.0
sp P31391 SSR4_HUMAN Somatostatin receptor type 4 (SS4R) [SSTR4]...        167 3e-41
sp O43603 GALS_HUMAN Galanin receptor type 2 (GAL2-R) (GALR2) [G...        147 4e-35
sp P30872 SSR1_HUMAN Somatostatin receptor type 1 (SS1R) (SRIF-2...        144 3e-34
sp P32745 SSR3_HUMAN Somatostatin receptor type 3 (SS3R) (SSR-28...        140 3e-33
sp P35346 SSR5_HUMAN Somatostatin receptor type 5 (SS5R) (SSTR5)...        140 6e-33
sp P30874 SPLICE ISOFORM B of P30874 [SSTR2] [Homo sapiens...              134 3e-31
sp P30874 SSR2_HUMAN Somatostatin receptor type 2 (SS2R) (SRIF-1...        134 3e-31
sp P48145 GPR7_HUMAN Neuropeptides B/W receptor type 1 (G protei...        133 7e-31
sp O60755 GALT_HUMAN Galanin receptor type 3 (GAL3-R) (GALR3) [G...        132 2e-30
sp P41143 OPRD_HUMAN Delta-type opioid receptor (DOR-1) [OPRD1] ...        128 2e-29
sp P35372 SPLICE ISOFORM 1A of P35372 [OPRM1] [Homo sapien...              125 1e-28
sp P35372 OPRM_HUMAN Mu-type opioid receptor (MOR-1) [OPRM1] [Ho...        125 1e-28
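The risk of transferring annotation from the best-scoring neighbour can be shown with the hit list itself. In the sketch below the scores and E-values are copied from the BLAST report (splice-isoform rows omitted, and SSR5_HUMAN labelled as type 5); the function names and the "first word of the description" family heuristic are hypothetical simplifications, not part of any annotation pipeline.

```python
# (entry, description, score, E-value) from the BLAST report; splice isoforms omitted.
hits = [
    ("UR2R_HUMAN", "Urotensin II receptor", 782, 0.0),
    ("SSR4_HUMAN", "Somatostatin receptor type 4", 167, 3e-41),
    ("GALS_HUMAN", "Galanin receptor type 2", 147, 4e-35),
    ("SSR1_HUMAN", "Somatostatin receptor type 1", 144, 3e-34),
    ("SSR3_HUMAN", "Somatostatin receptor type 3", 140, 3e-33),
    ("SSR5_HUMAN", "Somatostatin receptor type 5", 140, 6e-33),
    ("SSR2_HUMAN", "Somatostatin receptor type 2", 134, 3e-31),
    ("GPR7_HUMAN", "Neuropeptides B/W receptor type 1", 133, 7e-31),
    ("GALT_HUMAN", "Galanin receptor type 3", 132, 2e-30),
    ("OPRD_HUMAN", "Delta-type opioid receptor", 128, 2e-29),
    ("OPRM_HUMAN", "Mu-type opioid receptor", 125, 1e-28),
]

def naive_annotation(hits, query_entry):
    """Copy the description of the best-scoring hit other than the query itself."""
    others = [hit for hit in hits if hit[0] != query_entry]
    return max(others, key=lambda hit: hit[2])[1]

def families_in_top(hits, query_entry, n=5):
    """Crude family label (first word of the description) for the top n non-self hits."""
    others = sorted((h for h in hits if h[0] != query_entry), key=lambda h: -h[2])
    return [hit[1].split()[0] for hit in others[:n]]

print(naive_annotation(hits, "UR2R_HUMAN"))
print(families_in_top(hits, "UR2R_HUMAN"))
```

A rule that trusts the top non-self hit would label the urotensin II receptor a somatostatin receptor, and the top five hits already mix two ligand families at very similar E-values.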

Page 33:

[Figure: two plots of %ID against residue number (1–380): UR2R_HUMAN vs SOMATOSTANR and UR2R_HUMAN vs UROTENSIN2R]

Page 34:

The trouble with top hits

• The most statistically significant hit is not always the most biologically relevant
• Yet many rule-based ‘expert systems’ still rely on top BLAST or FastA hits to make their diagnoses
• BLAST/FastA ‘see’ generic similarity & not the often-subtle differences that constitute the functional determinants between closely-related receptor families & subtypes
• Failure to appreciate this fundamental point has generated numerous annotation errors in our databases

Page 35:

Misleading annotation via FastA

[Figure labels: -opioid receptor; -opioid receptor; -opioid receptor; true]

Page 36:

Misleading results from BLAST

• As we’ve seen, it’s tempting to use top hits from BLAST or FastA results to classify unknown proteins
  – but this may lead us (& especially computer programs) to false functional conclusions
• PSI-BLAST is more sensitive than BLAST, because it creates a profile from hits above a given threshold
  – but this too can cause problems
  – let’s take a closer look
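Why a profile built from "hits above a threshold" can go wrong is easiest to see with a toy alignment. The sketch below is not PSI-BLAST (which uses weighted, pseudocount-corrected position-specific scoring matrices); it only measures column conservation before and after unrelated sequences slip into the alignment. All sequences and names are invented for illustration.

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the commonest residue in each column."""
    n = len(alignment)
    return [max(Counter(column).values()) / n for column in zip(*alignment)]

family = ["DRYLA", "DRYVA", "DRYLA"]   # hypothetical true family members
drift = ["NPLLS", "NPVLS"]             # hypothetical unrelated hits that passed the threshold

before = column_conservation(family)
after = column_conservation(family + drift)
print(before)
print(after)
```

Once the unrelated sequences are included, the formerly invariant columns fall to 60% conservation, so the next iteration scores the intruders' relatives more favourably; this is the dilution effect.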

Page 37:

Page 38:

So, is UL78 a GPCR?
& if so, what sort?

Page 39:

What PSI-BLAST said
(profile dilution in action)

Page 40:

What GeneQuiz said…
a thrombin receptor

Page 41:

What GeneQuiz said later…

Page 42:

Overview of results
pair-wise & family-based methods

Page 43:

What is UL78?

Tool | No hit | Poor hit | Significant hit
BLAST: GPCRs in list
PSI-BLAST: thrombin receptor; chemokine & opioid receptors
PROSITE profile: GPCR
Pfam
PRINTS
Blocks-PRINTS: GPCR
GeneQuiz: thrombin receptor; C5A receptor

Bioinformatics tools, alone, cannot tell us!

Page 44:

So, beware top hits
…but also beware bottom hits!

Let us now compare & contrast some InterPro results with those of its source dbs…

Page 45:

Rhodopsin-like superfamily GPCRs in InterPro 2005

IPR000276 GPCR_Rhodpsn 7752 proteins
PS50262 G_PROTEIN_RECEP_F1_2 7702 proteins
PF00001 7tm_1 7064 proteins
PS00237 G_PROTEIN_RECEP_F1_1 6527 proteins
PR00237 GPCRRHODOPSN 5821 proteins (don’t include partials)

Page 46:

Rhodopsin-like superfamily GPCRs in the source databases

Pfam FP ? FN ? U ? TP? 8776 matches 7064

PROSITE (profile) FP 3 FN 3 U 12 TP 1837 matches 7702

PROSITE (regex) FP 92 FN 261 U 0 TP 1530 matches 6527

PRINTS FP 0 FN ? U 0 TP 1154 matches 5821

>2165 updated
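Where the source databases report their own true/false positive and negative counts, precision and sensitivity follow directly. The sketch below uses only the numbers listed above; Pfam is left out because all of its values are given as "?", and PRINTS has no reported false-negative count, so its sensitivity is left undefined. The function names are illustrative.

```python
# (TP, FP, FN) as listed above; None marks a value reported as "?".
stats = {
    "PROSITE (profile)": (1837, 3, 3),
    "PROSITE (regex)": (1530, 92, 261),
    "PRINTS": (1154, 0, None),
}

def precision(tp, fp):
    """Fraction of reported matches that are true family members."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Fraction of known family members that are matched; None if FN is unknown."""
    return None if fn is None else tp / (tp + fn)

for method, (tp, fp, fn) in stats.items():
    sens = sensitivity(tp, fn)
    sens_text = "unknown" if sens is None else f"{sens:.3f}"
    print(f"{method}: precision {precision(tp, fp):.3f}, sensitivity {sens_text}")
```

These figures describe each database's curated set only; they say nothing about the thousands of additional InterPro matches that nobody has validated.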

Page 47:

Rhodopsin-like superfamily GPCRs in InterPro 2007

IPR000276 GPCR_Rhodpsn 16,845 proteins
PS50262 G_PROTEIN_RECEP_F1_2 16,714 proteins
PF00001 7tm_1 15,712 proteins
PR00237 GPCRRHODOPSN 13,405 proteins
PS00237 G_PROTEIN_RECEP_F1_1 13,723 proteins

No human curator has time to validate all these matches…
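A quick way to compare the methods is to express each signature's match count as a fraction of the parent InterPro entry. The counts below are those quoted for InterPro 2005 and 2007 in this deck; the helper name `coverage` is illustrative.

```python
# Protein counts for IPR000276 and its member signatures, as quoted in the slides.
counts_2005 = {"IPR000276": 7752, "PS50262": 7702, "PF00001": 7064,
               "PS00237": 6527, "PR00237": 5821}
counts_2007 = {"IPR000276": 16845, "PS50262": 16714, "PF00001": 15712,
               "PR00237": 13405, "PS00237": 13723}

def coverage(counts, parent="IPR000276"):
    """Fraction of the parent entry's proteins matched by each member signature."""
    return {sig: n / counts[parent] for sig, n in counts.items() if sig != parent}

for year, counts in (("2005", counts_2005), ("2007", counts_2007)):
    for signature, fraction in coverage(counts).items():
        print(year, signature, f"{fraction:.1%}")
```

The spread between signatures reflects how permissive each method is, not how many of the matches are correct, which is why the counts alone cannot settle which method to trust.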

Page 48:

14,615 rhodopsin-like superfamily GPCRs in Pfam?

Page 49:

GPCR?

ID   Q6NV75 PRELIMINARY; PRT; 609 AA.
AC   Q6NV75;
DT   05-JUL-2004 (TrEMBLrel. 27, Created)
DT   05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT   05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE   G protein-coupled receptor 153.
GN   Name=GPR153;
OS   Homo sapiens (Human).
OX   NCBI_TaxID=9606
RN   [1]
RP   SEQUENCE FROM N.A.
RC   TISSUE=Brain;
RA   Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA   Jones S.J., Marra M.A.;
RT   "Generation and initial analysis of more than 15,000 full-length
RT   human and mouse cDNA sequences.";
RL   Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).
RP   SEQUENCE FROM N.A.
RC   TISSUE=Brain;
RA   Strausberg R.;
RL   Submitted (MAR-2004) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; BC068275; AAH68275.1; -.
DR   GO; GO:0004872
DR   InterPro; IPR000276; GPCR_Rhodpsn.
DR   Pfam; PF00001; 7tm_1; 1.
DR   PROSITE; PS50262; G_PROTEIN_RECEP_F1_2; 1.
KW   Receptor
SQ   SEQUENCE 609 AA; 65341 MW; E525CC7F60D0891C CRC64;
     MSDERRLPGS AVGWLVCGGL SLLANAWGIL SVGAKQKKWK PLEFLLCTLA ATHMLNVAVP
     IATYSVVQLR RQRPDFEWNE GLCKVFVSTF YTLTLATCFS VTSLSYHRMW MVCWPVNYRL
     SNAKKQAVHT VMGIWMVSFI LSALPAVGWH DTSERFYTHG CRFIVAEIGL GFGVCFLLLV
     GGSVAMGVIC TAIALFQTLA VQVGRQADHR AFTVPTIVVE DAQGKRRSSI DGSEPAKTSL
     QTTGLVTTIV FIYDCLMGFP VLVVSFSSLR ADASAPWMAL CVLWCSVAQA LLLPVFLWAC
     DRYRADLKAV REKCMALMAN DEESDDETSL EGGISPDLVL ERSLDYGYGG DFVALDRMAK
     YEISALEGGL PQLYPLRPLQ EDKMQYLQVP PTRRFSHDDA DVWAAVPLPA FLPRWGSGED
     LAALAHLVLP AGPERRRASL LAFAEDAPPS RARRRSAESL LSLRPSALDS GPRGARDSPP
     GSPRRRPGPG PRSASASLLP DAFALTAFEC EPQALRRPPG PFPAAPAAPD GADPGEAPTP
     PSSAQRSPGP RPSAHSHAGS LRPGLSASWG EPGGLRAAGG GGSTSSFLSS PSESSGYATL
     HSDSLGSAS
//

Pfam match Q6NV75/24-297
PROSITE (profile) no match
PROSITE (regex) no match
PRINTS no match
ClustalW – sequences too divergent to be aligned

false negative

Page 50:

Beware top & bottom hits
…but also beware simplistic analysis tools coupled with wet experiments!

Let’s finally look at how hydropathy profiles can compel biologists to make strange deductions…

- & still get their results published in Science!
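For readers unfamiliar with hydropathy profiles, the calculation is just a sliding-window average of per-residue hydrophobicity. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, which are the settings conventionally quoted for spotting candidate membrane-spanning segments; the function names and the toy sequence are hypothetical, and nothing here reproduces the analysis the slide alludes to.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of the given length along seq."""
    scores = [KYTE_DOOLITTLE[res] for res in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean hydropathy reaches the threshold."""
    return [i for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean >= threshold]

# Toy sequence: a 19-residue hydrophobic stretch flanked by charged residues.
toy = "K" * 10 + "L" * 19 + "K" * 10
print(hydrophobic_windows(toy))
```

All this detects is a run of hydrophobic residues. Counting seven such runs says nothing about homology to GPCRs, which is exactly how a protein with no family signature can end up labelled a receptor.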

Page 51:

GPCR?

Pfam Lanthionine synthetase C-like protein
PROSITE (profile) no match
PROSITE (regex) no match
PRINTS no match
ClustalW – sequences too divergent to be aligned

ID   Q9C929_ARATH Unreviewed; 401 AA.
AC   Q9C929;
DT   01-JUN-2001, integrated into UniProtKB/TrEMBL.
DT   01-JUN-2001, sequence version 1.
DT   24-JUL-2007, entry version 23.
DE   Putative G protein-coupled receptor; 80093-78432.
GN   Name=F14G24.19; OrderedLocusNames=At1g52920;
OS   Arabidopsis thaliana (Mouse-ear cress).
OC   Eukaryota; Viridiplantae; Streptophyta; ... Arabidopsis.
OX   NCBI_TaxID=3702;
RN   [1]
RP   NUCLEOTIDE SEQUENCE.
RA   Lin X., Kaul S., Town C.D., Benito M., Creasy T.H., Haas B.J., Wu D.,
RA   Maiti R., Ronning C.M., Koo H., Fujii C.Y., Utterback T.R.,
RA   Barnstead M.E., Bowman C.L., White O., Nierman W.C., Fraser C.M.;
RT   "Arabidopsis thaliana chromosome 1 BAC F14G24 genomic sequence.";
RL   Submitted (DEC-1999) to the EMBL/GenBank/DDBJ databases.
RN   [2]
RP   NUCLEOTIDE SEQUENCE.
RA   Town C.D., Kaul S.;
RL   Submitted (JAN-2001) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; AC019018; AAG52264.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ]
DR   PIR; E96570; E96570.
DR   UniGene; At.66935; -.
DR   GenomeReviews; CT485782_GR; AT1G52920.
DR   KEGG; ath:At1g52920; -.
DR   TAIR; At1g52920; -.
DR   GO; GO:0004872; F:receptor activity; IEA:UniProtKB-KW.
DR   InterPro; IPR007822; LANC_like.
DR   InterPro; Graphical view of domain structure.
DR   Pfam; PF05147; LANC_like; 1.
KW   Receptor.
SQ   SEQUENCE 401 AA; 45284 MW; C9D3BF8CC8F0FE0B CRC64;
     MPEFVPEDLS GEEETVTECK DSLTKLLSLP YKSFSEKLHR YALSIKDKVV WETWERSGKR
     VRDYNLYTGV LGTAYLLFKS YQVTRNEDDL KLCLENVEAC DVASRDSERV TFICGYAGVC
     ALGAVAAKCL GDDQLYDRYL ARFRGIRLPS DLPYELLYGR AGYLWACLFL NKHIGQESIS
     SERMRSVVEE IFRAGRQLGN KGTCPLMYEW HGKRYWGAAH GLAGIMNVLM HTELEPDEIK
     DVKGTLSYMI QNRFPSGNYL SSEGSKSDRL VHWCHGAPGV ALTLVKAAQV YNTKEFVEAA
     MEAGEVVWSR GLLKRVGICH GISGNTYVFL SLYRLTRNPK YLYRAKAFAS FLLDKSEKLI
     SEGQMHGGDR PFSLFEGIGG MAYMLLDMND PTQALFPGYE L
//

Page 52:

Remember
Computers don’t do biology!

[Figure: GPCR signalling network, as on Page 10]

They do sums (quickly) & crude string matching

Page 53:

Seeking deeper functional insights
Attwood, TK, Croning, MD & Gaulton, A (2002) Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng., 15, 7-12.

• S’family, family & subtype motifs have different locations
• If s’family motifs define the common scaffold, hypothesis:
  – family motifs relate to ligand binding?
  – subtype motifs relate to G protein coupling?
  – powerful tools for subtyping & potentially de-orphaning GPCRs

Page 54:

Locations of ligand-binding residues & motif distribution

Page 55:

Locations of G protein-coupling residues & distribution of motifs

Subtype motifs & # of fingerprints mapping to each region

G protein coupling regions & # of families mapping to each region

Page 56:

Seeking deeper functional insights?
Attwood, TK, Croning, MD & Gaulton, A (2002) Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng., 15, 7-12.

• Clearly, many family- & subtype motifs are simply in the ‘wrong’ place for the initial hypothesis to be true

[Figure labels: GPCR superfamily; Muscarinic receptors; Muscarinic receptor M5]

Page 57:

Refining the hypothesis

• Besides, it’s not that simple
  – only part of the answer
• Need to consider that GPCRs don’t function in isolation
  – their functions are modulated via interactions with other proteins
• Also, the phenomenon of dimerisation challenges the view of the GPCR monomer as functional unit
  – many GPCRs exist as homo- & heterodimers
• Such observations demand a more systematic analysis of motifs & their likely functional roles

Page 58:

Oligomerisation & protein-protein interaction residues/regions
A pilot study with adrenergic, bradykinin & dopamine receptors

[Figure legend]
• family-level motifs
• subfamily-level motifs
• residues involved in oligomerisation
• residues involved in protein-protein interaction
• residues involved in G protein coupling
• residues involved in ligand binding

Page 59:

Where next?

• Based on location, some family-level motifs couldn’t be involved in ligand binding & some subtype-level motifs couldn’t be involved in G protein coupling
  – clearly, 3D location must be taken into account
    • functional correlations would then be stronger
• The remaining motifs are likely to be involved in other molecular interactions
  – e.g., dimerisation, effector proteins… (early results promising)
    • this will help us to build a knowledge-based system to help suggest the likely functional roles for family- & subtype-level motifs in future

Page 60:

Conclusions

• There are many barriers to success for the jobbing bioinformatician, e.g.:
  – not fully understanding the processes we’re trying to model & predict (e.g., protein folding)
  – the dynamic nature of biological data
  – not having been rigorous in the way we define &/or describe biology/biological processes in the literature
  – the volume of data, data heterogeneity
  – maintenance of data, propagation of errors…
• Possibly the largest hurdle is that computers are number crunchers
  – they don’t do biology, & trying to teach them is hard
  – & the harder we try, the clearer it is how naïve we’ve been

Page 61:

Conclusions

• In silico functional annotation requires several dbs to be searched & several tools to be used
  – different methods provide different perspectives
  – dbs aren’t complete & their contents don’t fully overlap
• The more dbs searched, the harder it is to interpret results
• The more computers are involved in automating annotation, the greater the need for collaboration
  – especially between s/w developers, annotators & ‘wet’ experimentalists
• The more data we have, the more rigorous we must be in thinking/writing if we are to make sense of the complexities

Page 62:

Conclusions
Flower, DR & Attwood, TK (2004) Integrative bioinformatics for functional genome annotation: trawling for G protein-coupled receptors. Semin. Cell Dev. Biol., 15(6), 693-701.

• For GPCRs, there are many analysis tools available
  – BLAST, FastA, family databases, modelling tools, etc.
• We must understand the limitations of the methods
  – no method is infallible or able to replace the need for biological validation
  – use all available resources & understand their problems – none is best!
• Used wisely, bioinformatics tools are useful
  – BLAST/FastA offer broad brush strokes, motif-methods add fine detail
  – together, they facilitate receptor characterisation & prediction of ligand specificity, & allow identification of novel ligand-binding, G protein-coupling or other likely molecular interaction motifs
• We are a long way from having reliable tools for deducing GPCR function & structure from sequence
  – but with the right approach, there is hope
