
The craft of annotation

Carole Goble

Based on observations of the PRINTS protein fingerprint database

Primary & Secondary databases

Primary: source generated by experimentalists.
Role: standards, quality thresholds, dissemination.
• Sequence databases: EMBL, GenBank
• Increasingly other data types: micro-array

Secondary: source derived from repositories, other secondary databases, analysis and expertise.
Role: distilled and accumulated specialist knowledge; value-added commentary.
• Swiss-Prot, PRINTS, CATH, PAX6, Enzyme, dbSNP…

Role: warehouses to support analysis over replicated data.
• GIMS, aMAZE, InterPro…

The “Annotation Pipeline”

[Figure: diagram of the annotation pipeline. Boxes: EMBL, TrEMBL, Swiss-Prot, PRINTS, GPCRDB, InterPro, BLOCKS, linked by “Analysis” steps.]

Annotation Distillation

Expressed Sequence Tags: millions
nrdb: 503,479
TrEMBL: 234,059
Swiss-Prot: 85,661
InterPro: 2,990
PRINTS: 1,310

PRINTS

PRINTS: a database of protein family “fingerprints”
Fingerprints: groups of motifs excised from alignments
– used to provide diagnostic signatures for protein families

PRINTS forms the basis of derived resources
– e.g., BLOCKS, eMOTIF, InterPro

Used in gene family analysis, genome annotation, etc.

Swiss-Prot annotation

ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=86300093 [NCBI, ExPASy, Israel, Japan]; PubMed=3755672;
RA   Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H., Prusiner S.B., Dearmond S.J.
RT   "Molecular cloning of a human prion protein cDNA.";
RL   DNA 5:315-324(1986).
RN   [6]
RP   STRUCTURE BY NMR OF 23-231.
RX   MEDLINE=97424376 [NCBI, ExPASy, Israel, Japan]; PubMed=9280298;
RA   Riek R., Hornemann S., Wider G., Glockshuber R., Wuethrich K.;
RT   "NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).";
RL   FEBS Lett. 413:282-288(1997).
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST GENOME AND IS
CC       EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS INFECTED WITH
CC       NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC       DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME (GSS),
CC       FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE
CC       SPONGIFORM ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC       CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM ENCEPHALOPATHY
CC       (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY (EUE) IN NYALA AND GREATER KUDU. THE
CC       PRION DISEASES ILLUSTRATE THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC       OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
DR   HSSP; P04925; 1AG2. [HSSP ENTRY / SWISS-3DIMAGE / PDB]
DR   MIM; 176640; -. [NCBI / EBI]
DR   InterPro; IPR000817; -.
DR   Pfam; PF00377; prion; 1.
DR   PRINTS; PR00341; PRION.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
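A Swiss-Prot flat-file entry like the one on this slide is line-oriented: every line opens with a two-letter code (ID, AC, DE, CC, DR, KW and so on). The following is a minimal Python sketch of grouping lines by that code and pulling out the DR cross-references. The function names `group_by_line_code` and `cross_references` are illustrative assumptions, not part of any Swiss-Prot or PRINTS tool, and the sketch does not stitch multi-line CC comments back together.

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        code, content = line[:2], line[2:].strip()
        if len(code) == 2 and code.isalpha():
            fields[code].append(content)
    return dict(fields)


def cross_references(fields):
    """Map each DR database name to the first identifier listed for it."""
    refs = {}
    for record in fields.get("DR", []):
        parts = [part.strip() for part in record.split(";")]
        if len(parts) >= 2:
            refs[parts[0]] = parts[1]
    return refs


# A few lines taken from the PRIO_HUMAN entry shown on the slide.
sample = """ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
DR   InterPro; IPR000817; -.
DR   Pfam; PF00377; prion; 1.
DR   PRINTS; PR00341; PRION.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal."""

fields = group_by_line_code(sample)
print(cross_references(fields))
```

Keeping every field as a list of raw strings is a deliberate simplification: it preserves the original wording, which matters when the text is later copied into another database.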

Nude PRINTS entry

gc;
gx;
gn;
ga;
gt;
gp;                                     [callout: manual annotation]
bb;
gr;
bb;
gd;
bb;
si; SUMMARY INFORMATION
si; -------------------
sd; 37 codes involving 8 elements
sd; 0 codes involving 7 elements
sd; 0 codes involving 6 elements
sd; 0 codes involving 5 elements
sd; 0 codes involving 4 elements
sd; 1 codes involving 3 elements
sd; 0 codes involving 2 elements
bb;
ci; COMPOSITE FINGERPRINT INDEX
ci; ---------------------------
cr;
cd; 8| 37 37 37 37 37 37 37 37
cd; 7|  0  0  0  0  0  0  0  0
cd; 6|  0  0  0  0  0  0  0  0
cd; 5|  0  0  0  0  0  0  0  0
cd; 4|  0  0  0  0  0  0  0  0
cd; 3|  1  0  0  0  1  1  0  0
cd; 2|  0  0  0  0  0  0  0  0
cd; --+-----------------------------------------
cd;   |  1  2  3  4  5  6  7  8
bb;
tp; PRIO_COLGU PRIO_MACFA PRIO_CEREL PRIO_ODOHE
KA; P40251 M1  P40254 M1  P79142 M1  P47852 M1
tp; PRIO_GORGO PRIO_PANTR PRIO_HUMAN O46648     [callout: SWISS-PROT IDs]
KA; P40252 M1  P40253 M1  P04156 M1  O46648 M1
tp; PRIO_SHEEP PRIO_CALJA PRIO_BOVIN PRP2_BOVIN
KA; P23907 M1  P40247 M1  P10279 M1  Q01880 M1
bb;
tt; PRIO_COLGU MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - COLOBUS GUEREZA.
tt; PRIO_MACFA MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - MACACA FASCICULARIS (CRAB EATING MACAQUE)
tt; PRIO_CEREL MAJOR PRION PROTEIN PRECURSOR (PRP) - CERVUS ELAPHUS (RED DEER).
tt; PRIO_ODOHE MAJOR PRION PROTEIN PRECURSOR (PRP) - ODOCOILEUS HEMIONUS (MULE DEER) (BLACK-TAILED DEER).
tt; PRIO_GORGO MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - GORILLA GORILLA GORILLA (LOWLAND GORILLA)
tt; PRIO_PANTR MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - PAN TROGLODYTES (CHIMPANZEE)
tt; PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR) - HOMO SAPIENS (HUMAN).

Low Level Annotation

Prion protein signature

PROSITE; PS00291 PRION_1; PS00706 PRION_2
BLOCKS; BL00291
PFAM; PF00377 prion
INTERPRO; IPR000817

1. STAHL, N. AND PRUSINER, S.B.
Prions and prion proteins.
FASEB J. 5 2799-2807 (1991).

Annotation: “High-level”

Semi-structured, text-based annotation, representing the accumulated knowledge of the biological community about the data entry.
Intellectually formed: the accumulated knowledge of an expert distilling the aggregated information drawn from multiple data sources and analyses, together with the annotator's own knowledge.
Culled from other sources, such as the annotations of other database entries and the literature.
Intended to be human readable rather than machine processable.

PRINTS Annotation (manual)

gc; PRION
gx; PR00341
gt; Prion protein signature
gp; INTERPRO; IPR000817
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; BLOCKS; BL00291
gp; PFAM; PF00377 prion
bb;
gr; 1. STAHL, N. AND PRUSINER, S.B.
gr; Prions and prion proteins.
gr; FASEB J. 5 2799-2807 (1991).
gr;
gr; 2. BRUNORI, M., CHIARA SILVESTRINI, M. AND POCCHIARI, M.
gr; The scrapie agent and the prion hypothesis.
gr; TRENDS BIOCHEM.SCI. 13 309-313 (1988).
gr;
gr; 3. PRUSINER, S.B.
gr; Scrapie prions.
gr; ANNU.REV.MICROBIOL. 43 345-374 (1989).
bb;
gd; Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected with
gd; certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE),
gd; and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler syndrome (GSS). PrP is
gd; encoded in the host genome and is expressed both in normal and infected cells. During infection, however, the
gd; PrP molecules become altered and polymerise, yielding fibrils of modified PrP protein.
gd;
gd; PrP molecules have been found on the outer surface of plasma membranes of nerve cells, to which they are
gd; anchored through a covalent-linked glycolipid, suggesting a role as a membrane receptor. PrP is also
gd; expressed in other tissues, indicating that it may have different functions depending on its location.
gd;
gd; The primary sequences of PrP's from different sources are highly similar: all bear an N-terminal domain
gd; containing multiple tandem repeats of a Pro/Gly rich octapeptide; sites of Asn-linked glycosylation; an
gd; essential disulphide bond; and 3 hydrophobic segments. These sequences show some similarity to a chicken
gd; glycoprotein, thought to be an acetylcholine receptor-inducing activity (ARIA) molecule. It has been
gd; suggested that changes in the octapeptide repeat region may indicate a predisposition to disease, but it is
gd; not known for certain whether the repeat can meaningfully be used as a fingerprint to indicate susceptibility.
gd;
gd; PRION is an 8-element fingerprint that provides a signature for the prion proteins. The fingerprint was
gd; derived from an initial alignment of 5 sequences: the motifs were drawn from conserved regions spanning
gd; virtually the full alignment length, including the 3 hydrophobic domains and the octapeptide repeats
gd; (WGQPHGGG). Two iterations on OWL18.0 were required to reach convergence, at which point a true set comprising
gd; 9 sequences was identified. Several partial matches were also found: these include a fragment (PRIO_RAT)
gd; lacking part of the sequence bearing the first motif, and the PrP homologue found in chicken - this matches
gd; well with only 2 of the 3 hydrophobic motifs (1 and 5) and one of the other conserved regions (6), but has an
gd; N-terminal signature based on a sextapeptide repeat (YPHNPG) rather than the characteristic PrP octapeptide.

High level annotation

Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected with certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE), and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler syndrome (GSS).

PRINTS Annotation Process

[Figure: flow diagram of the annotation process. Stages: FingerPrint Process, Blank Annotation, Annotation gathering, Editorial culling, Tag Decoration, Filled Annotation. Inputs shown: SWISS-PROT, MEDLINE, OMIM, GRAP, PRINTS, heuristics, mapping rules, Knowledge.]

PRINTS Annotation Process

For all matches to a fingerprint, the full SWISS-PROT entry is retrieved:

tp; PRIO_COLGU PRIO_MACFA PRIO_CEREL PRIO_ODOHE
tp; PRIO_GORGO PRIO_PANTR PRIO_HUMAN O46648
tp; PRIO_SHEEP PRIO_CALJA PRIO_BOVIN PRP2_BOVIN

ID analysis determines whether the entry is a super-family, family or domain. This is essential, as it influences how the annotation is processed:

tp; URIC_RAT   URIC_MOUSE URIC_RABIT URIC_PAPHA
tp; URIC_PIG   URIC_DROPS URIC_DROME URIC_DROVI
tp; URIC_SOYBN URIC_EMENI URIC_ASPFL URID_CANLI

tp; MUP5_MOUSE LACB_BOVIN LACB_BUBAR LACB_CAPHI
tp; MUP_RAT    RET1_ONCMY RET2_ONCMY PURP_CHICK
tp; RETB_HUMAN ICYA_MANSE ICYB_MANSE CRA2_HOMGA

tp; UROT_HUMAN PLMN_PIG   PLMN_HUMAN PLMN_BOVIN
tp; APOA_HUMAN UROK_HUMAN APOA_MACMU UROK_PIG
tp; THRB_BOVIN HGFL_MOUSE THRB_HUMAN HGFL_HUMAN
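One way to picture the ID analysis is to look at how homogeneous the protein-name part of the matched SWISS-PROT IDs is (the part before the underscore). The Python sketch below is an assumption about how such a check could work, not the PRINTS implementation: the function names and the 75% threshold are invented for illustration.

```python
from collections import Counter


def id_prefixes(swissprot_ids):
    """Count the protein-name part (text before '_') of each SWISS-PROT ID."""
    return Counter(i.split("_", 1)[0] for i in swissprot_ids if "_" in i)


def looks_like_family(swissprot_ids, threshold=0.75):
    """Guess 'family' when one protein name dominates the matched IDs.

    The threshold is an arbitrary illustrative value.
    """
    counts = id_prefixes(swissprot_ids)
    if not counts:
        return False
    top_count = counts.most_common(1)[0][1]
    return top_count / len(swissprot_ids) >= threshold


prion = ("PRIO_COLGU PRIO_MACFA PRIO_CEREL PRIO_ODOHE PRIO_GORGO PRIO_PANTR "
         "PRIO_HUMAN O46648 PRIO_SHEEP PRIO_CALJA PRIO_BOVIN PRP2_BOVIN").split()
lipocalin = ("MUP5_MOUSE LACB_BOVIN LACB_BUBAR LACB_CAPHI MUP_RAT RET1_ONCMY "
             "RET2_ONCMY PURP_CHICK RETB_HUMAN ICYA_MANSE ICYB_MANSE CRA2_HOMGA").split()

print(looks_like_family(prion))      # one name (PRIO) dominates
print(looks_like_family(lipocalin))  # many different names
```

A mixed set of names only says the entry is not a simple family; telling super-families from domains needs further evidence such as the comment field.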

PRINTS Annotation Process

ID analysis usually reveals families unambiguously; the comment field helps to resolve super-families from domains:

CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY
CC -!- SIMILARITY: BELONGS TO THE URICASE FAMILY
CC -!- SIMILARITY: BELONGS TO THE LIPOCALIN FAMILY
CC -!- SIMILARITY: CONTAINS 38 KRINGLE REGIONS

Once the entry type is established, an appropriate precis is constructed. Shared annotation is engineered to provide a report detailing:
• the function & structure of the protein
• the disease(s) with which it is associated
• the family to which it belongs
• a set of literature references
• a list of keywords
• any other remarks

The precis is then fed into a naked pre-PRINTS file. Output is English.
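The SIMILARITY comment lines on this slide follow two visible patterns, "BELONGS TO THE ... FAMILY" and "CONTAINS n ... REGIONS". A small Python sketch of reading them with regular expressions is given below. Treating a "CONTAINS" line as evidence for a domain is an assumption made for illustration, as are the function names; real comment fields have more variants than these two patterns cover.

```python
import re
from collections import Counter

BELONGS = re.compile(
    r"SIMILARITY:\s+BELONGS TO THE (.+?) (SUPERFAMILY|FAMILY)\b", re.IGNORECASE)
CONTAINS = re.compile(
    r"SIMILARITY:\s+CONTAINS (\d+) (.+?) (?:REGIONS?|DOMAINS?|REPEATS?)\b",
    re.IGNORECASE)


def classify_similarity(comment_line):
    """Return (kind, name) for a SIMILARITY comment, or None if unrecognised."""
    match = BELONGS.search(comment_line)
    if match:
        return match.group(2).lower(), match.group(1).upper()
    match = CONTAINS.search(comment_line)
    if match:
        # Assumption: a "contains N regions" statement points to a domain.
        return "domain", match.group(2).upper()
    return None


def entry_type_votes(comment_lines):
    """Tally the kinds suggested by all recognised SIMILARITY comments."""
    results = (classify_similarity(line) for line in comment_lines)
    return Counter(kind for kind, _name in filter(None, results))


lines = [
    "CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY",
    "CC -!- SIMILARITY: CONTAINS 38 KRINGLE REGIONS",
]
for line in lines:
    print(classify_similarity(line))
```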

Swiss-Prot tag | Heuristics | PRINTS tag
Description | Copy | gt (title)
RAuthor, RTitle, RLocation | Common + filters: top four (date priority); mixed paper subject portfolio | gr (reference)
Database cross-reference fields | Common + filters: preferred links; preferred order | gp (other databases)
KeyWords | Up to a threshold of common keywords | gd (general annotation)
Function (comment field) | Majority vote | function
Subcellular location (comment field) | Majority vote | subcellular location
Disease (comment field) | Golden vote; sequence provenance | disease
Similarity tag (comment field) | Cluster on SWISS-PROT codes; majority vote for families; even distribution for superfamilies and domains | family
Subunit (comment field) | An indication of structure | subunit (structure)
RP Structure | Paper type classification: 1 crystallographic, 1 NMR | structure
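The "majority vote" heuristic in the table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PRINTS code: statements are compared after lower-casing and whitespace normalisation, and "majority" is taken to mean strictly more than half of the contributing entries.

```python
from collections import Counter


def majority_vote(statements):
    """Return the statement made by more than half of the entries, else None."""
    normalised = [
        " ".join(s.lower().split()).rstrip(".")
        for s in statements
        if s.strip()
    ]
    if not normalised:
        return None
    text, count = Counter(normalised).most_common(1)[0]
    return text if count > len(normalised) / 2 else None


locations = [
    "ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.",
    "Attached to the membrane by a GPI-anchor.",
    "Secreted.",
]
print(majority_vote(locations))
```

Returning None when no statement wins leaves room for a different rule, such as the golden vote used for disease comments.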

Swiss-Prot Redundancy

OPSD_SHEEP DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_HUMAN DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_MOUSE DR PRINTS; PR00237; GPCRRHODOPSN.

OPSD_SHEEP VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_HUMAN VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_MOUSE VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION

Impact on provenance.
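The impact on provenance can be made concrete: when identical comments from several entries are collapsed into one, the list of contributing entries has to be carried along or it is lost. The Python sketch below shows one possible way to do that; the function name and the (entry ID, text) input shape are assumptions for illustration.

```python
from collections import defaultdict


def collapse_duplicates(comments):
    """Collapse identical comment texts, remembering which entries said them.

    `comments` is an iterable of (entry_id, text) pairs. The result is a list
    of (text, [entry_ids]) pairs in first-seen order.
    """
    sources = defaultdict(list)
    for entry_id, text in comments:
        key = " ".join(text.split())  # ignore differences in spacing only
        sources[key].append(entry_id)
    return list(sources.items())


comment = "VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION"
collapsed = collapse_duplicates([
    ("OPSD_SHEEP", comment),
    ("OPSD_HUMAN", comment),
    ("OPSD_MOUSE", comment),
])
for text, entry_ids in collapsed:
    print(text, "<-", ", ".join(entry_ids))
```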

Redundancy elimination

ACM1_HUMAN Primary transducing effect is pi turnover.
ACM4_HUMAN Primary transducing effect is inhibition of adenylate cyclase.
ACM2_HUMAN Primary transducing effect is adenylate cyclase inhibition.
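The ACM4 and ACM2 sentences say the same thing in a different word order. A crude way to catch such pairs, sketched below in Python, is to compare the sets of content words. This is a stand-in technique chosen for illustration, not the method used for PRINTS; the stop-word list is an assumption, and ignoring word order can wrongly merge sentences whose meaning depends on it.

```python
import re

# Small illustrative stop-word list (an assumption, not a standard one).
STOPWORDS = frozenset({"a", "an", "the", "of", "is", "in", "to"})


def content_words(sentence):
    """Lower-case the sentence and keep its non-stop-word tokens as a set."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def same_content(a, b):
    """True when two sentences use exactly the same content words."""
    return content_words(a) == content_words(b)


acm1 = "Primary transducing effect is pi turnover."
acm4 = "Primary transducing effect is inhibition of adenylate cyclase."
acm2 = "Primary transducing effect is adenylate cyclase inhibition."

print(same_content(acm4, acm2))  # same words, different order
print(same_content(acm1, acm2))  # different effect
```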

Databases: majority vote

Major prion protein precursor (PRP)

PRINTS; PR00341 PRION
PROSITE; PS00291 PRION_1; PS00706 PRION_2
PFAM; PF00377 prion
INTERPRO; IPR000817
PDB; 1B10; 1AG2

References: date ranking ++

1. CERVENAKOVA, L., [...]
Infectious amyloid precursor gene sequences in primates used for experimental transmission of human spongiform encephalopathy.
PROC. NATL. ACAD. SCI. USA 91 12159-12162 (1994).

2. LOWENSTEIN, D. H., [...]
Three hamster species with different scrapie incubation times and neuropathological features encode distinct prion proteins.
MOL. CELL. BIOL. 10 1153-1163 (1990).

3. KALUZ, S., [...]
Sequencing analysis of prion genes from red deer and camel.
GENE 199 283-286 (1997).
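The date-ranking part of the reference heuristic could look something like the Python sketch below. It is only a guess at the idea: it assumes the year appears in parentheses at the end of the citation line, as in the references shown here, and that "date priority" means newest first. The "++" on the slide signals further criteria (such as the mix of paper subjects) that this sketch does not attempt.

```python
import re


def publication_year(citation):
    """Extract a trailing '(YYYY)' year from a citation line, or None."""
    match = re.search(r"\((\d{4})\)\.?\s*$", citation)
    return int(match.group(1)) if match else None


def top_references(citations, limit=4):
    """Return up to `limit` dated citations, newest first (an assumption)."""
    dated = [(publication_year(c), c) for c in citations]
    dated = [(year, c) for year, c in dated if year is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _year, c in dated[:limit]]


citations = [
    "PROC. NATL. ACAD. SCI. USA 91 12159-12162 (1994).",
    "MOL. CELL. BIOL. 10 1153-1163 (1990).",
    "GENE 199 283-286 (1997).",
]
for citation in top_references(citations, limit=2):
    print(citation)
```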

Disease – Golden Voting.

(PRIO_HUMAN; P04156): Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases known as transmissible spongiform encephalopathies or prion diseases [...]

(PRIO_HUMAN; P04156): Kuru is transmitted during ritualistic cannibalism, among natives of the new guinea highlands. [...]

(PRIO_SHEEP; P23907): Polymorphism at position 171 may be related to the alleles of scrapie [...]

PRINTS Annotation Process

[Figure: flow diagram of the annotation process. Stages: FingerPrint Process, Blank Annotation, Annotation gathering, Editorial culling, Tag Decoration, Filled Annotation. Inputs shown: SWISS-PROT, MEDLINE, OMIM, GRAP, PRINTS, heuristics, mapping rules, Knowledge.]

PRECIS annotation

gc; PRIO
gx;
gt; Major prion protein precursor (PRP) signature
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; INTERPRO; IPR000817
gp; PFAM; PF00377 prion
gp; PDB; 1B10; 1AG2
gp; SCOP; 1B10; 1AG2
gp; CATH; 1B10; 1AG2
gp; MIM; 176640; 123400; 137440; 245300; 600072
bb;
gr; 1. LOWENSTEIN, D.H., BUTLER, D.A., WESTAWAY, D., MCKINLEY, M.P., DEARMOND, S.J. AND PRUSINER, S.B.
gr; Three hamster species with different scrapie incubation times and neuropathological features encode distinct
gr; prion proteins.
gr; MOL.CELL.BIOL. 10 1153-1163 (1990).
gr;
gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].
gd;
gd; Belongs to the prion family.
gd;
gd; Keywords: GPI-anchor; Repeat; Signal; Prion; Brain; Glycoprotein; Polymorphism; Disease mutation; 3D-structure.
gd;
gd; PRIO is an 8-element fingerprint that provides a signature for the Major prion protein precursor (PRP). The
gd; fingerprint was derived from an initial alignment of 6 sequences: the motifs were drawn from conserved regions
gd; spanning virtually the full alignment length. Two iterations on SPTR37_9f were required to reach convergence,
gd; at which point a true set comprising 37 sequences was identified. A single partial match was also found:
gd; (PRIO_CHICK; P27177).

Human annotation

gc; PRION
gx; PR00341
gt; Prion protein signature
gp; INTERPRO; IPR000817
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; BLOCKS; BL00291
gp; PFAM; PF00377 prion
bb;
gr; 1. STAHL, N. AND PRUSINER, S.B.
gr; Prions and prion proteins.
gr; FASEB J. 5 2799-2807 (1991).
gr;
gr; 2. BRUNORI, M., CHIARA SILVESTRINI, M. AND POCCHIARI, M.
gr; The scrapie agent and the prion hypothesis.
gr; TRENDS BIOCHEM.SCI. 13 309-313 (1988).
gr;
gr; 3. PRUSINER, S.B.
gr; Scrapie prions.
gr; ANNU.REV.MICROBIOL. 43 345-374 (1989).
bb;
gd; Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected with
gd; certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE),
gd; and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler syndrome (GSS). PrP is
gd; encoded in the host genome and is expressed both in normal and infected cells. During infection, however, the
gd; PrP molecules become altered and polymerise, yielding fibrils of modified PrP protein.
gd;
gd; PrP molecules have been found on the outer surface of plasma membranes of nerve cells, to which they are
gd; anchored through a covalent-linked glycolipid, suggesting a role as a membrane receptor. PrP is also
gd; expressed in other tissues, indicating that it may have different functions depending on its location.
gd;
gd; The primary sequences of PrP's from different sources are highly similar: all bear an N-terminal domain
gd; containing multiple tandem repeats of a Pro/Gly rich octapeptide; sites of Asn-linked glycosylation; an
gd; essential disulphide bond; and 3 hydrophobic segments. These sequences show some similarity to a chicken
gd; glycoprotein, thought to be an acetylcholine receptor-inducing activity (ARIA) molecule. It has been
gd; suggested that changes in the octapeptide repeat region may indicate a predisposition to disease, but it is
gd; not known for certain whether the repeat can meaningfully be used as a fingerprint to indicate susceptibility.
gd;
gd; PRION is an 8-element fingerprint that provides a signature for the prion proteins. The fingerprint was
gd; derived from an initial alignment of 5 sequences: the motifs were drawn from conserved regions spanning
gd; virtually the full alignment length, including the 3 hydrophobic domains and the octapeptide repeats
gd; (WGQPHGGG). Two iterations on OWL18.0 were required to reach convergence, at which point a true set comprising
gd; 9 sequences was identified. Several partial matches were also found: these include a fragment (PRIO_RAT)
gd; lacking part of the sequence bearing the first motif, and the PrP homologue found in chicken - this matches
gd; well with only 2 of the 3 hydrophobic motifs (1 and 5) and one of the other conserved regions (6), but has an
gd; N-terminal signature based on a sextapeptide repeat (YPHNPG) rather than the characteristic PrP octapeptide.

Implications for provenance

Tools used by the service providers can be sophisticated.
Provenance information may be recorded in those tools, but it is not passed on into the annotation (e.g. SWISS-PROT and PRINTS).

• Why?

Implications for provenance

Mining, aggregating, distilling, summarising and generating phrases and texts from comment fields.
Distillation to create a compact and comprehensive summary.
Urge to be non-redundant.

• How to represent the provenance?
• How does the provenance get aggregated?
• How does it get propagated?
• Degrees of evidence -> Degrees of provenance
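As one possible answer to the representation and propagation questions, each derived statement could keep links to the statements it was distilled from, so that the original entries can always be recovered. The Python sketch below illustrates this idea only; the `Statement` class, its fields and the operation labels are hypothetical and are not taken from SWISS-PROT or PRINTS.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    source: str = ""            # database entry ID, for primary statements
    operation: str = "copied"   # how this statement was produced
    parents: list = field(default_factory=list)


def derive(text, operation, parents):
    """Create a statement distilled from one or more parent statements."""
    return Statement(text=text, operation=operation, parents=list(parents))


def primary_sources(statement):
    """Walk back through the derivation to the original entry IDs."""
    if not statement.parents:
        return {statement.source} if statement.source else set()
    found = set()
    for parent in statement.parents:
        found |= primary_sources(parent)
    return found


text = "Visual pigments are the light-absorbing molecules that mediate vision."
sheep = Statement(text, source="OPSD_SHEEP")
human = Statement(text, source="OPSD_HUMAN")
merged = derive(text, "majority vote", [sheep, human])
summary = derive("Visual pigments mediate vision.", "summarised", [merged])

print(sorted(primary_sources(summary)))
```

The `operation` label is where a "degree of provenance" could be attached: a verbatim copy, a vote and a summary do not carry the same weight of evidence.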

Implications for provenance

gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].

• Inter and Intra provenance

[callout: Swiss-Prot]

Implications for provenance

Inheritance of errors. E.g. SWISS-PROT errors:
gd; Polymorphism at position 171 may be related to the
gd; alleles of scarpie incubation-control (sic) gene in this species.

Poor quality begets poor quality. E.g. SWISS-PROT annotation poor or inconsistent:
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c.
gd;
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c (by similarity).

• How do we record that it’s a copy, but that it has been corrected, and why?
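One conceivable way to record "a copy, but corrected, and why" is to keep the source text, the corrected text and a free-text reason together, and to derive the word-level changes on demand. The Python sketch below is purely illustrative: the `CorrectedCopy` record and its field names are hypothetical, and the corrected spelling is supplied here only as an example of an annotator's fix.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class CorrectedCopy:
    source_entry: str     # where the text was copied from
    original_text: str    # text exactly as found in the source
    corrected_text: str   # text as stored after the annotator's fix
    reason: str           # why the text was changed

    def changed_words(self):
        """List (old, new) word runs that differ between the two texts."""
        old, new = self.original_text.split(), self.corrected_text.split()
        matcher = SequenceMatcher(None, old, new)
        return [
            (" ".join(old[i1:i2]), " ".join(new[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]


record = CorrectedCopy(
    source_entry="PRIO_SHEEP; P23907",
    original_text="alleles of scarpie incubation-control gene in this species.",
    corrected_text="alleles of scrapie incubation-control gene in this species.",
    reason="spelling error inherited from the source entry",
)
print(record.changed_words())
```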

Implications for provenance

Hugely subjective. E.g. if only one annotation claims that the family is implicated in a disease, and that annotation was made by a group Terri Attwood respects, then it gets in.

• How to capture that subjectivity and use it when using the annotation?
• The workflow is complex: how to capture this?
• It’s more like argumentation than reproducible derivation.
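The "gets in if the annotator respects the source" rule can at least be written down together with its justification, so that a later reader sees why a lone claim was accepted. The Python sketch below is a hedged illustration: SWISS-PROT entries do not carry an annotator-group field, so `annotator`, `Claim`, `Decision` and the list of respected groups are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    entry_id: str    # e.g. "PRIO_SHEEP; P23907"
    annotator: str   # hypothetical: the group behind the source annotation
    text: str


@dataclass(frozen=True)
class Decision:
    claim: Claim
    accepted: bool
    reason: str      # the argument for the decision, kept with the result


def golden_vote(claims, respected_annotators):
    """Accept a single claim when it comes from a respected source."""
    decisions = []
    for claim in claims:
        if claim.annotator in respected_annotators:
            reason = f"accepted: {claim.annotator} is a respected source"
            decisions.append(Decision(claim, True, reason))
        else:
            reason = f"not accepted: {claim.annotator} is not a respected source"
            decisions.append(Decision(claim, False, reason))
    return decisions


claims = [
    Claim("PRIO_SHEEP; P23907", "group A", "Polymorphism at position 171 ..."),
    Claim("PRIO_HUMAN; P04156", "group B", "Kuru is transmitted ..."),
]
for decision in golden_vote(claims, respected_annotators={"group A"}):
    print(decision.claim.entry_id, decision.accepted, decision.reason)
```

Storing the reason alongside the outcome fits the observation that the process resembles argumentation: the record is a justification to be weighed, not a derivation to be re-run.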

Questions, questions …

Where does provenance come from?
– Incidental vs supplied by the scientist, somehow.

What is provenance used for?
– Reliability & quality
– Justification & audit
– Reusability, reproducibility & repeatability
– Change & evolution
– Ownership, security, credit & copyright
– Identity - LSID
– Immutability
– Migration & storage
– Aggregation
– Versioning

Spares