Author: Jason Weston et al. PNAS. Presented by Tie Wang. Protein Ranking: From Local to Global Structure in the Protein Similarity Network

Posted on 20-Dec-2015


Page 1

Author: Jason Weston et al.

PNAS

Presented by Tie Wang

Protein Ranking: From Local to Global Structure in the Protein Similarity Network

Page 2

Outline

Introduction; Background; Method; Experiment; Analysis

Page 3

Introduction

Subtle pairwise sequence similarities can imply structural, functional, and evolutionary relationships among DNA and protein sequences;

Searching biosequences in an online database is analogous to searching the WWW: a search engine searches the database for the query and returns a ranked list;

A protein ranking algorithm is presented for biosequence queries;

Page 4

Background

Early algorithms focus only on pairwise sequence similarity (Smith-Waterman local alignment search);

Statistical models use multiple alignments for similarity search (profile-based methods such as PSI-BLAST);

Global similarity search can be mapped onto a protein similarity network.

Page 5

How to perform protein ranking?

Underlying idea: Google's ranking (PageRank).

Key feature: exploiting global structure by inferring it from the local hyperlink structure.

Construct a protein similarity network;

Add the query sequence;

Diffuse weights through the network;

Rank proteins upon convergence.
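The steps above can be sketched as a PageRank-style diffusion over a weighted similarity network. This is a minimal illustration, not the authors' implementation: the function name `rank_proteins`, the weight form exp(-E/sigma), the column normalization, the damping factor `alpha`, and the toy E-values are all assumptions.

```python
import numpy as np


def rank_proteins(evalues, query, sigma=100.0, alpha=0.95, tol=1e-9, max_iter=1000):
    """Hypothetical sketch: rank proteins by diffusing weight from a query node.

    evalues[i, j] is the E-value of protein i when protein j is the query
    (np.inf where no hit was reported). Returns (ranking, scores).
    """
    # Step 1: similarity network. Assumed weight form exp(-E / sigma):
    # strong hits (small E) give weights near 1, missing hits (inf) give 0.
    K = np.exp(-np.asarray(evalues, dtype=float) / sigma)
    np.fill_diagonal(K, 0.0)
    # Normalize each column (a node's outgoing weights) to sum to 1, so the
    # diffusion below is a contraction whenever alpha < 1 (assumption).
    col_sums = K.sum(axis=0)
    K = K / np.where(col_sums > 0, col_sums, 1.0)

    # Step 2: add the query; its direct similarities are the initial scores.
    initial = K[:, query].copy()
    K_spread = K.copy()
    K_spread[:, query] = 0.0  # the query only injects weight via `initial`

    # Step 3: weight diffusion, repeated until the scores stop changing.
    y = initial.copy()
    for _ in range(max_iter):
        y_next = initial + alpha * (K_spread @ y)
        converged = np.abs(y_next - y).max() < tol
        y = y_next
        if converged:
            break

    # Step 4: rank the other proteins by their converged scores.
    order = np.argsort(-y)
    ranking = [int(i) for i in order if i != query]
    return ranking, y


inf = np.inf
# Toy network: protein 1 hits the query 0, protein 2 hits only 1, protein 3 hits nothing.
E = np.array([
    [0.0, 1e-5, inf, inf],
    [1e-5, 0.0, 1e-5, inf],
    [inf, 1e-5, 0.0, inf],
    [inf, inf, inf, 0.0],
])
ranking, scores = rank_proteins(E, query=0)
print(ranking)  # [1, 2, 3]
```

Protein 2 has no direct hit to the query, yet it ranks above the unrelated protein 3 because weight reaches it through protein 1; this is the local-to-global effect the slide describes.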

Page 6

Algorithm

Page 7

Experiment

The protein 3-D structure database SCOP is used as the gold standard.

Sequences share no more than 95% similarity.

The 7329 proteins are split by superfamily: 379 superfamilies are used for training and 332 for testing.

Three networks are generated using BLAST and PSI-BLAST.

Page 8

Experiment

Comparison with two other experiments: (1) only local structure is considered; (2) non-local edges are used, but weak edges are removed. The results show that the second variant is only slightly worse than the proposed algorithm.

Edge weight value:

$K_{ij} = \exp(-S_j(i)/\sigma)$,

where $S_j(i)$ is the E-value assigned to protein $i$ given query $j$, and $\sigma$ is a scale parameter.

Page 9

Analysis

Bower et al., Science, vol. 306, 2004

Cluster structure

Page 10

Author: Kuang Rui et al.

Bioinformatics

Presented by Tie Wang

Motif-based protein ranking by network propagation

Page 11

Outline

Introduction; Background; Method; Experiment; Analysis

Page 12

Background

Direct measures of pairwise sequence similarity have proved effective for classification.

Performance drops when detecting sequences with subtle, remote homology.

Such sequences share a conserved structure in at least some of their components.

The problem is formulated based on this observation.

Page 13

Protein motif bipartite network

• Each protein contains a set of motifs.

• Each motif belongs to a set of proteins.

• Their relationships are mapped to a bipartite graph, as shown on the left.

• The edge weight indicates the probability that motif x is in protein y.
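A minimal sketch of the bipartite network as a protein-by-motif matrix, assuming hits arrive as hypothetical (protein, motif, probability) triples; the helper `build_bipartite` and the toy data are illustrative only. Row i lists the motifs of protein i and column j lists the proteins containing motif j, matching the first two bullets.

```python
import numpy as np


def build_bipartite(hits, n_proteins, n_motifs):
    """Build the protein-by-motif connectivity matrix H.

    hits: iterable of (protein_index, motif_index, probability) triples, where
    probability is the assumed chance that the motif occurs in the protein.
    """
    H = np.zeros((n_proteins, n_motifs))
    for protein, motif, prob in hits:
        # If a pair is reported twice, keep the stronger evidence (assumption).
        H[protein, motif] = max(H[protein, motif], prob)
    return H


# Toy data: protein 0 contains motifs 0 and 1; motif 1 also occurs in protein 1.
hits = [(0, 0, 0.9), (0, 1, 0.4), (1, 1, 0.8), (2, 2, 0.7)]
H = build_bipartite(hits, n_proteins=3, n_motifs=3)
print(np.nonzero(H[0])[0].tolist())     # motifs in protein 0: [0, 1]
print(np.nonzero(H[:, 1])[0].tolist())  # proteins with motif 1: [0, 1]
```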

Page 14

MotifProp Algorithm

The set P represents protein sequences and the set F represents motifs. H is the connectivity matrix between them.

A row-normalized version of H is also defined.

A vector gives the initial values for F.

A vector gives the initial values for P.

Page 15

MotifProp Algorithm

The convergence of MotifProp is guaranteed. The problem is reformulated as an update rule over the following quantities:

A row-normalized version of H.

A vector of initial values for F.

A vector of initial values for P.
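The slides define H, its row-normalized version, and initial vectors for P and F. The following is only a sketch of alternating propagation built from those quantities, not the authors' update rule: motif scores are refreshed from protein scores and protein scores from motif scores, each pass restarting from the initial vectors p0 and f0. Row-normalizing both H and its transpose is an assumption chosen because it makes the combined protein-to-protein operator row-substochastic, so the iteration converges for any alpha < 1. The names `motif_propagation` and `row_normalize` are hypothetical.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum; all-zero rows are left as zeros."""
    sums = M.sum(axis=1, keepdims=True)
    return M / np.where(sums > 0, sums, 1.0)


def motif_propagation(H, p0, f0, alpha=0.9, tol=1e-10, max_iter=1000):
    """Hypothetical alternating propagation between proteins and motifs.

    H: |P| x |F| connectivity matrix; p0, f0: initial protein and motif scores.
    """
    H_p = row_normalize(H)    # each protein averages over its motifs
    H_f = row_normalize(H.T)  # each motif averages over its proteins (assumption)
    p, f = p0.astype(float), f0.astype(float)
    for _ in range(max_iter):
        f_next = f0 + alpha * (H_f @ p)       # motifs collect protein scores
        p_next = p0 + alpha * (H_p @ f_next)  # proteins collect motif scores
        converged = np.abs(p_next - p).max() < tol
        p, f = p_next, f_next
        if converged:
            break
    return p, f


H = np.array([[0.9, 0.4, 0.0],   # protein 0 (the query) has motifs 0 and 1
              [0.0, 0.8, 0.0],   # protein 1 shares motif 1 with the query
              [0.0, 0.0, 0.7]])  # protein 2 shares no motif with the query
p, f = motif_propagation(H, p0=np.array([1.0, 0.0, 0.0]), f0=np.zeros(3))
```

Protein 1 starts at zero but picks up score through the shared motif 1, while protein 2 stays at exactly zero because nothing connects it to the query.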

Page 16

Edge weighting scheme

PSI-BLAST E-values are assigned to the edges between pairs of protein nodes.

Gaussian edge weights are calculated from these E-values.

The Gaussian weights from the query to each protein are assigned as the initial values.
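A small sketch of this weighting step, assuming the Gaussian kernel exp(-E^2 / (2 sigma^2)) on the E-value; the exact form and the value of sigma used by the authors are not given on the slide, and the E-values below are toy numbers. Smaller E-values (stronger hits) map to weights near 1, and the query's weights become the initial protein scores.

```python
import numpy as np


def gaussian_weights(evalues, sigma=1.0):
    """Map E-values to weights in [0, 1]; smaller E-values give larger weights.

    The kernel exp(-E**2 / (2 * sigma**2)) and the default sigma are assumptions.
    """
    e = np.asarray(evalues, dtype=float)
    return np.exp(-(e ** 2) / (2.0 * sigma ** 2))


# Toy E-values of three proteins against the query (illustrative numbers).
query_evalues = np.array([1e-30, 0.5, 20.0])
p0 = gaussian_weights(query_evalues)  # initial protein scores for propagation
```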

Page 17

Value estimation

S_q(i) is the E-value between protein i and query q.

E_i(j) is the E-value between the jth motif and the ith protein.


Page 18

Estimation of the substitution score

The substitution score between a k-mer f and a sequence x can be estimated by a formula in which s_l is a log value, implying that a score S below the threshold can count as a motif hit against sequence x.
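One common way to estimate how well a k-mer matches a sequence, assumed here for illustration, is the best ungapped score of the k-mer against any length-k window of the sequence, summing per-position substitution scores. The sketch uses a toy match/mismatch scheme in place of a real substitution matrix such as BLOSUM62; `best_kmer_score` and `toy_score` are hypothetical names, and the threshold s_l is not modeled.

```python
def substitution_score(kmer, window, score):
    """Sum of per-position substitution scores over equal-length strings."""
    return sum(score(a, b) for a, b in zip(kmer, window))


def best_kmer_score(kmer, seq, score):
    """Best ungapped score of `kmer` against any length-k window of `seq`.

    Returns None when `seq` is shorter than the k-mer.
    """
    k = len(kmer)
    if len(seq) < k:
        return None
    return max(substitution_score(kmer, seq[i:i + k], score)
               for i in range(len(seq) - k + 1))


def toy_score(a, b):
    """Toy stand-in for a real substitution matrix such as BLOSUM62."""
    return 2 if a == b else -1


print(best_kmer_score("ACD", "GGACEGG", toy_score))  # 3 (window "ACE")
```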

Page 19

Sequential MotifProp

Empirical experiments suggest that using a weighted linear combination of multiple motifs does not improve the results.

Instead, a simple scheme with multiple motif sets is applied. The motif nodes F are partitioned into n sets, where F(i) is the set of motifs from the ith motif set. Each F(i) represents a set of motifs rather than individual motifs.

Page 20

Motif-rich regions

Page 21

Experiments

7329 protein domains with known 3D structure are taken from SCOP.

They are divided into training (4246) and testing (3083) sets.

An additional 10602 sequences from the Swiss-Prot database are used. Evaluation is based on ROC curves.
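ROC-based evaluation of a ranking can be sketched as follows. The function computes the area under the ROC curve, optionally truncated after the first n false positives (the ROC_n score often used in remote homology detection); whether the authors used a truncated variant is not stated on this slide. Labels mark positives, for example proteins from the query's SCOP superfamily, and `roc_n` is a hypothetical helper.

```python
def roc_n(scores, labels, n=None):
    """Area under the ROC curve, optionally cut off after n false positives.

    scores: higher means ranked closer to the query.
    labels: True for positives (e.g. same SCOP superfamily as the query).
    Ties in score keep their input order (a simplification).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    total_neg = len(ranked) - total_pos
    n = total_neg if n is None else min(n, total_neg)
    if total_pos == 0 or n == 0:
        return 0.0
    tp = fp = area = 0
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
            area += tp  # positives ranked above this false positive
            if fp == n:
                break
    return area / (n * total_pos)


scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(roc_n(scores, labels))       # 0.75
print(roc_n(scores, labels, n=1))  # 0.5
```

With n left unset the score equals the full ROC area; a small n rewards rankings that place positives above the first few negatives.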

Page 22

Results of classification

Page 23

Results of classification (cont)

Page 24

Results on motif-rich regions

Page 25

Conclusion

Two methods for protein classification based on protein ranking are presented.

The similarity matrix and the protein/motif propagation network are the underlying structures.

The methods are simple, but the formulation is innovative. They give better results than current approaches. Analysis of the results plays an important role.