Clustering and Motif Discovery in Kinases of Yeast, Worm and Arabidopsis thaliana Sihui Zhao

Upload: reuben-george

Post on 30-Dec-2015

TRANSCRIPT

Page 1: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Clustering and Motif Discovery

in Kinases of Yeast, Worm and Arabidopsis thaliana

Sihui Zhao

Page 2: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Background – Kinase

Protein kinases play a pivotal role in the control of all cellular processes

Cell proliferation, differentiation, adhesion, migration, metabolism and signal transduction

A kinase superfamily in each genome, ~2% of all sequences

Page 3: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Structure of Catalytic Domain

Also called C-subunit

Conserved among the protein kinase superfamily

Contains 250-300 residues

12 subdomains

Background

Page 4: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Subdomains of C-subunit

Two pivotal subdomains (based on PKA):

Subdomain I: Sequester ATP

Gly-X-Gly-X-X-Gly-X-Val

Subdomain VIB: ‘Catalytic loop’

His-Arg-Asp-X-Lys-X-X-Asn

Background
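The two PKA-based consensus patterns can be turned into searchable patterns. The following is a minimal sketch, assuming X stands for any single residue. The helper names `consensus_to_regex` and `find_motif` are hypothetical and are not part of the original presentation; the example sequence is the first entry of the c51 block shown on Page 15.

```python
import re

# Three-letter to one-letter codes for the residues used in the two patterns.
THREE_TO_ONE = {"Gly": "G", "Val": "V", "His": "H", "Arg": "R",
                "Asp": "D", "Lys": "K", "Asn": "N"}


def consensus_to_regex(pattern):
    """Convert e.g. 'Gly-X-Gly' into the regex 'G.G' (X = any residue)."""
    parts = []
    for token in pattern.split("-"):
        parts.append("." if token == "X" else THREE_TO_ONE[token])
    return "".join(parts)


SUBDOMAIN_I = re.compile(consensus_to_regex("Gly-X-Gly-X-X-Gly-X-Val"))
SUBDOMAIN_VIB = re.compile(consensus_to_regex("His-Arg-Asp-X-Lys-X-X-Asn"))


def find_motif(regex, sequence):
    """Return (start, matched_text) for the first hit, or None if absent."""
    m = regex.search(sequence)
    return (m.start(), m.group()) if m else None


# First sequence of the c51 block from the Result slides.
seq = "SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL"
hit = find_motif(SUBDOMAIN_I, seq)
print(hit)
```

A strict pattern like this only finds exact matches to the consensus; family members that deviate at a fixed position would be missed, which is one reason profile-based methods are used later in the framework.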

Page 5: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Conserved Residues

Residue(s): Probable Function

Gly50, Gly52, Val57: Sequester ATP

Lys72, Glu91: Positioning triphosphate group

Asp166, Lys168, Asn171: Catalytic loop

Glu208, Arg280: Assembly of catalytic core

Asp220: Assembly of catalytic loop

Background

Page 6: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Motif

A motif is a locally conserved region

Conserved due to higher selection pressure compared to non-conserved regions

Important to the biological function or structure

Background

Page 7: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Problem & Strategy in Motif Discovery

Motif discovery relies on either statistical or combinatorial pattern search techniques

Problem: High noise compared to signal when facing a huge number of sequences

Strategy: Use clustering/classification to find sequence families first, which lowers the noise-to-signal ratio

Background

Page 8: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Objectives

Cluster kinase sequences into different families

Find conserved motifs from sequence families

Page 9: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Tools

BLAST – Sequence alignment tool

ClustalW – Multiple alignment tool

HMMER – HMM-based package

BAG package – Sequence clustering package

BlockMaker – Block/Motif discovery tool

LAMA – Alignment tool for Blocks

Perl

Page 10: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Collecting and clustering kinase sequences based on similarity

The iterative HMM search – To collect more kinases, especially remotely homologous sequences

Motif discovery – To find blocks from each cluster and merge blocks across multiple clusters

Computational Framework – Outline

Page 11: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Collecting and Clustering Sequences

Extract annotated kinase sequences

All to all pairwise comparison

Estimate best score for clustering

Cluster sequences using BAG

Cluster kinase sequences

Computational Framework
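The clustering step can be illustrated with a simplified stand-in: link any two sequences whose pairwise score reaches a cutoff and take the connected groups as clusters (single linkage). This is a sketch under stated assumptions, not the BAG algorithm itself, whose graph-based procedure is not reproduced here. The function name `cluster_by_score`, the sequence ids and the scores are all made up for illustration.

```python
def cluster_by_score(scores, cutoff):
    """Single-linkage clustering: link two sequences when score >= cutoff.

    scores: dict mapping (id_a, id_b) -> pairwise similarity score.
    Returns a list of clusters (sorted lists of ids), largest first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        ra, rb = find(a), find(b)  # registers both ids, even below the cutoff
        if score >= cutoff and ra != rb:
            parent[ra] = rb

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Toy all-to-all scores (hypothetical values, not from the study).
toy_scores = {("k1", "k2"): 250, ("k2", "k3"): 180, ("k1", "k3"): 40,
              ("k4", "k5"): 300, ("k3", "k4"): 35}

# Scanning cutoffs shows how the choice of score changes the clustering.
for cutoff in (100, 200):
    print(cutoff, cluster_by_score(toy_scores, cutoff))
```

Raising the cutoff splits loose groups into tighter ones, which is why a suitable score has to be estimated before clustering.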

Page 12: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

HMM Iterative Search

Collect more sequences for each cluster

Computational Framework

Multiple alignment using CLUSTALW

Build HMM/Profile

Search all 3 genomes

Add hits to each cluster if any
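The loop on this slide (align, build a model, search, add hits, repeat) can be sketched as a fixed-point iteration. The external tools are deliberately replaced by two placeholder callables, `build_model` and `search`, so no real ClustalW or HMMER command line is assumed; the toy model below is just a set of shared 3-mers, and all names and sequences are hypothetical.

```python
def iterative_search(seed, database, build_model, search, max_rounds=10):
    """Grow a cluster until a search round adds no new sequences.

    build_model(members) -> model; search(model, database) -> iterable of ids.
    Both callables are placeholders for the external alignment/HMM tools.
    Returns (members, number_of_rounds_that_added_sequences).
    """
    members = set(seed)
    for round_no in range(1, max_rounds + 1):
        model = build_model(sorted(members))
        hits = set(search(model, database)) - members
        if not hits:
            return members, round_no - 1  # converged: nothing new found
        members |= hits
    return members, max_rounds


def kmers(seq, k=3):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy database: "c" resembles "b" but not "a", so it is only reachable
# after "b" has been added to the cluster.
toy_db = {"a": "GSGHFGVV", "b": "HFGVVRRG", "c": "RRGILKKA", "d": "MMMMMMMM"}


def toy_build_model(members):
    # Stand-in for alignment + profile building: pool the members' 3-mers.
    return set().union(*(kmers(toy_db[m]) for m in members))


def toy_search(model, database):
    # Stand-in for a profile search: report ids sharing any 3-mer.
    return [sid for sid, s in database.items() if kmers(s) & model]


cluster, rounds = iterative_search(["a"], toy_db, toy_build_model, toy_search)
print(sorted(cluster), rounds)
```

In the toy run, sequence "c" is picked up only in the second round, which mirrors how iteration can reach remotely homologous sequences that the seed alone would miss.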

Page 13: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Motif Discovery

Block discovery by BlockMaker

All to all block comparison by LAMA

Clustering blocks using BAG package

Conserved sites detection

Find blocks and merge across multiple clusters

Computational Framework

Page 14: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Result

963 kinases from ~45,000 sequences (~2%)

159 clusters of kinase sequences containing 2 to 32 sequences each

0 to ~1000 sequences added to each cluster after HMM iterative search

Page 15: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Result

71 sequence clusters sent to BlockMaker

ID c51.seq-1 BLOCK

AC c51.seq-1; distance from previous block=(79,120)

DE similar to eukaryotic protein kinase domains

BL EGL motif=[5,0,17] motomat=[1,1,-10] width=31 seqs=5

gi|3329644|gb|AAC ( 792) SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL 99

gi|3329650|gb|AAC ( 154) YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL 92

gi|3877967|emb|CA ( 836) YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL 79

gi|3877968|emb|CA ( 842) YNEDYEIDLENLEILETLGSGQFGIVKKGYL 77

gi|3878749|emb|CA ( 129) YKKQYEIASENLENKSILGSGNFGVVRKGIL 100
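Conserved sites in a block like the one above can be read off column by column. The following is a minimal sketch, assuming an ungapped block of equal-width segments; the function name `conserved_columns` and the `min_fraction` parameter are hypothetical and do not describe how BlockMaker or the original scripts score conservation.

```python
from collections import Counter


def conserved_columns(block, min_fraction=1.0):
    """Return (positions, consensus) for an ungapped, equal-width block.

    A column counts as conserved when its most common residue occurs in at
    least min_fraction of the sequences; other columns are shown as 'x'.
    Positions are 0-based offsets within the block.
    """
    if len({len(s) for s in block}) != 1:
        raise ValueError("all sequences in a block must have the same width")
    positions, consensus = [], []
    for i, column in enumerate(zip(*block)):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(block) >= min_fraction:
            positions.append(i)
            consensus.append(residue)
        else:
            consensus.append("x")
    return positions, "".join(consensus)


# The five segments of block c51.seq-1 from the slide.
c51 = [
    "SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL",
    "YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL",
    "YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL",
    "YNEDYEIDLENLEILETLGSGQFGIVKKGYL",
    "YKKQYEIASENLENKSILGSGNFGVVRKGIL",
]

positions, consensus = conserved_columns(c51)
print(positions)
print(consensus)
```

On this block the fully conserved columns include the G-x-G-x-x-G-x-V pattern of Subdomain I. Lowering `min_fraction` would also report columns where one sequence deviates.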

Page 16: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Result

45 clusters of Blocks after LAMA comparison and BAG clustering

Page 17: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Some Found Conserved Sites

Result

Cluster 11, size 29

Subdomain I: G-X-G-X-X-G-X-V

Cluster 16, size 97

Subdomain VIB: H-R-D-X-K-X-X-N

Page 18: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Some New Sites

Cluster 20, size=8

Alignment and motif

Known: Arg280 - assembly of catalytic core

Unknown: Cys, Trp, Pro

Cluster 31, size=13

Alignment and motif

Known: Asp220 - assembly of catalytic loop

Unknown: Gly, Thr, Tyr

Cluster 40, size=7

Alignment and motif

Known: Glu91 - positioning triphosphate group

Unknown: His, Pro

Result

Page 19: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Conclusion

This computational framework is successful

Especially when no preliminary information is available on a huge number of sequences

It’s efficient

Not completely automatic

Page 20: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Conclusion

Kinases are clustered based on similarity, which provides a way to deduce the functions from other family members

Some new conserved sites are found, which might indicate the specificity of kinase functions

Page 21: Clustering and Motif Discovery  in Kinases of Yeast, Worm and  Arabidopsis thaliana

Acknowledgement

Prof. Sun Kim

Prof. Mehmet Dalkilic

Dr. Irfan Gunduz