Sequence Based Analysis Tutorial

NIH Proteomics Workshop

Lai-Su Yeh, Ph.D., Protein Information Resource at Georgetown University Medical Center

Uploaded by homer, posted on 06-Jan-2016

TRANSCRIPT

Page 1: Sequence Based Analysis Tutorial

Sequence Based Analysis Tutorial

NIH Proteomics Workshop

Lai-Su Yeh, Ph.D., Protein Information Resource at Georgetown University Medical Center

Page 2

Retrieval, Sequence Search & Classification Methods

• Retrieve Protein Info by Text / UID
• Sequence Similarity Search: BLAST, FASTA, Dynamic Programming
• Family Classification: Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
• Integrated Search and Classification System

Page 3

Sequence Similarity Search (I)

Based on Pair-Wise Comparisons
• Dynamic Programming Algorithms
  - Global Similarity: Needleman-Wunsch
  - Local Similarity: Smith-Waterman
• Heuristic Algorithms
  - FASTA: Based on K-Tuples (2 Amino Acids)
  - BLAST: Triples of Conserved Amino Acids
  - Gapped-BLAST: Allows Gaps in Segment Pairs
  - PHI-BLAST: Pattern-Hit Initiated Search
  - PSI-BLAST: Position-Specific Iterated Search
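Both dynamic programming methods fill the same score matrix. They differ only in how it is initialized and where the final score is read. The Python sketch below returns the optimal score only, with no traceback. It assumes an illustrative match/mismatch score and a linear gap penalty: the defaults `match=2`, `mismatch=-1`, `gap=-2` and the name `alignment_score` are not from the slides, and practical searches use a substitution matrix and affine gaps instead.

```python
def alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal pairwise alignment score (score only, no traceback).

    local=False: Needleman-Wunsch global alignment.
    local=True:  Smith-Waterman local alignment.
    Linear gap penalty: every gap position costs `gap` (illustrative values).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized, so row 0 and column 0 accumulate gap costs.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                    H[i - 1][j] + gap,    # a[i-1] against a gap
                    H[i][j - 1] + gap)    # b[j-1] against a gap
            if local:
                # Local: scores are floored at zero so an alignment may start anywhere.
                h = max(h, 0)
                best = max(best, h)
            H[i][j] = h
    # Global score is in the bottom-right cell; local score is the best cell anywhere.
    return best if local else H[-1][-1]


print(alignment_score("ACGT", "AGT"))                 # 4
print(alignment_score("AAAXYZ", "XYZ"))               # 0
print(alignment_score("AAAXYZ", "XYZ", local=True))   # 6
```

The global score for `AAAXYZ` against `XYZ` is 0 because the three leading gaps cancel out the three matches. The local score simply ignores the unmatched prefix.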

Page 4

Sequence Similarity Search (II)

Similarity Search Parameters
• Scoring Matrices – Based on Conserved Amino Acid Substitution
  - Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
  - Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62
• Gap Penalty

Search Time Comparisons
• Smith-Waterman: 10 Min
• FASTA: 2 Min
• BLAST: 20 Sec
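The slide lists the gap penalty as a search parameter but does not give a formula. A common choice, assumed here, is an affine penalty: opening a gap costs more than extending one. The values 10/1 and the helper names `gap_cost` and `total_gap_cost` are illustrative only. Programs also differ in how they count the first gap position, so check each tool's documentation.

```python
import itertools


def gap_cost(length: int, gap_open: int = 10, gap_extend: int = 1) -> int:
    """Affine penalty: the first position of a gap costs gap_open, each further position gap_extend."""
    return 0 if length == 0 else gap_open + (length - 1) * gap_extend


def total_gap_cost(aligned_a: str, aligned_b: str,
                   gap_open: int = 10, gap_extend: int = 1) -> int:
    """Sum the affine penalty over every run of '-' in either row of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for row in (aligned_a, aligned_b):
        # groupby splits the row into consecutive runs of gap / non-gap characters.
        for is_gap, run in itertools.groupby(row, key=lambda c: c == "-"):
            if is_gap:
                total += gap_cost(len(list(run)), gap_open, gap_extend)
    return total


print(total_gap_cost("MKV---LA", "MKVTTTLA"))  # one gap of length 3: 10 + 2 = 12
print(total_gap_cost("M-K-V-LA", "MTKTVTLA"))  # three gaps of length 1: 3 * 10 = 30
```

With these values, one long insertion is much cheaper than the same number of residues scattered over several short gaps. This reflects the biological view that a single indel event is more likely than several independent ones.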

Page 5

Feature Representation
• Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
• Alternative Amino Acids: Classification of Amino Acids to Capture Different Features of Amino Acid Residues
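One way to use an "alternative amino acid" classification is to rewrite a sequence in a reduced alphabet, so that residues with similar properties share one symbol. The grouping and the names `GROUPS` and `reduce_sequence` below are a hypothetical illustration, not a scheme taken from the slides. Published reduced alphabets differ, for example in where His, Tyr and Trp are placed.

```python
# Illustrative grouping only (an assumption); published reduced alphabets differ.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "n": "STNQYH",    # polar, uncharged
    "+": "KR",        # positively charged
    "-": "DE",        # negatively charged
    "G": "G",         # glycine kept separate (backbone flexibility)
    "P": "P",         # proline kept separate (backbone rigidity)
}

# Invert the table: residue -> class label.
RESIDUE_TO_CLASS = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet; unknown residues become '?'."""
    return "".join(RESIDUE_TO_CLASS.get(aa, "?") for aa in seq.upper())


print(reduce_sequence("MDVTIQHPWFKR"))  # h-hnhnnPhh++
```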

Page 6

Substitution Matrix
• Likelihood of One Amino Acid Mutating into Another over Evolutionary Time
• Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
• Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
• High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
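Scoring an ungapped segment pair is a column-by-column lookup in the substitution matrix. This minimal sketch stores only a few PAM250 entries: the two off-diagonal values from the slide (Gly/Trp -7, Lys/Arg +3) and the PAM250 diagonal entries for the rare residues Trp (17) and Cys (12). The dictionary and function names are hypothetical. A real implementation would load the full matrix.

```python
# A small subset of PAM250; any pair not listed raises KeyError.
PAM250_SUBSET = {
    ("G", "W"): -7,   # unlikely substitution
    ("K", "R"): 3,    # conservative substitution
    ("W", "W"): 17,   # rare residue: high identity score
    ("C", "C"): 12,   # rare residue: high identity score
}


def substitution_score(a: str, b: str, matrix=PAM250_SUBSET) -> int:
    """Symmetric lookup: (a, b) and (b, a) share one stored entry."""
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key]


def ungapped_score(seq1: str, seq2: str, matrix=PAM250_SUBSET) -> int:
    """Sum substitution scores over two equal-length, gap-free segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have equal length")
    return sum(substitution_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(ungapped_score("KWC", "RWC"))  # 3 + 17 + 12 = 32
print(ungapped_score("GW", "WW"))    # -7 + 17 = 10
```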

Page 7

Secondary Structure Features

• Helix: Patterns of Hydrophobic Residue Conservation Showing an i, i+3, i+4, i+7 Pattern Are Highly Indicative of a Helix (Amphipathic)

• Strand: Strands That Are Half Buried in the Protein Core Tend to Have Hydrophobic Residues at Positions i, i+2, i+4, i+6
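These periodicities can be checked mechanically. The sketch below reports each start position `i` where every residue at the listed offsets is hydrophobic. It makes two simplifying assumptions. First, it inspects a single sequence, whereas the slide refers to conservation across a family alignment. Second, the hydrophobic set and the helper name `pattern_starts` are illustrative choices, not taken from the slides.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: one common choice of hydrophobic residues

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7 (amphipathic helix)
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6 (half-buried strand)


def pattern_starts(seq: str, offsets, hydrophobic=HYDROPHOBIC) -> list[int]:
    """Return 0-based positions i where every residue at i + offset is hydrophobic."""
    span = max(offsets)
    return [i for i in range(len(seq) - span)
            if all(seq[i + k] in hydrophobic for k in offsets)]


print(pattern_starts("LKKLLKKL", HELIX_OFFSETS))   # [0]
print(pattern_starts("VTVTVTV", STRAND_OFFSETS))   # [0]
```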

Page 8

BLAST (Basic Local Alignment Search Tool): Extremely Fast, Robust, Most Frequently Used

It finds very short segment pairs (“seeds”) between the query and the database sequence

These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached
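The seed-and-extend idea can be sketched in Python. This is a simplified illustration, not NCBI's implementation, and it makes three assumptions:

• Exact word hits of length `w=3`. BLAST proper also accepts "neighborhood" words that score above a threshold against a substitution matrix.
• +1/-1 identity scoring instead of a matrix.
• An X-drop rule (`x_drop`, illustrative value) that stops extension once the running score falls too far below the best seen.

The function names `find_seeds` and `extend_seed` are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact length-w word shared by both sequences."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_seed(query: str, subject: str, qi: int, sj: int, w: int = 3,
                match: int = 1, mismatch: int = -1, x_drop: int = 2):
    """Ungapped extension of one seed in both directions with an X-drop stop rule.

    Returns (score, q_start, q_end, s_start); q_end is exclusive.
    """
    def step(a: str, b: str) -> int:
        return match if a == b else mismatch

    seed = sum(step(query[qi + k], subject[sj + k]) for k in range(w))

    # Extend right, remembering the best running score and how far it reached.
    best_right = right_len = run = k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        run += step(query[qi + w + k], subject[sj + w + k])
        k += 1
        if run > best_right:
            best_right, right_len = run, k
        elif best_right - run > x_drop:  # fell too far below the best: stop
            break

    # Extend left the same way.
    best_left = left_len = run = k = 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        run += step(query[qi - k - 1], subject[sj - k - 1])
        k += 1
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > x_drop:
            break

    return seed + best_right + best_left, qi - left_len, qi + w + right_len, sj - left_len


q, s = "AAAMKVLWGGG", "CCMKVLWCC"
print(find_seeds(q, s))  # [(3, 2), (4, 3), (5, 4)]
score, q0, q1, s0 = extend_seed(q, s, 3, 2)
print(score, q[q0:q1], s[s0:s0 + (q1 - q0)])  # 5 MKVLW MKVLW
```

All three seeds in this toy example extend to the same segment pair, `MKVLW`. Real BLAST avoids this duplicated work by tracking which diagonals have already been extended.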

Page 9

BLAST Search from BLAST Search Interface

Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment

Link to NCBI taxonomy

Click to see alignment

Links to iProClass and UniProtKB reports

Link to PIRSF report

Click to see SSearch alignment

Page 10

Blast Result & Pairwise Alignment

BLAST Alignment

Page 11

How do you build a tree?

• Pick sequences to align
• Align them
• Verify the alignment
• Keep the parts that are aligned correctly
• Build and evaluate a phylogenetic tree

Integrated Analysis

Page 12

Pairwise alignment: calculate distance matrix

Mean number of differences per residue

Unrooted Neighbor-Joining Tree (branch lengths drawn to scale)

Rooted NJ Tree (guide tree)

Root placed at a position where the means of the branch lengths on either side of the root are equal

Progressive Alignment guided by the tree

Alignment starts from the tips of the tree towards the root

Thompson et al., NAR 22, 4675 (1994).

Multiple Sequence Alignment: CLUSTALW
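The first step above, the distance matrix, can be sketched briefly. CLUSTAL W derives its distances from separate pairwise alignments. To keep the sketch short, it assumes those alignments are already available as equal-length gapped strings. It reads "mean number of differences per residue" as the fraction of mismatches among columns where neither sequence has a gap. The sequences and the helper names `p_distance` and `distance_matrix` are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of mismatches over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of pairwise distances, with zeros on the diagonal."""
    names = list(aligned)
    d = [[0.0] * len(names) for _ in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d[i][j] = d[j][i] = p_distance(aligned[names[i]], aligned[names[j]])
    return names, d


names, d = distance_matrix({"s1": "MKVLA", "s2": "MKILA", "s3": "MR-LG"})
for name, row in zip(names, d):
    print(name, row)  # s1 [0.0, 0.2, 0.5] / s2 [0.2, 0.0, 0.5] / s3 [0.5, 0.5, 0.0]
```

A matrix like this is the input to the neighbor-joining step that produces the guide tree.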

Page 13

PIR Multiple Alignment and Tree From Text/Sequence Search Result or CLUSTAL W Alignment Interface

Page 14

Page 15

PIR Pattern Search From Text/Sequence Search Result or Pattern Search Interface

P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N

Alignment of a region involved in catalytic activity

A. Create pattern and search in database

B. Test sequence (O05689) against PROSITE database

Signature Patterns for Functional Motifs
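PROSITE-style patterns translate almost directly into regular expressions:

• `x` becomes `.`
• `x(m,n)` becomes `.{m,n}`
• `[..]` stays a character class
• `{..}` becomes a negated class
• `<` and `>` anchor the N- and C-terminus

The converter below is a minimal sketch covering only those elements. It does not handle forms such as `>` inside brackets, and the function name is hypothetical. The test sequence is made up to satisfy the slide's pattern; it is not a real protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression (subset of the syntax)."""
    pattern = pattern.strip().rstrip(".")
    out = []
    for elem in pattern.split("-"):
        prefix = suffix = ""
        if elem.startswith("<"):   # pattern anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):     # pattern anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        out.append(prefix + core + suffix)
    return "".join(out)


regex = prosite_to_regex("P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N")
print(regex)  # P[IV][WY].{3}H[MR]V.{3,4}Q.{1,2}D.{4,5}GAN

seq = "MM" + "PIWAAAHMVAAAQADAAAAGAN" + "KK"  # made-up test sequence
for hit in re.finditer(regex, seq):
    print(hit.start() + 1, hit.end())        # 1-based inclusive region: 3 24
```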

Page 16

Pattern Search Result (I)

A. One Query Pattern Against UniProtKB or UniRef100 DBs

Display the query pattern

Links to iProClass and UniProtKB reports

Link to NCBI taxonomy

Link to PIRSF report

Indicate pattern sequence region(s)

Page 17

Pattern Search Result (II)

B. One Query Sequence Against PROSITE Pattern Database

Page 18

Profile Method

Profile: A Table of Scores to Express Family Consensus, Derived from Multiple Sequence Alignments
• Number of Rows = Number of Aligned Positions
• Each Row Contains a Score for the Alignment with Each Possible Residue

Profile Searching
• Summation of Scores for Each Amino Acid Residue along Query Sequence
• Higher Match Values at Conserved Positions
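A position-specific scoring table of this kind can be sketched in a few lines. The profile has one row per aligned position, and each row scores all 20 residues. Searching sums row scores along the query. The sketch assumes log2-odds scores against a uniform background with a pseudocount of 1, and slides the profile without gaps. Real profiles such as PROSITE's use observed background frequencies, sequence weighting and position-specific gap costs. The function names and toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """One row per aligned column; each row holds a log2-odds score for every residue.

    Assumes a uniform background frequency of 1/20 (a simplification).
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]  # skips gaps
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
                        for aa in AMINO_ACIDS})
    return profile


def scan(sequence: str, profile: list[dict[str, float]]) -> tuple[int, float]:
    """Slide the profile along the sequence (no gaps); return the best (0-based offset, score)."""
    width = len(profile)
    best = (-1, -math.inf)
    for start in range(len(sequence) - width + 1):
        # Non-standard residues score 0.0 here (an arbitrary choice for the sketch).
        score = sum(profile[k].get(sequence[start + k], 0.0) for k in range(width))
        if score > best[1]:
            best = (start, score)
    return best


profile = build_profile(["ACDE", "ACDE", "ACDF"])
print(round(profile[0]["A"], 2), round(profile[3]["E"], 2))  # 1.8 1.38
print(scan("GGACDEGG", profile)[0])                           # 2
```

The fully conserved first column gives `A` a higher score than the partly conserved last column gives `E`. This matches the slide's point that conserved positions carry higher match values.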

Page 19

PROSITE PS50157 profile for zinc finger C2H2

Page 20

PIRSF Scan

Search One Query Protein Against All the Full-Length and Domain HMM Models for the Fully Curated PIRSFs by HMMER

The matched regions and statistics will be displayed:
• Shows PIRSF that the query belongs to
• Statistical data for all domains
• Statistical data per domain
• Alignment with consensus sequence

Page 21

Lab Section

Page 22

Rat eye lens phosphoproteomics in normal and cataract

Kamei et al., Biol. Pharm. Bull., 2005.

[Figure: 2-D gels of normal and cataract samples; axes pI (- to +) and Mw]

More phosphorylated spots in cataract sample. Digestion and MS from Spot 16 gave these peptides:

MDVTIQHPWFKRALGPFYPSRCSLSADGMLTFSGYRLPSNVDQSALS

We want to identify the protein(s) that contain these peptides

Use Peptide Search

MDVTIQHPWFKR
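At its core, a peptide search is an exact substring search that reports where each hit starts. The sketch below uses a hypothetical two-entry database with made-up accessions. It adds an optional `il_equivalent` flag because isoleucine and leucine have the same mass and standard MS cannot tell them apart. Whether PIR's Peptide Search offers such an option is not stated here, and the function name is also hypothetical.

```python
def peptide_search(peptide: str, database: dict[str, str],
                   il_equivalent: bool = False) -> list[tuple[str, int]]:
    """Return (accession, 1-based start) for every exact occurrence of the peptide.

    With il_equivalent=True, I and L are treated as the same residue.
    """
    def norm(s: str) -> str:
        s = s.upper()
        return s.replace("I", "L") if il_equivalent else s

    query = norm(peptide)
    hits = []
    for accession, sequence in database.items():
        target = norm(sequence)
        start = target.find(query)
        while start != -1:  # report overlapping occurrences too
            hits.append((accession, start + 1))
            start = target.find(query, start + 1)
    return hits


toy_db = {"PROT_A": "GGMDVTIQHPWFKRALG", "PROT_B": "MKTAYIAKQR"}  # hypothetical entries
print(peptide_search("MDVTIQHPWFKR", toy_db))                      # [('PROT_A', 3)]
print(peptide_search("MDVTLQHPWFKR", toy_db))                      # []
print(peptide_search("MDVTLQHPWFKR", toy_db, il_equivalent=True))  # [('PROT_A', 3)]
```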

Page 23

Peptide Search

Page 24

Links to iProClass and UniProtKB reports

Link to NCBI taxonomy

Link to PIRSF report

Matching peptide highlighted in the sequence

Sorting arrows

Peptide Search & Results

Species-restricted search

Search in UniProtKB, 23 proteins

Page 25

Batch Retrieval Results (I)

Retrieve more sequences

• Retrieve multiple proteins from iProClass using a specific identifier or a combination of them
• Provides a means to easily retrieve and analyze proteins when the identifiers come from different databases

Page 26

ID Mapping

Page 27

Blast Similarity Search

>P24623

• Perform sequence similarity search

What proteins are related to rat CRYAA?

http://pir.georgetown.edu/pirwww/search/blast.shtml

Page 28

Pairwise Alignment

Page 29

UniProtKB database and unique UniParc sequences

PIR protein family classification database

PIR Text Search (http://pir.georgetown.edu/search/textsearch.shtml)

Let’s search for human crystallins

Page 30

Refine your search or start over

Display PDB ID

Let’s look for crystallins which have 3D structure

Page 31

Domain Display allows simultaneous comparison of the Pfam domains present in multiple proteins

Let’s perform a multiple alignment on the sequences containing PF00030

Share the same domain architecture

Page 32

Multiple Alignment

Page 33

Interactive Phylogenetic Tree and Alignment

Beta B1 and gamma crystallins share the same domains and SCOP fold, and have significant sequence similarity, suggesting that they are related

Page 34

Pattern Search (I)

Search for proteins containing this pattern (PS00225) in rat

Select P07320 and perform a pattern search

Page 35

Pattern Search Result

Beta and gamma crystallins have multiple copies of this pattern