Protein Analysis Tools
2nd April, 2012

Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System, University of Pittsburgh
[email protected]
http://www.hsls.pitt.edu/guides/genetics



What we’ll do:

Brief overview of CLC Main Workbench

Find the genomic context of a protein sequence

Search for the presence of conserved domains

Create a multiple sequence alignment plot

Analyze primary structure: hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc.

Predict secondary structure

Predict post-translational modifications such as phosphorylation, glycosylation, …

Search for interacting partners

Predict domain-driven protein-protein interactions

Workshop Resources: http://www.hsls.pitt.edu/molbio/tutorials

HSLS MolBio Videos

Sequence Analysis Software Suites: Wisconsin GCG, Vector NTI, DNASTAR Lasergene, Geneious, CLC Main

Why CLC Main?

Runs on Windows, Mac, and Linux

DNA, RNA, protein, and microarray data analysis

Regular updates

HSLS licensed

CLC Main Access

HSLS CLC Main Registration Link: http://www.hsls.pitt.edu/molbio/clcmain

Access via Pitt Network Connect; instruction video: http://goo.gl/JNjMt

CLC Main Workbench Overview

Graphical user interface

Protein sequence import

Sequence navigation

CLC Main Graphical User Interface (GUI)

CLC Main

Navigate a protein sequence

Videos

CLC Main, getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf

CLC Main Workbench Walkthrough (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf

CLC Main Workbench Walkthrough (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf

Import a Protein Sequence

Protein Sequence

Human PLCg1

RefSeq no: NP_002651

UniProt accession number: P19174

FASTA file

Raw sequence

CLC features:

Search, Import, Create new sequence

Videos

Import a DNA/protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf

Import a DNA/protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf

CLC protein sequence

Protein sequence manipulation

Create a new protein with the PLCg1 SH2-SH2-SH3 domains

Sequence Alignment

Pairwise Alignment: global, local

Multiple Sequence Alignment
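The difference between the global and local pairwise modes listed above can be sketched in a few lines of Python. This is only an illustration, not what CLC Main runs: the function name `alignment_scores` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for readability. Real tools typically use substitution matrices such as BLOSUM62 with affine gap penalties.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for two sequences.

    Global scoring follows the Needleman-Wunsch recurrence, local scoring
    the Smith-Waterman recurrence. The scoring values are illustrative
    assumptions, not the defaults of any particular program.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Global matrix: leading gaps are penalised along the first row/column.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local matrix: every cell is floored at zero, so an alignment may
    # start and end anywhere in either sequence.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    return glob[-1][-1], best_local


# A short motif embedded in a longer sequence: the local score rewards the
# shared segment, while the global score pays for the unmatched flanks.
print(alignment_scores("GEPR", "AAAGEPRAAA"))  # (-8, 4)
```

Local alignment is usually the better choice when a short peptide or a single domain is compared against a full-length protein; global alignment suits sequences of similar length, such as orthologs.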

Sequence Alignment

Pair-wise Sequence Alignment

Multiple Sequence Alignment

Multiple Sequence Alignment Tools: ClustalW and T-Coffee

PLCg1 orthologous sequences:

H: NP_002651

Mouse: NP_067255

Rat: NP_037319

Cow: NP_776850

Dog: XP_542998

Zebrafish: NP_919388

NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651

Videos

Create a multiple sequence alignment plot using CLC (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part1.swf

Create a multiple sequence alignment plot using CLC (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part2.swf

Create a multiple sequence alignment plot: http://media.hsls.pitt.edu/media/clres2705/msa.swf

Compare two peptide sequences: http://media.hsls.pitt.edu/media/clres2705/blast2.swf

Starting with a short peptide sequence, find:

the whole protein sequence

orthologs in other species (nematode)

Tools: UCSC BLAT; NCBI BLAST against SwissProt

Peptide to whole protein

Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

Videos

Place an mRNA or peptide sequence into the human genome (BLAT): http://www.hsls.pitt.edu/molbio/videos/play?v=12e

Find homologous sequences: http://media.hsls.pitt.edu/media/clres2705/blast.swf

Find homologous sequence

SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

Sequence Manipulation & Format Conversion

Sequence Manipulation Suite: http://bioinformatics.org/sms2/

Readseq: http://thr.cit.nih.gov/molbio/readseq/

GenPept

FASTA
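As a rough illustration of what the FASTA format named above looks like to a program, here is a minimal Python sketch that reads and writes FASTA text. It is not how the Sequence Manipulation Suite or Readseq is implemented; the function names `parse_fasta` and `to_fasta`, the header `example_peptide`, and the 60-character line width are assumptions made for the example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    A record starts at a line beginning with '>'; all following lines up to
    the next '>' are joined into one sequence, with whitespace removed.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop any spaces left inside a sequence line by copy and paste.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(header, sequence, width=60):
    """Write one record as FASTA text, wrapping the sequence at `width`."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines)


# 'example_peptide' is a made-up header; the residues are the workshop peptide.
raw = ">example_peptide\nSPEGCWGPEPRDCVSCRNV SRGRECVDKC\nNLLEGEPR\n"
header, seq = parse_fasta(raw)[0]
print(header, len(seq))  # example_peptide 37
```

Stripping whitespace inside sequence lines matters in practice: sequences pasted from slides or word processors often pick up stray spaces and line breaks.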

Hands-On

Retrieve the amino acid sequence between positions 25 and 45 in Sequence A (MS Word doc)

Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence

Find the fruit fly homolog of this protein

What % identity does the fruit fly protein share with its rat homolog?

Predict potential MAPK phosphorylation sites present in the fruit fly protein
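Two of the steps above, extracting a range of residues and computing percent identity, can be sketched in Python. This is a minimal illustration with hypothetical helper names (`subsequence`, `percent_identity`). Sequence A itself is not reproduced here, so the example uses the workshop peptide instead. Note that tools differ in the denominator they use for percent identity; this sketch divides by the number of aligned columns, which is an assumption.

```python
def subsequence(seq, start, end):
    """Return residues start..end, counted from 1 and inclusive at both ends."""
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue.

    Both inputs must come from the same alignment (equal length, '-' for
    gaps). Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


peptide = "SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR"
print(subsequence(peptide, 5, 10))          # CWGPEP
print(percent_identity("ACD-F", "ACDEF"))   # 80.0
```

Biological positions are counted from 1, whereas Python slices start at 0, which is why `subsequence` shifts the start index.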

Protein Domain Search: InterPro Scan

InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.

>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Videos:

Find protein domains, PTMs, secondary structure, etc.: http://media.hsls.pitt.edu/media/clres2705/uniprot.swf

Start with a protein pattern and find which proteins possess that domain: http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf

Search for protein domains, repeats and sites: http://media.hsls.pitt.edu/media/clres2705/interpro.swf

Protein Domain Search: ScanProsite

Sequence: human BCL2 (NP_000624.2)

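Pattern Search

[AC]-x-V-x(4)-{ED}

This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

F-[GSTV]-P-R-L-[G>]

This pattern is translated as: Phe-[Gly, Ser, Thr or Val]-Pro-Arg-Leu-[Gly or the C-terminal end of the sequence]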
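The pattern syntax shown above maps closely onto regular expressions, which gives a quick way to see what a pattern will match. The following Python sketch is an assumption-laden illustration, not ScanProsite's implementation: `prosite_to_regex` is a hypothetical helper, and it covers only the elements used here (`x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, and the `<` and `>` terminus anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # '<' anchors the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # '>' anchors the C-terminus
            suffix, element = "$", element[:-1]

        # Split an optional repeat count, e.g. x(4) or x(2,4), from the core.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()

        if core == "x":                   # any residue
            piece = "."
        elif core.startswith("[") and core.endswith("]"):
            residues = core[1:-1]
            if ">" in residues:           # e.g. [G>]: Gly or end of sequence
                piece = "(?:[" + residues.replace(">", "") + "]|$)"
            else:
                piece = "[" + residues + "]"
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            piece = core                  # a single literal residue

        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + piece + suffix)
    return "".join(parts)


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))    # [AC].V.{4}[^ED]
print(prosite_to_regex("F-[GSTV]-P-R-L-[G>]"))   # F[GSTV]PRL(?:[G]|$)
```

The resulting expression can be passed to `re.finditer` to list candidate matches in a protein sequence. Hits found this way are only pattern matches and carry no statistical weight.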

Pattern Search

Protein Primary Structure Analysis

Tool: ExPASy from SIB

Calculated molecular weight

Theoretical pI

Extinction coefficients

Estimated half-life

Hydropathicity plot: Kyte & Doolittle

Hydrophilicity plot: Hopp T.P., Woods K.R.
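A hydropathicity plot of the Kyte & Doolittle kind is a sliding-window average of per-residue values. The Python sketch below shows the idea; it is not the ExPASy code. The function name `hydropathy_profile` and the window size of 9 are assumptions for the example, while the per-residue values are the published Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every window of `window` residues.

    The result has len(seq) - window + 1 values; value k describes the
    residues at 0-based positions k .. k + window - 1.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[start:start + window]) / window
            for start in range(len(seq) - window + 1)]


# A stretch taken from the C-terminal end of the BCL2 sequence in this workshop.
profile = hydropathy_profile("LLSLALVGACITLGAYL", window=9)
print(round(max(profile), 2))
```

Short windows highlight surface-exposed stretches, while longer windows (Kyte and Doolittle discussed a span of about 19 residues) are more suited to spotting candidate membrane-spanning segments. Dedicated predictors such as TMHMM use a different, model-based approach.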

Antigenic Site Prediction

Tool: EMBOSS antigenic

Sequence: human BCL2 (NP_000624.2)

EMBOSS antigenic

The antigenic program predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar.

Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be a part of antigenic sites. A semi-empirical method which makes use of physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes was developed by Kolaskar and Tongaonkar to predict antigenic determinants on proteins. Application of this method to a large number of proteins has shown that their method can predict antigenic determinants with about 75% accuracy, which is better than most of the known methods. This method is based on a single parameter and is thus very simple to use.

Transmembrane Region prediction

Transmembrane Site Prediction

Tool: TMHMM Server

Sequence: human BCL2 (NP_000624.2)

Protein Secondary Structure

Sequence: human BCL2 (NP_000624.2)

Protein-Protein Interactions Prediction

Tool: STRING

Sequence: human BCL2 (NP_000624.2)

Hands-on

Take the human BCL2 protein sequence and:

Find its domain architecture

Predict the topology of its transmembrane region

Design a suitable antigenic site for antibody generation

What are its calculated molecular weight and extinction coefficient?

Predict its secondary structure

What % of this protein possesses alpha-helical structure?

Predict its potential interacting partners
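For the extinction coefficient question, the calculation behind sequence-based estimates is a weighted count of Trp, Tyr and cystine. The Python sketch below uses the commonly cited molar coefficients at 280 nm in water (5500 for Trp, 1490 for Tyr, 125 per cystine); treat these numbers and the function name `extinction_coefficient` as assumptions to check against the ExPASy ProtParam documentation before relying on the result.

```python
# Molar extinction coefficients at 280 nm, in M^-1 cm^-1 (assumed values).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(seq, cystines=False):
    """Estimate the molar extinction coefficient of a protein at 280 nm.

    With cystines=True every pair of Cys residues is assumed to form a
    disulfide bond; with cystines=False all Cys are assumed to be reduced.
    """
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cystine = seq.count("C") // 2 if cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Toy sequence with one Trp, one Tyr and two Cys.
print(extinction_coefficient("WYCC"))                  # 6990
print(extinction_coefficient("WYCC", cystines=True))   # 7115
```

Because the estimate depends only on Trp, Tyr and Cys counts, it becomes unreliable for proteins that contain few or none of these residues.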

Hands-on

Prediction of potential phosphorylation sites present in a protein sequence.

Sequence: human BCL2


Phosphorylation Site Prediction:


Tool: NetPhos

Phosphorylation Site Prediction:


Tool: GPS
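Before running a dedicated predictor, it can help to list the residues that could be targets at all. MAPKs, mentioned in the earlier hands-on, are proline-directed kinases, so a crude first pass is to find every Ser or Thr that is immediately followed by Pro. The Python sketch below does only that; it is not how NetPhos or GPS work, the helper name `find_sp_tp_sites` is hypothetical, and a motif match on its own says nothing about whether the site is phosphorylated in vivo.

```python
import re


def find_sp_tp_sites(seq):
    """Return (position, residue) for each Ser/Thr directly followed by Pro.

    Positions are counted from 1, as in sequence databases. The lookahead
    keeps overlapping candidates and leaves the Pro out of the match.
    """
    return [(m.start() + 1, m.group())
            for m in re.finditer(r"[ST](?=P)", seq)]


# The workshop peptide has a single Ser-Pro pair, at its first residue.
print(find_sp_tp_sites("SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR"))  # [(1, 'S')]
```

Candidate positions from a scan like this are best used as a checklist to compare against the scored output of NetPhos or GPS.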

Thank you! Any questions?

Carrie Iwema: [email protected], 412-383-6887

Ansuman Chattopadhyay: [email protected], 412-648-1297

http://www.hsls.pitt.edu/guides/genetics