Protein Sequences
The Genetic Code
The natural extension of the genetic code…
1. Overall amino acid structure
2. Amino acid stereochemistry
3. Amino acid sidechain structure & classification
4. ‘Non-standard’ amino acids
5. Amino acid ionization
6. Formation of the peptide bond
7. Disulfide bonds
8. Comparing protein sequences to describe evolutionary processes
Q: How many amino acids are there?
The twenty alpha-amino acids that are encoded by the genetic code share the generic structure…
Atom nomenclature within amino acids (as used within the PDB)
[Figure: two amino acids with atoms labeled by PDB name. First: N, CA, C, O, CB, OG1, CG2. Second: N, CA, C, O, OXT, CB, CG, CD, CE, NZ.]
The .pdb file format
Columns, left to right: record name, atom number, atom name, residue name, chain ID, residue number, X-coordinate, Y-coordinate, Z-coordinate, occupancy, B-factor (aka temp factor), atom type.
ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61           N
ATOM      2  CA  PRO A   2      21.848  26.169   1.597  1.00 27.50           C
ATOM      3  C   PRO A   2      20.582  25.363   1.875  1.00 26.69           C
ATOM      4  O   PRO A   2      19.724  25.215   0.973  1.00 26.48           O
ATOM      5  CB  PRO A   2      21.874  27.626   1.981  1.00 28.55           C
ATOM      6  CG  PRO A   2      21.899  28.434   0.721  1.00 29.65           C
ATOM      7  CD  PRO A   2      21.761  27.465  -0.440  1.00 28.77           C
ATOM      8  N   LYS A   3      20.499  24.795   3.073  1.00 22.80           N
ATOM      9  CA  LYS A   3      19.360  23.972   3.469  1.00 22.07           C
ATOM     10  C   LYS A   3      18.610  24.700   4.597  1.00 18.49           C
ATOM     11  O   LYS A   3      19.262  25.140   5.536  1.00 17.98           O
ATOM     12  CB  LYS A   3      19.669  22.668   4.145  1.00 24.58           C
ATOM     13  CG  LYS A   3      20.495  21.675   3.360  1.00 36.59           C
ATOM     14  CD  LYS A   3      20.652  20.419   4.220  1.00 48.23           C
ATOM     15  CE  LYS A   3      19.341  19.779   4.628  1.00 53.43           C
ATOM     16  NZ  LYS A   3      19.502  19.003   5.891  1.00 57.07           N
ATOM     17  N   ALA A   4      17.319  24.698   4.389  1.00 17.98           N
ATOM     18  CA  ALA A   4      16.468  25.371   5.384  1.00 17.19           C
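Because the ATOM record is fixed-width, each field can be read by character position. The sketch below is one minimal way to do that in Python; the function name `parse_atom_record` and the dictionary keys are illustrative choices, not part of the slides, and the column ranges follow the standard PDB ATOM layout.

```python
def parse_atom_record(line: str) -> dict:
    """Split one fixed-width ATOM/HETATM line into named fields.

    Slices are 0-based Python indices for the standard PDB columns.
    The key names are hypothetical labels chosen for this sketch.
    """
    return {
        "record": line[0:6].strip(),      # record name, e.g. ATOM
        "serial": int(line[6:11]),        # atom number
        "name": line[12:16].strip(),      # atom name, e.g. CA, NZ
        "res_name": line[17:20].strip(),  # residue name, e.g. LYS
        "chain": line[21],                # chain ID
        "res_seq": int(line[22:26]),      # residue number
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),   # aka temperature factor
        "element": line[76:78].strip(),   # atom type
    }


# One record from the slide, written with the fixed-width spacing restored.
EXAMPLE = "ATOM     16  NZ  LYS A   3      19.502  19.003   5.891  1.00 57.07           N"

if __name__ == "__main__":
    print(parse_atom_record(EXAMPLE))
```

Slicing by position rather than splitting on whitespace matters because adjacent fields can touch each other in real files (for example, large coordinates or four-character atom names).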
Atom nomenclature within amino acids (as used within the PDB)
[Figure: Lys and Arg with atom labels]
-The alpha carbon (CA) is immediately adjacent to the most oxidized carbon (which is the CO2- in amino acids).
-All the other heavy atoms are named according to the Greek alphabet, moving outward from CA.
-Put otherwise, LYS can be described by: CA, CB, CG, CD, CE, and NZ.
To Do: Learn how to name the atoms of all amino acids. Hint: look at any generic PDB file to get a list of atom types.
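The naming rule above can be sketched as a small decoder: the first letter is the element, the second is the Greek-letter position, and any trailing digit is the branch number. The names `GREEK`, `BACKBONE`, and `describe_atom` are hypothetical helpers for this sketch, and it assumes heavy atoms of the twenty standard residues, where every element symbol is a single letter.

```python
# Remoteness letters used in PDB atom names, in order outward from CA.
GREEK = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta",
         "E": "epsilon", "Z": "zeta", "H": "eta"}

# Backbone atoms other than CA carry no Greek-letter position.
BACKBONE = {"N", "C", "O", "OXT"}


def describe_atom(name: str):
    """Return (element, greek_position, branch_number) for a heavy-atom name.

    Assumes a one-letter element symbol (C, N, O, S), which holds for the
    heavy atoms of the twenty standard amino acids.
    """
    if name in BACKBONE:
        return (name[0], None, None)
    element = name[0]
    position = GREEK[name[1]]
    branch = int(name[2:]) if len(name) > 2 else None
    return (element, position, branch)


if __name__ == "__main__":
    for atom in ["CA", "CB", "CG", "CD", "CE", "NZ"]:  # the LYS atoms listed above
        print(atom, describe_atom(atom))
```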
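Numbers are used to discriminate between similar positions…
[Figure: two side chains with atom labels. First: CB, CG, OD1, ND2. Second: CB, CG, ND1, CD2, CE1, NE2.]
Here are some harder examples…
[Figure: four side chains with atom labels. First: CB, CG, CD1, CD2, CE1, CE2, CZ, OH. Second: CB, CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2. Third: CB, CG, CD1, CD2. Fourth: CB, OG1, CG2.]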
Side-chain torsion angles
-With the exception of Ala and Gly, all sidechains also have torsion angles (chi angles).
-To Do on your own:
- Count the # of chi’s in each amino acid.
- Determine why Ala doesn’t have a chi angle.
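A torsion angle is defined by four consecutive atoms; for the first side-chain angle (chi1) these are N, CA, CB, and CG. The following is a minimal sketch of the standard four-point dihedral calculation in plain Python. The helper names are illustrative, and the usage at the bottom takes the LYS 3 coordinates from the ATOM records shown earlier in these slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # N, CA, CB, CG of LYS 3 from the example PDB records.
    n = (20.499, 24.795, 3.073)
    ca = (19.360, 23.972, 3.469)
    cb = (19.669, 22.668, 4.145)
    cg = (20.495, 21.675, 3.360)
    print("chi1 of LYS 3 (degrees):", dihedral(n, ca, cb, cg))
```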
2. Amino acid stereochemistry
Fischer projection
3. Amino acid sidechain structure & classification
Terminologies
• Hydrophobic: Hydrophobic amino acids are those with side chains that do not like to reside in an aqueous environment. Hence, these amino acids are generally buried within the hydrophobic core of the protein.
– Aliphatic: A hydrophobic group that contains only carbon and hydrogen atoms.
– Aromatic: A side chain is considered aromatic when it contains an aromatic ring system.
• Polar: Polar amino acids are those with side chains that prefer to reside in an aqueous environment and hence are generally found exposed on the surface of a protein.
It’s actually a bit more complicated…
Twenty Amino acids
• Hydrophobic (non polar)
– Aromatic (PHE, TRP)
– Aliphatic (ALA, VAL, LEU, ILE, MET, PRO)
• Polar
– Polar Neutral: Amide (ASN, GLN), -OH (THR, SER), -SH (CYS)
– Charged: Acidic (ASP, GLU), Basic (HIS, LYS, ARG)
• TYR: Amphipathic
• GLY: Unclassifiable
HINT: You should definitely know this!!!
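For self-quizzing, the classification above can be written down as a lookup table. This is only a sketch of the slide's scheme; the class labels (`"hydroxyl"`, `"thiol"`, and so on) and the names `CLASSES`, `CLASS_OF`, and `classify` are illustrative choices.

```python
# The slide's classification, keyed by a short label for each group.
CLASSES = {
    "aromatic": ["PHE", "TRP"],
    "aliphatic": ["ALA", "VAL", "LEU", "ILE", "MET", "PRO"],
    "amide": ["ASN", "GLN"],        # polar neutral
    "hydroxyl": ["THR", "SER"],     # polar neutral, -OH
    "thiol": ["CYS"],               # polar neutral, -SH
    "acidic": ["ASP", "GLU"],       # charged
    "basic": ["HIS", "LYS", "ARG"], # charged
    "amphipathic": ["TYR"],
    "unclassifiable": ["GLY"],
}

# Invert the table so each residue maps to its class.
CLASS_OF = {res: cls for cls, members in CLASSES.items() for res in members}


def classify(residue: str) -> str:
    """Return the slide's class label for a three-letter residue code."""
    return CLASS_OF[residue.upper()]


if __name__ == "__main__":
    for res in ["LYS", "TYR", "MET"]:
        print(res, "->", classify(res))
```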
4. ‘Non-standard’ amino acids
These amino acids are not uncommon in biochemistry, but they are not encoded within the genetic code (meaning they are not incorporated into proteins)…
5. Amino acid ionization
6. Formation of the peptide bond
Primary structure = the complete set of covalent bonds within a protein
Polypeptides
A polypeptide is a linear arrangement of n amino acid residues linked by peptide bonds.
Polymers composed of two, three, a few, and many amino acid residues are called dipeptides, tripeptides, oligopeptides, and polypeptides, respectively.
Proteins are molecules that consist of one or more polypeptide chains.
Q: Why is the pentapeptide SGYAL different from LAYGS?
Amino acid to Dipeptide
[Figure: Amino Acid 1 and Amino Acid 2 joined by a peptide bond, which is highlighted in a yellow box]
A peptide bond is the amide linkage formed between two amino acids, which results in the (net) release of a molecule of water (H2O).
The four atoms in the yellow box form a rigid planar unit and, as we will see next, there is no rotation around the C-N bond.
Note: this chemistry will not work as drawn!
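The net loss of one water per peptide bond gives a simple bookkeeping rule: a chain of n residues weighs the sum of its free amino acids minus (n - 1) waters. The sketch below works this through for the pentapeptide SGYAL. The masses are approximate average molar masses in g/mol entered by hand for just these five residues, and the names `FREE_AA_MASS`, `WATER`, and `peptide_mass` are illustrative.

```python
# Approximate average molar masses (g/mol) of the free amino acids.
# Only the residues needed for the SGYAL example are listed (an assumption
# of this sketch, not a complete table).
FREE_AA_MASS = {"S": 105.09, "G": 75.07, "Y": 181.19, "A": 89.09, "L": 131.17}
WATER = 18.015


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acids minus one water per peptide bond."""
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER


if __name__ == "__main__":
    for seq in ["SGYAL", "LAYGS"]:
        print(seq, round(peptide_mass(seq), 2))
```

Both SGYAL and LAYGS give the same mass because they have the same composition, so the difference between them has to come from the order in which the residues are bonded.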
The peptide bond has a partial double bond character, estimated at 40% under typical conditions. It is this fact that makes the peptide bond planar and rigid.
A quick aside…
[Figure: reaction schemes contrasting "A horrible leaving group" with "A viable leaving group"]
7. Disulfide bonds
-- The primary structure is a complete description of the covalent bond network within a protein.
-- This is almost(!) completely described by the sequence of amino acids.
-- If you know that the protein is AVG…, you can look up the structures of A, V, and G; together with what you know about peptide bonding, that lets you complete the covalent bond structure.
-- So, when does the primary structure not fully describe the covalent bond network?
-- BTW, this is a HUGE pet peeve of mine…there is no such thing as a primary sequence, despite its rather common usage (including in journal article titles…UGG!).
A primary sequence implies a secondary sequence, which is nonsense. While there are of course primary, secondary, tertiary, and quaternary structures, there is only the “sequence”.
8. Comparing protein sequences to describe evolutionary processes
Multiple sequence alignments
Given the sequences:
INDUSTRY
INTERESTING
IMPORTANT

One example of an MSA is:
IN-DUST--RY
INTERESTING
IMPOR--TANT

But is it better than:
INDU--ST-RY
INTERESTING
IMPOR-T-ANT
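"Better" needs a scoring rule. One very simple option, used here only as an illustration, is a sum-of-pairs identity score: for every column, count the pairs of rows that carry the same residue, ignoring gaps. The function name `sum_of_pairs_identity` is hypothetical, and real alignment programs use substitution matrices and gap penalties rather than plain identity.

```python
from itertools import combinations


def sum_of_pairs_identity(rows):
    """Count identical residue pairs over all columns; gaps never match."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == b and a != "-":
                score += 1
    return score


if __name__ == "__main__":
    first = ["IN-DUST--RY", "INTERESTING", "IMPOR--TANT"]
    second = ["INDU--ST-RY", "INTERESTING", "IMPOR-T-ANT"]
    print("first :", sum_of_pairs_identity(first))
    print("second:", sum_of_pairs_identity(second))
```

Under this toy score the two alignments differ by a single matched pair, which shows how sensitive the ranking is to the scoring rule chosen.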
Multiple sequence alignments
I-N-DU-ST-RY    I--NDU-ST-RY-
I-NTERESTING    I--NTERESTING
IMPO-R--TANT    I-MPO-R--TANT

IN-DUTS--RY    INDU--ST-RY
INTERESTING    INTERESTING
IMPOR--TANT    IMPOR-T-ANT

I-NDUS--T-RY-    I-N--D--U-S-T-RY
INT-ERES-TING    I-N-TE-RE-S-TING
IMPOR--TAN--T    IM-PO--RTA-NT---
Multiple sequence alignments
Possible MSA:
I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

Entire column can NOT have only gaps!
I--NDU-ST-RY-
I--NTERESTING
I-MPO-R--TANT

Can NOT move residues around:
IN-DUTS--RY
INTERESTING
IMPOR--TANT

Possible:
INDU--ST-RY
INTERESTING
IMPOR-T-ANT

Very few matches!
I-NDUS--T-RY-
INT-ERES-TING
IMPOR--TAN--T

Too many gaps!
I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---
Which alignment pairs make the most sense?

AVGTLE
VLASID
VS.
AVGTLE
EKWVKV
More similar amino acids

A-VT-G-R-L-E
AA-TA-Q-V-IE
VS.
AVTG-RLE
AATAQ-IE
Fewer gaps

AVWF----VLIM
ALWFAMVFILIM
VS.
ESQG----KTDD
TQADGKCRTD
Gap location makes more sense because gaps are less frequent in nonpolar regions.
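"More similar amino acids" can be made concrete with the side-chain classes from earlier in these slides: count the aligned columns in which both residues fall in the same class. This is a rough sketch rather than a real substitution matrix; `GROUPS`, `GROUP_OF`, and `same_class_columns` are hypothetical names, and TYR and GLY are each treated as their own class.

```python
# One-letter codes grouped by the slide's classification:
# aromatic, aliphatic, amide, -OH, -SH, acidic, basic, then TYR and GLY alone.
GROUPS = ["FW", "AVLIMP", "NQ", "ST", "C", "DE", "HKR", "Y", "G"]
GROUP_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def same_class_columns(top: str, bottom: str) -> int:
    """Count aligned columns whose two residues share a class; gaps are skipped."""
    return sum(
        1
        for a, b in zip(top, bottom)
        if a != "-" and b != "-" and GROUP_OF[a] == GROUP_OF[b]
    )


if __name__ == "__main__":
    print(same_class_columns("AVGTLE", "VLASID"))
    print(same_class_columns("AVGTLE", "EKWVKV"))
```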
A multiple sequence alignment:
-CAPSRPLNENDDGR-QAFELIGTAVNM...
-CVPGRGEMEHDD-RDQVLELFGTVVNL...
-AVPKRAALQNDDGR-QGWELYGTVSAQ...
-AVPTKMNCFNDDGR-QSVNLIGTVSGN...
-ILPARTSMCNDDGR-QTIEMKGTPAGG...
--APGK--NGHKLV--Q-FELKGTYSRT...
AFAPRRIKMVNKLGR-QNFTLLGTFERT...
AYRPDRCNTCNKLGR-QDVELMGTDART...
-YRPEEWFGENKLGR-QSAELIGTDERS...
--APL-ETYWPKLGR-QTGALAGTNSAV...
--RPY-KAGWNKLGR-QSYELGGTNPYI...
---PARAKNMG---R-QSYHL--TMEWQ...
Chothia & Lesk. EMBO J. 5:823-826 (1986).
An example multiple sequence alignment. Conserved residues are indicated by color. Note that gaps tend to cluster together.
Also, gaps at the N- and C-terminal ends are more common. Why?
Regular expressions and sequence logos
Regular expressions provide a coarse-grained summary of an alignment segment.
Sequence logos essentially do the same, but without information loss (cf. http://en.wikipedia.org/wiki/Sequence_logo).
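To illustrate the difference, the sketch below builds a regular expression from a gap-free alignment segment by collapsing each column into the set of residues seen there, and separately keeps the per-column counts that a sequence logo would be drawn from. The five rows are a short segment taken from the first five sequences of the alignment shown earlier; the function names are hypothetical, and the code assumes the segment contains no gaps.

```python
import re
from collections import Counter


def column_regex(rows):
    """Collapse each column to a literal or a character class (gap-free input assumed)."""
    parts = []
    for column in zip(*rows):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def column_counts(rows):
    """Per-column residue counts: the frequency information the regex throws away."""
    return [Counter(column) for column in zip(*rows)]


if __name__ == "__main__":
    segment = ["ELIGT", "ELFGT", "ELYGT", "NLIGT", "EMKGT"]
    pattern = column_regex(segment)
    print(pattern)
    print(all(re.fullmatch(pattern, row) for row in segment))
    print(column_counts(segment)[0])
```

The regex records only which residues occur at a position, while the counts also record how often each occurs, which is the information a logo displays.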
A phylogenetic tree describes an evolutionary process. But from a more pragmatic viewpoint, it also visually describes the similarities and dissimilarities between sequences within a multiple alignment.
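The dissimilarities that a tree summarizes can be tabulated directly from the alignment. As a rough sketch, the code below computes the fraction of differing residues for each pair of rows, skipping columns where either row has a gap; a matrix like this is the usual input to distance-based tree building. The names `p_distance` and `distance_matrix` are illustrative, and the rows are one of the INDUSTRY/INTERESTING/IMPORTANT alignments from earlier.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("the two rows share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(rows):
    """Symmetric matrix of pairwise distances between alignment rows."""
    return [[p_distance(a, b) for b in rows] for a in rows]


if __name__ == "__main__":
    rows = ["IN-DUST--RY", "INTERESTING", "IMPOR--TANT"]
    for line in distance_matrix(rows):
        print([round(value, 2) for value in line])
```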