
Post on 07-Aug-2020


Page 1: Phylogenetic Trees • Protein Structure · Phylogenetic analysis variable position conserved single position conserved Transmembrane Cys can still for self-reacting SS when it mutates

Prof. Bystroff talks about BIOINFORMATICS

• Sequence database searching • Phylogenetic Trees • Protein Structure


hi

Page 2:

AAAGAGATTCTGCTAGCGGTCGGAGAGATGCTGCAGCGAGTCGGCC

Page 3:

AAAGAGATTCTGCTAGCGGTCGGAGAGATGCTGCAGCGAGTCGGCC

Page 4:

AAAGAGATTCTGCTAGCGGTCGG

AGAGATGCTGCAGCGAGTCGGCC

Page 5:

Protein sequence alignment uses a "substitution matrix".

[Figure: grid of substitution scores, with Sequence 1 along one axis and Sequence 2 along the other]

Page 6:

Find the best pathway through the substitution scores, and you have an alignment.

The pathway is found with a "dynamic programming" algorithm.
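The "best pathway" idea can be sketched as a small global alignment (Needleman-Wunsch style). This is an illustration under stated assumptions, not the scoring used in the lecture: the match, mismatch, and gap values are made up, and a real protein alignment would look up each pair of residues in a substitution matrix such as BLOSUM62 instead.

```python
def align(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming.

    The match/mismatch/gap scores are illustrative assumptions; a protein
    aligner would take pair scores from a substitution matrix instead.
    Returns (best score, aligned seq1, aligned seq2).
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: seq1 against gaps only
        score[i][0] = i * gap
    for j in range(1, cols):          # first row: seq2 against gaps only
        score[0][j] = j * gap

    # Fill the table: each cell keeps the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace the best pathway back from the bottom-right corner.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-"); i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


best, top, bottom = align("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths.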

Page 7:

BLAST searches millions of sequences

GenBank contains over 162 million sequences!!

The score for each should be the optimal alignment score. Even if we can do 1 per millisecond, it would take 45 hours to do one search. BLAST usually finishes in under a minute.
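The 45-hour estimate follows directly from the two numbers on the slide. A quick check, where the function name `search_hours` is only an illustrative choice:

```python
def search_hours(num_sequences, ms_per_alignment=1.0):
    """Hours needed to align one query against every database sequence."""
    seconds = num_sequences * ms_per_alignment / 1000.0
    return seconds / 3600.0


# 162 million sequences at 1 millisecond each, as quoted on the slide.
print(search_hours(162_000_000))
```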

How does BLAST do it so fast?

Page 8:

BLAST precalculates all triplet hits in the database.

My sequence has this triplet: PGQ

BLAST uses an expansion table to allow for near-perfect matches:

PGQ PGR PGS ... PGT PGV PGW PGY PAQ PCQ PDQ PEQ PFQ ...

BLAST saves a lookup table (called an INDEX) of the locations of all near-identity triplets in the whole database.

This is all done when BLAST is set up, before any searches are carried out.
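A toy version of the triplet index and expansion table might look like the sketch below. It is not BLAST's actual implementation: here a "near match" is assumed to be any triplet differing at one position, whereas BLAST decides neighbors with substitution scores and a threshold. The protein names and sequences are made up.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_index(database):
    """Map every triplet to the (protein, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - 2):
            index[seq[pos:pos + 3]].append((name, pos))
    return index


def neighbors(triplet):
    """Expansion table entry: the triplet plus all one-residue variants.

    Assumption for illustration only; BLAST scores candidate neighbors
    with a substitution matrix rather than counting differences.
    """
    result = {triplet}
    for i in range(3):
        for aa in AMINO_ACIDS:
            result.add(triplet[:i] + aa + triplet[i + 1:])
    return result


def find_hits(query, index):
    """Return (protein, query position, database position) for each hit."""
    hits = []
    for qpos in range(len(query) - 2):
        for word in neighbors(query[qpos:qpos + 3]):
            for name, dpos in index.get(word, []):
                hits.append((name, qpos, dpos))
    return hits


# Hypothetical two-protein database, built once before any search.
index = build_index({"prot1": "MKPGRLLV", "prot2": "AAPGQAA"})
print(sorted(find_hits("TPGQS", index)))
```

The index is built once, so each search only pays for lookups rather than for a scan of every sequence.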

Page 9:

BLAST finds diagonal arrangements of triplet hits

triplet hits in one database protein

Hits are joined by extension
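Hits that line up on the same diagonal share the same offset between database position and query position. A minimal sketch of grouping hits that way, assuming each hit is a (query position, database position) pair for one database protein; the extension step that joins hits is not shown:

```python
from collections import defaultdict


def group_by_diagonal(hits):
    """Group (query_pos, db_pos) hits by their diagonal, db_pos - query_pos."""
    diagonals = defaultdict(list)
    for qpos, dpos in hits:
        diagonals[dpos - qpos].append((qpos, dpos))
    return {diag: sorted(members) for diag, members in diagonals.items()}


# Made-up hits: three fall on diagonal 5, one stray hit on diagonal -3.
groups = group_by_diagonal([(0, 5), (1, 6), (2, 7), (4, 1)])
print(groups)
```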

Page 10:

BLAST scores only the best hits (saves time)

BLAST connects the diagonals (FASTA algorithm)

This protein is given a score, and we save it for later only if the score passes a cutoff.

Re-scoring.

Convert score to an e-value*.

Rank by e-value.

cutoff

*later...

Page 11:

Protein Databases available for BLAST search

Go to the BLAST search page (e.g. blastp), select a database to search, and then select ? to learn a little about that database.

Page 12:

Protein Databases available for BLAST search

On the BLAST search page, select a database to search and then select ? to learn a little about that database.

Page 13:

Protein Databases available for BLAST search

On the BLAST search page, select a database to search and then select ? to learn a little about that database.

Page 14:

Forms of BLAST

BLAST       query              database
blastn      nucleotide         nucleotide
blastp      protein            protein
tblastn     protein            translated DNA
blastx      translated DNA     protein
tblastx     translated DNA     translated DNA
psi-blast   protein, profile   protein
phi-blast   pattern            protein

Page 15:

How significant is that?

Please give me a number for...

...how likely the data would not have been the result of chance,...

...as opposed to... ...a specific inference.

Page 16:

e-value

A better metric of significance.

E-value = p-value × (number of attempts)

Page 17:

Scores from random alignments are used to calculate the p-value of an alignment score

[Figure: histogram of random alignment scores; horizontal axis: score, vertical axis: freq; the alignment score x is marked on the score axis]

$$\text{p-value of } x = \int_{x}^{\infty} f(s)\,ds$$

where $f$ is the normalized normal distribution fit to the random scores.
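The tail area described here can be computed with Python's standard library. This sketch follows the slide in fitting a normal distribution to the random scores; the function name `p_value` and the sample scores are made up for illustration.

```python
from statistics import NormalDist


def p_value(score, random_scores):
    """Tail area above `score` under a normal fit to the random scores."""
    fit = NormalDist.from_samples(random_scores)  # estimates mean and std dev
    return 1.0 - fit.cdf(score)


# Made-up scores from alignments of unrelated sequences.
random_scores = [40, 45, 50, 55, 60]
print(p_value(50, random_scores))
print(p_value(70, random_scores))
```

A score equal to the mean of the random scores gets a p-value of one half, and the p-value shrinks as the score moves into the upper tail.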

Page 18:

p-value is the significance of one (1) alignment score.

e-value is the significance of one score of many tries.

Searching a database of 162 million sequences for one hit is like trying 162 million times to get one good alignment. The number of times you will see that score by chance is the p-value times 162 million!

e-value = p-value * 162,000,000 (GenBank search)
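As a worked instance of the formula, assuming a hypothetical p-value of 1e-10 for a single alignment:

```python
GENBANK_SEQUENCES = 162_000_000  # database size quoted on the slide


def e_value(p_value, database_size=GENBANK_SEQUENCES):
    """Expected number of chance hits at least this good in one search."""
    return p_value * database_size


# The p-value below is a made-up example, not a result from a real search.
print(e_value(1e-10))
```

Even a very small p-value can give an e-value near or above one once it is multiplied by the number of sequences searched.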

Page 19:

Pop-quiz

BLAST HIT          e-value
1. annotation      3.0
2. annotation      3.0
3. annotation      3.0
4. annotation      3.0
5. annotation      3.0
6. annotation      3.0
7. annotation      3.0
8. annotation      3.0
9. annotation      3.0
10. annotation     3.0

How many of the above 10 hits are expected to be by chance?

Page 20:

Pop-quiz

BLAST HIT          e-value
1. annotation      1.0
2. annotation      2.0
3. annotation      3.0
4. annotation      4.0
5. annotation      5.0
6. annotation      6.0
7. annotation      7.0
8. annotation      8.0
9. annotation      9.0
10. annotation     10.0

How many of the above 10 hits are expected to be by chance?

Page 21:

Pop-quiz

BLAST HIT          e-value
1. annotation      0.0
2. annotation      0.01
3. annotation      0.01
4. annotation      0.01
5. annotation      0.02
6. annotation      0.02
7. annotation      0.02
8. annotation      0.02
9. annotation      0.02
10. annotation     10.0

How many of the above 10 hits are expected to be by chance?

Page 22:

Bioinformatics

• Sequence database searching • Phylogenetic Trees • Protein Structure

Page 23:

Evolutionary time

[Figure: the same four taxa (A, B, C, D) drawn as three kinds of tree]

Cladogram: branch lengths have no meaning
Phylogram: branch lengths show genetic change (branch lengths 1, 1, 1, 6, 3, 5)
Ultrametric tree: branch lengths show time

(D:5,(A:1,(C:1,B:6):1):3)

Parenthesis (Newick) notation has both labels and distances.
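To see how the parentheses encode both labels and distances, here is a minimal hand-written Newick reader. It is only a sketch: it handles the simple unquoted form shown on this slide, and tree toolkits provide complete parsers. The nested-tuple representation and function names are illustrative choices.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, branch_length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]            # empty for unlabeled internal nodes
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Sum branch lengths from the root down to each leaf."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


tree = parse_newick("(D:5,(A:1,(C:1,B:6):1):3)")
print(leaf_depths(tree))
```

Because the root-to-leaf sums differ between taxa, this example tree is a phylogram rather than an ultrametric tree.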

Page 24:

A multiple sequence alignment is made using many pairwise sequence alignments

Multiple Sequence Alignment

Page 25:

Construct a distance-based tree

[Figure: triangular table of pairwise distances among taxa A, B, C, D, E, F, with the values 97, 81, 77, 82, 59, 32, 80, 55, 31, 90, 65, 40, 61, 42, 33]

Draw tree here
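One common way to build a tree from pairwise distances is UPGMA (average-linkage clustering). The slide does not say which method is intended, so treat this as one possible approach. The four-taxon distances below are hypothetical and are not the values from the slide.

```python
from itertools import combinations


def upgma(names, distance):
    """Merge the closest pair of clusters until one tree remains.

    names:    list of taxon labels
    distance: dict mapping frozenset({a, b}) to the distance between a and b
    Returns the tree as nested pairs of labels.
    """
    # Each cluster is (subtree, list of the leaves it contains).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(distance[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Hypothetical distances for illustration only.
distance = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
print(upgma(["A", "B", "C", "D"], distance))
```

The closest pair is joined first, then the next closest cluster, so the most similar taxa end up deepest in the nesting.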

Page 26:

Life is not strictly a tree -- horizontal gene transfer

BF Smets, T Barkay (2005) “Horizontal gene transfer: perspectives at a crossroads of scientific disciplines” Nature Reviews Microbiology.

Discrete Steps Needed for Stability of Gene Transfer

Stably incorporating horizontally transferred genes into a recipient genome involves five distinct steps (Fig. 1).

1. First, a particular segment of DNA or RNA is prepared for transfer from the donor strain through one of several processes, including excision and circularization of conjugative transposons, initiation of conjugal plasmid transfer by synthesis of a mating pair-formation protein complex, or packaging of nucleic acids into phage virions.
2. Next, the segment is transferred either by conjugation, which requires contact between the donor and recipient cells, or by transformation and transduction without direct contact.
3. During the third step, genetic material enters the recipient cell, where cell exclusion may abort the transfer.
4. Otherwise, during the fourth step, the incoming gene is integrated into the recipient genome by legitimate or site-specific recombination or by plasmid circularization and complementary strand synthesis. Barriers to transfer during this step come from restriction modification systems, failure to integrate and replicate within the new host genome, and incompatibility with resident plasmids.
5. In the final step, transferred genes are replicated as part of the recipient genome and transmitted to daughter cells in stable fashion over successive generations.

Researchers from different disciplines tend to focus on specific stages within this five-step sequence. Thus, evolutionary biologists who examine microbial genomes for evidence of past transfers tend to look at HGTs from the perspective of step five. Molecular biologists are more likely to examine the details of the transfer events, while microbial ecologists look more broadly when they describe the magnitude and diversity of the mobile gene pool, sometimes called the mobilome.

Page 27:

Sequence homology trees are complicated by paralogy

Orthologs: homologs originating from a speciation event.
Paralogs: homologs originating from a gene duplication event.

[Figure: three tree panels. True Species tree: clam, duck, crab, fish. Sequence tree !!: clam A, duck A, crab B, fish A, duck B, fish B. Reconciled trees: clam A, crab B, duck A, duck B, fish A, fish B, with events labeled duplication, speciation, and gene loss.]

Page 28:

Use orthologs

• To make the right inferences about evolution, make sure your phylogenetic tree is composed of orthologs

How do you know it's an ortholog?
1. It has the same function in both species.
2. It has about the same number of differences across species as other orthologs.
3. You don't.

Page 29:

Functional inference from multiple sequence alignments

Conserved / Not conserved

folding

function

Page 30:

Functional inference from multiple sequence alignments

Conserved / Not conserved

folding

function

stability

kinetics

enzyme activity

binding

post-translational modification

Page 31:

Conserved / Not conserved

folding

function

stability

kinetics

enzyme activity

binding

post-translational modification

species differences

Page 32:

Next time:

• Visit rcsb.org

• Try visualizing a protein.

• Locate a residue that is conserved across all species in a BLAST search.

• Locate one that is conserved except in one species. What might be its function?

Page 33:

Phylogenetic analysis

[Figure: repeated panels of units numbered 1 to 4, followed by a phylogenetic analysis of the positions Mm1:C362, Mm1:S393, Mm1:C499, Mm2:C210, Mm3:C96, Mm3:C197, Mm4:C114, Mm4:C170, Mm4:C233. Legend: variable position conserved; single position conserved.]

A transmembrane Cys can still form a self-reacting SS when it mutates to a new position. Therefore, variable-position conserved Cys are self-reacting. Single-position conserved Cys are cross-reacting.

Page 34:

[Figure: phylogenetic analysis of the positions Mm1:C362 and Mm1:S393]

In this case, mammals found to be missing conserved cysteines in the sperm-specific calcium channel CatSper were species that lacked sperm competition.

http://etetoolkit.org

Page 35:

Format of sequence alignment for ETE tree

>Squirrel
FLVVCLNT---CIFLCIYV---LTLMFTCLF---LLRICRVLR---VSICTSEFA---LGFCLFGI---LTILVCEV---LVHVCMAV---ICITQDGW
>Beaver
FVTVCLNT---CIFLCIYV---LILMFTCMF---LLRICRVLR---VSICTSEFF---LGFCLFGI---LTILICEV---LVHVCMAV---ICITQDGW
>Blind mole rat
FLVVCLNT---SIFLCIYI---LTLMFTCLF---LLRICRVLK---VSTYACEFF---LGFCLFGV---LTILTCEV---LVHVCMAV---ICITQDGW
>Mouse
FIVVCLNT---SIFLSIYV---LTLMFTCLF---LLRVCRVLR---VSVYVCEFL---LGFCLFGV---LTILICEV---LVHVCMAV---ICITQDGW
>Pika
FLVICLNT---CIFLSIYV---LTLMFTCLF---LLRICRVLR---VSIYASEFS---LGFCLFGT---LTILICEV---LLHVCMSV---ICITQDGW
>Rabbit
FLVVCLNT---CIFLCIYM---FVLMFTCLF---LLRICRVLR---VSIYASEFS---LGFCLFGA---LTILFCEV---LLHVCMAV---ICITQDGW
>Gibbon
FFVVCLNT---SIFFCIYV---LILMFTCLF---LLRICRVLR---VSICTSELF---LGFCLFGS---LTILICEV---LVHVCMAV---ICITQDGW
>Monkey
FFIVCLNT---SIFFCIYV---LILMFTCLF---FLRICRVLR---VSICTSELA---LGFCLFGS---LTILICEV---LVHVCMAV---ICITQDGW
>Bushbaby
FFIICLNT---CIFFCIYV---LILMFTCLF---FLRICRVLR---VGIYSAEFY---LGFCLFGV---LSILVCEV---LIHVCMAV---ICITQDGW
>Lemur
FFIICLNT---AIFFSIYL---LILMFTCLF---FLRICRVLR---VSIYSSEFV---LGFCLFGV---LTILICEV---LVHVCMAV---ICITQDGW
>Sifaka
FFVICLNT---SIFFCIYV---LILMFTCLF---FLRICRVLR---VSIYSSEFS---LGFCLFGV---LTILICEV---LVHVCMAV---ICITQDGW

FASTA format. Output by most alignment programs and packages.
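An alignment in this format is easy to read with a few lines of code. The sketch below parses FASTA text and reports the columns where every sequence has a cysteine, which is one simple reading of "single position conserved". The function names are illustrative, and the three records are the first two segments of the Squirrel, Mouse, and Pika rows shown on the slide.

```python
def read_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line     # sequences may span several lines
    return records


def conserved_columns(records, residue="C"):
    """Columns (0-based) where every aligned sequence has `residue`."""
    seqs = list(records.values())
    return [i for i in range(len(seqs[0]))
            if all(seq[i] == residue for seq in seqs)]


alignment = """\
>Squirrel
FLVVCLNT---CIFLCIYV
>Mouse
FIVVCLNT---SIFLSIYV
>Pika
FLVICLNT---CIFLSIYV
"""
print(conserved_columns(read_fasta(alignment)))
```

Columns where only some species have the cysteine are not reported, so detecting a cysteine that moves between nearby positions would need a different test.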

http://etetoolkit.org

Page 36:

Format of tree for ETE tree

(Beaver:0.106861,(('Blind mole rat':0.0870003,Mouse:0.128141):0.0287991,('Naked mole rat':0.316691,((Pika:0.0584227,Rabbit:0.0514835):0.0419969,(((Gibbon:0.062089,Monkey:0.0723263):0.0501071,(Bushbaby:0.104971,(Lemur:0.0853643,Sifaka:0.0510973):0.0395091):0.00449631):0.0111712,((Marsupials:0.390453,((Manatee:0.0517099,Mole:0.11989):0.00669083,'Elephant shrew':0.143216):0.0258566):0.00193119,('Star-nosed mole':0.111938,((Alpaca:0.102393,Pig:0.0618056):0.010159,(Leopard:0.0585696,('Brown bat':0.108369,('Fruit bat':0.0725375,('Horseshoe bat':0.0640651,'Leaf-nosed bat':0.0497925):0.0219586):0.00813098):0.00277678):0.00371913):0.00419253):0.00732498):0.0142435):0.00485045):0.00731201):0.00875856):0.00305836,Squirrel:0.0786295);

Newick format. Output by UGENE, NCBI, and most tree tools.

http://etetoolkit.org

Page 37:

Protein Data Bank

• rcsb.org

• 4ms2, a voltage-gated calcium channel.

1) visualize overall structure in NGL
2) view ligands
3) view electron density
4) find an amino acid. Zoom in.
5) Homology modeling.
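Finding an amino acid can also be done on the downloaded coordinate file. The sketch below assumes the classic fixed-column PDB text format, where ATOM records carry the residue name in columns 18-20, the chain in column 22, and the residue number in columns 23-26. The helper names and the tiny example records are made up; they are not taken from 4ms2.

```python
def find_residues(pdb_text, residue_name):
    """Return sorted (chain, residue number) pairs for one residue type."""
    found = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[17:20] == residue_name:
            chain = line[21]
            number = int(line[22:26])
            found.add((chain, number))
    return sorted(found)


def atom_line(serial, atom, residue, chain, number):
    """Build a shortened ATOM record (no coordinates) for the demo."""
    return f"ATOM  {serial:>5} {atom:<4} {residue:>3} {chain}{number:>4}"


# Hypothetical records, only to show the column layout being read.
example = "\n".join([
    atom_line(1, "N", "CYS", "A", 10),
    atom_line(2, "CA", "CYS", "A", 10),
    atom_line(3, "N", "GLY", "A", 11),
    atom_line(4, "N", "CYS", "B", 42),
])
print(find_residues(example, "CYS"))
```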

Page 38:

superposed homologs

Page 39:

Page 40:

Page 41:

Homology modeling in a nutshell.

ALIGNMENT

target    ACDEFG....HIKLMNPQRSTVWY
           ||:||    || :|  | ||||:
template  .CDDFGACDGHIYIM..QQSTVWF

Modeling action...
• Add Ala to the N-terminal Cys using energy minimization.
• Keep the conserved Phe sidechain and backbone.
• Cut out the four-residue insertion and connect G to H.
• Switch non-similar sidechains Y->K. Possibly move backbone.
• Cut at M-Q, insert two residues, Asn-Pro.
• Switch similar sidechains F->Y. Keep backbone fixed.
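The modeling actions can be read off the alignment column by column. A minimal sketch, using the target and template strings from this slide; the four action labels are illustrative names, and no structure is actually built here.

```python
from collections import Counter


def modeling_actions(target, template, gap="."):
    """Classify each alignment column by what the modeler must do."""
    actions = []
    for t, m in zip(target, template):
        if t == gap:
            actions.append("delete")   # template residue absent from target
        elif m == gap:
            actions.append("insert")   # target residue with no template
        elif t == m:
            actions.append("keep")     # identical: keep sidechain and backbone
        else:
            actions.append("mutate")   # different: switch the sidechain
    return actions


target = "ACDEFG....HIKLMNPQRSTVWY"
template = ".CDDFGACDGHIYIM..QQSTVWF"
print(Counter(modeling_actions(target, template)))
```

Columns labeled "mutate" cover both the similar and the non-similar substitutions; telling them apart would need a substitution matrix.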

Page 42:

Automatic homology modeling by SWISS-MODEL

https://swissmodel.expasy.org/interactive

Page 43:

Next time:

• Read CatSper paper: Bystroff, C. (2018). Intramembranal disulfide cross-linking elucidates the super-quaternary structure of mammalian CatSpers. Reproductive Biology, 18(1), 76-82.

Read at least one part of the paper in detail. Bring a comment, suggestion, or question to class 2/19.