4. Profile HMMs - eclass.uoa.gr
![Page 1: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/1.jpg)
Profile HMMs for sequence families
1. So far we have concentrated on the intrinsic properties of single sequences,
such as CpG islands.
2. However, functional biological sequences typically come in families, so what
we are after is identifying the relationship of an individual sequence to a
sequence family.
3. A multiple sequence alignment can show how the sequences in a family
relate to each other.
4. Some positions in a multiple sequence alignment are more conserved than
others (e.g. helices as opposed to loop regions).
5. Therefore it would be desirable, when identifying new sequence members, to
concentrate more on conserved features.
6. For this task we will discuss a special type of HMM, well suited to modelling
multiple alignments; we will call these profile HMMs.
7. Profile HMMs are the most popular application of HMMs in molecular biology.
![Page 11: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/11.jpg)
Position-specific score matrix
A common feature of protein multiple sequence alignments is that gaps tend to line up
with each other, leaving solid blocks with no insertions or deletions.
A probabilistic model for such a conserved region (block) specifies independent
probabilities $e_i(a)$ of observing amino acid $a$ in position $i$. The probability of a new
sequence $x$ under this model $M$ is:
$$P(x \mid M) = \prod_{i=1}^{L} e_i(x_i)$$
where $L$ is the length of the block. To test for family membership we evaluate, as usual,
the log-odds ratio:
$$S = \sum_{i=1}^{L} \log \frac{e_i(x_i)}{q_{x_i}}$$
where $q_a$ is the probability of amino acid $a$ under the random (background) model.
The values $\log(e_i(a)/q_a)$ behave like elements of a score matrix $s(a, b)$ in which the
second index is the position $i$ rather than an amino acid $b$.
Such an approach is known as a position-specific score matrix (PSSM).
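The log-odds score above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_pssm` and `pssm_score`, the two-letter alphabet, and all probabilities are made up for the example and are not taken from the slides.

```python
import math

def build_pssm(emissions, background):
    """Turn per-position emission probabilities e_i(a) into log-odds scores.

    emissions  : list of dicts, one per block position, mapping symbol -> e_i(a)
    background : dict mapping symbol -> q_a (random-model probability)
    """
    return [
        {a: math.log(p / background[a]) for a, p in column.items()}
        for column in emissions
    ]

def pssm_score(pssm, x):
    """Sum log(e_i(x_i) / q_{x_i}) over the block; x must have length L."""
    if len(x) != len(pssm):
        raise ValueError("sequence length must equal the block length L")
    return sum(column[a] for column, a in zip(pssm, x))

# Hypothetical two-position block over a toy alphabet {A, C}.
toy_emissions = [{"A": 0.8, "C": 0.2}, {"A": 0.2, "C": 0.8}]
toy_background = {"A": 0.5, "C": 0.5}
toy_pssm = build_pssm(toy_emissions, toy_background)
print(pssm_score(toy_pssm, "AC"))  # positive: fits the block
print(pssm_score(toy_pssm, "CA"))  # negative: fits the background better
```

A positive score means the sequence is more probable under the block model than under the background model. Because positions are scored independently and no gaps are allowed, the sequence must have exactly the block length.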
![Page 12: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/12.jpg)
Profile HMMs: transition probabilities (a) and emission probabilities (e)
![Page 13: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/13.jpg)
Profile HMMs: transition structure
• Match states
• Insertion states
• Deletion states
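As a rough illustration of how the three state types connect, the sketch below enumerates the transitions of a profile HMM with a given number of match states. It assumes the usual textbook layout, which is not spelled out in the slide text: Begin and End are treated as M_0 and M_{L+1}, insert states I_0..I_L loop on themselves, and every state at position j can move to the match or delete state at position j+1 or to the insert state at position j. The function name is hypothetical.

```python
def profile_hmm_transitions(length):
    """List (source, target) pairs for a profile HMM with `length` match states.

    Assumed layout: Begin = M_0, End = M_{length+1}, insert states I_0..I_length,
    delete states D_1..D_length.
    """
    def match(j):
        if j == 0:
            return "Begin"
        if j == length + 1:
            return "End"
        return f"M{j}"

    edges = []
    for j in range(length + 1):
        sources = [match(j), f"I{j}"]
        if j >= 1:                      # there is no delete state at position 0
            sources.append(f"D{j}")
        for src in sources:
            edges.append((src, match(j + 1)))      # advance to next match (or End)
            edges.append((src, f"I{j}"))           # insert before next position
            if j + 1 <= length:
                edges.append((src, f"D{j + 1}"))   # skip the next match state
    return edges

for src, dst in profile_hmm_transitions(2):
    print(f"{src} -> {dst}")
```

Under these assumptions a model with L match states has 9L + 3 transitions. Some implementations omit the insert-to-delete and delete-to-insert transitions, which would shorten the list accordingly.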
![Page 15: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/15.jpg)
Profile HMMs: Viterbi
Let $V_j^M(i)$ be the log-odds score of the best path matching
subsequence $x_{1 \ldots i}$ to the submodel up to state $j$, ending with $x_i$ being emitted by state $M_j$ (match).
Similarly, $V_j^I(i)$ and $V_j^D(i)$ are the scores of the best paths ending in state $I_j$ (insertion) and $D_j$ (deletion), respectively.
Then we can write:
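In the standard textbook formulation (given here as an assumption, with $e$ the emission probabilities, $q$ the background probabilities and $a$ the transition probabilities), the recurrences take the form:

$$
\begin{aligned}
V_j^M(i) &= \log\frac{e_{M_j}(x_i)}{q_{x_i}} + \max\begin{cases} V_{j-1}^M(i-1) + \log a_{M_{j-1}M_j} \\ V_{j-1}^I(i-1) + \log a_{I_{j-1}M_j} \\ V_{j-1}^D(i-1) + \log a_{D_{j-1}M_j} \end{cases} \\
V_j^I(i) &= \log\frac{e_{I_j}(x_i)}{q_{x_i}} + \max\begin{cases} V_j^M(i-1) + \log a_{M_j I_j} \\ V_j^I(i-1) + \log a_{I_j I_j} \\ V_j^D(i-1) + \log a_{D_j I_j} \end{cases} \\
V_j^D(i) &= \max\begin{cases} V_{j-1}^M(i) + \log a_{M_{j-1}D_j} \\ V_{j-1}^I(i) + \log a_{I_{j-1}D_j} \\ V_{j-1}^D(i) + \log a_{D_{j-1}D_j} \end{cases}
\end{aligned}
$$

A minimal Python sketch of these recurrences follows. It makes two simplifying assumptions that are not part of the general model: transition log-probabilities depend only on the state types (not on the position $j$), and insert states emit with the background distribution, so their log-odds emission score is 0. The function name, parameter layout and toy numbers are hypothetical.

```python
NEG_INF = float("-inf")

def profile_viterbi_score(x, match_scores, trans):
    """Best-path log-odds score of sequence x against a profile HMM.

    match_scores[j-1][a] : log(e_Mj(a) / q_a) for match state j = 1..L
    trans[(s, t)]        : log transition probability from state type s to
                           state type t, with s, t in {"M", "I", "D"}
                           (assumed position-independent in this sketch)
    Insert emissions are assumed equal to the background, so they score 0.
    """
    n, L = len(x), len(match_scores)
    VM = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VI = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VD = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # Begin state, treated as M_0

    def best(j, i, target):
        # Best way to enter a state of type `target` from position j, prefix i.
        return max(VM[j][i] + trans[("M", target)],
                   VI[j][i] + trans[("I", target)],
                   VD[j][i] + trans[("D", target)])

    for i in range(n + 1):
        for j in range(L + 1):
            if j >= 1 and i >= 1:
                VM[j][i] = match_scores[j - 1][x[i - 1]] + best(j - 1, i - 1, "M")
            if i >= 1:
                VI[j][i] = best(j, i - 1, "I")   # insert emission scores 0
            if j >= 1:
                VD[j][i] = best(j - 1, i, "D")   # delete states emit nothing
    return best(L, n, "M")  # End state, treated as M_{L+1}

# Toy model with two match states over the alphabet {A, C}.
toy_match = [{"A": 2.0, "C": -1.0}, {"A": -1.0, "C": 2.0}]
toy_trans = {("M", "M"): -0.1, ("M", "I"): -3.0, ("M", "D"): -3.0,
             ("I", "M"): -0.5, ("I", "I"): -1.0, ("I", "D"): -4.0,
             ("D", "M"): -0.5, ("D", "I"): -4.0, ("D", "D"): -1.0}
print(profile_viterbi_score("AC", toy_match, toy_trans))
```

The sketch returns only the score. Recovering the alignment itself would additionally require storing, for each cell, which of the three predecessors gave the maximum and tracing back from the End state.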
![Page 16: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/16.jpg)
Profile HMMs: model construction
![Page 17: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/17.jpg)
![Page 18: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/18.jpg)
![Page 19: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/19.jpg)
![Page 20: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/20.jpg)
![Page 21: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/21.jpg)
![Page 22: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/22.jpg)
![Page 23: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/23.jpg)
![Page 24: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/24.jpg)
![Page 25: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/25.jpg)
![Page 26: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/26.jpg)
![Page 27: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/27.jpg)
![Page 28: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/28.jpg)
![Page 29: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/29.jpg)
![Page 30: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/30.jpg)
![Page 31: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/31.jpg)
![Page 32: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/32.jpg)
![Page 33: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/33.jpg)
![Page 34: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/34.jpg)
![Page 35: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/35.jpg)
HMMER
hmmalign - align sequences to a profile HMM
hmmbuild - construct profile HMM(s) from multiple sequence alignment(s)
hmmconvert - convert profile file to a HMMER format
hmmemit - sample sequences from a profile HMM
hmmfetch - retrieve profile HMM(s) from a file
hmmpress - prepare an HMM database for hmmscan
hmmscan - search sequence(s) against a profile database
hmmsearch - search profile(s) against a sequence database
hmmsim - collect score distributions on random sequences
hmmstat - display summary statistics for a profile file
jackhmmer - iteratively search sequence(s) against a protein database
phmmer - search protein sequence(s) against a protein sequence database
SOURCE: http://hmmer.janelia.org/
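To show how some of these programs fit together, the sketch below assembles the command lines for two common workflows: building a profile from an alignment and searching it against a sequence database, and pressing a profile database so that sequences can be scanned against it. It only constructs and prints the commands. All file names are hypothetical placeholders, only the basic positional arguments are used, and the helper function is an illustration rather than part of HMMER.

```python
import shlex

def hmmer_workflow(msa, hmm, seqdb, hmmdb, query):
    """Return HMMER command lines as argument lists (nothing is executed).

    msa   : multiple sequence alignment file (input to hmmbuild)
    hmm   : profile HMM file written by hmmbuild
    seqdb : sequence database searched with the profile
    hmmdb : file of profile HMMs to be prepared by hmmpress
    query : sequence file scanned against the pressed profile database
    """
    return [
        ["hmmbuild", hmm, msa],       # alignment -> profile HMM
        ["hmmsearch", hmm, seqdb],    # profile vs. sequence database
        ["hmmpress", hmmdb],          # prepare profile database for hmmscan
        ["hmmscan", hmmdb, query],    # sequence(s) vs. profile database
    ]

# Hypothetical file names, for illustration only.
for argv in hmmer_workflow("family.sto", "family.hmm", "proteins.fasta",
                           "profiles.hmm", "query.fasta"):
    print(shlex.join(argv))
```

To actually run a step, each argument list could be passed to `subprocess.run(argv, check=True)` on a machine where HMMER is installed.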
![Page 47: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/47.jpg)
![Page 53: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/53.jpg)
LENG: Model length; the number of match states in the model.
ALPH: Symbol alphabet type.
RF: Reference annotation flag. Reference column annotation is picked up from a Stockholm alignment file’s #=GC RF line.
CS: Consensus structure annotation flag. It is picked up from a Stockholm file’s #=GC SS_cons line.
MAP: Map annotation flag. The HMM/alignment map annotates each match state with the index of the alignment column from which it came. It can be used for quickly mapping any subsequent HMM alignment back to the original multiple alignment, via the model.
NSEQ: The number of sequences that the HMM was trained on.
EFFN: The effective total number of sequences determined by hmmbuild during sequence weighting, for combining observed counts with Dirichlet prior information in parameterizing the model.
CKSUM: Training alignment checksum. This number is calculated from the training sequence data, and used in conjunction with the alignment map information to verify that a given alignment is indeed the alignment that the map is for.
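The header fields described above can be read with a few lines of code. The following is a minimal sketch, assuming a header made of whitespace-separated `TAG value` lines; the sample header and all of its values are made up for illustration, and the leading format/version line of a real file is omitted.

```python
# Illustrative header only: the name and numbers are invented for this sketch.
SAMPLE_HEADER = """\
NAME  example
LENG  149
ALPH  amino
RF    no
CS    no
MAP   yes
NSEQ  4
EFFN  0.964844
CKSUM 2027839109
"""

INT_FIELDS = {"LENG", "NSEQ", "CKSUM"}
FLOAT_FIELDS = {"EFFN"}
FLAG_FIELDS = {"RF", "CS", "MAP"}  # stored as "yes" / "no"


def parse_header(text):
    """Parse 'TAG value' header lines into a dict with typed values."""
    header = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or value-less lines
        tag, value = parts[0], parts[1].strip()
        if tag in FLAG_FIELDS:
            header[tag] = value == "yes"
        elif tag in INT_FIELDS:
            header[tag] = int(value)
        elif tag in FLOAT_FIELDS:
            header[tag] = float(value)
        else:
            header[tag] = value
    return header


header = parse_header(SAMPLE_HEADER)
print(header["LENG"], header["ALPH"], header["MAP"], header["EFFN"])
```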
![Page 56: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/56.jpg)
STATS: Statistical parameters needed for E-value calculations.
HMM: Flags the start of the main model section.
COMPO: The model’s overall average match state emission probabilities, which are used as a background residue composition in the “filter null” model.
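To illustrate how such statistical parameters can turn a bit score into an E-value, here is a hedged sketch. It assumes, as described in the HMMER3 documentation, that Viterbi-type scores follow a Gumbel distribution with location mu and slope lambda, that Forward scores have an exponential tail starting at tau with slope lambda, and that the E-value is the P-value multiplied by the number of targets searched. The parameter values below are invented for illustration.

```python
import math


def gumbel_pvalue(score, mu, lam):
    """P(S >= score) under a Gumbel distribution (location mu, slope lam)."""
    # 1 - exp(-exp(-lam * (score - mu))), written with expm1 for accuracy.
    return -math.expm1(-math.exp(-lam * (score - mu)))


def exp_tail_pvalue(score, tau, lam):
    """P(S >= score) under an exponential tail starting at tau."""
    return min(1.0, math.exp(-lam * (score - tau)))


def evalue(pvalue, n_targets):
    """Expected number of hits at least this good among n_targets."""
    return pvalue * n_targets


# Hypothetical parameters (assumptions for this sketch only).
mu, tau, lam = -9.9, -5.0, 0.71
n_targets = 500_000

for score in (10.0, 25.0, 50.0):
    e_vit = evalue(gumbel_pvalue(score, mu, lam), n_targets)
    e_fwd = evalue(exp_tail_pvalue(score, tau, lam), n_targets)
    print(f"score={score:5.1f}  E(Gumbel)={e_vit:.3g}  E(exp tail)={e_fwd:.3g}")
```

In both cases the E-value falls roughly exponentially as the bit score increases, which is why a modest gain in score can change a hit from insignificant to highly significant.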
![Page 57: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/57.jpg)
Two lines of the main model section contain information for the core model’s BEGIN node. This is stored as model node 0, and match state 0 is treated as the BEGIN state.
The begin state is mute, so there are no match emission probabilities. The first line is the insert 0 emissions. The second line contains the transitions from the begin state and insert state 0.
These seven numbers are: B → M1, B → I0, B → D1; I0 → M1, I0 → I0; then a 0.0 and a ‘*’, because by convention the nonexistent transitions from the nonexistent delete state 0 are set to log 1 = 0 and log 0 = -∞ = ‘*’.
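The seven transition fields can be converted back to probabilities as sketched below. This assumes the values are stored as negative natural logarithms of probabilities, with '*' standing for probability zero; the sample line is made up for illustration, and the labels D0->M1 and D0->D1 are names chosen here for the two placeholder fields.

```python
import math

# Order of the seven fields on the BEGIN node's transition line.
BEGIN_LABELS = ["B->M1", "B->I0", "B->D1", "I0->M1", "I0->I0", "D0->M1", "D0->D1"]


def to_prob(field):
    """Convert a stored -ln(p) field to p; '*' means p = 0."""
    return 0.0 if field == "*" else math.exp(-float(field))


def parse_begin_transitions(line):
    """Map the seven fields of the BEGIN transition line to probabilities."""
    fields = line.split()
    if len(fields) != len(BEGIN_LABELS):
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return dict(zip(BEGIN_LABELS, map(to_prob, fields)))


# Illustrative transition line (values invented for this sketch).
SAMPLE = "0.01467  4.62483  5.34718  0.61958  0.77255  0.00000        *"
begin = parse_begin_transitions(SAMPLE)

# Transitions out of B, and out of I0, should each sum to about 1.
print(sum(begin[k] for k in ("B->M1", "B->I0", "B->D1")))
print(begin["I0->M1"] + begin["I0->I0"])
```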
![Page 60: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/60.jpg)
Each node is described by three lines:
Line 1: Match emission line, followed by the MAP annotation, the RF annotation and the CS annotation for this node
Line 2: Insert emission line
Line 3: State transition line
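A node's three lines could be read as in the sketch below. It assumes the layout described on the slide (node index, one match emission per alphabet symbol, then the MAP, RF and CS fields), values stored as negative natural log probabilities, and a toy four-letter DNA alphabet to keep the sample short. All numbers are invented for illustration.

```python
import math

TRANSITIONS = ["M->M", "M->I", "M->D", "I->M", "I->I", "D->M", "D->D"]


def to_prob(field):
    """Convert a stored -ln(p) field to p; '*' means p = 0."""
    return 0.0 if field == "*" else math.exp(-float(field))


def parse_node(lines, alphabet):
    """Parse the match, insert and transition lines of one model node."""
    k = len(alphabet)
    fields = lines[0].split()
    if len(fields) != 1 + k + 3:
        raise ValueError("unexpected number of fields on match emission line")
    map_field, rf, cs = fields[1 + k:]
    return {
        "node": int(fields[0]),
        "match": dict(zip(alphabet, map(to_prob, fields[1:1 + k]))),
        "map": None if map_field == "-" else int(map_field),
        "rf": rf,
        "cs": cs,
        "insert": dict(zip(alphabet, map(to_prob, lines[1].split()))),
        "transitions": dict(zip(TRANSITIONS, map(to_prob, lines[2].split()))),
    }


# Made-up node for a 4-symbol alphabet (assumption for this sketch).
NODE_LINES = [
    "1  0.35667  2.30259  2.30259  2.30259  1 - -",
    "   1.38629  1.38629  1.38629  1.38629",
    "   0.10536  2.99573  2.99573  0.69315  0.69315  0.69315  0.69315",
]
node = parse_node(NODE_LINES, "ACGT")
print(node["node"], node["map"], max(node["match"], key=node["match"].get))
```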
![Page 61: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/61.jpg)
![Page 69: 4. Profile HMMs - eclass.uoa.gr · Profile HMMs for sequence families 1. So far we have concentrated on the intrinsic properties of single sequences , such as CpGislands . 2. However](https://reader034.vdocuments.site/reader034/viewer/2022052519/5f12a180114dfa37344221e8/html5/thumbnails/69.jpg)
score: The bit score for this domain.
bias: The biased composition (null2) score correction that was applied to the domain bit score.
c-Evalue: The “conditional E-value”, a permissive measure of how reliable this particular domain may be. The conditional E-value is calculated on a smaller search space than the independent E-value: it uses the number of targets that pass the reporting thresholds.
i-Evalue: The “independent E-value”, the E-value that the sequence/profile comparison would have received if this were the only domain envelope found in it, excluding any others. This is a stringent measure of how reliable this particular domain may be. The independent E-value uses the total number of targets in the target database.
hmmfrom, hmm to: The start/end of the alignment of this domain with respect to the profile.
alifrom, ali to: The start/end of the alignment of this domain with respect to the sequence.
envfrom, env to: The start/end of the domain envelope on the sequence, numbered 1..L for a sequence of L residues. The envelope defines a subsequence for which there is substantial probability mass supporting a homologous domain.
acc: The mean posterior probability of aligned residues in the alignment; a measure of how reliable the overall alignment is.
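The columns above can be pulled out of a row of the per-domain table as sketched here. The sketch assumes the row has 16 whitespace-separated fields: the domain index, a '!' or '?' flag, the numeric columns in the order listed above, and a two-character boundary symbol (such as `[]` or `..`) after each coordinate pair. The sample row is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DomainRow:
    index: int
    passes_inclusion: bool  # '!' flag; '?' means below inclusion thresholds
    score: float
    bias: float
    c_evalue: float
    i_evalue: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    acc: float


def parse_domain_row(line):
    """Parse one row of the per-domain table into a DomainRow."""
    t = line.split()
    if len(t) != 16:
        raise ValueError(f"expected 16 fields, got {len(t)}")
    # t[8], t[11] and t[14] are the boundary symbols and are skipped here.
    return DomainRow(
        index=int(t[0]), passes_inclusion=(t[1] == "!"),
        score=float(t[2]), bias=float(t[3]),
        c_evalue=float(t[4]), i_evalue=float(t[5]),
        hmm_from=int(t[6]), hmm_to=int(t[7]),
        ali_from=int(t[9]), ali_to=int(t[10]),
        env_from=int(t[12]), env_to=int(t[13]),
        acc=float(t[15]),
    )


# Made-up row (values are assumptions for this sketch only).
ROW = "   1 !  148.6   0.0   6.1e-50   1.4e-48       1     148 []       2     147 .]       2     147 .]  0.99"
hit = parse_domain_row(ROW)
print(hit.score, hit.i_evalue, (hit.env_from, hit.env_to), hit.acc)
```

Keeping the envelope and alignment coordinates as separate fields makes it easy to check, for example, that the aligned region lies inside the envelope.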