EMBOSS – an application suite for Bioinformatics
Shahid Manzoor
Adnan Niazi
SLU Global Bioinformatics Centre
E – European
M – Molecular
B – Biology
O – Open
S – Software
S – Suite
All Information
EMBOSS info at http://emboss.sourceforge.net/.
wEMBOSS info at http://wemboss.sourceforge.net/.
E-mail [email protected] to get a username and password for
wEMBOSS at http://ebiokit.hgen.slu.se/.
What is EMBOSS
Open Source molecular biology analysis package.
Handles a variety of common file formats.
Provides libraries for easy development.
Software licensed under the GPL and LGPL.
wEMBOSS, the web interface to EMBOSS, was developed by Martin Sarachu and Marc Colet.
Available at http://emboss.sourceforge.net
Features of EMBOSS
A comprehensive set of sequence analysis programs.
All sequence formats and many alignment and structural formats are handled.
It runs on practically every UNIX you can think of (and likely some that you can't), plus Windows and OS X.
Each application has the same style of interface, so master one and you've mastered them all.
Uses for EMBOSS
Sequence alignment.
Protein motif identification (including domain analysis).
Nucleotide sequence pattern analysis (for example, to identify CpG islands or repeats).
Presentation tools for publications.
Programs in EMBOSS
More than 140 programs in the package, both small and large.
All programs share a common look and feel.
Easy to run from the command line.
Retrieval of sequence data from the web.
The One Argument
-help
The -help argument displays short help for any EMBOSS program.
The One Command
wossname
wossname searches the short descriptions of the other programs for keywords.
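As a rough illustration of what a keyword lookup over short program descriptions involves, here is a minimal Python sketch. The `DESCRIPTIONS` table and the `find_programs` helper are hypothetical, and the one-line descriptions are paraphrased from these slides rather than taken from wossname's own data.

```python
# Hypothetical, hand-written descriptions paraphrased from the slides;
# the real wossname reads each program's own short description.
DESCRIPTIONS = {
    "dottup": "dotplot of exact matches between two sequences",
    "dotmatcher": "dotplot of similarity matches between two sequences",
    "getorf": "find open reading frames",
    "transeq": "translate nucleotide sequence to protein",
    "needle": "global alignment of two sequences",
    "water": "local alignment of two sequences",
    "iep": "isoelectric point of a protein",
    "pepinfo": "general properties and hydropathy plots of a protein",
    "fuzzpro": "search a protein database with a pattern",
}


def find_programs(keyword):
    """Return the sorted names of programs whose description contains the keyword."""
    wanted = keyword.lower()
    return sorted(
        name for name, text in DESCRIPTIONS.items() if wanted in text.lower()
    )


print(find_programs("alignment"))  # ['needle', 'water']
print(find_programs("dotplot"))    # ['dotmatcher', 'dottup']
```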
Large collection of gene and protein analysis tools
Sequence retrieval
Alignments
Primer design
Restriction Mapping
Protein domain searching
Translation
[Figure: workflow diagram linking DNA Sequence 1 and DNA Sequence 2 (dotplot, translation) to protein Sequence 1 and protein Sequence 2 (local/global alignment, multiple sequence alignment, motif and domain searching, physico-chemical properties).]
Dotplots
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
For an exact match:
Unix% dottup SEQ1.fasta SEQ2.fasta -window 10 &
For a similarity match:
Unix% dotmatcher SEQ1.fasta SEQ2.fasta -window 10 -threshold 17 &
Dotplots …
Identity Matrix
    A   T   G   C
A   5  -4  -4  -4
T  -4   5  -4  -4
G  -4  -4   5  -4
C  -4  -4  -4   5
Window Size is the number of bases in a sliding window that is moved along each sequence and compared to generate a single data point on the plot. Window size must be an odd number.
Mismatch Limit determines how similar the two sequences in a window must be to "match". For example, if window size is 9 and mismatch limit is 2, then up to 2 mismatches in a 9-base window will still be classified as a match.
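The window size and mismatch limit described above can be sketched in a few lines of Python. This is an illustrative brute-force version, not how dottup or dotmatcher are implemented; the function name `window_matches` and its parameters are assumptions made for this example.

```python
def window_matches(seq_a, seq_b, window=9, mismatch_limit=2):
    """Return (i, j) start positions (0-based) of window pairs that
    differ in at most mismatch_limit bases."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            mismatches = sum(
                1
                for x, y in zip(seq_a[i:i + window], seq_b[j:j + window])
                if x != y
            )
            if mismatches <= mismatch_limit:
                hits.append((i, j))  # one data point on the plot
    return hits


seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
seq2 = "ATGGCTCCTCCCTTAGAATCTTAG"
hits = window_matches(seq1, seq2, window=9, mismatch_limit=2)
print(len(hits), "window pairs count as a match")
```

Raising the mismatch limit adds more points (and more noise) to the plot, while a limit of 0 keeps only exact window matches.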
Dotplots …
    A   T   G   C
A   5  -4  -4  -4
T  -4   5  -4  -4
G  -4  -4   5  -4
C  -4  -4  -4   5
CCTCCTTTGG
CCTCCTTTGG
5 5 5 5 5 5 5 5 5 5    Score = 50
CCTCCTTTGG
CCTCCCTTAG
5 5 5 5 5 -4 5 5 -4 5    Score = 32
The two mismatches fall in synonymous codons: CCT and CCC both encode Pro, and TTG and TTA both encode Leu.
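The two window scores on this slide can be checked with a short Python sketch that applies the identity matrix (5 for a match, -4 for a mismatch) position by position. The function name `window_score` is an assumption made for this example.

```python
MATCH, MISMATCH = 5, -4  # values from the identity matrix on the slide


def window_score(window_a, window_b):
    """Sum the identity-matrix scores over two equal-length windows."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    return sum(MATCH if x == y else MISMATCH for x, y in zip(window_a, window_b))


print(window_score("CCTCCTTTGG", "CCTCCTTTGG"))  # 50
print(window_score("CCTCCTTTGG", "CCTCCCTTAG"))  # 32
```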
Dotplots
A dot plot is a simple graphical representation of identical residues between two sequences.
The X axis represents the first sequence (PHO5).
The Y axis represents the second sequence (PHO3).
A dot is plotted for each match between two residues of the sequences.
Diagonal lines reveal regions of identity between the two sequences.
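A letter-based dot plot is simple enough to draw as text. The sketch below is a minimal illustration; since the PHO5 and PHO3 sequences are not included in these slides, it uses the GREENAPPLES and APPLES peptides from later slides instead, and the function name `dot_plot` is an assumption.

```python
def dot_plot(seq_x, seq_y):
    """Return one text row per residue of seq_y; '*' marks identical residues."""
    return [
        "".join("*" if x == y else "." for x in seq_x)
        for y in seq_y
    ]


seq_x = "GREENAPPLES"  # X axis
seq_y = "APPLES"       # Y axis
print("  " + seq_x)
for residue, row in zip(seq_y, dot_plot(seq_x, seq_y)):
    print(residue + " " + row)
```

The shared APPLES region shows up as an unbroken diagonal of `*` characters, while chance identities (such as the repeated P and E) appear as isolated dots.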
Dotplots …
The dot plot can be adapted to display only word matches, which correspond to a diagonal of dots in the letter-based dot plot.
Example: alignment of PHO5 and PHO3 coding sequences, with different word sizes.
Detecting repeats with a dot plot
Sequence repeats are easily detected in a dot plot when a sequence is compared to itself.
The main diagonal is completely marked (by definition, since the sequence is identical to itself).
Repeats appear as segments of lines parallel to the diagonal.
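The same idea can be expressed without drawing anything: walk along every diagonal above the main one and report runs of consecutive identities. This is a minimal sketch; the function name `find_repeats` and the `min_len` cutoff are assumptions made for this example.

```python
def find_repeats(seq, min_len=3):
    """Return (i, j, length) for runs where seq[i:i+length] == seq[j:j+length],
    with i < j, found along the diagonals parallel to the main diagonal."""
    n = len(seq)
    repeats = []
    for offset in range(1, n):  # offset 0 would be the main diagonal
        run = 0
        for i in range(n - offset):
            if seq[i] == seq[i + offset]:
                run += 1
                continue
            if run >= min_len:
                repeats.append((i - run, i - run + offset, run))
            run = 0
        if run >= min_len:  # a run that reaches the end of the diagonal
            start = n - offset - run
            repeats.append((start, start + offset, run))
    return repeats


print(find_repeats("ACGTTTACGT", min_len=4))  # [(0, 6, 4)]
```

Here the ACGT at position 0 is repeated at position 6, which corresponds to a short line segment six positions away from the main diagonal.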
Plotorf
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
Unix% plotorf SEQ1.fasta -stop TAA,TAG -out GA.plot &
Unix% getorf SEQ1.fasta -minsize 5 -table 0 -find 1 -out GA.getorf &
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
TACCCAGCACTTCTCTTACGAGGAGGAAACCTTAGAATT
[Figure: the six reading frames of the sequence (Frame 1, Frame 2, Frame 3, Frame -1, Frame -2, Frame -3).]
Start and stop codons are located according to the instructions given to the program, and the area in between start and stop codons is plotted as a potential open reading frame (ORF).
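The complementary strand and the six reading frames shown on this slide can be derived with a short Python sketch. The helper names are assumptions, and the numbering of the reverse frames follows one common convention (frame -1 starts at the first base of the reverse complement), which may differ from the convention plotorf uses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, read in the same direction as seq."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """The opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def six_frames(seq):
    """Map frame number (1, 2, 3, -1, -2, -3) to its list of complete codons."""
    frames = {}
    for strand, bases in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            codons = [bases[i:i + 3] for i in range(offset, len(bases) - 2, 3)]
            frames[strand * (offset + 1)] = codons
    return frames


seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
print(complement(seq1))
for frame, codons in six_frames(seq1).items():
    print(frame, " ".join(codons))
```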
Indication of full coding sequence?
Alternative splice form?
Using getorf:
>_1 [17 - 37]
MLLLWNL
>_2 [1 - 36]
MGREENAPPLES*
Annotations: M is the start methionine; * marks the stop codon.
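The two regions reported here can be reproduced with a simplified ORF scan. The sketch below is not getorf itself: it assumes the standard genetic code, scans the forward strand only, starts each ORF at an ATG, and ends it at the next stop codon or at the last complete codon. The names `CODON_TABLE` and `find_orfs` are assumptions made for this example, and the stop symbol is left out of the returned protein.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(seq, min_codons=2):
    """Return forward-strand ORFs as (start, end, protein), 1-based inclusive."""
    orfs = []
    for offset in range(3):
        start, protein = None, ""
        for i in range(offset, len(seq) - 2, 3):
            amino = CODON_TABLE[seq[i:i + 3]]
            if start is None:
                if amino == "M":  # ATG opens a new ORF
                    start, protein = i, "M"
            elif amino == "*":    # stop codon closes the ORF
                orfs.append((start + 1, i, protein))
                start = None
            else:
                protein += amino
        if start is not None:     # ORF runs off the end of the sequence
            orfs.append((start + 1, start + 3 * len(protein), protein))
    return [orf for orf in orfs if len(orf[2]) >= min_codons]


seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
for start, end, protein in find_orfs(seq1):
    print(f"[{start} - {end}] {protein}")
```

Run on SEQ1, this reports MGREENAPPLES at positions 1 to 36 and MLLLWNL at positions 17 to 37, matching the coordinates shown above.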
Unix% transeq SEQ1.fasta -frame 1 -table 0 -sbegin 4 -send 36 -out GA.fasta &
>GA.fasta
GREENAPPLES
Alignments
>GA.fasta
GREENAPPLES
>A.fasta
APPLES
For a global alignment:
Unix% needle GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &
For a local alignment:
Unix% water GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &
Alignments …
To align two or more sequences in a biologically significant way.
Gap penalty = 10; Extension penalty = 0.5
[Figure: GREENAPPLES aligned with APPLES, shown as a local alignment (water) and a global alignment (needle).]
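The difference between local and global alignment can be seen with a small dynamic-programming sketch. This is a deliberate simplification, not what water and needle do internally: it scores identities only (5 for a match, -4 for a mismatch) instead of using EPAM250, and it charges a flat penalty of 10 per gap position instead of separate opening and extension penalties. The function name `align` and its parameters are assumptions made for this example.

```python
def align(seq_a, seq_b, local=False, match=5, mismatch=-4, gap=-10):
    """Return (score, aligned_a, aligned_b) for a global (default) or local alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:  # global: leading gaps are penalised
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    def pair(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            value = max(score[i - 1][j - 1] + pair(i, j),
                        score[i - 1][j] + gap,
                        score[i][j - 1] + gap)
            if local:  # local: never drop below zero, remember the best cell
                value = max(value, 0)
                if value > best:
                    best, best_pos = value, (i, j)
            score[i][j] = value

    i, j = best_pos if local else (rows - 1, cols - 1)
    final = score[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:  # trace back from the end cell
        if local and score[i][j] == 0:
            break
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1
    return final, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("GREENAPPLES", "APPLES", local=True))  # (30, 'APPLES', 'APPLES')
print(align("GREENAPPLES", "APPLES"))              # (-20, 'GREENAPPLES', '-----APPLES')
```

The local alignment reports only the shared APPLES region, while the global alignment has to pay for the five unmatched leading residues.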
GREENAPPLES
APPLES
It looks like the “apples” motif may be part of a larger domain.
physicochemical properties
pattern searching
Physico-chemical properties
Isoelectric point:
Unix% iep GA.fasta -plot -step 0.5 -out GA.IEP &
General properties:
Unix% pepinfo GA.fasta -hwindow 8 -generalplot -hydropathyplot &
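A hydropathy plot with a window of 8 amounts to averaging a per-residue hydropathy value over each stretch of 8 residues. The sketch below assumes the Kyte-Doolittle scale; the scales and data files that pepinfo actually uses may differ, and the function name `hydropathy_profile` is an assumption made for this example.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=8):
    """Average hydropathy for every window of the given size along the protein."""
    values = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


for start, value in enumerate(hydropathy_profile("GREENAPPLES", window=8), start=1):
    print(f"window starting at residue {start}: {value:.2f}")
```

An 11-residue peptide gives only four windows of size 8, so a plot for GREENAPPLES would contain just four points.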
Physico-chemical properties
[Figure: Venn diagram grouping the amino acids by property: Aliphatic, Aromatic, Hydrophobic, Tiny, Small, Charged, Positive, Polar.]
The pepinfo graph of properties is based on this diagram.
Physico-chemical properties
non-polar region with small residues
polar region to one side of non-charged region
Pattern searching
>GA.fasta
GREENAPPLES
>GL.fasta
GREENLEAVES
>RA.fasta
REDAPPLES
>RL.fasta
REDLEAVES
GREENAPPL---ES
-RE-DAPPL---ES
GREEN---LEAVES
-RE-D---LEAVES
[G](0,1)-R-[E](1,2)-[ND]-x(3)-L-x(3)-E-S
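One way to check a pattern like this against the four sequences is to translate it into a regular expression. In the Python sketch below, the three-residue spacers are written as x(0,3) rather than x(3): the alignment gaps show that each spacer is either present or absent, and a spacer of exactly three residues would match none of the four sequences. That reading is an assumption of this example, as are the names `FRUIT_PATTERN`, `STRICT_PATTERN` and `matches`.

```python
import re

# [G](0,1)-R-[E](1,2)-[ND]-x(0,3)-L-x(0,3)-E-S as a regular expression.
# Assumption: the x(3) spacers are optional, i.e. 0 to 3 residues.
FRUIT_PATTERN = re.compile(r"G?RE{1,2}[ND].{0,3}L.{0,3}ES")

# The same pattern with spacers of exactly three residues, for comparison.
STRICT_PATTERN = re.compile(r"G?RE{1,2}[ND].{3}L.{3}ES")

SEQUENCES = {
    "GA.fasta": "GREENAPPLES",
    "GL.fasta": "GREENLEAVES",
    "RA.fasta": "REDAPPLES",
    "RL.fasta": "REDLEAVES",
}


def matches(protein, pattern=FRUIT_PATTERN):
    """True if the pattern occurs anywhere in the protein sequence."""
    return pattern.search(protein) is not None


for name, protein in SEQUENCES.items():
    print(name, matches(protein), matches(protein, STRICT_PATTERN))
```

All four sequences match the relaxed pattern and none match the strict one, which is why the spacer length matters when the pattern is written down for a database search.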
Pattern searching
pattern.fruit:
[G](0,1)-[R]-[E](1,2)-[ND]-x(3)-[L]-x(3)-[E]-[S]
Search a protein database:
Unix% fuzzpro sptr:* pattern.fruit -mismatch 0 -out GA.fuzzpro &
Nothing resembling this pattern is found in the database.
But we could try scanning PRINTS (pscan) and PROSITE (patmatmotifs) with one of our sequences.
Some Programs
Some Programs …
More Information