ORDered ALignment Information Explorer
Alignment editor
Conservation computation
“barcode” = schematic alignment
Phylogenetic tree
3D viewer
=> sequence / structure / function / evolution cross-talks
Sequence Clustering
Features Editor
Alignment positions
Taxa
Contexts
Exploring Alignment Information up to the residue Level
Global level
Clusterings level
Single taxa level
Full length
Domains
Motifs, secondary structures, …
Residues
3D structure
conservation
phylogeny
Reads ALN, MSF, TFA, RSF, Macsims/XML, ORD file formats
What is an alignment ?
- description of the alignment (NorMD score, date, etc.)
- set of sequences
  - generic information (length, EC, phylogeny, …)
  - features (PFAM-A, PROSITE, BLOCK, etc.)
- clustering = groups of sequences
- conservation scores based on clustering
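The description above can be read as a small data model. The following is only an illustrative sketch in Python, not Ordalie's actual internal representation: the class names (`Feature`, `Sequence`, `Alignment`) and all field names are hypothetical and chosen to mirror the bullet points.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    source: str   # e.g. "PFAM-A", "PROSITE", "BLOCK"
    start: int    # first alignment column covered (1-based, inclusive)
    end: int      # last alignment column covered (inclusive)


@dataclass
class Sequence:
    name: str
    residues: str                                 # aligned sequence, gaps written as "-"
    info: dict = field(default_factory=dict)      # generic information: length, EC, phylogeny, ...
    features: list = field(default_factory=list)  # list of Feature objects


@dataclass
class Alignment:
    description: dict                             # e.g. {"NorMD": ..., "date": ...}
    sequences: list                               # list of Sequence objects
    clusters: list = field(default_factory=list)  # clustering = groups of sequence names

    def width(self) -> int:
        """Number of alignment columns; all rows must have the same length."""
        lengths = {len(s.residues) for s in self.sequences}
        if len(lengths) > 1:
            raise ValueError("sequences are not aligned to the same length")
        return lengths.pop() if lengths else 0
```

Conservation scores are left out of the sketch on purpose, since they are derived from the sequences and the clustering rather than stored independently.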
and Alignments :
Sequence editing / Clustering editing
Current alignment
Overwrite current / Create new MACSIM
Ordalie parameters (colors, fonts, thresholds, …)
Description of the alignment (name, NorMD score, creation date, ...)
Original set of aligned sequences
- general information (length, pI, mol. weight, …)
- features (Pfam domain, secondary structures, …)
- AA sequence
Coordinates of 3D structures corresponding to PDB entries
Description of 3D objects (representation type, colors, etc.)
M 3 – new clustering: Clustering 1, Sequences set 1 -> conservation
M 4 – edit sequences: Clustering 1, Edit Sequences -> conservation
M 5 – clust. + edit: Clustering 2, Edit Sequences -> conservation
Inside :
M 2 – macsims clustering: Macsims Clustering, Original Sequences set -> original conservation
M 1 – original alignment: Original Sequences set
SQLite database
- accessible through SQL statements
- ODBC compatible
- platform independent
- lightweight
Contains all Ordalie data, preferences, performances
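Since an ORD file is described as an SQLite database, it should be possible to inspect one with any SQLite client. The sketch below uses Python's standard `sqlite3` module and deliberately assumes nothing about the ORD schema: it only lists whatever tables the file contains. The function name `list_tables` and the file name in the usage note are hypothetical.

```python
import sqlite3


def list_tables(con):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Usage would look like `con = sqlite3.connect("myalignment.ord")` followed by `list_tables(con)`; the table names returned depend entirely on how Ordalie lays out the file, which is not documented here.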
ORD : file format
Modes :
- features
- search
- pairwise identity
- sequences editor
- features editor
- clustering
- trees
- conservation
- superposition
Zone selection :
• Whole alignment
• By feature
• User defined
Criteria :
• % identity
• pI
• Length
• Composition (amino acid, physico-chemical groups)
Clustering methods :
• Manual clustering by inserting/removing separators
• Hierarchical classification + Secator
• Kmeans + DPC
• Mixture model + AIC
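To make the "% identity" criterion concrete, here is a minimal Python sketch of a pairwise identity between two aligned sequences. It is an assumption, not Ordalie's definition: several conventions exist, and this one counts only columns where neither sequence has a gap. The function name `percent_identity` and the `-` gap character are illustrative choices.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

A matrix of such values over all sequence pairs is the kind of input a hierarchical or K-means clustering could then work from.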
Clustering:
Threshold
- Global Identity -> 100% identity
- Global Conserved -> >80% identity
- Group Identity -> 100% identity in group
Mean Distance: as in ClustalX
Vector Norm: based on a vectorial (polarity, volume) representation of amino acids
Liu2: based on Blosum62
Entropy: takes gaps and physico-chemical properties of AA into account
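The Threshold method can be illustrated on a single alignment column. The Python sketch below is a guess at the logic based only on the three thresholds listed above; the function names, the treatment of gaps (counted as mismatches) and the return format are all assumptions rather than Ordalie's implementation.

```python
from collections import Counter


def top_fraction(residues, gap="-"):
    """Percentage of rows carrying the most common non-gap residue.

    Gaps count as mismatches (an assumption made for this sketch).
    """
    counts = Counter(r for r in residues if r != gap)
    if not counts:
        return 0.0
    return 100.0 * counts.most_common(1)[0][1] / len(residues)


def classify_column(column, groups, conserved_threshold=80.0):
    """Classify one column.

    column: one residue per sequence (string or list).
    groups: clustering, given as lists of row indices.
    Returns (global_label, group_identity), where group_identity holds one
    boolean per group telling whether that group is 100% identical.
    """
    pct = top_fraction(column)
    if pct == 100.0:
        global_label = "global identity"
    elif pct > conserved_threshold:
        global_label = "global conserved"
    else:
        global_label = None
    group_identity = [
        top_fraction([column[i] for i in g]) == 100.0 for g in groups
    ]
    return global_label, group_identity
```

For example, a column `"AAGG"` with groups `[[0, 1], [2, 3]]` is not globally conserved, but each group is identical within itself, which is the case the Group Identity threshold is meant to capture.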
Validity of score clustering ?
Conservation Methods :
Key Usage Points :
Always leave a mode before entering a new one
Sequences selection : Windows-style
- <Button-1> selects a sequence
- <Control-Button-1> adds the current sequence to the selection
- <Shift-Button-1>
Zone selection :
- All (button)
- selecting a feature : <Control-Button-1>
- manually :
  - <Button-1> for starting point
  - <Button-3> for ending point
  - <Shift-Button-3> to delete a selected zone
TODO List :
Short term :
- Bugs, if any …. ;-)
- group naming
- project handling
- MacOS X version
- documentation and tutorials
- publication
Long term :
- Bugs, if any …. ;-)
- on-line web services
- on-line Macsims calculation
- on-line sequence, information, feature updating
- 3D surface mapping of features
- ….
Running Ordalie :
On surf/lameX :
- setordalie
- ordalie <filename>
- ordalie <filename> option value option value
File formats: MSF, TFA, ALN, RSF, XML/Macsims and ORD
Conversion : ordalie toto.msf –convert ALN -> toto.aln