Comparative genomics approach in promoter analysis - Orthologous promoter databases and conserved motif search
Endre Barta
Conserved motifs in the non-coding regions of the genome
Multiple Conserved Sequences (MCS)
Intronic binding sites
3’UTR binding sites, miRNA target sequences
Transcription Factor Binding Sites in the promoter region
Objectives:
•Finding orthologous promoter regions
•Defining conserved motifs
•Searching in conserved motifs
•Analysing the data
DoOP (Database of Orthologous Promoters, http://doop.abc.hu)
•Reference species: H. sapiens / A. thaliana
•Genes: based on NCBI annotation
•Choosing first exons
•Genomic sequences (from different DNA databanks; genome projects)
•Aligning the first exons
•Orthologous first exons
•500, 1000 and 3000 bp upstream regions from orthologous first exons
     gene            complement(1279..4993)
                     /locus_tag="At5g01010"
                     /note="synonym: TOPTELOMERE.1; expressed protein"
                     /db_xref="GeneID:831893"
     mRNA            complement(join(1279..1646,1745..1780,1914..1961,
                     2435..2509,2748..2799,2872..2934,3303..3383,3602..3658,
                     3761..3801,3926..4004,4101..4257,4334..4466,4551..4678,
                     4764..4993))
                     /locus_tag="At5g01010"
                     /product="expressed protein"
                     /transcript_id="NM_120177.3"
                     /db_xref="GI:42567550"
                     /db_xref="GeneID:831893"
     CDS             complement(join(1527..1646,1745..1780,1914..1961,
                     2435..2509,2748..2799,2872..2934,3303..3383,3602..3658,
                     3761..3801,3926..4004,4101..4257,4334..4466,4551..4678,
                     4764..4923))
Exon data are from the NCBI’s reference sequence annotations of human and Arabidopsis genome sequences
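To make the upstream-region step concrete, here is a minimal Python sketch of how the 500, 1000 or 3000 bp region could be cut out once the first exon is known. It is not the DoOP pipeline code; the function names `upstream_interval` and `upstream_sequence` and the toy genome string are hypothetical. For a `complement(join(...))` feature such as the one above, the first exon is the segment with the highest coordinates (4764..4993), and its upstream region lies at higher coordinates and must be reverse-complemented.

```python
# Hypothetical helpers, not part of DoOP: coordinates are 1-based and inclusive,
# as in the GenBank feature table.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def upstream_interval(exon_start, exon_end, strand, length, seq_length=None):
    """Return (start, end) of the `length` bp upstream of a first exon."""
    if strand == "+":
        # Upstream of a plus-strand exon lies at lower coordinates.
        return max(1, exon_start - length), exon_start - 1
    # Upstream of a minus-strand exon lies at higher coordinates.
    end = exon_end + length
    if seq_length is not None:
        end = min(end, seq_length)
    return exon_end + 1, end

def upstream_sequence(genome, exon_start, exon_end, strand, length):
    """Cut the upstream region and orient it 5'->3' relative to the gene."""
    start, end = upstream_interval(exon_start, exon_end, strand, length, len(genome))
    region = genome[start - 1:end]
    if strand == "+":
        return region
    return region.translate(COMPLEMENT)[::-1]  # reverse complement

# First exon of At5g01010 (minus strand): 4764..4993
print(upstream_interval(4764, 4993, "-", 500))
```

The interval for the minus-strand example starts right after the last base of the first exon; on a real chromosome the region would additionally be clipped at neighbouring genes if that is desired, which this sketch does not attempt.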
Different types of annotated first exons
1. No annotated 5’ UTR, the length of the first exon is > 50 bp.
2. No annotated 5’ UTR, the length of the first exon is < 50 bp.
3. There is an annotated 5’ UTR, the CDS starts in the first exon and it is > 50 bp.
4. Same as No. 3, but < 50 bp.
5. There is one or more 5’ UTR exon(s), the first UTR exon is > 50 bp.
6. Same as No. 5, but < 50 bp.
7. Wrong annotation
[Bar chart: x-axis gene types 1, 2, 3, 4, 5n, 6n; y-axis ticks 0 to 10000; bar labels as extracted: 3791, 684, 1796, 6108, 779, 9257]
Frequency of H. sapiens gene types (5n and 6n collapsed) in the chordate section of the DoOP database, version 1.4
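The first six first-exon types can be read as a small decision table. The Python sketch below is only an illustration of that reading; the function name `first_exon_type` is hypothetical. It assumes that the 50 bp threshold is measured on the first exon and that an exon of exactly 50 bp falls on the "< 50 bp" side, neither of which the slide states. Type 7 (wrong annotation) cannot be derived from these three inputs and is left out.

```python
def first_exon_type(has_5utr, cds_in_first_exon, length, threshold=50):
    """Map an annotated first exon to type 1-6 (assumed reading of the list)."""
    long_enough = length > threshold  # assumption: exactly 50 bp counts as short
    if not has_5utr:
        return 1 if long_enough else 2
    if cds_in_first_exon:
        return 3 if long_enough else 4
    # Remaining case: one or more separate 5' UTR exons
    return 5 if long_enough else 6

# At5g01010: annotated 5' UTR, CDS starts in the first exon (4764..4993, 230 bp)
print(first_exon_type(True, True, 4993 - 4764 + 1))
```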
Using BLAST to find orthologous first exons
•Very critical, and the most time-consuming step
•TBLASTN and WU-BLAST give too many false positives, so we use BLASTN (which gives more false negatives)
•We use Bioperl modules to parse the BLAST results
•The most difficult task is to find the real orthologs among paralogs
•We use a simple algorithm: the best hit (considering the alignment length and the score) is most probably from the orthologous gene
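A best-hit rule of this kind can be sketched in a few lines of Python. The slide does not say how alignment length and score are combined, so the sketch below makes two labelled assumptions: hits shorter than a hypothetical `min_len` are discarded, and the remaining hits are ranked by score with alignment length as tie-breaker. The `Hit` record and its field names are invented for illustration and do not reflect the Bioperl-based code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    species: str      # species of the subject sequence
    subject_id: str   # hypothetical identifier of the genomic hit
    align_len: int    # alignment length in bp
    score: float      # BLAST score

def best_hits(hits, min_len=30):
    """Keep one hit per species: highest score, then longest alignment."""
    best = {}
    for hit in hits:
        if hit.align_len < min_len:  # assumption: very short hits are ignored
            continue
        current = best.get(hit.species)
        if current is None or (hit.score, hit.align_len) > (current.score, current.align_len):
            best[hit.species] = hit
    return best

hits = [
    Hit("Mus musculus", "locus_A", 180, 250.0),
    Hit("Mus musculus", "locus_B", 95, 110.0),   # likely paralog
    Hit("Mus musculus", "locus_C", 20, 300.0),   # too short under min_len
]
print(best_hits(hits)["Mus musculus"].subject_id)
```

Picking the single best hit is a heuristic: when the true ortholog is missing from the databank, the best remaining hit is a paralog, which is why the slide calls this step critical.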
Orthologous promoter group
Defining conserved motifs in the DoOP database
Multiple sequence alignment
conserved motifs
Consensus sequence
for example
tGGGAGTGCG ATTTCGGGTC CACAGAGCTC tctgcgcggt gctggggcat
-GGGAGCCCG AGGCCGAGGC CGCCGAGCTC G-cgtacggt ----------
---------- ---------- CACTGGGATC A-acgtcaac ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- CGCTGAGATC A--------- ----------
CRCNGaGMTC
All consensus sequences from all groups
Consensus database
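As an illustration of how a consensus such as CRCNGaGMTC can be derived from an aligned block, here is a minimal Python sketch. The rule it uses is an assumption chosen because it reproduces the example above, not the rule documented for DoOP: a column with one base gives an uppercase letter, a strict majority gives a lowercase letter, an even split between two bases gives the two-base IUPAC code, and anything else gives N. Rows that are gaps in a column are ignored.

```python
from collections import Counter

# IUPAC codes for two-base ambiguities
TWO_BASE_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def consensus(rows):
    """Column-wise consensus of equally long aligned rows (assumed rule)."""
    out = []
    for column in zip(*rows):
        bases = [b.upper() for b in column if b != "-"]
        if not bases:
            out.append("-")  # column with gaps only
            continue
        counts = Counter(bases)
        top, n = counts.most_common(1)[0]
        if len(counts) == 1:
            out.append(top)                      # fully conserved
        elif 2 * n > len(bases):
            out.append(top.lower())              # strict majority
        elif len(counts) == 2:
            out.append(TWO_BASE_CODES.get(frozenset(counts), "N"))
        else:
            out.append("N")
    return "".join(out)

# Third block of the example alignment (the all-gap row is ignored)
block = ["CACAGAGCTC", "CGCCGAGCTC", "CACTGGGATC", "----------", "CGCTGAGATC"]
print(consensus(block))  # CRCNGaGMTC
```

Other consensus lines in this presentation do not all follow this simple rule, so it should be taken as a teaching example only.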
•Creating the web interface
•ENSEMBL / EPD / TAIR links
•Annotating repetitive elements
•Multiple alignment (DIALIGN)
•Defining conserved motifs
•Creating MySQL database
•PHP / HTML programming
Barta, E., Sebestyén, E., Pálfy, T.B., Tóth, G., Ortutay, C. P. and Patthy, L. (2005) DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res. 33: D86-D90.
Are the conserved motifs indeed binding sites for transcription factors?
•For good quality motifs we may safely assume that the answer is yes (or, in other words, they probably take part in transcription regulation)
How to prove?
•Experimentally (see the case study later)
•Comparing conserved motifs with known TFBSs
•Comparing ChIP-on-chip results (in a few years)
Bad motif (CTGTGTGTG repetition)
No. Species Motif Start End
1. Bos taurus TTTCTGCTGTGCGTGCTGG -198 -180
2. Canis familiaris TTTCTGCTGCGCGAGCTGG -205 -187
3. Homo sapiens TTTCTGCTGTGTGTGCTGG -183 -165
4. Macaca mulatta TTTCTGCTGTGTGTGCTGG -183 -165
5. Pan troglodytes TTTCTGCTGTGTGTGCTGG -182 -164
6. Takifugu rubripes TATTTTCTTTGTGacttta -453 -435
Consensus: TtTcTgCTgtGYGWgcTgg
Bad sequence in Takifugu
No. Species Motif Start End
1. Bos taurus TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -364 -329
2. Carollia perspicillata TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -365 -330
3. Gallus gallus TTCTTGGAAATGTAAAGCGAGAA-CTCTTTAAGTGG -298 -264
4. Homo sapiens TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -361 -326
5. Mus musculus TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -378 -343
6. Papio hamadryas TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -366 -331
7. Takifugu rubripes ctcccgcttacagcgcg------------------- -414 -398
8. Xenopus tropicalis TTCTTGGAAATGTAAAGCGAGAAAGTCTTTAAGTGG -248 -213
Uncertain cases (too many mismatches in rabbit)
No. Species Motif Start End
1. Bos taurus TTTTCCACGCTGCCGAGAGGAATC -415 -392
2. Callithrix jacchus TTTTCCACGCTGCCGAGAGGAATC -421 -398
3. Homo sapiens TTTTCCACGCTGCCGAGAGGAATC -413 -390
4. Loxodonta africana TTCTCCACGCCGCCGGGAGGAATC -409 -386
5. Monodelphis domestica TTTTCAACACTACCGAGGGGAATT -427 -404
6. Mus musculus TTCTCCACGCCGCCGAGAGGAATC -410 -387
7. Oryctolagus cuniculus CCTTCCACGCCGCGGAGAGCGATC -435 -412
8. Otolemur garnettii TTTTCCACGCTGCCGAGAGGAATC -405 -382
9. Pan troglodytes TTTTCCACGCTGCCGAGAGGAATC -414 -391
10. Papio anubis TTTTCCACGCTGCCGAGAGGAATC -414 -391
11. Rattus norvegicus TTCTCCACGCCGCCGAGAGGAATC -410 -387
12. Rhinolophus ferrumequinum TTCTCCACGCTGCCGAGAGGAATC -416 -393
13. Takifugu rubripes GGTCCCACACCGCCAGCCTGAATG -539 -516
Consensus: ttYtCcACgCYgCcgagaggaATc
The number of motifs depends on the evolutionary distance between species in a promoter group
Only primates
Only mammals
chordates
Conserved known TFBSs in the promoter region of Centromere protein A (CENP-A) gene
SAP-1, N-Myc
motif m3: ggGTCAcgTGAc
motif m4: cCcggcccGgaGc
Conserved TATA box in the promoter of COL1A2 (Collagen alpha 2(I))
Homo sapiens GGAGGGCG---------------------------GGAGGATGCGGAGGGCGGAGG----
Gallus gallus GCAGGGCG---------------------------AGGGGCGGGGAACGTCTGAAAAAAA
Sus scrofa GAAGGCCG---------------------------GGGGGATGGGGAGGGCGGAGG----
Rattus norvegicus AGAGGGCG---------------------------GGTGGCTGGGGAGGGCGGAGG----
Danio rerio AACAGGA--------------------------------------------GGAG-----
Takifugu rubripes AACAGGA--------------------------------------------GGAG-----
Ornithorhynchus anatinus GGAGGct-----------------------------------------------------
Dasypus novemcinctus AGAGGACG---------------------------GGTGGATGGGGAGGGCGGAGG----
Callithrix jacchus GGAGGGCG---------------------------GGAGGATAGGGAGGGCGGAGG----
Sorex araneus GGAG--------------------------------------GGGGAGGGCGGAGG----
Mus musculus AGAGGGCG---------------------------GGTGGCTGGGGAGGGCGGAGG----
Meleagris gallopavo GCAGGGCG---------------------------AGGGGCGGGGAACGTCTGAAAAAAA
Taeniopygia guttata GGGGCGAG---------------------------AGGGGCGTGGGACGGCTGAGGGGAA
Papio anubis GGAGGGCG---------------------------GGAGGATGGGGAGGGCGGAGG----
Bos taurus GAggtggggggagttggggggaggaaggccagagcGGGGGATGGGGAGGGCGGAGG----
Canis familiaris GAAGGGCG---------------------------GGGGGATGGGGAGGGCGGAGG----
Felis catus GAAGAGAG---------------------------GGGGGATGGGGAGGGCGGAGG----
Tetraodon nigroviridis AACAGGA--------------------------------------------GGAGG----
Pan troglodytes GGAGAGCG---------------------------GGAGGATGCGGAGGGCGGAGG----
"Box 3A, Sp1 binding site GGGCGG"
Inagaki et al. 1994. JBC, 269, 14828-34
Known TFBSs in the DoOP database and their conservation
Name of the TF   Sum    In sequence      In group (gene)   In conserved position   Type of the DNA binding domain
AP2 alpha        8084   8084 (27,51%)    1871 (72,92%)     745 (9,22%)             AP2
SP1-1            6835   6816 (23,20%)    1741 (67,85%)     482 (7,05%)             Zn-finger, C2H2
FREAC-7          6610   6609 (22,49%)    1556 (60,64%)     676 (10,23%)            Forkhead
S8               5848   5848 (19,90%)    1639 (63,87%)     982 (16,79%)            Homeo
SP1-2            5564   5564 (18,93%)    1887 (73,54%)     590 (10,60%)            Zn-finger, C2H2
Yin-Yang         5550   5550 (18,89%)    1879 (73,23%)     908 (16,36%)            Zn-finger, C2H2
MZF-1-4          5168   5168 (17,59%)    1768 (68,90%)     770 (14,90%)            Zn-finger, C2H2
Ahr-ARNT         5017   5015 (17,07%)    1778 (69,29%)     763 (15,21%)            bHLH
deltaEF1         4970   4970 (16,91%)    1798 (70,07%)     752 (15,13%)            Zn-finger, C2H2
SPI-B            4941   4941 (16,81%)    1771 (69,02%)     996 (20,16%)            ETS
FREAC-3          4907   4907 (16,70%)    1648 (64,22%)     427 (8,70%)             Forkhead
Most conserved known TFBSs in the DoOP database
Name of the TF   Sum    In sequence      In group        Conserved      Conserved %       Type
SAP-1            1653   1615 (5,50%)     566 (22,06%)    598 (+140)     36,18% (44,65%)   ETS
n-Myc            2885   2885 (9,82%)     1152 (44,89%)   888 (+210)     30,78% (38,06%)   bHLH-zip
USF              2537   2536 (8,63%)     1029 (40,10%)   772 (+225)     30,43% (39,30%)   bHLH-zip
ARNT             3648   3648 (12,41%)    1426 (55,57%)   978 (+231)     26,81% (33,14%)   bHLH
SPI-1            4494   4494 (15,29%)    1623 (63,25%)   1135 (+333)    25,26% (32,67%)   ETS
Max              2138   2103 (7,16%)     894 (34,84%)    500 (+184)     23,39% (31,99%)   bHLH-zip
NRF-2            2461   2426 (8,26%)     935 (36,44%)    558 (+286)     22,67% (34,30%)   ETS
SRF              340    340 (1,16%)      184 (7,17%)     73 (+14)       21,47% (25,59%)   MADS
TCF11-MafG       4411   4409 (15,00%)    1568 (61,11%)   928 (+194)     21,04% (25,44%)   bZIP
c-ETS            3934   3934 (13,39%)    1540 (60,02%)   822 (+266)     20,89% (27,66%)   ETS
SPI-B            4941   4941 (16,81%)    1771 (69,02%)   996 (+368)     20,16% (27,61%)   ETS
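The "Conserved %" column can be checked by hand: the first percentage is the conserved count divided by "Sum", and the percentage in parentheses adds the "(+n)" increment to the conserved count before dividing. The short Python sketch below reproduces a few rows; the function name `conserved_percent` is hypothetical, and what the "(+n)" increment represents biologically is not stated on the slide, so the code treats it as an opaque extra count. The tables print decimals with a comma, the code with a point.

```python
def conserved_percent(total, conserved, extra=0):
    """Percentage of all predicted sites that are in a conserved position."""
    return round(100 * (conserved + extra) / total, 2)

# SAP-1: Sum 1653, Conserved 598 (+140)
print(conserved_percent(1653, 598))        # 36.18
print(conserved_percent(1653, 598, 140))   # 44.65
# AP2 alpha from the previous table: Sum 8084, in conserved position 745
print(conserved_percent(8084, 745))        # 9.22
```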
Searching between conserved motifs (MOFEXT program)
Consensus database
Searching
Query sequence: ttRcGGWACCTgTaa
Search algorithm:
•Query sequence
•Next sequence: atGCTGAgRCGgAACCTGcGGAAC
•A window of given length (wordsize)
•Comparing sequences, and calculating scores
Searching between conserved motifs (MOFEXT program)
Consensus database
Query sequence: ttRcGGWACCTgTaa
Search algorithm:
•Query sequence
•Next sequence: atGCTGAgRCGgAACCTGcGGAAC
•A window of given length (wordsize)
•Hit above the cutoff score
•Extending the hit
•Extended hit
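The window-and-extend idea on these slides can be sketched in Python as follows. This is not the MOFEXT source (which is written in C and scores with a similarity matrix); as a stand-in, the sketch counts two IUPAC symbols as matching when their base sets overlap, scores each ungapped window as the percentage of matching positions, and extends a seed in both directions while positions keep matching. The names `compatible`, `extend` and `search` are hypothetical.

```python
# IUPAC nucleotide codes and the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def compatible(a, b):
    """Assumed match rule: the two symbols share at least one base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))

def extend(query, target, qi, ti, wordsize):
    """Grow a seed window left and right while positions stay compatible."""
    left = 0
    while (qi - left > 0 and ti - left > 0
           and compatible(query[qi - left - 1], target[ti - left - 1])):
        left += 1
    right = wordsize
    while (qi + right < len(query) and ti + right < len(target)
           and compatible(query[qi + right], target[ti + right])):
        right += 1
    return qi - left, ti - left, left + right  # query start, target start, length

def search(query, target, wordsize=6, cutoff=80):
    """Return extended hits for all windows scoring at least `cutoff` percent."""
    hits = set()
    for qi in range(len(query) - wordsize + 1):
        for ti in range(len(target) - wordsize + 1):
            matches = sum(compatible(query[qi + k], target[ti + k])
                          for k in range(wordsize))
            if 100 * matches / wordsize >= cutoff:
                hits.add(extend(query, target, qi, ti, wordsize))
    return sorted(hits)

query = "ttRcGGWACCTgTaa"
target = "atGCTGAgRCGgAACCTGcGGAAC"
for q_start, t_start, length in search(query, target, wordsize=8, cutoff=100):
    print(query[q_start:q_start + length], target[t_start:t_start + length])
```

With the two slide sequences, a seed on the shared GGWACCTg / GgAACCTG stretch extends to RcGGWACCTg against RCGgAACCTG under this match rule. A real similarity matrix would normally score an ambiguous match lower than an exact one, which the overlap rule ignores.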
The MOFEXT (MOtif Find and EXTend) program
•Written in standard C, available upon request
Usage: mofext -l mypatterns1.list mypatterns2.list -p query1 query2 -m matrix.txt -w 10 -s 95
Options:
-h Display this, but you know it, because you see it :-)
-l The databases. Space separated, maximum 50
-p Query patterns. Space separated, maximum 50
-m The similarity matrix filename
-w Word size. Default: 6
-s The similarity percentage limit. Default: 80
-e If you add this, the output is not the similarity score but the percentage
-a If you add this, instead of print the database sequence, print the matched region
•Output is a plain text file in table format
•The program is also suitable for searching protein sequences
The DoOPSearch website (http://doopsearch.abc.hu)
•Conserved motifs database: from the current DoOP database
•Web interface that allows changing the same parameters as in the command line version
•The result is linked to the DoOP database
•It is possible to sort and/or filter the result (score, ext. score, length, GO annotation!)
•FUZZNUC search in the DoOP promoter sequences (MOFEXT uses only the conserved motifs to search, while FUZZNUC searches the whole promoters)
Utilization of DoOP data and the MOFEXT program
•Studying the promoter region of different genes
•Making guesses about putative TFBSs
•Finding conserved motifs in the promoter regions of other genes (possible co-regulation)
•Case study: SOX9 binding sites in the promoter region of the matrilin-1 gene
•Studying the evolution of TFBSs
•Drawing regulation networks based on similar conserved motifs
•Studying conservation in the core promoter region
PE1 element in the DoOP database (in silico data)
Human     CTTCTGCAAGCAAAGGAGCCCTTGTGGTCAG
Chimp     CTTCTGCAAGCAAAGGAGCCCTTGTGGTCAG
Macaque   CTTCTGCAGGCAAAGGAGCCCTTGTGGTCAG
Dog       CTTCTGCAGGCAAAGGGGCCCTTGTGGTCCG
Cattle    CTTCTGCAGGCAAAGGAGCCCTTGTGGTCAG
Elephant  CTTCTGCAGGCAATGGAGCCCTTGTGGTTAG
Mouse     CTTCTGCAGGCAAAGGGGCCCTTGTGGTCAG
Rat       CTTCTGCAGGCAAAGGGGCCCTTGTGGTCAG
Chicken   CTTCTCCGAGCAATGGAGCCATTGTGGAGGG
Consensus CTTCTgCaRGCAAaGGRGCCcTTGTGGtcaG
No. Species Motif Start End
1. Bos taurus ---------GCTTCTGCAGGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCCGGAGCCC -266 -218
2. Canis familiaris GCTCTGGTTGCTTCTGCAGGCAAAGGAGCCCTTGTGGTCCGAGGGGCCTCTTGAGCCC -261 -204
3. Homo sapiens GCTCTAGTTGCTTCTGCAAGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCTGAAGCCT -270 -213
4. Macaca mulatta GCTCTGGTTGCTTCTGCAGGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCCGGAGCCC -269 -212
5. Pan troglodytes GTTCTAGTTGCTTCTGCAAGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCTGAAGCCT -269 -212
•Finding further orthologous sequences by hand
•defining the consensus sequence
Search in the DoOP CH-1.3 1000bp consensus motifs database
MOFEXT program, wordsize: 8, cutoff: 95%
No. Species Motif 7 Start End
1. Bos taurus tTTTATCTCATAGGCAAGGGAGC----------TTTGAAAGGGtt -745 -711
2. Homo sapiens CTTCATCTAATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -834 -790
3. Loxodonta africana CTgtt----ATAGGCAAGGGAGCCATGGACAGCTTTCAAAGGGGG -822 -782
4. Macaca mulatta CTTCATCTCATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -843 -799
5. Oryctolagus cuniculus CTTTATCTCCTAGG-AAGAGAGCCATTGACAGCTTCCCGAAGGGG -798 -755
6. Pan troglodytes CTTCATCTAATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -834 -790
ARGCAAAGGRGCCATTG  Matrilin-1
:.:::: :..:::::::
AGGCAAGGSAGCCATTG  MyBP-H
Similarity to the motif found in the promoter of the MyBP-H gene
Hits from other extracellular matrix specific genes
Matrilin-1 consensus CTTCTgCaRGCAAaGGRGCCaTTGTGGtcaG
TCTgCaRGCAAaGGRGCCa  Matrilin-1
: : ::.:::::::. :::
TTTCCAGGCAAAGGGCCCA  Collagen, type IV, alpha 2

aGGRGCCaTTGTGGTcaG  Matrilin-1
::..:::: ::. ::: :
AGRGGCCACTGKAGTCGG  Cartilage intermediate-layer protein

aRGCAAaGGRGCC  Matrilin-1
:.:::::: .:::
AGGCAAAGCAGCC  Melanoma-associated chondroitin sulfate proteoglycan 4

CTgCaRGCAAaGGRGCC  Matrilin-1
:: : .: ..:::.:::
CTCCGAGGRRAGGGGCC  Collagen alpha 1(II)

CTgCaRGCA  Matrilin-1
:::::.:::
CTGCAGGCA  Brain link protein 2
PE1 element in the promoter region of the chicken matrilin-1 gene (experimental data)
Rentsendorj, O., Nagy, A., Sinko, I., Daraba, A., Barta, E. and Kiss, I. (2005) "Highly conserved proximal promoter element harbouring paired Sox9-binding sites contributes to the tissue- and developmental stage-specific activity of the matrilin-1 gene." Biochem J. 389:705-716.
Searching for SOX9 binding sites in the DoOP promoter database
•Using the WWCAAWG consensus with FUZZNUC against all human 1000 bp promoters (search in complement): 21991 hits (0 mismatch), 370531 hits (1 mismatch)
•Using the [AT][AT]CAA[AT]GN(1,3)C[AT]TTG[AT][AT] paired consensus (Bridgewater et al. 2003, NAR 31:1541-53) with FUZZNUC (max two mismatches): 21492 hits
•Using the WWCAAWG consensus with the MOFEXT program (conserved motifs from 1000 bp DoOP promoters, wordsize 7, cutoff 81% = 1 mismatch): 51865 hits
•Using the WWCAAWGNNNNCWTTGWW paired consensus with the MOFEXT program (conserved motifs from 1000 bp DoOP promoters, wordsize 16, cutoff 71%): 358 hits
Matrilin-1 motif m26
Score: 26
Hit: GcTCTRGTTGCTTCTGCARGCAAAGGAGCCCTTGTGGTCaGAGGGGCCTCYgRAGCCY
Match: |||.| | |||.
Query: WWCAAWGNNNNCWTTGWW
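For readers without EMBOSS at hand, an exact (0 mismatch) search with an IUPAC consensus such as WWCAAWG on both strands can be imitated with a regular expression. The Python sketch below is only an approximation of that kind of search, not a reimplementation of FUZZNUC: it does not support mismatches or the N(1,3) spacer syntax, and the names `iupac_to_regex`, `revcomp` and `find_sites` are hypothetical.

```python
import re

IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus into a regular expression string."""
    return "".join(c if c in "ACGT" else "[" + IUPAC[c] + "]"
                   for c in consensus.upper())

def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_sites(consensus, promoter):
    """Exact matches on both strands as (0-based start, strand, site)."""
    # The lookahead makes overlapping matches visible as well.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    seq = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(revcomp(seq)):
        # Convert the position on the reverse strand back to forward coordinates.
        start = len(seq) - m.start() - len(m.group(1))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)

print(iupac_to_regex("WWCAAWG"))
print(find_sites("WWCAAWG", "GGATCAAAGCC"))
print(find_sites("WWCAAWG", "GGCTTTGATCC"))
```

The two toy promoters are made up for the example: the first carries the site on the forward strand, the second carries the same site on the reverse strand.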
Studying the evolution of TFBSs
•Motifs conserved between fishes and mammals can be considered as ancient motifs
•We need to refine the automatically generated motif consensus collections in order to find real common motifs
Chordate DoOP V1.3 1000bp   Number of genes   Number of motifs   Number of genes after filtering   Number of motifs after filtering
Takifugu rubripes           447               2640               164                               377
Danio rerio                 467               3246               119                               403
Tetraodon nigroviridis      448               2730               154                               351
Sum                         866               6435               329                               898
Paired mesoderm homeobox protein 2B (Paired-like homeobox 2B) (PHOX2B homeodomain protein)
DoOP, chordate v1.3, 1000, 80400097
•choosing motif m16
•searching all chordate 1000 bp consensus sequences with the MOFEXT program using wordsize=10
No. Species Motif Start End
1. Bos taurus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
2. Canis familiaris AAATTGGATCAGAGTGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
3. Danio rerio AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAA-GAt -598 -526
4. Echinops telfairi AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
5. Gallus gallus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -483 -410
6. Homo sapiens AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
7. Loxodonta africana AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
8. Macaca mulatta AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
9. Monodelphis domestica AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAG-GAG -478 -406
10. Mus musculus AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTCCAAGTAGCGTGATTGAATTAAAGGGCAG-GAG -478 -406
11. Oryctolagus cuniculus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGCAGTGTGATTGAATTAAAGGGCAAGGAG -478 -405
12. Pan troglodytes AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406
13. Rattus norvegicus AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTCCAAGTAGCGTGATTGAATTAAAGGGCA-GGAG -478 -406
14. Takifugu rubripes AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAGGATG -623 -550
15. Tetraodon nigroviridis AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAGGATG -621 -548
16. Xenopus tropicalis AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTGCAAAGAGTGTGATTGAATTAAAGGGCAAGGAG -484 -411
Consensus: AAATTGGATCAGgGagAATCGTCACCCAACTTTCATTATTTcCAAgtaGtGTGATTGAATTAAAGGGCAgGgag
Motif M16 from the DoOP database
Hits from MOFEXT search with the M16 motif of PHOX2B homeodomain protein gene promoter
AAATTGGATCAGgGagAATCGTCACCCAACTTTCATTATTTcCAAgtaGtGTGATTGAATTAAAGGGCAgGgag
GAGAATCGTCACCCAACTTTCATTATTTCCA  PHOX2B
::::::  : :   :  ::::::::::::::
GAGAATTCTAATATATTTTTCATTATTTCCA  Unknown

CCCAACTTTCATTATTTCCAAG  PHOX2B
::::::::::::::   ::: :
CCCAACTTTCATTAGCACCAGG  PERB11 family member in MHC class I region 24

TTTCATTATTTCCAAGTA  PHOX2B
::::::: ::::::: ::
TTTCATTCTTTCCAACTA  RNA binding motif protein 18

GATTGAATTAAAGGGCAGGGAG  PHOX2B
:::::  : : :::::::::::
GATTGTTTAACAGGGCAGGGAG  Protein FAM3D

AAGGGCAGGGA  PHOX2B
:::::::::::
AAGGGCAGGGA  ATPase family, AAA domain containing 1; Voltage-dependent L-type calcium channel alpha-1D 17; FAM35A; Digestive tract-specific calpain; etc.
Drawing regulation networks based on similar conserved motifs
•Four model cartilage specific genes
•Finding long (30-35 bp) conserved motifs using MEME
•Searching in DoOP 1000 bp motifs using the MOFEXT program (ws: 8, cutoff: 81)
AGC1: Aggrecan (Chondroitin sulfate proteoglycan)
HAPLN1: Link protein (Proteoglycan link protein)
MATN1: Matrilin-1 (Cartilage matrix protein)
MATN3: Matrilin-3
Link protein, MEME motif 1:
Conclusions and future plans
•The DoOP database and the MOFEXT program are suitable for the analysis of transcription regulation
•The Matrilin-1 promoter shows interesting features: a longer conserved block with paired binding sites (1-2 mismatches are possible in each site)
•These methods may be suitable for finding regulatory networks based on conserved motifs and studying the evolution of TFBSs
•We plan to further improve these methods
•To cluster conserved motifs
•To study paralogue evolution in promoter regions (sub- or neofunctionalisation)
•What about plants?
Bioinformatics group: Endre Sebestyén, Tamás Pálfy, Tibor Nagy, Gábor Tóth
Students from the ELTE:
Áron Szenes, János Molnar
Collaborative partners:
Ibolya Kiss (BRC, Szeged) and László Nagy (DTE, Debrecen)
Swedish EMBnet node, UPPMAX computer facility