Comparative genomics approach in promoter analysis - Orthologous promoter databases and conserved motif search Endre Barta


Post on 14-Jan-2016


Page 1: Endre Barta

Comparative genomics approach in promoter analysis - Orthologous promoter databases and conserved motif search

Endre Barta

Page 2: Endre Barta

Conserved motifs in the non-coding regions of the genome

Multiple Conserved Sequences (MCS)

Intronic binding sites

3’UTR binding sites, miRNA target sequences

Transcription Factor Binding Sites in the promoter region

Objectives:

•Finding orthologous promoter regions

•Defining conserved motifs

•Searching in conserved motifs

•Analysing the data

Page 3: Endre Barta

DoOP (Database of Orthologous Promoters http://doop.abc.hu)

Reference species: H. sapiens / A. thaliana

Genes (based on NCBI annotation)

Choosing first exons

Genomic sequences (from different DNA databanks; genome projects)

Aligning the first exons

Orthologous first exons

500, 1000 and 3000 bp upstream regions from orthologous first exons
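
The upstream-region step can be illustrated with a minimal Python sketch. It assumes 1-based inclusive exon coordinates on the forward strand of a genomic sequence held in memory; the function names and the toy sequence are hypothetical and are not part of the DoOP pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def upstream_region(genome: str, exon_start: int, exon_end: int,
                    strand: str, length: int) -> str:
    """Return up to `length` bp immediately upstream of a first exon.

    Assumption: exon_start/exon_end are 1-based, inclusive, forward-strand
    coordinates, as in GenBank feature tables.
    """
    if strand == "+":
        begin = max(0, exon_start - 1 - length)
        return genome[begin:exon_start - 1]
    # Minus strand: the upstream region lies after exon_end on the forward
    # strand and is reported as its reverse complement.
    return reverse_complement(genome[exon_end:exon_end + length])


if __name__ == "__main__":
    toy_genome = "AAACCCGGGTTT"  # hypothetical sequence for illustration
    for size in (500, 1000, 3000):
        # Regions are truncated when the sequence is shorter than `size`.
        print(size, upstream_region(toy_genome, 7, 9, "+", size))
```

The region is silently shortened when fewer than `length` bases are available, which is one possible design choice; a real pipeline might instead flag such cases.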

Page 4: Endre Barta

     gene            complement(1279..4993)
                     /locus_tag="At5g01010"
                     /note="synonym: TOPTELOMERE.1; expressed protein"
                     /db_xref="GeneID:831893"
     mRNA            complement(join(1279..1646,1745..1780,1914..1961,
                     2435..2509,2748..2799,2872..2934,3303..3383,3602..3658,
                     3761..3801,3926..4004,4101..4257,4334..4466,4551..4678,
                     4764..4993))
                     /locus_tag="At5g01010"
                     /product="expressed protein"
                     /transcript_id="NM_120177.3"
                     /db_xref="GI:42567550"
                     /db_xref="GeneID:831893"
     CDS             complement(join(1527..1646,1745..1780,1914..1961,
                     2435..2509,2748..2799,2872..2934,3303..3383,3602..3658,
                     3761..3801,3926..4004,4101..4257,4334..4466,4551..4678,
                     4764..4923))

Exon data are from the NCBI’s reference sequence annotations of human and Arabidopsis genome sequences
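
As an illustration of how a first exon can be read from such an annotation, the following Python sketch parses the location string of the mRNA feature shown above. The helper names are hypothetical, and the parser is deliberately simple: it ignores partial-location markers such as `<` and `>` and is not the code used to build DoOP.

```python
import re

# mRNA location copied from the At5g01010 annotation on this slide.
MRNA_LOCATION = (
    "complement(join(1279..1646,1745..1780,1914..1961,"
    "2435..2509,2748..2799,2872..2934,3303..3383,3602..3658,"
    "3761..3801,3926..4004,4101..4257,4334..4466,4551..4678,"
    "4764..4993))"
)


def parse_location(location: str):
    """Return (strand, [(start, end), ...]) from a GenBank location string."""
    strand = "-" if location.strip().startswith("complement") else "+"
    segments = [(int(a), int(b))
                for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    return strand, segments


def first_exon(location: str):
    """Return (strand, start, end) of the 5'-most exon of the transcript."""
    strand, segments = parse_location(location)
    # On the complementary strand the transcript starts at the highest
    # forward coordinate, so the first exon is the last listed segment.
    start, end = max(segments) if strand == "-" else min(segments)
    return strand, start, end


if __name__ == "__main__":
    strand, start, end = first_exon(MRNA_LOCATION)
    print(strand, start, end, "length:", end - start + 1)
```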

Page 5: Endre Barta

Different types of annotated first exons

1. No annotated 5’ UTR, the length of the first exon is > 50 bp.

2. No annotated 5’ UTR, the length of the first exon is < 50 bp.

3. There is an annotated 5’ UTR, the CDS starts in the first exon and it is > 50 bp.

4. Same as No. 3, but < 50 bp.

5. There is one or more 5’ UTR exon(s), the first UTR exon is > 50 bp.

6. Same as No. 5, but < 50 bp.

7. Wrong annotation
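
The seven categories above can be written as a small decision function. This is a sketch under stated assumptions: the argument names are hypothetical, and an exon of exactly 50 bp is treated as short because the slide does not say which side of the threshold it falls on.

```python
def first_exon_type(has_5utr: bool, cds_starts_in_first_exon: bool,
                    first_exon_len: int, annotation_ok: bool = True,
                    threshold: int = 50) -> int:
    """Return the first-exon category (1-7) described on the slide."""
    if not annotation_ok:
        return 7  # wrong annotation
    # Assumption: a length of exactly `threshold` counts as short.
    is_long = first_exon_len > threshold
    if not has_5utr:
        return 1 if is_long else 2
    if cds_starts_in_first_exon:
        return 3 if is_long else 4
    # One or more separate 5' UTR exons precede the CDS.
    return 5 if is_long else 6


if __name__ == "__main__":
    print(first_exon_type(has_5utr=True, cds_starts_in_first_exon=True,
                          first_exon_len=230))
```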

[Bar chart. x-axis: gene type (1, 2, 3, 4, 5n, 6n); y-axis: frequency (0 to 10000). Bar labels: 3791, 684, 1796, 6108, 779, 9257]

Frequency of H. sapiens gene types (5n and 6n collapsed) in the chordate section of the DoOP database, version 1.4

Page 6: Endre Barta

Using BLAST to find orthologous first exons

•Very critical, and the most time-consuming step

•TBLASTN and WU-BLAST give too many false positives, so we use BLASTN (which gives more false negatives)

•We use BioPerl modules to parse BLAST results

•The most difficult task is to find the real orthologs among paralogs

•We use a simple algorithm: the best hit (considering the alignment length and the score) is most probably from the orthologous gene
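
A best-hit rule of this kind might look like the Python sketch below. The slide does not say how alignment length and score are combined, so ranking by score first and length second, and the optional minimum-length filter, are assumptions; the `Hit` record and its field names are hypothetical and do not mirror the BioPerl objects used in the original work.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    subject_id: str   # identifier of the matched genomic sequence
    align_len: int    # length of the aligned region in bp
    bit_score: float  # BLAST bit score


def best_hit(hits: Iterable[Hit], min_align_len: int = 0) -> Optional[Hit]:
    """Pick the hit most likely to come from the orthologous gene.

    Assumption: hits shorter than `min_align_len` are discarded, then the
    highest score wins and alignment length breaks ties.
    """
    candidates = [h for h in hits if h.align_len >= min_align_len]
    if not candidates:
        return None
    return max(candidates, key=lambda h: (h.bit_score, h.align_len))


if __name__ == "__main__":
    hits = [Hit("contig_A", 180, 250.0), Hit("contig_B", 60, 95.0)]
    print(best_hit(hits, min_align_len=50))
```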

Page 7: Endre Barta

Orthologous promoter group

Defining conserved motifs in the DoOP database

Multiple sequence alignment

conserved motifs

Consensus sequence

for example

tGGGAGTGCG ATTTCGGGTC CACAGAGCTC tctgcgcggt gctggggcat
-GGGAGCCCG AGGCCGAGGC CGCCGAGCTC G-cgtacggt ----------
---------- ---------- CACTGGGATC A-acgtcaac ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- CGCTGAGATC A--------- ----------

CRCNGaGMTC

All consensus sequences from all groups

Consensus database
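
One way to derive a consensus such as CRCNGaGMTC from the aligned block is sketched below in Python. The rule used here (upper case when all sequences agree, lower case when one base reaches 75%, a two-base IUPAC code when exactly two bases occur, N otherwise, gaps ignored) is an assumption chosen because it reproduces the example on this slide; it is not confirmed to be the rule used in DoOP.

```python
from collections import Counter

# IUPAC codes for two-base ambiguities.
TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_column(column, majority=0.75):
    """Return the consensus symbol for one alignment column."""
    bases = [b.upper() for b in column if b.upper() in ("A", "C", "G", "T")]
    if not bases:
        return "-"  # column contains only gaps
    counts = Counter(bases)
    top, n = counts.most_common(1)[0]
    if n == len(bases):
        return top                      # fully conserved
    if n / len(bases) >= majority:
        return top.lower()              # dominant base, not unanimous
    if len(counts) == 2:
        return TWO_BASE[frozenset(counts)]
    return "N"


def consensus(sequences, majority=0.75):
    """Return the consensus of equal-length aligned sequences."""
    return "".join(consensus_column(col, majority) for col in zip(*sequences))


if __name__ == "__main__":
    block = ["CACAGAGCTC", "CGCCGAGCTC", "CACTGGGATC",
             "----------", "CGCTGAGATC"]
    print(consensus(block))
```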


Page 8: Endre Barta

Creating the web interface

•ENSEMBL / EPD / TAIR links

•Annotating repetitive elements

•Multiple alignment (DIALIGN)

•Defining conserved motifs

•Creating MySQL database

•PHP / HTML programming

Barta, E., Sebestyén, E., Pálfy, T.B., Tóth, G., Ortutay, C. P. and Patthy, L. (2005) DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res. 33: D86-D90.

Page 9: Endre Barta
Page 10: Endre Barta

Are the conserved motifs indeed binding sites for transcription factors?

•For good quality motifs we may safely assume that the answer is yes (or in other words they probably take part in transcription regulation)

How to prove?

•Experimentally (see the case study later)

•Comparing conserved motifs with known TFBSs

•Comparing ChIP-on-chip results (in a few years)

Page 12: Endre Barta

Bad sequence in Takifugu

No. Species Motif Start End

1. Bos taurus TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -364 -329

2. Carollia perspicillata TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -365 -330

3. Gallus gallus TTCTTGGAAATGTAAAGCGAGAA-CTCTTTAAGTGG -298 -264

4. Homo sapiens TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -361 -326

5. Mus musculus TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -378 -343

6. Papio hamadryas TTCTTCGAAATGTAAAGCGAGGACCTCTTTAAGTGG -366 -331

7. Takifugu rubripes ctcccgcttacagcgcg------------------- -414 -398

8. Xenopus tropicalis TTCTTGGAAATGTAAAGCGAGAAAGTCTTTAAGTGG -248 -213

Page 13: Endre Barta

Uncertain cases (too many mismatches in rabbit)

No. Species Motif Start End

1. Bos taurus TTTTCCACGCTGCCGAGAGGAATC -415 -392

2. Callithrix jacchus TTTTCCACGCTGCCGAGAGGAATC -421 -398

3. Homo sapiens TTTTCCACGCTGCCGAGAGGAATC -413 -390

4. Loxodonta africana TTCTCCACGCCGCCGGGAGGAATC -409 -386

5. Monodelphis domestica TTTTCAACACTACCGAGGGGAATT -427 -404

6. Mus musculus TTCTCCACGCCGCCGAGAGGAATC -410 -387

7. Oryctolagus cuniculus CCTTCCACGCCGCGGAGAGCGATC -435 -412

8. Otolemur garnettii TTTTCCACGCTGCCGAGAGGAATC -405 -382

9. Pan troglodytes TTTTCCACGCTGCCGAGAGGAATC -414 -391

10. Papio anubis TTTTCCACGCTGCCGAGAGGAATC -414 -391

11. Rattus norvegicus TTCTCCACGCCGCCGAGAGGAATC -410 -387

12. Rhinolophus ferrumequinum TTCTCCACGCTGCCGAGAGGAATC -416 -393

13. Takifugu rubripes GGTCCCACACCGCCAGCCTGAATG -539 -516

Consensus: ttYtCcACgCYgCcgagaggaATc
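
The degree of divergence in a motif block can be quantified by counting mismatches against a reference sequence. The Python sketch below uses the human motif as the reference, which is an assumption made for illustration; the slide does not state a reference or a numeric threshold for an "uncertain" sequence.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned motifs differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return sum(x.upper() != y.upper() for x, y in zip(a, b))


if __name__ == "__main__":
    human = "TTTTCCACGCTGCCGAGAGGAATC"
    others = {
        "Mus musculus": "TTCTCCACGCCGCCGAGAGGAATC",
        "Oryctolagus cuniculus": "CCTTCCACGCCGCGGAGAGCGATC",
        "Takifugu rubripes": "GGTCCCACACCGCCAGCCTGAATG",
    }
    for species, motif in others.items():
        print(species, mismatches(human, motif))
```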

Page 14: Endre Barta

The number of motifs depends on the evolutionary distance between species in a promoter group

Only primates

Only mammals

chordates

Page 15: Endre Barta

Conserved known TFBSs in the promoter region of Centromere protein A (CENP-A) gene

SAP-1

N-Myc

motif m3: ggGTCAcgTGAc

motif m4: cCcggcccGgaGc

Page 16: Endre Barta

Conserved TATA box in the promoter of COL1A2 (Collagen alpha 2(I))

Page 17: Endre Barta

Homo sapiens GGAGGGCG---------------------------GGAGGATGCGGAGGGCGGAGG----

Gallus gallus GCAGGGCG---------------------------AGGGGCGGGGAACGTCTGAAAAAAA

Sus scrofa GAAGGCCG---------------------------GGGGGATGGGGAGGGCGGAGG----

Rattus norvegicus AGAGGGCG---------------------------GGTGGCTGGGGAGGGCGGAGG----

Danio rerio AACAGGA--------------------------------------------GGAG-----

Takifugu rubripes AACAGGA--------------------------------------------GGAG-----

Ornithorhynchus anatinus GGAGGct-----------------------------------------------------

Dasypus novemcinctus AGAGGACG---------------------------GGTGGATGGGGAGGGCGGAGG----

Callithrix jacchus GGAGGGCG---------------------------GGAGGATAGGGAGGGCGGAGG----

Sorex araneus GGAG--------------------------------------GGGGAGGGCGGAGG----

Mus musculus AGAGGGCG---------------------------GGTGGCTGGGGAGGGCGGAGG----

Meleagris gallopavo GCAGGGCG---------------------------AGGGGCGGGGAACGTCTGAAAAAAA

Taeniopygia guttata GGGGCGAG---------------------------AGGGGCGTGGGACGGCTGAGGGGAA

Papio anubis GGAGGGCG---------------------------GGAGGATGGGGAGGGCGGAGG----

Bos taurus GAggtggggggagttggggggaggaaggccagagcGGGGGATGGGGAGGGCGGAGG----

Canis familiaris GAAGGGCG---------------------------GGGGGATGGGGAGGGCGGAGG----

Felis catus GAAGAGAG---------------------------GGGGGATGGGGAGGGCGGAGG----

Tetraodon nigroviridis AACAGGA--------------------------------------------GGAGG----

Pan troglodytes GGAGAGCG---------------------------GGAGGATGCGGAGGGCGGAGG----

"Box 3A, Sp1 binding site GGGCGG"

Inagaki et al. 1994. JBC, 269, 14828-34

Page 18: Endre Barta

Known TFBSs in the DoOP database and their conservation

Name of the TF   Sum   In sequence   In group (gene)   In conserved position   Type of the DNA binding domain

AP2 alpha 8084 8084 (27,51%) 1871 (72,92%) 745 (9,22%) AP2

SP1-1 6835 6816 (23,20%) 1741 (67,85%) 482 (7,05%) Zn-finger, C2H2

FREAC-7 6610 6609 (22,49%) 1556 (60,64%) 676 (10,23%) Forkhead

S8 5848 5848 (19,90%) 1639 (63,87%) 982 (16,79%) Homeo

SP1-2 5564 5564 (18,93%) 1887 (73,54%) 590 (10,60%) Zn-finger, C2H2

Yin-Yang 5550 5550 (18,89%) 1879 (73,23%) 908 (16,36%) Zn-finger, C2H2

MZF-1-4 5168 5168 (17,59%) 1768 (68,90%) 770 (14,90%) Zn-finger, C2H2

Ahr-ARNT 5017 5015 (17,07%) 1778 (69,29%) 763 (15,21%) bHLH

deltaEF1 4970 4970 (16,91%) 1798 (70,07%) 752 (15,13%) Zn-finger, C2H2

SPI-B 4941 4941 (16,81%) 1771 (69,02%) 996 (20,16%) ETS

FREAC-3 4907 4907 (16,70%) 1648 (64,22%) 427 (8,70%) Forkhead

Page 19: Endre Barta

Most conserved known TFBSs in the DoOP database

Name of the TF   Sum   In sequence   In group   Conserved   Conserved %   Type

SAP-1 1653 1615 (5,50%) 566 (22,06%) 598 (+ 140) 36,18% (44,65%) ETS

n-Myc 2885 2885 (9,82%) 1152 (44,89%) 888 (+ 210) 30,78% (38,06%) bHLH-zip

USF 2537 2536 (8,63%) 1029 (40,10%) 772 (+ 225) 30,43% (39,30%) bHLH-zip

ARNT 3648 3648 (12,41%) 1426 (55,57%) 978 (+231) 26,81% (33,14%) bHLH

SPI-1 4494 4494 (15,29%) 1623 (63,25%) 1135 (+333) 25,26% (32,67%) ETS

Max 2138 2103 (7,16%) 894 (34,84%) 500 (+184) 23,39% (31,99%) bHLH-zip

NRF-2 2461 2426 (8,26%) 935 (36,44%) 558 (+286) 22,67% (34,30%) ETS

SRF 340 340 (1,16%) 184 (7,17%) 73 (+14) 21,47% (25,59%) MADS

TCF11-MafG 4411 4409 (15,00%) 1568 (61,11%) 928 (+194) 21,04% (25,44%) bZIP

c-ETS 3934 3934 (13,39%) 1540 (60,02%) 822 (+266) 20,89% (27,66%) ETS

SPI-B 4941 4941 (16,81%) 1771 (69,02%) 996 (+368) 20,16% (27,61%) ETS
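
The two "Conserved %" figures in each row are consistent with dividing the conserved count, without and with the parenthesised additional count, by the "Sum" column. A short Python check of that arithmetic follows; the function name is hypothetical, and the meaning of the additional count is not stated on the slide.

```python
def conserved_percentages(total: int, conserved: int, extra: int):
    """Return (conserved %, conserved % including the extra count)."""
    base = round(100 * conserved / total, 2)
    with_extra = round(100 * (conserved + extra) / total, 2)
    return base, with_extra


if __name__ == "__main__":
    # Rows taken from the table above: (name, Sum, Conserved, additional).
    for name, total, conserved, extra in [("SAP-1", 1653, 598, 140),
                                          ("SRF", 340, 73, 14)]:
        print(name, conserved_percentages(total, conserved, extra))
```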

Page 20: Endre Barta

Searching between conserved motifs (MOFEXT program)

Consensus database

Query sequence

Searching

ttRcGGWACCTgTaa

Search algorithm

Query sequence

Next sequence

A window of given length (wordsize)

atGCTGAgRCGgAACCTGcGGAAC

comparing sequences, and calculating scores

Page 21: Endre Barta


Page 22: Endre Barta

Searching between conserved motifs (MOFEXT program)

Consensus database

Query sequence

ttRcGGWACCTgTaa

Search algorithm

Query sequence

Next sequence

A window of given length (wordsize)

atGCTGAgRCGgAACCTGcGGAAC

Hit above the cutoff score

Extending the hit

extended hit
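
The find-and-extend idea shown on these slides can be sketched in Python as follows. This is a simplified illustration, not the MOFEXT implementation: MOFEXT scores symbol pairs with a similarity matrix read from a file, whereas this sketch assumes that two IUPAC symbols match when their base sets overlap, scores a window as the percentage of matching positions, and extends a hit only across matching neighbours. All function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(a: str, b: str) -> bool:
    """True when two IUPAC symbols share at least one base (case ignored)."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def window_hits(query: str, target: str, wordsize: int, cutoff: float):
    """Yield (q_start, t_start, percent) for windows scoring >= cutoff."""
    for qi in range(len(query) - wordsize + 1):
        for ti in range(len(target) - wordsize + 1):
            matches = sum(compatible(query[qi + k], target[ti + k])
                          for k in range(wordsize))
            percent = 100.0 * matches / wordsize
            if percent >= cutoff:
                yield qi, ti, percent


def extend(query: str, target: str, qi: int, ti: int, wordsize: int):
    """Extend a window hit in both directions while symbols stay compatible.

    Returns (q_start, t_start, length) of the extended hit.
    """
    left = 0
    while (qi - left - 1 >= 0 and ti - left - 1 >= 0
           and compatible(query[qi - left - 1], target[ti - left - 1])):
        left += 1
    right = wordsize
    while (qi + right < len(query) and ti + right < len(target)
           and compatible(query[qi + right], target[ti + right])):
        right += 1
    return qi - left, ti - left, left + right


if __name__ == "__main__":
    query = "ttRcGGWACCTgTaa"             # sequences shown on the slide
    target = "atGCTGAgRCGgAACCTGcGGAAC"
    for qi, ti, pct in window_hits(query, target, wordsize=8, cutoff=95):
        q0, t0, n = extend(query, target, qi, ti, 8)
        print(pct, query[q0:q0 + n], target[t0:t0 + n])
```

Comparing every query window with every target window is quadratic in sequence length, which is acceptable for short motif consensus sequences but would need indexing for whole promoters.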

Page 23: Endre Barta

The MOFEXT (MOtif Find and EXTend) program

Written in standard C, available upon request

Usage: mofext -l mypatterns1.list mypatterns2.list -p query1 query2 -m matrix.txt -w 10 -s 95

Options:

-h Display this, but you know it, because you see it :-)

-l The databases. Space separated, maximum 50

-p Query patterns. Space separated, maximum 50

-m The similarity matrix filename

-w Word size. Default: 6

-s The similarity percentage limit. Default: 80

-e If you add this, the output is not the similarity score but the percentage

-a If you add this, instead of print the database sequence, print the matched region

Output is a plain text file in table format

The program is also suitable for searching protein sequences

Page 24: Endre Barta

The DoOPSearch website (http://doopsearch.abc.hu)

•Conserved motifs database: from the current DoOP database

•Web interface that exposes the same parameters as the command-line version

•The result is linked to the DoOP database

•It is possible to sort and/or filter the result (score, ext. score, length, GO annotation!)

•FUZZNUC search in the DoOP promoter sequences (MOFEXT uses only the conserved motifs to search, while FUZZNUC searches the whole promoters)

Page 25: Endre Barta

Utilization of DoOP data and the MOFEXT program

•Studying the promoter region of different genes. Making guesses about putative TFBSs

•Finding conserved motifs in the promoter regions of other genes (possible co-regulation)

Case study: SOX9 binding sites in the promoter region of the matrilin-1 gene

•Studying the evolution of TFBSs

•Drawing regulation networks based on similar conserved motifs

•Studying conservation in the core promoter region

Page 26: Endre Barta

PE1 element in the DoOP database (in silico data)

Human CTTCTGCAAGCAAAGGAGCCCTTGTGGTCAG
Chimp CTTCTGCAAGCAAAGGAGCCCTTGTGGTCAG
Macaque CTTCTGCAGGCAAAGGAGCCCTTGTGGTCAG
Dog CTTCTGCAGGCAAAGGGGCCCTTGTGGTCCG
Cattle CTTCTGCAGGCAAAGGAGCCCTTGTGGTCAG
Elephant CTTCTGCAGGCAATGGAGCCCTTGTGGTTAG
Mouse CTTCTGCAGGCAAAGGGGCCCTTGTGGTCAG
Rat CTTCTGCAGGCAAAGGGGCCCTTGTGGTCAG
Chicken CTTCTCCGAGCAATGGAGCCATTGTGGAGGG
Consensus CTTCTgCaRGCAAaGGRGCCcTTGTGGtcaG

No. Species Motif Start End

1. Bos taurus ---------GCTTCTGCAGGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCCGGAGCCC -266 -218

2. Canis familiaris GCTCTGGTTGCTTCTGCAGGCAAAGGAGCCCTTGTGGTCCGAGGGGCCTCTTGAGCCC -261 -204

3. Homo sapiens GCTCTAGTTGCTTCTGCAAGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCTGAAGCCT -270 -213

4. Macaca mulatta GCTCTGGTTGCTTCTGCAGGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCCGGAGCCC -269 -212

5. Pan troglodytes GTTCTAGTTGCTTCTGCAAGCAAAGGAGCCCTTGTGGTCAGAGGGGCCTCTGAAGCCT -269 -212

•Finding further orthologous sequences by hand

•defining the consensus sequence

Search in the DoOP CH-1.3 1000bp consensus motifs database

MOFEXT program, wordsize: 8, cutoff: 95%

Page 27: Endre Barta
Page 28: Endre Barta

No. Species Motif 7 Start End

1. Bos taurus tTTTATCTCATAGGCAAGGGAGC----------TTTGAAAGGGtt -745 -711

2. Homo sapiens CTTCATCTAATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -834 -790

3. Loxodonta africana CTgtt----ATAGGCAAGGGAGCCATGGACAGCTTTCAAAGGGGG -822 -782

4. Macaca mulatta CTTCATCTCATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -843 -799

5. Oryctolagus cuniculus CTTTATCTCCTAGG-AAGAGAGCCATTGACAGCTTCCCGAAGGGG -798 -755

6. Pan troglodytes CTTCATCTAATAGGCAAGGCAGCCATTGACAGCTCTCAAAGGGGG -834 -790

ARGCAAAGGRGCCATTG  Matrilin-1
:.:::: :..:::::::
AGGCAAGGSAGCCATTG  MyBP-H

Similarity to the motif found in the promoter of the MyBP-H gene

Page 29: Endre Barta

Hits from other extracellular matrix specific genes

Matrilin-1 consensus CTTCTgCaRGCAAaGGRGCCaTTGTGGtcaG

TCTgCaRGCAAaGGRGCCa  Matrilin-1
: : ::.:::::::. :::
TTTCCAGGCAAAGGGCCCA  Collagen, type IV, alpha 2

aGGRGCCaTTGTGGTcaG  Matrilin-1
::..:::: ::. ::: :
AGRGGCCACTGKAGTCGG  Cartilage intermediate-layer protein

aRGCAAaGGRGCC  Matrilin-1
:.:::::: .:::
AGGCAAAGCAGCC  melanoma-associated chondroitin sulfate proteoglycan 4

CTgCaRGCAAaGGRGCC  Matrilin-1
:: : .: ..:::.:::
CTCCGAGGRRAGGGGCC  Collagen alpha 1(II)

CTgCaRGCA  Matrilin-1
:::::.:::
CTGCAGGCA  Brain link protein 2

Page 30: Endre Barta

PE1 element in the promoter region of the chicken matrilin-1 gene (experimental data)

Rentsendorj, O., Nagy, A., Sinko, I., Daraba, A., Barta, E. and Kiss I. (2005) "Highly conserved proximal promoter element harbouring paired Sox9-binding sites contributes to the tissue- and developmental stage-specific activity of the matrilin-1 gene." Biochem J. 389:705-716

Page 31: Endre Barta

Searching for SOX9 binding sites in the DoOP promoter database

•Using the WWCAAWG consensus with FUZZNUC against all human 1000 bp promoters (no mismatch, search in complement): 21991 hits (0 mismatch), 370531 hits (1 mismatch)

•Using the [AT][AT]CAA[AT]GN(1,3)C[AT]TTG[AT][AT] paired consensus (Bridgewater et al. 2003, NAR 31:1541-53) with FUZZNUC (max two mismatches): 21492 hits

•Using the WWCAAWG consensus with the MOFEXT program (conserved motifs from 1000 bp DoOP promoters, wordsize 7, cutoff 81% = 1 mismatch): 51865 hits

•Using the WWCAAWGNNNNCWTTGWW paired consensus with the MOFEXT program (conserved motifs from 1000 bp DoOP promoters, wordsize 16, cutoff 71%): 358 hits

Matrilin-1 motif m26

Score: 26

Hit GcTCTRGTTGCTTCTGCARGCAAAGGAGCCCTTGTGGTCaGAGGGGCCTCYgRAGCCY

|||.| | |||.

Query WWCAAWGNNNNCWTTGWW
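
A whole-promoter consensus search of the kind run with FUZZNUC can be approximated in Python with regular expressions. The sketch below is an assumption-laden stand-in, not FUZZNUC itself: it finds exact matches only (no mismatches), accepts plain IUPAC strings without repeat ranges, and reports reverse-strand hits by their forward-strand start position. The function names and test sequences are hypothetical.

```python
import re

IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC consensus such as WWCAAWG into a regex string."""
    return "".join(IUPAC_REGEX[c] for c in pattern.upper())


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, pattern: str):
    """Return (forward_start, strand, matched_site) for exact matches.

    A lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile(f"(?=({iupac_to_regex(pattern)}))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        site = m.group(1)
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return hits


if __name__ == "__main__":
    print(iupac_to_regex("WWCAAWG"))
    print(find_sites("GGTTCAAAGCC", "WWCAAWG"))
```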

Page 32: Endre Barta

Studying the evolution of TFBSs

•Motifs conserved between fishes and mammals can be considered as ancient motifs

•We need to refine the automatically generated motif consensus collections in order to find real common motifs

Chordate DoOP V1.3 1000bp   Number of genes   Number of motifs   Number of genes after filtering   Number of motifs after filtering

Takifugu rubripes   447   2640   164   377

Danio rerio   467   3246   119   403

Tetraodon nigroviridis   448   2730   154   351

Sum   866   6435   329   898

Page 33: Endre Barta

Paired mesoderm homeobox protein 2B (Paired-like homeobox 2B) (PHOX2B homeodomain protein)

DoOP, chordate v1.3, 1000, 80400097

•choosing motif m16

•searching all chordate 1000 bp consensus sequences with the MOFEXT program using wordsize=10

Page 34: Endre Barta

No. Species Motif Start End

1. Bos taurus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

2. Canis familiaris AAATTGGATCAGAGTGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

3. Danio rerio AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAA-GAt -598 -526

4. Echinops telfairi AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

5. Gallus gallus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -483 -410

6. Homo sapiens AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

7. Loxodonta africana AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

8. Macaca mulatta AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

9. Monodelphis domestica AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAG-GAG -478 -406

10. Mus musculus AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTCCAAGTAGCGTGATTGAATTAAAGGGCAG-GAG -478 -406

11. Oryctolagus cuniculus AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGCAGTGTGATTGAATTAAAGGGCAAGGAG -478 -405

12. Pan troglodytes AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTAGTGTGATTGAATTAAAGGGCAGGGAG -479 -406

13. Rattus norvegicus AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTCCAAGTAGCGTGATTGAATTAAAGGGCA-GGAG -478 -406

14. Takifugu rubripes AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAGGATG -623 -550

15. Tetraodon nigroviridis AAATTGGATCAGGGAGAATCGTCACCCAACTTTCATTATTTCCAAGTGGTGTGATTGAATTAAAGGGCAGGATG -621 -548

16. Xenopus tropicalis AAATTGGATCAGGGAAAATCGTCACCCAACTTTCATTATTTGCAAAGAGTGTGATTGAATTAAAGGGCAAGGAG -484 -411

Consensus: AAATTGGATCAGgGagAATCGTCACCCAACTTTCATTATTTcCAAgtaGtGTGATTGAATTAAAGGGCAgGgag

Motif M16 from the DoOP database

Page 35: Endre Barta
Page 36: Endre Barta

Hits from MOFEXT search with the M16 motif of PHOX2B homeodomain protein gene promoter

AAATTGGATCAGgGagAATCGTCACCCAACTTTCATTATTTcCAAgtaGtGTGATTGAATTAAAGGGCAgGgag

GAGAATCGTCACCCAACTTTCATTATTTCCA  PHOX2B
::::::  : :   :  ::::::::::::::
GAGAATTCTAATATATTTTTCATTATTTCCA  Unknown

CCCAACTTTCATTATTTCCAAG  PHOX2B
::::::::::::::   ::: :
CCCAACTTTCATTAGCACCAGG  PERB11 family member in MHC class I region24

TTTCATTATTTCCAAGTA  PHOX2B
::::::: ::::::: ::
TTTCATTCTTTCCAACTA  RNA binding motif protein 18

GATTGAATTAAAGGGCAGGGAG  PHOX2B
:::::  : : :::::::::::
GATTGTTTAACAGGGCAGGGAG  Protein FAM3D

AAGGGCAGGGA  PHOX2B
:::::::::::
AAGGGCAGGGA  ATPase family, AAA domain containing 1; Voltage-dependent L-type calcium channel alpha-1D 17; FAM35A; Digestive tract-specific calpain; etc.

Page 37: Endre Barta

Drawing regulation networks based on similar conserved motifs

•Four model cartilage-specific genes

•Finding long (30-35 bp) conserved motifs using MEME

•Searching in DoOP 1000 bp motifs using the MOFEXT program (wordsize: 8, cutoff: 81%)

AGC1: Aggrecan (Chondroitin sulfate proteoglycan )

HAPLN1: Link protein (Proteoglycan link protein

MATN1: Matrilin-1 (Cartilage matrix protein

MATN3: Matrilin-3

Link protein, MEME motif 1:

Page 38: Endre Barta
Page 39: Endre Barta

Conclusions and future plans

•The DoOP database and the MOFEXT program are suitable for the analysis of transcription regulation

•The Matrilin-1 promoter shows interesting features: a longer conserved block with paired binding sites (1-2 mismatches are possible in each site)

•These methods may be suitable for finding regulatory networks based on conserved motifs and studying evolution of TFBSs

•We plan to further improve these methods

•To cluster conserved motifs

•To study paralogue evolution in promoter regions (sub- or neofunctionalisation)

•What about plants?

Page 40: Endre Barta

Bioinformatics group: Endre Sebestyén, Tamás Pálfy, Tibor Nagy, Gábor Tóth

Students from the ELTE:

Áron Szenes, János Molnar

Collaborative partners:

Ibolya Kiss (BRC, Szeged) and László Nagy (DTE, Debrecen)

Swedish EMBnet node, UPPMAX computer facility