An overview of the RegCreative Jamboree


A curated database of DNase I footprints in D. melanogaster

Galas and Schmitz (1978)

• “Current” compilations of transcription factor binding sites (e.g. Transfac) are incomplete and not linked to genome

• DNase I footprints are a high quality, abundant source of binding site data

• Can be used for genome annotation, PWM construction, motif inference, cis-regulatory prediction, comparative genomics/molecular evolution, systems biology, text mining & ...

• FlyBase 4.1 currently has only 90 binding sites & 27 enhancers annotated
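
As an illustration of the PWM construction use case listed above, here is a minimal sketch of turning a set of aligned binding sites into a log-odds position weight matrix. The example sequences, the function names `build_pwm` and `score_site`, the pseudocount of 1 and the uniform background of 0.25 are all assumptions made for illustration; they are not FlyReg data or the method used by the authors.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length, pre-aligned binding sites.

    Returns a list with one dict per position, mapping base -> log2 odds.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        column = {
            b: math.log2(((counts.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        }
        pwm.append(column)
    return pwm

def score_site(pwm, seq):
    """Sum the per-position log-odds for a candidate site of PWM length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(col[b] for col, b in zip(pwm, seq.upper()))

if __name__ == "__main__":
    # Hypothetical aligned footprint sequences, invented for this example.
    example_sites = ["TTAATCC", "TTAATCC", "TAAATCC", "TTAATCG"]
    pwm = build_pwm(example_sites)
    print(score_site(pwm, "TTAATCC"))
    print(score_site(pwm, "GGGGGGG"))
```

A site resembling the input set should score higher than an unrelated sequence; real footprints vary in length and would need to be aligned first, which this sketch does not attempt.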

FlyReg: Materials, Methods & Results

200+ articles, 800+ authors, 50+ pers. comm., 20+ years

100+ regions, 1350+ TFBSs, 85+ TFs

Custom binding site annotation widgets

[Figure: genome browser view around eve (chromosome band 46C10, base positions 5034500 to 5041000). Tracks: Base Position; Chromosome Bands; Protein-Coding Genes from FlyBase; Non-Coding Genes from FlyBase; FlyReg: Drosophila DNase I Footprint Database; D.mel./D.yakuba/D.pseudoob./A.gambiae Multiz Alignments & phastCons Scores. Footprints in the FlyReg track are labeled bcd, eve, gt, hb, kni, Kr, prd, ttk and Unspecified.]

FlyReg database of Drosophila DNase I footprints

Data imported by UCSC, FlyBase, FlyMine, Ensembl, Transfac, FlyTF, REDfly & ORegAnno

cis-regulatory annotation & systems biology

[Figure: diagram labeled with the genes shn, Abd-A, fkh, ko, Dll, dpp, mus209, tsh, bcd, salm, Antp, dl, Ubx, zen, kni, ftz, eve, hb, tll, Kr, Trl, grh, cad, h, en, gt and ttk.]

A partial timeline of events leading up to the RegCreative Jamboree

mid 2004 - E. Birney starts the “cis-regulation” mailing list

late 2004 - FlyReg database released

early 2005 - Proposal for a mammalian cis-reg database

mid 2005 - Informal meeting at EBI to discuss Ensembl regulatory schema & curation tools

late 2005 - One-day workshop at EBI to discuss Ensembl regulatory schema, curation tools & virtual jamboree

late 2005 - ORegAnno & PAZAR released

early 2006 - Proposal for a cis-regulatory BioCreative text-mining challenge

early 2006 - Discussion about using annotation jamboree to create training datasets for BioCreative challenge

mid 2006 - Funding from ENFIN, BioSapiens, FWO, Genome Canada

mid 2006 - Further development of ORegAnno (e.g. queue)

late 2006 - RegCreative Jamboree!!

Some goals of the RegCreative Jamboree

Improve standards & infrastructure for regulatory curation

Evaluate inter-annotator consistency

Identify opportunities for text-mining assisted regulatory curation

Clarify specific aims for regulatory text-mining challenge

Develop criteria for text-mining challenge data sets

Increase amount of annotated regulatory sequence data
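
One common way to quantify the inter-annotator consistency goal listed above is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The slides do not say which measure the Jamboree used, so the sketch below is only an assumption; the function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Hypothetical decisions on whether four papers contain a TF binding site.
    annotator_1 = ["yes", "yes", "no", "no"]
    annotator_2 = ["yes", "no", "no", "no"]
    print(cohens_kappa(annotator_1, annotator_2))
```

Kappa only applies to categorical decisions on a shared set of items; comparing annotated sequence coordinates would need an overlap-based measure instead.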

A draft BioCreative regulatory challenge agenda

A) Recover text that proves a known TF-target gene interaction: We will provide TF and target gene name pairs, a TF-target gene interaction and the associated publication. Participants will have to provide the part(s) of the document that would (to a human expert) prove the original annotation.

B) Identify evidence supporting a TF-target gene interaction using an evidence code ontology: We will provide TF and target gene name pairs, a TF-target gene interaction and the associated publication. Participants will have to provide the type of experimental evidence that would support the original annotation.

C) Identify TF-target gene interaction(s) from known gene names: We will provide TF and target gene names and the associated publication with an interaction for this gene pair. Participants will have to automatically annotate the TF-target gene interaction according to the information in this paper and provide the part(s) of the document that prove the original annotation.

D) Selection of relevant papers from a list of known TF gene names: We will provide a list of transcription factor names and a (probably large) number of papers, most of which are irrelevant for the protein. Participants will have to detect which papers are relevant for a transcription factor, in the sense that they contain information about TF-target gene interactions.