May 2012 – PLUME – Erika.Sallet@toulouse.inra.fr. Integrated gene prediction for eukaryotic and prokaryotic genome using EuGène. Erika Sallet, Jérôme Gouzy, Brice Roux, Delphine Capela, Laurent Sauviac, Claude Bruand, Pascal Gamas and Thomas Schiex. Journée PLUME Biologie.


Page 1: Integrated gene prediction for eukaryotic and prokaryotic ... · May 2012 – PLUME – Erika.Sallet@toulouse.inra.fr Structural annotation of prokaryotic genomes Sequencing technology

May 2012 – PLUME – Erika.Sallet@toulouse.inra.fr

Integrated gene prediction for eukaryotic and prokaryotic genome using EuGène

Erika Sallet, Jérôme Gouzy, Brice Roux, Delphine Capela, Laurent Sauviac, Claude Bruand, Pascal Gamas and Thomas Schiex

Journée PLUME Biologie


Terminology: eukaryotic gene structure


Structural annotation of prokaryotic genomes

► Sequencing technology now gives access to the prokaryotic transcriptome

► Prokaryotic genome annotation needs to be reconsidered to take such data into account

► Such data are already used in eukaryotic gene prediction


RNA-seq

► High gene density in prokaryotic genomes

RNA-seq reads not so informative?

Apollo view (Lewis SE et al Genome Biology 2002)


Oriented RNA-seq

Informative signal to detect transcription
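To illustrate why orientation helps in a gene-dense genome, here is a minimal sketch that computes read depth separately for each strand. The `Read` tuple, the 0-based half-open coordinates and the function name are assumptions made for this example; they are not part of EuGène.

```python
from collections import namedtuple

# Hypothetical read record: 0-based start, exclusive end, strand '+' or '-'.
Read = namedtuple("Read", ["start", "end", "strand"])


def stranded_coverage(reads, genome_length):
    """Return per-base read depth separately for the '+' and '-' strands."""
    coverage = {"+": [0] * genome_length, "-": [0] * genome_length}
    for read in reads:
        track = coverage[read.strand]
        # Clip the read to the genome boundaries before counting.
        for pos in range(max(0, read.start), min(genome_length, read.end)):
            track[pos] += 1
    return coverage


# Two overlapping reads on '+' and one read on '-' that overlaps them.
reads = [Read(0, 4, "+"), Read(2, 6, "+"), Read(5, 9, "-")]
cov = stranded_coverage(reads, 10)
unstranded = [p + m for p, m in zip(cov["+"], cov["-"])]
```

With unoriented reads only the summed `unstranded` track would be available, so the transcript on one strand could not be told apart from the overlapping one on the other strand.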


EuGène http://eugene.toulouse.inra.fr

► Integrative modular eukaryotic gene finder

► Integrates different kinds of evidence, such as:

Statistical properties of regions

Similarity to known proteins

ESTs (Expressed Sequence Tags)

► Extensible and generic software

►Artistic license

► Forge https://mulcyber.toulouse.inra.fr/projects/eugene


►Contributors

Created by Thomas Schiex (INRA, Applied Mathematics and Informatics) in 1999

8 different developers

Current EuGène team = 3 members

►Users

Intended for bioinformaticians

We are the main users of EuGène

► Genome annotation of M. truncatula, S. lycopersicum, A. thaliana, fungi (Botrytis cinerea, …), M. incognita, …


Eukaryotic gene

►= A sequence of “regions” separated by “sites”


EuGène eukaryotic automaton


How does EuGène work?

CATGACGTAGGCCTACGTGCACATGCAAGCTAC CCTACGTGCACATGCAAGCTAC

► Transforms a nucleic sequence into a graph

► Edits/weights it according to biological & statistical evidence

► A gene structure = a path in the graph

► The prediction = an optimal path in the graph
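The "prediction = optimal path" idea can be sketched with a small dynamic program. This is a toy illustration under stated assumptions, not EuGène's actual graph or scoring: there are only two hypothetical states ("intergenic" and "coding"), each position carries an evidence score per state, and every change of state costs a fixed `switch_penalty`.

```python
def best_path(scores, switch_penalty):
    """Viterbi-style search: choose one state per position so that the sum of
    per-position scores, minus a penalty for each state switch, is maximal."""
    states = list(scores[0].keys())
    best = {s: scores[0][s] for s in states}  # best score of a path ending in s
    back = []                                 # back-pointers, one dict per step
    for column in scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Staying in the same state is free, switching pays the penalty.
            prev = max(states,
                       key=lambda p: best[p] - (0 if p == s else switch_penalty))
            cost = 0 if prev == s else switch_penalty
            new_best[s] = best[prev] - cost + column[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace the optimal path back from the best final state.
    state = max(states, key=lambda s: best[s])
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total


# Hypothetical evidence: positions 2-3 look coding, the rest intergenic.
scores = [
    {"intergenic": 1, "coding": 0},
    {"intergenic": 1, "coding": 0},
    {"intergenic": 0, "coding": 2},
    {"intergenic": 0, "coding": 2},
    {"intergenic": 2, "coding": 0},
]
path, total = best_path(scores, switch_penalty=1)
```

Each state sequence is one path through the graph. Raising `switch_penalty` makes short features more expensive, which is how weighting the graph changes which path is optimal.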


From eukaryote to prokaryote


EuGène prokaryote automaton

► Prokaryotic genomes:

Protein genes on both strands (possibly overlapping)

Transcribed regions (operon)

RNA genes (possibly antisense)


Integrating evidence

► Translation starts

Ribosome hybridization energy model (1)

►Transcription

Based on the expression level of RNAseq data

► Transcription start

Simple signal processing of RNAseq data

► Segmentation of transcription profiles

G. Rigaill, S. Robin. An exact Algorithm for the Segmentation of NGS Profiles Using Compression

(1) T. Schiex, J. Gouzy, A. Moisan, Y. de Oliveira. FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences. Nucl. Acids Res. 2003, 31: 3738-3741.
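The slides do not say which signal processing is applied to find transcription starts. As one possible reading, here is a minimal sketch that flags positions where the depth on one strand rises sharply. The window size, the `min_jump` threshold and the function name are assumptions made for this example. It is neither EuGène's method nor the segmentation algorithm cited above.

```python
def candidate_starts(coverage, window=3, min_jump=5.0):
    """Flag positions where the mean depth of the `window` bases starting at
    the position exceeds the mean depth of the `window` bases before it by at
    least `min_jump`; keep only the strongest position of each run."""
    starts, run = [], []
    for pos in range(window, len(coverage) - window + 1):
        before = sum(coverage[pos - window:pos]) / window
        after = sum(coverage[pos:pos + window]) / window
        jump = after - before
        if jump >= min_jump:
            run.append((jump, pos))
        elif run:
            # The run of significant jumps ended: report its strongest position.
            starts.append(max(run, key=lambda item: item[0])[1])
            run = []
    if run:
        starts.append(max(run, key=lambda item: item[0])[1])
    return starts


# Hypothetical single-strand depth profile with one transcribed block.
profile = [0, 0, 0, 0, 10, 10, 10, 10, 10, 0, 0, 0]
starts = candidate_starts(profile)
```

On this toy profile the only candidate is position 4, the first covered base. Real profiles are noisy, which is why a dedicated segmentation of the transcription profile is listed as a separate source of evidence.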


Application to Sinorhizobium meliloti Sm2011

► Sinorhizobium meliloti

Nitrogen-fixing bacterium

Symbiosis with legumes (e.g. M. truncatula)

6.69 Mb (62% GC)

► EuGène "recipe" to annotate the genome

19 oriented RNA-seq libraries (48 Gb)

►Paired-end and single-end Solexa reads

►6 triplicates

►3 biological conditions

Results of BlastX vs SwissProt and Sm1021

RNAmmer and tRNAscanSE results

IMM (Interpolated Markov Model)


Application to Sinorhizobium meliloti Sm2011

[Figure: genome browser view. Tracks: Prediction strand +, Lib 2 long inserts, Lib 1 short inserts, Lib 2 short inserts, Lib 1 long inserts, Prediction strand -. Legend: Operon, CDS, UTRs, ncRNA]


Overall

► Results of the EuGène prediction

6,367 protein-coding genes

►135 new (compared to Sm1021)

►Covering 87.35% of the genome

1,969 ncRNA genes (54 tRNA, 9 rRNA)

►Covering 2.98% of the genome

►Published ncRNA (Valverde 2008): 76/140 found

►Antisense ncRNA candidate

88.8% of the genome is annotated (86.5% previously)
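The percentages on this slide can be converted to megabases, assuming the 6.69 Mb genome size given for S. meliloti earlier in the talk and assuming that "covering" means a fraction of the genome length. A minimal sketch (variable names are illustrative):

```python
GENOME_MB = 6.69  # genome size stated on the S. meliloti slide


def percent_to_mb(percent, genome_mb=GENOME_MB):
    """Convert a percentage of the genome into megabases."""
    return genome_mb * percent / 100.0


coding_mb = percent_to_mb(87.35)      # span of protein-coding genes
ncrna_mb = percent_to_mb(2.98)        # span of ncRNA genes
gain_points = 88.8 - 86.5             # gain over the previous annotation
gain_mb = percent_to_mb(gain_points)  # the same gain expressed in Mb
recovered = 76 / 140                  # fraction of published ncRNAs found
```

Under these assumptions the coding genes span roughly 5.84 Mb and the ncRNA genes about 0.20 Mb. The gain over the previous annotation is 2.3 percentage points, about 0.15 Mb. The 76/140 figure corresponds to about 54% of the published ncRNAs.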


Many thanks

► ANR Symbimics project participants:

Pascal Gamas (coordinator)

Brice Roux, Claude Bruand, Delphine Capela, Laurent Sauviac, Nathalie Rhodde

►“EuGène team” members

Thomas Schiex and Jérôme Gouzy

http://eugene.toulouse.inra.fr
