rna assembly using extending method. wei xueliang 2010-04-07


RNA Assembly Using the Extending Method

Wei Xueliang
2010-04-07

Overview

• Why abandon deBruijn.
• Why abandon Extended deBruijn.
• Introduction to the current method.
• Handle the old problem.
• The new problem.
• Todo

Why abandon deBruijn.

• De Bruijn graph's advantages and disadvantages:
 – Very fast.
 – The coverage distribution and the K value strongly affect the result.

• Key: coverage is not uniformly distributed in RNA assembly.
 – No single best K value.
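
One way to see why no single K works: in a k-mer graph, two reads are linked only through the k-mers they share, and two reads that overlap by o bases share about o - K + 1 of them (barring repeats). The sketch below is illustrative only; the function names and the toy reads are assumptions, not taken from the slides.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(read_a, read_b, k):
    """Count the distinct k-mers that occur in both reads."""
    return len(kmers(read_a, k) & kmers(read_b, k))


# Toy reads (hypothetical): the last 5 bases of read_a equal
# the first 5 bases of read_b ("CAGGC").
read_a = "GATTACAGGC"
read_b = "CAGGCTTAAC"

for k in (4, 5, 6):
    print(k, shared_kmers(read_a, read_b, k))
```

With a 5-base overlap the two toy reads share 2 k-mers at K = 4, 1 at K = 5, and none at K = 6. Sparsely covered transcripts tend to have short overlaps between reads, so a K that suits a highly expressed transcript can disconnect a weakly expressed one.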

Why abandon deBruijn.

• The length of the red part is 27.

deBruijn Graph of K = 28

deBruijn Graph of K = 29

deBruijn Graph of K = 30

Why abandon deBruijn.

• Key: coverage is not uniformly distributed in RNA assembly.
 – No single best K value.

• Can we run the program many times with different K values?

• This is not the de novo assembler's job.
 – Time.
 – Provide highly accurate contigs within a limited time.
 – Scaffolding programs.

Why abandon Extended deBruijn.

• My Extended de Bruijn method:
 – Uses two or more K values at the same time.

Why abandon Extended deBruijn.

• Coverage changes faster than I expected, so many K values are needed.

• Converting between different K values is difficult.

• Memory problem for big K: when K > 32, each K-index needs > 50G (with a 10G data set).

• Throw the K away.

Introduction to the new method

• Based on Pramila’s genome assembly method.
• Start from any tag and do a correction.
• If the correction succeeds, continue.

Introduction to the new method

• Find all tags that overlap by at least 24 bp. (24 is a magic number.)

• Use these overlapping tags to extend the Base, then continue adding more tags.
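
The extension step can be sketched roughly as follows. This is a simplified illustration, not the author's implementation: it assumes exact matches, extends in the forward direction only, and adds one tag per step (the one with the longest overlap). The names `suffix_prefix_overlap` and `extend` are hypothetical.

```python
def suffix_prefix_overlap(base, tag, min_overlap):
    """Length of the longest suffix of base that equals a prefix of tag.

    Returns 0 if no overlap of at least min_overlap bases exists.
    The tag must contribute at least one new base.
    """
    longest = min(len(base), len(tag) - 1)
    for length in range(longest, min_overlap - 1, -1):
        if base.endswith(tag[:length]):
            return length
    return 0


def extend(seed, tags, min_overlap=24):
    """Greedily extend seed with tags overlapping by >= min_overlap bases."""
    base = seed
    unused = list(tags)
    while True:
        best_len, best_tag = 0, None
        for tag in unused:
            length = suffix_prefix_overlap(base, tag, min_overlap)
            if length > best_len:
                best_len, best_tag = length, tag
        if best_tag is None:
            return base                  # nothing overlaps enough: stop
        base += best_tag[best_len:]      # append only the new bases
        unused.remove(best_tag)          # each tag is used at most once


# Toy data with a small threshold so the example stays readable.
print(extend("ACGTAC", ["GTACGG", "ACGGTT"], min_overlap=4))
```

The default threshold of 24 follows the slide; the toy call lowers it to 4 only because the example tags are 6 bases long.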

Introduction to the new method

• How to find the overlapping tags quickly while allowing mismatches?

• Index and Union:
 {Tag3}, {Tag2, Tag3}, {Tag3, Tag4}
 Union => {Tag1, Tag2, Tag3, Tag4}
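
One plausible reading of "Index and Union" is a seed lookup: cut the end of the Base into short pieces, look each piece up in an index of tags, and take the union of the hits, so that a tag with a mismatch in one piece is still found through another. The sketch below assumes that reading; `build_index`, `candidate_tags`, and the seed length are hypothetical, and the slides do not describe the actual index layout.

```python
from collections import defaultdict


def build_index(tags, seed_len):
    """Map every seed_len-long substring to the ids of tags containing it."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for pos in range(len(tag) - seed_len + 1):
            index[tag[pos:pos + seed_len]].add(tag_id)
    return index


def candidate_tags(window, index, seed_len):
    """Union of the tag ids hit by the non-overlapping seeds of window."""
    candidates = set()
    for start in range(0, len(window) - seed_len + 1, seed_len):
        seed = window[start:start + seed_len]
        candidates |= index.get(seed, set())
    return candidates


tags = ["ACGTTGCA", "ACGATGCA", "TTTTTTTT"]
index = build_index(tags, seed_len=4)

# Tag 1 differs from the window by one base, but its second seed
# ("TGCA") still matches exactly, so it appears in the union.
print(candidate_tags("ACGTTGCA", index, seed_len=4))
```

The union only yields candidates; each one would still need to be checked against the Base with the allowed number of mismatches.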

Introduction to the new method

• How to find the next overlapping tags quickly while allowing mismatches?

• V1 <= U3
• V2 <= (U1 << 1) + 0
• V3 <= (U2 << 1) + 0

Handle the old problem.

• What if the overlapping part is shorter than 24 bp?

Handle the old problem.

• Check the tags one by one, in descending order of overlap length.
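
A minimal sketch of that ordering, assuming the overlap is a suffix of the Base matched against a prefix of the tag with a bounded number of mismatches. The function names and the `max_mismatches` and `min_overlap` parameters are assumptions for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_overlap(base, tag, max_mismatches):
    """Longest suffix of base matching a prefix of tag within max_mismatches."""
    for length in range(min(len(base), len(tag) - 1), 0, -1):
        if hamming(base[-length:], tag[:length]) <= max_mismatches:
            return length
    return 0


def rank_by_overlap(base, tags, max_mismatches=1, min_overlap=1):
    """Return (overlap, tag) pairs, longest overlap first."""
    scored = [(best_overlap(base, tag, max_mismatches), tag) for tag in tags]
    kept = [pair for pair in scored if pair[0] >= min_overlap]
    return sorted(kept, key=lambda pair: -pair[0])


base = "ACGTACGG"
tags = ["CGGTTAAC", "TACGGTTA", "TTTTTTTT"]
for overlap, tag in rank_by_overlap(base, tags, max_mismatches=1, min_overlap=3):
    print(overlap, tag)
```

Very short overlaps match almost anything once mismatches are allowed, which is why the sketch keeps a `min_overlap` filter even when falling below the usual threshold.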

Handle the old problem.

Overlap (bp)   A: Count   A: %       G: Count   G: %
60             1          6.67%      1          4.76%
52             3          20.00%     1          4.76%
44             6          40.00%     2          9.52%
36             10         66.67%     10         47.62%
30             11         73.33%     16         76.19%
24             15         100.00%    21         100.00%

Handle the old problem.

Overlap (bp)   A: Count   A: %       G (High Exp): Count   G (High Exp): %
56             1          6.67%      5                     2.50%
50             3          20.00%     10                    5.00%
44             6          40.00%     20                    10.00%
36             10         66.67%     120                   60.00%
30             11         73.33%     150                   75.00%
24             15         100.00%    200                   100.00%
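
The percentage columns in the tables above appear to be cumulative counts divided by the count at the 24 bp row, which makes candidates with very different tag counts comparable. The short check below reproduces that arithmetic; the function name is hypothetical and the count lists are copied from the tables.

```python
def cumulative_percent(cumulative_counts):
    """Express each cumulative count as a percentage of the final count."""
    total = cumulative_counts[-1]
    return [round(100.0 * count / total, 2) for count in cumulative_counts]


a_counts = [1, 3, 6, 10, 11, 15]             # base A, both tables
g_counts = [1, 1, 2, 10, 16, 21]             # base G, first table
g_high_counts = [5, 10, 20, 120, 150, 200]   # base G (High Exp), second table

print(cumulative_percent(a_counts))
print(cumulative_percent(g_counts))
print(cumulative_percent(g_high_counts))
```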

Handle the old problem.

• Degree of approximation.

Handle the old problem.

• Fewer tips.

• No bubbles.
 – Because we overlap with mismatches.
 – Use whole tags.

The new problem.

• Speed.

• The tail of a tag often has more errors.
 – Reverse Extending Problem.

Todo

• Handle the Reverse Extending Problem.

• Speed.

• Finish the comparison between the de Bruijn method (Velvet) and my method.

• Paired End Tag.

• Thank you very much for your attention.
