Next Generation Sequences & Chloroplast Assembly: Chloroplast Genome Assembly Using NGS Data

Post on 07-Jul-2020

Page 1: Next Generation Sequences Chloroplast Assembly NGS …amborella.net/2014-Bioinformatics/Week15-1-NGS-assembly.pdf · 2014-12-11 · De Brujin Graph Algorithm For Alignment (1) ‐This

Next Generation Sequences & Chloroplast Assembly

Chloroplast genome assembly using NGS data

Page 2

Next Generation Sequencing (NGS) Technologies

- Several kinds of NGS platforms have been developed; the one most widely used at present is the Illumina/Solexa platform.

SOLiD; ABI

GS-Titanium; Roche 454

SMRT; Pacific Biosciences

Helicos; Helicos BioSciences

Illumina; Solexa

Page 3

NGS: 454 Technology

Page 4

Microtiter plate  X80

Page 5

NGS: Solexa Technology   

Page 6

NGS: Solexa Technology 

Page 7

NGS: Solexa Technology 

Page 8

NGS: Solexa Technology 

Page 9

Capacities of Next Generation Sequencers

ABI 3730; ABI: 384 x 700 bp = 268,800 bp = 269 Kb (per one reaction / 1 hr)

GS-Titanium; Roche 454: 950,000 x 450 bp = 427,500,000 bp = 427.5 Mb (per one reaction / 2-3 days)

Solexa GA2; Illumina: 35,000,000 x 7 x (151 x 2) bp = 73,990,000,000 bp = 74.0 Gb (per one reaction / 12 days)

SOLiD 4; ABI: 1,400,000,000 x 75 bp (50+25) = 105,000,000,000 bp = 105.0 Gb (per one reaction / 11 days)

HiSeq2000; Illumina: 40 Gb x 7 = 280 Gb (per one reaction / 8 days)
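
The per-run capacities above are products of read count, read length and, where given, further multipliers. A minimal sketch of that arithmetic for two of the rows follows. The function name and parameter names are hypothetical, and reading the factor 7 as a lane count and "x 2" as two ends per fragment is an assumption, not something the slide states.

```python
def run_yield_bp(reads, read_length_bp, lanes=1, ends=1):
    """Total bases from one run: reads x lanes x read length x ends."""
    return reads * lanes * read_length_bp * ends


# ABI 3730 row: 384 reads of 700 bp each.
abi_3730 = run_yield_bp(reads=384, read_length_bp=700)

# Solexa GA2 row: 35,000,000 x 7 x (151 x 2).
# Interpreting 7 as lanes and 2 as paired ends is an assumption.
solexa_ga2 = run_yield_bp(reads=35_000_000, read_length_bp=151, lanes=7, ends=2)

print(f"ABI 3730:   {abi_3730:,} bp")
print(f"Solexa GA2: {solexa_ga2:,} bp")
```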

Page 10

Genome Assembly Processes With NGS Sequences

Page 11

The genome assembly process

Page 12

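Data produced by an Illumina NGS machine: "paired-end reads". A paired-end read has about 100 bp of known sequence at each end and about 300 bp of unknown sequence in the middle.

Read: a single-stranded nucleotide sequence determined by one sequencing reaction.

[100 bp known] - [about 300 bp unknown] - [100 bp known]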

Page 13

How to Find Overlapping Sequences?

- Using dynamic programming, we can write a program that finds similar sequences.

- The complexity of this algorithm is O(n^3).

http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html

We have already learned how to align two sequences.
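
As a reminder of what dynamic-programming alignment looks like, here is a minimal sketch of a global alignment score for two sequences. It is not taken from the slides or from the linked page: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. This simplified variant uses a fixed per-base gap penalty and fills a table of len(a) x len(b) cells, so its running time is proportional to the product of the two lengths; more general gap-penalty schemes cost more.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (scoring values are assumptions)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal,           # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[len(b)]


print(alignment_score("ACGT", "ACGT"))
print(alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept in memory at a time, which is enough when the score alone is needed; recovering the alignment itself would require keeping the whole table.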

Page 14

Well-Known Bioinformatics Tools for Alignment

- Global alignment: ClustalW, T-Coffee, and MUSCLE

- Local alignment: FASTA, and BLAST (Basic Local Alignment Search Tool; provided by NCBI)

- Pairwise alignment: BLAST and FASTA

- Multiple sequence alignment: ClustalW, T-Coffee, MUSCLE, etc.

Well-known general-purpose sequence alignment programs

Page 15

Example of Genome Assembly: Vitis vinifera

Pairwise comparison of 6,200,000 reads

C(6,200,000, 2) = 6,200,000 x 6,199,999 / 2 = 19,219,996,900,000 comparisons

However, NGS data contain far too many reads for this approach.
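
The number of pairwise comparisons can be checked with a short sketch. The function names are hypothetical, and the rate of one million comparisons per second is an assumed figure used only to give a feel for the scale, not a measured value.

```python
import math


def pairwise_comparisons(n_reads):
    """Number of unordered read pairs: n choose 2."""
    return math.comb(n_reads, 2)


def days_needed(n_comparisons, comparisons_per_second):
    """Wall-clock days at a given (assumed) comparison rate."""
    return n_comparisons / comparisons_per_second / 86_400


n = pairwise_comparisons(6_200_000)
print(f"{n:,} comparisons")
# 1,000,000 comparisons per second is an assumption for illustration only.
print(f"about {days_needed(n, 1_000_000):.0f} days at 1,000,000 comparisons per second")
```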

Page 16

De Bruijn Graph: Alternative Method For Alignment

Mathematicians developed a new method for aligning NGS data.

Page 17

De Bruijn Graph Algorithm For Alignment (1)

- This algorithm is used to find overlapping short-read sequences quickly.

- The algorithm consists of three parts:

i) Generating k-mer sequences

ii) Constructing the de Bruijn graph

iii) Resolving the graph to generate sequences

The theory is very hard to understand, but the idea is to build a frame of sequences K bases long and pour the reads into it. Assembly based on de Bruijn graph theory therefore proceeds by 1) running the assembly with several values of K and 2) choosing the best of the results.
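
Step i), generating k-mers, can be sketched in a few lines. The function name `kmers` is hypothetical; the two example strings are the portions of the reads shown on the following slides, and the list of K values in the loop is an arbitrary choice for illustration.

```python
def kmers(read, k):
    """All overlapping substrings of length k, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read_1 = "GCAAAACACTTA"  # the part of read 1 shown on the slide
read_2 = "ACACTTATTCGT"

print(kmers(read_1, 3))
print(kmers(read_2, 3))

# A read of length L yields L - k + 1 k-mers, so a larger K gives fewer, longer k-mers.
for k in (3, 5, 7):
    print(k, len(kmers(read_2, k)))
```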

Page 18

De Bruijn Graph Algorithm For Alignment (2)

K = 3

Read 1: GCAAAACACTTA…

3-mers of read 1 (nodes 1-1 to 1-10): GCA CAA AAA AAA AAC ACA CAC ACT CTT TTA

Page 19

De Bruijn Graph Algorithm For Alignment (3)

K = 3

Read 1: GCAAAACACTTA…

Read 2: ACACTTATTCGT

3-mers of read 2 (nodes 2-1 to 2-10): ACA CAC ACT CTT TTA TAT ATT TTC TCG CGT

3-mers shared by both reads (1-6 to 1-10 = 2-1 to 2-5): ACA CAC ACT CTT TTA

Page 20

De Bruijn Graph Algorithm For Alignment (4)

Path through the graph: nodes 1-1 to 1-10, then nodes 2-6 to 2-10 (TAT ATT TTC TCG CGT)

Assembled sequence: GCAAAACACTTATTCGT
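
Steps ii) and iii) can be sketched on the same two reads. This is an illustrative toy, not how Velvet or any other assembler is implemented: the function names are hypothetical, each distinct k-mer becomes one edge between its (k-1)-mer prefix and suffix, and the sequence is read off an Eulerian path found with Hierholzer's algorithm. The sketch uses K = 4 rather than the slide's K = 3, which is an assumption made so that the toy graph has a single path; with K = 3 these reads contain 2-mers such as CA, AC and TT more than once, so more than one path through the graph is possible. Only the part of read 1 shown on the slide is used.

```python
from collections import defaultdict


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Edges prefix -> suffix, one per distinct k-mer found in the reads."""
    graph = defaultdict(list)
    for kmer in sorted({km for read in reads for km in kmers(read, k)}):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            indegree[node] += 1
    nodes = sorted(set(graph) | set(indegree))
    # Start where out-degree exceeds in-degree by one, else at any node with edges.
    start = next((n for n in nodes if len(graph.get(n, [])) - indegree[n] == 1),
                 min(graph))
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["GCAAAACACTTA", "ACACTTATTCGT"]
print(assemble(reads, k=4))
```

Trying the same reads with other values of K shows why the slides recommend running the assembly with several K values and keeping the best result.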

Page 21

Genome Assemblers For NGS Sequences (1)

Page 22

Velvet Assembler (1)

velveth

velvetg

~exeman/velvet_1.1.03/velveth k31 31 -shortPaired -fastq 90000.0-1.fastq 90000.0-2.fastq

~exeman/velvet_1.1.03/velvetg k31 -ins_length 500 -cov_cutoff auto -exp_cov auto
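
Because the assembly is meant to be repeated with several K values, the two commands above can be generated in a loop. The sketch below only builds and prints the command lines; it does not run Velvet. The function name is hypothetical, the K values 21, 31 and 41 are arbitrary examples, and it assumes that `velveth` and `velvetg` are on the PATH rather than under the directory used on the slide. The flags are copied from the slide's own invocation.

```python
import shlex


def velvet_commands(k, reads_1, reads_2, ins_length=500):
    """Return the velveth and velvetg command lines for one K value."""
    out_dir = f"k{k}"  # one output directory per K, as in the slide's "k31"
    velveth = ["velveth", out_dir, str(k),
               "-shortPaired", "-fastq", reads_1, reads_2]
    velvetg = ["velvetg", out_dir,
               "-ins_length", str(ins_length),
               "-cov_cutoff", "auto", "-exp_cov", "auto"]
    return [velveth, velvetg]


# Example K values are assumptions; choose the range that suits the read length.
for k in (21, 31, 41):
    for command in velvet_commands(k, "90000.0-1.fastq", "90000.0-2.fastq"):
        print(shlex.join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` to execute it once Velvet is installed.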

Page 23

Velvet Assembler (2)

Page 24

Velvet Assembler (3)

Statistics of assembled sequences

Page 25

N50

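• Whether an assembly is good or poor at the genome level can be judged from its N50 value.

• Sort all contigs by size.

• Add up the sizes starting from the largest contig; the size of the contig at which the running total first exceeds 50% of the total size (here, the sum of all contig lengths) is called the N50.

• N50 is a different kind of statistic from the mean or the median!

• Example: contigs with lengths 3 3 4 6 7 8 8 9 9 9 10 11 13 25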

Mean = 125/14 = 8.93 (sum = 125)

Median = (8+9)/2 = 8.5

125/2 = 62.5: add the contig lengths from the largest to the smallest until the sum reaches 62.5

25+13+11+10=59

25+13+11+10+9=68    

Therefore, N50 = 9
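
The same procedure can be written as a short function. This is a minimal sketch with a hypothetical function name; it treats the total as the sum of the contig lengths and stops at the first contig where the running sum reaches half of that total.

```python
def n50(contig_lengths):
    """Length of the contig at which the running sum, largest first,
    reaches half of the total contig length."""
    half = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


contigs = [3, 3, 4, 6, 7, 8, 8, 9, 9, 9, 10, 11, 13, 25]
print(n50(contigs))
```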

Page 26

Example of an N50 calculation

Contig no.   Contig length   Cumulative sum
1            33407           33407
7            15243           48650
6            14172           62822    (first to exceed 52860.5, so N50 = 14172)
8            9250            72072
2            8275            80347
10           5714            86061
3            5406            91467
13           5227            96694
4            3683            100377
11           2849            103226
9            1251            104477
14           663             105140
5            442             105582
12           136             105718
16           2               105720
15           1               105721

Total        105721

Total / 2    52860.5