Rick Westerman Purdue Genomics westerman@purdue



Post on 12-Jan-2016


TRANSCRIPT

Page 1: Rick Westerman Purdue Genomics westerman@purdue

Rick Westerman

Purdue Genomics


Page 2: Rick Westerman Purdue Genomics westerman@purdue

blastx

Nucleotide to protein database

De novo Transcriptome / RNAseq

30K – 150K sequences

300 – 5000 bases

~100 MB input file

E-value is 10^-6

Up to 10M hits to 'nr'

~5000 CPU-hours

Page 3: Rick Westerman Purdue Genomics westerman@purdue

Current method – RCAC clusters

1) Break up input into many ~200 KB files – about 500 of them.

2) Grab up to 250 8-cpu 'standby' nodes on RCAC clusters; 4 hour maximum

Note: use own queuing method (“chaining”)

3) Failures are manually caught and re-done.

4) Do above for each sample (experiment)
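Step 1 above can be sketched in a few lines of Python. This is only an illustration, not the script used for the talk: the function name split_fasta, the chunk file names, and the 200,000-character default are assumptions. It cuts only at record boundaries, so no sequence is split between two files.

```python
from pathlib import Path


def split_fasta(src, out_dir, max_bytes=200_000):
    """Split a FASTA file into chunks of roughly max_bytes each.

    A new chunk is started only at a '>' header line, so records stay whole.
    Sizes are counted in characters, which equals bytes for plain ASCII FASTA.
    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    written = 0
    with open(src) as handle:
        for line in handle:
            # Roll over to a new chunk at a record boundary once the
            # current chunk has reached the size target.
            if line.startswith(">") and (out is None or written >= max_bytes):
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{len(chunks):04d}.fa"  # hypothetical naming
                chunks.append(path)
                out = open(path, "w")
                written = 0
            if out is None:
                continue  # ignore any text before the first header
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return chunks
```

With a ~100 MB input and the default size this gives on the order of 500 files, matching the numbers on the slide; passing max_bytes=40_000 would give the smaller pieces used for Condor.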

Page 4: Rick Westerman Purdue Genomics westerman@purdue

Condor method

1) Break up input into many ~40 KB files.

2) Toss all files onto Condor. BLAST is set up to use 8 CPUs. Only current restriction: 1 GB memory.

3) Condor retries up to 5 times. After that, failures are manually caught and re-done.

4) Do above for each sample (experiment)
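The talk does not show the submit file, so the following is a guess at what steps 2 and 3 might look like with a reasonably recent HTCondor and BLAST+. The helper name condor_submit_text, the log and output file names, and the choice of max_retries for the retry behaviour are all assumptions; the 8 CPUs, 1 GB memory, 5 retries, 'nr' database and E-value come from the slides.

```python
def condor_submit_text(chunks, db="nr", evalue="1e-6", cpus=8,
                       memory_mb=1024, retries=5):
    """Build an HTCondor submit description with one blastx job per chunk.

    Assumes each entry in chunks is a bare file name in the submit directory,
    and that the blastx binary and the database are reachable by the job.
    """
    lines = [
        "universe = vanilla",
        "executable = blastx",  # hypothetical path to the BLAST+ binary
        f"arguments = -query $(chunk) -db {db} -evalue {evalue}"
        f" -num_threads {cpus} -out $(chunk).blastx",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files = $(chunk)",
        f"request_cpus = {cpus}",
        f"request_memory = {memory_mb}",  # plain number is read as MiB
        f"max_retries = {retries}",       # rerun jobs that exit non-zero
        "log = blastx.log",
        "output = $(chunk).stdout",
        "error = $(chunk).stderr",
        "queue chunk from (",
    ]
    lines += [f"  {name}" for name in chunks]
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing it to condor_submit would queue every chunk in one cluster. Jobs that still fail after the retries would have to be found and resubmitted by hand, as on the slide.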

Page 5: Rick Westerman Purdue Genomics westerman@purdue

Accuracy Reliability Speed

Use cases

Sample   Sequences   Bases   N50 length
#1       43 K        16 M    439
#2       40 K        15 M    412
#3       105 K       49 M    479
#4       145 K       72 M    509
#5       85 K        36 M    454
#6       76 K        127 M   4275

– plant --insect

Page 6: Rick Westerman Purdue Genomics westerman@purdue

Sample   Sequences   Bases   N50 length   Cluster wall time   Condor wall time
#1       43 K        16 M    439          5:20                2:40
#2       40 K        15 M    412          5:30                2:30
#3       105 K       49 M    479          7:15                3:30
#4       145 K       72 M    509          8:20                3:20
#5       85 K        36 M    454          4:40                failed
#6       76 K        127 M   4275         4:50                95% failed

– weekend --
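To put the wall times above side by side, the small sketch below computes the cluster-to-Condor ratio for the four samples that finished on Condor. It assumes the times are written as hours:minutes, which the slide does not state; the names to_minutes and speedups are made up for the example.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' string to minutes (assumes hours:minutes)."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


# (cluster wall time, Condor wall time) for the samples that completed.
WALL_TIMES = {
    "#1": ("5:20", "2:40"),
    "#2": ("5:30", "2:30"),
    "#3": ("7:15", "3:30"),
    "#4": ("8:20", "3:20"),
}


def speedups(times):
    """Return cluster time divided by Condor time for each sample."""
    return {
        sample: round(to_minutes(cluster) / to_minutes(condor), 2)
        for sample, (cluster, condor) in times.items()
    }


if __name__ == "__main__":
    for sample, ratio in speedups(WALL_TIMES).items():
        print(sample, f"{ratio}x")
```

Under that assumption the completed Condor runs come out between 2.0 and 2.5 times faster than the cluster runs. Samples #5 and #6 are left out because they have no usable Condor time.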

Page 7: Rick Westerman Purdue Genomics westerman@purdue

Case #6 failure reasons

1650 jobs …which started up 11,500+ times ...

5919 Abnormal termination (signal 1)

3667 Normal termination (return value 129)

2034 Job was evicted.

85 Abnormal termination (signal 9)

74 Normal termination (return value 0)

1 Normal termination (return value 1)
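The counts above can be checked with a little arithmetic. The sketch below only re-adds the numbers from the slide; the names EVENTS, JOBS and summarize are made up for the example, and it assumes each listed event corresponds to one job start.

```python
# Event counts copied from the slide for case #6.
EVENTS = {
    "Abnormal termination (signal 1)": 5919,
    "Normal termination (return value 129)": 3667,
    "Job was evicted.": 2034,
    "Abnormal termination (signal 9)": 85,
    "Normal termination (return value 0)": 74,
    "Normal termination (return value 1)": 1,
}
JOBS = 1650


def summarize(events, jobs):
    """Return total events, mean events per job, and each event's share."""
    total = sum(events.values())
    shares = {name: round(100 * count / total, 1)
              for name, count in events.items()}
    return total, round(total / jobs, 1), shares


if __name__ == "__main__":
    total, per_job, shares = summarize(EVENTS, JOBS)
    print(f"{total} events, about {per_job} per job")
    for name, pct in shares.items():
        print(f"{pct:5.1f}%  {name}")
```

The events add up to 11,780, in line with the "11,500+" starts, or about 7 starts per job. Signal 1 is SIGHUP, and a return value of 129 is the usual shell encoding of 128 + 1 for the same signal, so these two lines may well be the same kind of failure; together they make up roughly 81% of the events. The 74 terminations with return value 0 are about 4.5% of the 1650 jobs, which would fit the "95% failed" entry for sample #6.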
