Rick Westerman, Purdue Genomics, westerman@purdue
blastx
Nucleotide to protein database
De novo transcriptome / RNA-seq
30K – 150K sequences
300 – 5000 bases
~100 MB input file
E-value is 10^-6
Up to 10M hits to 'nr'
~5000 CPU-hours
Current method – RCAC clusters
1) Break up input into many ~200 KB files – about 500 of them.
2) Grab up to 250 8-CPU 'standby' nodes on RCAC clusters; 4-hour maximum
Note: uses our own queuing method ("chaining")
3) Failures are manually caught and re-done.
4) Do above for each sample (experiment)
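
Step 1 of the method above (breaking the input into ~200 KB files) can be sketched as follows. This is only an illustration, not the script used for the talk: the function names, the output file prefix, and the choice to keep each sequence whole inside a single chunk are all assumptions.

```python
def read_records(lines):
    """Yield each FASTA record as one string (header line plus its sequence lines)."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def chunk_records(records, max_bytes):
    """Group records into chunks of at most max_bytes.

    A single record larger than max_bytes still gets a chunk of its own,
    so sequences are never cut in the middle.
    """
    chunk, size = [], 0
    for rec in records:
        rec_size = len(rec.encode())
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(rec)
        size += rec_size
    if chunk:
        yield chunk


def split_fasta(path, max_bytes=200_000, prefix="chunk"):
    """Write chunk_0000.fa, chunk_0001.fa, ... (hypothetical naming) and return the count."""
    count = 0
    with open(path) as handle:
        for count, chunk in enumerate(chunk_records(read_records(handle), max_bytes), 1):
            with open(f"{prefix}_{count - 1:04d}.fa", "w") as out:
                out.writelines(chunk)
    return count
```

With `max_bytes=200_000`, a ~100 MB input yields on the order of 500 files, which matches the figure in step 1.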
Condor method
1) Break up input into many ~40 KB files.
2) Toss all files onto Condor. BLAST is set up to use 8 CPUs. Only current restriction: 1 GB memory.
3) Condor retries up to 5 times. After that, failures are manually caught and re-done.
4) Do above for each sample (experiment)
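
The slides do not show the submit file that was used, so the following is only a sketch of how steps 2 and 3 might be expressed. It assumes BLAST+ `blastx`, an HTCondor release that supports `max_retries` and `queue ... matching files`, chunk files named `chunk_*.fa` in the submit directory, and an 'nr' database that the execute nodes can already reach. The executable path and all file names are hypothetical.

```python
def blastx_submit_description(chunk_glob="chunk_*.fa", cpus=8, memory_mb=1024, retries=5):
    """Return HTCondor submit-description text with one blastx job per chunk file."""
    blast_args = (
        "-query $(chunk) -db nr -evalue 1e-6 "
        f"-num_threads {cpus} -out $(chunk).blastx.out"
    )
    lines = [
        "universe = vanilla",
        "executable = /path/to/blastx",      # hypothetical location of the BLAST+ binary
        f"arguments = {blast_args}",
        f"request_cpus = {cpus}",            # BLAST is set up to use 8 CPUs
        f"request_memory = {memory_mb}",     # MB; the 1 GB restriction from step 2
        f"max_retries = {retries}",          # Condor retries up to 5 times (step 3)
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files = $(chunk)",
        "output = $(chunk).stdout",
        "error = $(chunk).stderr",
        "log = blastx.log",
        f"queue chunk matching files {chunk_glob}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(blastx_submit_description(), end="")
```

Jobs that still fail after the retries are exhausted would have to be picked out of `blastx.log` and resubmitted by hand, as step 3 describes.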
Accuracy Reliability Speed
Use cases
Sample  Sequences  Bases   N50 length
#1      43 K       16 M    439
#2      40 K       15 M    412
#3      105 K      49 M    479
#4      145 K      72 M    509
#5      85 K       36 M    454
#6      76 K       127 M   4275
Legend: plant, insect
Sample  Sequences  Bases   N50 length  Cluster wall time  Condor wall time
#1      43 K       16 M    439         5:20               2:40
#2      40 K       15 M    412         5:30               2:30
#3      105 K      49 M    479         7:15               3:30
#4      145 K      72 M    509         8:20               3:20
#5      85 K       36 M    454         4:40               failed
#6      76 K       127 M   4275        4:50               95% failed
Legend: weekend
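
For the four samples that completed on both systems, the wall times in the table can be turned into a cluster-to-Condor ratio. The short sketch below assumes the times are in hours:minutes, which the slides do not state.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' wall time (assumed format) to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


# (cluster wall time, Condor wall time) for the samples that finished on both
wall_times = {
    "#1": ("5:20", "2:40"),
    "#2": ("5:30", "2:30"),
    "#3": ("7:15", "3:30"),
    "#4": ("8:20", "3:20"),
}

speedups = {
    sample: to_minutes(cluster) / to_minutes(condor)
    for sample, (cluster, condor) in wall_times.items()
}

for sample, ratio in speedups.items():
    print(f"{sample}: cluster/Condor = {ratio:.2f}x")
```

Under that assumption Condor finished these four samples roughly 2.0 to 2.5 times faster; samples #5 and #6 are left out because their Condor runs failed.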
Case #6 failure reasons
1650 jobs …which started up 11,500+ times ...
5919 Abnormal termination (signal 1)
3667 Normal termination (return value 129)
2034 Job was evicted.
85 Abnormal termination (signal 9)
74 Normal termination (return value 0)
1 Normal termination (return value 1)
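
The counts above can be totalled and expressed as shares of all job starts. One possible reading, not confirmed by the slides: by common shell convention an exit status of 128+N reports death by signal N, so "return value 129" may be the same signal 1 (SIGHUP) seen through a wrapper script. The variable names below are illustrative only.

```python
# Termination counts for case #6, copied from the slide
outcomes = {
    "Abnormal termination (signal 1)": 5919,
    "Normal termination (return value 129)": 3667,
    "Job was evicted": 2034,
    "Abnormal termination (signal 9)": 85,
    "Normal termination (return value 0)": 74,
    "Normal termination (return value 1)": 1,
}

jobs = 1650
total_starts = sum(outcomes.values())
starts_per_job = total_starts / jobs
clean_exit_share = outcomes["Normal termination (return value 0)"] / total_starts

for reason, count in outcomes.items():
    print(f"{count:5d}  {100 * count / total_starts:5.1f}%  {reason}")
print(f"total starts: {total_starts}, about {starts_per_job:.1f} per job")
```

The total comes to 11,780 starts, consistent with the "11,500+" on the slide, or about 7 starts per job; fewer than 1% of the starts ended with return value 0.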