Greg Thain
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor/mw
Master-Worker Tutorial
Condor Week 2006
Agenda
› What is M-W
› When to use M-W
› How to build a simple M-W application
› Q & A
Why M-W?
› M-W addresses a weakness in Condor: short jobs, whose run time is small compared with the overhead of scheduling a Condor job
› It is also suited to dynamic, parallel workflows
A Condor Job…
An easy solution:
› Why not just wrap up many smaller jobs into one bigger Condor job? Because that leaves open:
  • Partial failures
  • Load balancing
  • Dynamic creation of work
Solution: Lightweight Tasks Multiplexed on Top of Jobs
› Process : Thread :: Condor Job : MW Task (an MW task is to a Condor job what a thread is to a process)
› An MWTask is dispatched in milliseconds; starting a Condor job can take minutes
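To make the multiplexing idea concrete, here is a minimal C++ sketch. It is not MW code: the `Task` struct and `run_multiplexed` function are hypothetical. A fixed set of long-lived workers (standing in for Condor worker jobs that are started only once) drains a queue of lightweight tasks; round-robin assignment stands in for handing each task to whichever worker is idle.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical task record; not an MW type.
struct Task {
    int id;
    int input;
    int result;
};

// Drains `pending` by handing each task to one of `num_workers`
// long-lived workers. The workers are "started" once and tasks stream
// through them, which is the point of multiplexing tasks onto jobs.
// Returns the number of tasks each worker ran.
std::vector<int> run_multiplexed(std::deque<Task>& pending,
                                 std::vector<Task>& done,
                                 std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<int> tasks_per_worker(num_workers, 0);
    std::size_t next_worker = 0;
    while (!pending.empty()) {
        Task t = pending.front();
        pending.pop_front();
        // The worker's "computation": square the input.
        t.result = t.input * t.input;
        ++tasks_per_worker[next_worker];
        done.push_back(t);
        // Round-robin stands in for "whichever worker reports idle first".
        next_worker = (next_worker + 1) % num_workers;
    }
    return tasks_per_worker;
}
```

With 10 tasks and 3 workers, round-robin gives the workers 4, 3 and 3 tasks; only three worker jobs had to be started, not ten.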
MW is…
› A C++ framework
› That re-uses Condor worker jobs
› So that each one runs many tasks
› Resulting in a highly parallel application
MW is not
› MPI
› General parallel programming scheme
MW in action
[Diagram: condor_submit starts the Master exe on the submit machine; the master holds a queue of tasks (T) and sends individual tasks to several Worker processes.]
You Must Write 3 Classes
› Subclasses of:
  • MWDriver (in the Master exe)
  • MWTask (in both the Master exe and the Worker exe)
  • MWWorker (in the Worker exe)
Your_MWTask
› Subclass MWTask
› Data members for inputs
› Data member for results
› Serialization of inputs and results
› Distinct instances on each side
The Four Task Methods
› void MyTask::pack_work(void);
› void MyTask::unpack_work(void);
› void MyTask::pack_results(void);
› void MyTask::unpack_results(void);
› Also ctor/dtor!
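The four methods split between the two sides: the master calls pack_work and unpack_results, the worker calls unpack_work and pack_results. The minimal sketch below traces one round trip under stated assumptions, not the MW API: in MW these methods take no arguments and go through the framework's RMC object, whereas here a hypothetical `Buffer` is passed explicitly so the example compiles on its own. `AddTask` and `round_trip` are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the message buffer an RMComm manages:
// values are read back in the order they were written.
class Buffer {
public:
    void pack(const int *array, int length) {
        assert(length >= 0);
        data_.insert(data_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        assert(length >= 0);
        assert(pos_ + static_cast<std::size_t>(length) <= data_.size());
        for (int i = 0; i < length; ++i) array[i] = data_[pos_++];
    }
private:
    std::vector<int> data_;
    std::size_t pos_ = 0;
};

// Hypothetical task: add two integers. Inputs and result are plain
// data members; master and worker each hold their own instance.
class AddTask {
public:
    int a = 0, b = 0;  // inputs
    int sum = 0;       // result

    // Master side: write the inputs into the outgoing work message.
    void pack_work(Buffer &out) { int v[2] = {a, b}; out.pack(v, 2); }
    // Worker side: read the inputs back, in the same order.
    void unpack_work(Buffer &in) { int v[2]; in.unpack(v, 2); a = v[0]; b = v[1]; }
    // Worker side: write the result into the reply message.
    void pack_results(Buffer &out) { out.pack(&sum, 1); }
    // Master side: read the result back.
    void unpack_results(Buffer &in) { in.unpack(&sum, 1); }
};

// One round trip; returns the sum as seen by the master's instance.
int round_trip(int a, int b) {
    AddTask master_copy;
    master_copy.a = a;
    master_copy.b = b;
    Buffer work;
    master_copy.pack_work(work);

    AddTask worker_copy;  // a distinct instance on the worker side
    worker_copy.unpack_work(work);
    worker_copy.sum = worker_copy.a + worker_copy.b;
    Buffer reply;
    worker_copy.pack_results(reply);

    master_copy.unpack_results(reply);
    return master_copy.sum;
}
```

Because the two instances are distinct, any member that is not packed on one side simply keeps its default value on the other.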
RMComms
› Abstraction for communication
  • (and some other stuff…)
› RMC->pack(int *array, int length);
› RMC->unpack(int *array, int length);
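Because unpack has to be told how many ints to read, the receiver must know every length in advance. One common convention, assumed here rather than taken from MW, is to pack an array's length before its elements. In this sketch `Channel` is a hypothetical in-memory FIFO standing in for an RMComm, and `send_vector`/`receive_vector` are illustrative helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RMComm's int-array pack/unpack: a FIFO of
// ints kept in memory instead of sent to another process.
class Channel {
public:
    void pack(const int *array, int length) {
        assert(length >= 0);
        data_.insert(data_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        assert(length >= 0);
        assert(pos_ + static_cast<std::size_t>(length) <= data_.size());
        for (int i = 0; i < length; ++i) array[i] = data_[pos_++];
    }
private:
    std::vector<int> data_;
    std::size_t pos_ = 0;
};

// Send a variable-length array as its length followed by its elements.
void send_vector(Channel &c, const std::vector<int> &v) {
    int n = static_cast<int>(v.size());
    c.pack(&n, 1);
    if (n > 0) c.pack(v.data(), n);
}

// Read the length first, then exactly that many elements.
std::vector<int> receive_vector(Channel &c) {
    int n = 0;
    c.unpack(&n, 1);
    std::vector<int> v(static_cast<std::size_t>(n));
    if (n > 0) c.unpack(v.data(), n);
    return v;
}
```

In this stand-in, values come back strictly in the order they were packed, so the unpack calls must mirror the pack calls one for one.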
MWWorker
› Just one method: executeTask(MWTask *t)
› Also ctor/dtor!
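Since executeTask receives a base-class pointer, a worker typically converts it back to the application's own task type before computing. The sketch uses hypothetical stand-ins `TaskBase` and `WorkerBase` in place of MWTask and MWWorker (the real classes declare more), and `RangeSumTask`/`RangeSumWorker` are illustrative; dynamic_cast is one safe way to do the conversion.

```cpp
#include <cassert>

// Hypothetical stand-ins for MWTask and MWWorker; the real base classes
// live in the MW library and declare more than this.
class TaskBase {
public:
    virtual ~TaskBase() {}
};

class WorkerBase {
public:
    virtual ~WorkerBase() {}
    virtual void executeTask(TaskBase *t) = 0;
};

// Hypothetical task: sum the integers in [lo, hi].
class RangeSumTask : public TaskBase {
public:
    long lo = 0, hi = 0;  // inputs (filled in by unpack_work in real MW)
    long sum = 0;         // result (sent back by pack_results in real MW)
};

class RangeSumWorker : public WorkerBase {
public:
    // The worker recovers its own task type, computes, and stores the
    // result in the task object for the framework to send back.
    void executeTask(TaskBase *t) override {
        RangeSumTask *task = dynamic_cast<RangeSumTask *>(t);
        assert(task != nullptr);
        long s = 0;
        for (long i = task->lo; i <= task->hi; ++i) s += i;
        task->sum = s;
    }
};
```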
MWDriver
› get_userinfo(int argc, char **argv)
  • RMC->add_executable(char *exe, char *requirements);
› setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
› act_on_completed_task(MWTask *t)
  • RMC->add_task(MWTask *t)
› Also ctor/dtor
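The driver callbacks are what make dynamic workflows possible: act_on_completed_task can create new tasks as results arrive. The minimal sketch below is not MW's API; it models that idea with a search-tree flavour, where every completed node above `max_depth` spawns two children. All names are hypothetical: setup_initial_tasks returns a std::vector instead of filling an MWTask*** array, pushing onto a queue stands in for RMC->add_task, and get_userinfo/add_executable are omitted. The loop runs master and worker in one process, similar in spirit to independent mode.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical node-expansion task (one node of a search tree).
struct NodeTask {
    int depth;  // input
    int value;  // result, filled in by the worker
};

// Hypothetical worker step: "evaluate" a node. Real work would go here.
void execute_task(NodeTask &t) { t.value = t.depth; }

// Hypothetical driver, loosely modelled on MWDriver's callbacks.
class TreeDriver {
public:
    explicit TreeDriver(int max_depth) : max_depth_(max_depth) {}

    // Seed the run with a single root task.
    std::vector<NodeTask> setup_initial_tasks() { return {NodeTask{0, 0}}; }

    // Called as each result arrives; may create new work on the fly.
    void act_on_completed_task(const NodeTask &t, std::deque<NodeTask> &queue) {
        total_value_ += t.value;
        ++completed_;
        if (t.depth < max_depth_) {
            // Stands in for RMC->add_task: two child nodes.
            queue.push_back(NodeTask{t.depth + 1, 0});
            queue.push_back(NodeTask{t.depth + 1, 0});
        }
    }

    int completed() const { return completed_; }
    int total_value() const { return total_value_; }

private:
    int max_depth_;
    int completed_ = 0;
    int total_value_ = 0;
};

// Single-process main loop: master and worker share one address space.
void run(TreeDriver &driver) {
    std::vector<NodeTask> init = driver.setup_initial_tasks();
    std::deque<NodeTask> queue(init.begin(), init.end());
    while (!queue.empty()) {
        NodeTask t = queue.front();
        queue.pop_front();
        execute_task(t);                         // worker side
        driver.act_on_completed_task(t, queue);  // master side
    }
}
```

With `max_depth = 3` the run completes 1 + 2 + 4 + 8 = 15 tasks, and only the root existed when it started.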
Putting it all together: new_skel
› ./new_skel MY_PROJECT
› Use configure --help for options
› make
Debugging with Independent Mode
› Special RMComm for debugging
› Single process, can run under gdb
Running on the Grid…
› Just launch the appropriate master
› Run condor_q to see it in action
Advice for Large Runs
› Use personal Condor
  • Flock, glide-in, schedd-on-side, hobblein
› Use checkpointing!
› Set_worker_increment high
User-level Checkpointing
› MWTask::write_chkpt_info(FILE *)
› MWTask::read_chkpt_info(FILE *)
› MWDriver::read_master_state(FILE *)
› MWDriver::write_master_state(FILE *)
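User-level checkpointing means each task can write out, and later read back, whatever state a restarted master needs. The sketch below gives a hypothetical `CkptTask` text-format hooks that reuse the slide's method names; the `bool` return from read_chkpt_info and the one-line-per-task format are assumptions of this sketch, not MW's. MWDriver's write_master_state/read_master_state would play the same role for the driver's own state.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task with user-level checkpoint hooks named after
// MWTask::write_chkpt_info / read_chkpt_info. The task's input and
// result are written as one text line so a restarted master can
// rebuild the task.
class CkptTask {
public:
    int n = 0;        // input
    long result = 0;  // result (0 until computed)

    void write_chkpt_info(FILE *fp) const {
        std::fprintf(fp, "%d %ld\n", n, result);
    }

    // Returns false if a complete record could not be read.
    bool read_chkpt_info(FILE *fp) {
        return std::fscanf(fp, "%d %ld", &n, &result) == 2;
    }
};
```

Tasks read back in the order they were written, so a checkpoint file can hold one record per task.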
Example codes with MW
› Matmul
› Blackbox
› knapsack
MW Philosophy
› Reuse either code or concept
› Key idea: Late binding
Other resources
› http://www.cs.wisc.edu/condor/mw
› Online manual
› MW-users mailing list
Thank You!
Questions?
MW Home page: http://www.cs.wisc.edu/condor/mw