Debugging Cluster Programs using symbolic debuggers

Upload: myra-preston

Post on 26-Dec-2015

TRANSCRIPT

  • Slide 1
  • Debugging Cluster Programs using symbolic debuggers
  • Slide 2
  • Debugging Code
      • Careful review of your code
      • Add debugging code to your code
          • print statements at strategic locations in code
          • remove later
      • Use a symbolic debugger
  • Slide 3
  • Careful review of your code
      • Rereading your code is often helpful
      • Most parallel code errors are serial errors
      • Compare your code to specs
      • Take a break, review your code with a fresh brain
      • Have someone else help you review your code
  • Slide 4
  • Common sources of errors
      • Beyond what the compiler catches
      • Usually run-time errors
      • Incorrect use of pointers
          • Pointer points outside valid memory
          • Reference that should have used a pointer
      • Referenced wrong variable
      • Index initialized incorrectly, or wrong loop exit condition
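A minimal sketch of the index error named on this slide, written in Python for brevity even though the slides target compiled cluster codes. The function names are hypothetical. The buggy version initializes its index to 1 and so never touches the first element; the fixed version covers every element.

```python
def total_buggy(values):
    """Sum values with an index that is initialized wrong (starts at 1)."""
    total = 0
    i = 1  # BUG: should start at 0, so values[0] is never added
    while i < len(values):
        total += values[i]
        i += 1
    return total


def total_fixed(values):
    """Sum values with an index and exit condition that cover every element."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


if __name__ == "__main__":
    data = [10, 20, 30]
    print(total_buggy(data))  # silently skips the first element
    print(total_fixed(data))
```

Both versions run without any error message, which is why this class of bug is not caught by the compiler and only shows up as a wrong result at run time.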
  • Slide 5
  • Common parallel errors
      • Deadlock errors
          • Receive before send
          • Receive, but no send
      • Incorrect arguments in MPI calls
          • Mismatch on tags
          • Mismatch of source/destination
          • Misunderstanding of the use of an argument
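The "receive, but no send" and tag or source/destination mismatch cases can be illustrated with a small offline check. This is only a sketch in plain Python, not MPI code: the `(source, dest, tag)` tuples and the function name are hypothetical, and it ignores message ordering and wildcards such as `MPI_ANY_SOURCE` or `MPI_ANY_TAG`.

```python
from collections import Counter


def unmatched_receives(sends, receives):
    """Return the receives that no posted send can satisfy.

    Each message is a (source, dest, tag) tuple of hypothetical rank
    numbers and tags. A receive is matched only by a send with the same
    source, destination, and tag.
    """
    available = Counter(sends)
    unmatched = []
    for msg in receives:
        if available[msg] > 0:
            available[msg] -= 1  # this send is consumed by the receive
        else:
            unmatched.append(msg)  # a blocking receive here would never return
    return unmatched


if __name__ == "__main__":
    sends = [(0, 1, 7)]
    receives = [(0, 1, 7), (0, 1, 8)]  # second receive uses the wrong tag
    print(unmatched_receives(sends, receives))
```

Any receive reported by this check corresponds to a blocking receive that would wait forever in a real run.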
  • Slide 6
  • Add Debugging Code
      • Add strategically placed code to display critical information
      • Watch values of variables as the program progresses
      • Can create data-dump functions and call them when you need them
      • Have a way to remove them in production code
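One possible shape for such a data-dump function, sketched in Python. The `DEBUG` environment variable, the function names, and the `[rank N]` prefix are all assumptions made for illustration; in a compiled code the same switch is often a preprocessor flag instead.

```python
import os
import sys


def debug_enabled():
    """Debug output is on only when the (hypothetical) DEBUG variable is "1"."""
    return os.environ.get("DEBUG") == "1"


def debug_line(rank, label, value):
    """Format one debug line, tagged with the process rank."""
    return f"[rank {rank}] {label} = {value!r}"


def dump(rank, **variables):
    """Data-dump function: print the named variables to stderr if enabled.

    Returns the lines it produced (an empty list when debugging is off).
    """
    if not debug_enabled():
        return []
    lines = [debug_line(rank, name, val) for name, val in sorted(variables.items())]
    for line in lines:
        print(line, file=sys.stderr)
    return lines
```

Calling `dump(rank, i=i, total=total)` at a strategic location costs almost nothing when `DEBUG` is unset, so the calls can stay in the code for production runs, and tagging each line with the rank helps keep interleaved output from many processes readable.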
  • Slide 7
  • Add Debugging Code
      • Can be difficult to get the right debugging code in the right place
      • Does not scale well in a parallel environment
      • Can produce unmanageable or unintelligible output
  • Slide 8
  • Symbolic Debuggers
      • Allow you to
          • inspect your code
          • monitor its behavior
          • modify the data values on the fly as your code executes
  • Slide 9
  • gdb: the GNU debugger
  • Slide 10
  • Frequently used GDB commands:
      • break [file:]function - Set a breakpoint at function (in file).
      • run [arglist] - Start your program (with arglist, if specified).
      • bt - Backtrace: display the program stack.
      • print expr - Display the value of an expression.
      • c - Continue running your program (after stopping, e.g. at a breakpoint).
      • next - Execute next program line (after stopping); step over any function calls in the line.
      • step - Execute next program line (after stopping); step into any function calls in the line.
      • help [name] - Show information about GDB command name, or general information about using GDB.
      • quit - Exit from GDB.
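The commands above can also be collected in a command file and replayed non-interactively with gdb's `-batch`, `-x` and `--args` options. The Python sketch below only builds the command text and the argument list; the helper names, the file name and the program name are hypothetical, and nothing here launches gdb itself.

```python
def gdb_command_script(function, expressions):
    """Return gdb commands: break at function, run, backtrace, print, continue."""
    lines = [f"break {function}", "run", "bt"]
    lines += [f"print {expr}" for expr in expressions]
    lines.append("c")
    return "\n".join(lines) + "\n"


def gdb_argv(script_path, program, args=()):
    """Build an argument list that runs gdb non-interactively on program."""
    return ["gdb", "-batch", "-x", script_path, "--args", program, *args]


if __name__ == "__main__":
    # Hypothetical file and program names, for illustration only.
    print(gdb_command_script("main", ["argc"]), end="")
    print(" ".join(gdb_argv("cmds.gdb", "./a.out", ["4"])))
```

Writing the script text to `cmds.gdb` and running the resulting command line stops at the breakpoint, prints the stack and the requested expressions, and then lets the program continue.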
  • Slide 11
  • gdb
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Running in X-windows
      • Linux (Unix) to Linux
          • ssh to host, log in and enter the X application
      • Other platforms (Windows, Mac)
          • Use an X-windows server application
      • VNC on most platforms
          • VNC operates as a remote control application in Linux
          • VNC operates as an X-windows server viewer for Windows, Macintosh, Solaris
  • Slide 16
  • Running in X-windows: Using VNC
      • ssh to host and log in
      • start vncserver
          • pay attention to display id (:n)
      • from your desktop run VNCViewer
          • select host with correct display id
      • After the session, kill vncserver
          • vncserver -kill :n (n is display id number)
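The role of the display id in these steps can be summarized in a small helper. This is a sketch only: the function name and the host name in the usage example are hypothetical, and the port entry relies on the usual VNC convention that display :n listens on TCP port 5900 + n, which matters mainly when a firewall or ssh tunnel is involved.

```python
def vnc_session(host, display):
    """Return the strings used in one VNC session for display id :n."""
    if display < 0:
        raise ValueError("display id must be non-negative")
    return {
        "start": "vncserver",                   # run on the host after ssh login
        "viewer_target": f"{host}:{display}",   # what to enter in the viewer
        "tcp_port": 5900 + display,             # conventional VNC port for :n
        "kill": f"vncserver -kill :{display}",  # run on the host when finished
    }


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    for step, text in vnc_session("cluster.example.org", 2).items():
        print(step, "->", text)
```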
  • Slide 17
  • Using VNC
  • Slide 18
  • Slide 19
  • x desktop with VNC
  • Slide 20
  • ddd: a graphical front end to gdb
  • Slide 21
  • pgdbg
      • Debugger from the Portland Group (PGI)
      • Can use with PGI compilers
      • Can use with GNU compilers
  • Slide 22
  • pgdbg common commands (back to text mode for a bit)
      • lis[t] [count | low:high | routine | line,count] - display lines from the source code file or routine
      • att[ach] [ | ] - attach to a running process or start a local executable and attach to it, or start an executable on
      • c[ont] - continue executing from the current location
  • Slide 23
  • pgdbg common commands
      • det[ach] - detach from the currently attached process
      • halt - halt the executing process or thread
      • n[ext] [count] - continue executing and stop after count lines of source code
      • nexti [count] - continue executing and stop after count instructions
  • Slide 24
  • pgdbg common commands
      • q[uit] - terminate pgdbg and exit
      • ru[n] [arg0 arg1 argn] - run program from the beginning with arguments arg0, arg1
      • s[tep] [count] - execute next count lines of source code and stop. Steps into called routines
      • s[tep] up - steps out of current routine
      • stepi [count] - execute next count instructions and stop. Steps into called routines
  • Slide 25
  • pgdbg common commands
      • stepi up - steps out of current routine and stops
      • Event commands
          • break line | function - sets a breakpoint at the specified line or function. If no line or function is specified, lists existing breakpoints. A breakpoint stops execution at the specified point
          • clear [all | line | func] - clears all breakpoints, or a breakpoint at line line or at function func
  • Slide 26
  • pgdbg common commands
      • stop var - break when the value of var changes at a location
      • watch expr - stops and displays the value of expr when it changes
      • track expr - like watch, except it does not stop execution
      • trace var - displays a trace of source line execution when the value of var changes
  • Slide 27
  • pgdbg common commands
      • p[rint] var - displays the value of a variable
      • edit filename - invokes an editor to edit file filename. If no filename is given, edits the current file
      • decl[aration] name - displays the type declaration for the object name
      • as[sign] var = expr - assigns the value expr to the variable var
      • proc [number] - sets the current process to process number number
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Resources
      • gdb
          • man gdb
          • info gdb
          • Using GDB: A Guide to the GNU Source-Level Debugger, Richard M. Stallman and Roland H. Pesch, July 1991.
      • ddd
          • man ddd
      • VNC
          • http://www.uk.research.att.com/vnc/
          • http://www.realvnc.com
  • Slide 33
  • Resources
      • PGI Debugger Users Guide
          • http://www.pgroup.com/ppro_docs/pgdbg_ug/PGDBG4.htm
      • PGI Users Guide, PGI 4.1 Release Notes, FAQ, Tutorials
          • http://www.pgroup.com/docs.htm
      • MPI-CH
          • http://www.netlib.org/
      • OpenMP
          • http://www.openmp.org/
      • HPDF (High Performance Debugging Forum) Standard
          • http://www.ptools.org/hpdf/draft/intro.html
  • Slide 34