Debugging Cluster Programs
using symbolic debuggers
Debugging Code
Careful review of your code
Add debugging code to your code
  print statements at strategic locations in code
  remove later
Use a symbolic debugger
Careful review of your code
Rereading your code is often helpful
Most parallel code errors are serial errors
Compare your code to specs
Take a break, review your code with a fresh brain
Have someone else help you review your code
Common sources of errors
Beyond what the compiler catches: usually run-time errors
Incorrect use of pointers
  Pointer points outside valid memory
  A value was referenced where a pointer should have been used
Referenced the wrong variable
Index initialized wrong, wrong exit condition
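A small sketch of the index errors named above. The function names are hypothetical, and the first function is wrong on purpose so the symptom can be seen without crashing.

```c
#include <stddef.h>

/* Buggy on purpose: the index is initialized to 1, so a[0] is never added.
   The program runs to completion and simply gives the wrong answer. */
long sum_skips_first(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 1; i < n; i++)
        total += a[i];
    return total;
}

/* Corrected: the index starts at 0 and the loop exits when i == n.
   Writing i <= n as the exit condition would read a[n], one element
   past the end of the array. */
long sum_all(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Neither mistake is caught by the compiler; both show up only at run time, as a wrong result or as an out-of-bounds read.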
Common parallel errors
Deadlock errors
  Receive before send
  Receive, but no send
Incorrect arguments in MPI calls
  Mismatch on tags
  Mismatch of source/destination
  Misunderstanding of the use of an argument
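A toy model of why a tag or source/destination mismatch leaves a receive waiting forever. This is plain C, not MPI: the structs, the TOY_ wildcards and the function names are all hypothetical stand-ins, and the matching rule is a simplified version of the one MPI applies to point-to-point messages.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for wildcard receives; not the real MPI constants. */
enum { TOY_ANY_SOURCE = -1, TOY_ANY_TAG = -1 };

struct toy_send { int from; int to; int tag; };
struct toy_recv { int at; int source; int tag; };  /* 'at' is the receiving rank */

/* A send satisfies a receive only if destination, source and tag all agree. */
bool toy_matches(struct toy_send s, struct toy_recv r)
{
    if (s.to != r.at)
        return false;
    if (r.source != TOY_ANY_SOURCE && r.source != s.from)
        return false;
    if (r.tag != TOY_ANY_TAG && r.tag != s.tag)
        return false;
    return true;
}

/* In this model a blocking receive hangs when no posted send matches it:
   the "receive, but no send" case. */
bool toy_recv_hangs(struct toy_recv r, const struct toy_send *sends, int nsends)
{
    for (int i = 0; i < nsends; i++)
        if (toy_matches(sends[i], r))
            return false;
    return true;
}
```

When a real program hangs, comparing the tag, source and destination arguments of each send with its intended receive in this way is often the quickest check.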
Add Debugging Code
Add strategically placed code in your code to display critical information
Watch values of variables as the program progresses
Can create data-dump functions; call them when you need them
Have a way to remove them in production code
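One possible way to do this in C is sketched here. The names DBG_PRINT, dump_ints and scale_sum, and the choice of a DEBUG build flag, are assumptions made for the example.

```c
#include <stdio.h>

/* Compile with -DDEBUG to enable the print statements.
   Without it the macro expands to nothing, so production
   builds carry no debugging output. */
#ifdef DEBUG
#define DBG_PRINT(...) \
    do { \
        fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define DBG_PRINT(...) do { } while (0)
#endif

/* Data-dump function: call it wherever an array needs inspecting. */
void dump_ints(FILE *out, const char *label, const int *v, int n)
{
    fprintf(out, "%s:", label);
    for (int i = 0; i < n; i++)
        fprintf(out, " %d", v[i]);
    fputc('\n', out);
}

/* Example routine with a print statement at a strategic location. */
int scale_sum(const int *v, int n, int factor)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += v[i] * factor;
        DBG_PRINT("i=%d sum=%d", i, sum);  /* watch sum as the loop progresses */
    }
    return sum;
}
```

Building once with -DDEBUG and once without gives the same results, with and without the trace on stderr.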
Add Debugging Code
Can be difficult to get the right debugging code in the right place
Does not scale well in a parallel environment
Can produce unmanageable or unintelligible output
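One common way to keep parallel debugging output readable is to tag every line with the rank that wrote it, or to give each rank its own log file. A minimal sketch, assuming the rank and process count are passed in by the caller (in an MPI program they would typically come from MPI_Comm_rank and MPI_Comm_size); the function names and the file name pattern are hypothetical.

```c
#include <stdio.h>

/* Format one debug line prefixed with the rank that produced it.
   Returns the value snprintf returns. */
int format_rank_line(char *buf, size_t len, int rank, int nprocs, const char *msg)
{
    return snprintf(buf, len, "[rank %d/%d] %s", rank, nprocs, msg);
}

/* Build a per-rank log file name so output from different
   processes is never interleaved in one stream. */
int rank_log_name(char *buf, size_t len, int rank)
{
    return snprintf(buf, len, "debug.%04d.log", rank);
}
```

This does not remove the scaling problem, but it makes the output of a modest number of processes easier to sort and compare.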
Symbolic Debuggers
Allow you to
  inspect your code
  monitor its behavior
  modify the data values
on the fly, as your code executes
gdb – GNU debugger
Frequently used GDB commands:
break [file:]function - Set a breakpoint at function (in file).
run [arglist] - Start your program (with arglist, if specified).
bt - Backtrace: display the program stack.
print expr - Display the value of an expression.
c - Continue running your program (after stopping, e.g. at a breakpoint).
next - Execute next program line (after stopping); step over any function calls in the line.
step - Execute next program line (after stopping); step into any function calls in the line.
help [name] - Show information about GDB command name, or general information about using GDB.
quit - Exit from GDB.
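A small routine to practice these commands on. The function is a hypothetical example; the session in the comment uses only commands from the list above and assumes the file was compiled with debugging symbols (gcc -g) together with a main of your own that calls factorial.

```c
/* Possible session once the program is loaded in gdb:
 *
 *   (gdb) break factorial    stop each time factorial is entered
 *   (gdb) run
 *   (gdb) print n            value of the argument in this call
 *   (gdb) c                  continue to the next (recursive) call
 *   (gdb) bt                 one stack frame per pending call
 *   (gdb) next               execute one source line
 *   (gdb) quit
 */
unsigned long factorial(unsigned int n)
{
    unsigned long result;

    if (n <= 1)
        return 1UL;
    result = n * factorial(n - 1);  /* step would enter this recursive call; next steps over it */
    return result;
}
```

Because the function is recursive, bt shows a deeper stack each time the breakpoint is hit, which makes the backtrace easy to read against the source.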
Running in X-windows
Linux (Unix) to Linux: ssh to host, log in and start the X application
Other platforms (Windows, Mac): use an X-windows server application
VNC
  On most platforms VNC operates as a remote control application
  In Linux VNC operates as an X-windows server
  Viewer for Windows, Macintosh, Solaris
Running in X-windows
Using VNC
  ssh to host and log in
  Start vncserver; pay attention to the display id (:n)
  From your desktop run VNCViewer; select the host with the correct display id
  After the session kill vncserver: vncserver -kill :n (n is the display id number)
Using VNC
X desktop with VNC
ddd – a graphic front end to gdb…
pgdbg
Debugger from the Portland Group (PGI)
Can use with PGI compilers
Can use with GNU compilers
pgdbg – common commands
Back to text mode for a bit
lis[t] [count | low:high | routine | line,count] - display lines from the source code file or routine
att[ach] <pid> [<exe> | <exe> <host>] - attach to a running process <pid>, or start a local executable and attach to it, or start an executable <exe> on <host>
c[ont] - continue executing from the current location
det[ach] - detach from the currently attached process
halt - halt the executing process or thread
n[ext] [count] - continue executing and stop after count lines of source code
nexti [count] - continue executing and stop after count instructions
q[uit] - terminate pgdbg and exit
ru[n] [arg0 arg1 … argn] - run program from beginning with arguments arg0, arg1…
s[tep] [count] - execute next count lines of source code and stop; steps into called routines
s[tep] up - step out of the current routine
stepi [count] - execute next count instructions and stop; steps into called routines
stepi up - step out of the current routine and stop
Event commands:
break [line | function] - set a breakpoint at the specified line or function; if no line or function is specified, list the existing breakpoints. A breakpoint stops execution at the specified point.
clear [all | line | func] - clear all breakpoints, or the breakpoint at line line or at function func
stop var - break when the value of var changes at a location
watch expr - stop and display the value of expr when it changes
track expr - like watch, except that it does not stop execution
trace var - display a trace of source line execution when the value of var changes
p[rint] var - display the value of a variable
edit filename - invoke an editor to edit file filename; if no filename is given, edit the current file
decl[aration] name - display the type declaration for the object name
as[sign] var = expr - assign the value of expr to the variable var
proc [number] - set the current process to process number number
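Attaching to a process that is already running is easier if the process waits for you. A common trick, sketched here under assumptions, is to print the process id and spin until the debugger changes a flag. The names debugger_attached and wait_for_debugger are hypothetical, the timeout is only there so the routine cannot hang a batch job forever, and getpid and sleep assume a POSIX system.

```c
#include <stdio.h>
#include <unistd.h>

/* Set this to 1 from the debugger to release the process.
   volatile forces the loop to re-read it on every pass. */
volatile int debugger_attached = 0;

/* Announce the pid, then wait up to max_seconds for the flag to change.
   Returns 1 if released by the debugger, 0 if the wait timed out. */
int wait_for_debugger(int max_seconds)
{
    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());
    for (int waited = 0; !debugger_attached; waited++) {
        if (waited >= max_seconds)
            return 0;
        sleep(1);
    }
    return 1;
}
```

With the commands listed above, a session could attach to the printed pid, assign debugger_attached = 1, set any breakpoints needed, and then continue. In a parallel run each process prints its own pid, so the process of interest can be picked out.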
Resources
gdb: man gdb; info gdb; Using GDB: A Guide to the GNU Source-Level Debugger, Richard M. Stallman and Roland H. Pesch, July 1991.
ddd: man ddd
VNC: http://www.uk.research.att.com/vnc/ http://www.realvnc.com
PGI Debugger User’s Guide: http://www.pgroup.com/ppro_docs/pgdbg_ug/PGDBG4.htm
PGI Users Guide, PGI 4.1 Release Notes, FAQ, Tutorials: http://www.pgroup.com/docs.htm
MPI-CH: http://www.netlib.org/
OpenMP: http://www.openmp.org/
HPDF (High Performance Debugging Forum) Standard: http://www.ptools.org/hpdf/draft/intro.html