I use import ipdb; ipdb.set_trace() in Python almost pathologically. It’s one step up from an exception because it drops you into a REPL, which is super convenient.
I’ve found the interactivity of the IPython debugger to be slightly better than the built-in one, but nearly all of my codebase is pinned to Python 3.5 for now.
Use export PYTHONBREAKPOINT=ipdb.set_trace where ipdb is available and you don’t have to change your code for different environments. (at least once you’re on 3.7)
For big monolithic web apps: if finding the backend code for an operation is less obvious than it should be, find some method that will almost always be called (in our app it’s a particular method that gets the user context for a request, but it could be various things) and breakpoint it.
If there’s a piece of code that’s not doing what you expect, add a loop around it whose condition is a variable that can be modified by the debugger. The resulting binary can be effectively unchanged such that the loop executes once, but if a debugger is present and you see the strange behavior, you can go back and step through it.
1a. This can also be done by modifying the instruction pointer, but that requires restoring register state. I do this too but it’s more tricky.
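A minimal sketch of the loop trick in C (the variable name and the gdb command are illustrative):

#include <stdio.h>

/* Stays 0 in normal runs, so the body executes exactly once.
 * volatile keeps the compiler from folding the loop away, leaving
 * the binary effectively unchanged. */
static volatile int redo = 0;

int main(void) {
    do {
        /* ... the code that misbehaves goes here ... */
        printf("suspect code ran\n");
        /* Saw the strange behavior? In gdb: `set var redo = 1`,
         * then step through the body again; set it back to 0 to exit. */
    } while (redo);
    return 0;
}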
The unused portion of the stack often contains clues about what happened recently.
Break on access (ba in WinDbg) uses CPU registers to break on a memory address when it’s used. This is wonderful for tracking how state moves across a system, or tracking when locks are acquired and released, etc.
Breakpoints with conditions attached let you break on a common operation (like opening a file), print out the state of the operation, and resume execution automatically, so interesting conditions get logged as the program runs.
While very common, it’s worth repeating: use a debug memory allocator that can do things like mark pages as invalid on free (to generate access violations on use-after-free), or add a guard page after a memory allocation (to cause access violations on memory overruns.) A good one of these can also keep allocation information around after free for debugging, such as the stack that freed the memory, and track the number of allocations to detect leaks.
Altering thread priorities in the debugger is one way to promote the code you’re trying to look at and/or demote code that’s interfering with you.
If you have a race condition, thoughtfully adding a sleep in the right place can provoke it, allowing it to be debugged and understood.
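A sketch in C of what “the right place” means: the usleep sits inside an unsynchronized read-modify-write, so the lost update reproduces on demand instead of once in a blue moon (all names are illustrative):

#include <pthread.h>
#include <unistd.h>

int counter = 0; /* unsynchronized on purpose */

void *increment(void *arg) {
    (void)arg;
    int tmp = counter;   /* read */
    usleep(10000);       /* widen the race window so the other
                            thread's write lands in between */
    counter = tmp + 1;   /* write back: one update gets lost */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter; /* reliably 1 instead of 2: the race, on demand */
}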
If code aborts in ways you don’t expect, add a breakpoint to the exception filter (which decides which handlers to execute.) This executes when the full stack that caused the exception to be raised is present.
Further to the previous comments about patching code: if you’re stepping through code and a branch goes somewhere you don’t want, change the instruction pointer. If you want that branch to happen again, change the conditional jump to an unconditional jump.
In WinDbg, dps on import tables or import table entries to show where indirect calls will go. This is useful when something else is trying to hijack your code.
Keep an in memory circular log of interesting cases in your code. This log often doesn’t need to be big, and doesn’t need to allocate on insert, but if something bad happens in your code you can dump the interesting cases that were hit recently.
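A minimal single-threaded sketch of such a log in C (a real one would bump the index atomically; names and sizes are arbitrary):

#include <stdio.h>

#define SLOTS 256 /* power of two, so wrapping is a mask */

struct entry { const char *fmt; unsigned long long a, b; };

static struct entry ring[SLOTS];
static unsigned head;

/* Insert is two stores and an increment: no allocation, cheap enough
 * to leave enabled in release builds. */
static void rlog(const char *fmt, unsigned long long a, unsigned long long b) {
    struct entry *e = &ring[head++ & (SLOTS - 1)];
    e->fmt = fmt; e->a = a; e->b = b;
}

/* Call from a crash handler, an assert, or by hand in a debugger;
 * walks oldest-to-newest. */
static void rdump(void) {
    for (unsigned n = 0; n < SLOTS; n++) {
        const struct entry *e = &ring[(head + n) & (SLOTS - 1)];
        if (e->fmt) printf(e->fmt, e->a, e->b);
    }
}

Storing the format string and raw values, rather than formatted text, keeps inserts cheap; formatting only happens at dump time (the Verona flight recorder described in a reply below uses the same trick).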
If there’s a piece of code that’s not doing what you expect, add a loop around it whose condition is a variable that can be modified by the debugger
You don’t need this if you have a reverse debugging tool. Gdb has basic support, but I usually use rr.
Reverse debugging isn’t supported universally (unfortunately) because of its implementation details. For example, rr can’t run on a variety of apps (including the one I work on every day) because it depends on emulating syscalls, and there are a bunch that are impossible or hard to implement. See: https://github.com/mozilla/rr/issues/1053 There are also contexts where reverse debugging doesn’t work, such as in kernel mode on many systems, or in other “special” environments.
Reverse debugging is incredibly useful. Unfortunately, even when using gdb, there are situations when you cannot reverse debug due to lack of support from the target. This is the case for, e.g., all embedded development.
Break on access (ba in WinDbg) uses CPU registers to break on a memory address when it’s used
GDB / LLDB call this watchpoints. These are most useful in WinDbg if you’ve got a deterministic test failure, because you can turn on tracing, go to the crash, set the watchpoint, and then run the program in reverse.
If code aborts in ways you don’t expect, add a breakpoint to the exception filter
On *NIX, the equivalent is adding a signal handler.
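For instance, a sketch in C: breakpoint the handler (or just log and abort) and the full faulting stack is still live above it.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Breakpoint me: when this runs, the stack that caused the fault
 * is still present above us. */
static void on_fatal(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    /* fprintf isn't strictly async-signal-safe, but fine for debugging */
    fprintf(stderr, "signal %d at address %p\n", sig, info->si_addr);
    abort(); /* or pause() here so a debugger can attach */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fatal;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    volatile int *p = NULL;
    return *p; /* provoke the fault for demonstration */
}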
Keep an in memory circular log of interesting cases in your code
I can’t overstate how useful this is. In Verona, we have a flight recorder like this (which records format strings and associated values so you can do printf-like things in the replay but don’t need to do expensive printf-like things in normal operation). This is made even more useful by systematic testing, where the runtime provides deterministic scheduling of threads, driven from a pseudo-random number generator with a provided seed. This is incredibly useful, because if we get a failure in CI then we can always reproduce it locally. I miss that facility a lot when I use any other environment.
Most people learn to use debuggers as an observation tool, and don’t normally take it to the next level of using them to alter the system to force or fix a problem. One example of this is patching instructions in the debugger while you are debugging. For instance, to patch an instruction to a NOP in WinDbg (the Windows system-level debugger), you can use the eb command:
0:070> eb 000000006a4dd058 90
Searching the memory address space for specific patterns is also an underused technique in my experience. Useful for finding hints when investigating a memory corruption. In WinDbg the following command sequence will search the address space for any memory locations pointing to 0x2e78042940 using the s command, and then dump some information about each matching address using the !address command:
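0:000> s -q 0 L?7fffffffffff 2e78042940
0:000> !address <address-of-hit>

(A sketch: the search range covering user-mode address space is illustrative, and each hit’s address is then fed to !address by hand.)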
On a similar note, WinDbg’s dps command can be used to symbolize vtables or function pointers in a memory region, another tool in the toolbox for debugging memory corruptions.
Debugging works best if you approach it with the scientific method: form a hypothesis, then design an experiment that can disprove it. Repeat until the bug is found.
Also keep in mind the words of Sherlock Holmes: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”
I view it the same way and use the same Holmes quote!
One of the best things about treating it this way is when pairing - it’s often handy to pair while debugging - you can double check you both agree with the hypothesis after you change the code each time. Helps keep everyone on the same page.
Not necessarily a technique, but I brushed off using a debugger for way too long. “if I need a debugger to understand the code then it was too complex to begin with”, “printfs are fine”, etc.
Using a debugger with disassembly and a memory view greatly increased my productivity. It’s also very useful to learn how to use features past simple breakpoints - e.g. conditional breakpoints and watch variables.
A thing I see happening over and over again is that there are two camps of people: Those who only use debuggers and those who only use printing, logging, tracing etc.
Typically the arguments of the first group are that “printf debugging” is not professional and inefficient. The latter group often says that they don’t want to learn another tool or that the need to use a debugger is a warning sign of a hard to understand code base.
I politely disagree with both camps. The two tools have different use cases and there are areas where it is hard to operate with only one but not the other.
In my opinion, debuggers shine for two things. The first is state visualization. Given a certain point in the execution of your program, a debugger will show you the state of the program. When I hear that a program segfaults, the state at the segfault is the first thing I want to see. A debugger will not tell me how the program got into this state*. To find that out, I will resort to reasoning, strategically adding print/log/trace statements, or, if simple, breakpoints to catch conditions before the program crashes. The second use case is working with an unfamiliar code base. Debuggers are good at answering questions like “where is this function called from?” or “is this code ever executed?” without the need to change the program.
Prints/logs/traces shine for transient visualization. Say you are debugging an FIR filter that starts to oscillate. Breaking into the code with a debugger when it oscillates will probably be useless, because while you now see the system in a misbehaving state, you don’t know how it got there. Stepping might be an ineffective method, because the error condition builds up over thousands of runs of the filter. Printing information as you go can give you a good idea of how a system is moving into an undesirable state. As a bonus, when you are adding prints you can add additional logic. For instance, you can check for certain conditions, print marker messages like “here it looks like an accumulator overflowed”, or print data with a compact machine representation in an easily human-parsed shape that lets you spot spurious behavior at a glance.
In summary, debuggers and prints have an overlap in their functionality, but I feel it’s most productive to combine the two in an intertwined way of working.
* Barring reverse debugging, which I often can’t use because I work with embedded systems.
Minimize the amount of manual steps required to reproduce the bug. If it’s buried deep in a chain of user actions, build a scaffold that gets you to that point for you. Minimize the time taken to test hypotheses, too.
If the bug involves a lot of different files, put their names in a single file or add a tracking comment (like #ZKGKL) to make jumping between contexts easier.
Learn how to use git bisect.
EDIT: also use formal methods, laughing at myself for forgetting that one
A Python script is crashing and you aren’t even close to figuring out where it’s crashing and why?
python3 -m pdb -c continue <script.py>
This will start your script and continue until it reaches an exception. Once the exception is raised it will drop you into the pdb shell to look around the backtrace and source listing, and to inspect and print variables.
in bash scripts, using set -x to emit all of the commands that are being executed (and turning it off with set +x). Lets you quickly figure out what is wrong with the script.
GDB scripting in Python. Well, I wouldn’t say it was “entirely too long”, because I think the support got better recently. Someone has been quietly doing some fantastic work on the Python API in the last few years. GDB is really capable (although still kinda hard to use)
A pretty cool use case is pretty printing algebraic data types via the Zephyr ASDL compiler!
In the generated code, every type has an integer tag to tell you which variant it is, much the way, say, OCaml is implemented. And then I load that into GDB, and cast the C++ base class (for the sum type) to the right subclass (for the variant).
https://github.com/oilshell/oil/blob/master/devtools/oil_gdb.py
Not sure if that makes sense – I probably have to write a blog post about it. But it is a cool technique. Debugging Oil in C++ is definitely way nicer than debugging it in Python because of the tools.
I love to use a bit of scripting to dump out call stacks, in-memory data structures, process/thread trees, anything “structure”-like to graphviz and print out reference diagrams when they don’t already exist in docs.
Remote debugging is a super-power for anything low-level…it can be very helpful to have your debugging session isolated on a separate machine when your code under test can blow away the whole system if things go awry. You’ll likely lose the connection, but still have your state intact.
Canary deploys. When your metrics spike for some unknown reason and you want to replicate the same aspects of production, just deploy to one of your nodes, provided you have a load balancer. That way you can put up an experimental fix without having to set up complex test infrastructure that replicates a production load. Really useful, because replicating the same signature as production is difficult.
Here’s the approach that I polished over time: reverse the stack (verify unit test, integration test, manual test).
1. Reproduce the bug manually
2. Reproduce it in an integration test
3. Reproduce it in a unit test
4. Fix the code, then verify in reverse order: unit test, integration test, manual test
Sometimes if you get a nice enough stack trace you can jump to 4 and start fixing the code, and sometimes 1 is a little impractical due to lack of real data so you have to do some guesswork to achieve 2.
Also, FWIW when I say integration test I usually mean something in the language’s unit testing framework that exercises the “main” function or “handler” so it’s usually all still in-memory and single process, but it may have sockets and do file I/O.
combatting a wonky build/delivery process with very explicit output that you change whenever you rebuild (I would literally write alert(1003)… alert(1005)… alert(1009), with the current hour/min as a 4-digit number)
reifying a process when it becomes too complicated to trace through/understand. So, instead of code that does x, then y, then z, produce an object that says “I’m going to do x, then y, then z”. This makes it easier to distinguish between “I did the right thing, but did it wrong” and “I chose to do the wrong thing.”
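A sketch of the reification in C (all names invented): the plan becomes data you can print before anything runs.

#include <stdio.h>

typedef void (*step_fn)(void);

static void do_x(void) { puts("x"); }
static void do_y(void) { puts("y"); }
static void do_z(void) { puts("z"); }

/* The reified process: a plan you can print, diff, or assert on. */
struct step { const char *name; step_fn run; };

int main(void) {
    struct step plan[] = { {"x", do_x}, {"y", do_y}, {"z", do_z} };
    size_t n = sizeof plan / sizeof plan[0];

    /* "I chose to do the wrong thing" is visible here... */
    for (size_t i = 0; i < n; i++)
        printf("step %zu: %s\n", i + 1, plan[i].name);

    /* ..."I did the right thing, but did it wrong" shows up here. */
    for (size_t i = 0; i < n; i++)
        plan[i].run();
    return 0;
}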
Writing repeatable test cases early. Even if you must do manual testing, make it a matter of following a precise script.
For really complex stuff, writing custom tooling to help gain more insight in what the program is doing.
For example, for a really complex query compiler at work, which handles arbitrary export file formats straight from the database in Django, I wrote a little management command that dumps the generated query, so I don’t have to jump through hoops getting at the query via the web UI or adding print statements to my code.
As another example, while working on a regex engine which compiles to an NFA and then a DFA, I wrote a little script which can output the state machine by generating an image via graphviz. You’d think I had learned the value of such tools by now, but I still don’t do this often enough.
The debugging technique that took me entirely too long to learn is that you can write code that helps you debug that is not “payload code” of the program you want to write. Most typically this means writing tools that interact with the payload code, for example to be able to reproduce a bug with one keypress instead of through a complex manual and error prone interaction with your system. It might also mean including non-payload logic in a program to make it easier to work with it in a debugger.
I can’t stress enough the power of using code to debug your code. After all, if a Turing complete language is the right tool to write your payload program, why limit yourself to anything less powerful when getting your program correct?
And then a couple of words about printing and debuggers. It didn’t take me too long to learn these two techniques, but too long to see when to use one or the other and how to combine them:
Printing (or logging/tracing) is really good at showing transients in your code. They show you the path to a system state, but you can’t easily explore system state without adding instrumentation everywhere. It took me too long to gain the confidence that printing is not used “by hacky programmers”, as some detractors say, but is a valuable and powerful tool for some problems.
A debugger visualizes the state of your program at a certain point in time. It is also good at getting to know a codebase that you are unfamiliar with. While it may be associated with “big IDEs” like VS or Java IDEs, debuggers are not “a tool for those who have no clue”, as some detractors may say, but a valuable and powerful tool for some problems.
Also, whittling down input or code to the smallest possible case that tickles the bug, both for easier debugging (no distractions) and to get a nice clean repro case for a regression test. Always make regression tests!
By the way, this question reminded me of this old book which is still pretty good because it focuses on the high level fundamentals.
For bash, set -x for n00bs like myself is a must! This prints the current line being executed.
When I am dealing with compiled code, usually C++ or C, gdb has been a real life saver, especially setting breakpoints when some condition is true, such as when some parameter to a system call matches some particular value.
Nowadays for native executables, I am finding myself using bpftrace quite often, in some cases instrumenting several functions that I am interested in, either in userspace or the kernel. This has been great from the points of view of troubleshooting (fancy printf debugging, no code modifications needed[1]), performance debugging (i.e. latency of a function), and education (which code paths are run?)! I have sparingly used XRay but it’s very neat as well!
Hope this is somewhat useful! :)
[1]: While you are not required to recompile / re-run your code the code does indeed change, as it is patched at runtime.
Visualize what you’re trying to debug. I learned this from Casey Muratori.
This was absolutely critical for debugging the network code in sneakysnake.io. I put the client and server in the same binary, rendered each to a frame buffer, and rendered the inverse difference between those frame buffers. Any time the client and server became desynchronized, it showed as non-white on the diff. From there, I was able to simulate different amounts of latency and packet loss and see how that affected synchronization.
I use strace -p <PID> for debugging a web server on a Linux machine, especially for operations that explicitly involve the file system, to see which files are being read.
I used to debug complex Postgres queries, which involved looking at the results of CTEs and subqueries, and it took me quite a while to realise that I could automate the process and generate all intermediate results with a tool which I eventually wrote: https://korban.net/postgres/pgdebug/
Debugging stored procedures in MSSQL. It took a long time to set it up, and once I had it set up, I still couldn’t see the records in a table variable. If you want that, your choices are to save it to a new table in the database, or to dump the table variable to XML and parse the string.
SET @XMLSTR = (SELECT * FROM @TABLE_VAR FOR XML AUTO)
Actually using one. I started coding when there was no such thing. Then the only option was gdb and I was using C++ on Linux, so there may as well have been no such thing.
Then I used VB6 and you could see exactly what was going on and make tweaks at runtime. A revelation! Similar with .NET, though mods at runtime hardly ever worked - not sure why they seemed to be disabled every time I went to try.
I started using OzCode though, which was amazing. If you write C#, check it out. Searching through data in a large data structure in RAM and flagging what you want to be shown to you as you step through… it’s the sort of thing that feels like it should be part of the IDE, like ReSharper does (and kind of is now).
I’ve had varying levels of ‘success’ with ReSharper. When it plays nicely and doesn’t grind things to a halt, it feels like I have some minor superpowers.
Yes OzCode brings the ability to debug to code using LINQ. The laziness and magic of LINQ plus EF can become a nightmare pretty quickly.
Deleting tests or parts of an application to narrow down where a bug might be. If you have a VCS that supports local branches, it’s totally fine to go WILD with a codebase as long as your changes don’t get merged to your main branch. Deleting a bunch of passing tests or initialization code can speed up the fix/compile/test loop, helping you find the bug quicker.
This also applies if you have a CI system where the CI configuration is stored in a file such as .travis.yml or a Jenkinsfile. Go ahead and rip stuff out of there if it’s in your way, just make sure you revert those changes before creating a PR.
Use a reverse shell to get shell access to any system (trying to debug a jenkins pipeline? Want to look at a file downloaded into a Kubernetes container somewhere?)
Useful cheat sheet: http://pentestmonkey.net/cheat-sheet/shells/reverse-shell-cheat-sheet
Obviously only use it to look at things you’re allowed to look at, but it’s happened quite a few times that I owned and was allowed to do whatever I wanted with a system, yet getting shell access to it was difficult.
I’ve found that developers new to web development are very dependent on a debugger. Devs who have a lot of experience in Java or .NET seem to expect the same level of debugging experience.
My tips:
1. Use console.log
2. console.log can take objects as args and dump them right to your console
3. debugger; if you really need to debug, it’s built into your web browser’s dev console. The debugger statement triggers it.
This isn’t rocket science, but I’ve seen enough experienced devs get into full stack and really struggle with js debugging.
Logging > debugging
I’m not sure which book it was (coders at work?) but I discovered a lot of well known devs were just using logging statements. I’ve seen a lot of devs get twisted up in their own code with debuggers. A breakpoint on every line is a sure sign that you’re in trouble.
On Linux, bpf and bpftrace are very powerful tools for getting a fuller picture of what’s going on not only in your high-level language (e.g. C# or Java) but also its runtime/VM and the kernel underneath.
These are also good for general system debugging. Their primary values are their performance (they can be run in production) and the way they reveal everything going on in the system, down to the level of individual function calls/returns. Awesome stuff.
This is more meta, but I somehow often end up debugging integration tests or testsets which take very long to run, and are hard to debug because of their size. It still feels like I’m missing some trick to do this.
My current workflow is:
1. Start the test suite
2. Wait for almost an hour (can’t use the repo at this point); hope there is non-coding work to do
3. Cry tears of joy if there are no failing test cases; else, pick a failing test case
4. Replicate the failure manually by replicating the behavior of the test script (in an in-house language, hundreds or thousands of lines long) – might take a while
5. Debug by using printfs (there is no debugger for the platform I’m developing for yet) – this is usually the easy part
At my last job there we debugged the code by comparing it to an Excel sheet. Somehow Excel was faster than the model itself (I fixed this in the end), and the model took 1+ hour to run. Unfortunately the Excel sheet contained many errors.
I feel a similar pain. I work with a lot of long running data pipelines. Since all of the steps are interconnected a failure in a later step might be caused by data in an earlier step. As a result, I might have to rerun a pipeline from the beginning to make sure that the issue is fixed.
To debug, I’ll often rerun a single partition of the pipeline locally, sometimes on a subset of the data. It ends up being quite a bit faster that way, but if a step takes a while then it makes it tricky to work on anything else.
Least effort conditional breakpoints, not sure if this is really useful outside of games though
Add bool break1, break2, break3, break4; somewhere (extern bool break1; etc. in some base header), and set them to true when F1–F4 are pressed every frame. Then drop if (break1) __debugbreak(); in some code that’s not working, set up the conditions so you hit the bug, then hit F1. Also, if you have to use Linux: install cgdb.
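One way to wire that up, as a sketch (GetAsyncKeyState and __debugbreak are Win32/MSVC; swap in your engine’s input query elsewhere):

#include <windows.h>
#include <stdbool.h>

bool break1, break2, break3, break4; /* extern-declared in a base header */

/* Call once per frame from the input layer. */
void update_debug_breaks(void) {
    break1 = (GetAsyncKeyState(VK_F1) & 0x8000) != 0;
    break2 = (GetAsyncKeyState(VK_F2) & 0x8000) != 0;
    break3 = (GetAsyncKeyState(VK_F3) & 0x8000) != 0;
    break4 = (GetAsyncKeyState(VK_F4) & 0x8000) != 0;
}

/* In the code that's misbehaving: set up the bad case, then press F1. */
void suspect_code(void) {
    /* ... */
    if (break1) __debugbreak();
    /* ... */
}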
Common Lisp: making long-running apps and services start a Swank listener. Then if something bad happens I can connect to them from SLIME, and have a full dev environment and REPL available, plugged into the running instance.
One example of this is live hacking on a running instance of the StumpWM window manager:
http://www.kaashif.co.uk/2015/06/28/hacking-stumpwm-with-common-lisp/index.html
Wanna quickly figure out how you got to a particular point in a foreign codebase? Throw an exception there.
With 3.7 we now have breakpoint() as a built-in alternative to import ipdb; ipdb.set_trace().
I like to use IPython’s embed to achieve a similar goal. Add in a call to embed() as a breakpoint and find yourself in a full-blown IPython session with your objects in scope.
Also: ptpython
I use binding.pry the same way in Ruby. Requires the pry gem, but it’s well worth it.
I’ve been using pudb in the same manner. It’s got a TUI and it’s simple to jump into the REPL from there.
Ooh, pudb looks really cool! Might switch to using that
Yeah, throwing an exception works unless there’s a try/catch higher in the call stack consuming everything.
Or unless you are in a minified React component with no source map.
It took me a long time to learn the debugger; statement in JavaScript. Still seems unusual.
I use this all the time but always feel a little dirty and imposterish. I’m glad my method has been validated by a stranger on an internet board!
Conditional breakpoints and watch variables improved my ability to understand programs using a debugger at least ten-fold. Expression evaluation is another significant feature I use.
I was very slow to pick up watchpoints. It took me fixing a bug in a debugger front-end to even understand what they were, and why they were useful.
Using a notebook, and writing out what I’ve tried so far by hand.
The number of circles I can drive myself in without self-reflection far outweighs any amazing gdb tricks I can paste here.
+1 to that. I treat it as some sort of scientific notebook and write down my hypotheses, experiments, results, etc.
Asking the previous author of the code for an introduction.
Just 2 bits: console.dir({x, y}) etc. works great in ES6+.
Slightly specific to the languages I usually use, but: in Python I like to use the -i flag (or the PYTHONINSPECT environment variable), which starts interactive mode when an exception is raised.
I just installed OzCode to help debug our bug generator, err, report generator code that is thousands of lines of EF and LINQ. It’s a nightmare.
OzCode seems solid, ReSharper less so. I’ve had a ReSharper license forever but it makes VS almost unusable.
On debugging vs logging, once you go distributed, debugging just isn’t going to do that much for you.
Use a good logging framework and platform.
Asking for help.
printf debugging (/s)
For ruby, Aaron Patterson’s article on debugging with puts is really good and I still reference it a lot: https://tenderlovemaking.com/2016/02/05/i-am-a-puts-debuggerer.html
Write programs to generate datasets for debugging and run your program automatically, or generate debug datasets at runtime.
Hey, I published the article – with a callout to your input. https://www.functionize.com/blog/6-ways-to-improve-your-debugging-skills/
@xcthulu showed me how to use org-mode as a postgres repl that works for more complicated queries (and gives you all the benefits of Emacs)
That sounds wild! Can you explain how this works?
the main part of it is in http://mbork.pl/2020-03-09_Using_Org-mode_as_a_PostgreSQL_client
to save you the trouble of reading the article, the main gist is
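roughly an org-babel sql source block like this (the connection header values here are placeholders, see the article for the real setup):

#+begin_src sql :engine postgresql :dbhost localhost :dbuser me :database mydb
SELECT * FROM my_table LIMIT 10;
#+end_src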
and then i think you can just evaluate the code block?
i had a nifty little “eval on save” trigger going but lost it
GIYF ; RTFM