Tools for Debugging Running Ruby Processes

Let’s assume that we have a daemon, written in Ruby and running on some kind of POSIX system, that works great most of the time but every few months gets “stuck” and needs to be restarted. We might tolerate this failure rate, or we might set up something like monit to automatically restart the daemon when it becomes “stuck.” But wouldn’t it be better to get to the bottom of the issue?

Next time the daemon gets stuck, what tools might we use to figure out what’s happened to it? If you’re still developing, you might have included the pry gem or you might even be using pry-rescue to catch exceptions. But on a production system, you probably won’t have such luxuries available.

Luckily, since a Ruby process is still a process, there are actually quite a few POSIX utilities at our disposal. Let’s find the PID of our process (ps or pgrep will do) and see what we can learn.
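If the daemon is under our control, one way to make its PID easy to find later is to have it record the PID in a pidfile at startup. The following is only a minimal sketch: the helper names write_pidfile and read_pidfile and the pidfile location are assumptions made up for illustration, not something the daemon in this post necessarily does.

```ruby
require "tmpdir"

# Hypothetical helper: record the current process's PID in a pidfile so it
# can be looked up later and handed to lsof, strace, or gdb.
def write_pidfile(path)
  File.write(path, "#{Process.pid}\n")
  path
end

# Hypothetical helper: read a PID back out of a pidfile.
def read_pidfile(path)
  Integer(File.read(path).strip)
end

# Assumed location; a real daemon would more likely use something under /var/run.
pidfile = File.join(Dir.tmpdir, "example-daemon-#{Process.pid}.pid")
write_pidfile(pidfile)
puts "PID #{read_pidfile(pidfile)} recorded in #{pidfile}"
```

With something like this in place, the PID is one cat away the next time the daemon gets stuck.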

lsof

lsof can tell us what files and sockets our process has open. This includes things like input or output files the process might be reading or writing, log files, remote TCP connections, local sockets, and dynamically-linked libraries. Assuming our process’s PID is 1337, running lsof -p 1337 will get us the basics. See man lsof for more ideas.
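To get a feel for the kind of information lsof reports, here is a rough, Linux-only approximation in Ruby that lists a process's open file descriptors by reading /proc. It is a sketch under the assumption that /proc/<pid>/fd exists and is readable; the method name open_descriptors is made up for illustration, and lsof itself reports far more (libraries, socket endpoints, and so on).

```ruby
# Rough, Linux-only approximation of part of `lsof -p PID`: list the open
# file descriptors of a process by reading the /proc filesystem.
# Returns a Hash of fd number => target (a path, "socket:[...]", "pipe:[...]").
# Returns an empty Hash where /proc/<pid>/fd is unavailable (e.g. on OS X).
def open_descriptors(pid = Process.pid)
  dir = "/proc/#{pid}/fd"
  return {} unless File.directory?(dir)

  Dir.children(dir).each_with_object({}) do |fd, found|
    begin
      found[fd.to_i] = File.readlink(File.join(dir, fd))
    rescue SystemCallError
      # The descriptor was closed (or is unreadable) between listing and reading.
    end
  end
end

# With no argument this inspects the current Ruby process.
open_descriptors.sort.each { |fd, target| puts format("%3d -> %s", fd, target) }
```

Passing another process's PID works the same way, provided we have permission to read its /proc entry.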

strace

strace can tell us what system calls (or ‘syscalls’) our process is making. There are a bunch of things we might find here, but the most common useful observation is that something is slow or broken because the process is waiting on input from some other source. Run strace -p 1337 for the basics. See man strace for more ideas. (Instead of strace, OS X and a few other platforms have a dtrace utility, which is arguably even more powerful. strace, however, is readily available on all major Linux distributions and is therefore more likely to be on hand on production or production-like systems.)
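To make the "waiting on input" case concrete, here is a small Ruby sketch of that situation. A pipe that never receives data stands in for a hypothetical upstream dependency that has gone quiet. Under strace, a process in this state would typically show up parked in a single read, select, or poll-style call; exactly which one is an assumption that depends on the Ruby version and platform.

```ruby
# A pipe whose write end never sends anything stands in for a dependency
# that has gone quiet (a hypothetical upstream service, say).
reader, writer = IO.pipe

# A plain `reader.read` here would block forever, which is the "stuck" state
# strace would reveal. IO.select with a timeout makes the wait visible instead.
ready = IO.select([reader], nil, nil, 0.2)
puts ready ? "data available" : "still waiting on input after 0.2s"
```

In a real daemon the fix is usually the same idea: put a timeout on the wait so that a silent peer produces an error or a log line instead of a hang.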

gdb

gdb will let us attach to our process, halting its execution and letting us muck around with its innards. While I’ve used lsof and strace for troubleshooting in the past, I only recently realized how powerful gdb could be in this context.

Jamis Buck, of course, beat me to this realization by a few years. His 2006 blog post “Inspecting a live Ruby process” offers a couple of ideas for how to find out more about Ruby processes. (It even has some comments from _why exploring the ideas further.) Unfortunately, his sample code no longer works for me, and I don’t yet have the depth of gdb or Ruby implementation knowledge to explain why.

Instead, I found a treasure trove of ideas compiled in this gdb-macros-for-ruby repo. These macros can either be placed in a ~/.gdbinit file to be loaded whenever gdb starts, or they can be typed or copy-pasted into the gdb REPL while attached to the process you want to debug. So far, I’ve been doing the latter.

gdb examples

Two of the most useful:

eval some ruby

define evalr
  call(rb_p(rb_eval_string_protect($arg0,(int*)0)))
end
document evalr
   Evaluate an arbitrary Ruby expression from current gdb context.
end

(While the original macro uses the name eval, gdb warned me that this was going to override the default eval when I tried this, so I used the name evalr instead, just to be safe(r).)

redirect_stdout

define redirect_stdout
  call rb_eval_string("$_old_stdout, $stdout = $stdout, File.open('/tmp/ruby-debug.' + Process.pid.to_s, 'a'); $stdout.sync = true")
end
document redirect_stdout
  Hijack Ruby $stdout and redirect it to /tmp/ruby-debug.<pid>.
  Useful to redirect ruby macro output to a separate file.
end
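The Ruby that this macro evaluates can be tried out in an ordinary script first, which is a low-risk way to see what it does before running it inside a live process. The sketch below mirrors the macro's swap of $stdout and also restores the original afterwards; the path under Dir.tmpdir is an assumption for illustration (the macro itself hard-codes /tmp).

```ruby
require "tmpdir"

# Same idea as the macro, with a hypothetical path under Dir.tmpdir.
debug_path = File.join(Dir.tmpdir, "ruby-debug.#{Process.pid}")

# Swap $stdout for an append-mode file, keeping the original around.
$_old_stdout, $stdout = $stdout, File.open(debug_path, "a")
$stdout.sync = true # write through immediately so `tail -f` sees output

puts "hello from the redirected stdout"

# Put things back once we're done poking around.
$stdout.close
$stdout = $_old_stdout
puts "debug output written to #{debug_path}"
```

Because the macro stashes the original stream in $_old_stdout, the same restore step could in principle be evaluated from gdb when we are finished.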

After redirecting stdout to a file like this, we can use the evalr macro to puts output to that file. (I’ve found it easy to tail these tmp files from another terminal.) For example, we could find out about local variables with something like:

evalr "local_variables.each{|x| puts '%s = %s' % [x, eval(x.to_s)]}; nil"

Or we could get a stack trace with:

evalr "puts caller.join('\n')"
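A stuck daemon often has more than one thread, so a variation worth considering is to dump a backtrace for every live thread rather than only the current one. This is a sketch using Ruby's standard Thread.list and Thread#backtrace; the helper name thread_backtraces and the sleeping worker are made up for illustration, and I haven't verified how this behaves when evaluated through gdb on every Ruby version.

```ruby
# Hypothetical helper: one header line per live thread, followed by one
# indented line per stack frame.
def thread_backtraces
  Thread.list.flat_map do |thread|
    frames = thread.backtrace || [] # nil for threads that have already finished
    ["Thread #{thread.object_id} (#{thread.status})"] + frames.map { |f| "  #{f}" }
  end
end

# A sleeping worker stands in for whatever a real daemon is stuck on.
worker = Thread.new { sleep }
sleep 0.1 until worker.status == "sleep"

puts thread_backtraces
worker.kill
```

The body of the helper is a single expression, so the same Thread.list loop could be passed to evalr as a string once stdout has been redirected.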

gdb caveats

I should point out that working with gdb like this is really like performing open-heart surgery. It’s risky, and our process might die. This is a delicate last-ditch effort to get some information out of a process we were planning to restart anyway. It’s also important to be aware of minor differences between platforms. For example, the way that Ruby is compiled on my Mac requires that I add some type-hinting to the evalr macro.

Another thing worth noting is that the second argument to rb_eval_string_protect is a pointer to an int “state” that Ruby sets to a non-zero value if the evaluated code raises an exception, and we’re passing it a null pointer. That means that when our expression raises, we simply get nil back with no indication that anything went wrong. If you know of a simple way to pass in and inspect a real state variable from gdb, or if you know of other good low-level debugging tricks, let me know in the comments below.

Further Reading