Tools for Debugging Running Ruby Processes

Let’s assume that we have a daemon running on some kind of POSIX system written in Ruby that works great most of the time, but every few months gets “stuck” and needs to be restarted. We might tolerate this failure rate, or we might set up something like monit to automatically restart the daemon when it becomes “stuck.” But wouldn’t it be better to get to the bottom of the issue?

Next time the daemon gets stuck, what tools might we use to figure out what’s happened to it? If you’re still developing, you might have included the [`pry`](https://spin.atomicobject.com/2012/08/06/live-and-let-pry/) gem or you might even be using `pry-rescue` to catch exceptions. But on a production system, you probably won’t have such luxuries available.

Luckily, since a Ruby process is still a process, there are quite a few POSIX utilities at our disposal. Let’s [find the PID](http://linux.die.net/man/1/pgrep) of our process and see what we can learn.

# lsof

`lsof` can tell us what files and sockets our process has open. This includes things like input or output files the process might be reading or writing, log files, remote TCP connections, local sockets, and dynamically-linked libraries. Assuming our process’s PID is `1337`, running `lsof -p 1337` will get us the basics. Visit [`man lsof`](http://linux.die.net/man/8/lsof) for more ideas.
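
On Linux, a rough version of the same picture is also visible under `/proc/<pid>/fd`, where each entry is a symlink to the open file or socket. As a minimal sketch, this lists a process's descriptors from Ruby. The helper name `open_descriptors` is made up for illustration, and the code assumes a Linux-style `/proc` and permission to read the target's entries.

```ruby
# Hypothetical helper: map each open file descriptor of `pid` to its target.
# Assumes a Linux-style /proc filesystem; returns {} when the directory is
# missing or unreadable (other platforms, unknown PID, no permission).
def open_descriptors(pid)
  dir = "/proc/#{pid}/fd"
  Dir.children(dir).each_with_object({}) do |fd, result|
    begin
      result[fd.to_i] = File.readlink(File.join(dir, fd))
    rescue SystemCallError
      # The descriptor was closed, or is unreadable, since we listed it.
    end
  end
rescue SystemCallError
  {}
end

# Inspect this script's own descriptors as a demonstration.
open_descriptors(Process.pid).sort.each do |fd, target|
  puts "#{fd} -> #{target}"
end
```

To look at the stuck daemon instead, pass its PID (for example `1337`) and run the script as the daemon's user or as root.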

# strace

`strace` can tell us what system calls (or ‘syscalls’) our process is making. There are a bunch of things we might find here, but the most common useful observation is that something is slow or broken because the process is waiting on input from some other source. Run `strace -p 1337` for the basics. Visit [`man strace`](http://linux.die.net/man/1/strace) for more ideas. (Instead of `strace`, OS X and a few other platforms have a `dtrace` utility, which is arguably even more powerful. `strace`, however, is included in all major Linux distributions and is therefore more likely to be available on production or production-like systems.)

# gdb

`gdb` will let us attach to our process, halting its execution and letting us muck around with its innards. While I’ve used `lsof` and `strace` for troubleshooting in the past, I only recently realized how powerful `gdb` could be in this context.

Jamis Buck, of course, beat me to this realization by a few years. His 2006 blog post [“Inspecting a live Ruby process”](http://weblog.jamisbuck.org/2006/9/22/inspecting-a-live-ruby-process) offers a couple of ideas for how to find out more about Ruby processes. (It even has some comments from _why exploring the ideas further.) Unfortunately, his sample code no longer works for me, and I don’t yet have the depth of `gdb` or Ruby implementation knowledge to explain why.

Instead, I found a treasure trove of ideas compiled in this [gdb-macros-for-ruby repo](https://github.com/michaelklishin/gdb-macros-for-ruby). These macros can either be placed in a `~/.gdbinit` file to be loaded whenever `gdb` starts, or they can be typed or copy-pasted into the `gdb` REPL while attached to the process you want to debug. So far, I’ve been doing the latter.

## gdb examples

Two of the most useful:

[eval some ruby](https://github.com/michaelklishin/gdb-macros-for-ruby/blob/master/gdb_macros_for_ruby#L29-L34)

    define evalr
      call(rb_p(rb_eval_string_protect($arg0,(int*)0)))
    end
    document evalr
      Evaluate an arbitrary Ruby expression from current gdb context.
    end

(While the original macro uses the name `eval`, `gdb` warned me that this was going to override the default `eval` when I tried this, so I used the name `evalr` instead, just to be safe(r).)

[redirect_stdout](https://github.com/michaelklishin/gdb-macros-for-ruby/blob/master/gdb_macros_for_ruby#L36-L42)

    define redirect_stdout
      call rb_eval_string("$_old_stdout, $stdout = $stdout, File.open('/tmp/ruby-debug.' + Process.pid.to_s, 'a'); $stdout.sync = true")
    end
    document redirect_stdout
      Hijack Ruby $stdout and redirect it to /tmp/ruby-debug.<pid>.
      Useful to redirect ruby macro output to a separate file.
    end
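
For reference, here is roughly what the Ruby string inside `redirect_stdout` does when run as ordinary Ruby. This is a sketch for experimenting outside `gdb`. It assumes `/tmp` is writable, and the restore step at the end is an addition for illustration, not part of the macro.

```ruby
# The same assignment the macro evaluates inside the target process.
path = '/tmp/ruby-debug.' + Process.pid.to_s
$_old_stdout, $stdout = $stdout, File.open(path, 'a')
$stdout.sync = true # flush every write so a `tail -f` sees it right away

puts 'hello from the redirected stdout' # lands in the file, not the terminal

# Not part of the macro: put the original stream back when finished.
$stdout.close
$stdout = $_old_stdout
puts "debug output was appended to #{path}"
```

Keeping the old stream in `$_old_stdout` is what makes it possible to undo the redirect later, either from a script like this or from another `evalr` call.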

After redirecting stdout to a file like this, we can use the `evalr` macro to `puts` output to that file. (I’ve found it easy to [tail](http://linux.die.net/man/1/tail) these tmp files from another terminal.) For example, we could find out about local variables with something like:

    evalr "local_variables.map{|x| puts '%s = %s' % [x, eval(x.to_s)]}; nil"

Or we could get a stack trace with:

    evalr "puts caller.join('\n')"
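
To see what these two expressions print without attaching `gdb`, the same Ruby can be tried in an ordinary script. The sketch below uses made-up names (`format_locals`, `process_order`) and passes an explicit `binding`, which is an assumption for illustration: inside `gdb`, the string is evaluated in whatever context `rb_eval_string_protect` provides, which may not be the frame you are interested in.

```ruby
# Hypothetical helper: format each local variable visible in `context`
# (a Binding) as "name = value".
def format_locals(context)
  context.local_variables.map do |name|
    # local_variables returns Symbols on Ruby 1.9+, but eval needs a String.
    '%s = %s' % [name, context.eval(name.to_s).inspect]
  end
end

# Hypothetical method standing in for code inside the stuck daemon.
def process_order(order_id, quantity)
  total = quantity * 3
  puts format_locals(binding) # one "name = value" line per local
  puts caller.join("\n")      # the frames that led to this call
  total
end

process_order(42, 2)
```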

## gdb caveats

I should point out that working with `gdb` like this is really like performing open-heart surgery. It’s risky, and our process might die. This is a delicate last-ditch effort to get some information out of a process we were planning to restart anyway. It’s also important to be aware of minor differences between platforms. For example, the way that Ruby is compiled on my Mac requires that I add some type-hinting to the `evalr` macro.

Another thing worth noting is that the second argument to `rb_eval_string_protect` is “`state`”, and we’re passing it `0`. As far as I can tell from Ruby’s C API, `state` is an `int *` that Ruby sets to a nonzero value when the evaluated code raises, so passing a null pointer means we never learn whether our expression failed. If you know of a simple way to pass in a usable pointer for this `state` from `gdb`, or if you know of other good low-level debugging tricks, let me know in the comments below.
