Let’s assume that we have a daemon, written in Ruby and running on some kind of POSIX system, that works great most of the time but every few months gets “stuck” and needs to be restarted. We might tolerate this failure rate, or we might set up something like monit to automatically restart the daemon when it becomes “stuck.” But wouldn’t it be better to get to the bottom of the issue?
Next time the daemon gets stuck, what tools might we use to figure out what’s happened to it? If you’re still developing, you might have included the pry gem, or you might even be using pry-rescue to catch exceptions. But on a production system, you probably won’t have such luxuries available.
Luckily, since a Ruby process is still a process, there are actually quite a few POSIX utilities at our disposal. Let’s find the PID of our process and see what we can learn.
lsof
lsof can tell us what files and sockets our process has open. This includes things like input or output files the process might be reading or writing, log files, remote TCP connections, local sockets, and dynamically linked libraries. Assuming our process’s PID is 1337, running lsof -p 1337 will get us the basics. See man lsof for more ideas.
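On Linux, much of what lsof reports for a single process can also be read from the /proc filesystem. The sketch below is an illustration rather than a replacement for lsof: the method name open_file_descriptors is hypothetical, and it assumes a Linux-style /proc/<pid>/fd directory, returning an empty hash where that directory does not exist or cannot be read.

```ruby
# Hypothetical helper: map each open file descriptor of a process to the
# file, socket, or pipe it points at. Assumes a Linux-style /proc layout.
def open_file_descriptors(pid = Process.pid)
  fd_dir = "/proc/#{pid}/fd"
  return {} unless Dir.exist?(fd_dir)

  Dir.children(fd_dir).each_with_object({}) do |fd, result|
    begin
      # Each entry is a symlink such as "/var/log/app.log" or "socket:[12345]".
      result[fd.to_i] = File.readlink(File.join(fd_dir, fd))
    rescue SystemCallError
      # The descriptor was closed in the meantime, or we may not read it.
    end
  end
rescue SystemCallError
  # The directory itself is not readable (for example, another user's process).
  {}
end
```

Calling open_file_descriptors(1337) from a Ruby process running as the same user (or as root) would list the descriptors for the example PID used above.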
strace
strace can tell us what system calls (or “syscalls”) our process is making. There are a bunch of things we might find here, but the most common useful observation is that something is slow or broken because it’s waiting on input from some other source. Run strace -p 1337 for the basics. See man strace for more ideas. (Instead of strace, OS X and a few other platforms have a dtrace utility, which is arguably even more powerful. strace, however, is included in all major Linux distributions and is therefore more likely to be available on production or production-like systems.)
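To practice with these tools before a real incident, it can help to have a process that gets stuck on purpose. The following is a minimal sketch of such a stand-in, not a model of any real daemon: the method name block_on_pipe_read and the STUCK_DEMO environment variable are made up for illustration. It blocks reading from a pipe that nothing ever writes to, which is the "waiting on input from some other source" situation described above.

```ruby
# Hypothetical practice target: a process that "gets stuck" on purpose.
def block_on_pipe_read
  reader, _writer = IO.pipe
  # Nothing ever writes to or closes the pipe, so this read never returns.
  reader.read
end

# Only block when explicitly asked to (STUCK_DEMO is a made-up variable name).
if ENV['STUCK_DEMO'] == '1'
  puts "PID: #{Process.pid}"
  $stdout.flush
  block_on_pipe_read
end
```

Running this with STUCK_DEMO=1 set and then attaching strace to the printed PID should show the process sitting in a blocking system call rather than doing any work.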
gdb
gdb will let us attach to our process, halting its execution and letting us muck around with its innards. While I’ve used lsof and strace for troubleshooting in the past, I only recently realized how powerful gdb could be in this context.
Jamis Buck, of course, beat me to this realization by a few years. His 2006 blog post “Inspecting a live Ruby process” offers a couple of ideas for how to find out more about Ruby processes. (It even has some comments from _why exploring the ideas further.) Unfortunately, his sample code no longer works for me, and I don’t yet have the depth of gdb or Ruby implementation knowledge to explain why.
Instead, I found a treasure trove of ideas compiled in this gdb-macros-for-ruby repo. These macros can either be placed in a ~/.gdbinit file to be loaded whenever gdb starts, or they can be typed or copy-pasted into the gdb REPL while attached to the process you want to debug. So far, I’ve been doing the latter.
gdb examples
Two of the most useful:
define evalr
call(rb_p(rb_eval_string_protect($arg0,(int*)0)))
end
document evalr
Evaluate an arbitrary Ruby expression from current gdb context.
end
(While the original macro uses the name eval, gdb warned me that this would override its default eval when I tried it, so I used the name evalr instead, just to be safe(r).)
define redirect_stdout
call rb_eval_string("$_old_stdout, $stdout = $stdout, File.open('/tmp/ruby-debug.' + Process.pid.to_s, 'a'); $stdout.sync = true")
end
document redirect_stdout
Hijack Ruby $stdout and redirect it to /tmp/ruby-debug.<pid>.
Useful to redirect ruby macro output to a separate file.
end
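Since the Ruby code in that macro is squeezed into a single string, here is the same swap written out as ordinary Ruby, along with a way to undo it. This is only a sketch for reading and local experimentation: the method names redirect_stdout_to and restore_stdout are hypothetical and are not part of the macros repo.

```ruby
# Hypothetical helper: point $stdout at an append-mode file and return the
# previous $stdout so that it can be put back later.
def redirect_stdout_to(path)
  old_stdout = $stdout
  $stdout = File.open(path, 'a')
  # Write through immediately so that `tail -f` sees output as it happens.
  $stdout.sync = true
  old_stdout
end

# Hypothetical helper: put the original $stdout back and close the debug file.
def restore_stdout(old_stdout)
  debug_file = $stdout
  $stdout = old_stdout
  debug_file.close unless debug_file.equal?(old_stdout)
end
```

Opening the file in append mode means repeated debugging sessions against the same PID add to the file instead of overwriting earlier output.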
After redirecting stdout to a file like this, we can use the evalr macro to puts output to that file. (I’ve found it easy to tail these tmp files from another terminal.) For example, we could find out about local variables with something like:
evalr "local_variables.map{|x| puts '%s = %s' % [x, eval(x.to_s)]}; nil"
Or we could get a stack trace with:
evalr "puts caller.join('\n')"
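Because caller describes only the current thread, a stuck multi-threaded daemon may be easier to understand with a backtrace for every live thread. The sketch below shows that idea in plain Ruby. The method name dump_thread_backtraces is hypothetical, and I am assuming, without having verified it on every Ruby version, that a one-line version of its body can be passed to evalr in the same way as the examples above.

```ruby
# Hypothetical helper: write the backtrace of every live thread to an IO.
def dump_thread_backtraces(io = $stdout)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} (status: #{thread.status.inspect})"
    # backtrace can be nil if the thread dies while we are iterating.
    (thread.backtrace || []).each { |frame| io.puts "  #{frame}" }
  end
  nil
end
```

With stdout redirected as described earlier, the default io argument would send this output to the same tmp file.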
gdb caveats
I should point out that working with gdb like this is a lot like performing open-heart surgery. It’s risky, and our process might die. This is a delicate last-ditch effort to get some information out of a process we were planning to restart anyway. It’s also important to be aware of minor differences between platforms. For example, the way that Ruby is compiled on my Mac requires that I add some type-hinting to the evalr macro.
Another thing worth noting is that the second argument to rb_eval_string_protect is a pointer to an int called “state” and we’re passing it 0, a null pointer. In Ruby’s C API this is an output parameter: it is set to a nonzero value when the evaluated code raises an exception, so passing 0 means we never find out whether our expression failed. If you know of a simple way to pass in a real int for this state from within gdb, or if you know of other good low-level debugging tricks, let me know in the comments below.
Further Reading
- Using GDB to inspect a running Ruby process by thoughtbot
- Teaching GDB and Ruby to play nice together by 3scale