I Actually Got Something Useful from the Microsoft Crash Dialog

Bella Vista

Error messages are notoriously unhelpful, despite their omnipresence.

You see errors on
iPads and iPhones
and ATMs and Android.
Twitter and TVs
and hulu.com and healthcare.gov.
But do you recall
the most useless error message of all?

I believe that award goes to the following error dialog. I’m sure everyone who has used Microsoft Windows for more than a week or two has seen it.

crash_dialog

I’m equally sure the vast majority of us have never sent a report to Microsoft. (Who receives those reports, anyway? I want to give that person a hug, because I’m sure they need it.)

An Un-Trappable Crash

All that (mostly) good-natured ribbing aside, I recently discovered that the Microsoft crash dialog is not entirely without merit.

I’ve been working on a WPF app for the better part of a year. It will have a small install base, mostly just the customer's employees. Until recently, only my customer contact and a handful of people on his internal team had installed it. Everything seemed to be working fine. But as Murphy would predict, as soon as one of the higher-ups got his hands on it, there was an unexpected crash. Worse than that, neither I nor my direct customer contact could reproduce it on our machines.

It didn’t help that the crash was somewhat unpredictable. It was one of those “go to this screen and let it sit for 5 to 10 minutes” types of errors. My app connects to hardware over TCP/IP, using anywhere from 1 to 20 connections. The customer contact and I usually tested against hardware that made around 5 connections, but the higher-up was talking to hardware that made 20. Naturally, we suspected the number of connections was involved, either as the cause or at least as something that made the crash more likely. However, we would need to catch the error to be sure.

I set about making my app as crash-resistant as possible. Every thread in my app became “protected” with logic that would trap any errors and log them. Additionally, as a debug-mode-only feature, it shows the user where the error occurred along with the stack trace (so they can send me a screenshot if they don’t want to go find the log).
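The post doesn't include the app's actual error-trapping code, and the app itself is C#/WPF, where the usual process-wide hooks are `AppDomain.CurrentDomain.UnhandledException` and WPF's `Application.DispatcherUnhandledException`. As a rough, language-neutral illustration of the “protect every thread” pattern, here is a sketch in Java. The `protect` helper, the in-memory `LOG`, and the `connection-1` thread name are hypothetical stand-ins, not the app's real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ProtectedThreads {
    // Hypothetical in-memory log; a real app would write to a file instead.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical helper: wraps a thread body so that any runtime exception
    // is trapped and logged instead of escaping the thread.
    static Runnable protect(String name, Runnable body) {
        return () -> {
            try {
                body.run();
            } catch (RuntimeException e) {
                LOG.add(name + ": " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Last-resort hook for anything that escapes a protected wrapper
        // (for example, an Error such as StackOverflowError).
        Thread.setDefaultUncaughtExceptionHandler(
            (thread, t) -> LOG.add("uncaught in " + thread.getName() + ": " + t));

        // Simulated connection thread that fails; the wrapper traps the exception.
        Thread worker = new Thread(protect("connection-1", () -> {
            throw new IllegalStateException("socket closed");
        }));
        worker.start();
        worker.join();

        LOG.forEach(System.out::println);
    }
}
```

Running `main` should print a single line, `connection-1: java.lang.IllegalStateException: socket closed`, rather than an unhandled-exception stack trace. Keep in mind that a wrapper like this only sees exceptions that surface in your own code; a failure inside the runtime itself can bypass it entirely.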

So you can imagine my surprise when this new user upgraded to my “catch-all” version and still got the Microsoft crash dialog. Why on earth wasn’t the exception being trapped? I tried to reproduce it with my computer against the more complex hardware. No dice. Then I tried a “loaner” laptop the company had lying around. Pay dirt! I was able to reproduce the error. Even so, I remained completely flummoxed. How was this error escaping my traps?

Finding Answers in the Crash Dialog

With nowhere left to turn, I decided to look at the XML error report generated by the crash dialog. Wouldn’t you know it, I discovered the source of the problem right then and there. I saw something like this:

clr_error_dll

Normally, when you let a crash get through from your app, you see something a little different:

error_dll

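where AndNowItsMyFaultMyFault is one of the projects in your solution. clr.dll, on the other hand, is the native library at the core of .NET’s Common Language Runtime. This meant the issue almost certainly resided within Microsoft’s .NET code, which would also explain why my traps never saw it: a failure inside the runtime itself typically takes the process down before any of the application’s exception handlers get a chance to run. With a little experimentation, I quickly discovered that this crash only happened to people who had .NET 4.0, not .NET 4.5. That seemed strange at first, because my app only uses .NET 4.0 features. But .NET 4.5 is an in-place update that replaces the 4.0 runtime, so a 4.0 app on a machine with 4.5 installed runs on the newer clr.dll, fixes included. I had everyone install 4.5, and the issue went away.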

The moral of the story: just because something is utterly useless 99.9% of the time doesn’t mean you should write it off completely. You might just be in that 0.1% case.