The Risks of Hand Jobs

The bug struck nearly a month after we introduced it into the system. Four weeks earlier we had started working on a new project that involved communication with a device. We didn’t have the device yet, and the firmware hadn’t been started. In the meantime we needed to familiarize ourselves with the application, and while doing so we made a mistake that stayed hidden, like a disease with a long incubation period.

The software already had a feature for enabling a “mock” instrument. It was called a mock, but it was really a fake instrument. We didn’t have the documentation or configuration to enable it, so just to try things out we stuck in one harmless little line of code:
mockInstrument = "normal";
It didn’t seem like much: the kind of thing you do when trying to figure out a system with poor documentation and insufficient testing. But insufficient testing also means the code is full of hidden dangers waiting to strike. Making simple, manual changes like this without proper protection in place is bound to lead to pain later on.

It turned out that when this line of code was added, it had no effect on the system. It was on the wrong side of an if statement to do anything and all tests were passing, so at the end of the day the line was forgotten and checked in.

Later, toward the end of the iteration, we put in place the config file that this method needed for enabling the mock instrument. Everything still seemed to work and all tests passed, but the bug had been activated like a dormant virus. The line of code above ran only when this config file existed, and its effect was to always use the fake instrument, no matter what the config file contained.
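The original code isn’t shown beyond that one line, so the following is only a hypothetical reconstruction of the shape of the bug, written in Python for brevity. The function name, the config file format, and the "off" setting are all assumptions made for illustration; only the stray assignment mirrors the line above.

```python
import os
import tempfile

def choose_instrument(config_path):
    """Return "fake" or "real". Hypothetical sketch, not the original code."""
    mock_instrument = "off"  # assumed default: talk to the real device
    if os.path.exists(config_path):
        with open(config_path) as f:
            mock_instrument = f.read().strip()  # what the config file asks for
        mock_instrument = "normal"  # the stray experiment: overrides the config
    return "real" if mock_instrument == "off" else "fake"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "instrument.cfg")
        # No config file yet: the stray line is never reached.
        print(choose_instrument(path))
        # Config file exists and asks for the real device: the stray line wins.
        with open(path, "w") as f:
            f.write("off")
        print(choose_instrument(path))
```

In this sketch the stray line is dead code until the config file appears, which is why nothing looked wrong on the day it was checked in.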

And we still didn’t have a fully working instrument. The instrument at that point was expected to always return a successful status for any command sent to it. Hence, the real, incomplete instrument was behaving just like the fake software-only instrument, hiding our bug for just a little longer.

So when it came time to expect some real behavior from the instrument, we found out that the software we had finished and released had never actually sent any commands to the instrument. Quite an embarrassing mistake.

The Lesson

The thing to learn from this is the danger of making simple, manual changes to code without putting tests in place, especially when dealing with code that is already untested. Because nothing was in place to warn us that we had broken the configuration file behavior, we didn’t know we had changed it. A simple test would have caught this very easily.
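As a rough illustration of what such a test might look like, here is a minimal Python sketch. Everything in it is hypothetical: the `choose_instrument` function, the `stray_line` flag (included only so the sketch can show both the broken and the intended behavior), and the "off" and "normal" settings are assumptions, not the real application’s API.

```python
import os
import tempfile

def choose_instrument(config_path, stray_line=False):
    """Hypothetical stand-in for the method that picks the instrument."""
    mock_instrument = "off"  # assumed default: use the real device
    if os.path.exists(config_path):
        with open(config_path) as f:
            mock_instrument = f.read().strip()
        if stray_line:
            mock_instrument = "normal"  # the forgotten manual change
    return "real" if mock_instrument == "off" else "fake"

def config_is_respected(stray_line):
    """True if the config file's contents decide which instrument is used."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "instrument.cfg")
        results = []
        for setting, expected in [("off", "real"), ("normal", "fake")]:
            with open(path, "w") as f:
                f.write(setting)
            results.append(choose_instrument(path, stray_line) == expected)
        return all(results)

if __name__ == "__main__":
    print(config_is_respected(stray_line=False))  # intended behavior
    print(config_is_respected(stray_line=True))   # behavior with the stray line
```

The check is deliberately small: it writes each setting to a temporary config file and confirms that the choice of instrument follows it. With the stray line present the "off" case comes back as the fake instrument, so the check fails as soon as the config file exists.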

We didn’t think it was worth testing a temporary change that was intended to exist only for a day and was never meant to be checked in. But without the safety net of pre-existing tests, which you get used to when working on all-new code, we ended up with a bad bug that bit us when we least expected it.


