Abductive Reasoning | Customers, Etc.
Sometimes you have to make a creative leap to move forward.
Let me tell you my favorite troubleshooting story. This was over 15 years ago, so I might get a few details wrong, but it’s fun so I’ll try to remember it the best I can.
Back in my first job out of college, I had to build reports based on data in “the system”. This was an exciting time for the company because they had recently upgraded from an old mainframe to a modern server based on the LAMP technology stack1.
It was my job to reverse engineer reports in the system to so they could be manipulated in Excel. I would export the data from MySQL, set up pivot tables, filters, etc., and then try several permutations to see if I could get the numbers to match the reports in the system.
Most of my reports worked just fine. It always required a bit of trial and error, but eventually the data would match up.
Except for this one report. Try as I might, the data would never quite match up. It’d be off a few pennies in one direction with a certain set of parameters, then a few more pennies in the opposite direction with a different set of parameters.
What was going on?
One approach I often liked to take in reverse engineering reports was to look at the individual line items that made up the report and then see if I could scan the line items and see something that was off. Maybe my report was off by $1.90, and looking at the report, I noticed one line that had ten units in stock, each with a cost of $0.19. In that case, there might be something that led to the discrepancy that I hadn’t seen, like a product being misclassified in ways I hadn’t expected.
The challenge with this particular report is that the discrepancies between my Excel report and the official system’s report were always small, usually less than a dollar either positive or negative, so it wasn’t likely to be a difference in the line items of the report.
At this point, I could have chalked it up to “rounding error”, but having a difference of anything other than $0.00 was nagging at me. I didn’t want to say this report was finished when I wasn’t 100% sure the data would match up.
I decided to remove Excel from the equation and reverse engineer the report in our MySQL data warehouse directly. But when had I everything set up, the data in MySQL matched my Excel report exactly, which meant neither version matched what was in the system. Hrmph.
Getting closer
If I’m remembering correctly, I think at this point I did a bit of research and learned that some systems implement rounding using “Banker’s Rounding”, which is where if a number ends in .5, it’s rounded to the nearest even number. So 3.5 would be rounded to 4 and 2.5 would be rounded to 2. It seemed like an odd choice, but I wanted to rule it out. I may have even emailed the software vendor and asked if they used Banker’s Rounding for some of their reports2.
With this idea in mind, I implemented a version of Banker’s Rounding in MySQL. Interestingly, this got me much closer, but my report was still off by a few pennies when there were lots of underlying line items. Most of the reports against smaller sets of data matched up, but the larger ones were still a few pennies off, either positive or negative. Argh!
Creative leap
I was feeling pretty stumped, but also pretty stubborn—I was so close!—so I didn’t want to give up. I thought there was still likely something to that “rounding error” idea, though I wasn’t sure exactly how that would be the case. I wondered if the system might somehow be using a different rounding function than what Excel was using. I couldn’t fathom how that could be, but I thought it was a thread worth pulling on. I didn’t have direct access to the production system’s infrastructure to test my thinking, so I had to get creative.
On a whim, I decided to try writing my own rounding function in Excel VBA, the programming language that lets you write Excel macros, using VBA’s Round
function instead of the Excel’s Worksheet Round
function. My VBA code would have looked something like this:
Function VBAROUND(number As Double, num_digits As Integer) As Double
VBAROUND = Round(number, num_digits)
End Function
So now in my report, instead of letting Excel do its own rounding, I substituted my VBAROUND
function. To my surprise, it worked! The data in my Excel report, when summed, now exactly matched the totals in the reports from the system.
How could this be? Wasn’t it using the same data? And why did this work exactly when the Banker’s Rounding solution almost worked.
Although the data was exactly the same, the actual databases were different. The live production system ran on a MySQL database sitting in Texas and our data warehouse ran on a MySQL database sitting in North Carolina.
I guess to be more clear, I should say that my running hypothesis at the time was that the versions of MySQL were likely different from one another.
When I emailed the vendor to ask which version of MySQL was running in production, my hunch was confirmed. They were indeed using an earlier version of MySQL.
So I was right, but why would the MySQL version matter?
Getting it wrong, together
It turns out that my investigation into Banker’s Rounding was pointing me in the right direction, but it didn’t take me all the way there. I had tried to implement Banker’s Rounding, but my results didn’t match.
When I dug into how VBA implemented the Round function, it appeared that it was relying on the language used to create the language to implement Round, in this case C3. And the way that C implemented Round was somewhat unexpected. It was trying to implement Banker’s Rounding, but with some unexpected behavior in certain scenarios.
As it turned out, when I researched how MySQL implemented the Round function, earlier versions of MySQL simply relied on the C implementation as well. It was like the developers in both VBA and MySQL were both like “we need a Round function. Oh look, here’s one in C. Let’s use that,” without really testing it4. Thus, when I compared my VBAROUND
function to what was in the system, the numbers matched up because both implementations were relying on the same (flawed) implementation of Banker’s Rounding in C. What a trip!
Abductive Reasoning
“Okay Ben, you’ve told us this whole story, but you haven’t even mentioned the title of the post yet. What gives?”
The reason I share this story is because it provides an example of abductive reasoning, which is commonly used in troubleshooting. From the Wikipedia page:
Abductive reasoning, unlike deductive reasoning, yields a plausible conclusion but does not definitively verify it. Abductive conclusions do not eliminate uncertainty or doubt, which is expressed in retreat terms such as "best available" or "most likely".
Abductive reasoning is different from deductive reasoning. Using my troubleshooting story as an example, let’s say that a friend of mine told me today they were working on a legacy LAMP system where the rounding was behaving in odd ways. If I told my friend, “You should check your MySQL version,” I’m relying on deductive reasoning to apply a previously established general rule (“Earlier versions of MySQL have flawed rounding”) to a specific case (my friend telling me their legacy LAMP system has weird rounding behavior) to draw a logical conclusion.
But in my original story, I didn’t yet know the “previously established general rule” about MySQL rounding, so I had to form hypotheses based on incomplete information. And it’s not like this was a pattern I saw in lots of different places (where I might have been able to use inductive reasoning). I had to jump from thread to thread and generate the best possible hypothesis at each new turn. That’s abductive reasoning at play.
The “creative leap” that happened when I decided to try implementing a rounding function in VBA is the clearest giveaway that I was using abductive reasoning to solve the problem. I didn’t know for sure that I would find the answer there—in fact, there was nothing immediately intuitive that connected VBA to MySQL—but it was the best possible path forward based on the data available. “Surely it has something to do with rounding” was all I had to go on, but that ended up pointing me in the right direction.
This post was inspired by a discussion with Akhil Hansjee where he was describing qualities of the highest-performing support engineers. Akhil is a Senior Support Engineering Manager at Boldr. He was saying that the best support engineers don’t just present data about a bug when escalating to engineers, but rather also give helpful threads for engineers to pull on based on their (the support engineers’) unique context within the organization and in working with customers. Our search for what to call that quality led us toward the term abductive reasoning.
LAMP stands for Linux-Apache-MySQL-PHP. The previous software ran on an IBM AS-400 mainframe.
I’m sure they loved getting these kinds of crazy emails about esoteric questions from me. Doesn’t this kid have anything better to do??? “No” was probably their reply.
My memory gets fuzzy here and I’m having trouble verifying exactly how everything was back then, but I found this post I made in Stack Overflow back in 2010 that references this forum post (that somehow mentions my name???) that said this (I’m quoting it in its entirety because I don’t know if the source material is long for this internet world and I want to keep it):
Ben McCormack discovered a very interesting feature of VBA's Round function. The following shows two queries in the Immediate window and their results:
? Round(134.425,2)
134.43
? Round (1.425,2)
1.42
The first rounds 134.425 incorrectly towards an odd digit while the second correctly rounds 1.425 to an even digit.
Ben's requirement was to reverse-engineer an application that showed this behavior so this inconsistency was quite a problem. Eventually he found a Microsoft Knowledge Base article (How To Implement Custom Rounding Procedures) that seems to imply that VBA's Round function uses code similar to this:
Function BRound(ByVal X As Double, _
Optional ByVal Factor As Double = 1) As Double
' For smaller numbers:
' BRound = CLng(X * Factor) / Factor
Dim Temp As Double, FixTemp As Double
Temp = X * Factor
FixTemp = Fix(Temp + 0.5 * Sgn(X))
' Handle rounding of .5 in a special manner
If Temp - Int(Temp) = 0.5 Then
If FixTemp / 2 <> Int(FixTemp / 2) Then ' Is Temp odd
' Reduce Magnitude by 1 to make even
FixTemp = FixTemp - Sgn(X)
End If
End If
BRound = FixTemp / Factor
End Function
This code seems to produce the same result as VBA.Round.
It seems that the problem is caused by rounding errors. You cannot store all decimal numbers with a binary representation without some errors. You could use an infinite-precision rational representation (where you track numerators and demoninators) but that would be much less efficient.
It seems that this example is bumping up against binary representation errors. I suspect you could fiddle with the algorithm a bit to fix this particular problem but you'll probably have others for certain values.
You can get better results if you keep an eye on your precision. For example, instead of asking whether the final digit is exactly 5, ask whether it is very close to 5 (i.e. 0.05000000001 is close enough to 0.05).
This is a complete guess and it would be amazing if someone who actually knew the details here would call me out so I can set the record straight.