The end of the quarter was approaching, and dark clouds were gathering in the C-suite. While the executives were trying to be tight-lipped about it, the scuttlebutt was flowing freely. Initech had missed major sales targets, and not just by a few percentage points, but by an order of magnitude.
Heads were going to roll.
Except there was a problem: the master report that had kicked off this tizzy didn’t seem to align with the department-specific reports. For the C-suite, the master report was the document of record; they had been using it for years, and had great confidence in it. But something was wrong.
Enter Jeff. Jeff had been hired to migrate their reports to a new system, and while this particular report had not yet been migrated, Jeff was at least familiar with it, and was capable of answering the question: “what was going on?” Were the sales really that far off, and was everyone going to lose their jobs? Or could it possibly be that this ancient and well-used report might be wrong?
The core of the query was basically a series of subqueries. Each subquery followed this basic pattern:
SELECT SUM(complex_subquery_A) as subtotal FROM complex_subquery_B
None of this was particularly readable, mind you, and it took some digging just to get the shape of the individual queries understood. But none of the individual queries were the problem; it was the way they got stitched together:
SELECT SUM(subtotal)
FROM
  (SELECT SUM(complex_subquery_A) AS subtotal FROM complex_subquery_B
   UNION
   SELECT SUM(complex_subquery_C) AS subtotal FROM complex_subquery_D
   UNION
   SELECT SUM(complex_subquery_E) AS subtotal FROM complex_subquery_F);
The full query contained a much longer chain of unions, but it was easy to understand what went wrong, and to demonstrate it to management.
The UNION operator does a set union, which means that if there are any duplicate rows, only one gets included in the output. So if “Department A” and “Department C” both have $1M in sales for the quarter, the two of them contribute just $1M to the total, not the expected $2M.
The correct version of the query would use UNION ALL, which preserves duplicates.
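The difference is easy to reproduce on a toy dataset. The sketch below is not Initech's report: the `sales` table, its columns, the department figures, and the `grand_total` helper are all made up for illustration, and it runs against an in-memory SQLite database through Python's standard `sqlite3` module. It keeps only the shape of the original query, with each subquery producing one subtotal row.

```python
import sqlite3

# Hypothetical data: departments A and C both total exactly $1M.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 600_000), ("A", 400_000), ("B", 250_000), ("C", 1_000_000)],
)


def grand_total(set_operator):
    """Sum the per-department subtotals, stitched together with the
    given operator ("UNION" or "UNION ALL")."""
    query = f"""
        SELECT SUM(subtotal) FROM (
            SELECT SUM(amount) AS subtotal FROM sales WHERE department = 'A'
            {set_operator}
            SELECT SUM(amount) AS subtotal FROM sales WHERE department = 'B'
            {set_operator}
            SELECT SUM(amount) AS subtotal FROM sales WHERE department = 'C'
        )
    """
    return conn.execute(query).fetchone()[0]


union_total = grand_total("UNION")          # duplicate $1M row collapsed
union_all_total = grand_total("UNION ALL")  # every subtotal row kept

print(union_total)      # 1250000
print(union_all_total)  # 2250000
```

With UNION, the two identical $1M subtotal rows collapse into one before the outer SUM ever sees them, so a full department silently vanishes from the total. The bug only bites when two subtotals happen to match exactly, which may be part of why it could go unnoticed for so long.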
What stunned Jeff was that this report was old enough to be basically an antique, and this was the kind of business that would burn an entire forest down to find out why a single invoice was off by $0.15. It was sheer luck that this hadn’t caused an explosion before. Or maybe in the past it had, and someone had just written it off as a “minor glitch”?
Unfortunately for Jeff, the report was so important that it required a huge number of approvals before the “UNION ALL” change could be deployed. In the meantime, he was called upon to manually run a “test” version of the report containing the fix every time a C-suite executive wanted one. That lasted until the end of the following quarter, when he could finally integrate the fix.