How Not to Sort 2014-04-25

The common sorting algorithms which you may find implemented in many programming languages standard libraries usually require a comparator function to order the elements being sorted. A comparator function is a function taking two objects of the same kind and returning a signed integer indicating their ordering. If the returned integer is zero then both elements are equal, otherwise the sign indicates which is greater than (ordered after) the other.

For example, to sort integers in increasing order we can use this comparator:

/**
 * Return negative if a < b, positive if a > b, zero if a = b.
 */
int compare(int a, int b) {
    if (a<b) return -1;
    if (a>b) return 1;
    return 0;
}

If you’ve ever implemented a comparator function where the actual comparison of the objects used integer values you may have made this premature optimization:

int compare(int a, int b) {
    return a - b;
}

As with many things, the devil is in the details. The assumption is that a minus b is less than zero if a is less than b, greater than zero if a is greater than b, and equal to zero if a and b are equal. The math checks out on paper, however if we take integer overflow into account we get a different story. If a is a large positive number and b is a large negative number then the subtraction could very well overflow, and despite a being much larger than b, the return value would indicate the opposite.

The optimization works only if there are limitations on the possible input values. If the inputs can only be positive then the subtraction will never overflow. However, this is an easily forgotten constraint. Even if the inputs were initially only positive there is usually no guarantee that it will remain true as the code evolves.

I searched the web a bit to see if I could find examples of this error, and I found some instructional articles on the topic of sorting, that use the above hack, where the input values are implicitly assumed to be positive:

None of the above articles have any discussion about the limitation of valid input values, and although the input values may be assumed to always be positive there is no explicit mention of this. It’s like a perfect pitfall to stumble into. You learn a neat way to compare positive values without necessarily learning the important constraint that they must be positive, then later you end up using the hack incorrectly where the inputs may be negative.

Comparing without branching (in Java)

The lesson of this article should really be not to use hacks. If you do, make double sure they work as intended. Below is another hack, which I have made sure works for my inputs. Please don’t use it without making sure that it works for you. It may very well not work as intended!

I wanted to see if I could remove the conditional statements and compare two signed 32-bit integers using only arithmetic and bit twiddling. This is what I ended up with:

int compareHack(int a, int b) {
    long p = (long)a.value-b.value;
    return (int) ((p>>1) | (p&0x7FFFFFFF));
}

The above method works in Java but is not guaranteed to work in other languages. Use it at your own risk!!

The subtraction is done with 64-bit values to avoid overflow. If and only if the result is negative then the high 32 bits in p are all ones. If the 32nd bit (highest of the low 32) is set then it may be a value bit from the subtraction result. Shifting p right moves the highest value bit out of the sign position, then the bitwise OR makes sure that we set at least one value bit if the result was positive and nonzero. The shift also ensures that the sign bit is set only if the result was negative.

Categories: Uncategorized

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.