One of the first optimizations I did in Chunky was to get rid of some java.lang.Math calls. By implementing a custom floor method I was able to speed up rendering quite a bit. You may be wondering why that function was important, and how my version could be much faster than Java’s built-in math library. I’ll answer these questions and then talk about a recent optimization that improved Chunky’s performance by at least 8 percent!
Why is floor important?
Chunky uses an octree to represent Minecraft worlds. Each block in a world is represented using integer coordinates in the octree. The floor function is used to translate the floating-point coordinates used in the ray-tracing algorithm into integer block coordinates. The floor function is called repeatedly in the most important loop in the rendering algorithm, so any improvement to the function’s running time results in a large speedup of the entire rendering algorithm.
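To make the coordinate translation concrete, here is a minimal sketch in Java. The class and method names (`BlockCoordSketch`, `blockCoord`, `truncated`) are made up for illustration and are not taken from Chunky's source. It shows why a plain `(int)` cast is not enough on its own: a cast truncates toward zero, while floor rounds toward negative infinity.

```java
// Illustrative sketch only; the names here are hypothetical, not Chunky's code.
final class BlockCoordSketch {
    /** Maps one floating-point ray coordinate to the integer coordinate of the block containing it. */
    static int blockCoord(double position) {
        // Math.floor rounds toward negative infinity, so -0.25 maps to block -1.
        return (int) Math.floor(position);
    }

    /** What a bare cast gives instead: truncation toward zero. */
    static int truncated(double position) {
        return (int) position;
    }
}
```

For a position of `-0.25`, `blockCoord` returns `-1` while `truncated` returns `0`, which would place the point in the wrong block. For non-negative positions the two agree.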
The below graph shows a performance comparison for Chunky using the standard floor function and my custom version:
Why is java.lang.Math slow?
The built-in Java math functions need to deal with all kinds of edge cases. For example, what happens if you pass NaN (Not a Number) as an argument? Or infinity? Or negative zero? Some of these edge cases require an extra conditional branch, which really hurts runtime performance. In Chunky we can usually disregard these edge cases. That’s what I did to get a faster floor function. I just don’t care what happens if a NaN or infinite float shows up, because the code guards against such incorrect values. If any computation produced such a value, the result would be incorrect regardless of how floor handles it.
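Roughly, a floor that skips those edge cases can look like the sketch below. Treat it as an illustration of the idea rather than the exact code in Chunky: the class name `QuickFloorSketch` is made up here, and the method assumes its input is finite and fits in an `int`.

```java
// Illustrative sketch only; assumes the argument is finite and within int range.
final class QuickFloorSketch {
    static int floor(double d) {
        // The cast truncates toward zero, which already equals floor for d >= 0.
        int i = (int) d;
        // For negative non-integers truncation rounds up, so step down by one.
        return d < i ? i - 1 : i;
    }
}
```

With valid inputs this agrees with `(int) Math.floor(d)`, for example `floor(1.5)` gives `1` and `floor(-1.5)` gives `-2`. NaN comes out as `0` and values outside the `int` range give meaningless results, which is the trade-off described above.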
The Apache Commons Math library includes a class named FastMath that provides quicker implementations of the standard math functions found in java.lang.Math. The techniques used to improve performance there are a bit different from my floor hack. The FastMath functions, to my knowledge, should follow the same specifications as the java.lang.Math functions.
A user on GitHub named twirrim recently submitted a pull request that replaced all remaining calls to java.lang.Math with calls to FastMath. By doing this he improved the performance of Chunky by at least 8% (on my machine; it could vary on other machines)! Pretty impressive! The calls he changed were not as frequent as the floor calls discussed above, but still important enough that they apparently made a big performance difference.
Here is a graph showing the before and after performance (higher is better) of a Chunky benchmark using different numbers of render threads on a 4-core machine (8 hyperthreads):