I was browsing the GitHub repository rapid7/metasploit-framework and noticed that the README displayed a “code climate” badge. The badge displayed a number of unknown significance: 2.7.
It was clear from the name that the badge had something to do with static analysis. Clicking on the badge leads to this page:
Yikes! 3694 seems like a pretty large issue count. It leads me to believe that the developers either don’t care about their Code Climate score (but then why would they display the badge in the README?), or that they don’t find the Code Climate issues useful.
During my first internship at Google I worked on the static analysis team. One thing I learned from the team was that static analysis findings are only useful if they are actionable and have a low false positive rate. This paper from the team lists these as the first two requirements for selecting static analyses:
I looked at the issues Code Climate reported for the repository in question, and it seems Code Climate is reporting mostly non-actionable issues regarding code complexity. The other type of issues are duplicated code. I looked at several of the reported duplicated code fragments, and they were all pretty simple code blocks with no more than five lines, and most had only two occurrences. The duplicated code issues seemed nitpicky.
Code complexity metrics (including line counts) are not useful for fixing real problems in code. If you try to enforce some maximum complexity you will end up with pointless rewrites just to try to fix the complexity score. This in itself can lead to needless bugs being introduced from trying to “fix” your complexity issues.
Another issue is that the complexity score in itself is entirely arbitrary. What does it mean that a method has a complexity of 300? Is that too much? Where do you draw the line? If you make the computation of the complexity score more sophisticated, it just gets more difficult to figure out what part to change to minimize the score.
Sometimes it can be useful to find that some replicated piece of code can be refactored into a common method. However, this often leads to introducing a more abstract method that handles more cases, increasing your code complexity. Often, a new parameter is added to the method, and some conditional statements as well. Consider for example Figure 1 from Narasimhan (2015):
In my opinion the above refactoring is not useful. The more general method will run slower than the original method due to the new conditional branch, unless it is inlined by the compiler. The more general method is also harder to read and has introduced a boolean trap.